Photo Optical Character Recognition/OCR - Class Review

Last updated on: 9 months ago

Photo OCR is a widely used application of character classification. Ceiling analysis is a practical method for deciding which part of a pipeline deserves your effort.

Problem description and pipeline


  • Text detection
  • Character segmentation
  • Character classification

Photo OCR pipeline

graph LR
A(Image) -->B(Text detection)
    B --> C(Character segmentation)
    C --> D(Character recognition)

Sliding windows

Getting lots of data and artificial data synthesis

The distortions you introduce should be representative of the type of noise/distortion in the test set.
It usually does not help to add purely random/meaningless noise to your data.
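As a rough illustration of the point above, here is a minimal sketch of synthesizing extra training patches with small, plausible distortions rather than pure noise. The function name `synthesize_variants`, the patch representation (a list of rows with grayscale values in [0, 1]), and the specific distortions (a small shift plus a mild contrast/brightness change) are assumptions made for this example; in practice you would pick distortions that match what you expect in your own test set.

```python
import random

def synthesize_variants(patch, n_variants=5, max_shift=2, seed=0):
    """Return distorted copies of a grayscale patch (list of rows, values in [0, 1])."""
    rng = random.Random(seed)
    height, width = len(patch), len(patch[0])
    variants = []
    for _ in range(n_variants):
        # Small translation: a stand-in for imperfect character cropping.
        dy = rng.randint(-max_shift, max_shift)
        dx = rng.randint(-max_shift, max_shift)
        # Mild contrast/brightness change: a stand-in for lighting differences.
        contrast = rng.uniform(0.8, 1.2)
        brightness = rng.uniform(-0.1, 0.1)
        variant = []
        for r in range(height):
            row = []
            for c in range(width):
                # Shift with wrap-around at the borders (a simplification).
                value = patch[(r - dy) % height][(c - dx) % width]
                # Clamp so the result stays a valid intensity.
                row.append(min(1.0, max(0.0, contrast * value + brightness)))
            variant.append(row)
        variants.append(variant)
    return variants

# Hypothetical 8x8 patch containing a single vertical stroke.
patch = [[1.0 if c == 4 else 0.0 for c in range(8)] for r in range(8)]
variants = synthesize_variants(patch, n_variants=3)
print(len(variants), "synthetic patches generated")
```

Each variant keeps the label of the original patch, which is what makes this kind of synthesis cheap compared with collecting new labeled data.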

Discussion on getting more data

  • Make sure you have a low bias classifier before expanding the effort
  • How much work would it be to get 10x as much data as we currently have?

Ways to get more data: synthesize it artificially, collect/label it yourself, or crowdsource it.

Ceiling analysis

What part of the pipeline to work on next?
Ceiling analysis is a method for estimating which component of a pipelined machine learning system has the strongest influence on the final prediction. Likewise, it shows which components have only a weak influence, so you can limit the effort spent on improving them, since doing so yields no significant change in the final result.

| Component | Accuracy |
| --- | --- |
| Overall system | 72% |
| Text detection | 89% |
| Character segmentation | 90% |
| Character recognition | 100% |
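One way to read these numbers, assuming each row gives the overall accuracy once that component and all earlier ones are replaced by ground-truth output, is to look at the gain between consecutive rows. The helper `ceiling_gains` below is a hypothetical name used only for this sketch.

```python
def ceiling_gains(baseline_accuracy, stages):
    """Return (component, gain) pairs.

    stages: list of (component, overall accuracy in % once this component
    and every earlier one are given ground-truth output).
    """
    gains = []
    previous = baseline_accuracy
    for name, accuracy in stages:
        gains.append((name, accuracy - previous))
        previous = accuracy
    return gains

stages = [
    ("Text detection", 89),
    ("Character segmentation", 90),
    ("Character recognition", 100),
]
gains = ceiling_gains(72, stages)
for name, gain in gains:
    print(f"{name}: +{gain} points")
# Text detection: +17 points
# Character segmentation: +1 points
# Character recognition: +10 points
```

Under that reading, perfecting text detection is worth up to 17 points, perfecting character recognition up to 10, and perfecting character segmentation only 1, so segmentation is the least promising place to invest effort.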

Class exercises

Within classes

Q1: Suppose you are training a linear regression model with m examples by minimizing:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Suppose you duplicate every example by making two identical copies of it. That is, where you previously had one example ($x^{(i)}$, $y^{(i)}$) you now have two copies of it, so you now have 2m examples. Is this likely to help?
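A short worked equation suggests the answer. Writing $J_{\text{dup}}$ (a name introduced here only for illustration) for the cost over the duplicated set, each original example contributes its squared error twice, while the normalizing constant becomes $2m$ in place of $m$:

$$J_{\text{dup}}(\theta) = \frac{1}{2(2m)} \sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = J(\theta)$$

The two cost functions are identical, so they have the same minimizer $\theta$. Duplicating the examples is therefore unlikely to help.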


Q1: Suppose you are running a sliding window detector to find text in images. Your input images are 1000x1000 pixels. You will run your sliding window detector at two scales, 10x10 and 20x20 (i.e., you will run your classifier on lots of 10x10 patches to decide if they contain text or not; and also on lots of 20x20 patches), and you will “step” your detector by 2 pixels each time. About how many times will you end up running your classifier on a single 1000x1000 test set image?

With a stride of 2, you will run your classifier approximately 500 times for each dimension. Since you run the classifier twice (at two scales), you will run it 2 * 500 * 500 = 500,000 times.
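The estimate above can be checked with an exact count. This is a small sketch, assuming windows must fit entirely inside the image (no padding); the function names are made up for this example.

```python
def positions_per_axis(image_size, window_size, stride):
    """Number of window positions along one axis, with no padding."""
    return (image_size - window_size) // stride + 1

def total_windows(image_size, window_sizes, stride):
    """Total classifier runs over a square image for all window scales."""
    return sum(
        positions_per_axis(image_size, w, stride) ** 2 for w in window_sizes
    )

total = total_windows(1000, [10, 20], 2)
print(total)  # 487097
```

That is 496 x 496 positions for the 10x10 window plus 491 x 491 for the 20x20 window, which rounds to the 500,000 given above.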

Q2: Suppose you perform ceiling analysis on a pipelined machine learning system, and when you plug in the ground-truth labels for one of the components, the performance of the overall system improves very little. This probably means: (check all that apply)


[1] Andrew Ng, Machine Learning

[2] Roncancio, H., Hernandes, A.C. and Becker, M., 2013, January. Ceiling analysis of pedestrian recognition pipeline for an autonomous car application. In 2013 IEEE Workshop on Robot Vision (WORV) (pp. 215-220). IEEE.