Carrying out error analysis - Class review
With practical strategies to carry out error analysis, you can iterate on your models efficiently.
Comparing to human-level performance: humans are quite good at many tasks, so as long as ML is worse than humans, you can:
- Get labelled data from humans.
- Gain insight from manual error analysis: why did a person get this right?
- Better analysis of bias/variance.
Human-level error as a proxy for Bayes error
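As a rough illustration, here is a minimal sketch of the bias/variance read-off this proxy enables (all error values below are hypothetical):

```python
# Minimal sketch: use human-level error as a proxy for Bayes error
# to decide whether to focus on bias or variance.
# All error values are hypothetical.

human_level_error = 0.01   # proxy for Bayes error
training_error    = 0.08
dev_error         = 0.10

avoidable_bias = training_error - human_level_error  # 0.07
variance       = dev_error - training_error          # 0.02

if avoidable_bias > variance:
    print(f"Focus on bias reduction (avoidable bias = {avoidable_bias:.0%})")
else:
    print(f"Focus on variance reduction (variance = {variance:.0%})")
```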
Surpassing human-level performance
Problems where ML significantly surpasses human-level performance
Improving your model performance
The two fundamental assumptions of supervised learning
- You can fit the training set pretty well
- The training set performance generalizes pretty well to the dev/test set
- Get ~100 misclassified dev set examples (examples your algorithm got wrong)
- Count up how many are dogs
- Evaluate multiple ideas in parallel (see the sketch after the example list below)
e.g. Ideas for cat detection:
- Fix pictures of dogs being recognized as cats
- Fix great cats (lions, panthers, etc.) being misrecognized
- Improve performance on blurry images
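Evaluating these ideas in parallel amounts to tallying error categories over the ~100 misclassified examples; a minimal sketch (the tags and counts are hypothetical):

```python
from collections import Counter

# Hypothetical manual annotations for ~100 misclassified dev set examples.
# In practice you would fill this in by eyeballing each example in a spreadsheet.
# One example can carry several tags, so tag counts may exceed 100.
error_tags = (
    ["dog"] * 8 + ["great cat"] * 43 + ["blurry"] * 61 + ["other"] * 12
)

counts = Counter(error_tags)
total_errors = 100  # number of misclassified examples examined

for category, n in counts.most_common():
    # This fraction is a ceiling on how much fixing the category can help.
    print(f"{category:10s}: {n / total_errors:.0%} of errors")
```

The fraction per category tells you the maximum possible improvement from fixing that category, which is what makes the parallel evaluation worthwhile before committing engineering effort.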
DL algorithms are quite robust to random errors in the training set, but less robust to systematic errors.
| Quantity | Error |
| --- | --- |
| Overall dev set error | 10% |
| Errors due to incorrect labels | 0.6% |
| Errors due to other causes | 9.4% |
e.g. The goal of the dev set is to help you select between two classifiers A and B.
With 10% overall dev error, only 0.6% / 10% = 6% of the mistakes are due to incorrect labels, so fixing them is probably not the best use of your time. But if overall dev error drops to 2% while 0.6% is still due to incorrect labels, then 30% of your dev set mistakes come from bad labels; you can no longer trust the dev set to tell you reliably which classifier is better, and there is a good reason to go in and fix the incorrect labels.
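A minimal sketch of this arithmetic (the 2% scenario is the hypothetical follow-up):

```python
def mislabel_share(overall_error, label_error):
    """Fraction of measured dev set error caused by incorrect labels."""
    return label_error / overall_error

# Scenario 1: 10% overall error, 0.6% from incorrect labels -> 6% of errors.
print(f"{mislabel_share(0.10, 0.006):.0%}")

# Scenario 2: overall error improved to 2%, labels unchanged -> 30% of errors.
print(f"{mislabel_share(0.02, 0.006):.0%}")
```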
Correcting incorrect dev/test set examples
- Apply the same process to your dev and test sets to make sure they continue to come from the same distribution
- Consider examining examples your algorithm got right as well as ones it got wrong
- Train and dev/test data may now come from slightly different distributions (fixing labels only in the much smaller dev/test sets is usually acceptable)
Mismatched training and dev/test set
Bias and variance with mismatched data distributions.
Training-dev set: same distribution as the training set, but not used for training.
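A minimal sketch of carving out a training-dev set, assuming a dataset already drawn from the training distribution (the array names and sizes are hypothetical):

```python
import numpy as np

# Hypothetical dataset drawn from the training distribution.
rng = np.random.default_rng(0)
X_train_full = rng.normal(size=(100_000, 64))
y_train_full = rng.integers(0, 2, size=100_000)

# Shuffle, then hold out a slice as the training-dev set: it comes from
# the same distribution as the training set but is never trained on.
perm = rng.permutation(len(X_train_full))
X_shuf, y_shuf = X_train_full[perm], y_train_full[perm]

n_train_dev = 10_000
X_train_dev, y_train_dev = X_shuf[:n_train_dev], y_shuf[:n_train_dev]
X_train, y_train = X_shuf[n_train_dev:], y_shuf[n_train_dev:]
```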
For human-level error, ask some humans to label the rear-view mirror speech data and measure how good humans are at this task.
For training error, take some rear-view mirror speech data, put it in the training set so the neural network learns on it as well, and then measure the error on that subset of the data.
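Combining these numbers gives the general gap analysis for mismatched distributions; a minimal sketch (all error values are hypothetical):

```python
# Hypothetical errors on the speech task.
human_level_error  = 0.04  # proxy for Bayes error
training_error     = 0.07
training_dev_error = 0.10  # training distribution, not trained on
dev_error          = 0.12  # dev/test distribution

avoidable_bias = training_error - human_level_error   # 3%
variance       = training_dev_error - training_error  # 3%
data_mismatch  = dev_error - training_dev_error       # 2%

print(f"avoidable bias: {avoidable_bias:.0%}")
print(f"variance:       {variance:.0%}")
print(f"data mismatch:  {data_mismatch:.0%}")
```

The training-dev gap isolates variance, and the dev gap beyond that isolates the data mismatch problem, which is what the techniques below address.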
- Carry out manual error analysis to try to understand the differences between the training and dev/test sets (e.g. noise: car noise)
- Make training data more similar to the dev/test sets, or collect more data similar to them
- Artificial data synthesis (see the sketch below)
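A minimal sketch of artificial data synthesis for the speech example, mixing clean audio with car noise (the arrays are hypothetical stand-ins for real recordings; note that synthesizing from only a small noise clip risks overfitting to that one clip):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 1 second of clean speech and car noise at 16 kHz.
clean_speech = rng.normal(size=16_000)
car_noise    = rng.normal(size=16_000)

def synthesize(speech, noise, snr_db=10.0):
    """Mix noise into speech at a target signal-to-noise ratio (in dB)."""
    speech_power = np.mean(speech ** 2)
    noise_power  = np.mean(noise ** 2)
    # Scale the noise so speech_power / scaled_noise_power == 10^(snr_db/10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

noisy_speech = synthesize(clean_speech, car_noise)
```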