How to debug your machine learning system

Last updated on: 13 days ago

Building a first machine learning system can be difficult for a newcomer. These are notes from my ML classes that I use as cues when constructing and debugging such a system.

General advice for debugging a learning algorithm

  • Get more training examples (does not always help)
  • Try smaller sets of features
  • Try getting additional features
  • Try adding polynomial features
  • Try decreasing or increasing the regularization parameter lambda
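As a rough guide to choosing among these options, the sketch below compares training error with cross-validation error and maps the result to the items in the list, following the usual bias/variance reasoning. The function name `diagnose`, the `REMEDIES` table, and both thresholds are hypothetical and chosen only for illustration.

```python
def diagnose(train_error, cv_error, target_error=0.05, gap_tolerance=0.05):
    """Label a fit as 'high variance', 'high bias', or 'ok'.

    target_error and gap_tolerance are illustrative assumptions,
    not values from the notes; tune them for your own problem.
    """
    # A large gap between CV error and training error suggests overfitting.
    if cv_error - train_error > gap_tolerance:
        return "high variance"
    # A high training error with a small gap suggests underfitting.
    if train_error > target_error:
        return "high bias"
    return "ok"


# Which options from the list tend to help with which diagnosis.
REMEDIES = {
    "high variance": [
        "get more training examples",
        "try a smaller set of features",
        "increase lambda",
    ],
    "high bias": [
        "get additional features",
        "add polynomial features",
        "decrease lambda",
    ],
    "ok": [],
}

for train_err, cv_err in [(0.01, 0.20), (0.30, 0.32), (0.02, 0.04)]:
    label = diagnose(train_err, cv_err)
    print(train_err, cv_err, label, REMEDIES[label])
```

The variance check runs first, so a model with both a high training error and a large gap is reported only as high variance; a fuller version could report both.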

Debugging your spam classifier

General advice

  • Collect lots of data
  • Develop sophisticated features based on email routing information (from email header)
  • Develop sophisticated features for the message body, features about punctuation
  • Develop a sophisticated algorithm to detect deliberate misspellings (e.g. m0rtgage, med1cine, w4tches)

It is difficult to tell which of the options will be most helpful.

Error analysis

Recommended approach

  • Start with a simple algorithm that you can implement quickly, then implement it and test it on your cross-validation data

  • Plot learning curves to decide if more data, more features, etc. are likely to help

  • Error analysis: Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you can spot any systematic trend in the types of examples it gets wrong.
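The learning-curve step above can be sketched as follows. This is a minimal illustration, assuming a one-dimensional least-squares line as the "simple algorithm" and made-up toy data; the helper names `fit_line`, `mse`, and `learning_curve` are hypothetical. It prints the numbers rather than plotting them, so it needs only the standard library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b for 1-D inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # A single point (or identical x values): predict the mean of y.
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x


def mse(model, xs, ys):
    """Mean squared error of the line (a, b) on the given data."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(train_x, train_y, cv_x, cv_y):
    """Return (m, train_error, cv_error) for m = 1..len(train_x)."""
    rows = []
    for m in range(1, len(train_x) + 1):
        model = fit_line(train_x[:m], train_y[:m])
        rows.append((m,
                     mse(model, train_x[:m], train_y[:m]),
                     mse(model, cv_x, cv_y)))
    return rows


# Toy data (made up): y = x^2, which a straight line underfits.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
train_y = [x * x for x in train_x]
cv_x = [0.5, 2.5, 4.5, 6.5]
cv_y = [x * x for x in cv_x]

curve = learning_curve(train_x, train_y, cv_x, cv_y)
for m, train_err, cv_err in curve:
    print(f"m={m}  train={train_err:.2f}  cv={cv_err:.2f}")
```

In the usual reading, training and cross-validation errors that level off close together at a high value point to high bias, where more data is unlikely to help. A persistent large gap between them points to high variance, where more data may help.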

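Error analysis may help you decide whether a particular change is likely to improve performance. When it does not, the only way to find out is to try the change and see whether it works, for example by comparing cross-validation error with and without it.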
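To make the manual error analysis concrete, one option is to label each misclassified cross-validation example by hand and then tally the labels. The sketch below assumes such labels already exist; the example ids, the category names, and the helper `tally_errors` are made up for illustration.

```python
from collections import Counter

# Hypothetical labels assigned by hand while reading misclassified
# cross-validation emails; ids and category names are invented.
misclassified = [
    {"id": 17, "category": "pharma"},
    {"id": 42, "category": "phishing"},
    {"id": 58, "category": "pharma"},
    {"id": 63, "category": "replica goods"},
    {"id": 71, "category": "phishing"},
    {"id": 90, "category": "phishing"},
]


def tally_errors(examples):
    """Count misclassified examples per category, most common first."""
    return Counter(e["category"] for e in examples).most_common()


for category, count in tally_errors(misclassified):
    print(f"{category}: {count}")
```

The largest categories are natural candidates for new features, since a fix there affects the most errors.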

Error metrics for skewed classes

  • Accuracy = (true positives + true negatives) / (total examples)

  • Precision = (true positives) / (true positives + false positives)

  • Recall = (true positives) / (true positives + false negatives)

  • F1 score (F score) = $2\frac{PR}{P+R}$, where $P$ is precision and $R$ is recall
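The formulas above can be checked on a small skewed example. This is a minimal sketch on made-up data; the function name `metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something stated in the notes.

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Skewed toy data: only 5 positives out of 1000 examples.
y_true = [1] * 5 + [0] * 995

# A "classifier" that always predicts the negative class.
always_negative = [0] * 1000
print(metrics(y_true, always_negative))

# A classifier that finds 3 of the 5 positives and raises 1 false alarm.
some_positives = [1, 1, 1, 0, 0] + [1] + [0] * 994
print(metrics(y_true, some_positives))
```

The always-negative predictor scores high on accuracy but zero on recall and F1, which is why accuracy alone is misleading when classes are skewed.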

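Precision and recall trade off against each other: raising the classification threshold typically increases precision and lowers recall, and lowering it does the opposite. The F1 score gives a single number for comparing thresholds.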
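One way to explore this trade-off is to sweep the decision threshold over cross-validation scores and keep the threshold with the highest F1. The sketch below assumes the classifier outputs a score between 0 and 1; the scores, labels, candidate thresholds, and helper names are all made up for illustration.

```python
def precision_recall_at(scores, labels, threshold):
    """Predict 1 when score >= threshold, then compute (precision, recall)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 score."""
    return max(candidates,
               key=lambda t: f1(*precision_recall_at(scores, labels, t)))


# Made-up cross-validation scores and true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
candidates = [0.3, 0.5, 0.7, 0.9]

for t in candidates:
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t}  precision={p:.2f}  recall={r:.2f}  f1={f1(p, r):.2f}")

print("best threshold by F1:", best_threshold(scores, labels, candidates))
```

Selecting the threshold on the cross-validation set, rather than the test set, keeps the test set available for a final unbiased estimate.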

Data for machine learning

  • Use a learning algorithm with many parameters, such as a neural network with many hidden layers: low bias
  • Use a relatively large training set: low variance

Reference

[1] Andrew Ng, Machine Learning