Classification vs. localisation vs. semantic segmentation vs. instance segmentation

Last updated on: 14 days ago

I get confused about classification, localisation, segmentation, and tracking from time to time, so these notes summarise the differences among them.


  • Classification: the act or process of putting people or things into a group or class.

  • Localisation: the act or process of finding out exactly where something is.

  • Semantic: connected with the meaning of words and sentences.

  • Instance: a particular example or case of something.

  • Tracking: to find somebody/something by following the marks.


Classification

Object classification requires binary labels indicating whether objects of each category are present in an image [1]. It says nothing about where the objects are or how many of them there are.
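As a rough illustration of what "binary labels" means here, the sketch below turns the objects found in one image into a present/absent flag per category. The category list and the function name `classification_labels` are made up for these notes and do not come from any dataset or library.

```python
# Hypothetical category list, used only for this illustration.
CATEGORIES = ["sheep", "human", "chair"]


def classification_labels(objects_in_image):
    """Return one binary label per category: 1 if present, 0 if absent."""
    present = set(objects_in_image)
    return {category: int(category in present) for category in CATEGORIES}


# Two sheep and one human still give a single "present" flag per category:
# the labels carry no count and no position.
labels = classification_labels(["sheep", "sheep", "human"])
print(labels)  # {'sheep': 1, 'human': 1, 'chair': 0}
```

Note that the two sheep collapse into a single flag, which is exactly the information that the later tasks recover.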

Object detection / Localisation

Object detection is often used interchangeably with object localisation. Detecting an object entails both stating that an object belonging to a specified class is present and localising it in the image, typically with a bounding box [1].
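A detection can therefore be thought of as a class label plus a location. The sketch below assumes the location is an axis-aligned bounding box stored as (x_min, y_min, x_max, y_max); the `Detection` class is hypothetical. It also computes intersection over union (IoU), a common way to measure how well a predicted box matches a reference box.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str   # class of the detected object
    box: tuple   # (x_min, y_min, x_max, y_max) in pixels (assumed layout)


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = Detection("sheep", (10, 10, 50, 50))
reference = Detection("sheep", (30, 10, 70, 50))
print(predicted.label == reference.label, iou(predicted.box, reference.box))
```

Here the two boxes share half of their width, so the IoU is one third: the class is right, but the localisation is only partly right.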

Semantic segmentation

Semantic segmentation requires that each pixel of an image be labelled as belonging to a category, such as sky or chair [1]. In contrast to the detection task, individual instances of objects do not need to be segmented: two sheep standing side by side end up in the same "sheep" region.
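To make the per-pixel labelling concrete, here is a small sketch with a made-up 4x6 label map stored as nested lists, one category name per pixel. The map and the helper `pixels_per_category` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical 4x6 label map: every pixel carries a category, nothing else.
S, K = "sheep", "sky"
label_map = [
    [K, K, K, K, K, K],
    [K, S, S, S, S, K],
    [K, S, S, S, S, K],
    [K, K, K, K, K, K],
]


def pixels_per_category(label_map):
    """Count how many pixels were assigned to each category."""
    return Counter(label for row in label_map for label in row)


counts = pixels_per_category(label_map)
print(counts["sheep"], counts["sky"])  # 8 16
```

The 2x4 block of "sheep" pixels could be one wide sheep or two sheep next to each other; a semantic label map cannot tell the difference.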

Instance segmentation

Instance segmentation is segmentation at the instance level: the task is to segment individual object instances. In other words, every object is extracted with its own ID (sheep1, sheep2) as well as its category (sheep, human).
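One possible way to picture this is a per-pixel map of instance IDs plus a table from ID to category. The layout below (0 for background, positive integers for instances) and both helper functions are assumptions made for these notes, not a standard format.

```python
# Hypothetical instance map: 0 is background, any other integer is an instance ID.
instance_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 3, 0, 0],
]
# Each instance ID maps to exactly one category.
instance_category = {1: "sheep", 2: "sheep", 3: "human"}


def instances_per_category(instance_map, instance_category):
    """Count the distinct instances of each category found in the map."""
    ids = {i for row in instance_map for i in row if i != 0}
    counts = {}
    for instance_id in ids:
        category = instance_category[instance_id]
        counts[category] = counts.get(category, 0) + 1
    return counts


def to_semantic(instance_map, instance_category, background="background"):
    """Drop the IDs and keep only the category of each pixel."""
    return [[instance_category.get(i, background) for i in row]
            for row in instance_map]


print(instances_per_category(instance_map, instance_category))
print(to_semantic(instance_map, instance_category)[1])
```

Dropping the IDs with `to_semantic` gives back a semantic label map, which shows that instance segmentation carries strictly more information: sheep1 and sheep2 are kept apart here but would merge into one "sheep" region otherwise.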


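Tracking

Tracking can be seen as online object detection / localisation, or real-time instance segmentation, in which the same objects are followed from one video frame to the next.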
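To show how per-frame detection can turn into tracking, the sketch below links boxes in consecutive frames by greedy IoU matching so that each object keeps its ID over time. This is only one simple association strategy chosen for illustration; the function `track`, the box layout (x_min, y_min, x_max, y_max), and the threshold of 0.3 are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def track(frames, iou_threshold=0.3):
    """Assign track IDs to boxes, frame by frame.

    frames: list of frames, each a list of boxes detected in that frame.
    Returns a list of frames, each a list of (track_id, box) pairs.
    """
    next_id = 1
    previous = []  # (track_id, box) pairs from the previous frame
    output = []
    for boxes in frames:
        current = []
        unused = list(previous)  # previous tracks not yet matched
        for box in boxes:
            # Greedy choice: the unmatched previous box with the highest IoU.
            best = max(unused, key=lambda t: iou(t[1], box), default=None)
            if best is not None and iou(best[1], box) >= iou_threshold:
                unused.remove(best)
                track_id = best[0]
            else:
                # No good match: treat the box as a newly appeared object.
                track_id = next_id
                next_id += 1
            current.append((track_id, box))
        output.append(current)
        previous = current
    return output


frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(52, 50, 62, 60), (1, 0, 11, 10)],  # same objects, moved and reordered
]
for frame in track(frames):
    print(frame)
```

In the second frame the boxes arrive in a different order, yet each keeps the ID it was given in the first frame, which is the extra piece that tracking adds on top of detection.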


[1] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740-755). Springer, Cham.

[2] Oxford Learner’s Dictionaries