Semantic segmentation with U-Net - Class review

Last updated on: 8 months ago

Semantic image segmentation assigns a class label to every pixel in an image.

Semantic segmentation with U-Net

Semantic segmentation: locating objects in an image by predicting which class each pixel belongs to.

Motivation for U-Net

per-pixel class labels

Output: segmentation map
U-Net uses an equal number of convolutional blocks for down-sampling and transposed convolutions for up-sampling.

Deep learning for semantic segmentation

Output: $h \times w \times n$, where $h$ and $w$ are the image height and width and $n$ is the number of output classes.
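
As a rough illustration of how an output of this shape becomes a segmentation map, the sketch below picks the highest-scoring class at each pixel. It is pure Python; the function name `segmentation_map` and the toy scores are invented for this example, and taking the argmax is a common convention rather than something stated in the notes.

```python
def segmentation_map(scores):
    """Turn an h x w x n nested list of class scores into an h x w map of class indices."""
    return [
        [max(range(len(pixel)), key=lambda k: pixel[k]) for pixel in row]
        for row in scores
    ]


# Toy 2 x 2 image with n = 3 classes (values invented for illustration).
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]
print(segmentation_map(scores))  # [[1, 0], [2, 1]]
```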

Transpose convolution

Output dimension: $\left(s(n_h - 1) + f_h - 2p\right) \times \left(s(n_w - 1) + f_w - 2p\right)$, where $n_h \times n_w$ is the input size (excluding channels), $f_h \times f_w$ is the kernel size, $p$ is the padding, and $s$ is the stride.
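
The formula can be checked with a few lines of Python. This is only a sketch of the expression above; the function names are made up for illustration, and some frameworks add further terms (for example an extra output-padding option) that are not modelled here.

```python
def transposed_conv_output_size(n, f, s=1, p=0):
    """Output length along one axis: s * (n - 1) + f - 2 * p."""
    return s * (n - 1) + f - 2 * p


def transposed_conv_output_shape(n_h, n_w, f_h, f_w, s=1, p=0):
    """Apply the one-axis formula to height and width separately."""
    return (
        transposed_conv_output_size(n_h, f_h, s, p),
        transposed_conv_output_size(n_w, f_w, s, p),
    )


print(transposed_conv_output_shape(2, 2, 2, 2, s=2))       # (4, 4)
print(transposed_conv_output_shape(4, 4, 3, 3, s=2, p=1))  # (7, 7)
```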

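Motivation: turn a small input into a bigger output. Each input value scales the kernel, and the scaled kernel is placed on the output at stride-spaced positions. Values that land in the padding region are ignored, and values where placements overlap are added together.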
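
The procedure can be sketched in pure Python for a single-channel input. This is a minimal illustration, not an optimized or framework-accurate implementation; the function name `transposed_conv2d` and the toy values are assumptions made for the example. "Ignoring the padding region" is implemented here by building the full output and then cropping `padding` rows and columns from each side.

```python
def transposed_conv2d(x, kernel, stride=1, padding=0):
    """Single-channel transposed convolution on nested lists."""
    n_h, n_w = len(x), len(x[0])
    f_h, f_w = len(kernel), len(kernel[0])

    # Size of the output before the padding region is cropped away.
    full_h = stride * (n_h - 1) + f_h
    full_w = stride * (n_w - 1) + f_w
    full = [[0] * full_w for _ in range(full_h)]

    # Place a copy of the kernel, scaled by each input value, at
    # stride-spaced positions; overlapping values are added together.
    for i in range(n_h):
        for j in range(n_w):
            for a in range(f_h):
                for b in range(f_w):
                    full[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]

    # Drop the padding region on every side.
    return [
        row[padding:full_w - padding]
        for row in full[padding:full_h - padding]
    ]


x = [[1, 2], [3, 4]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(transposed_conv2d(x, ones, stride=2, padding=1))
# [[1, 3, 2], [4, 10, 6], [3, 7, 4]]
```

The 2 x 2 input grows to 3 x 3, which matches $s(n - 1) + f - 2p = 2 \cdot 1 + 3 - 2 = 3$ along each axis. The center value 10 is the sum of all four inputs, because all four kernel placements overlap there.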

U-Net Architecture

Skip connections are used in U-Net to prevent border pixel information loss and overfitting. When drawn as a diagram, the architecture looks like a U.
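
To make the U shape concrete, the sketch below traces feature-map shapes through a hypothetical U-Net. Everything specific in it is an assumption for illustration rather than something taken from the notes: size-preserving convolution blocks, 2 x 2 max pooling, stride-2 transposed convolutions with a 2 x 2 kernel, channels doubling at each level, and skip connections implemented as channel-wise concatenation. It only tracks shapes and contains no learnable layers.

```python
def unet_shapes(height, width, base_channels=64, depth=4, n_classes=3):
    """Trace (h, w, channels) through a hypothetical U-Net.

    Assumes height and width are divisible by 2 ** depth.
    """
    trace, skips = [], []
    h, w, c = height, width, base_channels

    # Down-sampling path: conv block, remember it for the skip, then pool.
    for _ in range(depth):
        trace.append(("encoder conv block", (h, w, c)))
        skips.append((h, w, c))
        h, w, c = h // 2, w // 2, c * 2

    trace.append(("bottleneck", (h, w, c)))

    # Up-sampling path: one transposed convolution per encoder block.
    for _ in range(depth):
        skip_h, skip_w, skip_c = skips.pop()
        h, w, c = h * 2, w * 2, c // 2
        assert (h, w) == (skip_h, skip_w), "skip and upsampled maps must align"
        trace.append(("upsample + skip concat", (h, w, c + skip_c)))
        trace.append(("decoder conv block", (h, w, c)))

    # Final layer maps to one score per class at every pixel.
    trace.append(("output", (h, w, n_classes)))
    return trace


for name, shape in unet_shapes(128, 128):
    print(f"{name:24s} {shape}")
```

With these assumptions, a 128 x 128 input shrinks to 8 x 8 at the bottom of the U and returns to 128 x 128 x `n_classes` at the output, with each decoder level receiving the encoder feature map of the same spatial size.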

Reference

[1] Deeplearning.ai, Convolutional Neural Networks