Deep Residual Learning for Image Recognition

Introduction

When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated and then degrades rapidly.

Surprisingly, this degradation is not caused by overfitting: adding more layers to a suitably deep model leads to higher training error → underfitting!

In principle, a deeper model can emulate a shallower one: copy the layers from the shallower model and set the extra layers to identity. The degradation problem therefore suggests that deeper models have difficulty approximating identity mappings with multiple nonlinear layers.
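The residual block makes the identity trivial to represent: instead of asking stacked nonlinear layers to learn $H(x) = x$, the block computes $y = F(x) + x$, so the identity is recovered simply by driving the residual branch $F$ to zero. A minimal NumPy sketch (the `residual_block` helper and the zero branch are illustrative, not the paper's code):

```python
import numpy as np

def residual_block(x, f):
    """Residual block: y = ReLU(F(x) + x), with F the residual branch."""
    return np.maximum(f(x) + x, 0.0)

# If the residual branch learns to output zeros, the block reduces to
# an identity mapping (exactly, for non-negative activations).
x = np.array([1.0, 2.0, 3.0])
zero_branch = lambda z: np.zeros_like(z)
y = residual_block(x, zero_branch)
# y equals x: the deeper model can match the shallower one for free.
```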

→ Residual Networks

Residual Networks

Architecture

A residual network is a stack of many residual blocks

Regular design, like VGG: each residual block has two 3x3 conv layers

Network is divided into stages: the first block of each stage halves the resolution (with stride-2 conv) and doubles the number of channels

$H\times W\times C \rightarrow H/2 \times W/2 \times 2C$
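The shape bookkeeping of a stage transition can be checked with a small sketch. Here the stride-2 operation is mimicked for shape purposes by subsampling every other spatial position, and channel doubling by a 1x1 projection; the weights and the 56x56x64 input are illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical feature map entering a new stage: 56x56 spatial, 64 channels.
x = np.random.randn(56, 56, 64)

# Stride-2 subsampling halves H and W; a 1x1 conv acts as a per-pixel
# channel projection, here doubling the channel count to 128.
w = np.random.randn(64, 128)   # illustrative 1x1 conv weights
y = x[::2, ::2] @ w            # H/2 x W/2 x 2C

print(x.shape, "->", y.shape)  # (56, 56, 64) -> (28, 28, 128)
```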

Like GoogLeNet, there are no large fully-connected layers: instead, Global Average Pooling (GAP) and a single linear layer are used at the end
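GAP averages each channel over all spatial positions, collapsing $H \times W \times C$ to a $C$-dimensional vector, which a single linear layer then maps to class scores. A sketch with illustrative sizes (7x7x512 is the final feature map of ResNet-18/34; weights are random placeholders):

```python
import numpy as np

# Final feature map before the classifier head.
features = np.random.randn(7, 7, 512)

# Global average pooling: mean over the spatial axes, one value per channel.
pooled = features.mean(axis=(0, 1))    # shape (512,)

# Single linear layer mapping pooled features to class scores.
num_classes = 1000
w = np.random.randn(512, num_classes)  # illustrative weights
b = np.zeros(num_classes)
logits = pooled @ w + b                # shape (1000,)
```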

Residual Networks: Bottleneck Block

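In deeper ResNets (50/101/152), each block is a bottleneck: a 1x1 conv first reduces the channel width (by 4x), a 3x3 conv operates at the reduced width, and a final 1x1 conv restores the original width, cutting compute relative to two full-width 3x3 convs. A sketch of the channel widths only (the helper function is illustrative):

```python
def bottleneck_shapes(c_in, reduction=4):
    """Per-layer (in, out) channel widths in a ResNet bottleneck block:
    1x1 reduce -> 3x3 at reduced width -> 1x1 expand back."""
    c_mid = c_in // reduction
    return [(c_in, c_mid),    # 1x1 conv: reduce channels
            (c_mid, c_mid),   # 3x3 conv: cheap at reduced width
            (c_mid, c_in)]    # 1x1 conv: restore channels

print(bottleneck_shapes(256))  # [(256, 64), (64, 64), (64, 256)]
```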