Paper

Very Deep Convolutional Networks for Large-Scale Image Recognition

Architecture

VGG Design rules:

All convolutions are 3x3 with stride 1, pad 1

All max poolings are 2x2 with stride 2

After pool, double # of channels
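
A minimal sketch of why these rules keep shape bookkeeping simple, using the standard output-size formula for conv and pooling layers. The helper name out_size and the 224x224 input size are assumptions for illustration, not part of the notes.

```python
def out_size(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1

# A 3x3 conv with stride 1 and pad 1 leaves the spatial size unchanged.
conv_out = out_size(224, kernel=3, stride=1, pad=1)

# A 2x2 max pool with stride 2 (no padding) halves it.
pool_out = out_size(224, kernel=2, stride=2, pad=0)

print(conv_out, pool_out)
```

Because convs never change H x W, only the pools shrink the feature map, so each stage works at a single resolution.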

Network has 5 convolutional stages:

Stage 1: conv-conv-pool

Stage 2: conv-conv-pool

Stage 3: conv-conv-conv-[conv]-pool

Stage 4: conv-conv-conv-[conv]-pool

Stage 5: conv-conv-conv-[conv]-pool

(VGG-16 has 3 conv in stages 3, 4 and 5; VGG-19 has 4)
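
The stage layout can be sketched as a small table walk. This is an illustrative sketch, not the authors' code: the per-stage conv counts are taken from configurations D (VGG-16) and E (VGG-19) in the paper, and the 224x224 input, the 64 starting channels, the 512-channel cap, and the helper name stage_shapes are assumptions drawn from the paper rather than from these notes.

```python
# Conv layers per stage, from configurations D and E of the paper.
CONVS_PER_STAGE = {"VGG-16": [2, 2, 3, 3, 3], "VGG-19": [2, 2, 4, 4, 4]}

def stage_shapes(convs_per_stage, in_size=224, base_channels=64, max_channels=512):
    """Return (num_convs, channels, size_after_pool) for each stage."""
    shapes = []
    size, channels = in_size, base_channels
    for n in convs_per_stage:
        # 3x3 / pad-1 convs keep the size; the 2x2 / stride-2 pool halves it.
        size //= 2
        shapes.append((n, channels, size))
        # Channels double after each pool, but the paper stops at 512.
        channels = min(channels * 2, max_channels)
    return shapes

for name, cfg in CONVS_PER_STAGE.items():
    weight_layers = sum(cfg) + 3  # conv layers plus three fully connected layers
    print(name, weight_layers, stage_shapes(cfg))
```

Counting the conv layers plus the three fully connected layers gives the 16 and 19 in the model names.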

All conv are 3x3 with stride 1, pad 1

Conv(5x5) vs. 2 Conv(3x3)

Two stacked 3x3 convs have the same receptive field as a single 5x5 conv, but have fewer parameters and take less computation!
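
A rough check of the claim above, assuming C input and C output channels, no bias terms, and size-preserving padding. The values of C, H, W and the helper names conv_cost and receptive_field are illustrative only.

```python
def conv_cost(c_in, c_out, k, h, w):
    """Weights (no bias) and multiply-adds of a k x k conv that preserves H x W."""
    params = c_in * c_out * k * k
    macs = params * h * w
    return params, macs

def receptive_field(kernels):
    """Receptive field of stacked stride-1 convs: 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k in kernels)

C, H, W = 64, 56, 56  # illustrative values, not from the notes

# One 5x5 conv: 25 * C^2 weights.
params_5x5, macs_5x5 = conv_cost(C, C, 5, H, W)

# Two stacked 3x3 convs: 2 * 9 * C^2 = 18 * C^2 weights.
p, m = conv_cost(C, C, 3, H, W)
params_3x3x2, macs_3x3x2 = 2 * p, 2 * m

print(receptive_field([5]), receptive_field([3, 3]))
print(params_5x5, params_3x3x2)
print(macs_5x5, macs_3x3x2)
```

Both options see a 5x5 window, but the stacked pair needs 18/25 of the weights and multiply-adds, and it gets an extra nonlinearity between the two convs.
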
All max pool are 2x2 stride 2 / After pool, double #channels

Conv layers at each spatial resolution take the same amount of computation!

(Halving HxW and doubling C before the conv gives the same amount of computation as before.)
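
A quick arithmetic check of this, assuming a 3x3 conv with C input and C output channels and counting multiply-adds only. The concrete sizes and the helper name conv3x3_macs are illustrative.

```python
def conv3x3_macs(c, h, w):
    """Multiply-adds of a 3x3 conv with c input and c output channels on an h x w map."""
    return 9 * c * c * h * w

# Before the pool: C channels at H x W.
before = conv3x3_macs(64, 112, 112)

# After the pool: 2C channels at H/2 x W/2.
after = conv3x3_macs(128, 56, 56)

print(before, after, before == after)
```

The factor of 4 from doubling both channel counts cancels the factor of 1/4 from halving both spatial dimensions.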

Much bigger network, Simpler structure (stable gradient)
