End-to-End Object Detection with Transformers
Set Prediction
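The set of box predictions produced by the model is matched against the objects actually present in the image to find the optimal assignment, and the loss function is then computed on that assignment.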
Step1 Hungarian matching (optimal assignment)
: Find the best permutation of the box predictions
$\hat \sigma = \argmin_{\sigma\in\mathfrak{S}_N} \sum_{i=1}^N \mathfrak L_{\text{match}}(y_i, \hat y_{\sigma(i)})$
$\mathfrak S _N$: the set of all permutations of $N$ elements, $\sigma$: one permutation
$\mathfrak L_{\text{match}}(\cdot)$: a pair-wise matching cost, $y_i$: Ground Truth, $\hat y _{\sigma(i)}$: Prediction with index σ(i)
Calculation example:
Assume $\mathfrak L_{\text{match}}(y_i, \hat y_{\sigma(i)})= (y_i-\hat y_{\sigma(i)})^2$
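The toy cost above can be used to illustrate what the argmin over $\mathfrak S_N$ means. The sketch below searches every permutation by brute force, which is only feasible for very small $N$; it is not the Hungarian algorithm itself, and the values in `targets` and `preds` are hypothetical.

```python
from itertools import permutations


def match_cost(y, y_hat):
    # Toy pairwise cost from the note: squared difference.
    return (y - y_hat) ** 2


def best_permutation(targets, preds):
    """Try every permutation sigma and return the one with the lowest total cost."""
    n = len(targets)
    best_sigma, best_cost = None, float("inf")
    for sigma in permutations(range(n)):
        # sigma[i] is the index of the prediction assigned to target i.
        cost = sum(match_cost(targets[i], preds[sigma[i]]) for i in range(n))
        if cost < best_cost:
            best_sigma, best_cost = sigma, cost
    return best_sigma, best_cost


targets = [1.0, 5.0, 9.0]  # hypothetical ground-truth values y_i
preds = [8.0, 1.5, 5.5]    # hypothetical predictions, in arbitrary order
sigma, cost = best_permutation(targets, preds)
print(sigma, cost)
```

Brute force costs $O(N!)$. In practice the same optimum is found in polynomial time with the Hungarian algorithm, for example via SciPy's `scipy.optimize.linear_sum_assignment`.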
$\mathfrak L_{\text{match}}(y_i, \hat y_{\sigma(i)})=-\mathbb I_{\{c_i \neq \emptyset\}}\hat p_{\sigma(i)}(c_i)+\mathbb I_{\{c_i \neq \emptyset\}} \mathfrak L_{\text{box}}(b_i, \hat b_{\sigma(i)})$
$\mathfrak L_{\text{box}}(b_i, \hat b_{\sigma(i)}) = \lambda_{\text{iou}} \mathfrak L_{\text{iou}}(b_i, \hat b_{\sigma(i)})+\lambda_{L1}||b_i-\hat b_{\sigma(i)}||_1$
$\mathbb I$: indicator function, $c_i \neq \emptyset$: target $i$ is a real object (not the "no object" class)
$\hat p _{\sigma(i)}(c_i)$: predicted probability of class $c_i$ (confidence score), $c_i$: ground-truth class label
Example:
→ $\hat p(c)= 0.6$
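The pairwise matching cost can be sketched in plain Python as follows. This is a minimal illustration, not the reference implementation: it assumes $\mathfrak L_{\text{iou}}$ is the generalized IoU loss ($1 - \text{GIoU}$), that boxes are given as (center x, center y, width, height) with positive width and height, and that the weights `lam_iou=2.0` and `lam_l1=5.0` are tunable hyperparameters. The class names and probabilities are hypothetical.

```python
def l1(b, b_hat):
    # L1 distance between two boxes given as 4-tuples.
    return sum(abs(u - v) for u, v in zip(b, b_hat))


def to_corners(box):
    # (cx, cy, w, h) -> (x0, y0, x1, y1)
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def giou_loss(b, b_hat):
    # 1 - generalized IoU; assumes both boxes have positive area.
    x0, y0, x1, y1 = to_corners(b)
    u0, v0, u1, v1 = to_corners(b_hat)
    inter_w = max(0.0, min(x1, u1) - max(x0, u0))
    inter_h = max(0.0, min(y1, v1) - max(y0, v0))
    inter = inter_w * inter_h
    union = (x1 - x0) * (y1 - y0) + (u1 - u0) * (v1 - v0) - inter
    # Smallest axis-aligned box enclosing both boxes.
    hull = (max(x1, u1) - min(x0, u0)) * (max(y1, v1) - min(y0, v0))
    giou = inter / union - (hull - union) / hull
    return 1.0 - giou


def match_cost(c, b, p_hat, b_hat, lam_iou=2.0, lam_l1=5.0):
    # c is None for the "no object" target: both indicator terms vanish.
    if c is None:
        return 0.0
    box_cost = lam_iou * giou_loss(b, b_hat) + lam_l1 * l1(b, b_hat)
    return -p_hat[c] + box_cost


p_hat = {"cat": 0.6, "dog": 0.3}   # hypothetical class probabilities
gt_box = (0.5, 0.5, 0.2, 0.2)      # hypothetical ground-truth box
perfect = match_cost("cat", gt_box, p_hat, gt_box)
shifted = match_cost("cat", gt_box, p_hat, (0.6, 0.5, 0.2, 0.2))
print(perfect, shifted)
```

With a perfectly overlapping box the cost reduces to $-\hat p(c) = -0.6$, and any box offset raises it, so the matcher prefers predictions that are both confident in the right class and close to the target box.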
Step2 Hungarian Loss for training a NN
$\mathfrak L_{\text{Hungarian}}(y, \hat y)=\sum_{i=1}^N[-\log \hat p_{\hat \sigma(i)}(c_i)+ \mathbb I_{\{c_i \neq \emptyset\}} \mathfrak L_{\text{box}}(b_i, \hat b_{\hat\sigma(i)})]$
Step1 → find the matching
Step2 → minimize the loss over the matched pairs
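Given a matching $\hat\sigma$ from Step1, the Hungarian loss can be evaluated as in the sketch below. To keep it short, the box loss here is a simplified stand-in that uses only the L1 term; the data layout (a dict of class probabilities with `None` as the "no object" key) and all numbers are hypothetical.

```python
import math

NO_OBJECT = None  # stand-in for the empty-set class


def l1_box_loss(b, b_hat):
    # Simplified stand-in for L_box: L1 term only (IoU term omitted).
    return sum(abs(u - v) for u, v in zip(b, b_hat))


def hungarian_loss(targets, preds, sigma, box_loss=l1_box_loss):
    """targets: list of (class, box), padded to N with (NO_OBJECT, None).
    preds:   list of (class-probability dict, box).
    sigma:   sigma[i] is the prediction index matched to target i."""
    total = 0.0
    for i, (c, b) in enumerate(targets):
        p_hat, b_hat = preds[sigma[i]]
        # The classification term applies to every slot, including "no object".
        total += -math.log(p_hat[c])
        # The box term applies only to real objects (the indicator function).
        if c is not NO_OBJECT:
            total += box_loss(b, b_hat)
    return total


targets = [("cat", (0.5, 0.5, 0.2, 0.2)), (NO_OBJECT, None)]
preds = [
    ({"cat": 0.1, NO_OBJECT: 0.9}, (0.1, 0.1, 0.1, 0.1)),
    ({"cat": 0.6, NO_OBJECT: 0.4}, (0.5, 0.5, 0.2, 0.3)),
]
sigma = (1, 0)  # target 0 -> prediction 1, target 1 -> prediction 0
loss = hungarian_loss(targets, preds, sigma)
print(loss)
```

Note the difference from the matching cost: the loss uses the negative log-probability rather than the raw probability, and it also penalizes slots matched to "no object".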
Query outputs are fed to FFNs for the final prediction
Classification + BBox regression
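The FFNs take the decoder outputs as input and predict a class and a bbox for each query.
The class output is a probability for each class; the bbox output is the center coordinates, width, and height (x, y, w, h).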
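How the raw head outputs for one query turn into a usable prediction can be sketched as follows. This is an illustration under assumptions not stated in the notes: class logits are turned into probabilities with a softmax, raw box outputs are squashed to $[0, 1]$ with a sigmoid so that they are relative to the image size, and the function name `decode_query` and all input values are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_query(class_logits, box_raw, img_w, img_h):
    """Turn one query's raw head outputs into class probabilities
    and a pixel-space box given as (x0, y0, x1, y1)."""
    probs = softmax(class_logits)
    # Normalized (center x, center y, width, height), each in [0, 1].
    cx, cy, w, h = (sigmoid(z) for z in box_raw)
    box_xyxy = (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )
    return probs, box_xyxy


probs, box = decode_query(
    class_logits=[2.0, 0.5, -1.0],   # last entry could stand for "no object"
    box_raw=[0.0, 0.0, -1.0, -1.0],
    img_w=640,
    img_h=480,
)
print(probs, box)
```

Because every query always emits a class and a box, queries whose most probable class is "no object" are simply discarded at inference time.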