
Knowledge distillation (KD)¶

What is Knowledge Distillation?¶

Goal¶

To improve the student model's accuracy with the help of a teacher model.

Methods¶

KD methods fall into two categories, depending on what the student learns from the teacher:

Distill logits (learn the output distribution of the teacher model)
- e.g. Deep Mutual Learning, soft logits
Distill features (learn intermediate values of the teacher model, such as feature maps)
- e.g. FitNet, Distilling-Object-Detectors
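
The two learning targets above can be sketched as two loss functions. This is a minimal, framework-free illustration in plain Python, not the implementation used in the tutorials: the function names, the default temperature of 4.0, and the use of plain lists instead of tensors are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distill_loss(student_logits, teacher_logits, temperature=4.0):
    """Logit distillation: KL(teacher || student) on softened outputs.

    The temperature (an assumed default) flattens both distributions;
    the T^2 factor is the usual rescaling for the softened loss.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def feature_distill_loss(student_feat, teacher_feat):
    """Feature distillation: mean squared error between two flattened
    feature maps that are assumed to have the same number of elements."""
    assert len(student_feat) == len(teacher_feat)
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print("logit loss:", logit_distill_loss(student, teacher))
    print("feature loss:", feature_distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

In practice the distillation term is usually added to the student's normal task loss with a weighting factor. For feature distillation, the student's feature map often has a different shape from the teacher's, so an extra adaptation layer is typically needed before the two can be compared; the sketch skips that step.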

How to choose student model¶

The student's structure should be closely related to the teacher's

Reduce the number of blocks (ResNet blocks, transformer blocks)
Reduce hyperparameters (number of filters, filter size, stride, etc.)
Use a more efficient structure (replace standard convolutions with depthwise separable convolutions)
Reduce the number of layers (the last few layers)
Low-level vs. high-level (semantic) features
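
To give a feel for how much the "more efficient structure" option in the list above can shrink a student, the following sketch compares weight counts for a standard convolution and a depthwise separable one. The layer sizes (64 input channels, 128 output channels, 3x3 kernel) and the helper names are assumptions chosen for illustration, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    standard = conv_params(64, 128, 3)
    separable = depthwise_separable_params(64, 128, 3)
    print(standard, separable, round(standard / separable, 2))
    # prints: 73728 8768 8.41
```

For this assumed layer the replacement cuts the weight count by roughly 8x. Whether the student keeps enough capacity to follow the teacher still has to be checked empirically.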

Pros and cons¶

Pros:

Easy to implement
Training time is acceptable
Accuracy is stable

Cons:

Requires choosing an appropriate student model

Code Tutorials¶

Distill logits on Yolov5

Distill features on Yolov5