# Knowledge distillation (KD)
## What is Knowledge Distillation?
### Goal
To Improve student’s accuracy with the help of a teacher model

### Methods
We can divide KD methods into two categories by the learning targets
- Distll logits (learning the output distribution of the teacher model)
- ex: Deep Mutual learning, Soft-logits
- Distll features (learning the intermediate value such as feature maps of the teacher model)
- ex: FitNet, Distilling-Object-Detectors
## How to choose student model
Student’s and teacher’s structure should be closely related
- Reduce block number (Resnet block, transformer block)
- Reduce hyperparameters (# of filters, size of filters, stride … etc)
- Use more efficient structure (replace conv with depthwise separable conv)
- Reduce model layers (後面幾層)
- Low level vs high level (semantic) features
## Pros and cons
Pros:
- Easy to implement
- Training time is acceptable
- Accuracy is stable
Cons:
- Need to choose an appropriate student
## Code Tutorials
[Distill logits on Yolov5](./logits.md)
[Distill features on Yolov5](./feature.md)