Teacher Forcing vs Scheduled Sampling vs Normal Mode#
There are three ways of training a sequence model:
Normal mode#
This mode predicts the next token from the sequence the model itself is generating. The benefit of this method is that the model learns to keep generating coherently even when its own partial output is flawed — which cannot be said for models trained purely with teacher forcing, since those only ever see correct histories during training.
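A minimal sketch of this free-running loop, assuming a hypothetical `next_token` function standing in for a trained model and a tiny made-up vocabulary:

```python
import random

def next_token(prefix):
    # Hypothetical stand-in for a trained model: returns a
    # next token given everything generated so far.
    vocab = ["the", "cat", "sat", "<eos>"]
    return random.choice(vocab)

def generate(max_len=10):
    seq = ["<bos>"]
    for _ in range(max_len):
        tok = next_token(seq)  # condition on the model's OWN output so far
        seq.append(tok)        # feed the prediction back in as input
        if tok == "<eos>":
            break
    return seq

print(generate())
```

The key line is `seq.append(tok)`: the model's prediction, right or wrong, becomes part of the input for the next step.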
Teacher forcing#
This mode predicts the next token from the correct (ground-truth) prefix. The benefits of this method are that 1. it is trained on correct inputs at every step (in normal mode the history is generated by the model itself, and is not always accurate), and 2. it tends to stabilize training and mitigate gradient explosion (especially in the case of RNNs), since errors from the model's own wrong predictions do not compound along the sequence.
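A toy sketch of one teacher-forced pass, again with a hypothetical `predict` callable in place of a real model (in practice each step would accumulate a loss rather than an accuracy count):

```python
def train_step_teacher_forcing(gold, predict):
    """One pass over a gold sequence; `predict` maps prefix -> next token."""
    correct = 0
    for t in range(1, len(gold)):
        prefix = gold[:t]             # always the CORRECT history,
        pred = predict(prefix)        # never the model's own output
        correct += (pred == gold[t])  # in practice: accumulate a loss here
    return correct / (len(gold) - 1)

gold = ["<bos>", "the", "cat", "sat", "<eos>"]
acc = train_step_teacher_forcing(gold, lambda prefix: "the")
```

Note that `prefix` is always sliced from `gold`: even if the model mispredicts at step t, step t+1 still sees the correct history.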
Scheduled sampling#
If you find the above two ways too extreme, this is the compromise. At each step it flips a coin: sometimes it uses normal mode (conditioning on the half-finished sentence the model is generating), and sometimes teacher forcing (conditioning on the correct sentence). The probability of teacher forcing is typically decayed over the course of training, so the model gradually learns to rely on its own predictions.
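The per-step coin flip can be sketched as follows, with a `tf_ratio` parameter controlling the probability of teacher forcing (names and the toy `predict` callable are illustrative, not a specific library's API):

```python
import random

def scheduled_sampling_step(gold, predict, tf_ratio):
    """Build the input history token by token: with probability `tf_ratio`
    use the gold token (teacher forcing), otherwise feed back the model's
    own prediction (normal mode)."""
    history = [gold[0]]
    preds = []
    for t in range(1, len(gold)):
        pred = predict(history)
        preds.append(pred)
        use_gold = random.random() < tf_ratio
        history.append(gold[t] if use_gold else pred)
    return preds

gold = ["<bos>", "the", "cat", "sat", "<eos>"]
preds = scheduled_sampling_step(gold, lambda h: "cat", tf_ratio=0.5)
```

Setting `tf_ratio=1.0` recovers pure teacher forcing and `tf_ratio=0.0` recovers normal mode; a decay schedule simply lowers `tf_ratio` as training progresses.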