Saddle point

What is a saddle point?

A saddle point is a point where every partial derivative of the loss function is zero, yet the point is neither a minimum nor a maximum: the loss curves upward along some directions and downward along others. When the parameters are very close to a saddle point, the gradient gets extremely close to zero, which may slow down training. We call this phenomenon “stuck in the saddle point”, though visually speaking, it should be called sitting on the saddle.
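As a minimal sketch of this definition, take the textbook surface f(x, y) = x² − y². This function is our own illustrative choice, not a real loss function, and the names `f`, `grad_f` and `grad_norm` are hypothetical. The gradient vanishes at the origin, yet the origin looks like a minimum along x and like a maximum along y.

```python
import math

def f(x, y):
    """Toy 'loss' with a saddle at the origin (illustrative assumption)."""
    return x**2 - y**2

def grad_f(x, y):
    """Analytic gradient of f: (df/dx, df/dy)."""
    return (2.0 * x, -2.0 * y)

def grad_norm(x, y):
    """Euclidean length of the gradient at (x, y)."""
    gx, gy = grad_f(x, y)
    return math.hypot(gx, gy)

# The gradient has zero length exactly at the saddle point.
print(grad_norm(0.0, 0.0))

# Moving along x raises f, so the origin looks like a minimum in that direction ...
print(f(0.1, 0.0) > f(0.0, 0.0))
# ... while moving along y lowers f, so it looks like a maximum in that direction.
print(f(0.0, 0.1) < f(0.0, 0.0))

# The closer we are to the saddle, the smaller the gradient,
# and therefore the smaller each gradient-descent update.
for r in (1.0, 1e-2, 1e-4):
    print(r, grad_norm(r, r))
```

The shrinking gradient norm in the last loop is what the slowdown near a saddle point looks like in numbers.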

When do saddle points appear?

Saddle points appear where the derivative is zero along every dimension. Usually this means that there’s a local minimum or a local maximum (the latter is less likely, as gradient descent steps move away from maxima). However, there is also a chance that the point is a saddle point.
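One common way to tell the three cases apart is the second-derivative test: look at the signs of the eigenvalues of the Hessian at the point. The sketch below handles only the two-parameter case, where the eigenvalues have a closed form. The helper names are hypothetical, and the Hessian entries are assumed to be supplied by the caller.

```python
import math

def hessian_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric 2x2 Hessian [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return (mean - radius, mean + radius)

def classify_critical_point(a, b, c, tol=1e-12):
    """Classify a point where the gradient is already known to be zero."""
    lo, hi = hessian_eigenvalues(a, b, c)
    if lo > tol:
        return "local minimum"   # curves upward in every direction
    if hi < -tol:
        return "local maximum"   # curves downward in every direction
    if lo < -tol and hi > tol:
        return "saddle point"    # up in one direction, down in another
    return "inconclusive"        # some curvature is ~0; this test cannot decide

# Hessians at the origin of three toy functions (each has zero gradient there):
print(classify_critical_point(2.0, 0.0, 2.0))    # f = x^2 + y^2
print(classify_critical_point(-2.0, 0.0, -2.0))  # f = -x^2 - y^2
print(classify_critical_point(2.0, 0.0, -2.0))   # f = x^2 - y^2
```

For a real network the Hessian is far too large to inspect this way, so treat this as an illustration of the idea rather than a practical diagnostic.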

Why not worry about saddle points?

Saddle points exist, no doubt. However, encountering one is very unlikely, especially for very large nets. To have a chance of being stuck at a saddle point, we have to cross our fingers and hope that all (not some) parameters have zero gradient at the same time, with some of them sitting at their maximum. Sounds unlikely, right? Even if some parameters are stuck at their maximum, it usually does not matter when 99.999% of the parameters are at their minimum. (That maximum has to be huge!) With larger nets, it’s even less likely that we encounter saddle points that do affect our training. So don’t fear!
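A small experiment can make this intuition concrete. The sketch below runs plain gradient descent on a toy loss f(x, y) = x² + y⁴/4 − y²/2, which has a saddle at (0, 0) and minima at (0, ±1). The function, learning rate, step count and the 0.5 threshold are all our own illustrative assumptions, not anything measured on a real network.

```python
def grad(x, y):
    """Gradient of the toy loss f(x, y) = x**2 + y**4 / 4 - y**2 / 2.

    f has a saddle at (0, 0) and minima at (0, 1) and (0, -1).
    """
    return (2.0 * x, y**3 - y)

def gradient_descent(x, y, lr=0.1, steps=500):
    """Plain gradient descent.

    Returns the final point and the first step at which |y| exceeded 0.5
    (None if that never happened), as a rough marker of leaving the saddle.
    """
    escaped_at = None
    for step in range(1, steps + 1):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        if escaped_at is None and abs(y) > 0.5:
            escaped_at = step
    return x, y, escaped_at

# Start exactly on the ridge (y = 0): the run heads to the saddle and stays there.
print(gradient_descent(1.0, 0.0))

# Start a hair off the ridge: progress is slow at first,
# then the run slides off toward the minimum near (0, 1).
print(gradient_descent(1.0, 1e-6))
```

In this toy setting the run only stays on the saddle when it starts exactly on the ridge. Any small offset is enough to leave, at the cost of some slow steps early on.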