Gradients#

Why are gradients useful?#

Gradients are useful because they give us information about the loss surface around the current parameter values. The gradient points in the direction of steepest ascent: it tells us how to change the parameters in order to increase the function's value the fastest.
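
To make this concrete, here is a minimal sketch in plain NumPy (the function `f` and the helper `numerical_gradient` are illustrative names of my own, not from any library): it estimates a gradient with central finite differences and checks that a small step along it increases the function's value.

```python
import numpy as np

def f(p):
    # A simple bowl-shaped function: f(x, y) = x^2 + y^2
    x, y = p
    return x**2 + y**2

def numerical_gradient(func, p, eps=1e-6):
    """Central-difference estimate of the gradient of func at point p."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (func(p + step) - func(p - step)) / (2 * eps)
    return grad

p = np.array([1.0, -2.0])
g = numerical_gradient(f, p)
print(g)                       # ~[ 2., -4.], matching the analytic 2x and 2y
print(f(p + 0.01 * g) > f(p))  # True: stepping along the gradient increases f
```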

Many optimization methods rely on the existence of gradients. The most prominent one is the workhorse of deep learning, gradient descent. That's why deep learning in general needs differentiable functions, i.e. functions we can take gradients of.

Gradient descent#

In deep learning, models are optimized with a technique called gradient descent. The idea is simple: the gradient of the loss points in the direction of steepest ascent, so moving the parameters in the opposite direction reduces the loss value.

The simplest form of gradient descent is:

\[ \theta' = \theta - \eta \frac{d f}{d \theta} \]

where \( \theta' \) denotes the new value of \( \theta \), and \( \eta \) is the learning rate (or step size), usually a small positive number.
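
As a concrete illustration, here is a small sketch of this update rule applied to the one-dimensional function \( f(\theta) = (\theta - 3)^2 \), whose derivative \( 2(\theta - 3) \) we write out by hand; the variable names are illustrative.

```python
def df(theta):
    # Analytic derivative of f(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter value
eta = 0.1     # learning rate (step size)

for step in range(100):
    theta = theta - eta * df(theta)  # theta' = theta - eta * df/dtheta

print(theta)  # converges towards 3.0, the minimizer of f
```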

How to determine if a function is differentiable?#

There are countless functions in the world, and for most of them we don't know the gradients. In those cases, we simply can't take gradients of those functions (at least not in a computer program).

However, if we can approximate a function well enough with a differentiable function, then we can take the gradient of the approximation instead. For instance, the sin and cos functions are commonly approximated with truncated Taylor series, which are just polynomials and therefore easy to differentiate.
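
As a rough illustration (not how any numerical library actually implements sin), here is a truncated Taylor series for sin around 0; differentiating it term by term recovers an approximation of cos.

```python
import math

def sin_taylor(x, terms=10):
    # sin(x) ~= sum_k (-1)^k * x^(2k+1) / (2k+1)!
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(terms))

def dsin_taylor(x, terms=10):
    # Differentiate each term: d/dx [x^(2k+1) / (2k+1)!] = x^(2k) / (2k)!
    return sum((-1)**k * x**(2*k) / math.factorial(2*k)
               for k in range(terms))

x = 0.7
print(sin_taylor(x), math.sin(x))   # both ~0.6442
print(dsin_taylor(x), math.cos(x))  # both ~0.7648
```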

However, there are functions for which we simply don't know how to compute gradients. In the context of deep learning, those functions are just never used, so no worries.

The rule of thumb: if a function is differentiable, is composed of differentiable functions, or can be approximated well by a composition of differentiable functions, then chances are you can use it in deep learning. Other functions, no luck.
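
To illustrate the rule of thumb, here is a small sketch (function names are my own) that differentiates a composition of differentiable pieces, \( \sigma(x^2) \) with \( \sigma \) the sigmoid, using the chain rule, and checks the result against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(x):
    # Composition of two differentiable functions: sigmoid applied to x^2
    return sigmoid(x**2)

def dh(x):
    # Chain rule: h'(x) = sigmoid'(x^2) * 2x, with sigmoid' = s * (1 - s)
    s = sigmoid(x**2)
    return s * (1.0 - s) * 2.0 * x

x = 0.8
eps = 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
print(dh(x), numeric)  # the two estimates agree to several decimal places
```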