Value#

Value function in RL#

The value function in RL refers to how much reward an agent expects to collect before it arrives at the end state. For example, suppose you get one point each time you celebrate your birthday, and you have a 50 % chance of living to 100 years old and a 50 % chance of living to 110 years old. Then the value function would be \( 0.5 \times (100 - x) + 0.5 \times (110 - x) \), with \( x \) being your current age.
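As a quick check on the arithmetic, here is a minimal Python sketch of the birthday example. The lifespans, probabilities, and the one point per birthday are taken from the example above; the function name is just for illustration.

```python
def birthday_value(x: int) -> float:
    """Expected future birthday points at current age x, assuming a 50%
    chance of living to 100 and a 50% chance of living to 110, with one
    point awarded per remaining birthday."""
    return 0.5 * (100 - x) + 0.5 * (110 - x)

print(birthday_value(30))  # 0.5 * 70 + 0.5 * 80 = 75.0
```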

Warning

Incoming math!

Formally, the value function is a weighted sum of all future rewards, and it can be calculated by a recursive formula:

\[G_s = \sum_a \pi_a \left( R_a + \gamma G_{s + a} \right)\]

where \( s \) is the current state, \( a \) an action available in that state, \( \pi_a \) the probability of taking action \( a \), \( R_a \) the reward received for taking action \( a \), \( s + a \) the next state after taking action \( a \), and finally, \( G_s \) the value function at state \( s \). \( 0 \le \gamma \le 1 \) is the discount (decay) factor: distant rewards are less valuable because they carry more uncertainty.

Since the value function is defined as the sum of rewards an agent receives over the rest of its life, the value function at the end state is \( 0 \).

\[G_{end} = 0\]

where \( end \) is the end state.
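To make the recursion concrete, here is a minimal Python sketch that evaluates \( G_s \) on a hypothetical three-state chain. The state names, rewards, and policy probabilities are invented for illustration; only the recursion \( G_s = \sum_a \pi_a \left( R_a + \gamma G_{s + a} \right) \) and the terminal condition \( G_{end} = 0 \) come from the formulas above.

```python
# A hypothetical chain of states: s0 -> s1 -> (s2 or end) -> end.
# Each state maps to its available actions as tuples of
# (probability pi_a, reward R_a, next state s + a).
transitions = {
    "s0": [(1.0, 1.0, "s1")],
    "s1": [(0.5, 1.0, "s2"), (0.5, 2.0, "end")],
    "s2": [(1.0, 1.0, "end")],
    "end": [],  # end state: G_end = 0
}

def value(state: str, gamma: float = 0.9) -> float:
    """Recursively evaluate G_s = sum_a pi_a * (R_a + gamma * G_{s+a})."""
    if not transitions[state]:  # the end state has no actions and value 0
        return 0.0
    return sum(p * (r + gamma * value(next_state, gamma))
               for p, r, next_state in transitions[state])

print(value("s0"))  # ~2.755 for this toy chain with gamma = 0.9
```

Note that plain recursion like this only terminates when the state graph has no cycles; for general environments, value functions are usually computed iteratively instead.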

Relation between value function and rewards#

Rewards are observed through the agent's interactions with the environment, and value functions are defined as the expected sum of rewards until the terminal state is reached.

Why are value functions important?#

Value functions provide an easy way to rank states. A state with a higher value is expected to receive more reward in the future, so it is better. Intuitively, arriving at a higher-value state means more reward lies ahead, and that is the foundation of RL: RL aims to maximize the reward an agent receives; in other words, it tries to modify the agent's behavior so that the agent ends up in states whose values are higher.
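As a small illustration of ranking states by value, the sketch below takes a hypothetical dictionary of values for the next states reachable from the current state and greedily picks the one with the highest value. The state names and numbers are made up for this example.

```python
# Hypothetical values of the next states reachable from the current state.
next_state_values = {"left": 1.2, "forward": 3.4, "right": 0.7}

# A greedy agent heads toward the highest-value state, since a higher
# value promises more reward in the future.
best_next_state = max(next_state_values, key=next_state_values.get)
print(best_next_state)  # "forward"
```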