Reinforcement Learning#
Note
We will use RL to refer to reinforcement learning.
What is RL?#
RL is a branch of machine learning that focuses on learning through interaction. It was developed mainly by observing animal and human behavior, so it has a lot in common with how humans make decisions. In RL, an agent takes an action that changes an environment and receives a reward in the process. For example, RL can model how a person (the agent) decides to have curry for dinner (the action), which leaves some carbon footprint on the earth (the environment), and feels happy about it (the reward). In other words, RL suits problems that are interactive, where things change over time, where an action affects what happens later, and where the goal is to make the right decisions. Oh, and eating curry isn't that bad for the planet.
Reinforce?#
I agree that it's a bad name. In its early days, RL referred to taking a model that starts out random and updating it to reinforce (strengthen) the actions that yield good rewards.
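As a rough illustration of "reinforcing" actions, the sketch below keeps one preference score per action and nudges it up or down according to the reward received. The names `reinforce`, `preferences`, and `step_size` are hypothetical, and the scores start at zero rather than at random values to keep the example deterministic. It is a toy update rule, not a specific published algorithm.

```python
def reinforce(preferences, action, reward, step_size=0.1):
    """Return a copy of the preferences with the chosen action nudged by its reward."""
    updated = dict(preferences)
    # A good reward strengthens the action; a bad reward weakens it.
    updated[action] += step_size * reward
    return updated


# Assumption: two illustrative actions, both starting with no preference.
prefs = {"eat_curry": 0.0, "skip_dinner": 0.0}
prefs = reinforce(prefs, "eat_curry", reward=1.0)     # felt happy
prefs = reinforce(prefs, "skip_dinner", reward=-1.0)  # felt hungry

# The action with the highest preference is the one that was reinforced.
best = max(prefs, key=prefs.get)
```

After these two updates, `best` is the action that earned the positive reward.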
Markov Decision Process#
Note
Markov Decision Process is also called MDP.
RL is designed to maximize the rewards collected in an MDP. An MDP consists of several parts, most of which we mentioned above:
Agent#
An agent can be a person, a computer, or an animal: anything that makes the world around it change.
State#
An agent interacts with an environment, and a state describes that environment at a given moment. If the agent modifies the environment, we say that the state of the environment has changed.
Action#
An agent takes an action to change the environment. That is, an action moves the environment from one state to another.
Reward#
The agent receives a reward when it takes an action. Rewards measure how good or bad that action was.
So basically, what RL tries to do is produce a good agent: one that takes reasonable actions to move between states and collects as much reward as possible.
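The four parts above can be sketched as a small loop in Python. Everything here is a hypothetical toy: the states (`"hungry"`, `"full"`), the actions, the reward values, and the helper names `step` and `run_episode` are assumptions made for illustration, reusing the curry example.

```python
# Hypothetical toy MDP: TRANSITIONS[state][action] -> (next_state, reward)
TRANSITIONS = {
    "hungry": {
        "eat_curry": ("full", 1.0),
        "skip_dinner": ("hungry", -1.0),
    },
}
TERMINAL_STATES = {"full"}


def step(state, action):
    """Environment: apply an action and return (next_state, reward)."""
    return TRANSITIONS[state][action]


def run_episode(policy, start_state="hungry", max_steps=10):
    """Agent-environment loop: choose an action, move to a new state, collect the reward."""
    state, total_reward = start_state, 0.0
    for _ in range(max_steps):
        if state in TERMINAL_STATES:
            break
        action = policy(state)               # the agent decides
        state, reward = step(state, action)  # the environment changes state
        total_reward += reward               # the reward measures the action
    return state, total_reward


def always_eat(state):
    """A very simple agent that eats curry whatever the state is."""
    return "eat_curry"


final_state, total = run_episode(always_eat)
```

Here the agent is the `policy` function passed in, the environment is the `TRANSITIONS` table, and `run_episode` adds up the rewards the agent collects until it reaches a terminal state or runs out of steps.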
Important terms in RL#
Value#
The value function gives the total reward an agent can expect to collect, starting from a given state, before it dies (enters a terminal state).
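To make "total of rewards" concrete, the sketch below adds up the rewards from one episode. The function name `total_reward` and the `discount` parameter are assumptions: discounting (weighting later rewards less) is a common extension that the text above does not cover, and `discount=1.0` gives the plain sum. A value function would be the average of this quantity over everything that could happen from a state; the sketch handles only one observed sequence of rewards.

```python
def total_reward(rewards, discount=1.0):
    """Sum the rewards of one episode, optionally down-weighting later ones."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


# Rewards collected at each step until the terminal state (illustrative numbers).
episode_rewards = [1.0, 0.0, 2.0]
plain_sum = total_reward(episode_rewards)                  # 1.0 + 0.0 + 2.0
discounted = total_reward(episode_rewards, discount=0.5)   # 1.0 + 0.5*0.0 + 0.25*2.0
```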
Policy#
A policy describes how an agent makes decisions, that is, which action it takes in each state.
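In code, a policy can be as simple as a function from a state to an action. The sketch below shows two hypothetical forms: a lookup table that always picks the same action in a given state, and a version that picks actions at random according to fixed probabilities. The state and action names and the probabilities are assumptions made for illustration.

```python
import random

# Deterministic policy: one fixed action per state (illustrative names).
POLICY_TABLE = {"hungry": "eat_curry", "full": "rest"}


def deterministic_policy(state):
    """Always return the same action for a given state."""
    return POLICY_TABLE[state]


# Stochastic policy: a probability for each action in each state.
ACTION_PROBS = {"hungry": {"eat_curry": 0.9, "skip_dinner": 0.1}}


def stochastic_policy(state, rng=random):
    """Sample an action according to the probabilities for this state."""
    actions = list(ACTION_PROBS[state])
    weights = [ACTION_PROBS[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


chosen = deterministic_policy("hungry")
```

Passing a seeded `random.Random` instance as `rng` makes the stochastic version reproducible.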