Reinforcement Learning

Note

We will use RL to refer to reinforcement learning.

What is RL?#

RL is a branch of machine learning that focuses on learning through interaction. It grew largely out of observing animal and human behavior, so it has a lot in common with how humans make decisions. In RL, an agent takes an action that changes an environment, and receives a reward in the process. For example, RL can model how a person (the agent) decides to have curry for dinner (the action), which adds some carbon footprint to the earth (the environment), and feels happy about it (the reward). In other words, RL suits problems that are interactive, where things change over time, where an action affects what happens later, and where the goal is to make the right decisions. Oh, and eating curry isn’t that bad for planet earth.
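To make the agent, action, environment, reward loop a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `DinnerEnvironment` class, the `random_agent` function, the second menu option, and all the numbers are assumptions, not measurements or part of any library.

```python
import random

# Hypothetical numbers, chosen only for illustration.
CARBON_COST = {"curry": 1.0, "steak": 5.0}
HAPPINESS = {"curry": 2.0, "steak": 3.0}


class DinnerEnvironment:
    """Toy environment whose state is the accumulated carbon footprint."""

    def __init__(self):
        self.carbon_footprint = 0.0

    def step(self, action):
        # The action changes the environment...
        self.carbon_footprint += CARBON_COST[action]
        # ...and the agent receives a reward in the process.
        return self.carbon_footprint, HAPPINESS[action]


def random_agent(state, rng):
    """An agent that ignores the state and picks a dinner at random."""
    return rng.choice(["curry", "steak"])


def run(num_dinners, seed=0):
    """Run the interaction loop and return (final state, total reward)."""
    rng = random.Random(seed)
    env = DinnerEnvironment()
    state = env.carbon_footprint
    total_reward = 0.0
    for _ in range(num_dinners):
        action = random_agent(state, rng)   # agent decides
        state, reward = env.step(action)    # environment responds
        total_reward += reward
    return state, total_reward
```

Calling `run(3)` plays three dinners. This agent never learns anything; an RL method would replace `random_agent` with something that improves its choices based on the rewards it sees.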

Reinforce?#

I agree that it’s a bad name. In its early days, RL referred to taking a model that is initially random and updating it to reinforce (strengthen) the actions that yield good rewards.

Markov Decision Process#

Note

A Markov Decision Process is also called an MDP.

RL is designed to maximize the rewards an agent collects from an MDP. An MDP consists of several parts, most of which we mentioned above: the agent, the state, the action, and the reward.
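One way to picture an MDP is as a lookup table from (state, action) to possible outcomes. The sketch below is a toy under stated assumptions: the state names, action names, probabilities, and rewards are all hypothetical, and the transition probabilities come from the standard MDP definition rather than from the text above.

```python
import random

# TRANSITIONS[state][action] -> list of (probability, next_state, reward).
# All names and numbers here are hypothetical.
TRANSITIONS = {
    "hungry": {
        "order_curry": [(1.0, "full", 1.0)],
        "cook": [(0.8, "full", 2.0), (0.2, "hungry", -1.0)],
    },
    "full": {},  # terminal state: no actions available
}


def step(state, action, rng):
    """Sample (next_state, reward) for taking `action` in `state`.

    The outcome depends only on the current state and action,
    not on how the agent got here (the Markov property).
    """
    outcomes = TRANSITIONS[state][action]
    draw = rng.random()
    cumulative = 0.0
    for probability, next_state, reward in outcomes:
        cumulative += probability
        if draw < cumulative:
            return next_state, reward
    # Guard against floating-point round-off in the probabilities.
    _, next_state, reward = outcomes[-1]
    return next_state, reward
```

The agent does not appear in the table itself: it is whoever chooses the `action` passed to `step`.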

Important terms in RL.#

Value#

The value function refers to the total reward an agent can expect to collect, starting from a given state, before it dies (enters a terminal state).
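As a small worked example of "total reward until termination", the sketch below sums the rewards of one finished episode. The function name `episode_return` and the `discount` parameter are assumptions added here; discounting is a common convention that the text above does not mention, and with `discount=1.0` the function is just a plain sum.

```python
def episode_return(rewards, discount=1.0):
    """Total (optionally discounted) reward of one finished episode.

    `rewards` lists the rewards in the order they were received,
    ending at the step where the agent reached a terminal state.
    """
    total = 0.0
    # Work backwards so each earlier step adds its own reward
    # plus the discounted total of everything that came after it.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total
```

For instance, `episode_return([-1.0, -1.0, 1.0])` gives `-1.0`, and with `discount=0.5` it gives `-1.25`. This is the total for a single episode; the value of a state is what you would expect on average over many episodes that start there.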

Policy#

A policy describes how an agent makes decisions: given the state it is in, the policy tells the agent which action to take.
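In code, a policy can be as simple as a function from state to action. Here is a rough sketch of two flavors, one that always picks the same action in a state and one that picks at random with fixed probabilities. The state and action names and the probabilities are hypothetical.

```python
import random

# Deterministic policy: each state maps to exactly one action.
ACTION_TABLE = {"hungry": "order_curry"}

# Stochastic policy: each state maps to action probabilities.
ACTION_PROBABILITIES = {"hungry": {"order_curry": 0.7, "cook": 0.3}}


def deterministic_policy(state):
    """Always return the same action for a given state."""
    return ACTION_TABLE[state]


def stochastic_policy(state, rng):
    """Sample an action according to the state's action probabilities."""
    actions = list(ACTION_PROBABILITIES[state])
    weights = [ACTION_PROBABILITIES[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

For example, `deterministic_policy("hungry")` always returns `"order_curry"`, while `stochastic_policy("hungry", random.Random(0))` returns one of the two actions.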
