A reinforcement learning (RL) problem is usually formalized as a Markov decision process, or MDP. The MDP gives a mathematical framework for describing the problem and for solving it. The aim is to find the optimal policy, that is, the policy that maximizes the total reward the agent collects.
An MDP has four elements:
- A finite set of states S
- A finite set of actions A
- Rewards
- Transition probabilities Pa, the probability that taking action a in one state leads to a given next state
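To make the list concrete, here is a minimal sketch of how these elements could be held in a small Python container. It reads Pa as the transition probability, which is the usual convention for an MDP. The class name `MDP`, its field names, the `validate` helper and the toy states and actions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for the elements of a finite MDP."""
    states: set
    actions: set
    # transitions[(s, a)] maps each next state s2 to the probability Pa(s, s2)
    transitions: dict
    # rewards[(s, a, s2)] is the reward for that transition (0.0 if missing)
    rewards: dict

    def validate(self):
        """Check that every transition distribution is well formed."""
        for (s, a), dist in self.transitions.items():
            if s not in self.states or a not in self.actions:
                raise ValueError(f"unknown state or action: {(s, a)}")
            if not set(dist) <= self.states:
                raise ValueError(f"unknown next state in {(s, a)}")
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"probabilities for {(s, a)} do not sum to 1")

    def reward(self, s, a, s2):
        """Reward for moving from s to s2 under action a."""
        return self.rewards.get((s, a, s2), 0.0)


# A made-up three-state example
toy = MDP(
    states={"S1", "S2", "END"},
    actions={"left", "right"},
    transitions={
        ("S1", "right"): {"S2": 0.9, "S1": 0.1},
        ("S1", "left"): {"S1": 1.0},
        ("S2", "right"): {"END": 1.0},
        ("S2", "left"): {"S1": 1.0},
    },
    rewards={("S2", "right", "END"): 1.0},
)
toy.validate()
```

Storing the transition probabilities per (state, action) pair keeps the check that each distribution sums to 1 simple, which is an easy mistake to catch early when writing an MDP by hand.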
In this process, the agent performs an action that moves it from one state S1 to the next state S2, and repeats this from the start state until it reaches the end state. For each transition, the agent receives a reward. The rule the agent follows to choose an action in each state is called the policy.
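The interaction described above can be sketched as a short simulation. This is only an illustration under stated assumptions: the states, actions, probabilities and reward values are made up, and the names `TRANSITIONS`, `REWARDS` and `run_episode` are hypothetical. The policy is modeled as a plain dictionary from state to action.

```python
import random

# Hypothetical toy problem: probability of each next state per (state, action)
TRANSITIONS = {
    ("S1", "right"): {"S2": 0.8, "S1": 0.2},
    ("S1", "left"): {"S1": 1.0},
    ("S2", "right"): {"END": 1.0},
    ("S2", "left"): {"S1": 1.0},
}
# Made-up rewards per (state, action, next_state); anything missing is 0.0
REWARDS = {
    ("S1", "right", "S2"): 1.0,
    ("S2", "right", "END"): 10.0,
}


def run_episode(policy, start="S1", terminal="END", max_steps=50, rng=None):
    """Follow `policy` from `start` and return (total reward, trajectory)."""
    rng = rng or random.Random()
    state, total, trajectory = start, 0.0, []
    for _ in range(max_steps):
        if state == terminal:
            break
        action = policy[state]  # the policy picks the action for this state
        dist = TRANSITIONS[(state, action)]
        # Sample the next state according to the transition probabilities
        next_state = rng.choices(list(dist), weights=list(dist.values()))[0]
        reward = REWARDS.get((state, action, next_state), 0.0)
        trajectory.append((state, action, reward, next_state))
        total += reward
        state = next_state
    return total, trajectory


always_right = {"S1": "right", "S2": "right"}
always_left = {"S1": "left", "S2": "left"}

right_total, right_steps = run_episode(always_right, rng=random.Random(0))
left_total, left_steps = run_episode(always_left, rng=random.Random(0))
print("always_right:", right_total, "reward in", len(right_steps), "steps")
print("always_left:", left_total, "reward in", len(left_steps), "steps")
```

Comparing the two policies shows why the choice of policy matters: in this toy setup, always moving right eventually reaches the end state and collects reward, while always moving left never leaves S1 and collects none.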