Markov Decision Processes (MDPs)
Markov Decision Processes (MDPs) are a mathematical framework for sequential decision-making under uncertainty, where outcomes depend partly on chance and partly on the agent's choices. An MDP is defined by several key components (a small code sketch of these components follows the list):
- S: A set of states that represent all possible scenarios in the environment.
- A: A set of actions that the decision-making agent can take.
- T(s, a, s′): The transition function, giving the probability of ending up in state s′ when taking action a in state s.
- R(s, a, s′): The reward function, giving the immediate numerical reward received for taking action a in state s and transitioning to state s′.
- γ (gamma): A discount factor in [0, 1] that determines how future rewards are weighed against immediate ones; values near 0 prioritize immediate rewards, while a value of 1 weighs future rewards equally with immediate ones.
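To make these components concrete, here is a minimal sketch in Python of a toy two-state MDP. The state names, actions, transition probabilities, and reward values are illustrative assumptions for this example, not part of any standard benchmark.

```python
# A minimal, hypothetical two-state MDP encoded with plain dictionaries.
# All numbers and names below are made up purely for illustration.

STATES = ["low", "high"]          # S: e.g. battery level of a hypothetical robot
ACTIONS = ["wait", "recharge"]    # A: actions available in every state

# T[(s, a)] -> {s_next: probability}; probabilities over s_next sum to 1
T = {
    ("low", "wait"):      {"low": 0.9, "high": 0.1},
    ("low", "recharge"):  {"low": 0.2, "high": 0.8},
    ("high", "wait"):     {"low": 0.3, "high": 0.7},
    ("high", "recharge"): {"low": 0.0, "high": 1.0},
}

# R[(s, a, s_next)] -> immediate reward for that transition
R = {
    ("low", "wait", "low"):       0.0,
    ("low", "wait", "high"):      1.0,
    ("low", "recharge", "low"):  -1.0,
    ("low", "recharge", "high"):  2.0,
    ("high", "wait", "low"):      0.0,
    ("high", "wait", "high"):     1.0,
    ("high", "recharge", "low"):  0.0,
    ("high", "recharge", "high"): 0.5,
}

GAMMA = 0.9  # discount factor: each step, future rewards are worth 90% of immediate ones
```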
The goal of solving an MDP is to find an optimal policy π(s), a mapping from each state to an action, that maximizes the expected cumulative (discounted) reward over time. Two standard methods for solving MDPs are Value Iteration and Policy Iteration:
- Value Iteration repeatedly applies the Bellman optimality update, V(s) ← max_a Σ_{s′} T(s, a, s′)[R(s, a, s′) + γ V(s′)], sweeping over all states until the values converge; the optimal policy is then the action that maximizes this expression in each state (see the sketch after this list).
- Policy Iteration alternates between policy evaluation (computing the value function of the current policy) and policy improvement (making the policy greedy with respect to those values), repeating until the policy no longer changes (also sketched below).
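As a rough illustration of Value Iteration, the following sketch applies the Bellman optimality update to the toy MDP defined earlier; the names STATES, ACTIONS, T, R, and GAMMA are the assumed dictionaries from that sketch.

```python
def value_iteration(states, actions, T, R, gamma, tol=1e-6):
    """Iteratively apply the Bellman optimality update until state values converge."""
    V = {s: 0.0 for s in states}  # initialize all state values to zero
    while True:
        delta = 0.0
        for s in states:
            # Q(s, a): expected immediate reward plus discounted value of the successor state
            q_values = [
                sum(p * (R[(s, a, s2)] + gamma * V[s2])
                    for s2, p in T[(s, a)].items())
                for a in actions
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once the largest update is negligible
            break
    # Extract a greedy policy: pick the action with the highest Q-value in each state
    policy = {}
    for s in states:
        policy[s] = max(
            actions,
            key=lambda a: sum(p * (R[(s, a, s2)] + gamma * V[s2])
                              for s2, p in T[(s, a)].items()),
        )
    return V, policy

# Example usage with the toy MDP sketched above:
# V, pi = value_iteration(STATES, ACTIONS, T, R, GAMMA)
```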
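Policy Iteration can be sketched in the same style. This is again an illustrative implementation assuming the toy MDP above, using iterative sweeps for policy evaluation rather than an exact linear-algebra solve.

```python
def policy_iteration(states, actions, T, R, gamma, tol=1e-6):
    """Alternate policy evaluation and policy improvement until the policy is stable."""
    policy = {s: actions[0] for s in states}  # start from an arbitrary policy
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: estimate V^pi by sweeping until values stop changing
        while True:
            delta = 0.0
            for s in states:
                a = policy[s]
                v_new = sum(p * (R[(s, a, s2)] + gamma * V[s2])
                            for s2, p in T[(s, a)].items())
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated values
        stable = True
        for s in states:
            best_a = max(
                actions,
                key=lambda a: sum(p * (R[(s, a, s2)] + gamma * V[s2])
                                  for s2, p in T[(s, a)].items()),
            )
            if best_a != policy[s]:
                policy[s] = best_a
                stable = False
        if stable:  # no state changed its action, so the policy has converged
            return V, policy

# Example usage with the toy MDP sketched above:
# V, pi = policy_iteration(STATES, ACTIONS, T, R, GAMMA)
```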
MDPs have broad applications in robotics (for navigational planning), inventory control, game-playing AI, and healthcare systems, making them fundamental in AI planning and decision-making tasks.