Q-Learning
Q-Learning is a fundamental model-free algorithm in reinforcement learning that lets an agent learn which action is optimal in each state without requiring a model of the environment's dynamics. It maintains an estimate of the action-value function $Q(s, a)$ and updates that estimate using the reward it receives and the maximum estimated value of the next state. The update rule for Q-Learning is given by:
$$
Q(s, a) \leftarrow Q(s, a) + \alpha (r + \gamma \max_{a'} Q(s', a') - Q(s, a))
$$
Where:
- $\alpha$ is the learning rate, controlling how much new information overrides the old estimate.
- $\gamma$ is the discount factor, determining the importance of future rewards.
- $r$ is the reward received after taking action $a$ in state $s$.
- $s'$ is the resulting next state, and $\max_{a'} Q(s', a')$ is the estimated value of the best action available there.
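To make the update concrete, here is a minimal sketch of the tabular version in Python. The function name `q_update`, the array layout, and the default hyperparameters are illustrative assumptions, not something specified above:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one Q-Learning update to the tabular estimate Q[s, a].

    Q       : 2-D array of shape (n_states, n_actions)  # assumed layout
    s, a    : current state and action (integer indices)
    r       : reward received after taking a in s
    s_next  : resulting next state
    """
    # The TD target uses the greedy (max) action value in the next state,
    # which is what makes Q-Learning an off-policy method.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```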
Q-Learning is advantageous because it is off-policy: the agent can learn the optimal policy while following an exploratory behavior policy (such as $\epsilon$-greedy), learning directly from the consequences of its actions rather than from a predefined policy or model, as the sketch below illustrates.
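The following sketch shows this interplay between exploration and learning on a hypothetical 5-state chain environment (an assumption made purely for illustration; any small discrete environment would do). The behavior policy is $\epsilon$-greedy, yet the update still targets the greedy action in the next state:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state chain: action 1 moves right, action 0 moves left;
# reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy behavior policy: explore with probability epsilon,
        # otherwise act greedily with respect to the current estimates.
        a = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-Learning update: the target maxes over next actions regardless
        # of what the behavior policy will actually do; terminal states
        # contribute no bootstrapped value, hence the (not done) factor.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
        s = s_next
```

After training, acting greedily with respect to `Q` (always taking `np.argmax(Q[s])`) recovers the optimal policy, even though the data was gathered by the exploratory policy.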