Practice Exploration Strategies: ε-greedy, Softmax (9.4.4) - Reinforcement Learning and Bandits
Exploration Strategies: ε-greedy, Softmax


Practice Questions

Test your understanding with targeted questions

Question 1 Easy

What does ε represent in the ε-greedy strategy?

💡 Hint: Think about how often the agent randomizes its action.

Question 2 Easy

What is the main advantage of the ε-greedy strategy?

💡 Hint: Consider the implications of trying new options.
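The questions above concern the mechanics of ε-greedy action selection. A minimal sketch of the rule, assuming the common convention that exploration samples uniformly over all actions (function and variable names here are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore (uniform random action);
    otherwise exploit (pick the action with the highest estimated value)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit
```

Setting epsilon to 0 makes the agent purely greedy; setting it to 1 makes every choice random, which illustrates why ε controls how often the agent explores.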


Interactive Quizzes

Quick quizzes to reinforce your learning

Question 1

What does the ε in ε-greedy represent?

A fixed action
Probability of exploration
An exploitative strategy

💡 Hint: What percentage is usually used for exploration?

Question 2

True or False: The softmax strategy always chooses the highest expected reward action.

True
False

💡 Hint: Is it a fixed choice every time?
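The key point behind this question is that softmax *samples* from a distribution rather than deterministically taking the maximum. A hedged sketch (names are illustrative; τ is the temperature, with higher values giving a more uniform distribution):

```python
import math
import random

def softmax_probs(q_values, tau=1.0):
    """Turn estimated rewards into selection probabilities.
    Higher tau flattens the distribution (more exploration)."""
    exps = [math.exp(q / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_action(q_values, tau=1.0):
    """Sample an action: the highest-valued action is most likely,
    but it is not chosen every time."""
    probs = softmax_probs(q_values, tau)
    return random.choices(range(len(q_values)), weights=probs)[0]
```

Because the best action's probability is below 1, lower-valued actions are still chosen occasionally.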


Challenge Problems

Push your limits with advanced challenges

Challenge 1 Hard

For a three-armed bandit with expected rewards [1, 2, 4] and ε = 0.2, calculate the probability of selecting each arm under ε-greedy.

💡 Hint: Divide ε properly among the arms.
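One possible setup for this calculation, assuming the common convention that the exploratory probability ε is spread uniformly over all K arms (so the greedy arm also receives an ε/K share on top of 1 − ε):

```python
rewards = [1, 2, 4]
eps = 0.2
K = len(rewards)
best = rewards.index(max(rewards))  # index of the greedy arm

# Each arm gets eps/K from exploration; the greedy arm also gets 1 - eps.
probs = [eps / K + ((1 - eps) if i == best else 0.0) for i in range(K)]
# Non-greedy arms: 0.2/3 ≈ 0.0667 each; greedy arm: 0.8 + 0.2/3 ≈ 0.8667
```

Under the alternative convention where exploration only picks among non-greedy arms, the split would differ, so state your assumption when answering.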

Challenge 2 Hard

Given the following expected rewards for four actions: [3, 4, 7, 8], compute their softmax probabilities with τ=1.

💡 Hint: Be careful with your calculations while applying softmax.
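A direct way to carry out this computation, applying the softmax formula p_i = exp(r_i / τ) / Σ_j exp(r_j / τ) with τ = 1:

```python
import math

rewards = [3, 4, 7, 8]
tau = 1.0

exps = [math.exp(r / tau) for r in rewards]
total = sum(exps)
probs = [e / total for e in exps]
# Approximately [0.0048, 0.0131, 0.2641, 0.7179]
```

Note that the highest-reward action dominates but still has probability below 1, which is exactly the exploration behavior softmax is designed to give.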

