The learning agent
The agent comprises (1) a policy and (2) a learning algorithm, i.e. everything that does not belong to the environment. The policy is a function approximator (e.g. a deep neural network, a polynomial, or a lookup table) that maps observations to actions. The learning algorithm iteratively updates the policy, using the agent’s actions, observations and rewards as inputs, with the objective of finding a policy that maximizes the cumulative reward over a large number of iterations. Policies can be learned via two kinds of components: actors and critics. The actor selects actions, while the critic evaluates (criticizes) them [ref]. RL agents can use a critic only (value-based agents), an actor only (policy-based agents), or both (actor-critic agents) to inform the agent’s policy. In general, value-based agents work best with discrete action spaces and can become computationally expensive for continuous actions; policy-based agents are simpler and better suited to continuous action spaces; actor-critic agents can handle both discrete and continuous action spaces.
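As a concrete illustration of the split between policy and learning algorithm, the sketch below implements a minimal value-based agent whose policy is a lookup table of Q-values updated by one-step Q-learning. It is only a sketch: the class and parameter names, the hyperparameter values, and the assumption of a small discrete observation and action space are illustrative choices, not part of this text.

```python
import random
from collections import defaultdict

class TabularQAgent:
    """Minimal value-based agent: the 'policy' is a lookup table of Q-values,
    and the 'learning algorithm' is the one-step Q-learning update."""

    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.actions = actions          # discrete action set (assumed small)
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration rate
        self.q = defaultdict(float)     # lookup table: (observation, action) -> value

    def act(self, obs):
        """Policy: map an observation to an action (epsilon-greedy over Q-values)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def learn(self, obs, action, reward, next_obs):
        """Learning algorithm: update the policy from one transition."""
        best_next = max(self.q[(next_obs, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(obs, action)] += self.alpha * (td_target - self.q[(obs, action)])
```

A training loop would repeatedly call `act`, step the environment, and pass the resulting transition to `learn`; an actor-critic agent would instead maintain a separately parameterized actor alongside such a critic.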
RL agents can be implemented with a variety of algorithms; some examples are listed below, and the sketch after the list contrasts the update rules of two of them (SARSA and Q-learning):
- State-action-reward-state-action (SARSA)
- Policy gradient (PG)
- Deep deterministic policy gradient (DDPG)
- Twin-delayed deep deterministic policy gradient (TD3)
- Proximal policy optimization (PPO)
- Asynchronous advantage actor-critic (A3C)
- Soft actor-critic (SAC)
- Q-learning
- Deep Q-network (DQN)
- Custom agents
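As an illustrative contrast between two of the listed algorithms, the sketch below shows the tabular update rules of SARSA (on-policy) and Q-learning (off-policy). The variable names (`q`, `alpha`, `gamma`, etc.) are assumptions made for the example; `q` is a table mapping (observation, action) pairs to values.

```python
def sarsa_update(q, obs, action, reward, next_obs, next_action, alpha, gamma):
    """SARSA bootstraps on the action the agent actually takes next (on-policy)."""
    td_target = reward + gamma * q[(next_obs, next_action)]
    q[(obs, action)] += alpha * (td_target - q[(obs, action)])

def q_learning_update(q, actions, obs, action, reward, next_obs, alpha, gamma):
    """Q-learning bootstraps on the greedy (maximizing) next action (off-policy)."""
    td_target = reward + gamma * max(q[(next_obs, a)] for a in actions)
    q[(obs, action)] += alpha * (td_target - q[(obs, action)])
```

The only difference is the bootstrap target: SARSA uses the value of the next action the policy actually selects, while Q-learning uses the value of the best available next action regardless of what the policy does.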