
The learning environment

The RL environment is dynamic and is defined by a set of states that the agent can observe and interact with. It is often represented as a simulation, but it can also be a real physical system. The environment is typically modeled as a Markov decision process (MDP) whose exact mathematical model is unknown and potentially complex. The Markov property means that, given the current state and the action taken in it, the next state is conditionally independent of all earlier states and actions.
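To make the Markov property concrete, here is a minimal sketch of a toy environment in Python. The class name `ChainEnvironment`, its parameters (`n_states`, `slip_prob`, `seed`), and the reward scheme are all hypothetical choices made for illustration; they are not taken from any particular RL library. The point to notice is that `step` reads only the current state and the chosen action, never the history.

```python
import random


class ChainEnvironment:
    """Toy MDP (illustrative assumption): states 0..n_states-1 laid out on a line."""

    def __init__(self, n_states=5, slip_prob=0.1, seed=0):
        self.n_states = n_states
        self.slip_prob = slip_prob      # chance that the move goes the opposite way
        self.rng = random.Random(seed)  # private RNG so runs are reproducible
        self.state = 0

    def reset(self):
        """Return the environment to its start state and report that state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (next_state, reward, done)."""
        if action not in (-1, 1):
            raise ValueError("action must be -1 or +1")
        # Stochastic transition: occasionally the move is reversed.
        if self.rng.random() < self.slip_prob:
            action = -action
        # Markov property: the next state depends only on self.state and action.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0   # reward only on reaching the last state
        return self.state, reward, done


if __name__ == "__main__":
    env = ChainEnvironment()
    state = env.reset()
    done = False
    while not done:
        state, reward, done = env.step(+1)  # a fixed "always go right" policy
        print(state, reward, done)
```

The agent here sees the transition rule only through `reset` and `step`, which mirrors the setting in which the exact model of the MDP is unknown to the learner.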




