The learning environment
The RL environment is dynamic and is defined by a set of states, which the agent can observe and interact with. It is often represented as a simulation, but it can also be a real physical system. In essence, the environment is typically modeled as a Markov decision process (MDP) whose exact mathematical model is unknown and potentially complex. The Markov property in RL means that, given the current state and the action taken, the next state does not depend on any earlier states or actions.
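To make the Markov property concrete, the following is a minimal sketch of a simulated environment, not a reference implementation. The class name `ChainEnv`, the parameter `slip_prob`, and the reward scheme are all hypothetical choices made for illustration. The agent only sees the states and rewards returned by `step`; the transition probabilities stay hidden inside the environment, which mirrors the idea of an MDP whose model is unknown to the learner.

```python
import random


class ChainEnv:
    """Toy chain of states 0..n_states-1 (hypothetical example environment).

    The agent observes states and rewards but never sees slip_prob,
    so the transition model is unknown from the agent's point of view.
    """

    def __init__(self, n_states=5, slip_prob=0.1, seed=0):
        self.n_states = n_states
        self.slip_prob = slip_prob          # hidden transition noise
        self.rng = random.Random(seed)      # private random source
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (0 = left, 1 = right); return (next_state, reward, done).

        The next state is computed from the current state and the action
        only, never from the history: this is the Markov property.
        """
        move = 1 if action == 1 else -1
        if self.rng.random() < self.slip_prob:
            move = -move                    # occasionally slip the other way
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


if __name__ == "__main__":
    env = ChainEnv()
    state = env.reset()
    done = False
    while not done:
        action = random.choice([0, 1])      # placeholder random policy
        state, reward, done = env.step(action)
    print("episode finished in state", state)
```

Because `step` reads nothing but `self.state` and `action`, two episodes that reach the same state by different routes face exactly the same distribution over next states.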

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.