

In general we are following Marr's approach (Marr et al. 1982, later re-introduced by Gurney et al. 2004) by introducing different levels: the algorithmic, the mechanistic and the implementation level.

The Algorithmic level (Machine-Learning perspective)

Formulated as a class of Markov Decision Problems, RL can be understood as follows: an agent moves between states, and on visiting a state a numerical reward is collected, where negative numbers may represent punishments. Every state has a changeable value attached to it. From every state there are subsequent states that can be reached by means of actions. Actions are selected according to a policy. The value of a given state is defined by the averaged future reward which can be accumulated by selecting actions from this state. The goal of an RL algorithm is to select actions that maximize the expected cumulative reward (the return); the standard formal definitions are sketched below.
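To make the notions of return and value precise, a standard sketch of the discounted formulation can be given (the discount factor $\gamma$, the time index $t$ and the expectation over the policy $\pi$ are conventional notation, not taken from the text above):

$$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 \le \gamma \le 1,$$

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, R_t \mid s_t = s \,\right],$$

so that the value of a state $s$ is the expected (averaged) cumulative future reward obtained when actions are selected according to $\pi$ from $s$ onward.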

The Prediction Problem and the Control Problem

RL methods are employed to address two related problems:

Prediction only: RL is used to learn the value function for the policy followed; a minimal worked example is sketched below.
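As a self-contained illustration of the prediction problem, the following Python sketch estimates the value function of a fixed random policy with tabular TD(0) on a toy five-state chain. The chain, the reward scheme and all names are hypothetical, chosen only for illustration; they do not come from the article.

```python
import random

# Hypothetical toy MDP: a 5-state chain. The agent starts in the middle,
# moves left or right with equal probability (a fixed random policy),
# and the episode ends past either edge. Exiting on the right pays +1,
# exiting on the left pays 0.
N_STATES = 5          # non-terminal states, indexed 0..4
GAMMA = 1.0           # undiscounted episodic task
ALPHA = 0.1           # learning rate

def run_episode(values):
    """Follow the fixed policy once, updating state values with TD(0)."""
    state = N_STATES // 2                     # start in the centre
    while True:
        action = random.choice((-1, +1))      # fixed random policy
        next_state = state + action
        if next_state < 0:                    # left terminal exit
            reward, next_value, done = 0.0, 0.0, True
        elif next_state >= N_STATES:          # right terminal exit
            reward, next_value, done = 1.0, 0.0, True
        else:
            reward, next_value, done = 0.0, values[next_state], False
        # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s')
        values[state] += ALPHA * (reward + GAMMA * next_value - values[state])
        if done:
            return
        state = next_state

values = [0.0] * N_STATES
for _ in range(5000):
    run_episode(values)

# For this chain the true values under the random policy are (i+1)/6.
print([round(v, 2) for v in values])          # roughly [0.17, 0.33, 0.5, 0.67, 0.83]
```

Note that the policy is never changed here: only the value function is learned, which is exactly what distinguishes the prediction problem from the control problem.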
