
Task and model schematic.

posted on 2022-08-05, 17:37 authored by Luca Manneschi, Guido Gigante, Eleni Vasilaki, Paolo Del Giudice

The environment corresponds to the random movement of a group of dots on a screen, represented as a one-dimensional noisy signal s(t) (black line), sampled at discrete time steps Δt = 10 ms from a Gaussian distribution of mean μ and variance σ². The task requires the subject to guess the sign of μ by moving a lever to the right (positive sign) or to the left (negative sign); the subject can ‘choose when to choose’, within a maximum episode duration Tmax. The learning agent integrates the signal over different timescales τ (blue lines); the agent also integrates a constant input (depicted in red as a constant from the start of the episode) over the same timescales (yellow-red lines), simulating an internal clock mechanism that estimates the passage of time. In both cases, the darker the colour, the longer the corresponding timescale. At each time step, the weighted sums of the integrators (far right) are fed into a decision layer (the actor) that computes the probability of choosing ‘left’ or ‘right’, thus terminating the episode, or of ‘waiting’ to see another sample of s(t). If the subject gives the correct answer (the guessed sign coincides with the actual sign of μ) within the time limit, a reward is delivered; otherwise, nothing happens. In either case, a new episode then starts. The agent learns by observing the consequences (obtained rewards) of its actions, adapting the weights assigned to the signal integrators and the clock integrators. During learning, the model estimates at each step t the total future expected reward V(t) (the critic) for the current episode as a linear summation of the integrators. Learning of the parameters is accomplished through a standard actor-critic reinforcement learning model, where the reward delivered by the environment is used to update the value function V, which is in turn used to update the actor’s parameters (see S1 Text for more details).
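The sketch below illustrates the kind of setup the caption describes: leaky integrators of the noisy signal and of a constant clock input over several timescales, a linear critic, a softmax actor over ‘left’/‘right’/‘wait’, and one-step actor-critic (TD) updates. It is not the authors’ code; the timescales, learning rates, signal strength, and Tmax value are illustrative assumptions (see S1 Text of the paper for the actual model).

```python
# Minimal actor-critic sketch of the 'choose when to choose' task.
# All numerical values below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

dt = 0.010                            # sampling step, 10 ms (from the caption)
T_max = 2.0                           # assumed maximum episode duration (s)
taus = np.array([0.05, 0.2, 0.8])     # assumed integration timescales (s)
decay = np.exp(-dt / taus)            # per-step leak factor of each integrator

actions = ['left', 'right', 'wait']
n_feat = 2 * len(taus)                # signal integrators + clock integrators

W_actor = np.zeros((len(actions), n_feat))   # actor weights (policy)
w_critic = np.zeros(n_feat)                  # critic weights (value function V)
alpha_actor, alpha_critic, gamma = 0.05, 0.1, 1.0   # assumed learning rates, discount

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def run_episode(mu, sigma=1.0):
    """One episode: integrate s(t) and a constant 'clock' input over the
    timescales in `taus`, pick left/right/wait from the actor, and apply
    a one-step actor-critic (TD) update after each decision."""
    global W_actor, w_critic
    x_sig = np.zeros(len(taus))   # leaky integrators of the noisy signal s(t)
    x_clk = np.zeros(len(taus))   # leaky integrators of the constant clock input
    pending = None                # previous (features, action, value, probs) awaiting bootstrap

    for step in range(int(T_max / dt)):
        s = rng.normal(mu, sigma)                   # new sample of the dot-motion signal
        x_sig = decay * x_sig + (1 - decay) * s     # integrate the signal
        x_clk = decay * x_clk + (1 - decay) * 1.0   # integrate the constant input
        feat = np.concatenate([x_sig, x_clk])

        v = w_critic @ feat                         # critic: expected future reward V(t)
        p = softmax(W_actor @ feat)                 # actor: action probabilities
        a = rng.choice(len(actions), p=p)

        # Bootstrap the previous 'wait' step now that V(t) is known.
        if pending is not None:
            f0, a0, v0, p0 = pending
            delta = gamma * v - v0                  # TD error (no reward for waiting)
            w_critic += alpha_critic * delta * f0
            grad = -p0; grad[a0] += 1.0             # d log pi(a0) / d logits
            W_actor += alpha_actor * delta * np.outer(grad, f0)

        if actions[a] != 'wait':                    # 'left' or 'right' ends the episode
            r = 1.0 if (actions[a] == 'right') == (mu > 0) else 0.0
            delta = r - v                           # terminal TD error
            w_critic += alpha_critic * delta * feat
            grad = -p; grad[a] += 1.0
            W_actor += alpha_actor * delta * np.outer(grad, feat)
            return r
        pending = (feat, a, v, p)

    return 0.0                                      # time limit reached: no reward

# Training loop: the sign of mu is drawn at random each episode.
for ep in range(5000):
    run_episode(mu=rng.choice([-0.2, 0.2]))         # assumed signal strength
```

With this structure, longer-timescale integrators average out more noise but respond more slowly, while the clock integrators give the policy a basis for deciding when accumulating further evidence is no longer worth the delay.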
