LEARNING STATE SPACE TRAJECTORIES IN RECURRENT NEURAL NETWORKS: A PRELIMINARY REPORT

Abstract

We describe a procedure for finding $\partial E/\partial w_{ij}$, where $E$ is an arbitrary functional of the temporal trajectory of the states of a continuous recurrent network and $w_{ij}$ are the weights of that network. An embellishment of this procedure involving only computations that go forward in time is also described. Computing these quantities allows one to perform gradient descent in the weights to minimize $E$, so our procedure forms the kernel of a new connectionist learning algorithm.


Keywords: connectionism, learning algorithm, trajectory following, minimizing functionals

The dynamics of the network are given by

$$T_i \frac{dy_i}{dt} = -y_i + \sigma(x_i) + I_i, \qquad x_i = \sum_j w_{ji}\, y_j, \tag{1}$$

where $x_i$ is the total input to unit $i$, $y_i$ is the state of unit $i$, $T_i$ is the time constant of unit $i$, $\sigma$ is an arbitrary differentiable function¹, $w_{ij}$ are the weights, and the boundary conditions $y(t_0)$ and driving functions $I$ are the input to the system. See figure 2 for a graphical representation of this equation.

¹$\sigma(\xi) = (1 + e^{-\xi})^{-1}$, in which case $\sigma'(\xi) = \sigma(\xi)(1 - \sigma(\xi))$.
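As a concrete illustration of equation (1), the sketch below integrates the dynamics with a simple forward Euler step. The function and variable names (`sigma`, `run_forward`, `w`, `T`, `I_ext`) and the choice of the logistic squashing function are assumptions made for this sketch, not part of the report; the weight convention `w[i, j]` = $w_{ij}$ (from unit $i$ to unit $j$) matches $x_i = \sum_j w_{ji} y_j$.

```python
import numpy as np

def sigma(x):
    # Logistic squashing function; sigma'(x) = sigma(x) * (1 - sigma(x)).
    return 1.0 / (1.0 + np.exp(-x))

def run_forward(w, T, y0, I_ext, dt, n_steps):
    """Euler-integrate T_i dy_i/dt = -y_i + sigma(x_i) + I_i with x_i = sum_j w_ji y_j.

    w:      (n, n) weights, w[i, j] is the weight from unit i to unit j
    T:      (n,) time constants
    y0:     (n,) initial state y(t0)
    I_ext:  function mapping step index k to the (n,) external input I(t_k)
    Returns the trajectory as an array of shape (n_steps + 1, n).
    """
    y = np.array(y0, dtype=float)
    traj = [y.copy()]
    for k in range(n_steps):
        x = w.T @ y                                  # x_j = sum_i w_ij * y_i
        dydt = (-y + sigma(x) + I_ext(k)) / T        # equation (1) solved for dy/dt
        y = y + dt * dydt
        traj.append(y.copy())
    return np.stack(traj)
```

The step size `dt` must be small relative to the time constants `T` for the discretization to track the continuous trajectory.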
Consider $E(y)$, an arbitrary functional of the trajectory taken by $y$ between $t_0$ and $t_1$. Below, we develop a technique for computing $\partial E(y)/\partial w_{ij}$ and $\partial E(y)/\partial T_i$, thus allowing us to do gradient descent in the weights and time constants so as to minimize $E$. The computation of $\partial E/\partial w_{ij}$ seems to require a phase in which the network is run backwards in time, but a trick for avoiding this is also developed.

The Equations
Let us define

$$e_i(t) = \frac{\partial E}{\partial y_i(t)}. \tag{2}$$

In the usual case where $E$ is of the form $E(y) = \int_{t_0}^{t_1} f(y(t), t)\, dt$, this means that $e_i(t) = \partial f(y(t), t)/\partial y_i(t)$. Intuitively, $e_i(t)$ measures how much a small change to $y_i$ at time $t$ affects $E$ if everything else is left unchanged. We also define

$$z_i(t) = \lim_{\delta \to 0} \frac{E(y^{(i,t,\delta)}) - E(y)}{\delta}, \tag{3}$$

where $y^{(i,t,\delta)}$ is the same as $y$ except that $dy_i/dt$ has a Dirac delta function of magnitude $\delta$ added to it at time $t$. Intuitively, $z_i(t)$ measures how much a small change to $y_i$ at time $t$ affects $E$ when the change to $y_i$ is propagated forward through time and influences the remainder of the trajectory.
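For concreteness, here is one possible $f$ of the form mentioned above, together with the corresponding $e_i(t)$. The squared-error comparison against a desired trajectory $d$ and the names `quadratic_error`, `traj`, `target`, and `dt` are illustrative assumptions of this sketch, not a prescription of the report.

```python
import numpy as np

def quadratic_error(traj, target, dt):
    """Illustrative functional E(y) = integral of f(y(t), t) dt with
    f(y(t), t) = 1/2 * sum_i (y_i(t) - d_i(t))**2, approximated by a
    Riemann sum over the stored trajectory.

    traj, target: arrays of shape (n_steps + 1, n).
    Returns (E, e) where e[k, i] = e_i(t_k) = df/dy_i = y_i(t_k) - d_i(t_k).
    """
    e = traj - target                  # e_i(t) = df(y(t), t) / dy_i(t)
    E = 0.5 * dt * np.sum(e * e)       # E(y) ~= sum_k f(y(t_k), t_k) * dt
    return E, e
```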

Examining figure 2, we see that the appropriate difference equation for $z$ is

$$z_i(t) = \left(1 - \frac{\Delta t}{T_i}\right) z_i(t + \Delta t) + \Delta t \sum_j \frac{w_{ij}}{T_j}\, \sigma'(x_j(t))\, z_j(t + \Delta t) + \Delta t\, e_i(t), \tag{5}$$

where the $(1 - \Delta t/T_i)\, z_i(t + \Delta t)$ term is due to the linear influence $y_i(t)$ has upon $y_i(t + \Delta t)$, the $\sum_j$ term is due to the effect that changing $y_i(t)$ has upon the other $y_j(t + \Delta t)$ through their nonlinear coupling, and the $\Delta t\, e_i(t)$ term is due to the effect that changing $y_i$ between times $t$ and $t + \Delta t$ has directly upon $E$. By rewriting (5), assuming $z_i$ to be of the form $z_i(t) = z_i(t + \Delta t) - \Delta t\, dz_i/dt(t + \Delta t)$, and taking the limit as $\Delta t \to 0$, we obtain the differential equation

$$\frac{dz_i}{dt} = \frac{1}{T_i} z_i - e_i - \sum_j \frac{1}{T_j}\, w_{ij}\, \sigma'(x_j)\, z_j.$$

Let $\partial E/\partial w_{ij} = dE(y^{(w_{ij},\varepsilon)})/d\varepsilon$ at $\varepsilon = 0$, where $y^{(w_{ij},\varepsilon)}$ is the same as $y$ except that $w_{ij}$ is increased by $\varepsilon$ from $t_0$ through $t_1$. Again examining figure 2, we see that the appropriate difference equation is
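Returning to equation (5): the backward sweep it implies can be sketched as follows. This sketch reuses the naming conventions of the earlier snippets (`w`, `T`, `traj`, `e`, `dt`, all assumptions of this illustration) and additionally assumes the boundary condition $z_i(t_1) = 0$, which follows from the definition of $z$ but is not stated in this excerpt.

```python
import numpy as np

def run_backward(w, T, traj, e, dt):
    """Backward sweep for z_i(t) using difference equation (5):

      z_i(t) = (1 - dt/T_i) * z_i(t+dt)
               + dt * sum_j (w_ij / T_j) * sigma'(x_j(t)) * z_j(t+dt)
               + dt * e_i(t)

    traj: (n_steps + 1, n) states from the forward pass
    e:    (n_steps + 1, n) values of e_i(t)
    Returns z of the same shape, stepping from t1 back to t0 with z(t1) = 0.
    """
    n_steps = traj.shape[0] - 1
    z = np.zeros_like(traj)                          # z(t1) = 0 boundary condition
    for k in range(n_steps - 1, -1, -1):
        x = w.T @ traj[k]                            # x_j(t) at the earlier time step
        s = 1.0 / (1.0 + np.exp(-x))                 # sigma(x)
        sprime = s * (1.0 - s)                       # sigma'(x) = sigma(x)(1 - sigma(x))
        z[k] = ((1.0 - dt / T) * z[k + 1]
                + dt * (w @ (sprime * z[k + 1] / T))
                + dt * e[k])
    return z
```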