Online-Learning Algorithms

braintrace provides online-learning algorithms based on eligibility-trace propagation. They all share one interface: wrap a model, compile its graph, then call the learner as a drop-in replacement for the model’s forward pass. Gradients are accumulated forward in time rather than by backpropagation through time (BPTT).

Two correctness classes appear below. Exact algorithms compute the same total gradient as BPTT, only accumulated forward in time, and match a BPTT oracle element-wise. Approximate algorithms deliberately drop or factor part of the computation and match BPTT only in the regime their math guarantees.
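To make "same total gradient, accumulated forward" concrete, here is a minimal pure-Python sketch on a toy scalar recurrence. The model, the loss, and the function names are assumptions chosen for illustration; none of this is braintrace API.

```python
# Toy model (an assumption for illustration): h_t = a * h_{t-1} + w * x_t,
# per-step loss L_t = 0.5 * (h_t - y_t)^2, gradient taken with respect to w.

def bptt_grad(a, w, xs, ys):
    """Reference gradient: run forward, store every state, then sweep backward."""
    hs, h = [], 0.0
    for x in xs:
        h = a * h + w * x
        hs.append(h)
    grad, dh_next = 0.0, 0.0
    for t in reversed(range(len(xs))):
        dh = (hs[t] - ys[t]) + a * dh_next  # dL/dh_t, including all future losses
        grad += dh * xs[t]                  # direct dependence of h_t on w is x_t
        dh_next = dh
    return grad

def forward_trace_grad(a, w, xs, ys):
    """Online gradient: carry one eligibility trace e_t = dh_t/dw forward in time."""
    h, e, grad = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        h = a * h + w * x
        e = a * e + x        # trace recursion; no state history is stored
        grad += (h - y) * e  # instantaneous error times the trace
    return grad

xs = [1.0, -0.5, 2.0, 0.25]
ys = [0.5, 0.0, 1.0, -1.0]
g_bptt = bptt_grad(0.9, 0.3, xs, ys)
g_online = forward_trace_grad(0.9, 0.3, xs, ys)
```

In this toy case the two numbers agree up to floating-point rounding, which is what "exact" means here. An approximate algorithm would replace the trace recursion with something cheaper and agree only under extra conditions.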

One-Call Entry Point #

compile() is the recommended starting point. It constructs an algorithm for a model and eagerly builds its eligibility-trace graph, returning a ready-to-update learner in a single call.

`compile`	Construct an online-learning algorithm for model and eagerly build its eligibility-trace graph, returning a ready-to-update learner.

Base Classes #

The abstract bases shared by every algorithm. ETraceAlgorithm is the root; ETraceVjpAlgorithm adds the VJP-based machinery that the concrete D-RTRL / ES-D-RTRL / SNN algorithms build on. EligibilityTrace is the state these algorithms carry across time.

`ETraceAlgorithm`	The base class for the eligibility trace algorithm.
`ETraceVjpAlgorithm`	The base class for the eligibility trace algorithm supporting the VJP gradient computation (reverse-mode differentiation).
`EligibilityTrace`	The state for storing the eligibility trace during the computation of online learning algorithms.

D-RTRL — Parameter Dimension (exact)#

Diagonal Real-Time Recurrent Learning (D-RTRL), which uses a diagonal approximation of the hidden-to-hidden Jacobian. Memory complexity is \(O(B \cdot |\theta|)\), where \(B\) is the batch size and \(|\theta|\) the number of parameters.

\[\boldsymbol{\epsilon}^t \approx \mathbf{D}^t \boldsymbol{\epsilon}^{t-1} + \operatorname{diag}(\mathbf{D}_f^t) \otimes \mathbf{x}^t\]

\[\nabla_{\boldsymbol{\theta}} \mathcal{L} = \sum_{t' \in \mathcal{T}} \frac{\partial \mathcal{L}^{t'}}{\partial \mathbf{h}^{t'}} \circ \boldsymbol{\epsilon}^{t'}\]
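The two equations above can be sketched in NumPy for a single leaky layer. The sketch assumes \(\mathbf{D}^t\) is the diagonal of the hidden-to-hidden Jacobian and \(\mathbf{D}_f^t\) the Jacobian of the hidden state with respect to the weighted input; this reading of the symbols, the toy layer, and all function names are assumptions, not the `D_RTRL` implementation.

```python
import numpy as np

def d_rtrl_grad(W, d, xs, ys):
    """Accumulate dL/dW online for the toy layer h_t = d * h_{t-1} + W @ x_t."""
    O, I = W.shape
    h = np.zeros(O)
    eps = np.zeros((O, I))   # eligibility trace, one entry per weight
    grad = np.zeros((O, I))
    for x, y in zip(xs, ys):
        h = d * h + W @ x
        D_diag = d             # diagonal of the hidden-to-hidden Jacobian
        Df_diag = np.ones(O)   # dh / d(W @ x) is the identity for this layer
        eps = D_diag[:, None] * eps + np.outer(Df_diag, x)
        grad += (h - y)[:, None] * eps   # dL_t/dh_t applied row-wise to the trace
    return grad

def total_loss(W, d, xs, ys):
    """Sum over time of 0.5 * ||h_t - y_t||^2, used only to check the gradient."""
    h, loss = np.zeros(W.shape[0]), 0.0
    for x, y in zip(xs, ys):
        h = d * h + W @ x
        loss += 0.5 * np.sum((h - y) ** 2)
    return loss

rng = np.random.default_rng(0)
T, I, O = 6, 3, 4
W = rng.normal(size=(O, I))
d = rng.uniform(0.5, 0.95, size=O)
xs = rng.normal(size=(T, I))
ys = rng.normal(size=(T, O))

grad_online = d_rtrl_grad(W, d, xs, ys)

# Central finite differences as an independent check of the online gradient.
grad_fd = np.zeros_like(W)
step = 1e-6
for i in range(O):
    for j in range(I):
        dW = np.zeros_like(W)
        dW[i, j] = step
        grad_fd[i, j] = (total_loss(W + dW, d, xs, ys)
                         - total_loss(W - dW, d, xs, ys)) / (2 * step)
```

Because the hidden-to-hidden Jacobian of this toy layer is already diagonal, the trace-based gradient agrees with the finite-difference check. The trace `eps` has the same shape as `W`, which is where the \(O(B \cdot |\theta|)\) memory cost comes from once a batch dimension is added.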

`ParamDimVjpAlgorithm`	Online gradient algorithm with diagonal approximation and parameter-dimension complexity.
`D_RTRL`	The Diagonal RTRL (D-RTRL) online gradient computation algorithm.

D_RTRL is the concrete, ready-to-use subclass of ParamDimVjpAlgorithm.

ES-D-RTRL — Input/Output Dimension (exact)#

The Exponentially Smoothed D-RTRL (ES-D-RTRL) algorithm factorizes the eligibility trace into input and output components with exponential smoothing, reducing memory to \(O(B(I + O))\), where \(I\) and \(O\) are the input and output dimensions.

\[\boldsymbol{\epsilon}^t \approx \boldsymbol{\epsilon}_{\mathbf{f}}^t \otimes \boldsymbol{\epsilon}_{\mathbf{x}}^t\]

\[\boldsymbol{\epsilon}_{\mathbf{x}}^t = \alpha \boldsymbol{\epsilon}_{\mathbf{x}}^{t-1} + \mathbf{x}^t\]

\[\boldsymbol{\epsilon}_{\mathbf{f}}^t = \alpha \operatorname{diag}(\mathbf{D}^t) \circ \boldsymbol{\epsilon}_{\mathbf{f}}^{t-1} + (1 - \alpha) \operatorname{diag}(\mathbf{D}_f^t)\]
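The three recursions above translate almost line by line into NumPy. The following is a minimal sketch with hypothetical names and hand-picked constant Jacobian diagonals; the final gradient line reuses the D-RTRL readout form as an assumption, and nothing here is the `IODimVjpAlgorithm` implementation.

```python
import numpy as np

def es_d_rtrl_step(eps_x, eps_f, x, D_diag, Df_diag, alpha):
    """One update of the factored traces; state is I + O numbers, not I * O."""
    eps_x = alpha * eps_x + x                                   # input-side trace
    eps_f = alpha * D_diag * eps_f + (1.0 - alpha) * Df_diag    # output-side trace
    return eps_x, eps_f

I, O, alpha = 3, 2, 0.5
eps_x, eps_f = np.zeros(I), np.zeros(O)

xs = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 1.0]])
D_diag = np.array([0.8, 0.4])    # assumed constant hidden Jacobian diagonal
Df_diag = np.array([1.0, 1.0])   # assumed constant hidden-to-weight Jacobian diagonal

for x in xs:
    eps_x, eps_f = es_d_rtrl_step(eps_x, eps_f, x, D_diag, Df_diag, alpha)

# The full (O, I) trace is the outer product; it is formed only for illustration.
eps_full = np.outer(eps_f, eps_x)

# Weight gradient for one step without materializing the full trace first.
dL_dh = np.array([0.1, -0.2])
grad_W = np.outer(dL_dh * eps_f, eps_x)
```

After the two steps, `eps_x` is `[0.5, 1.0, 2.0]` and `eps_f` is `[0.7, 0.6]`. Only these two vectors are carried between time steps, which is the source of the \(O(B(I + O))\) memory figure.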

`IODimVjpAlgorithm`	Online gradient algorithm with diagonal approximation and input-output-dimension complexity.
`pp_prop`	Online gradient algorithm with diagonal approximation and input-output-dimension complexity.

pp_prop is the concrete subclass of IODimVjpAlgorithm; ES_D_RTRL is an alias for pp_prop.

SNN Online-Learning Algorithms #

Paper-faithful algorithms tailored to spiking neural networks, all ETraceVjpAlgorithm subclasses. These are approximate (except where a regime makes them exact); know the regime before relying on their gradients.

`EProp`	Eligibility Propagation (e-prop) for recurrent spiking networks.
`OSTLRecurrent`	OSTL 'with-H' regime — RTRL-exact single-layer factorization.
`OSTLFeedforward`	OSTL 'without-H' regime — feedforward / no recurrent Jacobian.
`OTPE`	Online Training with Postsynaptic Estimates for spiking networks.
`OTTT`	Online Training Through Time for spiking neural networks.
`OSTTP`	Online Spatio-Temporal Learning with Target Projection.

Trace helpers reused across the SNN algorithms — a frozen random-feedback projection, an output-side low-pass filter, and a leaky presynaptic accumulator:

`FixedRandomFeedback`	Frozen random feedback matrix with a stop-gradient guard.
`KappaFilter`	Low-pass output-side filter used by EProp.
`PresynapticTrace`	Leaky presynaptic accumulator used by OTTT and OTPE-Approx.
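As a rough illustration of what these three helpers compute, the following NumPy sketch implements a leaky accumulator, a first-order low-pass filter, and a frozen random feedback projection. The function names, the normalized form of the filter, and the shapes are assumptions for illustration; the actual `PresynapticTrace`, `KappaFilter`, and `FixedRandomFeedback` classes may differ in detail.

```python
import numpy as np

def leaky_trace(spikes, decay):
    """Leaky accumulator: a_t = decay * a_{t-1} + s_t, returned for every step."""
    a = np.zeros(spikes.shape[1])
    history = []
    for s in spikes:
        a = decay * a + s
        history.append(a)
    return np.stack(history)

def low_pass(signal, kappa):
    """First-order low-pass: y_t = kappa * y_{t-1} + (1 - kappa) * z_t (assumed form)."""
    y = np.zeros(signal.shape[1])
    history = []
    for z in signal:
        y = kappa * y + (1.0 - kappa) * z
        history.append(y)
    return np.stack(history)

# Presynaptic trace of two input channels over three time steps.
spikes = np.array([[1.0, 0.0],
                   [0.0, 0.0],
                   [1.0, 1.0]])
pre_trace = leaky_trace(spikes, decay=0.5)

# Low-pass filtering of a constant output signal.
outputs = np.ones((3, 1))
filtered = low_pass(outputs, kappa=0.5)

# Frozen random feedback: drawn once, then never updated.
rng = np.random.default_rng(0)
n_hidden, n_out = 4, 2
feedback = rng.normal(size=(n_hidden, n_out))
feedback.setflags(write=False)   # NumPy stand-in for a stop-gradient guard
output_error = np.array([0.5, -1.0])
learning_signal = feedback @ output_error   # one learning signal per hidden unit
```

In an autodiff framework the feedback matrix would be wrapped in a stop-gradient instead of being marked read-only; the read-only flag here only mimics the "frozen" property.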

Algorithm Comparison #

Algorithm	Memory	Computation	Best For
`D_RTRL`	\(O(B \cdot |\theta|)\)	\(O(B \cdot I \cdot O)\)	RNNs, general-purpose
`ES_D_RTRL`	\(O(B(I + O))\)	\(O(B \cdot I \cdot O)\)	Large SNNs, memory-constrained
`EProp`	\(O(B \cdot |\theta|)\)	\(O(B \cdot I \cdot O)\)	SNNs with κ-filtered / random-feedback learning signals
`OSTLRecurrent` / `OSTLFeedforward`	depends on regime	depends on regime	`OSTLRecurrent` (‘with-H’, D-RTRL) keeps the recurrent Jacobian; `OSTLFeedforward` (‘without-H’, pp_prop) drops it.
`OTPE`	\(O(B \cdot I \cdot O)\) (full) / \(O(B(I+O))\) (approx)	\(O(B \cdot I \cdot O)\)	Deep SNNs; F-OTPE trades rank for memory
`OTTT`	\(O(B \cdot I)\)	\(O(B \cdot I \cdot O)\)	Very large SNNs; presynaptic λ-trace only
`OSTTP`	\(O(B \cdot |\theta|)\)	\(O(B \cdot I \cdot O)\)	Target-projection via fixed random feedback
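To get a feel for the memory column, the asymptotic expressions can be evaluated for one concrete layer. This is a back-of-the-envelope sketch that assumes a dense layer with \(|\theta| = I \cdot O\) weights, float32 storage, and no constant factors; the function name is hypothetical.

```python
def trace_sizes(batch, n_in, n_out):
    """Numbers stored per layer by each trace layout, for a dense n_in -> n_out layer."""
    n_params = n_in * n_out   # weights only; bias ignored for simplicity
    return {
        "D_RTRL": batch * n_params,           # O(B * |theta|)
        "ES_D_RTRL": batch * (n_in + n_out),  # O(B * (I + O))
        "OTTT": batch * n_in,                 # O(B * I)
    }

sizes = trace_sizes(batch=32, n_in=1000, n_out=1000)

# Convert counts to MiB assuming 4 bytes per float32 value.
mib = {name: count * 4 / 2**20 for name, count in sizes.items()}
```

For this layer the parameter-dimension trace needs roughly 122 MiB, while the factored input/output trace needs roughly 0.24 MiB. Actual footprints depend on implementation details that the asymptotic expressions do not capture.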
