Online-Learning Algorithms

braintrace provides online-learning algorithms based on eligibility-trace propagation. They all share one interface: wrap a model, compile its graph, then call the learner as a drop-in replacement for the model’s forward pass. Gradients are accumulated forward in time instead of by backpropagation through time (BPTT).

Two correctness classes appear below. Exact algorithms compute the same total gradient as BPTT, only accumulated forward in time, and match a BPTT oracle element-wise. Approximate algorithms deliberately drop or factor part of the computation and match BPTT only in the regime their math guarantees.
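To make the "same total gradient, accumulated forward" claim concrete, here is a minimal NumPy sketch on a scalar linear recurrence. It does not use braintrace. The model \(h_t = w h_{t-1} + x_t\), the squared-error loss, and the function names `bptt_grad` and `forward_grad` are all assumptions chosen for illustration.

```python
import numpy as np

def bptt_grad(w, x, y):
    """dL/dw for L = sum_t 0.5*(h_t - y_t)^2, computed with a backward sweep."""
    T = len(x)
    h = np.zeros(T + 1)                  # h[0] is the initial state
    for t in range(T):
        h[t + 1] = w * h[t] + x[t]
    grad, delta = 0.0, 0.0               # delta carries dL/dh backward in time
    for t in reversed(range(T)):
        delta = (h[t + 1] - y[t]) + w * delta
        grad += delta * h[t]             # direct dependence of h[t+1] on w
    return grad

def forward_grad(w, x, y):
    """Same gradient, accumulated forward with one scalar eligibility trace."""
    h, e, grad = 0.0, 0.0, 0.0
    for t in range(len(x)):
        e = h + w * e                    # e_t = dh_t/dw, uses the previous h
        h = w * h + x[t]
        grad += (h - y[t]) * e           # instantaneous error times trace
    return grad

rng = np.random.default_rng(0)
x, y = rng.standard_normal(20), rng.standard_normal(20)
print(np.isclose(bptt_grad(0.9, x, y), forward_grad(0.9, x, y)))
```

The forward version never stores the state history; it only carries `h` and `e` from one step to the next, which is the property the online algorithms below rely on.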

One-Call Entry Point

compile() is the recommended starting point. It constructs an algorithm for a model and eagerly builds its eligibility-trace graph, returning a ready-to-update learner in a single call.

compile

Construct an online-learning algorithm for model and eagerly build its eligibility-trace graph, returning a ready-to-update learner.

Base Classes

The abstract bases shared by every algorithm. ETraceAlgorithm is the root; ETraceVjpAlgorithm adds the VJP-based machinery that the concrete D-RTRL / ES-D-RTRL / SNN algorithms build on. EligibilityTrace is the state these algorithms carry across time.

ETraceAlgorithm

The base class for the eligibility trace algorithm.

ETraceVjpAlgorithm

The base class for the eligibility trace algorithm supporting the VJP gradient computation (reverse-mode differentiation).

EligibilityTrace

The state for storing the eligibility trace during the computation of online learning algorithms.

D-RTRL: Parameter Dimension (exact)

Diagonal Real-Time Recurrent Learning (D-RTRL): RTRL with a diagonal approximation of the hidden-to-hidden Jacobian. Memory complexity is \(O(B \cdot |\theta|)\), where \(B\) is the batch size and \(|\theta|\) the number of parameters.

\[\boldsymbol{\epsilon}^t \approx \mathbf{D}^t \boldsymbol{\epsilon}^{t-1} + \operatorname{diag}(\mathbf{D}_f^t) \otimes \mathbf{x}^t\]
\[\nabla_{\boldsymbol{\theta}} \mathcal{L} = \sum_{t' \in \mathcal{T}} \frac{\partial \mathcal{L}^{t'}}{\partial \mathbf{h}^{t'}} \circ \boldsymbol{\epsilon}^{t'}\]
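The two equations above can be sketched in NumPy for a single dense layer. This is an illustrative sketch, not braintrace's implementation. It assumes a toy cell \(\mathbf{h}^t = \tanh(\boldsymbol{\lambda} \circ \mathbf{h}^{t-1} + \mathbf{W}\mathbf{x}^t)\) with an element-wise leak \(\boldsymbol{\lambda}\), a squared-error loss, and batch size 1. It reads \(\mathbf{D}^t\) as the diagonal of \(\partial \mathbf{h}^t / \partial \mathbf{h}^{t-1}\) and \(\mathbf{D}_f^t\) as the diagonal of \(\partial \mathbf{h}^t\) with respect to the pre-activation. The names `d_rtrl_grad` and `loss` are hypothetical.

```python
import numpy as np

def loss(W, lam, xs, ys):
    """Total squared-error loss of the toy cell, used as a reference."""
    h, total = np.zeros(W.shape[0]), 0.0
    for x, y in zip(xs, ys):
        h = np.tanh(lam * h + W @ x)
        total += 0.5 * np.sum((h - y) ** 2)
    return total

def d_rtrl_grad(W, lam, xs, ys):
    """Forward-in-time gradient of `loss` w.r.t. W using a diagonal trace."""
    O, I = W.shape
    h = np.zeros(O)
    eps = np.zeros((O, I))               # eligibility trace, same shape as W
    grad = np.zeros((O, I))
    for x, y in zip(xs, ys):
        h = np.tanh(lam * h + W @ x)
        d_f = 1.0 - h ** 2               # diag(D_f^t): tanh derivative
        d = d_f * lam                    # diag(D^t): dh^t / dh^{t-1}
        # eps^t = D^t eps^{t-1} + diag(D_f^t) (outer) x^t
        eps = d[:, None] * eps + d_f[:, None] * x[None, :]
        # accumulate dL^t/dh^t (element-wise) eps^t
        grad += (h - y)[:, None] * eps
    return grad
```

In this toy cell the recurrence is element-wise, so the hidden-to-hidden Jacobian is already diagonal and the trace recovers the full gradient. With a dense recurrent matrix the same update would drop the off-diagonal terms.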

ParamDimVjpAlgorithm

Online gradient algorithm with diagonal approximation and parameter-dimension complexity.

D_RTRL

The Diagonal RTRL (D-RTRL) online gradient computation algorithm.

D_RTRL is the concrete, ready-to-use subclass of ParamDimVjpAlgorithm.

ES-D-RTRL: Input/Output Dimension (exact)

The Event-Synchronized D-RTRL algorithm factorizes the eligibility trace into input and output components with exponential smoothing, reducing memory to \(O(B(I + O))\), where \(I\) and \(O\) are the input and output dimensions.

\[\boldsymbol{\epsilon}^t \approx \boldsymbol{\epsilon}_{\mathbf{f}}^t \otimes \boldsymbol{\epsilon}_{\mathbf{x}}^t\]
\[\boldsymbol{\epsilon}_{\mathbf{x}}^t = \alpha \boldsymbol{\epsilon}_{\mathbf{x}}^{t-1} + \mathbf{x}^t\]
\[\boldsymbol{\epsilon}_{\mathbf{f}}^t = \alpha \operatorname{diag}(\mathbf{D}^t) \circ \boldsymbol{\epsilon}_{\mathbf{f}}^{t-1} + (1 - \alpha) \operatorname{diag}(\mathbf{D}_f^t)\]
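A minimal NumPy sketch of the factorized update follows, again outside braintrace. It assumes the same toy cell \(\mathbf{h}^t = \tanh(\boldsymbol{\lambda} \circ \mathbf{h}^{t-1} + \mathbf{W}\mathbf{x}^t)\), a squared-error loss, and batch size 1. The function name `es_d_rtrl_grad` is hypothetical, and combining the two traces with an outer product at readout is one reading of the first equation above.

```python
import numpy as np

def es_d_rtrl_grad(W, lam, xs, ys, alpha):
    """Approximate gradient w.r.t. W from two vector traces (I + O numbers)."""
    O, I = W.shape
    h = np.zeros(O)
    eps_x = np.zeros(I)                  # input-side trace
    eps_f = np.zeros(O)                  # output-side trace
    grad = np.zeros((O, I))
    for x, y in zip(xs, ys):
        h = np.tanh(lam * h + W @ x)
        d_f = 1.0 - h ** 2               # diag(D_f^t)
        d = d_f * lam                    # diag(D^t)
        eps_x = alpha * eps_x + x
        eps_f = alpha * d * eps_f + (1.0 - alpha) * d_f
        # eps^t ~ eps_f^t (outer) eps_x^t, weighted by dL^t/dh^t
        grad += np.outer((h - y) * eps_f, eps_x)
    return grad
```

Only `eps_x` and `eps_f` persist across steps, which is where the \(O(B(I + O))\) memory figure comes from. With `alpha = 0` both traces collapse to their instantaneous terms, so the result reduces to the purely local gradient with no temporal credit assignment.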

IODimVjpAlgorithm

Online gradient algorithm with diagonal approximation and input-output-dimension complexity.

pp_prop

Online gradient algorithm with diagonal approximation and input-output-dimension complexity.

pp_prop is the concrete subclass of IODimVjpAlgorithm; ES_D_RTRL is an alias for pp_prop.

SNN Online-Learning Algorithms

Paper-faithful algorithms tailored to spiking neural networks, all ETraceVjpAlgorithm subclasses. These are approximate (except where a regime makes them exact); know the regime before relying on their gradients.

EProp

Eligibility Propagation (e-prop) for recurrent spiking networks.

OSTLRecurrent

OSTL 'with-H' regime — RTRL-exact single-layer factorization.

OSTLFeedforward

OSTL 'without-H' regime — feedforward / no recurrent Jacobian.

OTPE

Online Training with Postsynaptic Estimates for spiking networks.

OTTT

Online Training Through Time for spiking neural networks.

OSTTP

Online Spatio-Temporal Learning with Target Projection.

Trace helpers reused across the SNN algorithms — a frozen random-feedback projection, an output-side low-pass filter, and a leaky presynaptic accumulator:

FixedRandomFeedback

Frozen random feedback matrix with a stop-gradient guard.

KappaFilter

Low-pass output-side filter used by EProp.

PresynapticTrace

Leaky presynaptic accumulator used by OTTT and OTPE-Approx.
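The three helpers share simple recursions that can be sketched generically in NumPy. The code below is not the braintrace classes: `LeakyTrace` and `make_fixed_feedback` are hypothetical names, the recursion `trace = decay * trace + value` is the generic leaky form and may differ from the library's exact scaling, and NumPy's read-only flag merely stands in for a stop-gradient guard (in JAX that role is typically played by `jax.lax.stop_gradient`).

```python
import numpy as np

class LeakyTrace:
    """Generic leaky accumulator: trace <- decay * trace + value.

    With decay = lambda it behaves like a presynaptic trace over input spikes;
    with decay = kappa it behaves like a low-pass filter over output-side terms.
    """
    def __init__(self, shape, decay):
        self.decay = decay
        self.value = np.zeros(shape)

    def update(self, x):
        self.value = self.decay * self.value + x
        return self.value

def make_fixed_feedback(n_out, n_hidden, seed=0):
    """Random feedback matrix drawn once and then frozen (assumed scaling)."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n_out, n_hidden)) / np.sqrt(n_out)
    B.setflags(write=False)              # learning must never modify B
    return B

# Usage: project an output error back to hidden units through the frozen matrix.
B = make_fixed_feedback(n_out=3, n_hidden=5)
output_error = np.array([0.2, -0.1, 0.0])
learning_signal = output_error @ B       # shape (5,), one value per hidden unit
```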

Algorithm Comparison

| Algorithm | Memory | Computation | Best For |
| --- | --- | --- | --- |
| D_RTRL | \(O(B \cdot \lvert\theta\rvert)\) | \(O(B \cdot I \cdot O)\) | RNNs, general-purpose |
| ES_D_RTRL | \(O(B(I + O))\) | \(O(B \cdot I \cdot O)\) | Large SNNs, memory-constrained |
| EProp | \(O(B \cdot \lvert\theta\rvert)\) | \(O(B \cdot I \cdot O)\) | SNNs with κ-filtered / random-feedback learning signals |
| OSTLRecurrent / OSTLFeedforward | depends on regime | depends on regime | OSTLRecurrent ('with-H', D-RTRL) keeps the recurrent Jacobian; OSTLFeedforward ('without-H', pp_prop) drops it. |
| OTPE | \(O(B \cdot I \cdot O)\) (full) / \(O(B(I+O))\) (approx) | \(O(B \cdot I \cdot O)\) | Deep SNNs; F-OTPE trades rank for memory |
| OTTT | \(O(B \cdot I)\) | \(O(B \cdot I \cdot O)\) | Very large SNNs; presynaptic λ-trace only |
| OSTTP | \(O(B \cdot \lvert\theta\rvert)\) | \(O(B \cdot I \cdot O)\) | Target-projection via fixed random feedback |
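To get a feel for the memory column, the asymptotic expressions can be turned into rough element counts. The sketch below assumes a single dense layer, so that \(|\theta| = I \cdot O\), and ignores constant factors and any other state the algorithms keep. The helper name `trace_elements` is hypothetical.

```python
def trace_elements(B, I, O):
    """Rough trace sizes (number of stored values) from the table's formulas."""
    n_params = I * O                     # assumed: one dense weight matrix
    return {
        "D_RTRL": B * n_params,
        "ES_D_RTRL": B * (I + O),
        "EProp": B * n_params,
        "OTTT": B * I,
        "OSTTP": B * n_params,
    }

sizes = trace_elements(B=32, I=1000, O=500)
for name, n in sizes.items():
    print(f"{name:10s} {n:>12,d} values")
```

For this layer size the parameter-dimension traces hold 16,000,000 values, while the input/output factorization holds 48,000, which is why the factorized algorithms are the ones suggested for large, memory-constrained networks.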