Online-Learning Algorithms#
braintrace provides online-learning algorithms based on eligibility-trace
propagation. They all share one interface: wrap a model, compile its graph,
then call the learner as a drop-in replacement for the model’s forward pass —
gradients are accumulated forward in time instead of by BPTT.
Two correctness classes appear below. Exact algorithms compute the same total gradient as BPTT, only accumulated forward in time, and match a BPTT oracle element-wise. Approximate algorithms deliberately drop or factor part of the computation and match BPTT only in the regime their math guarantees.
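A minimal pure-Python sketch of what "accumulated forward in time" and "match a BPTT oracle" mean, assuming a hypothetical scalar model h_t = a·h_{t-1} + w·x_t with squared-error loss. The forward version carries an eligibility trace dh_t/dw so the gradient is ready after the last step. The reference version stores the whole rollout and sweeps backward. The model and function names are illustrative assumptions, not braintrace's API.

```python
def forward_grad(a, w, xs, ys):
    """Accumulate dL/dw forward in time with an eligibility trace."""
    h, trace, grad = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        h = a * h + w * x          # model step
        trace = a * trace + x      # trace = dh_t/dw, carried forward
        grad += (h - y) * trace    # dL_t/dh_t * dh_t/dw
    return grad


def bptt_grad(a, w, xs, ys):
    """Reference: store the whole rollout, then sweep backward (BPTT)."""
    hs, h = [], 0.0
    for x in xs:
        h = a * h + w * x
        hs.append(h)
    grad, adj = 0.0, 0.0           # adj = total dL/dh_t, including future losses
    for t in reversed(range(len(xs))):
        adj = (hs[t] - ys[t]) + a * adj
        grad += adj * xs[t]        # local dh_t/dw = x_t
    return grad


xs = [1.0, -0.5, 2.0, 0.3]
ys = [0.2, 0.1, -0.4, 1.0]
print(forward_grad(0.8, 0.5, xs, ys), bptt_grad(0.8, 0.5, xs, ys))
```

Nothing is dropped from the trace here, so the two results agree to floating-point precision. The forward version's memory does not grow with sequence length.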
One-Call Entry Point#
compile() is the recommended starting point. It constructs an algorithm
for a model and eagerly builds its eligibility-trace graph, returning a
ready-to-update learner in a single call.
| Function | Description |
|---|---|
| compile | Construct an online-learning algorithm for a model. |
Base Classes#
The abstract base classes shared by every algorithm. ETraceAlgorithm is the
root; ETraceVjpAlgorithm adds the VJP-based machinery on which the
concrete D-RTRL, ES-D-RTRL, and SNN algorithms build. EligibilityTrace
is the state these algorithms carry across time.
| Class | Description |
|---|---|
| ETraceAlgorithm | The base class for eligibility-trace algorithms. |
| ETraceVjpAlgorithm | The base class for eligibility-trace algorithms that support VJP (reverse-mode) gradient computation. |
| EligibilityTrace | The state that stores the eligibility trace during online-learning computation. |
D-RTRL — Parameter Dimension (exact)#
Decoupled Real-Time Recurrent Learning with a diagonal approximation of the hidden-to-hidden Jacobian. Memory complexity \(O(B \cdot |\theta|)\), where \(B\) is the batch size and \(|\theta|\) the number of parameters.
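A minimal NumPy sketch of a parameter-dimension trace, assuming a hypothetical element-wise recurrent layer h_t = tanh(a ⊙ h_{t-1} + W x_t). Each unit feeds back only into itself, so the hidden-to-hidden Jacobian is genuinely diagonal. Keeping one trace entry per (sample, weight) is therefore exact in this toy, and the (B, O, I) trace array is the O(B·|θ|) memory term. The gradient is checked against central finite differences rather than braintrace's BPTT oracle. All function names are illustrative.

```python
import numpy as np


def rollout_loss(W, a, xs, ys):
    """Sum of squared errors for h_t = tanh(a * h_{t-1} + x_t @ W.T)."""
    h = np.zeros((xs.shape[1], W.shape[0]))            # (B, O)
    loss = 0.0
    for x, y in zip(xs, ys):
        h = np.tanh(a * h + x @ W.T)
        loss += 0.5 * np.sum((h - y) ** 2)
    return loss


def d_rtrl_grad(W, a, xs, ys):
    """Forward-mode gradient with one trace entry per (sample, weight)."""
    B = xs.shape[1]
    O, I = W.shape
    h = np.zeros((B, O))
    trace = np.zeros((B, O, I))                        # trace[b, o, i] = dh[b, o] / dW[o, i]
    grad = np.zeros_like(W)
    for x, y in zip(xs, ys):
        h = np.tanh(a * h + x @ W.T)
        dtanh = 1.0 - h ** 2                           # tanh'(z_t), shape (B, O)
        # Diagonal Jacobian dh_t/dh_{t-1} = diag(dtanh * a): each row evolves on its own.
        trace = dtanh[:, :, None] * (a[None, :, None] * trace + x[:, None, :])
        grad += np.einsum("bo,boi->oi", h - y, trace)  # dL_t/dh_t times trace, summed over batch
    return grad


rng = np.random.default_rng(0)
T, B, I, O = 6, 3, 4, 5
W = rng.normal(scale=0.5, size=(O, I))
a = rng.uniform(0.2, 0.9, size=O)                      # per-unit self-recurrence
xs = rng.normal(size=(T, B, I))
ys = rng.normal(size=(T, B, O))

g = d_rtrl_grad(W, a, xs, ys)
eps = 1e-6
fd = np.zeros_like(W)
for o in range(O):
    for i in range(I):
        dW = np.zeros_like(W)
        dW[o, i] = eps
        fd[o, i] = (rollout_loss(W + dW, a, xs, ys) - rollout_loss(W - dW, a, xs, ys)) / (2 * eps)
print(np.max(np.abs(g - fd)))                          # small: finite-difference noise only
```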
| Class | Description |
|---|---|
| ParamDimVjpAlgorithm | Online gradient algorithm with diagonal approximation and parameter-dimension complexity. |
| D_RTRL | The Diagonal RTRL (D-RTRL) online gradient computation algorithm. |
D_RTRL is the concrete, ready-to-use subclass of
ParamDimVjpAlgorithm.
ES-D-RTRL — Input/Output Dimension (exact)#
The Event-Synchronized D-RTRL algorithm factorizes the eligibility trace into input and output components with exponential smoothing, reducing memory to \(O(B(I + O))\), where \(I\) and \(O\) are the input and output dimensions.
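Why an input/output factorization can pay off is easiest to see in a special case. The NumPy sketch below assumes a hypothetical linear layer h_t = ρ h_{t-1} + W x_t with one decay ρ shared by every unit. Every output row of the full trace then equals the same exponentially smoothed input x̄_t = ρ x̄_{t-1} + x_t, so an O(B·I) vector replaces the O(B·I·O) trace without changing the gradient. This is not claimed to be pp_prop's update rule. Outside this special case the output-side factor is no longer all ones and carries O entries per sample, which is where the B(I + O) total comes from.

```python
import numpy as np


def full_trace_grad(W, rho, xs, ys):
    """Reference: keeps the full (B, O, I) trace."""
    B = xs.shape[1]
    O, I = W.shape
    h = np.zeros((B, O))
    trace = np.zeros((B, O, I))
    grad = np.zeros_like(W)
    for x, y in zip(xs, ys):
        h = rho * h + x @ W.T
        trace = rho * trace + x[:, None, :]     # identical update for every output row
        grad += np.einsum("bo,boi->oi", h - y, trace)
    return grad


def factored_grad(W, rho, xs, ys):
    """Keeps only a smoothed input vector per sample: O(B*I) instead of O(B*I*O)."""
    B = xs.shape[1]
    O, I = W.shape
    h = np.zeros((B, O))
    x_bar = np.zeros((B, I))
    grad = np.zeros_like(W)
    for x, y in zip(xs, ys):
        h = rho * h + x @ W.T
        x_bar = rho * x_bar + x                 # exponential smoothing of the input
        grad += (h - y).T @ x_bar               # sum over batch of outer(error, x_bar)
    return grad


rng = np.random.default_rng(0)
T, B, I, O = 8, 2, 3, 4
W = rng.normal(size=(O, I))
xs = rng.normal(size=(T, B, I))
ys = rng.normal(size=(T, B, O))
g_full = full_trace_grad(W, 0.7, xs, ys)
g_fact = factored_grad(W, 0.7, xs, ys)
print(np.allclose(g_full, g_fact))             # True in this shared-decay linear case
```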
| Class | Description |
|---|---|
| IODimVjpAlgorithm | Online gradient algorithm with diagonal approximation and input-output-dimension complexity. |
| pp_prop | Online gradient algorithm with diagonal approximation and input-output-dimension complexity. |
pp_prop is the concrete subclass of IODimVjpAlgorithm;
ES_D_RTRL is an alias for pp_prop.
SNN Online-Learning Algorithms#
Paper-faithful algorithms tailored to spiking neural networks, all
ETraceVjpAlgorithm subclasses. These are approximate except in the regimes
where their math makes them exact; check which regime applies before relying
on their gradients.
| Algorithm |
|---|
| Eligibility Propagation (e-prop) for recurrent spiking networks. |
| OSTL 'with-H' regime: RTRL-exact single-layer factorization. |
| OSTL 'without-H' regime: feedforward / no recurrent Jacobian. |
| Online Training with Postsynaptic Estimates (OTPE) for spiking networks. |
| Online Training Through Time (OTTT) for spiking neural networks. |
| Online Spatio-Temporal Learning with Target Projection. |
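To illustrate what "approximate except in their regime" means, the NumPy sketch below compares an exact full-Jacobian RTRL gradient with a cheaper trace that keeps each unit's self-decay but drops the cross-unit recurrent Jacobian, in the spirit of the OSTL 'without-H' regime. The hypothetical linear model is h_t = (diag(a) + R) h_{t-1} + W x_t. The two gradients agree when the recurrent weights R are zero (the feedforward regime) and diverge otherwise. It is a toy, not braintrace's OSTL implementation.

```python
import numpy as np


def compare_grads(W, a, R, xs, ys):
    """Exact RTRL gradient vs. a trace that ignores the cross-unit recurrent Jacobian."""
    O, I = W.shape
    J = np.diag(a) + R                     # true hidden-to-hidden Jacobian (linear model)
    eye = np.eye(O)
    h = np.zeros(O)
    full = np.zeros((O, O, I))             # full[i, k, j] = dh[i] / dW[k, j]
    diag = np.zeros((O, I))                # diag[i, j] approximates dh[i] / dW[i, j]
    g_exact = np.zeros_like(W)
    g_no_h = np.zeros_like(W)
    for x, y in zip(xs, ys):
        h = J @ h + W @ x
        full = np.einsum("im,mkj->ikj", J, full) + eye[:, :, None] * x[None, None, :]
        diag = a[:, None] * diag + x[None, :]   # self-decay only; R is ignored
        err = h - y
        g_exact += np.einsum("i,ikj->kj", err, full)
        g_no_h += err[:, None] * diag
    return g_exact, g_no_h


rng = np.random.default_rng(1)
T, I, O = 8, 3, 4
W = rng.normal(size=(O, I))
a = rng.uniform(0.2, 0.9, size=O)
xs = rng.normal(size=(T, I))
ys = rng.normal(size=(T, O))

# Feedforward regime (no recurrent weights): the cheap trace is exact.
g_exact0, g_no_h0 = compare_grads(W, a, np.zeros((O, O)), xs, ys)
# Recurrent regime: the dropped term matters and the gradients disagree.
R = rng.normal(scale=0.3, size=(O, O))
g_exact1, g_no_h1 = compare_grads(W, a, R, xs, ys)
print(np.allclose(g_exact0, g_no_h0), np.max(np.abs(g_exact1 - g_no_h1)))
```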
Trace helpers reused across the SNN algorithms: a frozen random-feedback projection, an output-side low-pass filter, and a leaky presynaptic accumulator.
| Helper |
|---|
| Frozen random feedback matrix with a stop-gradient guard. |
| Low-pass output-side filter used by EProp. |
| Leaky presynaptic accumulator used by OTTT and OTPE-Approx. |
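The three helpers reduce to very small pieces of state. Below is a NumPy sketch in which all class names are hypothetical. The first piece is a seeded random feedback matrix marked read-only. In a JAX implementation the guard against learning it would be `jax.lax.stop_gradient`; NumPy has no autodiff, so here the flag only prevents in-place edits. The second is a single leaky recurrence, reused both as the output-side low-pass filter (decay κ) and as the presynaptic accumulator (decay λ). The un-normalized form state_t = decay·state_{t-1} + x_t is an assumption; some formulations scale the input by (1 − decay).

```python
import numpy as np


class FrozenFeedback:
    """Hypothetical fixed random matrix that projects output errors to hidden units."""

    def __init__(self, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.B = rng.normal(scale=1.0 / np.sqrt(n_out), size=(n_hidden, n_out))
        self.B.setflags(write=False)  # frozen: in-place writes now raise ValueError

    def __call__(self, output_error):
        # Learning signal per hidden unit: B @ (y_hat - y)
        return self.B @ output_error


class LeakyTrace:
    """Hypothetical leaky integrator: state_t = decay * state_{t-1} + x_t."""

    def __init__(self, size, decay):
        self.decay = decay
        self.state = np.zeros(size)

    def step(self, x):
        self.state = self.decay * self.state + x
        return self.state


# Presynaptic accumulator (OTTT-style lambda-trace) over a spike train.
pre = LeakyTrace(size=1, decay=0.5)
spikes = [1.0, 0.0, 0.0, 1.0]
history = [float(pre.step(np.array([s]))[0]) for s in spikes]
print(history)  # [1.0, 0.5, 0.25, 1.125]

# Output-side low-pass filter (e-prop-style kappa) feeding a random-feedback learning signal.
readout = LeakyTrace(size=2, decay=0.9)
fb = FrozenFeedback(n_hidden=3, n_out=2)
y_hat = readout.step(np.array([0.2, -0.1]))
signal = fb(y_hat - np.array([1.0, 0.0]))
print(signal.shape)  # (3,)
```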
Algorithm Comparison#
| Algorithm | Memory | Computation | Best for |
|---|---|---|---|
| D-RTRL | \(O(B \cdot \lvert\theta\rvert)\) | \(O(B \cdot I \cdot O)\) | RNNs, general-purpose |
| ES-D-RTRL | \(O(B(I + O))\) | \(O(B \cdot I \cdot O)\) | Large SNNs, memory-constrained |
| e-prop | \(O(B \cdot \lvert\theta\rvert)\) | \(O(B \cdot I \cdot O)\) | SNNs with κ-filtered / random-feedback learning signals |
| OSTL | depends on regime | depends on regime | |
| OTPE | \(O(B \cdot I \cdot O)\) (full) / \(O(B(I + O))\) (approx) | \(O(B \cdot I \cdot O)\) | Deep SNNs; F-OTPE trades rank for memory |
| OTTT | \(O(B \cdot I)\) | \(O(B \cdot I \cdot O)\) | Very large SNNs; presynaptic λ-trace only |
| OSTL with Target Projection | \(O(B \cdot \lvert\theta\rvert)\) | \(O(B \cdot I \cdot O)\) | Target projection via fixed random feedback |
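To make the Memory column concrete, the short sketch below plugs illustrative sizes into three of the formulas for a single dense I→O layer. It assumes |θ| = I·O (bias ignored) and counts stored trace entries, not bytes, ignoring the constant factors that the O(·) notation hides. The sizes and the function name are illustrative assumptions.

```python
def trace_entries(B, I, O):
    """Trace entries implied by the memory formulas for one dense I->O layer."""
    n_params = I * O                 # assumption: |theta| = I * O, no bias
    return {
        "D-RTRL": B * n_params,      # O(B * |theta|)
        "ES-D-RTRL": B * (I + O),    # O(B (I + O))
        "OTTT": B * I,               # O(B * I)
    }


counts = trace_entries(B=32, I=512, O=512)
for name, n in counts.items():
    print(f"{name:10s} {n:>10,d} entries")
print("D-RTRL / ES-D-RTRL ratio:", counts["D-RTRL"] // counts["ES-D-RTRL"])
```

With B = 32 and I = O = 512 this gives 8,388,608, 32,768, and 16,384 entries, a 256× gap between the parameter-dimension and input/output-dimension traces that widens linearly with layer width.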