Recasting do calculus for time-symmetric brains

by admin
41 minutes read

Standard causal inference assumes a privileged arrow of time: causes precede effects, and the mathematical machinery of do calculus is built around this asymmetry. Yet many candidate principles for brain function, from predictive coding to active inference, begin with a picture of a Bayesian brain that encodes hypotheses about both past and future latent causes. To recast causal calculus under time symmetry, one must separate two notions that are often conflated: the thermodynamic or statistical asymmetry of the environment and the representational structure used by a system that can reason simultaneously about retrodiction and prediction. A brain can maintain beliefs over temporal trajectories as wholes—over histories and futures—while still interacting with an environment in which energy dissipation and entropy growth define a macroscopic arrow of time.

In the classical framework, an intervention is modeled by replacing the structural equation for a variable with a fixed value and cutting all incoming arrows to that variable. This “surgery” explicitly breaks the statistical regularities that would otherwise be induced by upstream causes, and do calculus provides algebraic rules for relating interventional distributions to observational ones. Under time symmetry, however, interventions must be reframed as constraints on entire paths rather than as local, one-time modifications of upstream variables. Instead of asking what would happen “if we set X at time t,” we ask what trajectories of states are compatible with a constraint imposed at time t, given probabilistic laws that are symmetric under time reversal. This leads naturally to path-space measures, in which both boundary conditions (priors over initial states and constraints over final states) shape the probability of intermediate events.
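
The difference between surgery and mere conditioning can be made concrete on the smallest possible path model. The sketch below (all probabilities illustrative) builds a two-state, three-step Markov chain and compares conditioning on x1 = 1, a constraint whose influence flows backward to reshape beliefs about x0, with intervening do(x1 = 1), which replaces the incoming kernel and leaves the upstream prior untouched:

```python
from itertools import product

# Two-state chain x0 -> x1 -> x2 with prior p0 and transition T.
# States are 0/1; all numbers are illustrative.
p0 = {0: 0.8, 1: 0.2}
T = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # T[x][y] = P(next=y | cur=x)

def path_prob(x0, x1, x2, kernel1=T):
    """Joint probability of a path; kernel1 governs the x0 -> x1 step."""
    return p0[x0] * kernel1[x0][x1] * T[x1][x2]

def marginal_x0(weight):
    """Normalised P(x0) under a (possibly unnormalised) path weight."""
    w = {x0: sum(weight(x0, x1, x2) for x1, x2 in product((0, 1), repeat=2))
         for x0 in (0, 1)}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

# Conditioning on x1 = 1: a path constraint that propagates back to x0.
observe = marginal_x0(lambda x0, x1, x2: path_prob(x0, x1, x2) * (x1 == 1))

# Intervening do(x1 = 1): replace the incoming kernel with a point mass,
# so beliefs about the upstream state x0 are unchanged.
delta = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 1.0}}
intervene = marginal_x0(lambda x0, x1, x2: path_prob(x0, x1, x2, kernel1=delta))

print(observe)    # x0 belief shifts toward 1, the state that favours x1 = 1
print(intervene)  # x0 belief equals the prior p0
```

The observational update revises the past boundary while the surgical one does not, which is exactly the asymmetry the path-space reframing has to reproduce as a special case.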

From this perspective, the basic objects of causal reasoning are no longer solely directed acyclic graphs over instantaneous variables but probability measures over temporally extended configurations. A time-symmetric causal calculus treats interventions as changes in boundary conditions on these configurations. One can model a brain as encoding joint beliefs over past causes and future consequences, with priors defined not just at an initial time but potentially at both temporal boundaries or at salient event points. In such a framework, “conditioning on data” and “enforcing an intervention” are both realized as updates to constraints on trajectories, but they differ in whether they respect or override the endogenous dynamics. The temporal asymmetry familiar from classical interventions then appears as a special case, corresponding to constraints applied only at the past boundary and propagated forward.

Retrocausality, in this setting, does not mean that physical signals literally propagate backward in time in violation of relativistic constraints. Rather, it indicates that optimal inferences about what happened earlier can depend sensitively on information obtained later, and that a system capable of near-real-time inference over trajectories will continuously revise beliefs about both past and future as new evidence arrives. A time-symmetric calculus must therefore distinguish between ontological order—how events are embedded in physical time—and epistemic order—how evidence arrives and modifies beliefs. The rules of do calculus are expanded so that counterfactual manipulations can target not only upstream nodes but also downstream or even temporally surrounding events, provided they are treated as modifications of the trajectory measure rather than as literal backward-in-time causes.

Mathematically, one route to time-symmetric causal calculus is to augment the standard factorization of a dynamical model with both forward and backward Markov kernels. In conventional dynamical causal models, distributions factor as a product of forward transition probabilities from earlier to later times, possibly conditioned on exogenous noise variables. Under time symmetry, an equivalent backward factorization is introduced, such that the same path distribution can be generated by propagating information from the future backward. Interventions then correspond to selectively altering one or more of these kernels or to imposing additional constraints on states at particular times. The calculus specifies how such alterations change the joint path probabilities and, crucially, how those changes ripple through both forward and backward factorizations.
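
The equivalence of forward and backward factorizations is easy to verify numerically for a Markov chain: the backward kernel is just the Bayes reversal of the forward one through the time-marginals. In the sketch below (prior and kernel illustrative), both factorizations assign identical probability to every three-step path:

```python
from itertools import product

# Forward model: prior p0 and forward kernel F[x][y] = P(x_{t+1}=y | x_t=x).
p0 = {0: 0.6, 1: 0.4}
F = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}

def step(m, K):
    """Push a marginal m one step forward through kernel K."""
    return {y: sum(m[x] * K[x][y] for x in m) for y in (0, 1)}

m1 = step(p0, F)   # marginal at t = 1
m2 = step(m1, F)   # marginal at t = 2

def backward(K, m_prev, m_next):
    """Bayes-reversed kernel: B[y][x] = P(x_t = x | x_{t+1} = y)."""
    return {y: {x: K[x][y] * m_prev[x] / m_next[y] for x in m_prev}
            for y in m_next}

B10 = backward(F, p0, m1)  # generates x0 from x1
B21 = backward(F, m1, m2)  # generates x1 from x2

for x0, x1, x2 in product((0, 1), repeat=3):
    fwd = p0[x0] * F[x0][x1] * F[x1][x2]       # propagate past -> future
    bwd = m2[x2] * B21[x2][x1] * B10[x1][x0]   # propagate future -> past
    assert abs(fwd - bwd) < 1e-12
print("forward and backward factorizations agree on every path")
```

Interventions in the time-symmetric calculus then amount to editing one of these kernels (or adding a constraint factor at a slice) and tracking how the change ripples through both factorizations.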

This structure provides a natural home for a Bayesian brain that integrates prediction and postdiction within a single generative model. The brain’s internal model can be seen as a time-symmetric generative process that assigns probabilities to full sensorimotor trajectories, with priors encoding both structural regularities of the environment and organism-specific constraints. Observations arriving at an intermediate time slice induce Bayesian updates that sharpen not only predictions of upcoming sensory input but also reconstructions of latent causes in the recent past. A recast causal calculus makes explicit that such bidirectional inference is not a violation of causality but a rational response to time-symmetric generative assumptions.

In this framework, classical interventionist questions—“What if I had acted differently?” or “What will happen if I impose this constraint now?”—are reformulated as operations on trajectory measures in which an agent’s own actions occupy a special role. One can interpret actions as self-imposed interventions on particular variables in the joint path, constrained by internal policies and energetic costs. Under time symmetry, these self-interventions select among possible trajectories in a way that depends on both anticipated future outcomes and inferred past states. The recalibrated calculus must therefore allow for policies that are functions of beliefs about entire trajectories, not just current states, and must specify how modifying those policies alters counterfactual distributions over both past and future events.

A related consequence of recasting causal calculus under time symmetry is the re-interpretation of conditional independencies. In ordinary graphical models, d-separation criteria provide a simple graphical test for when one set of variables is independent of another given a conditioning set, assuming a forward-directed graph. When dynamics are represented symmetrically in time, conditional independencies may be defined over pairs of forward and backward messages or over augmented state variables that bundle past- and future-directed information. The effective “shielding off” of information about remote events may then require conditioning on both upstream and downstream variables. A suitable extension of d-separation must capture how knowledge about future observations can screen off or reactivate dependencies among past events, while still avoiding logical contradictions or cycles in the underlying ontology.
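
One standard way to test separation without ever treating edges as purely directed is the moralization criterion: restrict the graph to ancestors of the query nodes, marry co-parents, drop directions, delete the conditioning set, and check connectivity. A minimal sketch (the graph and variable names are hypothetical), using a collider to illustrate how conditioning on a later observation can reactivate a dependence between two earlier states:

```python
from collections import deque

def moral_separated(dag, xs, ys, zs):
    """Test whether xs is independent of ys given zs on the moralized
    ancestral graph. dag maps each node to the set of its parents."""
    # 1. Restrict to ancestors of the query nodes.
    keep, frontier = set(), deque(xs | ys | zs)
    while frontier:
        n = frontier.popleft()
        if n in keep:
            continue
        keep.add(n)
        frontier.extend(dag.get(n, set()))
    # 2. Moralize: undirected parent-child edges plus co-parent edges.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in dag.get(child, set()) if p in keep]
        for p in ps:
            adj[p].add(child); adj[child].add(p)
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                adj[a].add(b); adj[b].add(a)
    # 3. Delete conditioned nodes, then search for any xs -> ys path.
    frontier, seen = deque(xs - zs), set(xs - zs)
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False
        for m in adj[n] - zs - seen:
            seen.add(m); frontier.append(m)
    return True

# Collider s0 -> o <- s1: conditioning on the later observation o
# *reactivates* the dependence between the two earlier states.
dag = {"o": {"s0", "s1"}, "s0": set(), "s1": set()}
print(moral_separated(dag, {"s0"}, {"s1"}, set()))   # True: independent a priori
print(moral_separated(dag, {"s0"}, {"s1"}, {"o"}))   # False: o couples them
```

Because moralization works on an undirected object, the same machinery extends naturally to the layered forward/backward representations discussed later, where a purely directed reading would produce cycles.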

Once causal calculus is cast in time-symmetric form, it becomes possible to characterize explicitly how familiar interventionist results emerge in the macroscopic limit where thermodynamic irreversibility and environmental noise favor a stable arrow of time. Forward-only do calculus can be viewed as a coarse-grained description that arises when backward kernels become negligible or intractable, so that optimal inference can be well-approximated by forward prediction alone. At finer temporal or spatial scales—such as those relevant for fast neural processes—this approximation may break down, and a time-symmetric treatment of interventions and counterfactuals may be necessary to account for observed patterns of neural activity and behavior. This layered view retains the empirical successes of traditional causal inference while embedding it within a more general, time-symmetric formalism suited to brains that constantly revise their understanding of both where they came from and where they are going.

Neural architectures for time-symmetric inference

Time-symmetric inference demands neural architectures that do not treat “past” and “future” as qualitatively different targets of computation, but as two directions along a single latent trajectory. Instead of a feedforward cascade that sends sensory inputs up a hierarchy to yield predictions of future states, a time-symmetric architecture maintains coupled forward and backward flows of information. The forward stream carries beliefs and prediction errors from earlier to later time points; the backward stream carries constraints and postdiction signals from later to earlier ones. At any given moment, neural activity represents a compromise between these two directions: a locally consistent estimate of a temporally extended cause that explains both what has already been observed and what is anticipated.

One concrete architectural motif is a bidirectional chain in which each “time slice” of the world is encoded by a local neural population that is recurrently connected to its neighbors in both directions. These connections are not mere short-term memory; they implement a distributed representation of trajectories. Forward connections implement a generative model over how latent causes evolve: from a hypothesized state at time t, the forward connections predict the distribution of states at time t+1. Backward connections implement a complementary, approximately inverted model: given constraints at time t+1, they propagate likelihood-based messages about which states at time t are consistent with those constraints. Neural dynamics then perform an approximate smoothing operation, combining forward prediction with backward postdiction to yield time-symmetric inferences about the entire local segment of the trajectory.
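
The smoothing operation this motif approximates is exactly the forward-backward recursion for a hidden Markov model. A minimal pure-Python sketch (two hidden states, three observations, all parameters illustrative) shows the two message streams being combined at every slice:

```python
# Minimal two-state HMM smoother: forward (filtering) messages and
# backward (postdiction) messages are combined at every time slice.
p0 = [0.5, 0.5]
T  = [[0.9, 0.1], [0.1, 0.9]]   # T[i][j] = P(z_{t+1}=j | z_t=i)
E  = [[0.8, 0.2], [0.2, 0.8]]   # E[i][o] = P(obs=o | z=i)
obs = [0, 0, 1]                 # one observation per time slice

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

# Forward pass: alpha_t(i) proportional to P(z_t = i | obs up to t)
alpha = [normalize([p0[i] * E[i][obs[0]] for i in (0, 1)])]
for o in obs[1:]:
    pred = [sum(alpha[-1][i] * T[i][j] for i in (0, 1)) for j in (0, 1)]
    alpha.append(normalize([pred[j] * E[j][o] for j in (0, 1)]))

# Backward pass: beta_t(i) proportional to P(obs after t | z_t = i)
beta = [[1.0, 1.0]]
for o in reversed(obs[1:]):
    beta.insert(0, normalize([sum(T[i][j] * E[j][o] * beta[0][j]
                                  for j in (0, 1)) for i in (0, 1)]))

# Smoothed belief at each slice combines both temporal directions.
smoothed = [normalize([a[i] * b[i] for i in (0, 1)])
            for a, b in zip(alpha, beta)]
print(smoothed)
```

Note that the smoothed belief at the first slice differs from its filtered value: evidence arriving later has revised the estimate of the earlier state, which is the postdictive effect the bidirectional chain is meant to implement.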

Within this architecture, classical hierarchical organization is preserved but extended in time. Each level of the hierarchy encodes not only latent features but also their temporal continuations, with higher levels spanning longer temporal windows. At each level, forward and backward temporal messages interact with top-down and bottom-up spatial messages, so that the same neural circuitry simultaneously negotiates spatial and temporal constraints. For example, a visual hierarchy might maintain beliefs over the motion of an object’s trajectory that are informed by both the recent path and the expected future continuation, refining both as new evidence arrives. The outcome is a “four-way” inference scheme: bottom-up likelihoods, top-down priors, forward-in-time dynamics, and backward-in-time constraints all converge on shared representational units.

Predictive coding offers a natural substrate for such architectures, provided it is generalized beyond strictly forward-in-time prediction. In classical predictive coding, each cortical area maintains a representation of causes and exchanges prediction and error signals with neighboring areas. Under time symmetry, these same circuits are equipped with dual temporal roles: they encode expectations about how causes will unfold and retrodictions about how they must have unfolded to yield current and anticipated evidence. Neurons that currently signal prediction error can be understood as encoding mismatches between a local estimate of the trajectory and both its past-constrained and future-constrained continuations. Adjusting activity to minimize this joint error implements a gradient descent on a time-symmetric objective function that integrates information from both temporal directions.

Recurrent architectures such as reservoir networks and gated recurrent units can also be reinterpreted as primitive substrates for time-symmetric inference if they are trained with loss functions defined over entire sequences rather than one-step-ahead prediction alone. When a network is optimized to reconstruct past inputs and predict future ones simultaneously, its internal states are pushed toward encoding information that is sufficient for both retrodiction and prediction. If training additionally includes counterfactual queries—such as evaluating how a change at an intermediate time would alter earlier and later observations—the recurrent dynamics implicitly approximate an internal do calculus over trajectories. In such systems, gradient-based learning shapes connectivity so that a small change in activity at one time point propagates coherently both forward and backward through the network, reflecting how the model believes the world would respond to interventions distributed in time.

Another promising motif is the use of paired forward and backward “controller” circuits that operate over the same sensory and motor representations. The forward controller computes policies and predictions given current beliefs about states and goals; the backward controller infers which past states and actions would be most consistent with an observed or desired outcome. Coupling these controllers through shared latent variables allows the system to approximate full Bayesian smoothing in real time: as soon as a new outcome becomes observable, backward signals update beliefs about preceding hidden causes, which in turn update forward predictions and policies. Architecturally, this can be realized through interleaved layers of neurons specialized for forward dynamics and neurons specialized for backward inference, with plasticity rules that ensure consistency between the two.

Time-symmetric inference becomes particularly tangible in architectures that explicitly represent boundary conditions. For example, some neural populations can be tuned to encode “initial context” (what is known or assumed about the recent past), while others encode “terminal constraints” (goals, predicted or desired future states). Intermediate populations then reconcile these boundary-specific constraints with ongoing sensory evidence. In this view, actions are generated by circuits that adjust the future boundary conditions—by selecting goals or subgoals—while parallel circuits revise beliefs about the past boundary conditions in light of unexpected outcomes. The whole system forms a neural analog of path-space reasoning, where activity patterns evolve toward configurations that satisfy both boundary encodings while remaining consistent with the dynamics encoded in recurrent connections.
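
Boundary-condition reconciliation can be illustrated on a tiny pinned chain: a known initial context at t = 0, a terminal constraint at t = 2, and an intermediate belief formed by multiplying the forward message with the backward one (all numbers illustrative):

```python
# Belief at the intermediate slice of a chain pinned at both ends:
# the forward message carries the initial context, the backward message
# carries the terminal constraint, and their product is the reconciled belief.
p0   = [1.0, 0.0]                # initial context: start in state 0
T    = [[0.8, 0.2], [0.4, 0.6]]  # forward dynamics (illustrative)
goal = [0.0, 1.0]                # terminal constraint: end in state 1

fwd = [sum(p0[i] * T[i][j] for i in (0, 1)) for j in (0, 1)]   # past -> t = 1
bwd = [sum(T[j][k] * goal[k] for k in (0, 1)) for j in (0, 1)] # future -> t = 1

post = [f * b for f, b in zip(fwd, bwd)]
z = sum(post)
post = [p / z for p in post]
print(post)  # forward alone gives [0.8, 0.2]; the goal pulls belief toward 1
```

The intermediate populations in the architecture above play the role of `post`: their activity settles on states jointly compatible with both boundary encodings.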

Neural coding schemes may also adapt to support time-symmetric computation. Rather than encoding only instantaneous features (such as position or orientation), neurons may respond preferentially to trajectory fragments or “motion templates” that implicitly bind past and future. Cells sensitive to specific optic flow patterns, for instance, may represent not just where an object is now but which kind of path it is most likely following and will continue to follow. In temporal association areas, neurons that fire to particular sequences of events can be interpreted as implementing basis functions over trajectories, enabling the brain to approximate complex, non-Markovian path distributions using a finite set of recurrently activated patterns. When such trajectory cells participate in both forward predictions and backward constraints, the network effectively implements a compact, distributed representation of path-level hypotheses.

Learning in such architectures naturally aligns with a time-symmetric form of causal inference. Synaptic modifications must credit or blame activity not just based on immediate outcomes but also on how that activity contributes to the fit of an entire trajectory to observed and desired constraints. Biologically plausible approximations may use eligibility traces that persist across time, allowing delayed neuromodulatory signals—such as dopamine bursts associated with unexpected rewards or punishments—to reshape synapses that were involved in earlier, causally relevant events. Under a time-symmetric interpretation, these eligibility traces are not merely forward-looking remnants; they also serve as substrates for reassigning credit retrospectively as new outcomes arrive, enabling the network to revise its model of which internal states best explained the whole sequence.
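
A minimal instance of this credit-assignment scheme is TD(lambda) with accumulating eligibility traces on a short deterministic chain, where a reward delivered only at the final state reaches back through the traces to value the earlier states (all parameters illustrative):

```python
# TD(lambda) with accumulating eligibility traces on a deterministic
# 3-state chain. The delayed reward at the final state propagates credit
# backward to earlier states via the traces.
n_states, lr, gamma, lam = 3, 0.1, 0.9, 0.8
V = [0.0] * n_states                    # value estimates per state

for episode in range(300):
    e = [0.0] * n_states                # eligibility: decaying visit memory
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0    # reward only at the end
        v_next = V[s + 1] if s + 1 < n_states else 0.0
        delta = r + gamma * v_next - V[s]        # TD error at this step
        e[s] += 1.0                              # mark the current state
        for i in range(n_states):                # the delayed signal reaches
            V[i] += lr * delta * e[i]            # back along the trace
            e[i] *= gamma * lam
print(V)  # values ramp up toward the rewarded terminal state
```

After training, V approaches the discounted values [0.81, 0.9, 1.0], with earlier states credited entirely through traces rather than through any immediate outcome.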

Crucially, these architectures must not conflate epistemic revision with physical retrocausality. Backward-propagating signals in neural tissue do not imply that spikes travel backward in time, but that current activity encodes updated beliefs about earlier states. Implementationally, retrocausal-like computation is realized through recurrent loops and feedback pathways that carry information from downstream layers, late sensory areas, or evaluative circuits back to earlier processing stages. The resulting dynamics allow earlier representations to be overwritten or refined in light of later evidence, so that what the network “thinks” happened at time t evolves as more data becomes available. This continual revision is what time-symmetric brain architectures are specifically designed to support.

From an algorithmic standpoint, these neural designs approximate smoothing and counterfactual reasoning in state-space models. The forward connections approximate filtering: estimating the current state from past observations. The backward connections approximate smoothing: refining estimates of past states using future observations. When policies are included and treated as latent variables subject to hypothetical modification, the same circuitry can approximate counterfactual queries about what would have happened under different actions. By embedding both directions of inference in a single recurrent substrate, the brain can implement a flexible, online approximation to a time-symmetric form of do calculus without ever constructing explicit graphical models in symbolic form.

Throughout, priors and prediction remain central organizing principles. Priors are no longer restricted to initial states but can be placed over whole paths, goals, and temporal symmetries. Predictions are not just next-step forecasts but constraints on how entire trajectories should behave given those priors and the organism’s value structure. Neural architectures for time-symmetric inference thus blend predictive machinery with retrodictive and counterfactual capabilities, enabling a Bayesian brain to exploit information from all temporal directions to guide perception, memory, and action.

Bidirectional causality in perception and action

Perception and action are often described as occupying opposite ends of a causal chain: sensory inputs drive internal states, which in turn drive motor outputs. Under this view, perception “reads” the world while action “writes” to it, with a clean separation between upstream causes and downstream consequences. A time-symmetric view disrupts this picture by treating perception and action as two coupled ways of constraining a single sensorimotor trajectory. The Bayesian brain does not first perceive and then act; it jointly infers a trajectory of hidden causes and selects those actions that make the overall trajectory most probable under its generative model and preferences. Bidirectional causality, in this sense, is not a metaphysical claim about forces flowing backward in time, but a structural property of how beliefs about past, present, and future are co-determined by perception and action.

In perception, bidirectionality shows up as a continuous interplay between bottom-up evidence and top-down constraints that include expectations about future sensations. A purely feedforward system would treat incoming signals as fixed data and compute their most likely causes based only on past context. In a time-symmetric brain, current perceptual estimates are also anchored to anticipated future inputs and the consequences of planned actions. For example, when tracking a moving object that temporarily disappears behind an occluder, the visual system maintains a percept of a coherent trajectory even in the absence of visual evidence. This percept depends on priors about object persistence and motion, but it is also modulated by predictions of where and when the object will reappear and how the organism intends to interact with it. The percept at each moment becomes the fixed point of a negotiation between constraints coming from both earlier and later times.

Action reveals the other side of this coupling. Traditional control theory models actions as outputs computed from current state estimates that then alter future states. In a time-symmetric setting, actions are selected so that entire predicted sensorimotor paths satisfy both past-consistency and future-desirability constraints. When an agent decides to catch a ball, it does not merely respond to the ball’s current location but implicitly chooses a trajectory of joint positions and forces that, if realized, would make both past visual evidence and a future successful catch jointly probable. Actions thus play the role of self-imposed interventions on the trajectory: they change which paths are accessible, but they are chosen under a generative model that evaluates how those changes would propagate both forward into future outcomes and backward into revised interpretations of prior events, such as how fast the ball was actually thrown.

This leads to a refinement of how causal inference is implemented in embodied agents. Instead of asking exclusively how external causes produce sensory effects, the system must also consider how its own potential actions change the mapping from hidden causes to observations. In classical do calculus, an intervention on a variable severs incoming causal links and fixes its value, allowing us to compare hypothetical worlds with different actions. For a time-symmetric brain, implementing such interventions requires reasoning over path space: altering the distribution over action sequences and evaluating how these modifications would restructure both the expected future and the inferred past. The same neural circuits that perform perceptual smoothing over trajectories must therefore also support counterfactual evaluations of alternative policies, assigning causal credit not just to what was done, but also to what could have been done differently.

Bidirectionality in perception is particularly evident when later evidence reshapes earlier percepts. Postdictive phenomena, such as the flash-lag or color phi illusions, show that what a subject reports as having seen at an earlier moment can depend on stimuli presented tens or hundreds of milliseconds later. Under a time-symmetric causal model, these effects arise because the percept is an estimate of a short trajectory segment, not an instantaneous snapshot. As new data arrive, the brain revises its best-fitting trajectory, which retroactively alters the inferred causes of earlier samples. From the subject’s point of view, the “cause” of their earlier percept includes both light that has already hit the retina and expectations about how the scene should unfold next. The notion of cause here is epistemic and path-based: the percept is caused by the best global explanation of the local temporal neighborhood, not by a single past frame.

Action exhibits a corresponding retroactive revision of causality. After executing a movement and observing its outcome, the agent updates beliefs about its own motor commands, bodily state, and environmental dynamics leading up to the outcome. Suppose a person reaches for a cup, knocks it over, and notices that it was more slippery than expected. The experience of “having misjudged the force” is not just a judgment about the present error; it involves reassigning causal weight to earlier microstates: grip strength, friction coefficients, and joint trajectories. Through this lens, sense of agency and responsibility emerge from a smoothing process over internal actions and external consequences. The brain infers which parts of the trajectory were most causally responsible for the outcome, potentially downgrading or upgrading contributions from earlier motor signals in light of later evidence.

In both perception and action, prediction errors propagate bidirectionally. A discrepancy between expected and observed sensory feedback does not simply drive future corrections; it also modifies beliefs about past states and actions that led to the mismatch. If an unexpected tactile sensation occurs during a reach, it may be explained either by an unmodeled external contact or by a mis-specified limb position earlier in the movement. A time-symmetric error-correction process weighs these alternatives, adjusting both the forward estimates (where the limb is going) and the backward estimates (where it must have been) to minimize a global trajectory-level discrepancy. This stands in contrast to strictly feedforward error correction, where errors can only update downstream variables and never revise upstream ones.

The coupling of perception and action under time symmetry is especially clear in active perception. To disambiguate uncertain sensory input, organisms move: they saccade to new parts of a scene, palpate objects, or shift vantage points. Each such move is chosen not merely to maximize future information but to reshape the entire trajectory’s explanatory power. When deciding where to look next, an agent evaluates how alternative saccades would alter future observations and how those altered observations would, in turn, restructure the interpretation of what was just seen. A saccade is thus selected because, under the agent’s model, it yields a trajectory in which both past and future evidence can be jointly explained with low uncertainty. The “cause” of the decision can be traced to a joint compatibility constraint across time, not only to a local mismatch at the present moment.
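
Saccade selection by uncertainty reduction can be sketched as expected-posterior-entropy minimization. Everything in this example, the scene variable, the fixation targets, and the likelihoods, is hypothetical; the point is only the shape of the computation:

```python
import math

# Active perception sketch: pick the saccade whose expected observation
# most reduces uncertainty about a latent scene variable.
prior = {"cat": 0.5, "dog": 0.5}
# P(obs = "match" | scene, fixation); "ear" is diagnostic, "wall" is not.
likelihood = {("cat", "ear"): 0.9, ("dog", "ear"): 0.2,
              ("cat", "wall"): 0.5, ("dog", "wall"): 0.5}

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def expected_posterior_entropy(saccade):
    """Average posterior entropy over the two possible observations."""
    total = 0.0
    for obs_match in (True, False):
        joint = {s: prior[s] * (likelihood[(s, saccade)] if obs_match
                                else 1 - likelihood[(s, saccade)])
                 for s in prior}
        z = sum(joint.values())
        total += z * entropy({s: v / z for s, v in joint.items()})
    return total

best = min(("ear", "wall"), key=expected_posterior_entropy)
print(best)  # the diagnostic fixation wins
```

A fuller trajectory-level version would score each saccade by how well the resulting observation sequence explains past samples as well, but the selection principle is the same.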

This perspective also recasts the link between motor commands and sensory reafference. Efference copies of outgoing motor signals are classically seen as predictors of expected sensory feedback, enabling cancellation of self-generated sensations. In a time-symmetric framework, these signals do more: they define a provisional future boundary condition for the sensorimotor trajectory, encoding a hypothesis about what the body will be doing and sensing. Sensory reafference then tests not only whether the future matches that hypothesis, but also whether the inferred past leading up to the action remains coherent. When a movement unfolds differently than expected, the system may respond either by revising its belief about current dynamics or by reinterpreting prior states (for instance, that the body was already off balance) so that the full trajectory, including the surprise, remains explainable.

Bidirectionality extends to valuation and decision-making. Rewards and punishments, when encountered, initiate backward-propagating inferences that reshape beliefs about which preceding actions and states were causally responsible. Eligibility traces provide a substrate for this: they maintain a decaying memory of past neural activity that might be credited or blamed once the outcome is known. Under a time-symmetric view, these traces implement a coarse approximation to smoothing over action trajectories, allowing later evaluative signals to reach back and modulate synapses that contributed to earlier decisions. As the system acquires more experience, it incrementally constructs a path-dependent causal model in which both perception and action are tuned so that desirable outcomes are embedded in trajectories judged to be internally consistent and externally reliable.

Importantly, such bidirectional causality must remain consistent with the macroscopic arrow of time. Signals still propagate forward along axons; muscles contract after neurons fire, not before. What becomes bidirectional is the internal causal bookkeeping that the Bayesian brain performs over its own history of observations and actions. Retrocausality, in this operational sense, is the capacity of later evidence and outcomes to reshape attributions of cause and effect over earlier perception–action cycles. The neural machinery that gives rise to this capacity is built from recurrent and feedback connections, as well as learning rules that adjust synapses based on temporally extended contingencies. The effective result is a system in which perception and action are mutually constraining components of a single, time-symmetric causal narrative about the organism’s interaction with its environment.

Integrating this view with formal tools like do calculus highlights how counterfactuals in perception and action are deeply intertwined. To ask, “What would I have perceived if I had moved my eyes differently?” or “Would the cup still have fallen if I had grasped it more gently?” is to manipulate the joint distribution over sensorimotor trajectories, not just isolated nodes in a causal graph. Under a time-symmetric calculus, such queries require altering both forward dynamics (how different actions affect future outcomes) and backward inferences (how those alternative outcomes would alter beliefs about past sensory sampling and internal states). The answers to these questions are encoded not in isolated percepts or motor commands but in the full family of trajectories that the brain judges compatible with its model of the world and its own role within it.
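
The cup question maps onto Pearl’s abduction–action–prediction recipe for counterfactuals, applied over the path: infer the latent past from the factual episode, swap the action factor, and re-propagate forward. A minimal sketch with hypothetical states and numbers:

```python
# Counterfactual by abduction / action / prediction on a tiny path model.
# Latent past: grip state; action: reach speed; outcome: did the cup slip?
p0 = {"weak": 0.5, "firm": 0.5}                 # prior over latent grip state
P_slip = {("weak", "fast"): 0.9, ("weak", "slow"): 0.4,
          ("firm", "fast"): 0.3, ("firm", "slow"): 0.05}

# Factual episode: reached fast, cup slipped. Abduction: update the past.
posterior = {g: p0[g] * P_slip[(g, "fast")] for g in p0}
z = sum(posterior.values())
posterior = {g: v / z for g, v in posterior.items()}

# Action: replace the policy factor with the alternative action.
# Prediction: would the cup still have slipped under a slow reach?
p_slip_cf = sum(posterior[g] * P_slip[(g, "slow")] for g in posterior)

print(posterior)  # slipping shifted belief toward a weak grip
print(p_slip_cf)  # counterfactual slip probability under "slow"
```

Both directions are visibly at work: the observed outcome revises the inferred past (the backward inference), and the alternative action re-propagates that revised past into a different future (the forward dynamics).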

Graphical models for time-symmetric brain dynamics

Graphical models designed for time-symmetric brain dynamics must represent not only instantaneous variables but entire trajectories, together with the constraints that arise from both past and future evidence. Instead of a single directed acyclic graph unrolling forward in time, the relevant object is a path graph whose nodes encode states at discrete time points and whose edges are equipped with both forward and backward transition kernels. Each time slice is linked to its neighbors by pairs of factors: one encoding the conditional probability of moving forward given current state and actions, the other encoding the conditional probability of having arisen from previous states given current configurations and observations. The resulting structure is no longer a simple DAG but a factor graph over paths, where directed edges coexist with symmetric constraints that embody a time-reversal-invariant generative law.

Within this framework, the Bayesian brain can be modeled as maintaining a joint distribution over latent trajectories, actions, and observations that factorizes through such a bidirectional graph. A common construction is to introduce latent state variables for each time step, observation variables anchored to those states, and policy variables that modulate state transitions. Forward factors express how states and observations are expected to evolve under a given policy; backward factors encode how the same states would be inferred when future observations or desired outcomes are treated as soft boundary conditions. This dual factorization allows the brain to perform filtering, smoothing, and planning within a unified graphical model: forward messages propagate priors and prediction, while backward messages carry postdictive and goal-related constraints that reshape inferences about both earlier and later states.

To handle time symmetry explicitly, graphical models may augment each state node with two companion variables: a forward-directed representation and a backward-directed representation. The forward representation summarizes information from past evidence and earlier actions; the backward representation summarizes constraints from future evidence and anticipated goals. Local compatibility factors enforce consistency between these two directions at each time slice, so that the effective belief about a state is obtained by combining both messages. This split representation makes explicit how a single neural population might implement simultaneous prediction and retrodiction: it encodes the intersection of what is plausible given the past and what is required given the future, as expressed in the graphical model’s local factors.

In such models, edges cannot be interpreted merely as one-way causal arrows in the traditional sense. Instead, they represent conditional dependencies under a time-symmetric generative law, while causal direction emerges from how interventions are defined on the graph. An intervention is modeled as a modification of specific factors that correspond to self-controlled variables, such as actions or certain internal states, without altering the underlying physical couplings that link states across time. Under a path-based extension of do calculus, intervening on a segment of the trajectory means replacing the relevant policy or state factors with fixed or parametrically controlled alternatives and recomputing the joint distribution over the remaining variables, both upstream and downstream in time. The graphical structure ensures that such changes propagate coherently: altered factors modify forward messages toward the future and backward messages toward the past, reshaping the entire trajectory-level belief.
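
The contrast between conditioning (which respects the endogenous dynamics) and intervention-as-factor-surgery (which overrides them) can be made concrete on a three-step, two-state Markov chain. This is a hedged toy sketch with invented numbers: conditioning on a final state revises beliefs about the initial state, while replacing the factor into that state with a point mass leaves upstream beliefs untouched.

```python
# Sketch: conditioning vs. factor surgery on a three-step Markov chain.
# Conditioning on x2 = 1 keeps the transition factor into x2 and so revises
# beliefs about x0; "surgery" replaces that factor with a point mass,
# overriding the endogenous dynamics. Numbers are illustrative.

prior = [0.8, 0.2]
trans = [[0.9, 0.1],
         [0.3, 0.7]]

def path_prob(x0, x1, x2, surgery=False, forced_x2=None):
    p = prior[x0] * trans[x0][x1]
    if surgery:
        # Surgery: the factor into x2 is replaced by a point mass.
        p *= 1.0 if x2 == forced_x2 else 0.0
    else:
        p *= trans[x1][x2]
    return p

def posterior_x0(value_x2, surgery):
    # Marginalize x1, fix x2, renormalize over x0.
    w = [sum(path_prob(x0, x1, value_x2, surgery, value_x2) for x1 in (0, 1))
         for x0 in (0, 1)]
    s = sum(w)
    return [x / s for x in w]

conditioned = posterior_x0(1, surgery=False)  # p(x0 | x2 = 1)
intervened = posterior_x0(1, surgery=True)    # p(x0 | do(x2 = 1))
```

Under surgery the posterior over x0 reduces to the prior, whereas conditioning shifts probability toward initial states that make the observed ending likely — the asymmetry the paragraph above attributes to respecting versus overriding the dynamics.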

Classical d-separation is not directly applicable once both forward and backward kernels are present, because cycles appear if edges are treated as purely directed. A more appropriate separation criterion operates on an undirected moralized version of the path graph or on an augmented representation in which each temporal direction is a separate layer. In the layered representation, forward edges connect state variables across increasing time indices, backward edges connect across decreasing indices, and cross-layer factors enforce consistency at each time slice. Conditional independence then depends on whether all paths in this augmented graph are blocked by the conditioning set, taking into account that blocking may require including both a node and its temporal dual. For example, future observations may screen off earlier states from even later outcomes when both the forward and backward representations of the intermediate state are conditioned upon, capturing how new evidence can neutralize previously inferred dependencies in smoothing.
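
A minimal sketch of the undirected separation test mentioned above: moralize a small graph (connect co-parents, drop edge directions), then ask whether every path between two nodes passes through the conditioning set. The tiny graph and node names are illustrative assumptions, not the full layered construction described in the text.

```python
# Sketch: moralization plus an undirected blocking test on a small path
# graph. The example DAG (a chain x0 -> x1 -> x2 with observation y1 of x1)
# is illustrative.

from collections import deque

def moralize(parents):
    """parents: dict node -> list of parent nodes (a DAG)."""
    adj = {n: set() for n in parents}
    for child, ps in parents.items():
        for p in ps:                       # keep each edge, undirected
            adj[child].add(p)
            adj[p].add(child)
        for a in ps:                       # "marry" co-parents
            for b in ps:
                if a != b:
                    adj[a].add(b)
    return adj

def separated(adj, a, b, given):
    """True if every undirected path from a to b hits the set `given`."""
    seen, queue = {a}, deque([a])
    while queue:
        n = queue.popleft()
        for m in adj[n]:
            if m == b:
                return False
            if m not in seen and m not in given:
                seen.add(m)
                queue.append(m)
    return True

dag = {"x0": [], "x1": ["x0"], "x2": ["x1"], "y1": ["x1"]}
g = moralize(dag)
```

Conditioning on the intermediate state x1 blocks the only path between x0 and x2, while conditioning only on the observation y1 does not — a small instance of the "blocking may require the right set of nodes" point made above.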

Graphical models for time-symmetric brain dynamics also benefit from incorporating explicit boundary nodes that encode information about initial context and terminal goals. Initial-boundary nodes carry priors over early states and policies; terminal-boundary nodes encode desired or expected outcomes at later times. Factors connect these boundary nodes to the trajectory’s endpoints, thereby shaping the path distribution even before intermediate evidence is considered. When new data arrive mid-trajectory, belief propagation updates not only internal states but also the effective influence of boundary conditions: some goal states may become unreachable given observed constraints, while certain initial-context hypotheses become implausible. The model thus represents behavior as the result of reconciling two sets of anchors—what was likely at the start and what is desired or observed at the end—subject to dynamical laws captured in the interior of the path graph.

Incorporating motor control into these models typically involves introducing action nodes that influence state transitions but are themselves governed by policy factors. Under time symmetry, actions are treated as random variables that can be observed, latent, or counterfactually manipulated. Policy factors connect action nodes to relevant state and goal nodes, specifying how probable each action sequence is under a given control strategy. When running inference, the model can solve simultaneously for the most plausible hidden states given observed actions and for the most plausible actions given observed states and rewards. This dual role is what enables the same graphical structure to support both retrospective causal inference about why an outcome occurred and prospective planning about which actions will best realize future boundary conditions.
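
The dual role of action nodes can be illustrated in a planning-as-inference style sketch: a soft desirability factor over the final state reweights action sequences, and the posterior over the first action emerges from summing over induced trajectories. The three-state chain world, slip probability, and goal weights below are all invented for illustration.

```python
# Sketch: inferring actions from a soft terminal boundary in a tiny chain
# world. Action 0 moves left, action 1 moves right, each with probability
# 0.8 (otherwise the state stays put). All numbers are illustrative.

from itertools import product

def next_dist(s, a):
    tgt = max(0, s - 1) if a == 0 else min(2, s + 1)
    d = [0.0, 0.0, 0.0]
    d[tgt] += 0.8
    d[s] += 0.2
    return d

goal = [0.05, 0.05, 0.9]   # soft desirability over the final state
start = 0

def action_posterior():
    # Weight each two-step action sequence by the chance of a desirable end,
    # under a uniform policy prior (0.25 per sequence).
    w = {0: 0.0, 1: 0.0}
    for a0, a1 in product((0, 1), repeat=2):
        for s1, p1 in enumerate(next_dist(start, a0)):
            for s2, p2 in enumerate(next_dist(s1, a1)):
                w[a0] += 0.25 * p1 * p2 * goal[s2]
    z = w[0] + w[1]
    return {a: w[a] / z for a in w}

post = action_posterior()
```

The posterior strongly favors moving right first, because only rightward sequences make the preferred terminal state reachable — the goal factor acts exactly like the soft future boundary condition described above.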

To support online computation, the path graph is often instantiated in a rolling-window form. Rather than representing arbitrarily long trajectories explicitly, the model maintains a sliding segment of time within which both forward and backward messages are updated as new observations arrive and old ones fade in relevance. Factors near the edges of the window absorb the influence of the unmodeled past and future through summary nodes that act as effective boundary conditions. For a brain, these summary nodes correspond to slowly changing contextual beliefs, such as stable environmental statistics or enduring goals. The rolling-window representation allows time-symmetric reasoning to remain computationally tractable: messages are updated within a bounded subgraph, while the broader temporal context is compressed into a small set of boundary variables.
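
The rolling-window idea can be sketched as a fixed-lag scheme: smoothing runs only inside a short window, and everything older is compressed into a summary prior over the window's first state. The two-state model below reuses invented numbers and is a simplification of the text's proposal (a single summary distribution stands in for the richer boundary nodes).

```python
# Sketch: fixed-lag smoothing with a sliding window. Observations that fall
# out of the window are absorbed into a summary prior, so inference stays
# bounded. Model numbers are illustrative.

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]

def smooth_window(summary_prior, obs):
    """Forward-backward over the window, seeded by the summary prior."""
    T = len(obs)
    alpha = [normalize([summary_prior[s] * emit[s][obs[0]] for s in (0, 1)])]
    for t in range(1, T):
        alpha.append(normalize([
            sum(alpha[-1][u] * trans[u][s] for u in (0, 1)) * emit[s][obs[t]]
            for s in (0, 1)]))
    beta = [[1.0, 1.0] for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalize([
            sum(trans[s][u] * emit[u][obs[t + 1]] * beta[t + 1][u] for u in (0, 1))
            for s in (0, 1)])
    return [normalize([alpha[t][s] * beta[t][s] for s in (0, 1)]) for t in range(T)]

def slide(prior, stream, width=3):
    window, summary, beliefs = [], list(prior), []
    for y in stream:
        window.append(y)
        if len(window) > width:
            # Absorb the leaving observation into the summary prior:
            # filter it in, then predict one step forward.
            y0 = window.pop(0)
            post0 = normalize([summary[s] * emit[s][y0] for s in (0, 1)])
            summary = [sum(post0[u] * trans[u][s] for u in (0, 1)) for s in (0, 1)]
        beliefs.append(smooth_window(summary, window)[-1])
    return beliefs

beliefs = slide([0.5, 0.5], [0, 0, 1, 1, 1, 0])
```

Each reported belief is the current-state estimate given the window's observations plus the compressed past; within the window, earlier states are still refined bidirectionally, exactly as the paragraph describes.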

A key innovation in time-symmetric graphical models is to treat many observed quantities, including those traditionally considered ā€œcausesā€ or ā€œeffects,ā€ as soft constraints rather than hard assignments. Instead of clamping an observation node to a fixed value and ignoring measurement noise, one introduces likelihood factors that express how probable each observation is under different state trajectories. Similarly, goal states are implemented as soft preferences via utility or desirability factors that bias the distribution toward paths realizing certain outcomes but do not completely eliminate alternatives. This soft-constraint formulation reflects the brain’s uncertainty about both sensory input and value structure, and it permits gradual trade-offs between competing explanations and competing goals as messages flow through the graph in both temporal directions.
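
The difference between clamping an observation and folding it in as a soft likelihood factor is easy to show on a single binary state. The prior and sensor reliability below are invented: a hard clamp takes a noisy reading at face value, while the soft update lets a strong prior temper it.

```python
# Sketch: hard clamping vs. a soft likelihood factor for one binary state.
# The sensor reports the true state with probability 0.75; all numbers are
# illustrative.

prior = [0.9, 0.1]            # strong prior that the state is 0
likelihood = [[0.75, 0.25],   # p(reading | state = 0)
              [0.25, 0.75]]   # p(reading | state = 1)

def hard_clamp(reading):
    # Treat the reading as the state itself, ignoring sensor noise.
    return [1.0 if s == reading else 0.0 for s in (0, 1)]

def soft_update(reading):
    w = [prior[s] * likelihood[s][reading] for s in (0, 1)]
    z = sum(w)
    return [x / z for x in w]

hard = hard_clamp(1)
soft = soft_update(1)   # the prior still tempers the noisy reading
```

The soft posterior moves toward the reading without capitulating to it, which is the gradual trade-off between evidence and prior expectation that the paragraph attributes to soft constraints.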

When formalizing retrocausality at the level of representation, the graphical model does not introduce literal backward-in-time edges in the physical sense; instead, it incorporates factors in which later observations influence the posterior over earlier states by means of backward messages. These backward messages can be interpreted as encoding the likelihood of hypothetical past states under the condition that certain future data are realized. They are computed algorithmically via belief propagation or variational inference, traversing the graph opposite to the forward generative direction. In neurobiological terms, this corresponds to feedback connections and recurrent loops carrying postdictive signals that revise earlier representations. The graph thus captures how information from the future light cone of an event can legitimately influence the brain’s belief about that event without implying any violation of physical causality.

Variational formulations of time-symmetric graphical models allow complex trajectories to be approximated with tractable families of distributions. One common strategy is to posit a recognition model with its own graphical structure—often a simpler chain or tree—that approximates the true posterior over paths in the full model. Parameters of this recognition graph are adjusted to minimize a time-symmetric free-energy functional that includes contributions from both forward and backward prediction errors. Under this lens, cortical networks implement the recognition graph, while hypothetical latent dynamical systems correspond to the generative graph. Learning then aligns the two, so that the recognition graph can rapidly infer both past and future latent variables given partial observations, effectively internalizing a time-symmetric do calculus over trajectories.

Incorporating discrete structural uncertainty, such as alternative hypotheses about causal connectivity or environmental regimes, requires a higher-level graphical layer. This layer introduces model-selection variables that choose among different transition structures, observation mappings, or policy classes. Each choice induces a different subgraph for the trajectory dynamics, with its own pattern of forward and backward factors. As evidence accumulates, belief propagation updates both the continuous trajectory variables and the discrete structure variables, allowing the system to revise not only its estimates of what happened and what will happen but also its assumptions about how states are linked causally across time. For the bayesian brain, this corresponds to learning new ā€œrules of the gameā€ while simultaneously interpreting ongoing experience, captured explicitly in the hierarchy of graphs.
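
A minimal sketch of a model-selection variable choosing between two candidate transition structures as evidence accumulates. The two kernels ("sticky" versus "switching") and the observed state sequence are illustrative assumptions.

```python
# Sketch: posterior over two candidate transition structures given a fully
# observed state sequence. Kernels and data are illustrative.

import math

models = {
    "sticky":    [[0.9, 0.1], [0.1, 0.9]],
    "switching": [[0.2, 0.8], [0.8, 0.2]],
}

def log_lik(seq, trans):
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))

def structure_posterior(seq, prior=0.5):
    logs = {m: math.log(prior) + log_lik(seq, t) for m, t in models.items()}
    mx = max(logs.values())            # subtract max for numerical stability
    w = {m: math.exp(v - mx) for m, v in logs.items()}
    z = sum(w.values())
    return {m: v / z for m, v in w.items()}

post = structure_posterior([0, 0, 0, 1, 1, 1])
```

A sequence with long runs and a single switch is far more probable under the sticky kernel, so the discrete structure variable concentrates there — the same evidence that updates trajectory beliefs also revises the assumed "rules of the game."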

Temporal abstraction and hierarchical control find a natural representation in multi-scale time-symmetric graphical models. Higher-level nodes summarize coarse-grained trajectories over longer timescales, while lower-level nodes capture fine-grained dynamics. Bidirectional messages flow both within each scale and between scales: higher-level backward messages convey long-horizon goals and constraints, while lower-level backward messages encode more immediate postdictions. Forward messages at higher levels represent slow contextual drifts and strategic intentions, and at lower levels they represent rapid sensorimotor predictions. This multi-scale arrangement enables the brain to reconcile slow-changing beliefs about the world and self with fast-changing evidence, all within a unified graphical apparatus that remains formally symmetric in time.

These graphical models make it explicit how priors and prediction shape brain dynamics at the path level rather than at isolated time points. Priors are encoded as factors over entire segments or motifs of trajectories, reflecting expectations about smoothness, periodicity, or stereotyped action patterns. Prediction corresponds to the forward propagation of these priors through the graph, generating anticipatory structure in beliefs about upcoming states and observations. When future evidence arrives that contradicts these predictions, backward messages update not only immediate predecessors but the full chain of states and actions that led to the mismatch, redistributing probability mass over alternative trajectories. The resulting dynamics embody a time-symmetric form of causal inference, in which the graphical model continuously renegotiates the most plausible and most desirable sensorimotor paths consistent with both past data and future constraints.

Implications for learning, prediction, and control

Learning in a time-symmetric setting is most naturally framed at the level of trajectories rather than isolated states. Instead of tuning parameters so that a model predicts the next sensory sample as accurately as possible from the past, the system adjusts its internal dynamics so that entire sensorimotor paths become probable under the joint influence of past evidence, future outcomes, and control objectives. This trajectory-centric view reshapes classical notions of credit assignment, exploration, and consolidation. Synaptic updates must encode how small changes in internal parameters would alter the probability of whole paths, given that beliefs about those paths are themselves the result of smoothing that blends information from both earlier and later times.

Under such a scheme, the objective function is a time-symmetric free energy or pathwise log-likelihood that includes both forward and backward components. The forward component measures how well the model’s dynamics and observation mappings explain data when propagated from earlier states toward later ones. The backward component measures how well the same parameters support coherent retrodictions from later evidence back to earlier states. Learning seeks parameters for which these two components agree, i.e., where forward and backward inferences over paths converge to similar beliefs. This alignment means that the system can rely on either direction of inference, or their combination, when updating its understanding of the environment and itself.
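
One concrete reading of "forward and backward inferences agree" comes from Markov chain reversal: given a forward kernel and its stationary distribution, the implied backward kernel is P_back(j → i) = π(i) P(i → j) / π(j), and for a reversible (detailed-balance) chain the two kernels coincide. The sketch below, with an invented two-state kernel that happens to be reversible, checks this agreement numerically.

```python
# Sketch: computing the backward (time-reversed) kernel implied by a forward
# kernel and its stationary distribution. The example kernel is illustrative
# and satisfies detailed balance, so forward and backward kernels agree.

def stationary(trans, iters=500):
    # Power iteration from the uniform distribution.
    pi = [1.0 / len(trans)] * len(trans)
    for _ in range(iters):
        pi = [sum(pi[i] * trans[i][j] for i in range(len(trans)))
              for j in range(len(trans))]
    return pi

def backward_kernel(trans, pi):
    n = len(trans)
    # back[j][i] = p(previous state = i | current state = j)
    return [[pi[i] * trans[i][j] / pi[j] for i in range(n)] for j in range(n)]

trans = [[0.9, 0.1],
         [0.2, 0.8]]
pi = stationary(trans)
back = backward_kernel(trans, pi)
```

When the two kernels match, retrodiction and prediction are supported by the same parameters, which is the alignment the learning objective in this paragraph is said to seek.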

From the perspective of causal inference, this learning process amounts to discovering a set of structural equations and transition kernels that remain stable under a wide class of interventions extended over time. Classical do calculus concerns how changes to one variable’s structural equation propagate through a directed acyclic graph. In a time-symmetric brain, analogous questions are asked about path distributions: how does changing the policy over actions, or altering the assumed reliability of a sensory channel during a particular interval, modify beliefs about both earlier and later variables? Parameter updates are evaluated not only by their effect on predictive accuracy but also by how robustly they preserve the coherence of these interventional predictions across time, ensuring that counterfactual trajectories remain mathematically well behaved.

This has direct implications for how an organism learns the consequences of its own actions. When an action is taken, subsequent observations inform not only the mapping from that action to future states, but also the inferred state from which the action was launched. If the observed outcome diverges from what was expected, the brain must decide whether to revise its belief about the environment’s dynamics, about its own body configuration at the time of the action, or about the reliability of the sensory feedback itself. In a time-symmetric calculus, learning rules are designed so that these alternatives are adjudicated at the trajectory level: parameters are updated so as to minimize global inconsistency across paths, rather than merely local prediction error at single time steps.

Eligibility traces and temporally extended synaptic tags play a crucial role in approximating this pathwise learning. Traditional reinforcement learning interprets eligibility traces as forward-looking devices that keep a memory of past activity until reward information is available. In a time-symmetric interpretation, these traces encode a bidirectional commitment: they mark synapses that might be responsible for both past-consistent and future-consistent explanations of upcoming outcomes. When a reward or punishment is finally observed, neuromodulatory signals can modify synapses in proportion to how strongly they were implicated in the smoothed trajectory that best explains the full episode. Learning thus becomes a process of continuously refining which micro-level neural events are treated as causally efficacious for entire sequences, not just for immediate successors.
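
The bidirectional reading of eligibility traces can be sketched with a toy update rule: traces accumulate and decay forward in time, and when the outcome arrives each parameter's change is scaled both by its trace (forward credit) and by a post-hoc smoothed responsibility (backward credit). The activities, responsibilities, and learning rate below are all invented for illustration.

```python
# Sketch: eligibility traces modulated by post-episode responsibility.
# trace carries forward credit; responsibility carries backward credit
# assigned after smoothing over the whole episode. Numbers illustrative.

def run_episode(activity, responsibility, reward, decay=0.8, lr=0.1):
    """activity[t][i]: activity of unit i at time t;
    responsibility[i]: post-hoc (smoothed) share of credit for unit i."""
    n = len(activity[0])
    trace = [0.0] * n
    for acts in activity:
        trace = [decay * t + a for t, a in zip(trace, acts)]
    # Bidirectional modulation: forward trace x backward responsibility.
    return [lr * reward * trace[i] * responsibility[i] for i in range(n)]

# Unit 0 was active early, unit 1 late; smoothing blames unit 0 more.
dw = run_episode(activity=[[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
                 responsibility=[0.9, 0.1],
                 reward=1.0)
```

Even though unit 0's trace has decayed more by outcome time, its larger smoothed responsibility dominates, so it receives the larger update — credit assignment at the level of the whole episode rather than the immediate predecessor.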

Priors and prediction take on a richer role when specified over paths. Instead of priors being defined solely over initial states or static parameters, they can encode expectations about regularities across time: smoothness of motion, rhythmic patterns, stereotyped reaching trajectories, or typical conversational exchanges. Prediction then means drawing probable paths from these priors and projecting them forward as provisional hypotheses about upcoming experience and behavior. When actual observations deviate from these pathwise predictions, the resulting errors are not assigned exclusively to the latest time step but are back-propagated across the entire segment, adjusting both the priors over trajectory motifs and the inferred states at each time point. Over developmental timescales, this process allows the bayesian brain to internalize the temporal structure of its environment and to deploy that structure for both fast perception and flexible control.

Time-symmetric learning has important consequences for how the brain handles ambiguity and noise. Because inferences about any moment draw on information from both sides in time, learning algorithms tend to favor parameters that distribute uncertainty across trajectories rather than localizing it at arbitrary points. For example, if a sequence contains a brief occlusion of a moving object, a purely forward learner might attribute high uncertainty to the occlusion period alone, with sharp confidence before and after. A time-symmetric learner, by contrast, will adjust beliefs around both the onset and offset of the occlusion, potentially smoothing the entire trajectory of inferred positions and velocities. This leads to internal models that are more tolerant of missing data and that generalize better to novel configurations of partial information.
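
The occlusion example can be made concrete with a scalar random-walk model (x_t = x_{t-1} + process noise, y_t = x_t + sensor noise) observed with a gap in the middle. The forward filter freezes its estimate during the gap, while the smoother interpolates across it with lower uncertainty. The noise variances and observation sequence are illustrative assumptions.

```python
# Sketch: filtering vs. smoothing across an occlusion, scalar random walk.
# Observations are missing (None) mid-sequence; variances are illustrative.

Q, R = 0.1, 0.5   # process and observation noise variances

def kalman_filter(obs, m0=0.0, v0=1.0):
    means, vars_ = [], []
    m, v = m0, v0
    for y in obs:
        m_pred, v_pred = m, v + Q          # predict (random walk)
        if y is None:                      # occlusion: no update
            m, v = m_pred, v_pred
        else:
            k = v_pred / (v_pred + R)      # Kalman gain
            m = m_pred + k * (y - m_pred)
            v = (1 - k) * v_pred
        means.append(m)
        vars_.append(v)
    return means, vars_

def rts_smoother(means, vars_):
    # Backward (Rauch-Tung-Striebel) pass for the random-walk model.
    sm, sv = means[:], vars_[:]
    for t in range(len(means) - 2, -1, -1):
        v_pred = vars_[t] + Q
        g = vars_[t] / v_pred              # smoother gain
        sm[t] = means[t] + g * (sm[t + 1] - means[t])
        sv[t] = vars_[t] + g * g * (sv[t + 1] - v_pred)
    return sm, sv

obs = [0.0, 0.1, None, None, None, 1.0, 1.1]
fm, fv = kalman_filter(obs)
sm, sv = rts_smoother(fm, fv)
```

During the gap the smoothed means rise monotonically from the last pre-occlusion value toward the first post-occlusion one, and the smoothed variance is lower than the filtered variance — the redistribution of uncertainty across the trajectory described above.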

Prediction in such a system becomes inherently counterfactual. When projecting forward, the model does not merely generate the most probable continuation of the current path; it implicitly considers how anticipated outcomes will reshape its eventual retrodictions. That is, a predicted future is evaluated partly by asking: if this were to occur, would it make my overall trajectory, including what I have already observed, more or less coherent? This constraint can bias prediction toward futures that ā€œexplainā€ the past in an efficient way, akin to a principle of narrative compression. For instance, in social interactions, an agent may predict that another’s ambiguous gesture will turn out to be friendly rather than hostile if that interpretation yields a simpler, more self-consistent story about the interaction history. Such predictive biases emerge naturally in a time-symmetric framework where past and future are optimized jointly.

Control policies, likewise, must be evaluated in terms of their impact on entire paths. Classical optimal control typically optimizes a cost function over future states while treating the past as fixed. In time-symmetric control, candidate policies are scored according to how they reshape both expected futures and inferred pasts. A policy that successfully achieves a goal but requires implausible reinterpretations of prior sensory evidence—for example, assuming that several recent observations were grossly misleading—may be ranked lower than an alternative policy that achieves slightly less reward but preserves a more coherent explanation of the entire episode. This introduces an epistemic dimension to action selection: agents prefer actions that maintain or enhance the internal consistency of their world model, not just those that maximize external payoff.

Formally, such policies can be derived from a path integral over controlled trajectories, weighted by both utility and epistemic terms. Utility terms capture the desirability of particular outcomes, as in standard reinforcement learning or optimal control. Epistemic terms penalize trajectories that would require large updates to the model’s parameters or that would be highly surprising under entrenched priors. Time-symmetric do calculus then provides a way to evaluate counterfactual policies: one can compute how altering the policy on a finite segment of time would change the distribution over entire paths and, consequently, the expected combination of reward and epistemic cost. The brain’s approximate implementation may rely on sampling, message passing, or gradient-based methods over internal representations of candidate trajectories.
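
A small enumeration makes the utility-plus-epistemic weighting concrete: each candidate action sequence is scored by a path-weighted sum of exp(utility − λ·surprise), where surprise is the negative log-probability of the path under the agent's prior over uncontrolled drift. The chain world, the prior kernel, and λ are all invented for illustration.

```python
# Sketch: scoring short policies by a weighted sum over induced paths,
# trading off terminal utility against epistemic cost (surprise under a
# prior dynamics model). All numbers are illustrative.

import math
from itertools import product

STATES = (0, 1, 2)

def next_dist(s, a):
    # Action 0 moves left, action 1 moves right, with probability 0.8.
    tgt = max(0, s - 1) if a == 0 else min(2, s + 1)
    d = [0.0, 0.0, 0.0]
    d[tgt] += 0.8
    d[s] += 0.2
    return d

utility = [0.0, 0.0, 1.0]       # reward for ending at state 2
prior_next = [[0.6, 0.3, 0.1],  # agent's prior over uncontrolled drift
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]]

def score(plan, start=0, lam=0.2):
    total = 0.0
    for s1 in STATES:
        p1 = next_dist(start, plan[0])[s1]
        for s2 in STATES:
            p2 = next_dist(s1, plan[1])[s2]
            if p1 * p2 == 0.0:
                continue
            # Epistemic cost: surprise of this path under the prior.
            surprise = -math.log(prior_next[start][s1] * prior_next[s1][s2])
            total += p1 * p2 * math.exp(utility[s2] - lam * surprise)
    return total

best = max(product((0, 1), repeat=2), key=score)
```

With these numbers the utility term still dominates and the agent heads for the goal, but raising λ penalizes plans whose trajectories the prior finds implausible — the epistemic dimension of action selection discussed above.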

This coupling between learning and control is central to active learning in a time-symmetric brain. When choosing exploratory actions, the agent does not just ask which behaviors will yield informative future observations; it also considers how those observations will disambiguate its current uncertainties about the past. A saccade might be selected because it will help determine whether an earlier fleeting stimulus was a face or a random texture. A probing movement of the hand might be chosen to clarify whether an unexpected resistance felt moments ago was due to a hidden object or to a transient change in muscle stiffness. Exploration thus targets both prospective and retrospective uncertainty, optimizing the expected reduction in ambiguity over entire segments of experience.

Memory consolidation and offline replay offer additional venues where time-symmetric learning manifests. During rest or sleep, neural systems can simulate alternative trajectories that are consistent with stored episodes but differ in details of actions, contexts, or outcomes. Some replays may run forward, others backward, and many may involve complex recombinations. These internally generated paths allow the brain to refine its estimates of causal structure without immediate sensory input, effectively performing counterfactual time-symmetric inference. By adjusting parameters so that both forward and backward reconstructions of these simulated episodes become more accurate and less costly under the generative model, the system improves its capacity to support rapid online smoothing and control when it returns to interaction with the environment.

Time-symmetric frameworks also reshape how errors and surprises are interpreted. A large mismatch between expected and observed outcomes need not be treated as a simple local anomaly; it may be taken as a signal that some portion of the inferred trajectory—possibly well before the surprise—requires revision. Learning rules that operate on whole paths will therefore sometimes respond to a late error by substantially changing early state estimates, reassigning causal responsibility for the eventual outcome. Over repeated experiences, this mechanism can produce qualitatively different adaptation patterns than forward-only approaches. For example, an agent might learn to avoid unstable configurations earlier in a movement sequence, even if instability is only ever directly sensed near the end, because smoothing consistently implicates those early configurations as the root causes of frequent failures.

At the algorithmic level, many of these ideas can be grounded in existing techniques from smoothing and inverse optimal control, but extended to allow for retrocausality in the epistemic sense. Variational methods can be used to jointly infer both the latent trajectories and the control policies that best explain observed behavior, in a way that is symmetric with respect to time reversal of the data. Learning then fine-tunes the generative and recognition models so that, regardless of where the temporal origin is placed, the inferred causes and chosen actions remain consistent with the same underlying structure. This invariance to direction of analysis is a hallmark of a fully time-symmetric learning system and provides a guiding principle for designing artificial agents that emulate the flexibility of biological cognition.

In prediction and control tasks with strict real-time constraints, full smoothing over arbitrarily long trajectories is infeasible. A practical compromise is to operate with sliding temporal windows and hierarchical timescales. Within a short window surrounding the present, the system performs near-symmetric inference, allowing recent past and near future to interact richly. At longer lags, it approximates the influence of remote events through slowly changing summary variables or higher-level latent states. Learning rules are adapted accordingly: parameters governing fast dynamics are updated based on local windows, while parameters governing slower, more abstract regularities are updated using statistics aggregated across many windows. This multi-scale arrangement lets the brain reap many of the benefits of time-symmetric learning without incurring prohibitive computational costs.

All of these considerations feed back into how one designs algorithms and architectures for artificial agents inspired by the bayesian brain. Instead of training models solely on next-step prediction or forward control, one can optimize them for bidirectional coherence: the same parameters must support accurate reconstruction of past states from future observations and accurate forecasting of future states from past ones, under a consistent causal model. Interventions used during training—such as clamping actions, altering reward schedules, or introducing structured perturbations to sensory input—are then evaluated not only by their effect on future performance but also by how they shape the agent’s retrospective explanations. Agents so trained are better equipped to answer counterfactual questions about their own behavior, to revise their sense of agency in light of new information, and to maintain stable world models even in environments where evidence arrives in temporally fragmented or delayed forms.
