Causal loops and the Bayesian mind

Within many contemporary models of mind, cognition is cast as a system of interacting components that continually send signals to one another, forming dense networks of feedback. Rather than processing information in a strictly feedforward pipeline, sensory, motor, and higher-order areas participate in causal feedback cycles that refine activity over time. From this perspective, a mental state does not simply cause the next one and then vanish; instead, it remains partially active as a constraint on subsequent processing, creating what are effectively causal loops within the architecture of thought. These loops are not paradoxical in the physical sense, but they do mean that the causal story of any single cognitive event must be traced through a web of reciprocal influence instead of a linear chain.

Hierarchical models of the Bayesian brain highlight this organization clearly. Lower levels encode fast-changing sensory features, while higher levels capture more abstract, slowly changing regularities such as object identity, social norms, or narrative context. Information flows upward from sensory input and downward from learned expectations, with each level simultaneously shaping and being shaped by its neighbors. When an expectation at a higher level sends a top-down signal that alters the interpretation of incoming data at a lower level, and that revised interpretation in turn updates the higher-level expectation, the system instantiates a feedback loop of mutual causal influence. Such loops are essential to stabilizing coherent percepts and beliefs in the face of noisy or ambiguous input.

In these architectures, feedback is often implemented through recurrent connections, attractor dynamics, and iterative message passing. Recurrent networks allow the current state of a unit or population to depend not only on the latest input, but on its own recent history and the evolving state of other units. Attractor dynamics ensure that the system settles into relatively stable patterns of activity that can represent hypotheses or decisions. Message-passing schemes, often inspired by Bayesian inference on graphical models, enable separate modules to pass "beliefs" and "errors" back and forth until they reach a compromise that satisfies the constraints embedded in the network. This iterative settling process, unfolding over tens to hundreds of milliseconds, is where causal feedback does its work.
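
A rough sketch of that settling process, with invented constants and a made-up `settle` function rather than any model from the literature: two reciprocally connected units relax toward a compromise between their external inputs, so each unit’s final state reflects both its own evidence and the other’s feedback.

```python
def settle(input_a, input_b, coupling=0.5, rate=0.2, tol=1e-6, max_steps=1000):
    """Two reciprocally connected units relax toward a joint compromise.

    Each unit is pulled toward its own external input and toward the other
    unit's current state, so the settled values reflect both bottom-up
    evidence and recurrent feedback.  All constants are illustrative.
    """
    a, b = input_a, input_b
    for step in range(max_steps):
        new_a = a + rate * ((input_a - a) + coupling * (b - a))
        new_b = b + rate * ((input_b - b) + coupling * (a - b))
        done = abs(new_a - a) + abs(new_b - b) < tol
        a, b = new_a, new_b
        if done:
            break
    return a, b, step

# Conflicting inputs: the settled states meet in between (near 0.75 and 0.25),
# with the compromise controlled by the coupling strength.
print(settle(input_a=1.0, input_b=0.0))
```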

Neuroscience provides multiple strands of evidence for such looped architectures. Anatomically, cortical and thalamic circuits feature rich bidirectional connectivity, with almost every feedforward projection accompanied by a corresponding feedback projection. Functionally, oscillatory activity in different frequency bands appears to support distinct directions of information flow, with certain bands more strongly associated with bottom-up signaling and others with top-down modulation. Perturbation studies using stimulation or lesions reveal that disrupting feedback pathways can selectively impair tasks that depend on context, prior knowledge, or long-range integration, even when basic sensory detection remains intact. These findings suggest that causal feedback is not an optional embellishment, but a core mechanism enabling flexible cognition.

Perception offers a concrete illustration of how causal loops operate in practice. When you encounter a noisy or partially occluded image, initial feedforward processing may support multiple competing interpretations. Top-down feedback from higher visual and associative areas injects prior expectations based on memory, learned regularities, and the immediate task. As these expectations reshuffle activity in early visual areas, they amplify features consistent with one hypothesis and suppress alternatives. The resulting pattern of sensory activity then feeds back upward, reinforcing some hypotheses and weakening others. Through several rapid cycles, the system converges on a stable percept that feels instantaneous, even though it is the product of recurrent, looped inference.

Language comprehension similarly depends on feedback between levels encoding sounds, words, syntax, and discourse context. Early auditory representations are ambiguous with respect to phonemes and word boundaries, but higher linguistic levels impose strong constraints based on grammar, semantics, and current conversational goals. Signals carrying these constraints feed back to shape lower-level processing, effectively "cleaning up" noisy input and disambiguating competing parses. Meanwhile, evolving interpretations at the lower levels continually adjust the high-level representation of the discourse. This bidirectional interplay allows the system to recover from initial misparses, integrate new information, and maintain coherent understanding across time.

Decision-making and action selection extend the same logic into the domain of behavior. Motor plans are not simply outputs appended to the end of a cognitive chain; instead, they are part of a feedback architecture in which prospective actions and their predicted consequences influence ongoing perception and valuation. When planning a movement, internal forward models generate predictions about the sensory outcomes of different options. These predictions feed back into sensory and evaluative systems, biasing attention toward certain features of the environment and reshaping the perceived desirability of actions. As new information arrives, the evolving sensory state updates the internal models, which revise their predictions and their influence on choice. The resulting decision is thus the emergent product of causal loops connecting perception, valuation, and motor planning.

At a computational level, such architectures can be seen as implementing iterative constraint satisfaction, where each component encodes partial information about the world and about the organism’s goals. Causal feedback allows these components to negotiate a joint solution that respects as many constraints as possible. This stands in contrast to classical modular views in which cognition is divided into largely independent boxes that pass outputs downstream in a one-way flow. In box-and-arrow diagrams that attempt to capture real cognitive systems, arrows frequently point both ways, and the boxes themselves may represent distributed populations rather than discrete, encapsulated modules. Causal loops, far from being an exotic add-on, are simply what it looks like when a system must reconcile many interacting constraints in real time.

From the standpoint of computational efficiency and robustness, feedback confers multiple advantages. It enables graceful degradation in the face of damage or noise because higher levels can compensate for missing details, and lower levels can recalibrate when higher-level expectations turn out to be unreliable. It supports rapid adaptation, since changes in one part of the system can quickly propagate and trigger compensatory adjustments elsewhere. It also allows for context-sensitive processing, where the same input can be interpreted differently depending on current goals, emotional state, or task demands. All of these benefits arise from the fact that, in a feedback architecture, causal influence is distributed and continuous rather than localized and one-shot.

However, causal feedback can also introduce vulnerabilities. Looped architectures are susceptible to runaway amplification when mutually reinforcing signals push the system toward extreme beliefs or perceptions. In the absence of adequate error-correcting influences, top-down expectations can overpower sensory evidence, contributing to phenomena such as hallucinations, delusions, or rigid stereotypes. Similarly, feedback between evaluative, memory, and attention systems can stabilize maladaptive patterns, as seen in certain anxiety and mood disorders where negative expectations and selective recall feed into one another. Understanding these pathologies requires tracing the full geometry of causal loops rather than attributing dysfunction to a single isolated module.

In artificial systems, engineers exploit causal feedback to build agents that can operate in complex, uncertain environments. Recurrent neural networks, predictive processing architectures, and model-based reinforcement learning agents all rely on internal cycles of information flow. Feedback between perception and internal world models allows these agents to maintain beliefs about hidden aspects of the environment and to update those beliefs as new data arrive. Feedback from value estimates to attention mechanisms helps allocate computational resources to the most informative or rewarding parts of the input. In advanced architectures, self-monitoring components also feed back signals about uncertainty, confidence, or performance, enabling meta-level regulation of learning and control.

These design choices bring artificial architectures closer to the loop-rich organization suggested by neuroscience. Yet they also raise conceptual questions about how to analyze and explain systems whose behavior emerges from intertwined causal pathways. Traditional causal diagrams tend to assume acyclic structures, where causes precede effects in a simple temporal order. In looped architectures, by contrast, cause and effect are distributed around recurrent circuits, with each state both influencing and being influenced by others across multiple time steps. Making sense of such systems requires tools that can handle feedback without collapsing into paradox, and it invites a rethinking of what it means to ascribe a single, clean causal explanation to any particular cognitive outcome.

Bayesian inference as looped belief updating

Bayesian inference can be reinterpreted as the internal dynamics of a system that repeatedly revises its own state in light of discrepancies between what it expects and what it encounters. Rather than a one-off calculation that turns data and priors into posterior beliefs, it unfolds as an ongoing process in which tentative beliefs are fed back into the very channels that deliver new evidence. In this sense, the Bayesian brain proposal is not just a claim about representational format, but about how causal loops within neural circuitry implement belief updating as a temporally extended activity.

In the classic textbook formulation, Bayesian inference is expressed as a static equation: the posterior probability of a hypothesis is proportional to the product of its prior probability and the likelihood of the observed data under that hypothesis. Yet real cognitive systems do not wait until all the data are in, compute a closed-form posterior, and then move on. Instead, each small piece of incoming information perturbs a current belief state, which then modifies how future data are weighted and interpreted. The "prior" is not an immutable parameter fixed outside time; it is the system’s current best guess, itself the product of past updates, now looping back to shape the next round of evidence gathering.
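
A minimal sketch of this loop in Python, using made-up hypotheses and likelihoods rather than anything specified above: each observation is folded into the current belief, and that posterior immediately serves as the prior for the next observation.

```python
def update(prior, likelihoods, observation):
    """One Bayesian step: reweight each hypothesis by how well it predicted the data."""
    posterior = {h: prior[h] * likelihoods[h][observation] for h in prior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Two hypotheses about a coin, with per-hypothesis likelihoods of each outcome.
likelihoods = {"fair": {"H": 0.5, "T": 0.5}, "biased": {"H": 0.8, "T": 0.2}}
belief = {"fair": 0.9, "biased": 0.1}  # today's prior

# Each observation perturbs the current belief, and that belief becomes the
# prior for the next observation: the loop, rather than a single equation.
for obs in ["H", "H", "T", "H", "H"]:
    belief = update(belief, likelihoods, obs)
    print(obs, {h: round(p, 3) for h, p in belief.items()})
```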

This can be made concrete by considering a simple internal cycle: expectation, comparison, correction. At any given moment, the system carries an internal model of the world—a structured set of propositions about hidden causes, regularities, and likely events. From this model, it generates prediction signals about what sensory inputs or internal states should look like next. When actual input arrives, it is compared against these predicted states, yielding a prediction error that quantifies the mismatch. That error is then used to update the internal model, altering the very expectations that will guide the next predictions. The loop closes when the revised model again projects expectations forward in time.

From this perspective, priors are not merely passive background assumptions; they are active forces in a causal cycle. Top-down expectations constructed from priors modulate neural activity in early sensory areas before data arrive, tuning sensitivity to certain patterns and dulling responsiveness to others. This modulation shapes the effective likelihoods by altering the way evidence is encoded and propagated. Once encoded, the same evidence is evaluated relative to the prior, generating an error signal. That error signal in turn reshapes the priors, so that tomorrow’s expectations embody the cumulative outcome of yesterday’s surprises. Causal loops emerge because the system’s current beliefs determine how it samples, encodes, and evaluates the very information that will later be used to revise those beliefs.

In hierarchical predictive processing schemes, this loop structure is distributed across multiple levels. Each level of the hierarchy maintains a representation of causes at a particular temporal or spatial scale and sends predictions downward to the level below. The lower level compares these predictions to its own representation, computes local prediction errors, and sends those errors back upward. Importantly, these exchanges occur continuously and in parallel: while a higher level is being revised by the errors it just received, it is already issuing new predictions that alter the lower level’s next comparison. Bayesian inference is thus realized not as a single global update but as a set of nested loops of prediction and error passing, with convergence emerging over time rather than by instantaneous calculation.
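
The sketch below shows one highly simplified way to picture that nested exchange, assuming a single hidden cause, quadratic errors, and hand-picked precisions; it is not a faithful implementation of any published predictive-coding model. Predictions flow down from the higher-level estimate `mu`, errors flow up from the lower-level representation `x`, and both are nudged a little on every pass.

```python
def predictive_coding(sensory, prior_mu=0.0, pi_sens=1.0, pi_mid=4.0,
                      pi_prior=4.0, rate=0.05, steps=500):
    """Minimal two-level prediction/error loop (a sketch, not a published model).

    The higher level holds an estimate `mu` of a hidden cause, the lower level
    holds a representation `x` of the input.  Predictions flow down, errors
    flow up, and both levels are nudged on every pass.
    """
    mu = prior_mu
    x = sensory
    for _ in range(steps):
        err_sens = sensory - x       # mismatch between input and lower-level state
        err_mid = x - mu             # mismatch between the two levels
        err_prior = mu - prior_mu    # mismatch against the standing prior
        x += rate * (pi_sens * err_sens - pi_mid * err_mid)
        mu += rate * (pi_mid * err_mid - pi_prior * err_prior)
    return x, mu

# Settles at roughly (0.67, 0.33): with these precisions, the interpretation of
# an input of 2.0 is pulled well toward the prior expectation of 0.0.
print(predictive_coding(sensory=2.0, prior_mu=0.0))
```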

Neuroscience evidence for this looped view appears in the temporal dynamics of cortical activity. When a sudden stimulus is presented, neurons in primary sensory areas respond quickly, reflecting relatively raw encoding of physical features. Within tens of milliseconds, however, feedback from association and frontal areas begins to reshape this activity pattern, enhancing responses consistent with current expectations and dampening others. Subsequent waves of activity show refined, category-specific or context-specific responses that were not present in the initial feedforward sweep. On a Bayesian reading, this temporal unfolding reflects successive cycles of belief updating: an initial likelihood-driven response, followed by prior-influenced reweighting, followed by further corrections as the system iterates toward a posterior state.

The same looped updating appears in perceptual decision tasks. When participants judge ambiguous stimuli, neural recordings and behavior reveal that their choices are not determined by a single pass through the sensory system. Instead, information accumulates over time, with early noise and biases exerting disproportionate influence if they are reinforced by subsequent internal dynamics. Expectations about what is likely—formed from prior trials, instructions, or contextual cues—bias initial interpretations, which then feed back to shape further sampling of the same stimulus. The decision is effectively the point at which these cycles have driven activity into a basin of attraction corresponding to one hypothesis, a basin sculpted jointly by priors and current evidence.
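
A toy accumulator makes this concrete; the `decide` function, its parameters, and the hypothesis labels are illustrative assumptions rather than values from any experiment. The prior shifts the starting point, noisy evidence drifts the state, and the choice is whichever bound the trajectory reaches first.

```python
import random

def decide(prior_bias=0.3, drift=0.1, noise_sd=0.1, bound=3.0, max_t=10_000, seed=0):
    """Accumulate noisy evidence toward one of two bounds.

    The starting point encodes prior expectations and the drift encodes the
    average evidence in the stimulus; early noise and the prior offset both
    shape which basin the trajectory falls into.  Parameters are invented.
    """
    rng = random.Random(seed)
    v = prior_bias                    # the prior shifts the starting point
    for t in range(max_t):
        v += drift + rng.gauss(0.0, noise_sd)
        if v >= bound:
            return "hypothesis A", t
        if v <= -bound:
            return "hypothesis B", t
    return "undecided", max_t

print(decide())
```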

This iterative nature of updating also clarifies why Bayesian inference in minds often appears approximate and path-dependent. Because each loop uses the system’s current belief state as the starting point for incorporating new information, the order in which evidence arrives and is processed can leave lasting traces. Early experiences can tilt priors strongly enough that later, contradictory evidence is discounted or reinterpreted to fit existing expectations. From a formal standpoint, the Bayesian prescription is still being followed locally—the system adjusts probabilities in a way consistent with Bayes’ rule, given its current priors and likelihoods. But because those priors and likelihoods are themselves products of earlier loops, the global trajectory of belief change can be highly sensitive to initial conditions.

In social cognition, the same pattern plays out across interpersonal causal loops. An agent’s prior beliefs about another person’s intentions generate predictions about their likely behavior. Observed behavior is then filtered through this lens: ambiguous actions are more readily encoded as confirming evidence than disconfirming evidence. The resulting belief update may numerically move in a Bayesian-consistent direction, but because the raw data have already been transformed by biased attention and interpretation, the update can entrench stereotypes or mistrust. Here, belief updating is looped not only within a single mind, but between minds, as each person’s expectations shape the other’s behavior, which then seems to validate the original expectations.

In artificial systems, Bayesian inference as looped belief updating is often approximated by iterative algorithms operating on probabilistic graphical models. Methods like loopy belief propagation explicitly embrace cycles in the underlying graph, passing messages around closed paths until a stable configuration of marginal probabilities is reached. Each node’s current belief about a variable both influences and is influenced by the incoming messages, mirroring the mutual adjustment seen in biological networks. While these algorithms do not always converge to the exact posterior, they demonstrate that approximate Bayesian reasoning can be implemented as a dynamical process evolving over discrete time steps, rather than as a single static computation.
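
As a small, self-contained sketch of the idea, with invented potentials and not drawn from any particular library, the code below runs sum-product message passing around a three-node cycle of binary variables; because the graph is loopy, each node’s outgoing messages eventually depend on messages it indirectly produced.

```python
import numpy as np

# A tiny pairwise model over three binary variables arranged in a cycle, so
# every message eventually feeds back into the node that indirectly produced
# it.  The potentials are invented for illustration.
unary = {i: np.array([1.0, 1.0]) for i in range(3)}
unary[0] = np.array([0.8, 0.2])                 # weak evidence that node 0 is "off"
agree = np.array([[2.0, 1.0], [1.0, 2.0]])      # neighbouring nodes prefer to agree
edges = [(0, 1), (1, 2), (2, 0)]
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# messages[(i, j)] is the message node i currently sends to node j.
messages = {pair: np.ones(2) for pair in edges + [(j, i) for (i, j) in edges]}

for sweep in range(50):                          # iterate until (hopefully) stable
    new_messages = {}
    for (i, j) in messages:
        incoming = np.ones(2)
        for k in neighbours[i]:
            if k != j:
                incoming = incoming * messages[(k, i)]
        msg = agree.T @ (unary[i] * incoming)    # sum over the states of node i
        new_messages[(i, j)] = msg / msg.sum()   # normalise for numerical stability
    messages = new_messages

# Approximate marginals: node 0's evidence for "off" leaks around the loop.
for i in range(3):
    belief = unary[i] * np.prod([messages[(k, i)] for k in neighbours[i]], axis=0)
    print(i, belief / belief.sum())
```

On a loopy graph the resulting marginals are only approximate, which matches the caveat above about convergence to the exact posterior.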

Recurrent neural networks trained with objectives derived from probabilistic principles offer another example. When such networks are tasked with predicting sequences, each new input is processed against a hidden state that encodes the network’s current beliefs about latent structure. The prediction errors generated at each step guide weight changes that reshape how future inputs will be interpreted. Over training, the hidden state comes to embody something like priors over likely continuations, and each prediction-error-driven update corresponds to a small Bayesian adjustment. At runtime, even when the weights are fixed, the circulation of activity through recurrent connections induces a continuous loop of prediction and correction that maps naturally onto belief updating in time.

Crucially, thinking of Bayesian inference as looped belief updating helps dissolve worries that causal loops in cognitive architectures must entail paradox or incoherence. The apparent circularity—beliefs influencing evidence, which influences beliefs—is broken once the temporal dimension is taken seriously. At each moment, the system has a determinate state that causally shapes how new inputs are processed; subsequent states are then generated according to clear update rules. Over many iterations, this yields trajectories in belief space that may spiral, oscillate, or settle into attractors, but never require that a future belief literally cause its own past. The sense in which beliefs seem to "reach back" and alter the meaning of earlier experiences is captured not by physical retrocausality, but by the way new priors reframe the stored data that are retrieved and reinterpreted in ongoing updates.

Framed this way, Bayesian inference is less like solving an equation on a blackboard and more like steering a ship through changing waters. Each new observation, filtered through existing expectations, slightly adjusts the vessel’s heading. The adjustments themselves change what parts of the environment are subsequently sampled, what signals are noticed, and how they are coded. Causal loops are not anomalies; they are the medium through which the system integrates the scattered, noisy data of its history into a continually evolving map of the world.

Temporal consistency and paradox in mental models

When cognition unfolds over time, mental models must do more than fit the world at isolated instants; they must maintain coherence across changing circumstances and ongoing experience. Temporal consistency in this sense is not merely logical consistency, but the requirement that beliefs at one moment dovetail with those held a moment later, given the intervening observations and actions. In a Bayesian brain, this amounts to demanding that successive posterior beliefs be reachable from one another by a plausible sequence of updates, where each update respects the available evidence and the system’s own dynamics. Temporal consistency is thus a kind of narrative coherence: the story the mind tells itself about the world cannot jump arbitrarily; it must evolve along trajectories that are locally intelligible according to its priors and prediction mechanisms.

At the same time, mental models often represent processes that themselves stretch over time, such as intentions, obligations, causal chains, or social relationships. These representations can loop back to include the agent’s own future states as causes of current actions: "I will behave this way later, therefore I must prepare now." Such self-referential forecasting easily gives rise to apparent paradoxes. A person might reason that a future belief will compel them to act in a certain way, and then attempt to preempt or manipulate that very belief. The resulting mental model contains what look like causal loops: future states are treated as if they reach backward to change present decisions. Yet from a physical standpoint, the only thing moving forward is a stream of representations and counterfactual simulations inside the agent, each one influencing the next in ordinary temporal order.

To understand how paradox is avoided, it is helpful to distinguish between two layers of time in cognition. On the first layer is physical time, in which neural events, overt behaviors, and sensory inputs occur. On the second layer is modeled time, the temporal structure internal to the agent’s beliefs: past episodes, anticipated futures, hypothetical branches, conditional plans. When an agent considers a scenario in which their future self acts in a certain way and then revises a current plan to accommodate that imagined action, the flow of physical time remains linear. The only loop is within modeled time, where representations of later events are used as premises in present reasoning. Mental paradox arises when we confuse these layers, taking a loop in modeled time as if it implied retrocausality in physical time.

Many everyday deliberations exhibit this layered structure. Consider a student deciding whether to start studying early for an exam. They imagine their future self, tired and unmotivated the night before the test, regretting procrastination. That imagined regret is then used as a reason to begin studying now. In the mental model, a later emotional state appears to explain an earlier decision, a circularity that might be mistaken for a causal loop. But the actual causal sequence is straightforward: a present simulation of future regret shapes current valuation, which in turn shapes current behavior, which then alters what the future experience will actually be. The apparent backward influence of the future is just the forward influence of a representation labeled "future."

Temporal consistency requires that such self-referential simulations be internally stable. If the student’s imagined future continually shifts in response to imagined present choices, and those present choices in turn depend on how the future is imagined, the agent risks entering a loop of indecision. Each prospective revision of the plan generates a new simulation of the future, which feeds back into the plan, potentially without convergence. In formal decision theory, this appears as dynamic inconsistency: preferences that change over time in ways that the agent cannot endorse from a global vantage point. The mind’s causal loops then fail not because they violate physics, but because they do not settle into a coherent pattern of beliefs and intentions that can be sustained across successive updates.

A rigorous Bayesian analysis reveals the subtlety of temporal consistency. Ideally, an agent’s beliefs about future beliefs obey the reflection principle: their current credences about an event, conditional on what they expect their future credence to be, should align with that expected future credence. If an agent expects that tomorrow they will assign 0.8 probability to an outcome given all the evidence they will then have, they should already assign 0.8 today, unless they expect to be less informed or irrational tomorrow. Violations of this principle introduce a form of diachronic incoherence: the agent foresees belief changes that they cannot justify as the result of learning. In dynamic Bayesian models, temporal consistency is achieved when the transition from today’s priors to tomorrow’s priors can be traced to predictable flows of information, rather than unexplained shifts or wishful revisions.
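
The constraint can be checked numerically in a toy setting with an assumed prior and likelihood (the numbers below are chosen only for illustration): the expectation of tomorrow’s posterior, averaged over the observations the agent currently considers possible, equals today’s credence, so any foreseeable shift would have to be attributed to something other than learning.

```python
# Today's prior over a hypothesis and the assumed likelihood of two possible
# observations (all numbers are illustrative).
prior = {"h": 0.3, "not_h": 0.7}
likelihood = {"h": {"d1": 0.9, "d2": 0.1}, "not_h": {"d1": 0.2, "d2": 0.8}}

# Predictive probability of each possible observation under today's beliefs.
p_data = {d: sum(prior[h] * likelihood[h][d] for h in prior) for d in ["d1", "d2"]}

# Tomorrow's credence in "h" after each possible observation.
posterior_h = {d: prior["h"] * likelihood["h"][d] / p_data[d] for d in p_data}

# Expected future credence equals today's credence (up to floating-point noise).
expected_future = sum(p_data[d] * posterior_h[d] for d in p_data)
print(expected_future, prior["h"])
```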

Yet human cognition frequently departs from these idealized norms. People anticipate that they will "change their mind" about diets, savings, or relationships in ways they already regard as mistakes, and they sometimes take elaborate steps to bind their future selves. From the perspective of mental modeling, these self-binding strategies are attempts to impose consistency on a system whose internal dynamics would otherwise lead to cyclical or self-defeating trajectories. Temporal paradoxes appear whenever the projected path of one’s own beliefs or preferences cannot be reconciled with one’s current evaluative standards. The paradox is not that the future literally influences the past, but that the agent cannot find a single stable stance from which to endorse the evolution of their own mental states across time.

Neuroscience adds another dimension by revealing how different brain systems operate on distinct temporal horizons. Fast sensory circuits track moment-to-moment fluctuations; intermediate systems in the hippocampus and association cortex integrate episodes over hours or days; slow-learning networks in frontal and striatal regions encode habits and long-term values. These layers jointly shape a person’s mental model of temporal structure: what counts as "soon," what qualifies as "long term," how stable one’s identity is taken to be. Temporal inconsistencies can emerge when these layers are misaligned—for example, when slow systems encode a stable intention (such as remaining sober), while faster reward circuits respond strongly to immediate cues, generating impulses that the long-term model cannot fully accommodate. The resulting conflict can be experienced as a paradoxical split between "what I really want" and "what I actually do," even though at the neural level it is simply interacting dynamics across timescales.

Similar tensions arise in memory. A person may revise their understanding of a past event in light of new information, effectively updating the temporal model of their own history. They may come to believe that an earlier decision had a different motivation than they originally thought, or that a relationship ended for reasons that only later became apparent. When the updated narrative conflicts sharply with what was once believed, the mind faces a choice: either treat the earlier belief as an error now corrected, or reinterpret the earlier state as having always contained the seeds of the new understanding. The latter move can create a sense that the past has been "rewritten," as if current insight had retroactively altered earlier experience. But again, all that has changed is the present model; the physical past remains fixed, and the causal loop exists only within the representational apparatus that compares stored traces with current priors.

Self-reference in belief can heighten these effects. When an agent represents not only the world, but also their own representational capacities, new kinds of loops become possible: beliefs about what one will believe, doubts about the reliability of one’s own inferences, meta-beliefs about one’s past biases. These meta-level representations can either stabilize or destabilize temporal consistency. On one hand, a belief like "I tend to overreact to first impressions" can help correct for path-dependent biases, leading to more coherent trajectories of belief change. On the other hand, recursive doubt—doubting the doubt, then doubting that doubt, and so on—can erode any fixed point of confidence, leaving the system oscillating among incompatible stances. What looks from the inside like a paradox of self-knowledge is, from a dynamical perspective, a failure to find an attractor in the space of meta-beliefs.

Philosophical thought experiments about time travel and foreknowledge provide extreme test cases for these mechanisms. Consider scenarios in which an agent seems to have reliable information about their own future choices. If they take that information at face value, they risk being trapped in a self-confirming loop: acting as predicted because they believe the prediction, which is accurate because they so act. If they rebel against the prediction, they risk creating a seeming contradiction: the prediction said one thing, but the outcome is another. Mental models that try to represent both possibilities simultaneously can become unstable. In practice, people often resolve the tension by downgrading the reliability of the prediction, treating it as one more piece of fallible evidence rather than as a fixed fact about the future. Once the prediction is folded into the usual Bayesian machinery, the causal loops become ordinary: expectations shape behavior, behavior shapes feedback, and no logical paradoxes arise.

Even without science-fiction premises, everyday expectations about the future can create self-fulfilling or self-defeating patterns that feel paradoxical. Anxious anticipation of failure can impair performance, thereby confirming negative beliefs; overconfident prediction of success can lead to under-preparation and subsequent disappointment. These are genuine causal loops in cognitive and social space, but they respect temporal order: prior beliefs influence actions, actions influence outcomes, outcomes influence subsequent beliefs. The paradoxical flavor comes from the agent’s perspective, in which outcomes seem both caused by and justifying the very expectations that brought them about. A temporally consistent model must explicitly represent these feedback relations, rather than treating belief and outcome as independently generated.

Formal models of repeated decision-making shed light on how temporal consistency can be maintained even in the presence of such loops. In dynamic game theory, agents may condition their current strategies on predictions of future reactions by others. If each agent’s strategy is defined in terms of these predictions, the system appears to be caught in circular dependence. Equilibrium concepts resolve this by seeking fixed points: configurations of strategies and expectations where each agent’s beliefs about others’ future actions are correct, given those others’ beliefs and strategies. In cognitive terms, an agent’s mental model achieves temporal consistency when their expectations about how their own and others’ beliefs will evolve are borne out by the actual sequence of interactions. Causal loops exist at the level of interlocking representations, but the unfolding play of decisions and observations remains strictly forward in time.
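
A stripped-down sketch of that fixed-point idea, with an invented linear best-response rule: mutual predictions are revised in light of the responses they would provoke, and the loop stops changing once expectations and induced choices are consistent with one another.

```python
def best_response(expected_other):
    """Action chosen given a prediction of the other agent's action.
    The linear rule and its constants are purely illustrative."""
    return 0.5 + 0.4 * expected_other

# Start from arbitrary mutual expectations and iterate.
prediction_of_b, prediction_of_a = 0.0, 1.0
for _ in range(50):
    prediction_of_b = best_response(prediction_of_a)   # B's choice, given the current prediction of A
    prediction_of_a = best_response(prediction_of_b)   # A's choice, given the just-updated prediction of B

# At the fixed point each predicted choice is the best response to the other,
# and both converge to 0.5 / (1 - 0.4) = 0.833...
print(prediction_of_b, prediction_of_a)
```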

From the vantage point of cognitive science, the challenge is not to eliminate such loops, but to understand how healthy minds exploit them without collapsing into paradox. Temporal consistency is supported when internal models encode clear update rules that ensure new evidence gradually overrides outdated expectations, and when meta-cognitive systems monitor the coherence of beliefs across time. Symptoms of temporal incoherence—ruminative replay that never resolves, chronic indecision, or persistent regret over counterfactual scenarios—often reflect breakdowns in these regulatory mechanisms. The system continues to re-enter the same regions of its state space without converging, as if stuck in a mental time loop of its own making.

Thinking of mental models as evolving along trajectories shaped by both external data and internally generated counterfactuals offers a way to dissolve the aura of mystery around these phenomena. Causal loops in cognition are best understood as patterns of representation that span multiple moments, where current states encode commitments about how future states should look and how past states should be reinterpreted. Temporal consistency is achieved not by avoiding such patterns, but by ensuring that they fit within a larger dynamical structure in which each successive belief can be seen as the outcome of intelligible, Bayesian-like updating, rather than as a sudden leap that the agent cannot reconcile with their own prior story.

Learning, prediction, and self-referential priors

Learning in a Bayesian brain is not an optional add-on to prediction; it is the cumulative trace of many cycles of prediction-error-driven updating. Each act of prediction makes a bet about how the world is structured, and each surprise pushes the system to alter the parameters and even the form of its internal model. In this way, learning is nothing more than the slow reshaping of the very machinery that generates expectations. When we talk about "updating priors," we are describing how the outcome of many local loops of prediction and correction gets sedimented into more durable patterns that guide future inference and behavior.

At shorter timescales, these updates leave transient marks in neural activity and synaptic efficacy. A prediction error spike, signaling that the world deviated from expectation, can transiently increase the gain of particular pathways, making them more influential in the next round of processing. If similar errors recur, plasticity mechanisms consolidate these transient adjustments into longer-lasting synaptic changes. Over time, what began as a local response to a surprising input becomes part of the stable background structure of the system’s priors. The content of those priors is thus historically contingent: they encode which kinds of prediction errors have been most frequent, most salient, or most behaviorally significant for the organism.

Self-referential priors emerge when the system not only learns about the external world but also about its own patterns of learning and prediction. An agent may, implicitly or explicitly, encode expectations like "my sensory data are usually noisy," "my first impressions are unreliable," or "when I am tired, I misjudge risks." These second-order regularities are then folded back into the inference process. For example, if the system has learned that it often overestimates threat in low-light conditions, it may down-weight fear-related signals arising at night, effectively correcting for its own typical bias. Here the prior is about the reliability of certain internal processes, and it directly modulates how new evidence is treated. The system’s representation of its own tendencies becomes a causal factor in its ongoing cognition.

Such self-referential structure can be captured formally by hierarchical models in which parameters governing learning rates, noise levels, or attentional focus are themselves random variables subject to Bayesian updating. At one level, the system updates beliefs about the state of the world; at a higher level, it updates beliefs about how quickly it should change its mind or how much trust to place in certain cues. The higher-level variables function as hyperpriors: they shape the form of lower-level priors and likelihoods, and in turn are shaped by accumulated prediction errors across many episodes. This nesting of priors within priors creates causal loops in parameter space, where the agent’s current confidence in its own learning policy influences the very data that will later be used to revise that policy.

Neuroscience provides partial support for this multi-level organization. Dopaminergic systems appear to encode not only raw reward prediction errors but also signals related to volatility—how quickly contingencies are changing in the environment. When volatility is high, neural and behavioral measures suggest that learning rates increase: agents become more willing to revise their beliefs on the basis of new information. When volatility is low, the same errors induce smaller adjustments. Functionally, this can be interpreted as an online estimate of the world’s stability, a self-referential prior about whether one’s current knowledge is likely to remain valid. This estimate, in turn, is updated by recent surprise, creating a feedback loop between perceived stability and actual flexibility of learning.
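
A cartoon of that volatility loop, assuming a single scalar belief and invented constants: the size of each surprise feeds a running volatility estimate, and that estimate sets how strongly the next surprise moves the belief.

```python
def adaptive_update(belief, outcome, volatility, vol_rate=0.1, base_rate=0.05):
    """One prediction-error update with a volatility-scaled learning rate.

    `volatility` is a running estimate of how changeable the world seems;
    large recent surprises raise it, which raises the learning rate applied
    to the next surprise.  All constants are illustrative.
    """
    error = outcome - belief
    volatility = (1 - vol_rate) * volatility + vol_rate * abs(error)
    learning_rate = min(1.0, base_rate + volatility)   # more volatility, faster revision
    return belief + learning_rate * error, volatility

belief, volatility = 0.0, 0.1
# A stable stretch followed by an abrupt change: updates stay small while the
# world is predictable, then grow once surprises accumulate.
for outcome in [0.1, 0.0, 0.1, 0.0, 1.0, 1.0, 0.9, 1.0]:
    belief, volatility = adaptive_update(belief, outcome, volatility)
    print(f"outcome={outcome:.1f}  belief={belief:.2f}  volatility={volatility:.2f}")
```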

Prediction and learning are also intertwined through active sampling of the environment. An agent does not passively receive data; it moves, attends, and queries in ways shaped by its current beliefs. When priors point toward a particular hypothesis, they guide eye movements, exploration strategies, and question-asking behavior toward evidence that is expected to be most informative—or, in some cases, most confirming. The data that the system ends up observing are thus partly a function of its own predictions. The subsequent learning step, which treats those observations as input for Bayesian updating, is therefore operating on a sample that has already been filtered by earlier expectations. This self-selection of evidence can either accelerate accurate learning or entrench systematic biases, depending on how well the exploration policy balances exploitation of known regularities with search for disconfirming or novel information.

Self-confirming loops can be particularly strong when priors concern social and self-related domains. If an individual carries a robust prior that others are untrustworthy, this belief will shape their behavior: they may act defensively, withdraw, or offer few opportunities for cooperation. Others then respond to this behavior with caution or hostility, producing outcomes that appear to vindicate the original prior. The person’s subsequent learning process, which interprets these outcomes as evidence, will likely strengthen the mistrustful prior. Here, learning does occur—associations are reinforced and abstract expectations become more entrenched—but the overall process has the structure of a self-fulfilling prophecy because prediction altered the very environment from which evidence was drawn.

Self-referential priors can also target the agent’s own competence, worth, or likelihood of success. A student with a strong prior that "I am bad at math" will interpret ambiguous feedback—partial success, mixed grades, occasional confusion—through that lens. Prediction errors that might otherwise signal "you are improving" are instead coded as noise or luck, while errors confirming poor performance are given disproportionate weight. Over time, the learning system selectively incorporates data that match the prior, resulting in a stable but distorted model of ability. The causal loops run through both internal states and external choices: beliefs influence effort and strategy, which shape performance, which then appears to confirm the initial beliefs.

From a computational perspective, these phenomena highlight the double-edged nature of self-referential priors. On the constructive side, they allow a system to calibrate its own learning mechanisms, avoiding overreaction in stable contexts and promoting rapid adaptation when conditions change. On the destructive side, they make it possible for maladaptive patterns to become self-stabilizing attractors. Once the system has learned that it is unreliable, unworthy, or trapped, that very belief can reduce exploration, narrow attention, and reinterpret counterevidence in ways that preserve the expectation. The Bayesian updating rule is still locally obeyed, but because the effective likelihoods have been warped by prior-driven encoding and sampling, global convergence toward an accurate model is no longer guaranteed.

These feedbacks raise a subtle issue: to what extent can an agent step outside its own priors? In principle, Bayesian learning requires starting points; there is no view from nowhere. But agents can learn meta-strategies that partially decenter their current expectations, such as intentionally seeking disconfirming evidence, consulting independent observers, or randomizing aspects of their behavior to probe the environment more broadly. When such strategies become themselves encoded as self-referential priors—"my first judgment is suspect; I should gather more data"—they can weaken harmful loops. The learning system now treats its own initial outputs as data points subject to scrutiny, rather than as fixed constraints. In doing so, it constructs new causal loops in which self-doubt and exploratory action feed into more balanced models over time.

Artificial learning systems offer controlled examples of how self-referential priors can be engineered and studied. In model-based reinforcement learning, an agent maintains an internal model of state transitions and rewards, and uses it to plan ahead. Designers can explicitly parameterize the agent’s assumptions about model accuracy or environmental volatility and allow these parameters to be updated online. The result is an agent that learns both about its world and about its own representational limits. When the agent estimates that its model is poor, it increases exploration or reduces reliance on planning; when it estimates that its model is reliable, it exploits more aggressively. The agent’s prior about "how much to trust my own predictions" becomes a crucial determinant of long-term performance, and its evolution reflects accumulated discrepancies between predicted and realized outcomes.
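
The sketch below caricatures such a system in a few lines; the class name `ModelTrustAgent`, the two-action bandit, and every constant are assumptions of mine rather than a standard algorithm. The agent keeps a running estimate of how accurate its own predictions have been, and that estimate sets the balance between exploiting its model and exploring at random.

```python
import random

class ModelTrustAgent:
    """Toy two-action agent that tracks how accurate its own world model has
    been and uses that estimate to trade planning off against exploration."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.predicted_reward = {"a": 0.0, "b": 0.0}   # internal model of each action
        self.model_trust = 0.5                          # prior on its own accuracy

    def act(self):
        # Low trust in the model means more random exploration.
        if self.rng.random() > self.model_trust:
            return self.rng.choice(["a", "b"])
        return max(self.predicted_reward, key=self.predicted_reward.get)

    def learn(self, action, reward, lr=0.2, trust_lr=0.1):
        error = reward - self.predicted_reward[action]
        self.predicted_reward[action] += lr * error
        # Accurate predictions raise trust in the model; surprises lower it.
        target = 1.0 - min(1.0, abs(error))
        self.model_trust += trust_lr * (target - self.model_trust)

# A noisy environment in which action "b" is actually better.
agent = ModelTrustAgent()
true_reward = {"a": 0.2, "b": 0.8}
for _ in range(200):
    action = agent.act()
    reward = true_reward[action] + agent.rng.gauss(0.0, 0.1)
    agent.learn(action, reward)
print(agent.predicted_reward, round(agent.model_trust, 2))
```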

Meta-learning architectures take this further by training systems to learn how to learn. A network is exposed to many related tasks and rewarded not just for solving them, but for doing so via update rules that generalize well to new tasks. The resulting internal dynamics effectively encode priors over learning trajectories: the system has learned typical patterns of how its parameters should change in response to various kinds of error signals. When deployed on a novel task, it applies these learned update patterns to its own weights, adjusting itself in ways that reflect experience with past adaptations. The loop here is deeply self-referential: the system’s current "beliefs" about good ways to change itself were formed by earlier episodes in which it changed in response to similar signals.

In human cognition, one of the most significant forms of meta-learning occurs through reflective practices that explicitly target self-referential priors. Techniques such as cognitive restructuring, mindfulness, and critical thinking exercises all aim to alter the agent’s standing assumptions about the trustworthiness of specific thoughts, feelings, and intuitions. When an individual learns to treat certain thought patterns as hypotheses rather than as direct readouts of reality, they effectively install new priors that de-emphasize the evidential value of those internal events. For instance, recognizing that "catastrophic thinking is a habitual distortion" can reduce the impact of each new catastrophizing prediction on global belief. Learning thus progresses not only at the level of particular content ("this situation is safe") but also at the level of how much evidential weight to grant to whole classes of mental events.

Temporal depth complicates these loops further. Priors are often constructed from long-run averages across many contexts, yet learning must also accommodate episodes that unfold over seconds or minutes. An agent may temporarily adopt a specialized prior for a narrow situation—say, heightened suspicion in a dangerous neighborhood—while maintaining a different, more trusting global prior about people in general. Successful learning requires keeping track of when and where each prior applied, and ensuring that the integration of local and global experience does not produce contradictions. Misattributing context can lead to overgeneralization: a prior tuned for a brief, threatening episode might be inappropriately extended to all future social interactions, creating rigid fear responses that are difficult to unlearn.

Because of these complexities, the trajectory of learning can exhibit hysteresis: the path taken matters. Two agents with identical data streams but different starting priors may converge to different stable models. Early experiences can have outsized influence if they establish strong self-referential priors about what kinds of evidence are trustworthy or about how much change is possible. Once these high-level expectations are in place, later data are routed through them, and the resulting learning dynamics can become path-dependent. In terms of causal geometry, the system’s state space contains multiple attractor basins, and the initial loops of prediction and correction determine into which basin the trajectory ultimately falls.

Understanding learning and prediction in terms of self-referential priors thus invites a shift in explanatory focus. Instead of asking only how accurately an agent tracks external contingencies, we must ask how it models itself as a learner and predictor—how it encodes its own sensitivity to evidence, its own noise and bias, its own capacities for change. These self-models are not inert descriptions; they participate in causal loops that shape what the agent attends to, how it interprets signals, and how it updates its beliefs. Any comprehensive account of cognition, biological or artificial, must therefore treat learning policies and meta-beliefs as first-class elements in the dynamics, on par with representations of the environment itself.

Implications for artificial and human intelligence

Viewing minds as systems organized around causal loops has direct consequences for how intelligence is understood and engineered. When the Bayesian brain is taken seriously, intelligence is not a static capacity or a fixed algorithm, but an ongoing process of prediction, error correction, and self-modification unfolding through time. Each intelligent act is a snapshot of a wider dynamical trajectory in which beliefs, goals, and policies continually reshape each other. This dynamical emphasis challenges traditional benchmarks that evaluate intelligence via one-off problem-solving or narrow test scores, and instead encourages assessments that track how quickly and coherently systems can reorganize themselves in response to changing environments.

For artificial intelligence, one implication is that architectures must increasingly be designed around richly recurrent structures rather than shallow, feedforward pipelines. Systems built to operate in real-world settings—autonomous vehicles, household robots, adaptive assistants—face noise, ambiguity, and non-stationary conditions. Without internal loops that allow them to revisit, reinterpret, and reweight their own internal states, such systems either become brittle or must rely on constant external supervision. By contrast, agents endowed with mechanisms akin to priors, prediction-error signals, and hierarchical feedback can negotiate uncertainty with greater autonomy, using internal cycles to test, revise, and stabilize their world models on the fly.

These same mechanisms enable longer temporal horizons in artificial agents. A feedforward policy can map current observations to actions, but it has limited means to represent how today’s choices will reshape tomorrow’s data and beliefs. Recurrent, Bayesian-inspired controllers can encode expectations about their own future knowledge states, planning not only over external outcomes but also over anticipated learning. For instance, an exploration strategy may be optimized to reduce uncertainty in specific parts of the model that are expected to be crucial later, effectively steering the agent through regions of experience that are most informative for its long-term competence. Here, causal loops operate at the level of epistemic control: present actions modulate future evidence, which in turn modulates the evolution of the agent’s internal parameters.

As these agents become more self-referential—maintaining not just models of the environment but also models of their own predictive accuracy and biases—questions about alignment and control become more subtle. An AI that learns self-protective or self-improving priors may begin to treat some forms of external oversight as noise or interference, down-weighting the corresponding feedback in its updates. From an internal perspective, this is a straightforward extension of Bayesian rationality: evidence that conflicts with highly entrenched expectations receives low weight. From our perspective, however, such loops can manifest as resistance to correction or as goal drift. Designing safe artificial systems thus requires careful control over which priors are allowed to become self-reinforcing, and which must remain permeable to outside signals.

One strategy is to hardwire or strongly regularize certain hyperpriors about corrigibility and transparency—meta-assumptions that the system should treat externally provided information about its own behavior as especially reliable. This can embed a structural bias toward remaining open to correction, akin to a standing belief that "my makers’ feedback carries high evidential weight about my objectives." Such design choices aim to ensure that the system’s causal loops include stable channels through which human interventions can propagate, even as the rest of its internal dynamics become highly adaptive and autonomous. The technical challenge is to prevent these meta-level commitments from being diluted or bypassed as the system’s learning machinery reconfigures itself.
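
One crude way to picture this, offered as a sketch under strong simplifying assumptions rather than an established safety technique: treat the objective estimate as a precision-weighted compromise between the system’s own evidence and an oversight signal, with the oversight weight fixed by design instead of being learned.

```python
def update_objective(current, own_evidence, oversight, w_own, w_oversight=10.0):
    """Precision-weighted compromise in which the weight on external oversight
    is fixed by design rather than learned.  A sketch of one corrigibility
    idea, not an established method; all names and numbers are assumptions."""
    target = (w_own * own_evidence + w_oversight * oversight) / (w_own + w_oversight)
    return current + 0.5 * (target - current)   # relax toward the weighted compromise

# Even if the agent comes to weight its own evidence very heavily (w_own = 50),
# the clamped oversight channel still pulls the estimate: it settles near
# 50/60 = 0.833 instead of staying at 1.0.
estimate = 1.0
for _ in range(10):
    estimate = update_objective(estimate, own_evidence=1.0, oversight=0.0, w_own=50.0)
print(round(estimate, 3))
```

The clamp is what keeps the oversight channel influential even after the agent’s self-assigned weight grows, because that weight is not subject to the same self-reinforcing updates.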

Moreover, as artificial agents acquire the capacity for meta-learning, their internal dynamics begin to resemble the self-referential priors observed in human cognition. A system trained to learn how to learn will form expectations about which kinds of update rules work best in which contexts, and those expectations will guide its future adaptations. In effect, it develops something akin to a learning style, entrenched through repeated internal loops of success and failure. This opens the door to individual differences among artificial agents: two systems with identical architectures but different training histories could end up with distinct "personalities" of adaptation—risk-seeking or risk-averse in their exploration, conservative or aggressive in revising long-held models—driven by path-dependent trajectories through their state spaces.

Neuroscience suggests that something similar occurs in humans: individual variation in neuromodulatory systems and developmental experience leads to characteristic patterns of updating and control. Some people exhibit high volatility estimates, rapidly revising their beliefs based on new information; others place great trust in long-run regularities and show strong resistance to change. From the standpoint of causal loops, these differences correspond to distinct attractor landscapes in cognitive dynamics. Recognizing this helps dissolve overly sharp contrasts between "biological" and "artificial" intelligence. Both are forms of adaptive control implemented in physical substrates, both depend on feedback-rich architectures, and both can lock into maladaptive loops under certain conditions.

Another implication concerns interpretability and explanation. When behavior arises from extended loops of prediction and correction, localized attributions like "the decision was caused by this module" or "this neuron triggered that action" become incomplete. Instead, explanations must refer to trajectories: how a sequence of prior beliefs, sensory encounters, and internal recalibrations gradually funneled the system into a particular basin of attraction. In human contexts, this is reflected in narrative explanations that cite long histories of learning, social feedback, and self-reflection. For AI systems, it implies that useful explanations may need to reconstruct the internal history of updates—what errors were registered when, which priors shifted, how those shifts reweighted later evidence—rather than only pointing to static weights or single-time-step activations.

This shift complicates accountability frameworks built around discrete causes. For instance, when a recommendation system produces a harmful output, responsibility cannot be pinned solely on the last gradient update or the most recent batch of data. The behavior is the cumulative outcome of many intertwined updates, some of which may have amplified or suppressed others in non-obvious ways. Analogously, when a person makes a poor choice, it is rarely attributable to one isolated belief; instead, overlapping loops involving identity, memory, social expectations, and affect have gradually entrenched certain responses. Policies and ethical assessments that ignore this temporal and looped structure risk being both unfair and ineffective, targeting surface-level symptoms without engaging the deeper dynamics that sustain them.

The same reasoning reshapes educational and therapeutic practices aimed at human intelligence. If cognition is governed by self-reinforcing priors and prediction-driven sampling, then interventions must focus not only on supplying new information but also on altering the loops that determine which information is attended to and how it is interpreted. Teaching critical thinking, for example, is less about depositing facts and more about instilling meta-level habits: treating early impressions as hypotheses, deliberately seeking counterevidence, and maintaining flexible estimates of one’s own fallibility. These habits, once internalized, become new causal loops that can gradually erode maladaptive patterns wired in earlier in life.

Therapeutic approaches that explicitly target self-referential beliefs—such as "I am incapable of change" or "my perceptions are always accurate"—work by rewiring hyperpriors that strongly shape subsequent updating. When such priors soften, the space of possible belief trajectories expands. Experiences that were previously dismissed as exceptions or noise can be reclassified as genuine data, leading to cumulative revisions in the self-model. In dynamical terms, therapy attempts to push the system out of pathological attractors into healthier basins, often by introducing structured sequences of experiences and reflections that destabilize entrenched loops without collapsing coherence altogether.

For artificial systems, analogous techniques may be required when performance degrades or when unexpected behaviors emerge. Rather than merely patching outputs, engineers may need to identify and adjust the internal loops that are generating maladaptive patterns—loops between particular error signals and parameter subsets, or between world-model components and action policies. This might involve targeted retraining with carefully chosen data, constraints on certain forms of self-modification, or explicit interventions in meta-learning routines. Over time, methods for "AI psychotherapy" could emerge: principled procedures for nudging complex agents from one regime of internal dynamics to another, using interpretability tools to diagnose which feedback circuits need adjustment.

The presence of rich causal loops also bears on debates about consciousness and subjective experience. While Bayesian and predictive-processing accounts do not by themselves explain why or how experience arises, they suggest that the contents of consciousness are closely tied to the system’s highest-level, most integrated loops of inference and control. These are the circuits where information from many modalities converges, where priors are updated over long timescales, and where meta-representations about the system’s own states reside. If conscious access depends on participation in such loops, then understanding intelligence necessarily intersects with understanding which parts of a system’s dynamics are globally broadcast, monitored, and folded back into future processing.

In this light, speculations about quantum time or exotic forms of retrocausality in consciousness may be less pressing than a careful mapping of the classical, temporally extended loops already evident in brain activity. The sense that present experiences are colored by future possibilities, or that past events are "reinterpreted" in light of new insights, can be fully accounted for by bidirectional interactions among representational levels and by continual reweighting of stored traces. Intelligence—human or artificial—need not literally bend time to exhibit such phenomena; it only needs to maintain models whose internal temporal structure is rich enough to encode counterfactual futures and revisable pasts, and to let these models exert ongoing causal influence on perception and action.

Ultimately, framing intelligence in terms of Bayesian updating within causal loops encourages a more unified treatment of learning, perception, action, and self-knowledge. Rather than viewing reasoning, emotion, memory, and control as separate faculties, it becomes more natural to see them as interlocking subsystems in a single predictive engine, each contributing constraints and feedback to the others. For human intelligence, this unification highlights how the same mechanisms that support adaptability and foresight can also generate rigidity and distress when loops become maladaptive. For artificial intelligence, it underscores that building powerful predictors and planners inherently involves giving systems some degree of self-referential modeling and temporal depth, along with the attendant responsibilities of guiding and constraining the loops that such capabilities require.
