LLM predictions aren’t brain predictions

Written by

Qubic Scientific Team

Sep 2, 2025


When we say that something “predicts,” we really mean that it anticipates what will come next. In AI circles we read, again and again, that prediction is the phenomenon through which AI imitates one of the most important features of the human brain and its biological networks: apparently, both make predictions. However, although both fulfill the same mission of anticipating the immediate future, the similarity is more semantic and superficial than real.


Prediction in machines

Prediction in Large Language Models (LLMs) such as ChatGPT, Grok, Gemini or DeepSeek involves minimizing error through a set of mathematical functions. LLMs do not think or understand like a human being, nor do they have internal representations of knowledge; they learn to predict the next word in a text sequence.

This learning is based on the similarity between the model’s output and the real word. The greater the similarity, the better; the greater the difference, the larger the discrepancy and error between what was expected and what actually happened. To reduce this error, a mathematical function called the cost function is used, which measures the gap between what the model predicts and what really occurs.
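
To make this concrete, here is a minimal numerical sketch of one common cost function, cross-entropy, applied to a toy four-word vocabulary. The vocabulary and the probabilities are illustrative assumptions, not the internals of any particular LLM.

```python
import numpy as np

# Toy sketch: cross-entropy, a common cost function for next-word prediction.
# It is large when the model gave little probability to the word that actually
# appeared, and small when the model expected that word.
def cross_entropy(predicted_probs, true_index):
    return -np.log(predicted_probs[true_index])

vocab = ["dog", "cat", "rabbit", "bird"]          # illustrative vocabulary
predicted = np.array([0.25, 0.45, 0.20, 0.10])    # model's guesses for the next word

print(cross_entropy(predicted, vocab.index("cat")))   # ≈ 0.80: expected word, low error
print(cross_entropy(predicted, vocab.index("bird")))  # ≈ 2.30: surprising word, high error
```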

To do this, each word or token is represented as a numerical vector in a space of many dimensions, called an embedding, which encodes semantic and syntactic relationships learned by the network.

You may wonder why a word is not represented in a 2- or 3-dimensional space. Language is far more complex and requires capturing thousands of nuances simultaneously (phonological, grammatical, semantic, syntactic, pragmatic, prosodic, contextual, frequency, usage, cloze probability). With many dimensions (thousands in LLMs), the model can spread words out so that semantic and grammatical relationships are represented with precision. Individual dimensions do not correspond to specific “meanings” (there is no “plural/singular”, “positive/negative” or “noun/verb” dimension); they are latent components learned automatically, which in combination capture these regularities.

A word (or, more precisely, a token) inside an LLM is represented as a numerical vector with a fixed number of dimensions. For example, suppose the vector corresponding to the word “cat” in a hypothetical 3-dimensional model is [0.7, −1.2, 2.3] and the one for “dog” is [0.6, −1.0, 2.1]. The two vectors are close to each other in that space, reflecting that “cat” and “dog” occur in similar contexts in texts.
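
As an illustration, cosine similarity is one standard way to quantify how “close” two embedding vectors are. The sketch below uses the hypothetical 3-dimensional values from the paragraph above, plus an invented vector for “roof” used only for contrast.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings from the example above; "roof" is an
# extra invented vector used only for contrast. Real LLM embeddings have
# hundreds or thousands of dimensions.
cat  = np.array([0.7, -1.2, 2.3])
dog  = np.array([0.6, -1.0, 2.1])
roof = np.array([-1.5, 0.4, -0.3])

def cosine_similarity(a, b):
    # 1 means the vectors point in the same direction; values near 0 or below
    # mean the words appear in very different contexts.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(cat, dog))    # ≈ 1.0: "cat" and "dog" share contexts
print(cosine_similarity(cat, roof))   # negative: dissimilar contexts
```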

The vector then passes through multiple layers of self-attention and feed-forward networks. “Attention”, the defining feature of transformers, lets each token take into account the information carried by every other token in the sequence, as if the model analyzed phrases instead of isolated words. After several layers, the embedding of “cat” no longer contains information only about “cat,” but also about its context (“The cat climbed up to the roof…”). Attention thus allows each word to “look at” all the others in the sentence and decide which are relevant. For example, in the sentence “The cat climbed onto the roof because it was scared,” the model needs to decide who was scared: the cat or the roof? Thanks to attention, the model correctly links “scared” with the cat. Once trained, the model generates text autoregressively: it takes a context, calculates probabilities for the next word, chooses one, appends it to the text, and repeats the process word by word, as if completing an endless puzzle.
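
The sketch below implements scaled dot-product attention, the core operation of transformers (Vaswani et al., 2017), on toy 3-dimensional vectors. In a real model the queries, keys and values come from learned projection matrices and there are many attention heads; here a single head reuses the token vectors directly, purely for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each token's query is compared with every token's key; the resulting
    # weights decide how much of every other token's value flows into it.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity between tokens
    weights = softmax(scores)            # one row of attention weights per token
    return weights @ V, weights          # context-enriched vectors

# Toy "sentence" of 4 tokens with 3-dimensional vectors (invented numbers).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))              # stand-ins for token embeddings

# In a real transformer Q, K and V come from learned projections of X;
# here we reuse X directly to keep the sketch short.
context, weights = scaled_dot_product_attention(X, X, X)
print(weights.round(2))   # row i: how much token i "looks at" each other token
```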

Figure 1. Word embeddings.

The final vector is converted into a probability, encoded between 0 and 1, for each candidate next word. In this case, it could be a probability distribution with values such as “dog” = 0.25, “cat” = 0.45, “rabbit” = 0.20, “bird” = 0.10. The cost function measures the error, and a mechanism called backpropagation adjusts the weights and biases of the preceding layers to reduce it. By minimizing the cost function, the model progressively learns to improve its predictions.
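
Such a distribution is produced by applying a softmax to the scores (logits) coming out of the final layer. In the sketch below the logits are invented so that the resulting probabilities roughly match the ones mentioned above.

```python
import numpy as np

def softmax(logits):
    # Turn arbitrary scores into probabilities between 0 and 1 that sum to 1.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

vocab = ["dog", "cat", "rabbit", "bird"]
logits = np.array([1.2, 1.8, 1.0, 0.3])   # invented final-layer scores

for word, p in zip(vocab, softmax(logits)):
    print(f"{word}: {p:.2f}")   # dog: 0.25, cat: 0.45, rabbit: 0.20, bird: 0.10
```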

In training practice, the model receives thousands of examples of sentences like “Today is a very ___ day.” During learning, the model assigns probabilities to the different options: it may estimate that “happy” has a 30% probability, “sad” 25% and “sunny” 10%. If the real answer in the corpus is “sunny,” the model has failed, and the cost function penalizes it for not having assigned enough probability to that option. Backpropagation then corrects the internal parameters (the “weights” of the network) to reduce that error on future occasions. This process is repeated millions of times until the model’s parameters reflect the statistical patterns of language.
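
Here is a toy sketch of a single learning step under simplifying assumptions. For a softmax output with a cross-entropy cost, the gradient with respect to the logits is the predicted probabilities minus the one-hot target; backpropagation carries that error backwards through all the earlier layers. In the sketch we update the logits directly, with an exaggerated learning rate and invented scores, just to show the correct word gaining probability.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["happy", "sad", "sunny", "other"]
target = vocab.index("sunny")                 # the word that actually appeared

logits = np.array([1.10, 0.90, 0.00, 1.25])   # invented scores ≈ 30%, 25%, 10%, 35%
probs = softmax(logits)
print(dict(zip(vocab, probs.round(2))))

# Gradient of cross-entropy w.r.t. the logits: predicted probs minus one-hot
# target. Backpropagation would pass this error back through the network;
# here we apply it to the logits with an exaggerated learning rate.
one_hot = np.zeros(len(vocab))
one_hot[target] = 1.0
logits -= 2.0 * (probs - one_hot)

print(dict(zip(vocab, softmax(logits).round(2))))  # "sunny" now gets more probability
```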

Prediction is, as you can see, essentially the conversion of words into numerical vectors within a multidimensional space that gets adjusted through training.


Does this kind of prediction resemble that of the brain?

In neuroscience, the notion of prediction is based on predictive coding, Bayesian inference and Friston’s free-energy principle. The brain is not a passive information processor, but a hierarchical system that generates hypotheses about the state of the world and compares those hypotheses with incoming sensory information. These predictions occur mainly in the cerebral cortex: higher-level areas send descending predictions (top–down) to lower-level areas, and lower areas return prediction-error signals when stimuli don’t match the anticipated inputs.

For example, in vision, if we see a ball moving, higher-order areas (parietal and motor cortex) anticipate its trajectory, while primary visual areas (V1, V2) receive the sensory information. If the ball suddenly changes direction, a prediction error appears, forcing the upper layers to update their information in order to predict the next movement again. In audition, when we follow a melody, the primary auditory cortex (A1) predicts the next note; if a dissonance appears, we are surprised, reflecting the error signal. If, for instance, you go to pick up a suitcase you believe is empty, the premotor cortex sends signals to the motor cortex and from there to motor neurons, activating the muscles with the necessary force and tension. If the suitcase turns out to be full when you grasp it, the brain readjusts its predictions to match the actual sensory information.

The goal is always to reduce the discrepancy between what is expected and what is perceived. That’s why the brain functions fundamentally as a predictive organ, not a reactive one: we maintain an active model of the world, not a passive one.

This internal world model is refined continuously, on rapid timescales of milliseconds and also over prolonged learning processes. All cortical areas, from vision to motor control, operate under this same predictive scheme. The discrepancy between expected and actual input, the prediction error, propagates upward (bottom–up) and serves as a learning signal to adjust the internal model.
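
As a purely illustrative sketch of this logic (not a model of real cortex), the loop below keeps an internal estimate, compares it with a noisy sensory signal, and corrects the estimate in proportion to the prediction error. The suitcase numbers and the learning rate are invented.

```python
import numpy as np

# Toy predictive-coding loop (illustrative only, not a model of real cortex):
# an internal estimate is repeatedly corrected by the prediction error
# between what was expected and what was actually sensed.
rng = np.random.default_rng(1)

actual_weight = 8.0        # the world: the suitcase is full (kg, invented)
prediction = 1.0           # the prior: "the suitcase is empty"
learning_rate = 0.3        # how strongly each error updates the model

for step in range(10):
    sensed = actual_weight + rng.normal(scale=0.2)   # noisy sensory input
    error = sensed - prediction                      # bottom-up error signal
    prediction += learning_rate * error              # top-down model update
    print(f"step {step}: prediction ≈ {prediction:.2f}")
# The prediction converges toward the sensed weight and the error shrinks.
```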

How does learning occur? Through synaptic plasticity, which acts as a tuning mechanism via long-term potentiation (LTP) or long-term depression (LTD), depending on activity.
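
A common textbook abstraction of this tuning is a Hebbian-style rule: a synaptic weight strengthens when pre- and post-synaptic activity coincide (LTP-like) and weakens otherwise (LTD-like). The sketch below is only that abstraction, with invented rates and thresholds, not a biophysical model.

```python
# Toy Hebbian-style rule (a textbook abstraction, not a biophysical model):
# coincident pre- and post-synaptic activity strengthens a synapse (LTP-like),
# weak or uncorrelated activity weakens it (LTD-like).
def update_synapse(weight, pre, post, lr=0.1, threshold=0.5):
    if pre > 0 and post > threshold:
        return weight + lr * pre * post     # potentiation (LTP-like)
    return weight - lr * 0.5 * pre          # depression (LTD-like)

w = 0.20
print(update_synapse(w, pre=1.0, post=0.9))  # 0.29 -> strengthened
print(update_synapse(w, pre=1.0, post=0.1))  # 0.15 -> weakened
```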

The concept of a model of the world is not limited to sensory perception. Social cognition, attributing intentions, beliefs and emotions to other people, also depends on predictive mechanisms. In regions such as the medial prefrontal cortex (mPFC), the superior temporal sulcus (STS) and the temporoparietal junction (TPJ), hypotheses about the mental states of others are generated and confronted with the signals observed in their behavior. Social prediction error, when another person’s actions do not match our expectations, forces us to readjust the mental model we have of their intentions. This happens every single day: we constantly readjust our previous ideas, expectations and analyses.

Emotions also work this way. The anterior insular cortex, the anterior cingulate cortex and regions of the orbitofrontal cortex integrate visceral signals from the body (interoception) with the predictions we make based on context. Although we usually think of “emotion” as a universal, preprogrammed reaction, it is in fact the result of predictive inferences about the most likely bodily state in a given context. Emotion emerges as a hypothesis of the brain about “what the body should feel now” and is corrected through comparison with real visceral afferents.

At the level of memory, each experience is not archived as a fixed piece of data. It is integrated into the internal model through synaptic plasticity, which produces the changes needed to refine future predictions. Remembering, in this sense, means reactivating inferences in cerebral networks (the cortico-hippocampal network) that are updated with each new experience.

Ultimately, the model of the world in the brain is not only sensory, but also predicts the intentions of others and the bodily states that we label as emotions. Biological intelligence is a predictive machinery oriented to reducing uncertainty at all levels of experience.

The brain therefore doesn’t represent information with discrete numerical vectors like an LLM. There are no explicit embeddings of 300 or 1024 dimensions, and neurons do not store fixed lists of numbers, although in computational neuroscience the population coding of neurons can be modeled as activity vectors in high-dimensional spaces.
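
For example, a Georgopoulos-style population vector (Georgopoulos et al., 1986) reads a movement direction out of the joint activity of many broadly tuned neurons. The simulation below is an illustrative sketch with invented tuning curves and noise, not recorded data.

```python
import numpy as np

# Toy Georgopoulos-style population vector (invented tuning and noise, not data):
# each neuron fires most for its preferred movement direction, and the actual
# direction can be read out from the activity of the whole population.
rng = np.random.default_rng(2)
n_neurons = 100
preferred = rng.uniform(0, 2 * np.pi, n_neurons)     # preferred directions (rad)

true_direction = np.deg2rad(60)                      # movement to be encoded
# Half-rectified cosine tuning plus a little noise.
rates = np.maximum(0, np.cos(true_direction - preferred))
rates += rng.normal(0, 0.05, n_neurons)

# Decode: sum every neuron's preferred direction weighted by its firing rate.
x = np.sum(rates * np.cos(preferred))
y = np.sum(rates * np.sin(preferred))
print(f"decoded direction ≈ {np.rad2deg(np.arctan2(y, x)):.1f}°")   # close to 60°
```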

In LLMs, the embedding is static and fixed after training, whereas in the brain neuronal vectors are dynamic over time, variable and plastic. In an LLM the values are abstract parameters with no link to energy or biology, whereas in the brain the values correspond to real bioelectrical and biochemical processes, subject to noise, plasticity and modulation.

As you may infer, the possibility of general intelligence arising from an LLM is, at least from a neuroscientific perspective, a chimera, since prediction, even though it shares the name, reflects very different mechanisms and realities.

Now it’s Aigarth’s turn.

Jose Sánchez. Qubic Scientific advisor.

 

References

  • Barrett, L. F. (2017). The theory of constructed emotion: An active inference account of interoception and categorization. Social Cognitive and Affective Neuroscience, 12(11), 1833–1840. https://doi.org/10.1093/scan/nsw154

  • Hutchinson, J. B., & Barrett, L. F. (2019). The power of predictions: An emerging paradigm for psychological research. Current Directions in Psychological Science, 28(3), 281–289. https://doi.org/10.1177/0963721419831992

  • Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. https://doi.org/10.1038/nrn2787

  • Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87. https://doi.org/10.1038/4580

  • Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27(12), 712–719. https://doi.org/10.1016/j.tins.2004.10.007

  • Frith, C. D., & Frith, U. (2006). The neural basis of mentalizing. Neuron, 50(4), 531–534. https://doi.org/10.1016/j.neuron.2006.05.001

  • Keller, G. B., & Mrsic-Flogel, T. D. (2018). Predictive processing: A canonical cortical computation. Neuron, 100(2), 424–435. https://doi.org/10.1016/j.neuron.2018.10.003

  • Georgopoulos, A. P., Schwartz, A. B., & Kettner, R. E. (1986). Neuronal population coding of movement direction. Science, 233(4771), 1416–1419. https://doi.org/10.1126/science.3749885

  • Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008.

  • Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2023). Efficient Transformers: A survey. ACM Computing Surveys, 55(6), Article 109. https://doi.org/10.1145/3530811


Follow us on X @Qubic
Learn more at qubic.org
Subscribe to the AGI for Good Newsletter below.



© 2025 Qubic.

Qubic is a decentralized, open-source network for experimental technology. Nothing on this site should be construed as investment, legal, or financial advice. Qubic does not offer securities, and participation in the network may involve risks. Users are responsible for complying with local regulations. Please consult legal and financial professionals before engaging with the platform.
