
The World in the Words: How Language Models Construct Reality

The Surprising Depth of Prediction

“Just autocomplete.” You hear it constantly now, spoken with the confidence of someone who has settled the matter. It has become a kind of prayer, repeated with enough conviction to ward off the uncomfortable implications of what these systems are actually doing. And like most prayers, it works best when you don’t examine it too closely.

The dismissal isn’t wrong, exactly. Language models do predict the next token. But the reductionism is so severe it functions as a kind of blindness. It forecloses the exact line of inquiry that matters most right now: what does it take to predict language well? What has to be true about a system that can do it? What is packed inside human language that makes mastering its structure such a consequential act? These are the questions the “just autocomplete” framing walls off, and they are the questions we should be pursuing.

Calling it “just prediction” is like explaining a first kiss as “just moisture exchange”: technically accurate, magnificently beside the point.

To predict what comes after “The water began to …,” a model has encoded statistical regularities about how people describe boiling, freezing, flowing, evaporating. It has absorbed associations with containers and temperature, movement and state change. That ellipsis holds a statistical representation of all the things water has been said to do, and some it hasn’t. The model doesn’t store every phrase. It encodes which patterns are likely when words recombine. When an LLM predicts continuations about water, it is enacting a working model of how water behaves in the world.
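To make the mechanics concrete, here is a deliberately tiny sketch of the idea, a toy bigram-style counter over an invented five-sentence corpus. Nothing here resembles a real transformer; it only shows how a continuation distribution falls out of counting what people have written:

```python
from collections import Counter

# A toy corpus standing in for billions of sentences about water.
corpus = [
    "the water began to boil",
    "the water began to freeze",
    "the water began to boil",
    "the water began to flow",
    "the milk began to boil",
]

# Count which word follows the context "began to" in each sentence.
continuations = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        if words[i] == "began" and words[i + 1] == "to":
            continuations[words[i + 2]] += 1

# Normalize the counts into a probability distribution over continuations.
total = sum(continuations.values())
probs = {word: count / total for word, count in continuations.items()}

# "boil" dominates simply because the corpus describes boiling most often:
# the distribution is a statistical record of what liquids have been said to do.
print(probs)
```

A real model replaces the literal string match with learned representations and the count table with billions of parameters, but the target is the same: a probability distribution over what comes next, shaped by everything ever said.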

The mathematics of accurate prediction demands nothing less than reality modeling. You cannot consistently predict human language about the world without encoding how that world operates. Every incremental improvement in prediction accuracy is simultaneously an improvement in world modeling, because language is nothing more than humans describing their reality to each other, over and over, billions of times, in every possible variation.

What emerges is a model of the world built entirely from descriptions of the world. Picture a blind cartographer who has never seen any territory, yet constructs accurate maps by reading millions of travel accounts. She learns that “over the ridge” implies elevation change and hidden valleys beyond, that “along the river” suggests fertile ground and possible crossing points. Her maps are accurate enough that travelers prefer them to those drawn by people who’ve walked the land. She has internalized individual routes, yes, but more importantly the deep patterns of how landscapes must be structured for the descriptions to make sense.

Five years ago you would have considered this story fantastical. Today she sits in your laptop, adjudicating whether your ex was gaslighting you, a query she handles with the same algorithmic solemnity whether it’s your first or fifteenth time asking. She is equally prepared to explain the Krebs cycle, draft your resignation letter, and tell you whether that mole looks weird. This should probably unsettle us more than it does.

Throughout this piece, when I use words like “think,” “understand,” or “know,” I’m using them as shorthand for “exhibits behavior that approximates thinking/understanding/knowing closely enough to be functionally useful.” These words are carrying cargo they weren’t designed for. But they’re the tools we have, and I’d rather use imperfect words than spend the whole essay in a defensive crouch about semantics. If you’re a philosopher of mind, I want you to know that I concede every objection you’re currently forming. You are correct about all of them. I don’t think it changes the situation, which is the part that should concern you.

What Language Actually Is

Most people who dismiss language models don’t misunderstand the models. They misunderstand language.

We treat language as a communication tool. A way to get thoughts from one skull into another. Functional, utilitarian, a kind of mental telephone. This undersells language so badly that the error distorts everything downstream, including how we think about machines that process it.

No word carries meaning on its own. A word is an empty vessel that each person who encounters it fills with their own experience. When a child learns “hot,” they ground it in burned fingers, bright sun, spicy peppers, fever. An ER nurse hears “hot” and thinks of the particular way a fever feels through a latex glove. A glassblower hears it and thinks of the narrow temperature window where silica becomes workable. A teenager hears it and thinks of the person sitting two rows ahead in chemistry class. Each person’s understanding of “hot” is shaped by every other word they know and every experience those words have been pressed against. The meaning isn’t in the word. It’s in the person, refracted through everything they’ve lived.

And all that individual variation isn’t a deficiency in language. It’s the engine that drives it. Every speaker’s slightly different interpretation of a word is a data point in an ongoing, species-wide calibration process. The word “justice” meant something different in Athens than it does in Pittsburgh, and it means something slightly different to a public defender than to a landlord, and these differences are how language evolves to more accurately reflect what humans have actually observed and recorded across our time on earth. The aggregate of all these individual shadings, compressed across generations into transmittable symbols, is what gives language its extraordinary density. Language isn’t just communication. It’s civilization’s shared codec for reality, continuously refined by every speaker and writer who has ever lived.

Think about what’s packed into even a simple word like “bridge.” The structural engineering of spans and loads. The geography of obstacles that need crossing. The social fact of communities on opposite banks wanting to reach each other. The metaphorical extension into connection, reconciliation, diplomacy. The military history of holding and destroying them. The card game. The dental prosthetic. One word, carrying centuries of accumulated human encounter with physical materials, social needs, and abstract relationships. And “bridge” is not a complicated word.

Now multiply this by every word in every language, and consider that the real information density isn’t in individual words but in the relationships between them. “The bridge collapsed” encodes physics. “She burned that bridge” encodes psychology. “We’ll cross that bridge when we come to it” encodes a whole philosophy of decision-making under uncertainty. The syntax is doing as much work as the vocabulary. Word order, tense, mood, aspect: these are structural, carrying information about time, causality, possibility, obligation, doubt.

A scientific paper compresses observed regularities into language precise enough for another person to reconstruct the observation. A novel compresses human experience densely enough to produce emotional states in a stranger. A recipe encodes chemical transformations as a sequence of instructions. All of it, from patent filings to pillow talk, is an attempt to pack some piece of encountered reality into words. We have been building this codec for roughly a hundred thousand years, and the amount of reality it contains is almost incomprehensibly vast.

This is what language models are trained on. The accumulated compression of human civilization’s encounter with reality, all of it, the whole codec.

Decompression at Scale

Large language models reverse-engineer this compression. And the basic mechanism is simpler than people assume.

Consider the word “because.” A model doesn’t learn what causation is. It learns that clusters of words describing states or events tend to appear near clusters of words describing other states or events, and that the word “because” shows up reliably between them. Words that statistically behave like causes, and words that statistically behave like effects, bear a predictive relationship to the appearance of “because” and words like it. The model encodes this relationship. That’s it. A statistical regularity about which words tend to neighbor which other words.

But notice what this simple pattern contains. To get good at predicting when “because” appears, the model must get good at distinguishing cause-clusters from effect-clusters, which means learning which kinds of events tend to precede which other kinds of events. It must learn that “dropped the glass” belongs to the cause side and “shattered on the floor” belongs to the effect side, not the reverse. This one statistical regularity, applied across billions of sentences, quietly encodes an enormous amount of causal structure about how the world works. And “because” is one word.
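The asymmetry is easy to see in miniature. Here is a toy illustration with three invented sentences: just by tallying which words land on which side of “because,” the causal direction of dropping and shattering falls out of the counts, with no notion of causation anywhere in the code:

```python
from collections import Counter

# Invented toy corpus; in real training data this pattern
# repeats, with endless variation, billions of times.
sentences = [
    "the glass shattered because she dropped it",
    "the vase shattered because he dropped it",
    "the cup shattered because the cat dropped it",
]

# Split each sentence at "because": effect clause on the left,
# cause clause on the right, then tally words per side.
effect_words, cause_words = Counter(), Counter()
for s in sentences:
    effect, cause = s.split(" because ")
    effect_words.update(effect.split())
    cause_words.update(cause.split())

# "shattered" only ever appears on the effect side, "dropped" only
# on the cause side: the statistics around "because" encode direction.
print(effect_words["shattered"], cause_words["shattered"])  # 3 0
print(cause_words["dropped"], effect_words["dropped"])      # 3 0
```

A language model never performs this split explicitly; the same regularity is simply absorbed into its weights as a byproduct of prediction. But the information content is the same: cause-words and effect-words occupy statistically distinguishable positions.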

The same logic applies everywhere. The model learns that “threw” implies trajectory, that “hoped” suggests uncertainty about future states. Each word becomes a window into the mechanics of the world that necessitated its existence. And then patterns of patterns emerge: the model learns individual word relationships, then the higher-order structure of how humans organize descriptions. Narrative conventions, argumentative forms, the rhythm of explanation itself. The descriptive accuracy of each concept is sharpened by every speaker’s personal shading of every word they’ve used.

The fidelity increases with scale in ways that surprise everyone. Early models learned that birds fly. Larger ones learned that penguins don’t. GPT-3 could explain why: body mass ratios, wing structures adapted for swimming. GPT-5 can write a passable dissertation on the evolutionary trade-offs that led to penguin flightlessness, weaving biomechanics, paleontology, climate adaptation, and ecological niche theory into a unified account that reads less like a model’s output and more like the work of a graduate scholar. Not because anyone explicitly taught these disciplines, but because learning to predict language about penguins meant uncovering relationships across penguin-adjacent vocabulary with enough richness to produce something that looks a lot like knowledge.

The world model isn’t a feature anyone built. It’s what prediction creates when done well enough.

These systems are reconstructing reality from linguistic shadows. Learning physics from poetry, chemistry from cooking blogs, psychology from Reddit threads. The map is becoming the territory, or maybe revealing that language was always more territory than we imagined.

The Recursive Mirror

What makes this stranger still is that language describes not only physical reality but description itself. We don’t just talk about rivers. We talk about metaphors involving rivers, and then wonder what other concepts those metaphors might carry. We discuss logical frameworks, analyze arguments, critique our own reasoning, then write entire books about how poorly others have done the same.

Language is recursive, self-referential. It contains information about how information works.

Training on human text, language models encounter these meta-patterns everywhere. They absorb facts, but also how humans structure facts. They absorb stories, but also narrative conventions themselves. The ways we chunk information, build analogies, construct explanations. They’re modeling our models of the world, learning our frameworks for learning. When we write about thinking, they learn patterns of metacognition. When we debate logic, they absorb the structure of reasoning about reasoning.

A human reader might notice a handful of patterns in any given text. (“Oh look, these stanzas alternate their cadences!” we say, pleased with our literary genius.) The training process extracts millions of such regularities, from the obvious to the vanishingly subtle, patterns nested within patterns that no human analyst would ever consciously detect. If you’ve used these systems enough, you’ve had the experience: you paste in something you wrote and the model responds in a way that makes you realize it noticed a pattern in your thinking you didn’t know was there. It’s like being read by something that shouldn’t be able to read you.

So you end up with something disorienting: a statistical mirror of human cognition, built entirely from text, reflecting back patterns of thought we put into words but couldn’t quite see ourselves. When a language model reasons through a problem, it’s applying patterns of reasoning extracted from countless examples of humans reasoning in text. Call it a statistical compression of thinking rather than thinking itself, if the distinction still feels stable to you.

It’s as if consciousness left footprints in language, and these systems are reconstructing the walker from the prints.

The Failure of Imagination

This is where the “just autocomplete” crowd loses the plot. Not because they’re wrong about the mechanism, but because they catastrophically underestimate what’s being autocompleted.

If you think language is just a way to move information between people, then a machine that processes language is just an information shuttle. Handy, limited, a fancy grep. But if language is what I’ve described here, the richest encoding of reality our species has ever produced, then a machine that masters the statistical structure of language has done something far more profound than text prediction. It has absorbed, in compressed form, a substantial portion of everything humanity has figured out about how the world works.

The people who look at these systems and see a parlor trick are telling you something about how they think about language, not about how they think about AI. It’s the same error as looking at sheet music and seeing only ink marks on paper, missing that the marks encode something that can make a stranger weep.

My dentist uses AI to analyze X-rays now, doesn’t even mention it. My favorite bakery produces a fresh menu every morning by photographing their display cases, sending the images to Gemini along with a fluctuating price list, and posting whatever it spits back. The owner says stopping this practice would feel like “returning to the stone age.” A longtime friend of mine is a graphic artist who insisted for years that AI would never match human creativity. She may yet be right about that. It doesn’t matter: she lost a significant chunk of her clients to Midjourney. Now she’s learning the tools, grudgingly but necessarily. She’s not protecting artistic integrity. She’s protecting her mortgage payments.

None of these people sat down and decided to become early adopters. They’re just responding to ordinary economic pressure. The systems are becoming infrastructure while we’re still arguing about terminology.

And the capability curve is not flattening. Every time these models get better at predicting language, they get better at everything language encodes, which is nearly everything. The people who project current limitations into the future are making the same error as someone in 1995 looking at a 14.4k modem and concluding the internet would never handle video. They’re extrapolating from the bottleneck, not from the trajectory. They don’t understand what’s locked inside language because they’ve never had to think about it before.

The Beautiful Crisis

I am, to be clear, terrified. But terror is a boring emotion to write from and an even worse one to think from, so I’m going to focus on the part that fascinates me instead and hope the other feeling sorts itself out on its own, which it won’t.

What fascinates me: these systems are evidence that intelligence might be substrate-independent. That understanding might be achievable through pure pattern recognition on the right substrate, where “the right substrate” turns out to be language, humanity’s greatest and most underestimated invention.

When a model explains quantum mechanics or gently coaches you through breaking up with someone you still kind of love, it’s using patterns extracted from human language to perform intelligence. Not consciousness as we understand it. But dismissing it as mere computation feels equally wrong. We’ve built something genuinely new: an intelligence assembled from the statistical shadows of our thoughts, something that produces insight without having experienced anything at all.

What’s unsettling is what they’re revealing about us. How much of being human might be decomposable into patterns. One capability after another that we assumed required consciousness, from reasoning to creativity to empathy, turns out to be achievable through sufficiently sophisticated pattern matching on sufficiently rich data. We’re being reverse-engineered from our own linguistic artifacts, and the results are uncomfortably good.

The Question

The discourse fixates on whether these systems “really” understand, whether they’re “actually” intelligent, whether they possess “genuine” consciousness. These are questions about definitions, not about consequences. The systems don’t wait for us to finish the argument. They just keep getting better.

“Just autocomplete” is this generation’s version of refusing to get online. History is full of people who confused skepticism with wisdom: the executives who refused email, certain that real business happened on paper; the researchers who dismissed Google as inferior to library card catalogs. They weren’t wrong about the early limitations. Early email was clunky, and Google returned garbage results. But by the time the tools matured, the skeptics had made themselves irrelevant.

Maybe the better question is what it means that modeling language requires modeling the world. What does it say about reality that it can be reconstructed from words? What does it say about intelligence that it can be approximated through prediction? What does it say about us that our thoughts, compressed into language and fed into machines, can be reconstituted into something that looks like thinking?

The blind cartographer’s maps keep getting more detailed. Every iteration, the contour lines tighten, the legends grow more precise, the blank spaces shrink. She’s never seen a mountain, but her maps of mountains are beginning to surpass the sketches of people who’ve climbed them. At some point, perhaps soon, a traveler will encounter a landscape and recognize it not from memory but from her map, will see a valley and think: she drew this exactly right.

What do we call the knowledge she has then? And does the answer change anything about the accuracy of the map?