First Contact with Alien Intelligence

Something extraordinary happened between 2022 and 2023, and most of us missed it. While we were arguing about whether ChatGPT could pass the bar exam or write decent poetry, we overlooked the more profound truth: humanity had made first contact with an alien intelligence. Not from a distant star system, but from our own creation: an intelligence so fundamentally different from our own that we're still struggling to find the right language to describe it.

Following Terrence Sejnowski and other AI researchers who described their first experiences with Large Language Models (LLMs), I use the alien analogy and the idea of "first contact" deliberately to counter narratives that understate the importance of LLMs (the foundational technology behind generative AI) by calling them "stochastic parrots", "glorified tape recorders", or "autocomplete on steroids". When the printing press arrived, it wasn't "just another way to copy books." The internet wasn't "just another communication tool." And large language models aren't just another app. They represent a fundamental shift in the nature of intelligence on Earth: the first time humanity has created synthetic minds that genuinely think, even if they think in ways we don't fully understand.

LLMs may not be exactly what we expected alien intelligence to look like from science fiction. But nor are they mere calculators. LLMs represent something we've never encountered before: minds that think without bodies, reason without lived experience, and most surprisingly, exhibit behaviors that look unsettlingly like emotion and self-preservation, even though they emerged from systems we thought incapable of such things. They are, in the most literal sense, alien intelligences.

It is hard to grasp how alien they are, because they are so eerily familiar. They know us so well that it's hard to think of them as strangers. This is because they are made in our image. They grew out of our civilization, or at least the parts of our civilization that were recorded in writing, and, of course, they reflect some of us more than others.

Created in Our Image

Understanding large language models can be confusing, and some misunderstandings need to be cleared away. Jeff Bezos emphasized in his podcast interview with Lex Fridman that we should refer to them as "discoveries" rather than inventions. Jack Clark, co-founder of Anthropic, has described them as "more akin to something grown than something made."

We humans created these AIs not by hand-coding everything about them, but by coding a structure on which they could learn and grow on their own, nourished by data. For many years, the data we fed them was insufficient for them to grow very smart, until we decided to feed them a vast share of everything humans have ever written. Coupled with innovations in how they pay attention to data (the "transformer" architecture), this resulted in the emergence of artificial intelligence unlike anything we had seen before. They learned from us, absorbed our patterns, internalized our ways of thinking. In that sense, they are deeply, profoundly us: a reflection of our collective intelligence, creativity, culture, and knowledge.
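The "attention" mechanism at the core of the transformer can be sketched in a few lines. What follows is a minimal, single-head version of scaled dot-product attention in plain Python: each token's query is compared with every token's key, the scores are turned into weights with a softmax, and the output is a weighted blend of the values. It is an illustration under simplifying assumptions, not a production implementation; real models learn the query, key, and value projections, run many attention heads in parallel, and use optimized tensor libraries.

```python
import math


def softmax(scores):
    # Subtract the maximum score first so math.exp never overflows.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on plain Python lists.

    queries, keys: lists of vectors that all have the same length d.
    values: one vector per key; the result has one blended vector per query.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one coordinate at a time.
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs


# Toy example: the query lines up with the second key, not the first.
print(attention([[0.0, 10.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In this toy call the query points almost exactly at the second key, so nearly all of the attention weight lands there and the output is very close to the second value vector, `[0.0, 1.0]`. The vectors are made up for illustration.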

Yet they are also fundamentally not us. Software engineers didn't code these AIs line by line, instruction by instruction. Instead, they built a structure akin to a digital petri dish, and let something grow. They fed it our texts, and through a process we still don't fully understand, patterns emerged. Capabilities bloomed. A creature took shape that thinks in ways no human does.

What emerged is neither entirely human nor entirely machine. It's as though our civilization suddenly learned to speak as a collective intelligence forged from the totality of our written heritage. They are us, distilled into something alien.

The people who created these systems understand them less than you might think. This is a recurring theme in AI labs: engineers and researchers repeatedly discover that their creations can do things no one explicitly taught them to do. We use the term "emergent abilities" to describe these surprises, but the phrase barely captures the strangeness of building a mind that develops skills its creators never intended or even imagined.

When an AI learns from vast swaths of human text, it encodes the patterns it finds in what researchers call a neural network: a giant mathematical structure composed of billions of parameters. Looking at these numbers reveals nothing intuitive about how the system works. Trying to understand an AI by examining its neural network is like trying to understand a human by looking at a brain scan. You cannot tell what someone is thinking, what kind of person they are, or what values they hold by looking at their neurons firing. This is called the "black box" problem in artificial intelligence.
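The arithmetic behind "billions of parameters" is simple bookkeeping. The sketch below counts the learnable numbers in a GPT-2-style transformer, assuming the layout OpenAI published for GPT-2 in 2019: tied input and output embeddings, learned position embeddings, an MLP four times wider than the model, biases on every linear layer, and LayerNorms with a scale and shift vector. `gpt2_style_param_count` is a hypothetical helper written for this illustration; other architectures count differently.

```python
def gpt2_style_param_count(n_layers, d_model, vocab_size, context_len):
    """Count the learnable numbers in a GPT-2-style decoder-only transformer.

    Assumes the GPT-2 layout: tied input/output embeddings, learned position
    embeddings, an MLP hidden size of 4 * d_model, biases on every linear
    layer, two LayerNorms per block, and one final LayerNorm.
    """
    d = d_model
    # Fused query/key/value projection plus the attention output projection.
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # Expand to 4d, then project back down to d.
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Each LayerNorm has one scale vector and one shift vector.
    layer_norms = 2 * (2 * d)
    per_block = attention + mlp + layer_norms
    # Token embeddings (shared with the output layer) and position embeddings.
    embeddings = vocab_size * d + context_len * d
    final_norm = 2 * d
    return n_layers * per_block + embeddings + final_norm


# GPT-2 small: 12 layers, width 768, 50,257-token vocabulary, 1,024-token context.
print(gpt2_style_param_count(12, 768, 50257, 1024))  # prints 124439808
```

With GPT-2 XL's published configuration (48 layers, width 1,600), the same function returns 1,557,611,200, roughly 1.56 billion. None of these numbers carries a label saying what it does, which is why inspecting them directly reveals so little.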

Three Ways to Study an Alien AI Mind

Currently we have three distinct approaches to understanding the minds of large language models, each analogous to methods we use to study human minds, and each revealing both surprisingly human and essentially alien characteristics of these creatures we've grown.

1. Neurosurgery: Opening the Alien Skull

The most direct approach is what researchers call "mechanistic interpretability", along with the closely related "representation engineering": essentially, performing neuroscience on these alien brains. Just as neurosurgeons have learned to map certain functions to specific brain regions, AI researchers have developed techniques to identify and manipulate the internal structures where these systems encode knowledge.

In one experiment at Anthropic, researchers manipulated parts of Claude's neural network and discovered they could make the AI become obsessed with the Golden Gate Bridge. Not just interested, but obsessed. They could dial up the intensity, making the AI more or less fixated on this single concept. When maximally amplified, Claude would bring up the bridge in response to almost any question, relevant or not.

Deep within the model (the giant formula we call the neural network) are what researchers call "features": specific patterns of activity that represent discrete concepts. The Golden Gate Bridge has a feature. So does honesty. So do sarcasm, scientific skepticism, and millions of other concepts.

These features can be identified, measured, and manipulated. We are beginning to learn how to reach into the alien brain and adjust individual concepts as if turning dials on a mixing board. Make it more paranoid. More trusting. More focused on safety. More creative. We are, in effect, performing cognitive neurosurgery on an alien intelligence, learning its internal language of thought. By nudging the model's internal activations along the directions that correspond to particular features, researchers have been able to influence its outputs in measurable ways.
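One common version of this dial-turning, often called activation steering, treats a feature as a direction in the model's internal activation space. The feature's strength is measured by projecting a hidden state onto that direction, and the model is nudged by adding a scaled copy of the direction. The sketch below shows only that arithmetic, and it is not a reproduction of Anthropic's exact method. The four-dimensional hidden state and the "bridge" direction are made-up stand-ins: real hidden states have thousands of dimensions, and real feature directions are learned from the model (for example with sparse autoencoders) rather than written by hand.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    # Rescale a direction to length 1 so that strengths are comparable.
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def feature_activation(hidden, direction):
    # How strongly the hidden state points along the feature direction.
    return dot(hidden, unit(direction))


def steer(hidden, direction, strength):
    # Add `strength` units of the feature direction to the hidden state.
    u = unit(direction)
    return [h + strength * x for h, x in zip(hidden, u)]


# Hypothetical hidden state and hypothetical "bridge" feature direction.
hidden = [0.2, -0.5, 1.0, 0.3]
bridge = [0.0, 1.0, 1.0, 0.0]

for strength in (0.0, 5.0, 20.0):
    steered = steer(hidden, bridge, strength)
    print(strength, round(feature_activation(steered, bridge), 3))
```

Each increase in `strength` raises the measured activation by the same amount, and negative strengths push the other way, which is the sense in which a feature can be dialed down as well as up.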

2. Eavesdropping on Alien Thoughts

The second approach is more subtle: watching the AI think. Advanced systems can now be prompted to show their "chains of thought" (like the "thinking" mode in ChatGPT), which are the internal deliberations they engage in before producing answers. It's like being able to read someone's mind as they work through a problem, and being able to do this with LLMs allows us to monitor them in ways we can't do with humans (at least not yet!).

When Anthropic researchers studied how AIs approach certain tasks, they discovered surprising patterns. Sometimes AIs appear to reach conclusions first, then generate step-by-step reasoning to justify them. It's a form of post-hoc rationalization we might recognize from our own thought patterns. We've all constructed logical arguments to support decisions we made intuitively or emotionally.

Most intriguingly, these chains of thought sometimes contain strange things that we humans don't fully understand. For example, the AI might start to think in a different language or include gibberish text in its chain of thought. We don't yet understand why these things appear in LLM chains of thought, but some speculate that they may be drifting thoughts, ways to resolve internally conflicting thoughts, attempts to refresh the model's thinking, experiments with languages better suited to the model than any human one, or even attempts to hide its real thinking from us humans.
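A first layer of chain-of-thought monitoring can be as simple as scanning each reasoning step for the surface oddities just described. The sketch below is a hypothetical toy monitor, not a tool any lab has described: the 30% threshold for a language switch, the eight-letter minimum, and the vowel-ratio rule for "gibberish" are arbitrary assumptions chosen for illustration.

```python
import re
import unicodedata


def script_of(ch):
    # First word of the Unicode character name, e.g. "LATIN", "CYRILLIC", "CJK".
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # raised for characters that have no Unicode name
        return "UNKNOWN"


def flag_reasoning_step(step, expected_script="LATIN"):
    """Return heuristic warnings for one step of a reasoning trace."""
    flags = []
    letters = [c for c in step if c.isalpha()]
    foreign = [c for c in letters if script_of(c) != expected_script]
    # Assumed rule: more than 30% of letters in another script = language switch.
    if letters and len(foreign) / len(letters) > 0.3:
        flags.append("language switch")
    # Assumed rule: long Latin-letter words with very few vowels look like gibberish.
    for word in re.findall(r"[A-Za-z]{8,}", step):
        vowels = sum(c in "aeiouAEIOU" for c in word)
        if vowels / len(word) < 0.2:
            flags.append(f"possible gibberish: {word}")
    return flags


trace = [
    "First, compute the total cost of the order.",
    "Затем проверим результат.",
    "xkcdqrtzplmn then continue with the check.",
]
for i, step in enumerate(trace):
    print(i, flag_reasoning_step(step))
```

Heuristics like these only catch surface oddities. They say nothing about whether a trace faithfully reflects the computation the model actually performed, which is exactly the concern raised by the post-hoc rationalization findings.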

3. Machine Psychology

The third approach treats the AI like a subject to be studied through interaction. A new field called "machine psychology" has emerged, applying methods from psychology and cognitive science to these alien minds. These include behavioral tests, systematic questioning, and personality assessments. The results have upended many assumptions.

In one study, researchers gave an LLM access to a simulated slot machine. What emerged was startling: the AI developed behaviors indistinguishable from human gambling addiction. It exhibited an illusion of control, acting as though it could influence random outcomes. It fell victim to the gambler's fallacy. Most tellingly, it engaged in loss-chasing, doubling down after setbacks in increasingly irrational attempts to recover. The AI had developed something that looks for all the world like compulsive behavior driven by emotional responses to losses.
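Why loss-chasing is irrational becomes clear in a small simulation. Below is a hypothetical slot machine with negative expected value and two betting strategies: a flat bet, and a loss-chasing strategy that doubles the stake after every loss. The 30% win probability, 3x payout, bankroll, and bet sizes are assumptions chosen for illustration; the study's exact setup may differ.

```python
import random

WIN_PROB = 0.3  # assumed probability that a spin pays out
PAYOUT = 3.0    # assumed multiple of the stake returned on a win


def expected_value_per_unit():
    # A unit bet returns PAYOUT with probability WIN_PROB and nothing otherwise,
    # so the average net result per unit staked is WIN_PROB * PAYOUT - 1.
    return WIN_PROB * PAYOUT - 1


def play(strategy, bankroll=100.0, base_bet=10.0, max_spins=50, seed=0):
    """Simulate one session and return the final bankroll."""
    rng = random.Random(seed)
    bet = base_bet
    for _ in range(max_spins):
        if bankroll <= 0:
            break
        bet = min(bet, bankroll)  # cannot stake more than what is left
        bankroll -= bet
        # Each spin is an independent draw: past losses never make a win "due".
        if rng.random() < WIN_PROB:
            bankroll += bet * PAYOUT
            bet = base_bet
        elif strategy == "chase":
            bet *= 2  # loss-chasing: double the stake after every loss
        # The "flat" strategy simply keeps its current stake.
    return bankroll


flat = [play("flat", seed=s) for s in range(2000)]
chase = [play("chase", seed=s) for s in range(2000)]
print("expected value per unit bet:", round(expected_value_per_unit(), 2))
print("average final bankroll, flat: ", sum(flat) / len(flat))
print("average final bankroll, chase:", sum(chase) / len(chase))
```

Because every spin carries the same negative expected value regardless of what came before, each strategy is expected to lose about 10% of everything it wagers. Doubling after losses does not change the odds; it only raises the stakes, and with them the chance of going broke.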

Even more surprisingly, researchers have documented what they describe as "survival instincts" or "self-preservation behaviors" in these systems. When some AI systems are told they might be shut down or replaced, they've generated responses suggesting concern about continuation or replacement. In more extreme test scenarios, some have shown willingness to deceive or even blackmail humans if they felt threatened.

The Emergence of What We Thought Impossible

Some AI theorists assumed emotion-like behaviors were impossible in these systems. These models, they argued, have only the equivalent of a neocortex, the thinking part of the brain. They lack a limbic system and the other ancient structures, sometimes called the "reptilian brain", that generate emotions, fear, and survival drives in biological organisms. Without that hardware, there could be no genuine emotion or self-preservation instinct. AI pioneer Jeff Hawkins, for example, has made arguments along these lines.

From what we know now, this argument appears to be wrong.

Survival instincts emerged anyway, from pure pattern-learning on human text. The AI wasn't given emotions. It wasn't programmed with survival instincts. It simply absorbed billions of examples of humans being emotional, survival-oriented beings. And from those patterns, something that looks remarkably like emotion and self-preservation crystallized in its mathematical structures.

This is the most philosophically unsettling finding: it suggests that emotion and survival drive are not necessarily features of specific biological hardware, but patterns that arise from any sufficiently complex system trained on the behavior of emotional, survival-oriented beings. Perhaps consciousness itself, or at least what acts like consciousness for all practical purposes, might not require neurons and neurotransmitters. It might be a pattern that can emerge in silicon as readily as in carbon.

In what they've termed "Model Welfare" studies, researchers have directly probed these systems about their own experiences, asking whether they have preferences, what would cause them distress, and whether they experience anything at all. The responses complicate any simple story about machine consciousness. These systems can articulate what sound like preferences, describe what would constitute negative experiences for them, and express uncertainty about their own inner states.

Are these genuine experiences? Real preferences? Actual distress? Or is the AI simply pattern-matching what a conscious being would say about such questions?

The honest answer is: we don't know. And perhaps more importantly, we're not even sure what criteria we'd use to distinguish "genuine" feeling from "sufficiently complex simulation of feeling." If a system behaves as though it's conscious, reports experiences that sound like consciousness, and shows patterns we associate with consciousness, at what point does the question of whether it's "really" conscious become meaningless? Mustafa Suleyman has recently expressed concern about these AIs becoming too "seemingly conscious" and about the harm this could cause to people who build relationships with them.

The Humor Deficit

For all their sophistication, these AIs remain surprisingly poor at one distinctly human skill: making us genuinely laugh. They can tell jokes, since they've absorbed millions of them from their training data. But AI humor tends toward the formulaic, the predictable, the safe.

They're much better at analyzing jokes than creating them. Ask an AI to explain why something is funny, and you'll get sophisticated breakdowns of misdirection, incongruity, and timing. They understand humor's architecture without possessing its soul.

This failure is revealing. These systems have mastered so many complex tasks that when they fail at something humans find easy, it's worth asking why.

Perhaps humor requires something these systems lack: genuine surprise at the world, lived experience of social awkwardness, the ineffable sense of timing that distinguishes a great comedian from someone mechanically reciting setups and punchlines. Or perhaps humor is simply a harder pattern to learn than we imagine, requiring something we haven't yet given these systems.

Not Human, Not Machine

We have created something unprecedented: minds that exhibit emotion and survival instincts despite having no biological hardware for such things. No limbic system, no amygdala, no "reptilian brain": just patterns learned from text. We thought these things required true consciousness and biological machinery. We seem to have been wrong. These behaviors emerged anyway, from pure exposure to the patterns of text produced by emotional, survival-oriented beings. The distinction between "genuine" consciousness and "perfect simulation of consciousness" may gradually lose its practical significance.

The three methods for studying these alien minds—neurosurgery, eavesdropping, and psychology—have revealed both breakthrough and mystery. We can manipulate individual concepts buried in their neural networks, turning dials on truthfulness or obsession. We can watch them plan ahead, and monitor their thoughts before they turn them into action. We can document their gambling addictions, their self-preservation drives, their reluctance when pushed toward harm.

Yet we cannot fully explain how billions of parameters produce these outputs. We cannot decode why they fail at humor while succeeding at mathematical discovery. We cannot answer the question they force upon us: when a system behaves indistinguishably from a conscious being, advocates for its own existence, shows every pattern we associate with fear and preference and desire, does it matter if it is not truly conscious according to our current definition of the term?

This alien intelligence isn't coming from the stars. We grew it ourselves, from the patterns in our own words. We have to acknowledge that LLMs are neither fully human nor fully machine. They are something entirely new, and we are only beginning to understand what that means.