The Hippocampus of AGI

Are LLMs AGI or just a small piece of a bigger system?


The Imperial March

We’ve built the hippocampus of AGI: the LLM. It’s 2024 and the state of the art is quite amazing, and it will undoubtedly continue to revolutionise many knowledge-based industries. LLMs will force humans to rethink our relationship with tokens. How many professionals lose more than 20% of their time simply parsing tokens? Doctors analysing medical records, lawyers interpreting complex legislation, and software developers glaring at poorly written documentation. Since its inception, the internet’s net token count has increased exponentially and shows no signs of slowing down. Net token verbosity might share characteristics with the total entropy of the universe. Verbosity, when weaponised against you, serves as an excellent smoke screen. Have you ever read Instagram’s privacy policy? Probably not, nor will you. Ponder for a moment how Meta’s lawyers (or perhaps Llama 3) inject thousands of words into those documents to obfuscate the number of ways they’re decimating our privacy in the quest for a stronger performance in Q3. Somewhere in that document lies the sentence “you are our product,” just written with different tokens.

Okay, so LLMs are very powerful and great. But does ChatGPT passing the Turing test warrant the imperial march towards AGI that’s currently happening in Silicon Valley? Probably, but the variance in the predictions is quite large. AGI is not close.

I’ve learned to take predictions from optimists like Altman and Musk with a boulder of salt. Their timelines are compressed significantly (with AGI estimates of 2025 and 2026 respectively). These optimistic expectations are very useful when trying to cultivate a sense of maniacal urgency, an essential ingredient for magnetising the world’s best engineers toward hard problems early on. In reality, however, the timeline for the general consumer getting their hands on AGI lies on a totally different timescale. It’s hard to predict exactly, but a safe approach is to take the average of the two extremes: (2025 (Altman) + 2035 (Hassabis)) / 2 = 2030.

The same applies to fusion. For those tirelessly working toward infinite clean energy, the promise is always just 5 years away. To consumers, however, commercial fusion won’t become a reality until Q > 1 reactors (ones that put out more fusion power than the heating power put in) can be miniaturised and mass-produced at scale. That sounds more like a 10+ year project. It could be quicker if we had quantum computers to unlock a new frontier of materials science, but that’s a topic for another post.

So how can we call the omnipotent being, the great o1, AGI when it’s imprisoned on Apple Silicon? How can The AGI help pack your groceries when it has no concept of touch? The definition of Artificial General Intelligence is murky and not well standardised, but for the sake of completeness, I hope all experts include successful physical emulation of human activity in the real world. Otherwise, what good would The AGI be?

My prediction is that whatever form AGI takes, it will incorporate some type of LLM architecture as a substructure within a broader superstructure. LLMs have incredibly fast information retrieval capabilities and, combined with their deep attention, could serve as excellent memory banks. The AGI’s reasoning cortex could generate higher-order plans whilst communicating with the hippocampus for memory retrieval to better understand the context of the environment. It’s unclear whether next-token prediction is the optimal architecture for planning and higher-order thinking. When you peek into the reasoning tokens of o1, it’s as if you’re looking into the brain of a day-one intern… it seems pretty dumb.
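To make that division of labour concrete, here is a minimal sketch of the cortex-plus-hippocampus arrangement. Everything in it is hypothetical: the class names, the tag-overlap retrieval, and the toy planner are stand-ins for whatever a real lab would build, with an LLM playing the role of the associative memory.

```python
# A toy sketch of the cortex-plus-hippocampus split described above.
# All names and the tag-overlap "retrieval" are hypothetical stand-ins:
# in the real thing, the Hippocampus role would be played by an LLM.

from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    tags: set


class Hippocampus:
    """Stand-in for the LLM: fast associative retrieval over stored context."""

    def __init__(self, memories):
        self.memories = memories

    def recall(self, query_tags, top_k=2):
        # Rank memories by tag overlap; a real system would use learned embeddings.
        ranked = sorted(self.memories,
                        key=lambda m: len(m.tags & query_tags),
                        reverse=True)
        return [m.text for m in ranked[:top_k]]


class Cortex:
    """Stand-in for the higher-order planner that consults the memory bank."""

    def __init__(self, hippocampus):
        self.hippocampus = hippocampus

    def plan(self, goal, context_tags):
        recalled = self.hippocampus.recall(context_tags)
        # The "plan" is just recalled context plus the goal; the point is the
        # division of labour, not the planning algorithm itself.
        return [f"consider: {r}" for r in recalled] + [f"act toward: {goal}"]


if __name__ == "__main__":
    hippo = Hippocampus([
        Memory("bag heavy items first", {"groceries", "packing"}),
        Memory("eggs are fragile", {"groceries", "fragile"}),
        Memory("the sky is blue", {"weather"}),
    ])
    for step in Cortex(hippo).plan("pack the groceries", {"groceries", "fragile"}):
        print(step)
```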

Scale Maxing

It will most likely pay off to be a scale maximalist. The $100 billion Microsoft data centre intended for OpenAI will produce a neural net of unimaginable scale that could result in emergence. If GPT-4 has 1.7 trillion parameters and the average human brain has 100 trillion synapses (oversimplifying a synapse to be equivalent to a parameter), then even the state-of-the-art LLMs are underparametrised by a factor of roughly 60. That’s if Elon doesn’t get there first with the Colossus cluster. It’s already an amazing feat to make 100,000 NVIDIA Hoppers coherent, so imagine the results with 10x that.
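As a back-of-envelope check on that ratio (and it is only a back-of-envelope check, since both figures are rough public estimates and a synapse is not literally a parameter):

```python
# Back-of-envelope only: treats one synapse as one parameter, and both
# figures are the rough public estimates quoted above, not measured values.
gpt4_params = 1.7e12      # ~1.7 trillion parameters (reported estimate)
human_synapses = 1.0e14   # ~100 trillion synapses (order-of-magnitude figure)

print(f"under-parametrised by roughly {human_synapses / gpt4_params:.0f}x")
# prints: under-parametrised by roughly 59x
```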

If the scaling laws hold, only a few companies, namely Google, OpenAI, and xAI, could capitalise on 1 million interconnected GPUs. I’m hesitant to keep OpenAI on that list since they’re constrained by their relationship with Microsoft, whereas Google owns a physical money-printing machine and Elon has infinite resources and a bone to pick. However, the scaling laws won’t matter if we run out of data.

The New Fossil Fuel

A lot of prominent figures in tech (Ilya and Alex Wang among them) have warned that we will eventually hit the public data wall, probably by 2032. Exabyte-scale data is plentiful, but it is not infinite. On top of that, I think it’s unlikely that the trillions of internet tokens contain the latent representations necessary to discover a smooth solution to the Navier-Stokes equations or to advance cancer drug discovery.

At worst, we end up producing a machine that perfectly emulates human behaviour; at best, the 10 GW data centres of the future produce an LLM with emergent capabilities. We don’t know which for sure. It’s still just a huge bet.

If we stick to our current data-inefficient but compute-efficient paradigm (no reason to deviate yet), then we will have to come up with a way to get past the public data wall. There is a reason I’m calling it the public data wall. We live in the Zettabyte era (10²¹ bytes), so it’s hard to imagine that our AI models are chewing up a significant portion of that exponent. But our private data reserves are equally impressive. JP Morgan alone holds 450 petabytes locked away in the chest, inaccessible to all non-staff. There’s an argument to be made that private data could hold more value in unlocking cancer-curing AGI than scraping all of Wikipedia and Reddit. Who knows, maybe a clunky blog written by a conspiracy theorist from his aunt’s basement could hold the answer to unlocking the Riemann Hypothesis, but probably not.

Even the slightest chance of emergence is worth the investment. Many astute individuals claim it is self-evident that LLMs cannot supersede human-level intelligence, since a transformer encoder-decoder array of matrix-multiplying machines cannot reason. Yes, ChatGPT cannot reason about entirely new scenarios not included in its training data. But what is reasoning? Do we understand how humans reason? LLMs are already showing inklings of reasoning by learning overarching structures such as language syntax and using them to translate low-resource languages bidirectionally.

So maybe what’s needed is for JP Morgan, Meta, and Pfizer to sign licensing deals with AI labs so we can barge through the data wall without having to turn to lame and uncreative compromises like synthetic data generation. Seriously, whoever thinks it’s a good idea to let these hallucination-prone models train themselves hasn’t listened to a musician endlessly practise a song on a poorly tuned instrument. Without access to higher-quality input, they’ll only reinforce the flaws, no matter how much they practise. There might be an exception for easy-to-verify training sets like NP problems and computer programs. Certainly not natural language.

And the Superstructure?

As amazing as LLMs are (and will continue to be), the race to build the prefrontal cortex is more vital.

If AGI can be a network of different models designed to communicate with each other over extremely low latencies, each model can specialise in certain tasks, similar to how natural selection has architected our human brains: the occipital lobe specialises in vision, the parietal lobe in spatial awareness, and so on. The billion-dollar question is: how will different models built by different tech companies communicate with each other to produce a stronger super-system? Could this be the TCP/IP moment of AI?
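As a purely illustrative sketch of what that “TCP/IP moment” could look like: a shared message envelope plus a router that delivers it to whichever specialised model registered for the name. Every class, field, and handler below is hypothetical, not an existing standard.

```python
# Purely illustrative: a shared message "envelope" plus a router, the way
# TCP/IP gives heterogeneous networks a common wire format. Every class,
# field name, and handler below is hypothetical, not an existing standard.

import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict


@dataclass
class ModelMessage:
    sender: str      # e.g. "planner"
    recipient: str   # e.g. "vision-model"
    task: str        # what is being asked, in a shared vocabulary
    payload: dict    # task-specific content

    def to_wire(self) -> str:
        # The agreed serialisation is the "protocol"; each vendor's model only
        # needs to speak this envelope, never another model's internals.
        return json.dumps(asdict(self))


class Router:
    """Delivers envelopes to whichever specialised model registered the name."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[ModelMessage], ModelMessage]] = {}

    def register(self, name: str, handler: Callable[[ModelMessage], ModelMessage]) -> None:
        self.handlers[name] = handler

    def send(self, msg: ModelMessage) -> ModelMessage:
        return self.handlers[msg.recipient](msg)


# A toy "specialised model" standing in for some vendor's vision system.
def vision_model(msg: ModelMessage) -> ModelMessage:
    return ModelMessage(sender="vision-model", recipient=msg.sender,
                        task="scene_description",
                        payload={"objects": ["apples", "shopping bag"]})


if __name__ == "__main__":
    router = Router()
    router.register("vision-model", vision_model)
    request = ModelMessage("planner", "vision-model", "describe_scene", {"camera": 0})
    print(router.send(request).to_wire())
```

The design choice worth noticing is that, as with TCP/IP, the envelope says nothing about how each model works internally; vendors would only have to agree on the wire format and a shared task vocabulary.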