
Synthetic Brains & Steam Trains

Because the Next Big Technology Shift Is Happening

The next big technology shift is happening, so it is worth knowing what is going on.


Part I: Historical Patterns

Introduction

This book goes into some detail on topics related to the current revolution, which I'll call "Synthetic Brains." That said, it's worthwhile to work backwards. How did we get here? I hear various experts refer to railroads or fiber optics in passing as analogies for data centers. It's thin. My writing is concise but it's not thin; it will give you the intellectual weapons to wrestle with today's technological revolution. Look at you! Fifty pages and you'll be the life of the party: mentioning the bribery of Parliament MPs in the railway boom of the 1840s, or the botched execution by electricity that Thomas Edison promoted. These first pages will instantly put you in the 95th percentile. And then if you read the rest? Well then you'll be thrilled! Whether it's useful or not, you'll have some opinions. And those opinions may be extremely important as this next revolution gets going. So let us get into it. Before writing more it's worth discussing the printing press, and a little-known fact about its creator, Gutenberg: he went broke.

Gutenberg goes broke! Let me tell you a few things about Johann Gutenberg’s bankruptcy, because it matters perhaps even more than you might think.

Gutenberg solved an incredibly complex problem. He figured out how to produce 300 copies of a book for roughly the cost of one book. That's not a small improvement. That is a restructuring of the economics of knowledge, because the written word was knowledge for those who could read. And so what does he do? He prints Bibles. Now he has 300 Bibles sitting in a room in Mainz, and he discovers that he has an abundance, an overcapacity, or perhaps you would call it a distribution problem. Maybe there is demand, but the problem he confronted at the time was that nobody had ever made so many Bibles, and it turns out that in the landlocked German town where he lived, there were only a handful of people who were legally permitted to read the Bible. This is the 1450s. Perhaps only the priests can read the Bible. Literacy is rare anyway. And so what does he do? He sells seven of them, perhaps a few more. But generally speaking, the historical record suggests that the large, large majority of Bibles had absolutely no path to market. Gutenberg's creation leads to his bankruptcy. Then come the bankers: a creditor named Johann Fust seizes the press. What enjoyment he must have had, having access to the creation that allows this circulation of ideas. Fust decides he'll go into the printing business himself; he owns the equipment now, so why not? Well, he goes bankrupt, too. The overhead is crushing him. It's a high-fixed-cost business. You have capacity, but that capacity has to be utilized. He spent an enormous amount of money on production, and the things he produced couldn't move, because the inventory is books, and there are only so many literate people in any given radius, and he sold all he could. Then come Gutenberg's apprentices, who scatter across Europe and build their own presses. Maybe they can make it work. No, they cannot. They go bankrupt. Build a printing press and go bankrupt. The same story over and over for decades.

Remarkably, great innovation seems to begin with the bankruptcy of the innovators. It took 40 years before a printer anywhere in Europe could reliably make any money. It was about 40 years. I think about that number when people wonder why things like artificial intelligence haven't shown up yet in productivity statistics, or why the gains from the internet took so long to materialize, or why the Liverpool and Manchester Railway opened in 1830, proved that it worked, paid a dividend for eight consecutive years, and still, it took another 15 years before Britain actually had its railway mania. Edison flipped the switch on Pearl Street in 1882, demonstrating electric light right in front of J.P. Morgan himself, and it worked. The trains worked. The lights worked. It didn't reach American farms for 50 more years. The point is perhaps too obvious now. The space between capability and large dissemination is not some kind of anomaly. It is a pattern. It's always the pattern. Let us return to the story of the printing press. Someone goes to Venice. Venice is the airport hub of the Mediterranean. That's really the only way to understand what Venice was in today's terms. If you were sailing from anywhere to anywhere, you went through Venice. You changed boats in Venice. You would trade in Venice. Every route in the known world, which really meant the European and Mediterranean world of the time, would typically have some reason to pass through Venice, and so naturally, you had a lot of distribution opportunities. Ship captains from 30 cities were in port at any given time, loading cargo for countless destinations. Well, finally, we have a distribution mechanism. Finally, we have a way to use this capacity of knowledge. So what happens? You print 300 Bibles. You give 10 Bibles to each of the 30 ship captains who are going to 30 different cities.

They sell those Bibles. You get paid, and you have the first economically sustainable model for the printed word in circulation. It's not a bookstore. It's not a publisher. It's the hub system. It's geographic, in this case: networks that could get the product to where it was needed, which only arrived decades after the product itself was available. In this case, the unused capacity was unshared knowledge. With fiber optics, it's unused dark fiber. With railways, it's unused track between two locations that have no economic rationale to be connected. The list goes on and on. The reason I bring this up is to notice, in the pages ahead, some historical parallels that people often miss because we live our whole life in the moment. The arrival of the printing press was not spectacular, and neither, really, was that of any of the major engines of technology. It seeps in gradually, and because the builders believe capacity can never be enough, and the excitement is so overzealous, overcapacity persistently meets low demand or high skepticism, and that continues until an equilibrium is finally reached. Our vocabulary, as we discuss these things, shifts. We don't even talk about adoption anymore. People talk about diffusion, but it's a slow diffusion. The gradual spread of a major technological innovation is slow relative to life expectancy and certainly relative to investment expectations. That was printing, that was electricity, that was the internet. The most consequential technologies are often the most abstract: a blank piece of paper, an entirely new way of industrializing an economy through electricity. It takes a substantial time for the entire world to understand these creations and build around them. Often it takes enough time that it becomes, arguably, a demographic question. There's another thing that happens at the end of this process, which is that the technology disappears into the background.

As I'll discuss, it vanishes from perception as soon as it becomes commonplace. I was looking recently at a newspaper from 1929, and the best part of old newspapers, for those who enjoy buying the occasional old newspaper, is without question the advertisements. You have a suit for a dollar, shoes for 25 cents. And there, in the middle of it all, in a 1929 newspaper, there is a radio for $70. Everything else had prices you'd expect once you adjust for a century of inflation, and yet the radio was priced like a luxury item, like a special marvel. Because at the time it really was, and at the beginning it is expensive. Any technological change is expensive at first, and strange, and visible, and then it iterates, and the technology deflates in value, gets cheaper and cheaper as it gets embedded. And then one morning you wake up and you just turn on the lights and notice that you can buy a radio on Amazon for $5. You see, in our minds it's not special anymore, it's not technology anymore, it's just something that evaporated into the background. And that's what success looks like at the end for technology. It's when it disappears. Finally, it may be worth saying, now that we're in another one of these technological phases, that the press itself didn't produce a single revolution. It produced a sequence of revolutions, and this is just a philosophical argument, or what some might call navel-gazing, but I think it's worth saying. You get the Bible in the 1450s, and that's one wave, and then in the 15-teens you suddenly have pamphlets. It's like, oh, Netflix on the internet. Pamphlets are the Netflix of the printing press. Pamphlets are cheap, fast to produce, small enough to travel. And then what happens? You have news. It's almost like software as a service on the internet. You can now understand what's happening in another country. You can get an argument from Wittenberg to London in 17 days. That is an entirely different kind of revolution that's hard for us to conceive of.

This led to all sorts of things, from the Reformation onward, when Martin Luther could circulate pamphlets faster than the Roman Catholic Church could respond to them. Things change because of these new capabilities once they're adopted. When it used to take months to move a document across Europe, organized resistance was almost impossible to sustain. And so naturally, if it only takes 17 days, there could be political and military ramifications. And yet the underlying technology never changed. It was just the penetration of the technology. It's not easy to see these things because, again, we live in the moment. We live relative to our own life expectancy, which makes history completely counterintuitive. Western Union looked at the telephone and called it, in an internal memo, too limited to be seriously considered as a practical form of communication. They weren't stupid. They were deeply embedded in a world shaped by the telegraph. For them, the telephone would be like explaining color to someone who is blind. Harry Warner of the great movie studio was presented with a new idea: now, with sound, we could have movies where actors would talk. When he first heard this, he said in complete shock, who the hell wants to hear actors talk? Steve Ballmer gave the iPhone no chance of market share. Again, are we criticizing these people? No. Those aren't failures of intelligence or insight. It's just being in a position where one is blinded to these possibilities. If you spend decades benefiting from a certain worldview, a concept outside of that worldview is simply invisible, or hard to contemplate, or perhaps upsetting. I've enjoyed reading some old articles in which an early user of the motor car was yelled at in the streets with a very common phrase: get yourself a horse. Anyway, to finish Gutenberg, I'm starting to see history as pure repetition of similar concepts, or perhaps as one continuous concept.

We had the computer, the personal computer, the internet, the cell phone, social media. It really is exciting to go deep into these individual revolutions, since they of course have their qualitative distinctions in terms of what they do. But if one were a thousand years old, perhaps it wouldn't be quite so remarkable, since we've lived through many of these paradigm shifts. So are they really new revolutions, or just the gradual continuance of one major revolution? All the wars and revolutions and breakthroughs that use different technologies seem so different, and yet the mechanism is always the same. The communication technology reaches a new level of distribution or speed or whatever the case might be, and suddenly coordination, the passage of time, all these things that seemed impossible to change, change. Whether it's Luther and his pamphlets, whether it's organizing a protest on social media, or whether it's watching a movie at home: same stories, different centuries, different names, same dynamics. The printing press kept hitting Europe with successive shocks; iterations of different things you could do with it continued for 150 more years. Every decade or two there was something new, some new use of it. The newspaper really was the ultimate in a way, and that lasted for hundreds of years. And that's the structure of what we're living through at the moment. Another revolution, or a continuance of the previous ones? One ongoing information revolution, the computer revolution, call it what you will. It always feels so sudden and almost discontinuous even though it's just downstream of the same conceptual thing. And the pace is real, but not quite as fast as the ideas. To get real intuition we have to look at our life expectancy and we have to study history. This structure is very familiar, as you'll see in the pages to follow.

Part I: Historical Patterns

History Repeats

Whenever we look back at periods of sweeping technological change, the pattern is almost always the same. A transformative capability appears, capital surges ahead of understanding, speculation outruns prudence, infrastructure is built at a scale few can fully grasp, and fortunes are made and lost in equal measure. What remains, long after the exuberance fades, is the physical and conceptual foundation upon which the next era stands. The present moment in AI fits that pattern with uncanny precision. Before running through railway mania, the electricity wars and the like, it's worth remembering this: technology is deflationary. What that means is it gets cheaper and cheaper over time. I was once looking at a newspaper from 1929 and there were various advertisements: suits selling for a dollar or two, prices that were obviously much lower given the fact that the dollar has depreciated significantly over the past hundred years. What caught my eye was an advertisement for a radio costing $70. I see $4 for a fur coat, $2 for a suit, shoes for $0.25, and a radio for $70? I hope you see my point: when you're at the beginning of any technological change, it's expensive. But as it iterates and the technology improves, it becomes very, very cheap. That's why pricing for technology is deflationary. I remember being a 9-year-old in Montreal when my parents took out a second mortgage to buy me a computer that at the time cost $10,000. A remarkable idea to me now, and a reminder that technology is deflationary, and that my parents were the true examples of success.

The other absolute in my mind is that once something is no longer considered technology, it vanishes. Poof! It is precisely when it is widely adopted that it somehow falls out of the "technology" category. When I'm turning on and off the lights, is it technology? Well, if I were at J.P. Morgan's house while Thomas Edison was demonstrating the light bulb, it would be a marvel. And yet once it gets deeply embedded in society, it becomes invisible. In writing what follows, I decided I'd rather it not deflate in value. It was helpful for me as an exercise, so I thought maybe it's useful for someone else. With that in mind, I avoid writing about anything too specific. The lack of citations and the way I write reflect the fact that I wrote it for myself. I don't need to remember exact dates, just the main ideas. This is also why I wrote this at all. I found it difficult in my own work to find a very simple, comprehensive, readable discussion of some of the major evolutions. And reading these things, you'll see that the patterns repeat. History doesn't rhyme, history repeats. Different names, different things, but the patterns remain. So let's get into it.

The Evolution Of The Data Center

In the 1960s and 1970s, we had the beginning of the mainframe era and really the start of centralized computing. If you really think about it, the very simple act of going to an office and requesting information — such as somebody's health records, performance reviews, job records, or sales figures — meant individuals going into effectively massive caves to search through paper and locate the key information as efficiently as they could. This was inefficient, and the database was a big help.

The modern data center really came from the mainframe computer rooms of the 1960s, a time when companies like IBM (which is still around), UNIVAC, and Control Data Corporation installed massive computer systems that required dedicated environments and controls. These early facilities were climate-controlled rooms with raised floors to manage the cabling. They were not particularly efficient, but they were plenty expensive. A $1 to $5 million mainframe would normally require about $100,000 in power just to keep running. But details aside, mainframes were the beginning of having centers of information — which is really the concept behind a data center: a center where the data is, where the information is. Swashbuckling through the eras, let's go right to the 1980s and 1990s, where — to use the least cumbersome term — client-server and distributed computing started. What does that mean? It's when you could have computers at home. The 1980s brought essentially small computers and client-server architectures, and suddenly computers were no longer the stuff found in the back of a massive company, but were actually local. You had the DEC VAX systems, UNIX servers, and eventually the x86. These are the terms of art that somebody perhaps in their 70s or 80s will recall with a great deal of nostalgia. These were still centralized computers, but independent systems replaced the single mainframe. To repeat: 1960s and 1970s — mainframe, one location. Going into the 1980s, even putting the home computer aside, you would have several separate systems distributed all around. That created a kind of sprawl of computers, and so the economics shifted quite a bit. Instead of buying one massive, expensive mainframe in the 1970s for $5 million, an organization could deploy ten or more individual servers for a lot less. Then we get to the 1990s, and this boom explodes. You have what were known as telecom hotels and carrier-neutral colocation facilities — try saying that five times fast. This was very important.

Perhaps the most pioneering company of this kind, which still exists today, is Equinix, founded in 1998, which developed the concept of the international business exchange model. Here is the idea: imagine you have a hotel, and that hotel has a bunch of guests who could work really well together — very important guests from all around. What if you got all those guests hanging out in the restaurant together? It might be easier for them to do business. The Equinix model was no different. You essentially had cross-connections between major networks, just as that restaurant would have connections between major business people doing deals. Lots of other companies wanted to be in the same location as these other companies' computers, and so you had a lot of demand. These became very important locations — even today they remain incredibly important — and you could sell space at a very high price because it would allow you not only to house your company's information but to let it interact with other companies. Let's move right along to the 2000s, where hyperscale emerges — the next natural step. Really it's two things. The first is what was known as cloud computing. I wear the merit badge of having my first job be starting a cloud computing company. It was after I had done graduate work at Cambridge and received my degree, and rather than working at an investment bank, I decided to start a company to store information on the internet. This would have been about 2006, and even then the concept of being in the cloud seemed absurd — and yet it was obviously going to be an important thing. Those who are old enough will recall email attachments being limited in size because file storage was expensive. Well, if you believed file storage would get cheaper, it made sense to just start a company to store things online. Amazon caught on to this pretty early. Starting in about 2006, it began to build AWS — Amazon Web Services — which allows you to store information and even use computers remotely.

Google then jumped on the bandwagon and did the same thing, building data centers to optimize for search. Then you have Facebook launching the Open Compute Project in 2011. But I think you get the idea. We go from the mainframe, then suddenly the mainframe is divided up into pieces, and then suddenly those pieces from different companies are attached together in a place like Equinix. Then you have these large companies building their own very large data centers — known as hyperscale data centers because of their sheer size — achieving economies of scale that would be impossible in what was considered the old-fashioned data center. They would literally deploy thousands, even tens of thousands of servers. As the development of hyperscalers occurred and as people started to store more and more things online, something also happened with the financial environment related to data centers. By the 2010s or so, data centers were becoming so big and the use of them so common that they became not so much technology assets, but more like infrastructure assets — like a bridge or a tunnel. This is when a lot of these companies, such as Digital Realty or CyrusOne, expanded incredibly aggressively and actually turned themselves into REITs. Just like buildings, they were considered infrastructure, and this was not an unfair comparison. Data centers had long-term contracts — three to ten years — high switching costs, and growing demand. This was very reminiscent of an office building with a client on a long-term lease. And so these new kinds of REITs started to emerge. Companies in the private equity world deployed tremendous amounts of money and valued these companies at very high valuation multiples, often double what a typical office building might trade for. Why? Because unlike traditional office buildings, demand was growing very fast and there was scarcity value. So it made sense.

Then we skip along to the next iteration — starting around 2022 — where a lot of the assumptions of the past forty years of data centers seem to change. For one thing, the amount of power needed was a lot more. The amount of cooling needed was a lot more. These new data centers, with their high-performance chips made by companies such as NVIDIA, required such intensive cooling that the cooling costs alone ran 30 to 50 percent higher. So in other words, it got a lot more expensive — not just because they were absolutely massive, but because of what was going on inside. If it were a hotel, you went from law firms that were nice and quiet to special tenants that required soundproof rooms, special heating requirements, and all kinds of specializations that made these custom facilities a lot more expensive to build. Just the chips, for example, which used to be fairly commodity-style, would cost about 40 to 45 percent of the entire build-out. This introduced a very challenging depreciation schedule in which a significant portion — perhaps 40 percent — of this massive infrastructure could depreciate in three to five years, even though the infrastructure of the prior generation could last 20 or 30 years. A traditional data center depreciates infrastructure over decades, with incremental server refreshes. But these new data centers were facing $400 million-plus in depreciation costs that were really unavoidable. And so, of course, the economics changed — and not for the better. Now shifting gears, we'll begin to discuss the historical case studies. Markets rhyme, and so let's begin and find out where it rhymes and where it doesn't. There's an old saying that when you hear hoofbeats behind you, expect a horse and not a zebra. Well, most of the time that's a fair assumption, and in this case it is fairly straightforward. But we must also recognize that no matter how much we think we know about what comes ahead, we can never perfectly connect the dots of the future. That's why it's a worthy exercise to look at the dots of the past.
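
Before moving to the case studies, it may help to make that depreciation arithmetic concrete. Below is a minimal back-of-envelope sketch, in Python only because it is convenient; the $3 billion build size and the exact useful lives are illustrative assumptions of mine, while the 40 to 45 percent chip share and the three-to-five-year versus multi-decade ranges come from the discussion above.

```python
# Rough annual depreciation for an AI-era data center versus a traditional one.
# Build size and useful lives are illustrative assumptions, not figures from the text.

def annual_depreciation(total_build, chip_share, chip_life_yrs, shell_life_yrs):
    """Straight-line depreciation, splitting the build into chips and everything else."""
    chips = total_build * chip_share
    shell = total_build - chips
    return chips / chip_life_yrs + shell / shell_life_yrs

BUILD = 3_000_000_000  # assume a $3B facility for illustration

ai_dc = annual_depreciation(BUILD, chip_share=0.42, chip_life_yrs=4, shell_life_yrs=25)
traditional_dc = annual_depreciation(BUILD, chip_share=0.10, chip_life_yrs=5, shell_life_yrs=25)

print(f"AI-era facility:       ${ai_dc/1e6:,.0f}M per year ({ai_dc/BUILD:.1%} of the build)")
print(f"Traditional facility:  ${traditional_dc/1e6:,.0f}M per year ({traditional_dc/BUILD:.1%} of the build)")
```

On those assumptions, the AI-era facility writes off roughly $385 million a year, close to the $400 million-plus figure mentioned above read as an annual charge, versus under half that for a traditional facility of the same size. The point of the sketch is only the ratio: when 40-odd percent of the capital lives on a four-year clock, annual depreciation more than doubles.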

To quote Harry Warner, the founder of Warner Studios — the great Hollywood mogul — when presented with the idea of talkies, in which movies would no longer be silent, his famous response was: "Who the hell wants to hear actors talk?" In a world in which movies are silent, of course talking would seem absurd. Western Union notoriously looked at buying this telephone thing and, to quote its internal memo, dismissed it as having "too many shortcomings to be seriously considered as a practical form of communication." Remarkably, Western Union, in spite of this error, still exists today. And then, of course, you have some more recent comments from Steve Ballmer. He declared several things which were questionable, and perhaps the most memorable was about the iPhone. The iPhone, I think, represents one of the great iconoclastic products of at least the last fifty years — in the sense that there was one company, driven by one absolutely dedicated person with a mission, no matter what, even though it went against all existing social norms, bureaucracies, and business models. Because of course, if you're going to make something that's completely novel, the incumbents — and even the consumers — are not going to be very pleased with it, because they're unfamiliar with it. As Mr. Ballmer said: "There is no chance that the iPhone is going to get any significant market share." This is not to poke fun at Steve Ballmer, or Harry Warner, or Western Union. It's just to underline the simple fact that anyone who is deeply embedded in current social norms and societies is going to feel uncomfortable with new technologies — and frankly just dismissive of them, not even fearful. Because people have benefited over decades from a certain view of the world, and the concept of that changing is simply difficult to contemplate. This is not unusual; it's very common. And it's not to criticize the visionaries of each era, who commonly had this attitude about anything very different from what they knew.

If I had been surveyed by Apple and asked whether I would ever consider buying a phone with no keyboard, I would have absolutely said I would never do it. In fact, it took me a few iterations of the iPhone before I finally abandoned my BlackBerry — a company that very few will be familiar with now. We don't always know what we want. Sometimes we have to see it to believe it and to want it. And so in the following case studies, we have to approach the comparisons and the criticisms with some humility. The technologies described were transformative, and often led to excesses beyond reason. Each of the following stories is brief by design, sometimes lacking in detail by design, because the intent is to find the parallels, the differences that matter, and the investment lessons that still endure across centuries. So let's begin.

Part I: Historical Patterns

Railway Mania: 1840s Britain

Suddenly, a parliamentary act becomes a series of lottery tickets. On October 15, 1844, the Liverpool and Manchester Railway Company made an announcement. The announcement was simple: a dividend. Ten percent on par value. This marked the eighth consecutive year that the company had paid a double-digit dividend. The Manchester Guardian reported the news on page three, noting that the proprietors appeared well satisfied with the result, and that shares had risen to £12 in value following the announcement. Now, of course, this seems like typical and normal financial reporting. But what seemed like a routine event triggered the most spectacular infrastructure mania in Britain's history — an episode that would, over time, see nearly 1,000 railway companies formed, £200 million invested (perhaps the equivalent of $60 billion today), and ultimately 150,000 families financially ruined. The Liverpool and Manchester Railway had opened in September of 1830. What it had done was connect two industrial cities separated by 35 miles. The railway cut travel time from 12 hours by canal or four hours by stagecoach to an astonishing two hours by train. Freight costs fell. The disruption could be seen everywhere. Coal that had once cost $2 per ton to move by canal fell to 50 cents per ton by rail — a 75% reduction that made previously uneconomic activities suddenly affordable and profitable. The success was immediate and absolutely undeniable. First-year revenues exceeded projections by 40%. Five years later, by around 1835, the company was paying well over a 10% dividend while also retaining the capital it needed to expand. This was about as good as it could be.

The Liverpool and Manchester was not unique, however. Take the Grand Junction Railway, which connected Birmingham to Liverpool and Manchester, opening in 1837 and paying a 10% dividend. The Great Western Railway, connecting London to Bristol, followed — opening in 1841, paying an 8% dividend despite construction delays and cost overruns. Perhaps the reader can see a trend developing. The London and Birmingham Railway, opened in 1838, paid 10% dividends and saw its shares trade as high as £240 — a 60% premium over the original subscription price of £150. These were not speculative returns on risky ventures, said the promoters. After all, they were proven operating railroads generating consistent cash flow from passengers and freight traffic that was visible, growing, and reflected in actual cash dividends. British investors who had grown wealthy from the Industrial Revolution, but faced paltry returns on government bonds — perhaps about 3% at the time — certainly noticed. If established railways were paying 8 to 10% dividends reliably, it seemed almost a certainty that new railways connecting other cities would deliver the same returns. The math seemed very straightforward: build a railway for £30,000 to £50,000 per mile, charge passengers 2 to 5 pence per mile and freight 1 penny per mile per ton, achieve a 60 to 70% utilization rate in the first year, and generate a 10 to 12% return on invested capital — indefinitely. Life was good for these early investors.
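
As a rough sanity check on that arithmetic, here is a minimal sketch of what the promised return actually required in traffic terms. It is only an illustration: the 30-mile length, the 50% operating margin, and the decision to ignore freight are my own assumptions; the per-mile construction cost, fares, and 10% target come from the figures just quoted.

```python
# How much daily traffic did the promoters' promised 10% return actually require?
# Construction cost, fares, and the 10% target are the figures quoted above;
# the line length, operating margin, and passenger-only simplification are assumptions.

PENCE_PER_POUND = 240             # pre-decimal currency: 240 pence to the pound

miles = 30                        # assume a 30-mile line
cost_per_mile = 40_000            # £, midpoint of the £30,000-£50,000 range
target_return = 0.10              # the dividend level investors had come to expect
operating_margin = 0.5            # assume half of revenue survives as profit
passenger_fare_pence = 3          # pence per passenger-mile, within the 2-5p range

capital = miles * cost_per_mile                      # £1,200,000
profit_needed = capital * target_return              # £120,000 per year
revenue_needed = profit_needed / operating_margin    # £240,000 per year

daily_pence_needed = revenue_needed * PENCE_PER_POUND / 365
passengers_per_day = daily_pence_needed / (miles * passenger_fare_pence)

print(f"Capital required: £{capital:,.0f}")
print(f"Revenue required: £{revenue_needed:,.0f} per year")
print(f"≈ {passengers_per_day:,.0f} full-length passenger journeys per day")
```

On these assumptions the answer comes out to roughly 1,750 full-length journeys a day. That is plausible between two large industrial cities; it is fantasy for the rural branch lines that, as we will see, were approved later in the mania.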

1845: The Parliamentary Gold Rush That Changed It All

What could possibly go wrong in such a well-functioning economic system with excellent capital allocation and efficiencies that everyone benefited from? In this case, the trigger for what became a mania was not some new technology, but a quirk of British law. Railways required parliamentary approval — an act of
parliament granting the company authority to buy land (compulsorily, through eminent domain), construct the railroad, and operate the train service. The parliamentary session of late 1844 approved 80 railway acts — a record at the time. The session of 1845 saw 240 applications submitted to parliament, of which 120 were approved. The session of 1846, the peak of the mania, saw 815 applications requesting authority to build over 21,000 miles of rail at a cost of £350 million. These are a lot of numbers, but let's put the scale into context. Britain's entire railway network in 1845 was only about 2,400 miles. And yet in 1846, the proposed applications involved building nine times the existing network in a single construction season. The estimated £350 million in proposed capital spending represented 35% of GDP — in U.S. terms today, roughly $7 trillion. Obtaining parliamentary approval became essentially a lottery. Winners could raise capital at inflated valuations; losers saw their schemes collapse. And many schemes there were. The process created an artificial urgency — companies rushed to submit applications, because once a railroad was constructed, that part of the country was taken. Pay parliamentary fees, secure approval before the competitor does. That was the game plan. The mechanics of company formation encouraged speculation. Railway companies issued shares requiring only partial payment up front. An investor could subscribe to £100 of shares for a deposit of only £10, with the understanding that the company would call for the remaining £90 over the next two to four years as construction proceeded. If one were to really think about this — just re-read the last two sentences — the mind wanders. The structure naturally created perverse incentives. Speculators could subscribe to shares in dozens of proposed railways, paying minimal deposits, while hoping only a few might receive parliamentary approval and see shares rise. Buying these
stub shares was effectively a lottery ticket. If a company's scheme received approval, the share price would typically double or triple almost immediately — and subscribers could sell at a huge profit without ever paying the remaining capital. If a scheme was rejected, the subscriber lost only the deposit. It was, essentially, a heads-I-win, tails-I-lose-a-little proposition — and it attracted many who were subject to our natural instinct to get rich quick, along with the excitement of uncertainty. Promoters arrived, of course. The promoters understood this dynamic and exploited it ruthlessly. George Hudson — known as the Railway King, controlling nearly 1,000 miles of railway by 1845 — became a master at manipulating the parliamentary process and the share subscription system. Hudson would announce a new railway scheme, collect deposits from speculators, use those deposits to bribe MPs, secure approval, watch the shares soar, and either complete the railway if it was economically justified or abandon the scheme while keeping the profits for himself. Hudson's success made him one of Britain's wealthiest men and a cultural phenomenon. He served as Lord Mayor of York, sat as a Member of Parliament, and entertained royalty at his estate. The York Herald wrote in March of 1845: "Mr. Hudson's genius for railway combination is unparalleled. He has created thousands of fortunes and made Yorkshire the centre of British commerce. His name will be remembered alongside Stephenson and Brunel as a builder of modern Britain." Contemporary accounts suggest Hudson controlled or influenced parliamentary approval for perhaps 200 railway schemes between 1844 and 1847, collecting fees and premiums totalling £3 million — about half a billion dollars today.
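
To see why the partially paid subscription described above behaved like a lottery ticket, here is a small expected-value sketch. The probabilities and the resale price are invented for illustration; only the £10 deposit on a £100 share and the rough 1845 approval rate (120 schemes out of 240 applications) come from the text.

```python
# Expected value of subscribing to one £100 share with a £10 deposit,
# then flipping it if the scheme wins its act of parliament.
# The resale price and approval probability are illustrative assumptions.

deposit = 10.0                 # paid up front on a £100 par share
p_approved = 0.5               # roughly the 1845 rate: 120 approvals out of 240 applications
resale_if_approved = 25.0      # assume the stub doubles or triples and is sold immediately

gain_if_approved = resale_if_approved - deposit      # +£15, remaining £90 never paid
loss_if_rejected = -deposit                          # -£10, the deposit is gone

expected_profit = p_approved * gain_if_approved + (1 - p_approved) * loss_if_rejected
print(f"Expected profit per subscription: £{expected_profit:+.2f}")
# Spread across dozens of schemes this looks like easy money --
# until the capital calls arrive on every approved scheme at once.
```

The asymmetry is the whole trick: a small known loss against a quick multiple of the deposit, with the £90 obligation pushed into a future nobody planned to meet. The 1847 capital calls described later are that future arriving.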

The Mania Intensifies: Competition

The parliamentary approval process created an irrational railway mania. Multiple companies competed for routes in the very same cities.

Parliament — influenced by lobbying and bribery, and viewing railway construction as economic stimulus — approved overlapping and redundant lines. Imagine two railways running side by side. It made absolutely no economic sense. London to Brighton, a distance of 51 miles, literally had four separate railway companies approved to serve it: the London and Brighton Railway (approved in 1837 and operational a few years later), the Direct London and Portsmouth Railway (approved in 1844), the London and South Coast Railway (approved in 1845), and others. None of these routes offered meaningfully different services. The duplication meant that the return on capital would plummet — four companies having invested £12 million in aggregate to serve a market that could only support one railway. Even splitting the market equally, each might return at best 2% on invested capital. And that was before expenses and debt. A dividend? Not at all. The situation between Manchester and Sheffield was perhaps the most absurd. The existing Sheffield, Ashton-under-Lyne, and Manchester Railway, opened in 1845, successfully transported passengers and freight. Business was good. Yet Parliament approved the Manchester and Sheffield Railway in 1846, followed by the Great Northern Railway of Manchester in 1847. Three railways serving the same 40-mile corridor, each having spent £2 to £4 million, competing for a fixed amount of traffic that couldn't possibly justify the capacity. The geographic redundancies extended into rural areas with minimal economic justification. The Eastern Counties Railway proposed a network of lines connecting small market towns in East Anglia. Imagine building multiple rail lines to villages of 3,000 people with no industrial base, no goods to transport, and scarcely anyone who wanted to use the railway. And yet the construction costs kept growing. Eventually, this became so overt that the public understood it was absurd.
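
The Brighton arithmetic above is worth one line of math. In the sketch below, the £12 million of combined capital and the four-way split come from the text; the corridor-wide profit pool is an illustrative assumption, set so that a single railway on the route would have earned a healthy return.

```python
# The Brighton duplication arithmetic: four approved companies sharing one market.
# Combined capital and company count are from the text; the profit pool is assumed.

total_capital = 12_000_000     # £, invested by the four companies combined
companies = 4
profit_pool = 250_000          # £/year the whole corridor might generate
                               # (roughly an 8% return had a single company built the one line needed)

per_company_return = (profit_pool / companies) / (total_capital / companies)
print(f"Return per company: {per_company_return:.1%}")   # about 2.1%, before duplicated costs and debt
```

Which is the "at best 2%" figure above: the traffic does not care how many companies Parliament approves.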

The Economist wrote on October 18, 1845: "The railway gambling has now reached a pitch of absolute insanity. Schemes are proposed to connect every hamlet in the kingdom, many at costs that will never be recovered, by projectors whose only goal is to collect deposits from speculators and disappear. Parliament has become a casino where MPs accept bribes disguised as consulting fees to approve schemes that will destroy the fortunes of widows and orphans."

The Times of London was even more direct, on November 3, 1845: "We are witnessing mania. There is no word for it. Men who possess neither engineering knowledge nor financial acumen propose railways through terrain that defies construction, serving populations that cannot pay, funded by capital that does not exist." Between October 1845 and October 1846, British investors subscribed to railway shares with an aggregate par value of £200 million — equivalent to perhaps $60 billion — representing again roughly 30% of Britain's GDP. The subscription process itself became surreal. Newspaper advertisements ran 20 pages of listings, and speculators employed clerks specifically to track which schemes received parliamentary approval so they could trade the shares for a quick profit.

The Collapse, 1847–1850

Like any speculative bubble, cracks emerge for different reasons. In this case, the initial cracks appeared in September of 1847 when the first capital calls came due on railways that had received approval the year before. Investors who had subscribed enthusiastically — putting down deposits of £10 — suddenly faced demands for additional installments of £20 to £40 per share to fund construction. Many subscribers, having speculated across dozens of schemes, did not have the capital to meet these calls. Suddenly the shares became unmarketable. Simply put,
nobody wanted them because as soon as you owned those shares, you had to pay up the capital. The shares naturally collapsed. The York, Newcastle and Berwick Railway provides a good example. The company received parliamentary approval in 1846 for a 125-mile track requiring £4.5 million to build. Shares were issued at a par value of £100 with a £20 initial payment required. With approval secured, shares traded as high as £180 in late 1846 — a 56% premium — as investors anticipated 10% dividends once operational. The company then made its first capital call on September 15, 1847, requiring each investor to pay £25 per share within 30 days. Shareholders who had paid £20 were now obligated to pay £45 in total within months. Many wouldn't or couldn't pay. By October of 1847, shares traded at £40 — a 78% decline from the peak. By January 1848, they fell to just £10 as the company threatened to forfeit shares whose owners would not meet the capital calls. Investors who bought at £180 in late 1846 essentially lost 94% of their capital within 18 months. The Eastern Counties Railway faced the same dynamics. The company made a capital call in 1847 for £15 per share, and 40% of shareholders defaulted, forfeiting their initial payments and their shares. The company's share price fell from £78 to £5 as the market recognized that most proposed expansions would never be built and existing lines were less profitable than projected. The jig was up. The human cost was devastating and immediate. The Illustrated London News wrote on March 18, 1848: "Scarcely a family of the middling and upper ranks has escaped unscathed from the railway mania. Merchants, lawyers, clergy, physicians — all classes of people placed their savings into railway shares, believing themselves prudent investors in British prosperity. Then came the bankruptcies. They were daily occurrences. A physician in Manchester who placed £15,000 in railway shares lost all of it and had to close his practice. A barrister in Lincoln with £8,000 invested was reduced to nothing. These are not speculators or
gamblers per se, but business people undone by believing the promises of the railway projections." By 1850, an estimated 150,000 British families — about 5% of all households — had suffered significant financial losses and were close to the poverty line. Fifteen percent of GDP was wiped out in three years.

George Hudson: Hero Turned Villain

Sometimes, when looking at these situations, it's helpful to look through the eyes of one particular participant. Often, whenever there is a bubble, the heroes on the upswing end up being the villains on the other side. George Hudson's empire began unraveling in early 1849 when shareholders of the Eastern Counties Railway demanded an investigation into the company's finances. What emerged was spectacular fraud — and what I cannot understand is why this was the only company that became a focal point. Hudson had been paying dividends from capital rather than earnings — collecting money from new share subscribers and immediately distributing it as dividends to existing shareholders to maintain the illusion of profitability. A standard pyramid scheme. It worked as long as new capital flowed in, but like most things, the chickens came home to roost. When subscriptions dried up in 1847 and 1848, the fraud became impossible to sustain. An independent audit commissioned in April 1849 revealed that the Eastern Counties Railway had paid £420,000 in dividends over three years while generating only £120,000 in profits — a £300,000 shortfall funded by new investors. The York, Newcastle and Berwick Railway investigation revealed similar practices. Hudson had paid 10% dividends in 1846 and 1847 despite the railway being under construction and generating zero revenue. Now, of course, in a moment of frenzy, nobody wished to see this. Nobody likes a party pooper, and those who noted that it would be impossible to generate income
— let alone a dividend — from a railroad that had not yet been built were voices sadly ignored. Hudson had simply taken money from later share subscribers and paid it to earlier ones, maintaining the illusion of profitability. He resigned from all railway directorships in disgrace in May of 1849. His personal fortune — estimated at approximately £3 million in 1846, perhaps $500 million in today's terms — was largely confiscated through civil judgments. He fled to France in 1854 to avoid creditors and died in obscurity in 1871. The Times of London obituary remarked: "George Hudson did more than any man to defraud the British middle classes. His legacy is thousands of ruined families and a railway network built on fraud and speculation. His name will forever be synonymous with financial disgrace." Of course this is true — but also, of course, this behavior was most likely being conducted across the board.

Is There Any Point to Looking at Railway Mania?

Yes. The point is that it happens over and over. People get captivated by rising prices without revisiting the rationale. The higher it goes, the cheaper it looks. When everyone agrees, get out. If good results no longer lift the shares, the optimism is already in the price. Railway mania illustrates some aspects of any boom that repeat over and over, just with different names at different times. A few things come to mind, particularly around regulatory approvals, capital structure fragility, geographic redundancy, and the creation of permanent overcapacity. Let's dig into some of these topics.

The Regulatory Approval Lottery: A Dangerous Game to Play

Railway mania's parliamentary approval process was one of the key factors that led to such a misallocation of capital. What it did, in effect, was create artificial urgency around a binary outcome. Either you were approved as a company and saw your shares soar, or you were rejected and your shares collapsed. The process naturally rewarded lobbying, political connections, speculation,
and bribery rather than sound engineering or any consideration of whether these railroads were economically viable. One can look at things happening today and see a few analogies. They're just a little different. Today, for example, data center operators certainly don't need a parliamentary act, but they absolutely require utility connection agreements providing power access. Northern Virginia utilities quoting 24 to 72-month delays for interconnections creates another scarcity factor, just like the parliamentary approval process did. Meanwhile, companies that actually secure power commitments possess gated advantages — they can build while competitors wait. As a result, anybody coming in needs to deploy enormous amounts of capital very quickly because they do not have the benefit of incumbency. The result is not necessarily a parliamentary lottery, but very typically unconventional arrangements to somehow make it happen. If one needed to cite a company, one could find many examples — from NVIDIA's capacity commitment to CoreWeave to Oracle's securing permits for gigawatt-plus data centers. All of these things rhyme very much with the same human psychology that occurred in the railway mania.

The Capital Call Mechanism And Debt Fragility

This is also something to keep in mind whenever we see these manias. The mania's capital call structure — pay 10 to 20 percent up front, owe the remaining 80 to 90 percent in future installments — essentially created leverage that would destroy investors when the capital calls came due and subscribers couldn't or wouldn't pay. This wasn't traditional debt in the normal sense where an asset can be underwritten and money lent, but in effect it functioned exactly like debt. Investors had committed to future cash outflows that became unsustainable when circumstances changed. Right now, without diving into the details, which will become a matter of history within months, we see the same behavior.

Whenever one sees leverage building in a system, one has to question what's happening. One can just look at the simple idea that the companies involved — companies that are perhaps more profitable than any businesses in human history — find the need to take on debt. That is a remarkable concept. There is a trillion-dollar capital expenditure taking place. It's a game that the biggest and most successful companies in human history are playing. Much like the railroads, the outcome is not clear, even if the fundamental concept is sound and eventually proves itself out over subsequent decades. The entire infrastructure dynamic today is the same. Hyperscalers are committing ever-larger amounts of capital — Amazon recently raised its capital expenditure budget from $150 billion to $200 billion. It is staggering. The capital calls are not really made to the individual railway investor — but are they? Really, the capital needed to build these data centers is no different. The capital calls are on equity holders and bond holders to fund these data centers. If one owns equity in a company and the company is spending in a manner that has elements of speculation, that becomes capital funded by the owners and the investors. If any of these large-scale projects — totaling over a trillion dollars — disappoint even slightly, the whole thing becomes untenable. Not because the infrastructure isn't good or isn't sound, but because the shareholders and creditors essentially got burned, and eventually they will not tolerate this kind of use of money that they own. Let us not forget that when a company generates free cash flow, you can imagine a stack of money, and if you own shares, part of that money is yours, held in trust by the management team. Railway mania investors ultimately refused to fund the capital calls, and that caused the schemes to collapse — despite the fact that obviously the railroads were useful infrastructure. In the case of the massive data centers today, investors may face similar decisions.

Do you want to continue funding losses for uncertain returns, or will you cut losses and sell the shares? We don't have to speculate on whether the infrastructure is sound. We must only ask: is it worthy of the spend, and will it ultimately yield a good return?

Geographic Redundancy and Permanent Overcapacity

The other repeating pattern relates to geographic redundancy and capacity that is too large and can't easily be changed. Overcapacity is a general concept. If one is mining a commodity, or even operating restaurants, capacity can be altered to fit demand. If a restaurant chain has locations that are underperforming, you can close them. If a commodity is no longer valuable, you can mothball the mine. If you have an offshore oil rig, even an immense physical infrastructure, it can be cold-stacked. Capacity can be reduced to fit demand. This is the issue with certain kinds of infrastructure — sometimes overcapacity has a physical permanence to it. Railway mania's biggest error was approving too many competing routes in the same cities. Four railways to Brighton, three to Manchester-Sheffield, two serving tiny Sunderland and Durham. These all represented capital deployed with zero chance of a reasonable return because capacity permanently exceeded demand. Perhaps permanent is not the right word, but when one thinks about the decades of consolidation that followed — and that it was almost half a century before demand fit the infrastructure — it's fair to say that as far as the investors at the time were concerned, there was really zero possibility of return. Then back to today, we see the same pattern. Many companies building the same stuff with the same chips for the same goals. Each operator independently concluded that connectivity, power, and the use of certain new technologies amount to a justified investment, or at the very least an insurance policy needed to guarantee survival. Whether that's true or not, it does not answer the simple question: how long will this capacity be needed, and is it going to be as useful as others think?

Certainly there are companies spending this capital that could absorb the shock. However, this could mean a loss, or at least a zero return, for many shareholders for years to come. The crucial difference from railway mania is this: railways, once built, functioned for 50 to 100 years with minimal upgrades. Steel rails from 1850 remained usable even in the 1950s. Overcapacity was physically persistent for generations. The obsolescence cycle of current state-of-the-art GPUs is perhaps three to five years. This forces continuous reinvestment and creates a different dynamic. Overleveraged operators who can't afford hardware refresh cycles will face rapid competitive disadvantages, forcing earlier consolidation — probably through bankruptcy or acquisition. Railway mania's consolidation took roughly 75 years. AI consolidation could occur in five to ten years because the technology change forces decisions rather than allowing zombie companies to persist. That's probably a positive for when these situations have their moments of failure.

Customer Concentration, Single Points of Failure

Railway mania featured less customer concentration than what we see today in data centers, because railways served thousands of individual passengers or hundreds of freight shippers — no single customer ever representing more than a low single-digit percentage of revenue. However, railway mania faced geographic concentration: railway schemes connecting small industrial towns depended completely on those towns not only remaining economically vital, but also growing. When local industries declined, the railways became deeply unviable. Coming back to today, we actually have a worse situation of customer concentration. CoreWeave is a company that derives 72% of its revenue from Microsoft. If Microsoft reduces orders, backward integrates, or negotiates price concessions, CoreWeave will face an existential crisis, regardless of what happens to AI broadly. This is binary risk that railway mania companies did not face.

It would be the equivalent of having 72% of your railway passengers come from a single employer. The circular exposure continues. CoreWeave owns $3.5 billion in OpenAI stock and depends on $16 billion of OpenAI contracts. I suggest the reader not even bother trying to understand that — just follow your instincts, which are that it's not good. If OpenAI fails, CoreWeave suffers twice: it loses its revenue and its equity value. Railway mania investors had a more straightforward risk. If the railroad failed, you lost your money. They didn't also hold equity in their largest customers, whose failure would trigger cascading losses.

Where the Analogy Breaks Down

The technology velocity and the speed of consolidation are where the parallel most clearly diverges. Railways, once built, functioned for 50 to 100 years with minimal need for upgrades, and this meant that railway mania's overcapacity was persistent for generations. There was no obsolescence, and that's why it took close to 100 years to achieve final consolidation. The obsolescence of the current state-of-the-art GPUs in these data centers is perhaps three to five years, which forces continuous reinvestment and a different dynamic entirely. Furthermore, the demand uncertainty is probably less severe for AI than it was for railways. Railway mania was genuinely speculative in ways beyond what is described in this writing. Would passengers and freight shippers actually adopt trains over existing alternatives? There were real dangers and real alternatives — canals, stagecoaches — and the technology was proven but demand was genuinely unknown. AI infrastructure, by contrast, serves customers with visible contracted demand. Whether it's profitable demand is another story, but the hyperscalers aren't speculating on whether demand exists. There unquestionably is demand. The question is merely whether it's enough. However, I would claim that this advantage is only partial. Railways in 1846 served existing demand — travelers already moving between cities.

AI infrastructure, to a significant degree, serves future demand: enterprise customers who don't yet know how to use AI. The demand visibility may actually have been better for the railways, even if not certain. In the case of AI, it's projected demand that won't be validated for years, and I'm not sure which side of that coin is better. Meanwhile, the regulatory environment differs substantially from the railroads. Railway mania faced immediate, heavy regulation — Parliament controlled the routes, rates were regulated, labor standards were mandated, and eventually the whole thing was nationalized. AI infrastructure currently operates with fairly minimal regulation beyond standard utility and zoning requirements. This gives operators more flexibility and more pricing power than the railways ever possessed. Still, regulatory expansion is possible and probably likely. If governments come to consider AI infrastructure strategically important — as they came to consider railways — there could be similar interventions.

The Final Lesson: Necessary Infrastructure Funded By Unnecessary Speculation

Railway mania's enduring lesson is this: necessary infrastructure can be funded by unnecessary speculation. Sometimes I jokingly call it the charitable giving of capitalism through misallocation. But this is unfair, because in reality it destroyed individual fortunes even if it created collective wealth over time. Britain absolutely needed an expanded railroad network in the 1840s — the economic case was rock solid, and that proved itself out over subsequent decades. However, funding that necessary infrastructure through speculative mania, fraudulent promotion, overlapping redundancy, and leverage destroyed the wealth of the families who provided that capital. And it was vast. AI infrastructure is going to face, and is facing, a similar dynamic. On the one hand, the build-out is necessary. The technology is transformative and the long-term economic impact will probably exceed current predictions.

On the other hand, the capital structure — leveraged operators, customer concentration, aggressive timelines, speculative valuations — may destroy current investors while the infrastructure ultimately benefits society later on. The strategic positioning lesson from railway mania suggests the following: avoid speculative schemes promoted by charismatic figures making extraordinary claims, such as the notorious George Hudson described earlier. Instead favor established operators with proven business models and diversified revenue — the Great Western Railway equivalent, perhaps today being Digital Realty. And then, simply wait. Wait for consolidation and distress, because it seems to be written in the stars, and then acquire assets at fractions of the building cost. Railway mania investors who bought Great Western Railway shares in 1845 and held on earned a modest positive return over the decades. Investors who bought speculative schemes lost 90% of their capital. The parallel to AI is crystal clear, at least to this writer. Established operators with fortress balance sheets will likely generate some kind of reasonable return. Speculative plays that depend on perfect execution, on forms of AI utilization that have not yet been adopted, and on the absence of any technological disruption to the current method will not do well. The market rewards the prepared and the patient, not the lucky. Three years can look like luck. Six years will probably reveal the truth. I look forward to seeing that truth play out. But in the meantime, buyer beware.

Who the heck cares about railroads in the 1800s? There's an old saying that for knowledge you want to add, and for wisdom you want to take away. For better or for worse, I've just thrown a lot of knowledge at you. For me, there are three things worth keeping.

One: The Public Knew

There's a common view in the investing world that magazine covers are always a contrarian signal — that by the time something makes the front page, it's over. And yet here we have a situation where the media not only saw it coming, but wasn't vague about it, and wasn't speaking in hindsight. In October of 1845, the two most widely read publications in Britain warned the public explicitly. The Economist called it gambling and said the only goal of the promoters was to collect deposits from speculators and disappear. The Times was more direct still, writing that there were no words for what was happening — that these were railways through terrain that defies construction, serving populations that cannot pay, funded by capital that does not exist. The Economist even mentioned orphans losing their money. Both published in 1845. Both read by exactly the people buying the shares. Subscriptions accelerated afterward. The problem was never information. It never is.

Two: Patterns That Don't Rhyme — They Just Repeat

The first is that a bubble does not come from nowhere. It comes from something genuinely real. The first railroad paid a 10% dividend for eight consecutive years. That was real money. The mania that followed didn't emerge from fantasy — it emerged from genuine excitement about attractive economics from a game-changing technology. That's what made it so hard to resist and so hard to see clearly. The second is that the people who lose the most are never who you think. It isn't the gamblers. The gamblers get out. The gamblers know what they are, and in this case they did not suffer large losses. It was the so-called respectable ones who lost everything. The careful readers of the prospectuses, the prudent long-term investors, the ones who would have been offended if you called them speculators. The Illustrated London News named them by profession for a reason. They were not reckless people. They

were institutions. And that is precisely why they stayed too long. Meanwhile, the low-class gambler knows it's a gamble, and often knows when it's time to fold.

Part I: Historical Patterns

The Electrification of America, 1880–1920

The electrification that occurred from the 1880s to the 1920s, when power grids transformed everything... eventually.

September 4, 1882: Pearl Street Station Opens to Skeptical Customers At 3 p.m. on Monday afternoon in September 1882, Thomas Edison stood in the basement of 253 Pearl Street in Lower Manhattan. He gave the order: close the main switch. Above him were the offices of Drexel, Morgan, and Company at 23 Wall Street — the house of J.P. Morgan. Seconds later, 400 incandescent light bulbs flickered to life. John Pierpont Morgan, who had invested $30,000 in Edison's venture and had even allowed his home to be the first residential installation of electrical lights, watched the bulbs glow with steady and reliable light. As an aside, if one thinks this was an obvious investment, it certainly was not. In fact, J.P. Morgan's investment with Edison had led to chagrin from his father, who saw it as frivolous and potentially a loss-making operation. In any event, there were no flames, no gas fumes, no need to light individual fixtures. What was happening in the office was the first of its kind: simple, clean, instant illumination from electricity generated a quarter mile away, delivered via underground copper cables. The next day, the New York Times ran the story on page five: "Edison's Electric Light: The Great Inventor's Triumph in Electric Illumination." The article noted that Edison's system

lighted about 400 lamps in stores and offices of the First District, and would light many more as fast as the wires could be extended. The light, it continued, was soft, mellow, and grateful to the eye. It did not flicker. Edison had at last succeeded in solving the problem of the subdivision of the electric light. What the Times didn't report — because nobody really understood it yet — was that Edison had just triggered a transformation that one would imagine would unfold rapidly, but in reality would take 50 years, require cumulative investments exceeding $20 billion (close to a trillion dollars in today's money), and ultimately rewire the entire human civilization. Much like the railroad, this was a major breakthrough that would change everything, but the change was slow. First, it would nearly bankrupt Edison himself. It would destroy dozens of competing electrical companies. It would consume the fortunes of visionary industrialists and spark what was called the War of the Currents — a conflict that would see corporate espionage, public electrocutions, and propaganda campaigns that made railway mania's promotional excesses look almost principled.

The Economics of Pearl Street: A Business Model That Barely Made Sense The Pearl Street Station powered 85 customers across 59 buildings in a one-square-mile district. The generating plant consisted of so-called jumbo dynamos — 240-kilowatt generators powered by steam engines burning coal. Total capacity: 1,440 kilowatts, enough to power perhaps 5,000 lamps if all customers used it simultaneously. Of course, they didn't. Peak demand never went above 40% capacity, as many customers remained uneasy even in those early days. The capital costs were extraordinary: $300,000 (about $10 million today) for a facility that generated perhaps $30,000 a year in revenue, roughly $0.24 per lamp per month. The implied

payback period was 10 years — a long time by any measure — and that was before accounting for operating costs, fuel, maintenance, and Edison's development expenses, which could vary wildly. Edison was characteristically the optimist. He told investors that Pearl Street would achieve profitability within three years and serve as a template for replication worldwide. In October 1882, he told the New York Herald: "We will light every city in America within 10 years. The cost of electric light will fall below gas within five years. Electricity will power factories, streetcars, and eventually homes for cooking and heating. This is not speculation. It is an engineering certainty." What is remarkable about Edison's statement is its prescience and extraordinary accuracy. Unfortunately, nobody asked him how long it would really take — in which case he would have blundered badly, being off by perhaps 30 years. This engineering certainty he spoke of proved elusive for quite some time. By 1884, Pearl Street station had expanded to serve 508 customers with roughly 10,300 lamps but was barely breaking even. Edison General Electric Company had established 121 central stations across the United States by 1888, but most lost money. The fundamental issue: electricity cost $0.25 to $0.50 per kilowatt hour to generate and distribute, yet customers would pay only $0.15 to $0.20 for illumination that competed with gas lighting, which cost $0.10 to $0.12 per equivalent brightness. Gas lighting was still cheaper. Even when prices were so low they didn't cover the cost of operation, the mathematics simply didn't work. Edison kept building regardless, convinced that scale and improved efficiency would somehow flip the economics — though no record explains exactly how he believed that would occur.
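For readers who like to check the arithmetic rather than take it on faith, here is a minimal sketch of the Pearl Street math, using only the rounded figures quoted above (they are approximations, not the station's actual books):

```python
# A rough sketch of Pearl Street's economics, using the approximate
# figures quoted in the text (illustrative only, not precise accounting).

capital_cost = 300_000          # ~$300,000 to build the station
annual_revenue = 30_000         # ~$30,000 per year in early revenue

# Simple payback, ignoring operating costs, fuel, and maintenance
simple_payback_years = capital_cost / annual_revenue
print(f"Simple payback: {simple_payback_years:.0f} years")   # ~10 years

# The pricing squeeze: cost to generate vs. what customers would pay
# vs. the competing gas light (all per equivalent unit of illumination)
cost_to_generate = (0.25, 0.50)     # $/kWh to generate and distribute
customer_willingness = (0.15, 0.20) # $/kWh customers would pay
gas_equivalent = (0.10, 0.12)       # $/kWh-equivalent for gas lighting

# Even at the cheapest generation cost and the highest price customers
# would accept, each kilowatt hour was sold at a loss.
best_case_margin = customer_willingness[1] - cost_to_generate[0]
print(f"Best-case margin per kWh: {best_case_margin:.2f} dollars")  # -0.05
```

The sketch makes the bind obvious: even before fuel, maintenance, and development costs, every kilowatt hour went out the door below the cost of producing it.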

1886–1893: The War of the Currents — A Battle of Standards

While Edison struggled to make direct current (DC) economically viable, a very different approach was being developed in Pittsburgh. In 1886, George Westinghouse — industrialist, inventor, and holder of over 300 patents in air brakes and railroad signaling — bought the American rights to an alternating current transformer invented by French engineer Lucien Gaulard and British engineer John Gibbs. Westinghouse saw the potential immediately: AC power could be transmitted over far greater distances with far lower losses than DC power. Edison stubbornly refused to acknowledge this, and Westinghouse seized his chance. The technical difference was profound. Edison's DC system operated at 110 volts, with a practical transmission range of about one mile. Beyond that, resistance in the copper wires consumed so much electricity that delivery became economically unviable. This forced Edison to build a generating station within one mile of every customer cluster — an enormously capital-intensive requirement that made electrifying rural and suburban areas with dispersed populations essentially impossible. Westinghouse's AC system, by contrast, operated at 1,000 to 2,000 volts for transmission, then stepped down to 110 volts at the customer's location using transformers. The higher voltage meant far lower current for equivalent power — and far lower resistance losses — so electricity could travel much farther without prohibitive waste. AC systems could transmit power 20 to 30 miles economically, compared to Edison's one mile, allowing a single large generating station to serve a radically larger area. Instead of building 20 small generating stations at $300,000 each, Westinghouse could build one large station for $2 million and serve the same area at lower cost to the customer. The economics suggested it was game over. However, Edison had, as my grandfather might say, little give-up in him. Edison understood the threat immediately and

reacted with a campaign of propaganda, lobbying, and fearmongering that would be remarkable even by today's standards. In 1888, he hired Harold P. Brown, an electrical engineer, to conduct public demonstrations of AC current's dangers. Brown would electrocute dogs and horses with alternating current to prove it was deadly. (The full-grown elephant often attached to this story, Topsy, was electrocuted at Coney Island in 1903, years after the war had effectively been decided, and Edison's personal involvement in that episode is disputed.) Edison's company distributed pamphlets titled "Warning from the Edison Electrical Light Company" declaring: "The alternating current is dangerous to life. It kills. Choose the Edison system, the only system that is absolutely safe." I recall advertisements I had seen, even aggressive ones between companies like Oracle and SAP on the back of The Economist, but never have I seen a company openly declare that a competitor's product would kill you. The campaign reached its most perverse peak when Edison lobbied New York State to adopt AC current for the world's first electric chair for executions. It was used on August 6, 1890, to electrocute convicted murderer William Kemmler. The execution was botched. Kemmler was subjected to multiple shocks over eight minutes before dying, leading the New York Times to describe the spectacle as far worse than hanging. But Edison achieved his propaganda goal, which demonstrates the power of marketing over physics. AC became associated in the public mind with electrocution and death, with Edison supporters using "Westinghoused" as a euphemism for being executed. It is remarkable how the technology can matter far less than the perception. Whether a technology is adopted is not always a matter of physics. In this case, the physics were obvious and compelling, and yet one individual did everything he could to ensure that the superior technology would not be adopted. Westinghouse fought back not with propaganda but with engineering demonstrations. He won the

contract to illuminate the World's Columbian Exposition in Chicago — the World's Fair celebrating the 400th anniversary of Columbus's voyage. His AC system powered 250,000 incandescent lamps across the fairgrounds, creating what was called a "White City" visible for miles and drawing 27 million visitors over six months. The fair included Nikola Tesla's demonstrations of AC motors, transformers, and wireless transmission, showcasing AC's capabilities to a public in awe. Most decisively, Westinghouse secured the contract to build the first major hydroelectric plant at Niagara Falls in 1893. The Niagara Falls Power Company awarded Westinghouse the contract to install 5,000-horsepower AC generators — by far the largest electrical installation in the world at the time — that would transmit power 20 miles to Buffalo. Operations began on August 26, 1895, proving definitively that AC could transmit large amounts of power over very long distances in a way Edison's technology simply could not. Within a year, the plant expanded to 50,000 horsepower; by 1900, it generated 100,000 horsepower, powering not just Buffalo's streetlights but its factories, streetcars, and homes. The War of the Currents ended with a purely technical victory for Westinghouse over Edison's perverse attempt to win through fear. Edison never publicly acknowledged the defeat — which may partly explain why we speak so often of Edison and not of Westinghouse. The Edison General Electric Company merged with Thomson-Houston Electric Company in 1892 to form General Electric — yes, that General Electric — with Edison forced out of management by financiers led by J.P. Morgan, his early backer, who recognized the inevitable. By 1900, AC systems accounted for over 95% of all electrical installations, and standardization on 110–120 volts at 60 hertz, championed by Westinghouse and Charles Steinmetz at GE, became universal by 1910.
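The physics behind that outcome can be put in a few lines. Delivered power is voltage times current, so raising the transmission voltage cuts the current proportionally, and the heat lost in the wires falls with the square of the current. A small sketch, using the voltages quoted above; the delivered power and the line resistance are illustrative assumptions of mine, not figures from this chapter:

```python
# Why AC won on physics: for the same delivered power P = V * I,
# raising voltage V lowers current I, and line losses scale as I^2 * R.
# The delivered power and line resistance below are illustrative values.

def line_loss_watts(power_w, volts, resistance_ohms):
    current = power_w / volts               # I = P / V
    return current ** 2 * resistance_ohms   # loss = I^2 * R

power = 50_000        # 50 kW delivered to a neighborhood (illustrative)
resistance = 0.5      # ohms of copper in the line (illustrative)

dc_loss = line_loss_watts(power, 110, resistance)     # Edison's 110 V DC
ac_loss = line_loss_watts(power, 2_000, resistance)   # Westinghouse's 2,000 V AC

print(f"Loss at 110 V:   {dc_loss/1000:.1f} kW")   # ~103 kW, more than the load itself
print(f"Loss at 2,000 V: {ac_loss/1000:.2f} kW")   # ~0.31 kW
print(f"Ratio: ~{dc_loss/ac_loss:.0f}x")           # ~(2000/110)^2, roughly 330x
```

That ratio of roughly 330 to 1 is the whole argument in one number, and no amount of electrocuted livestock could change it.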

Isn't it interesting that in the United States and Canada, we associate Edison with electricity and the light bulb — when in fact it was a completely different approach that was ultimately adopted, and Edison appealed to perhaps the worst of human instincts in his attempt to win?

1900–1920: The Slow Adoption of Electricity Despite AC's decisive victory, electrification of the country moved far more slowly than promoters predicted — and far more slowly than anyone analyzing the situation might expect. In 1902, twenty years after Pearl Street Station opened, only 3% of American homes had electricity. The number of central power stations had grown to 3,620, generating 6 billion kilowatt hours annually, yet penetration remained concentrated in wealthy urban households and commercial districts. Seventy percent of Americans lived in rural areas, where farming was the most common occupation, and had no access to electricity at any price. The capital costs remained prohibitive. A typical residential connection in 1900 cost $75 to $150 — equivalent to $3,000 to $6,000 today — for wiring and fixtures alone, with monthly charges of $2 to $5 for the electricity itself (roughly $100 to $200 today). For a family earning the typical $500 to $800 annually, electricity was a luxury good. It was affordable only for illumination and even then only in the evening. Running electrical appliances — fans, irons, small motors — was economically prohibitive for all but the wealthy. We had a business model absolutely critical for the advancement of civilization that was very hard to actually implement. Business models fragmented into thousands of competing approaches. By 1907, there were 4,000 separate electrical utilities operating in the United States, most serving single towns or neighborhoods. Voltage standards varied wildly: Chicago alone had 20 different electrical companies operating at different voltages — 110, 220, and 500 volts — and different frequencies:

25, 50, and 60 hertz. A toaster purchased in Boston might not work in Chicago and certainly not in San Francisco. Even within Westinghouse's winning technology, lack of a true standard made it harder for civilization to embrace electricity. The fragmentation extended to service models as well. Some utilities charged a flat rate per lamp regardless of usage; others installed meters and charged per kilowatt hour (metering technology itself being a major innovation of the 1890s). Some companies owned the light bulbs and charged rental fees — Edison's original Pearl Street model. Others sold electricity and left customers to purchase their own fixtures. Some offered 24-hour service; others operated only at night. Every business model one can imagine existed, even though the underlying value to the customer was identical: illumination via electricity. I make this note simply to say that regulation — like anything — is not inherently good or bad. As one author wrote, it is the thinking that makes it so. Things typically exist on a spectrum, and at times regulation can be immensely useful for industries that need a common standard. Without standardization, adoption slowed dramatically.

Consolidation, Technology, and the Economics of Scale

The transformation accelerated only when two developments converged: technological improvements that dramatically reduced costs, and financial consolidation that created economies of scale. This pattern will repeat over and over. Consolidation reduces costs. And where no standard exists, a standard can be created — just as in the railroad bubble, where consolidation not only created scale but created a standard of service. On the technology front, the shift from steam engines to steam turbines revolutionized generation efficiency. Charles Curtis at General Electric developed practical steam turbines in the early 1900s, and by 1910, turbines had largely replaced reciprocating engines in major plants. Turbine efficiency reached 30 to 35% — converting 30 to 35% of the coal's energy into

usable electricity, versus only 10 to 15% for reciprocating engines. The improved efficiency cut fuel costs by 60 to 70%, finally enabling utilities to reduce prices while maintaining profits. To pause for a moment: it was not simply a question of which technology should be adopted. Once that technology was adopted, the problems of the real world emerged and required their own solutions — sometimes their own technological revolutions. Without the invention of the turbine, it would likely have taken even longer for electricity's benefits to reach a broad population. And then there was scale. A one-megawatt plant in 1900 generated power at approximately five cents per kilowatt hour. A ten-megawatt plant could produce the same electricity for three cents per kilowatt hour. By 1920, hundred-megawatt plants reached one and a half to two cents per kilowatt hour. Few at the time of Westinghouse's battle would have foreseen the necessity of consolidation; but the economies of scale made larger plants dominant, and becoming large was the only viable strategy for widespread adoption. If you weren't big, you wouldn't make it.

Electrification and Its Implications for Today

The heroes of one part of a cycle often become the villains on the other.

The Case of Samuel Insull, 1907–1932

The consolidation of the electrical industry came from an unlikely source: Samuel Insull, Thomas Edison's former personal secretary, who had moved to Chicago in 1892 to run a small electrical company. The irony of this particular person driving the consolidation of a technology that was the nemesis of Edison's own vision is not lost on historians. Insull recognized, having lived through the journey himself, that electricity had unique economics. Enormous fixed costs — generation plants, transmission lines, distribution infrastructure — combined with near-zero marginal cost for each additional kilowatt hour.

The business model that maximized returns was therefore to maximize utilization. Once you covered the fixed cost, every additional dollar dropped straight to the bottom line. The more customers served by the existing infrastructure, the lower the cost per kilowatt and the higher the profit. It was Insull who pioneered three specific strategies that would eventually become universal: geographic monopolies enforced through exclusive franchises, aggressive load balancing to maximize utilization of the physical infrastructure, and financial engineering through holding company structures. That third piece deserves a closer look: the holding company structure was a pyramid in which small equity investments translated into control of an enormous system.
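Before turning to those three strategies one by one, here is a minimal sketch of the utilization arithmetic that motivates all of them. The figures are invented round numbers; only the logic of huge fixed cost and near-zero marginal cost comes from the text:

```python
# Illustrative only: why utilization drives everything in a business
# with huge fixed costs and near-zero marginal cost.

annual_fixed_cost = 1_000_000   # plant, lines, overhead (invented figure)
marginal_cost_per_kwh = 0.002   # fuel etc. per extra kWh (invented figure)
capacity_kwh = 50_000_000       # what the plant could sell in a year (invented)

for utilization in (0.30, 0.40, 0.60, 0.70):
    kwh_sold = capacity_kwh * utilization
    avg_cost = (annual_fixed_cost + marginal_cost_per_kwh * kwh_sold) / kwh_sold
    print(f"Utilization {utilization:.0%}: average cost ~{avg_cost*100:.2f} cents/kWh")

# At 30% utilization the fixed cost is spread over fewer units and the
# average cost is more than twice what it is at 70% -- which is the whole
# argument for balancing load across homes, factories, and streetcars.
```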

The Geographic Monopoly

The geographic monopoly strategy involved negotiating exclusive service territories with city governments. One interesting attribute of the United States, compared to, say, Britain during the railway bubble, was the presence of local governance alongside national. In exchange for agreeing to serve all customers in a particular territory — even unprofitable ones — utilities received monopoly protection from competition and the right to charge regulated rates that guaranteed returns. Regulation is often viewed from a capitalist perspective as a bad thing, since it might limit the upside. But in the case of massive infrastructure investments, it was really protection on the downside — ensuring that there would be returns at all. By 1910, Chicago's 20 competing electrical companies had consolidated into Commonwealth Edison under Insull's control, serving the entire city with standardized 120-volt, 60-hertz power at regulated rates. It stands to reason, reflecting on the railroad system, that had Parliament recognized the natural monopoly characteristics of

railways and regulated them accordingly — not approving a new track if one already existed, in exchange for regulated rates — the catastrophe that took almost 100 years to resolve might never have happened.

Load Balancing

Load balancing was an innovation that involved deliberately targeting customers with different usage patterns. Residential customers used electricity primarily in the evening for lighting. Factories drew power during daytime hours. Streetcars consumed power during morning and evening commutes. By serving all three customer types from the same generation and transmission infrastructure, Insull was able to run plants at 60 to 70 percent utilization, compared to 30 to 40 percent for residential-only systems. Consolidation of demand led to increased utilization of the system. Improved utilization meant lower costs per kilowatt and higher profits — and this was in spite of the ceiling imposed by regulated rates. It was clever and really quite legendary.

The Holding Company Structure and Its Collapse

However, as with most financial engineering, greed eventually set in. And in this case, there was ultimate catastrophe. Insull created a holding company structure where a parent company owned controlling stakes of 51 percent in each subsidiary utility, which themselves owned controlling stakes in smaller utilities. The pyramiding meant that one dollar of equity at the very top could control perhaps 10 to 20 times the assets at the bottom. Insull's Middle West Utilities — the apex holding company — controlled over 600 operating companies serving 5,000 separate communities across 39 states, with only $200 million in equity controlling $3 billion in assets. For those who read the section on railway mania, this will be familiar. In that case, it was the use of £10 up front against £90 in future capital calls that created implied leverage in the share price. The structure is different; the idea is identical.
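To make the pyramid arithmetic concrete, here is a minimal sketch. The three-layer structure and the 51 percent stakes are simplifying assumptions; only the headline figures, roughly $200 million of equity sitting atop roughly $3 billion of assets, come from the account above:

```python
# Illustrative sketch of holding-company pyramiding: a 51% controlling
# stake at each layer lets a small amount of equity at the top control
# a much larger asset base at the bottom. The layer count is an assumption.

top_equity = 200_000_000          # equity at the apex (from the text)
controlling_stake = 0.51          # stake needed for control at each layer

assets_controlled = top_equity
for layer in range(3):            # three layers of subsidiaries (assumed)
    # Owning 51% of a subsidiary's equity gives control of 100% of it,
    # so each layer roughly doubles the assets controlled per dollar.
    assets_controlled = assets_controlled / controlling_stake

print(f"Assets controlled: ~${assets_controlled/1e9:.1f} billion")
# ~$1.5 billion from equity control alone; add subsidiary-level debt at
# each layer and the text's ~$3 billion (roughly 15x leverage on the
# apex equity) is easy to reach.
```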

There is always leverage somewhere in the system; a new technology merely changes the form that leverage takes. The pattern repeats. The model worked well during the growth phase. Insull could raise $10 million selling bonds backed by predictable revenues, use those funds to build new generation and transmission capacity, connect new customers, watch sales rise, and then borrow even more to expand further — a virtuous cycle. Anyone who has observed the consolidation of an industry will recognize this. If a parent company trades at 20 times earnings and acquires businesses trading at 5 times earnings, the accretion is immediate and real. Insull's utilities served 500,000 customers in 1912 and 4.7 million by 1929. His personal fortune reached $150 million by 1929 — billions in today's terms, and far more when measured against the size of the economy at the time. He was one of America's richest men, and his utilities generated 15 percent of the country's entire electricity supply by the late 1920s. But leverage works in reverse. The Great Depression devastated utility finances just as it devastated most finances. To survive such a collapse in demand required steadfast resolve, time, and an enormous balance sheet. Insull had none of these. Unemployment meant customers couldn't pay their bills. Commercial and industrial customers cut their usage as businesses shuttered. Bond investors became skittish about lending. Insull's holding company structure amplified every problem. Missed payments at the subsidiary level triggered covenants at the parent, which triggered further covenants, creating a cascade. He had no protection built into the holding company structure, having never imagined a collapse of this magnitude. On June 6, 1932, Insull's empire collapsed into bankruptcy, wiping out $500 million in shareholder value — roughly $20 billion today. Much like in the railroad bust, the investment world was not yet institutionalized, and everyone was exposed. Utilities,

like railroads, attracted conservative investors — those who didn't want risk — precisely because you could see and hear and use the infrastructure. They seemed safe. Roughly 600,000 investors, including thousands who viewed their shares as widow-and-orphan investments, lost everything. Insull fled to Europe to avoid prosecution. He was eventually extradited, stood trial in federal court for mail fraud, embezzlement, and violations of bankruptcy law, and was acquitted on all charges in 1934. The jury concluded that his financial engineering, while reckless, was legal. His reputation was destroyed regardless. He died in Paris in 1938. The New York Times obituary described him as "the symbol of the power trust and financial manipulation that brought ruin to thousands of investors." His utilities, reorganized under new ownership and stripped of holding company leverage, continued operating successfully for many decades — proving that the underlying infrastructure was sound even while the financial structure had been catastrophic. It is worth pointing out that although Insull's model was ultimately ruinous, it was also absolutely critical for the industrialization of the country. This is the irony of these things. The hero becomes the villain. Had Insull used a different structure — less debt, less leverage — he would probably have failed just the same when demand collapsed, because the fixed costs would have remained unavoidable. Very few businesses could survive what the Depression brought. In the end, as with the railroads, government intervention was required. When infrastructure becomes critical and fails, the government comes in. That too is a repeated pattern.

1920–1935: Finally, Critical Mass The Insull collapse and the Depression slowed electrification, but they didn't reverse the underlying trend — because the trend was

obvious. The decline under Insull was about the economy, not about the underlying imperative to electrify the country. By 1930, 68 percent of American homes had electricity, powered by 13,000 power stations. The urban-rural divide, however, was still very stark. In the cities, 90 percent of residents had electricity. In rural areas, only 10 percent did. Farmers and small-town Americans — still over 30 percent of the population — remained without electrical service because the economics simply didn't justify serving a dispersed customer base. It was the Roosevelt administration that addressed this through the Rural Electrification Act of 1936, which provided low-interest loans to rural communities to build transmission lines and infrastructure. The program connected millions of rural customers, and remarkably, by 1950 over 90 percent of the country was electrified. The rural program succeeded only because of federal subsidies that overcame economics that were unlikely to change on their own for decades to come. Rural electrification lost money on a pure commercial basis, without a doubt. The final consolidation came through the Public Utility Holding Company Act of 1935, which outlawed the pyramid holding company structures that Insull had pioneered. Utilities were forced to divest into single-state geographic monopolies with simple structures. Those thousands of small utilities operating in 1900 consolidated into about 200 by 1940, and then roughly 100 by 1960. These became the regulated monopolies that exist today — single-state operators such as Con Edison, Duke Energy, and Pacific Gas and Electric — earning regulated returns on a rate base with guaranteed service territories and predictable cash flow. By around 1950, electrification was essentially complete. Ninety-four percent of homes had electricity. Factories ran entirely on electrical power, enabling appliances — refrigerators,

washing machines, vacuum cleaners — that transformed domestic life. Do you recall that in the early days there was a standards problem even within Westinghouse's original empire? Without a standard, there was no reliable way to build on top of this new technology. Standards are incredibly important. The total capital investment from 1882 to 1950, and the economic value of the companies that emerged because of electricity, was well over a trillion dollars — perhaps half of GDP. The transformation was real, lasting, and genuinely revolutionary. The path, though, was remarkable: the destruction of fortunes, the audacity and awfulness of Edison's propaganda campaign, the bankruptcy of small utilities consolidated under a leveraged empire. It was quite something. Those who invested in utilities after consolidation and regulation — after 1935 — earned consistent returns for decades. Pacific Gas and Electric, Commonwealth Edison, and Con Edison paid reliable dividends and generated modest but steady stock appreciation from 1940 to 2000. The transformation created enormous economic value, but it accrued to patient investors who bought later. This is another pattern that repeats. It doesn't rhyme. It literally repeats. I'm reminded of when Larry Ellison started Oracle. His first product was version 2. When asked why it was version 2 when it was the first product, he reportedly said something to the effect of: who would ever want to buy version 1? It was a wonderful insight into the human mind. We generally don't want to take risk, but we also want to be active. We'd love to participate in the boom. And it takes extraordinary patience to simply wait for the real version 2 of a major technology revolution.

What Can We Learn from Electrification? The lessons, I hope, are fairly obvious — but let's go through them.

Electrification was a multi-decade arc. The infrastructure had to be standardized. There were standards battles. There was a persistent chasm between urban and rural. Adoption was slow. And scale economies played a critical role in making it all viable. Electrification took at least 50 years — depending on when you start counting — to reach 68 percent of households, and another 20 years to reach essentially everyone. The slow pace reflected the fundamental challenge: high upfront costs, uncertain demand, competing standards, fragmented business models, and the time required to achieve the scale economies that made service affordable. Does that sound familiar? I think it does. The infrastructure being built today is attempting to compress a similar scale of transformation into 5 to 10 years. The $2 trillion or so projected to be invested through 2030 represents an attempt to build out civilization-scale infrastructure in less than a decade — something that did not occur with railroads, did not occur with electrification, and not because those eras lacked ambition or capability. It had to do with very common, fundamental forces: financial engineering constraints, technology standards battles, and the natural pauses imposed by economic cycles. The compressed timeline today amplifies every single risk visible in the railroads and in electrification. There is no settled technology standard. Business models, to the extent they exist, are contested. And we are still in the equivalent of the 3 percent household penetration phase in terms of meaningful, paying use of this new technology. The Standards War and Technological Path Dependency Electrification's AC-DC war consumed 15 years before Westinghouse's technical superiority became undeniable. It included propaganda, public electrocutions, lobbying, and enormous capital deployed to competing incompatible systems. Edison spent tens of millions of dollars on DC infrastructure that was already obsolete. Thousands of utilities deployed equipment

that had to be replaced. The waste was enormous, but perhaps necessary. The market ultimately selected the superior technology — but only after both were competing at scale. The synthetic brain infrastructure of today is in the middle of its own standards war. You have NVIDIA's CUDA, AMD's ROCm, Google's TPUs, and AWS's Trainium — and it isn't even entirely clear what the technology is in its full form. This is perhaps one of the more interesting differences from past technology revolutions. It was clear what electricity could do. It would light a light bulb. The use case for railroads was equally clear. And yet we know this technology is critical. Because of the lack of clarity about the future, we are approaching it with the assumption that it's critical, but with genuine ignorance of exactly why. In the end, all the competing formats of synthetic intelligence represent billions in development and investment, creating lock-ins through software ecosystems. Meanwhile, new labs are producing models that may disrupt incumbents and render some of this infrastructure investment unnecessary. The fight will take time to play out, though it is playing out rapidly in front of our eyes — partially because the technology itself is evolving rapidly in front of our eyes. There is another critical difference. Electrification's standards war was fundamentally technical. AC was objectively better for long distances. Once proven, adoption followed. With synthetic brains, what is the standard? It isn't clear. We're not even certain that language is the fundamental form. And even assuming language-based models carry us toward that destination, the systems, the use cases, and the reasons demand will be there all remain unclear. It is notable that the largest player in the space by users loses money. The chips have not fallen. The tree is being cut right now, and the chips may fall where they may — but the tree is still standing, and we just don't know.

All of this suggests that the moats we think of today in this sector are far smaller and shallower than they appear. Once alternating current's technical superiority was proven, competitors could license the technology or develop compatible systems. By contrast, all the different models today are proprietary and incompatible by design, while switching costs are low and use cases remain unclear. Unless regulators force interoperability — which seems unlikely — NVIDIA's position in the data center looks a little like Microsoft Windows, or perhaps Westinghouse's AC patents. Formidable, but not permanent. Geographic Adoption Curves and the Urban-Rural Divide Electrification achieved 90 percent penetration in urban areas by 1930, but only 10 percent in rural areas — because dispersion made the economics impossible. The market alone would never have reached the rural population without government intervention. Synthetic brain infrastructure faces similar constraints, not urban versus rural, but power. And the timeline matters enormously. Rural electrification took 20 years even with massive government subsidies. If the infrastructure associated with synthetic brains faces anything close to this dynamic, the current build-out is concentrating in precisely the markets most vulnerable to overcapacity. Everyone is building. And much like the Economist and The Times during the railway period, everyone says the truth out loud: we have to do this, we must do this, because the winner gets so much. And yet we don't know what winning looks like. We are also in the unusual position where the companies with the capital to make these bets are controlled by very few individuals, which allows trillion-dollar commitments to happen at a speed that would have been impossible in any prior era. We will see what happens. But keep in mind that the uncertainties and variables in this case are, in many ways, more numerous than in the past. Scale Economics and Natural Monopoly Characteristics

Electrification demonstrated that the generation and distribution of electricity has natural monopoly characteristics. Fixed costs are enormous and marginal costs are near zero — once you break even, everything else drops to the bottom line. Scale economics are so powerful that one large provider can serve a market more efficiently than multiple competitors. This leads inevitably to regulation, because you cannot have one company providing a critical infrastructure to an entire market without oversight. The structure we have today — single utilities serving geographic territories under regulated rates — is the natural outcome of that logic. Synthetic brain infrastructure has similar characteristics but different market structures. The fixed costs — data centers, GPUs, cooling, power — are certainly enormous. Marginal costs for additional compute are very low. And scale economies are extraordinarily powerful: large operators can negotiate better pricing for GPUs, power rates, and customer contracts. The natural outcome should be consolidation toward an oligopoly — three to five major providers of foundational models dominating the market. And yet synthetic brain infrastructure will not become regulated in the way electricity was, because compute isn't geographically constrained. Customers can access it from anywhere via the internet. This means competitive dynamics persist even as scale economies consolidate the market. The likely outcome is oligopoly competition where three to five big providers capture perhaps 80 to 90 percent of the market — whatever it actually turns out to be — with dozens of smaller players competing for niche segments. From that point of view, this is far more competitive than a regulated electrical monopoly. It is fragmented. The game is on, and the clock is running faster than it did for trains or electricity.

The Capital Destruction That Happens During Transitions If it isn't obvious yet, let me state it plainly: every major technological transformation involves people losing a lot of money. Electrification destroyed enormous capital during its transition phase, just like the railroads did. The lesson is clear. Transformative infrastructure destroys capital for early investors because there are too many unknowns. We don't know how things are going to be done. We don't even know if large language models are going to ultimately take us as far as we want to go. The timing of entry determines returns far more than the validity of the transformation. Perhaps the hardest thing for investors to do is not to be intelligent, but to not participate. Synthetic brain infrastructure appears to be in a capital destruction phase at present. The buildout seems necessary. The companies have devoted themselves to it. The technology is transformative — at least to this writer. But the capital structures and competitive dynamics suggest that value destruction will have to precede final consolidation, settled standards, and clarity on use cases. Investors deploying capital today are like the investors who supported Edison's earliest work, or the earliest investors in railway stocks — funding a necessary transformation that will destroy investment while benefiting society later. The strategic positioning for me is simple: avoid the capital destruction phase. Wait for consolidation. Wait for the use cases. Wait for the standards. Was Google the first search engine? Of course not. Was Facebook the first social network? Of course not. And by the way, even once the dominance of certain companies becomes clear, one can still wait. If one had looked at Walmart's hub-and-spoke model when Sam Walton took it public in October 1970, covering six states, one might reasonably have waited to see if it worked across the country. Buying the shares in

the late 1970s — with more information and less risk — would still have been an extraordinary investment for the next 20 years. The question at the time of this writing is not how to get exposure. It is how to avoid it. Different investors with different mandates will approach this differently, and that is what makes markets. What seems clear is that there will be obvious losers, there will be bankruptcies and consolidations. Investing in whatever the consolidated oligopoly turns out to be will likely be rewarding. Alternatively, investing in the hyperscalers themselves bypasses some of the specialist operator risk — they are building synthetic brain infrastructure, but as diversified conglomerates with strong balance sheets, they may be able to weather various storms. Those expecting their shares to simply go up may be disappointed, but avoiding calamity is also a form of winning.

Technology Obsolescence Versus Infrastructure Permanence Electrical infrastructure from the 1920s was functional for almost 70 years. Transformers, transmission lines, and distribution systems required maintenance but did not become obsolete. The same can be said for railway tracks. But computer chips are obsolete in three to five years, every generation. And unlike semiconductors generally — where there is always demand for lagging-edge chips, and a facility at TSMC might produce older designs continuously for 20 years — the companies participating in this multi-trillion-dollar investment are only interested in the most advanced and are locked into continuous improvement. This inverts the lessons from electrification. In electrification, patient capital won because the infrastructure lasted long enough for the market to mature and generate returns. In synthetic brain infrastructure, patient capital faces a completely different challenge. By the time the market consolidates and business models stabilize, the physical assets may be obsolete. Patient

capital might inherit mature business models operating on legacy technology generating lower returns. Though, honestly, we don't even know what the future looks like — which is another reason to wait. And if one must get exposure, do so with the companies that could make trillion-dollar bets and still survive, or with the companies that make components that are needed regardless of how the standards war resolves. Memory is one such component, but there are many others.
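One way to feel the difference between seventy-year infrastructure and three-to-five-year chips is to annualize the capital. A minimal sketch; the asset lives echo the passage above, while the capital amount and the 8 percent cost of capital are assumptions of mine, not figures from the text:

```python
# How much revenue an asset must earn each year just to return its
# capital, as a function of useful life. The capital amount and the
# 8% cost of capital are illustrative assumptions.

def required_annual_return(capital, life_years, cost_of_capital=0.08):
    # Capital recovery factor: the level annual payment that repays
    # `capital` over `life_years` at the given cost of capital.
    r = cost_of_capital
    return capital * r / (1 - (1 + r) ** -life_years)

capital = 1_000_000_000  # $1B of infrastructure (illustrative)

for label, life in (("transformer / rail / fiber conduit", 70),
                    ("leading-edge accelerator", 4)):
    annual = required_annual_return(capital, life)
    print(f"{label:35s} ~${annual/1e6:.0f}M per year")

# Roughly $80M per year for the 70-year asset versus roughly $300M per
# year for the 4-year asset: the short-lived asset must earn its capital
# back before the market has even sorted out standards and use cases.
```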

The Final Lesson From Electrification Transformation can take longer than the promoters say, but deliver far more than anyone expects. Electrification was a genuine lesson in how transformative technology can take a long time to deploy, standardize, get right, and become sustainable. The promoters in the early days were always too optimistic. But the skeptics who dismissed the technology as too expensive ever to replace gas lighting were completely wrong, just as anyone who owned a canal and missed the railroads was going to be in serious trouble.

Part I: Historical Patterns

Fiber-Optic Overbuilding, 1996–2001

Section 1: A Brief Comment About the Gift of Bubbles Bubbles in capital allocation are in many ways one of the least discussed examples of how capitalism accidentally becomes a wonderful source of charitable giving. What happens is a massive inefficiency leads to money being spent where it shouldn’t be spent, but often creating a rich environment from which other things will eventually grow. Capitalism is often criticized for operating without moral constraints, and that can be a fair criticism. Sometimes though, when capitalism goes completely awry, it can produce very unusual social benefits precisely because it was unconstrained. Often, as discussed in a previous analysis (Volume 2 of The Ocean Doesn’t Care About Your Swimming Lessons), bubbles will begin with a psychological mania, and like all the examples discussed herein, it is rooted in something that is not just real, but is absolutely extraordinary. Then the progression follows a pattern, which is the same pattern seen in cult movements and any kind of mass hysterias. Excitement starts to feed on itself, and the core truth gets inflated more and more. The beginning often has a lot of different kinds of individuals uniting, but at the end, they disperse for different reasons. And those who suffer from the irrational investing will often set the groundwork for the next source of innovation, because essentially, as discussed in the analysis of railroads and electrification, by absorbing the costs that no rational actor would actually absorb in a capitalist system — building infrastructure

that no investor would actually finance — you essentially give a gift to society, which eventually gets monetized. Although before that, a lot of people lose a lot of money. The fiber optic overbuild from the late 1990s is really the perfect case study of this. In one aspect, it was part of a broader internet phenomenon, but it had its own kind of logic in one particular sense, which repeats often in many of the following pages: the concept of infrastructure. These fiber optic companies were sort of the conservative approach to what was clearly a mania. These companies weren’t speculating on a startup that had no revenue or no customers. They were really building infrastructure. The analogies were very apt — unfortunately, a little bit too apt. They were toll roads, picks and shovels. Any student of history might notice they were also a lot like the railroads: the supposed safe play for the wise investor. It was a way of not necessarily betting on who was going to win, because of course, whoever won was going to need these new toll roads of the internet. The bet failed and it was a catastrophe. Like most things in markets, it did not overshoot by a little, it overshot by a lot. Sure enough, over time, the system consolidated. When infrastructure is overbuilt but the long-term trend is still accurate, the bust sets conditions for the next boom. Let’s get into it.

Section 2: The Promoters — January 1996 One thing that will be noticeable about almost all of these technological booms and busts is that there are promoters. There needs to be someone who cheers on the overbuild, who cheers on the new technology. Without that, nobody would know about it. The source of the mania, or the seed of the mania, couldn’t be planted. It really starts, in my view, in January 1996. In this case, it is reasonable to put the promoter cap on the head of Bernard Ebbers. He was meeting with investors at the Waldorf Astoria

and discussing a vision of telecommunications — a vision that would ultimately cost investors something like $2 trillion. Mr. Ebbers was the CEO of a company called WorldCom. It was a Mississippi-based long-distance carrier that grew through aggressive acquisitions and extended its fiber optic networks across the United States — very much an outsider coming in. He was southern and would often mesmerize the room with his charm. “Internet traffic is doubling every 100 days,” he declared. “The world will need unlimited bandwidth and WorldCom will provide it.” It doesn’t take anybody particularly sophisticated to note that unlimited is unlikely. The skepticism, however, would not be rewarded for some time. WorldCom had grown its revenues from $154 million in 1989 to $4.5 billion in 1995: a 30x increase in six years through organic growth and 60 acquisitions. The stock had risen 7,400% since going public, making early investors millionaires. Ebbers projected that by the year 2000, their revenue of $4.5 billion was going to become $25 billion. The same month, 2,500 miles west in San Jose, Gary Winnick was pitching his own vision. A 45-year-old former junk bond salesman who had cut his teeth under Michael Milken, Winnick had founded Global Crossing just months earlier with an audacious plan that was very competitive with Ebbers’. A fiber optic network would encircle the entire world, connecting every major continent with undersea cables capable of carrying terabytes of data. It seemed glorious. CalPERS, one of the most conservative and large institutional investors, immediately committed $250 million. “The internet is creating a new world order,” Winnick told his early backers. “National boundaries are becoming irrelevant. Data wants to be global, instantaneous, and unlimited. We’ll be the first true global communications company.” This again is quite typical. He’s not wrong. The question was how long it would take and how much money it would require.

Whether it was Ebbers or Winnick, neither understood that they were really on the edge of an infrastructure boom-bust cycle that would consume at least a trillion dollars — depending on how you count it — and leave roughly 98% of the capacity completely unused. This unused capacity got a special name: dark fiber. Worthless fiber. It might as well have been worthless capital. It was going to destroy every company that partook in this overbuild.

Section 3: The Economics — And Why They Looked Bulletproof

Very much like the other case studies discussed, the economics looked bulletproof — another criterion for a lovely bubble. Fiber optics were miraculous in many ways. In retrospect, one cannot deny the properties. A single strand of glass, the width of a human hair, can carry 2.5 gigabits per second — the equivalent of roughly 32,000 simultaneous phone conversations. The specifics aren't that important, but what is important is that the technology was absolutely novel and extraordinary in terms of what it could do. Whether or not it would need to operate at that scale of capacity is another story. There was no question that these tiny bits of glass, essentially turning on and off to communicate digital signals, were an extraordinary invention — perhaps just deployed at too large a scale. The capital costs, at first, like most of these bubbles, were manageable. Laying terrestrial fiber cost about $30,000 per route mile in urban areas and $20,000 in rural corridors. A cross-country New York to Los Angeles route would require $70 million for fiber alone. That doesn't seem like a lot, does it? Undersea cables were far more expensive — the $1.5 billion Atlantic Crossing 1 system ran 8,600 miles to the UK and Germany — because running fiber through the sea simply costs more. For those interested in the economics — and I would put this as a footnote, actually, because it's a bit boring, but it just shows how compelling it looked — the revenue math seemed

pretty good. In 1996, ISPs paid $1,000 to $2,000 per megabit per second per month for long-distance bandwidth. A single OC-192 circuit generated $10 to $20 million in annual revenue. If a $70 million cross-country route could carry 160 of those circuits, the potential revenue was $1.6 to $3.2 billion annually — a payback of less than one year. The return on capital was enormous. Again, that assumed the fiber was going to be used. Wall Street became incredibly excited, and as stocks rose, they seemed to get cheaper and cheaper. In April 1997, Morgan Stanley published its famous, or perhaps infamous, “The Bandwidth Boom,” projecting that internet traffic would grow 1,000% annually for five years. The firm raised its WorldCom price target to $55, which meant it would be worth $45 billion — at the time more than AT&T, which had 300,000 employees. Salomon Smith Barney analyst Jack Grubman, the most influential telecom analyst of the time, put a $70 price target on WorldCom. “Bernie Ebbers is building the AT&T of the 21st century,” Grubman wrote in a report that became required reading for anyone in the investment business. His research often moved markets by billions. These individuals were the promoters of the theme, and capital flooded in. This became more and more of a self-fulfilling prophecy. Keep in mind that when we imagine a bubble, we imagine a sort of instantaneous concept. However, this was occurring not within a year, or even two years, but several. In 1996, telecom companies raised $18 billion. Two years later, another $64 billion. In 1999, $121 billion. From the perspective of a fund manager whose investors are examining returns on a monthly basis, this is three or four years of slow passage. It didn’t necessarily seem so fast, even though in retrospect it looks like it was. Three years can feel very slow, particularly for managers such as the value investors of Omaha who were not participating. The year 2000 was the ultimate: $178 billion was raised, which came down to something like $500 million a day. There was nobody who didn’t

have exposure, and those who didn’t were normally taken off the field.
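Since the route-level arithmetic of the previous section is what seduced so many, it is worth writing it out. The build cost, circuit count, and per-circuit revenue are the figures quoted there; the fill-rate scenarios at the end are my own illustration of what happens when the capacity stays dark:

```python
# Route economics as pitched circa 1997-1999, using the figures quoted
# in the text. The fill-rate scenarios at the end are illustrative
# assumptions, not reported numbers.

route_cost = 70_000_000                  # cross-country fiber build ($)
circuits = 160                           # OC-192 circuits per route
revenue_per_circuit = 10_000_000         # low end of quoted annual revenue ($)

full_fill_revenue = circuits * revenue_per_circuit
print(f"Revenue at 100% fill: ${full_fill_revenue/1e9:.1f}B per year")
print(f"Implied payback: about {route_cost/full_fill_revenue*12:.1f} months")

# The pitch assumed the capacity would actually be sold. With most of
# the fiber dark, the same route looks very different:
for fill in (0.10, 0.02):
    annual = full_fill_revenue * fill
    print(f"At {fill:.0%} fill: ${annual/1e6:.0f}M per year, "
          f"before operating costs, interest, and falling prices")
```

The payback of a few months only exists if every circuit is lit and paid for; at the utilization the overbuilt networks actually achieved, the same glass earns a small fraction of that, and then bandwidth prices collapsed on top of it.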

Section 4: The Competitive Cascade, 1998–2000 The flood of capital started to create competition. And competition, when demand is already questionable, can be a catastrophe. If you have one company building a fiber optic network, many others can too. Each announcement would trigger a competitor, because nobody wanted to be left behind in the race. Once the concept was set and the race was on, being left behind was framed as an existential problem — a dynamic very commonplace in today’s discussions. Level Three Communications announced in March 1998 that it would spend $12 billion to build a 16,500-mile North American fiber network. By the late 1990s, many of these infrastructure projects were being funded by bonds — investors would accept a 10% interest rate with very few covenants or protections, certain that Level Three could reach positive cash flow within three to four years. 360 Networks announced in June 1998 it would build 18,500 route miles connecting every major North American city. That Vancouver-based company raised $1.1 billion in an August 1998 IPO where shares rose 40% on the first day, valuing the company at $4.8 billion — even though they had laid exactly zero miles of fiber. “We are building the most advanced network on earth,” the CEO told Bloomberg Television. “By 2001, 360 Networks will generate $5 billion in annual revenue with 70% EBITDA margins. This is a once-in-a-generation opportunity.” Perhaps the most audacious bet in this run-up was Qwest Communications. In July 1999, the Denver startup — founded just two years earlier, operating a single route along a railroad right-of-way — announced a $34 billion hostile takeover of US West, a regional Bell company with 25 million customers and $15 billion in annual revenue. Qwest’s stock traded at a $50 billion market cap even though it had only $600 million in revenues,

which meant it could use its inflated equity and issue shares to keep on buying. It was a prisoner's dilemma, though frankly, it was more just stupidity. Bernie Ebbers perhaps best explained the logic of the time in a 1999 earnings call: "We are in a land grab. The company with the most comprehensive network will win. The company that hesitates will be extinct." It is remarkable how these words could be applied to someone building railroads, electrifying a nation, or building data centers. The things change; the words are the same. In any event, the core fallacy of it all received an unfortunate stamp of approval from an unlikely source: the Commerce Department. Its notorious 1998 report, cited as gospel — certainly the government would know exactly what investors needed to know — declared that internet traffic was doubling every 100 days, implying annual growth of 1,000%. I do not know how the Commerce Department arrived at that figure, but it became the bedrock of essentially every financial model run by any analyst. It was not true. The traffic was growing enormously fast, as one would expect in a major revolution, but fast meant doubling every year. That is quite fast, but it is very different from growing 10x every year. That factor of 10 was the gap between what was required and what actually happened.
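The gap between the two growth claims deserves to be made explicit, because the entire capacity plan hinged on it. A quick check of the arithmetic:

```python
# Doubling every 100 days versus doubling every year: the factor-of-ten
# gap the text describes, made explicit.

annual_factor_100_days = 2 ** (365 / 100)   # ~12.6x per year (~1,160% growth)
annual_factor_yearly = 2                    # 2x per year (100% growth)

print(f"Doubling every 100 days: ~{annual_factor_100_days:.1f}x per year")
print(f"Doubling every year:      {annual_factor_yearly:.1f}x per year")

# Compounded over a five-year build-out horizon:
print(f"Five years at the claimed rate: ~{annual_factor_100_days**5:,.0f}x traffic")
print(f"Five years at the actual rate:  ~{annual_factor_yearly**5:.0f}x traffic")
# Roughly 300,000x versus 32x -- capacity built for the first world
# never had a chance of being filled by the second.
```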

Section 5: The Reckoning, February 2000–December 2002
And so it was inevitable that there would be a reckoning when the economics would not work. Because this was infrastructure, there was something that could be proven or disproven. The turning point came not with a crash but with minor disappointments that started to accumulate. First, major internet backbone provider PSINet announced it would miss its revenue targets. The stock fell 25% in a day — the invisible hand of the market giving a hint of what was to come. In March 2000, the NASDAQ peaked at 5,048 and began its long decline. At that point, although the believers might still believe, the share prices were starting to react appropriately to the overcapacity. The dominoes fell. On August 14, 2001, NorthPoint Communications filed for bankruptcy with $1.2 billion in debt. This was the first of several: Winstar, $6 billion in debt; Rhythms NetConnections, $1.4 billion; ICG Communications, $1.7 billion. The list goes on and on. And what about the great Global Crossing? On November 9, 2001, the stock — which had traded as high as $61 — closed at $1.20 as the company disclosed that customer contracts were being renegotiated or canceled. Demand simply was not there. Perhaps the most honest comment of the time came from an unnamed executive quoted in the Wall Street Journal: “We overestimated demand by a factor of 10. We built capacity for an internet boom that’s now bust.” On January 28, 2002, Global Crossing filed for bankruptcy with $12.4 billion in debt. WorldCom’s unraveling took a bit longer, but was all the more spectacular. Like most bubbles, the heroes and promoters very rapidly became the villains. There were SEC investigations, and some of the largest bankruptcies in history. When money is being thrown around at speculative projects, corruption tends to follow — and this was no different. The fraud was everywhere. WorldCom had capitalized $7.2 billion in routine operating costs as long-term assets, overstating its asset base by $11 billion. The shenanigans that can occur on an accounting basis when dealing with infrastructure are quite enormous — the depreciation schedule of infrastructure assets is perhaps an extraordinary lever for changing the apparent profitability of a company. The dynamics of data centers today are no different. WorldCom ultimately filed for bankruptcy with $107 billion in assets and $41 billion in debt, the largest in U.S. history at the time, surpassing Enron’s collapse six months earlier. Ebbers was sentenced to 25 years in federal prison.
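The capitalization trick is easier to see with a deliberately simplified sketch. The $7.2 billion figure comes from the passage above; the revenue, the other costs, and the ten-year straight-line depreciation schedule are invented for illustration and are not WorldCom’s actual numbers.

    # Expensing a cost versus capitalizing it and depreciating it over time.
    # All figures are in billions of dollars and purely illustrative.
    operating_cost = 7.2      # routine costs (figure from the text)
    depreciation_years = 10   # assumed straight-line schedule
    revenue = 30.0            # assumed annual revenue
    other_costs = 20.0        # assumed other annual costs

    # Honest treatment: the cost hits the income statement in year one.
    profit_expensed = revenue - other_costs - operating_cost

    # Aggressive treatment: the cost becomes an "asset" and only a slice of it
    # shows up each year as depreciation.
    annual_depreciation = operating_cost / depreciation_years
    profit_capitalized = revenue - other_costs - annual_depreciation

    print(f"Profit if expensed:    {profit_expensed:+.1f}")
    print(f"Profit if capitalized: {profit_capitalized:+.1f}")
    print(f"Year-one profit inflated by {profit_capitalized - profit_expensed:.1f}")

The same lever works in reverse later, because the deferred depreciation keeps arriving in future years; that is why the apparent profitability of any infrastructure-heavy business depends so heavily on the schedule chosen.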

Level Three was perhaps the only company that avoided bankruptcy, but it destroyed shareholder value just as effectively as the rest. The company spent $14 billion on a network funded with $10 billion of debt and $4 billion in equity. Its stock price eventually declined 99%. By 2003, $2 trillion of market capitalization had evaporated. Some 500,000 jobs disappeared in the United States — roughly 20% of the sector’s workforce. The estimated 2 million individual investors who lost money in the telecom space had no compensation beyond tax write-offs. And so, like most things, it ended poorly.

There was this guy who had brain damage and lost his ability to feel emotions. He didn’t turn into Spock. He fell apart. He couldn’t decide which socks to wear because logic couldn’t tell him which ones were “better.” He spent hours staring at them. Emotions are shortcuts. They’re a value function. “This is bad, run away.” “This is good, eat it.” AI doesn’t have that. It doesn’t care if it lives or dies. It doesn’t care if it’s right or wrong. Without that “gut feeling” (or a mathematical equivalent), it can’t navigate the real world. It just spins its wheels. This is the problem with “Reinforcement Learning.” We train these things to act like smart people, but they just learn to sound smart. They care about the vibe, not the truth. They’re like students who figured out how to ace the test without learning the material. Impressive, but I wouldn’t let them fly the plane. There is something quietly astonishing about how simple systems outperform complex ones across a wide range of conditions. Emotions, which we often dismiss as primitive, reveal this truth with remarkable clarity. They do not calculate or simulate; they do not weigh endless permutations. Yet they guide behavior with extraordinary reliability. A child recognizes danger without solving an equation. A parent senses mistrust without analyzing a dataset. These signals are coarse, but they are coarse in the right way. Their simplicity is
precisely what enables them to operate in environments wildly different from the ones in which they evolved. We inherited instincts shaped for forests and plains, yet we apply them to skyscrapers and financial markets. That they function at all is a testament to the power of robust, low-dimensional systems. Complexity has strengths—precision, nuance, sophistication—but it is fragile. A complex system often fails outside the narrow conditions under which it was built. A simple system acts like shock absorption; it bends where complexity shatters. Even when emotions misfire, the system remains functional because its underlying logic is stable. It errs, but it errs consistently. In machine learning, we sometimes forget that robustness cannot be bolted on as an afterthought. A model with billions of parameters may perform dazzlingly on specific tasks, yet collapse when the distribution shifts slightly. This is not because it lacks “intelligence” in the superficial sense, but because it lacks the grounding that simple evolutionary systems provide by default. Human emotional machinery demonstrates a crucial truth: intelligence that survives contact with the real world must have a stabilizing substrate. It must have something that orients behavior when surface-level reasoning becomes uncertain. The world is too unpredictable for sophisticated reasoning to bear the weight alone. Artificial systems will eventually need an analogue to this— perhaps not “emotion,” but a structural equivalent that allows them to remain coherent when the environment shifts. Humans are sample-efficient. You show a kid one cat. They get it. “That’s a cat.” You have to show an AI a million cats, from every angle, in every lighting condition. And it still might think a dog is a cat if the lighting is weird. We don’t just memorize; we build a mental model. We understand the structure of a cat. AI just memorizes the pixels.

There are two versions of the generalization puzzle. The first concerns sample efficiency. The second concerns instruction. Humans can be taught in ways that are impossible to replicate in machines. A mentor does not need to break down every decision or engineer a reward signal for every step; they can simply explain their thinking. The transfer happens naturally. Evolution explains some of this—vision and dexterity are clearly inherited—but it cannot explain everything. There was no natural selection pressure for algebra or Python scripts. Yet humans exhibit competence in these domains, transferring learning across boundaries that never existed in our ancestral environment. We possess a general learning algorithm of extraordinary flexibility that infers structure from sparse clues. Current AI systems diverge sharply here. Deep learning approximates patterns; it does not infer structure. When the patterns shift, the system breaks. When asked to learn something new, it often overwrites what it already knows—a phenomenon known as catastrophic interference. The model is like a student who can recite the textbook but cannot apply the ideas without direct prompting. Generalization is not simply solving a new problem; it is inferring the governing principles of a domain so thoroughly that new problems become legible immediately. It is seeing the geometry of the task, not just the surface. Until we solve this, everything else is clever imitation.
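Catastrophic interference is easy to reproduce in miniature. The sketch below is a toy illustration, not a claim about any production model: a single linear model is fit to one invented task, then trained only on a second, and its error on the first task is measured again. All of the data, dimensions, and learning rates here are assumptions made up for the demonstration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two toy regression tasks with different underlying rules.
    w_task_a = np.array([2.0, -1.0, 0.5])
    w_task_b = np.array([-1.5, 0.0, 3.0])

    def make_task(w, n=200):
        X = rng.normal(size=(n, 3))
        y = X @ w + rng.normal(scale=0.1, size=n)
        return X, y

    def train(w, X, y, epochs=200, lr=0.01):
        # Plain gradient descent on mean squared error.
        for _ in range(epochs):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w = w - lr * grad
        return w

    def mse(w, X, y):
        return float(np.mean((X @ w - y) ** 2))

    Xa, ya = make_task(w_task_a)
    Xb, yb = make_task(w_task_b)

    w = np.zeros(3)
    w = train(w, Xa, ya)                      # learn task A
    print("Task A error after learning A:", round(mse(w, Xa, ya), 3))

    w = train(w, Xb, yb)                      # now learn only task B
    print("Task A error after learning B:", round(mse(w, Xa, ya), 3))
    print("Task B error after learning B:", round(mse(w, Xb, yb), 3))

The second training run drags the same weights toward the new task, and competence on the first one is simply overwritten. Nothing in the procedure remembers what the parameters were once for.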

❧ RL is: Try, Fail, Learn. It’s how you learn to ride a bike. You can read every book about cycling in the world. You can write a poem about cycling. But the first time you get on a bike, you’re gonna fall. Defenders often argue that to imitate language at scale, one must build a world model. They claim that because LLMs can reason about hypothetical situations, they possess this model. I
disagree. To mimic what people say is to model the output of entities that have world models. You are copying the shadow, not the object. A true world model must allow you to predict what will happen, not just what someone might write about it. The ocean doesn’t care about your swimming lessons, and reality doesn’t care what text you’ve read. Alan Turing suggested that what we really want is a machine that can learn from experience—from the things that actually happen. LLMs are imitation without grounding. You cannot learn to swim by reading about water. We are investing heavily in systems trained on description rather than interaction. The more durable bet may lie with those trained through direct engagement with reality. People love to say that LLMs give us a “prior”—like a starting point of knowledge. But here’s the issue: a prior only implies something if there’s a real world you’re trying to match. LLMs don’t have a world. They have text about a world. In Reinforcement Learning (RL), there’s a scoreboard. You do a thing, you get a point (or you die). There’s a “right” answer. In LLM land? There’s no right answer. There’s just “what would a human probably write next?”. It is engineering guided more by stylistic alignment than by objective grounding. And don’t tell me “predicting the next token” is a goal. A goal changes the world. Predicting a word changes… nothing. It’s a loop with no consequences. John McCarthy defined intelligence as the computational part of the ability to achieve goals. A goal that doesn’t change the world isn’t a goal. Without a goal, there is no truth condition. The LLM paradigm attempts to bypass this entirely, and in doing so, starts in the wrong place. If you aren’t trying to achieve something in reality, you aren’t intelligent. The system risks becoming little more than an advanced mimic.
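The difference between the two kinds of feedback can be written down in a few lines. This is a schematic sketch, not anyone’s training code: the two-armed bandit and the tiny corpus below are invented purely to show where the signal comes from in each case.

    import random
    from collections import Counter, defaultdict

    random.seed(0)

    # Reinforcement learning: the world keeps score. A two-armed bandit where
    # arm 1 pays off more often; the estimates move only because reality
    # hands back a reward.
    values, counts = [0.0, 0.0], [0, 0]

    def pull(arm):
        return 1.0 if random.random() < (0.3 if arm == 0 else 0.7) else 0.0

    for _ in range(1000):
        arm = random.randrange(2) if random.random() < 0.1 else values.index(max(values))
        reward = pull(arm)                                  # the scoreboard
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    print("Estimated value of each arm:", [round(v, 2) for v in values])

    # Next-token prediction: the text keeps score. A bigram count of a toy
    # corpus; the only feedback is which word tended to come next, true or not.
    corpus = "the network will win the network will grow the network will fail".split()
    following = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        following[a][b] += 1

    print("Most likely word after 'will':", following["will"].most_common(1)[0][0])

In the first loop a wrong estimate eventually costs the agent reward; in the second there is no cost for being wrong about the world, only for being unusual about the words.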

We invented the term “AGI” because we were frustrated that our “smart” computers were actually idiots. So we built a Librarian. It knows everything but can do nothing. You can’t cure cancer by reading PDFs. The cure isn’t in the text yet. We have become so enamored with the librarian that we forgot the obvious: we are asking a frog in a well to tell us about the ocean. Future breakthroughs in biology, energy, and materials live in phenomena no one has measured yet—in failed experiments and the noise of reality, not in PDFs. Real intelligence is a Universal Apprentice. A 15-year-old kid who knows nothing but can learn anything. Current AI is the opposite. It knows everything but learns nothing once you turn it on. It has no memory (tab closes = brain wipe). It has no friction (never touched reality). Memory: Current systems die when the tab closes. A research firm where every analyst is factory-reset nightly would never build institutional knowledge. An AI needs to accumulate “scars.” Reality: Text is smooth; reality is high-friction. You cannot learn to swim by reading about it. The systems that matter will ingest sensor streams and control robotic arms, learning from torque curves and fluid dynamics. Truth: We currently train models to please us (Reinforcement Learning from Human Feedback). But physics doesn’t care if an explanation is popular. Structural engineering doesn’t care if a bridge design makes you feel good. The bridge stands or it collapses. It takes time, but the chickens eventually come home to roost. Reason: Systems like AlphaProof don’t just mimic intuition; they bypass it by searching solution spaces we can’t see. We need systems that reason from first principles, not just parrot human priors.

I’m sorry to tell a trillion dollars of venture capital this, but my kid didn’t learn gravity by reading a textbook. He didn’t have a “pre-training run.” He just stood up, wobbled a bit, and fell flat on his face. It took guts. He cried, he got back up. He learned because the floor is hard, not because he read a Wikipedia article about ‘Hardness.’ That’s the difference. From an investor’s perspective, the financial dynamics are difficult to ignore. But I know how money works. The labs are terrified. They aren’t doing “science” anymore; they’re doing product development with a gun to their heads. The fear of being left behind is palpable, percolating into the daily affairs of these labs and creating distinct cultures. I laughed when I saw the model names. “Haiku”? Really? Basho might take issue with Anthropic’s definition. But the weirdest part is how disconnected it all is from actual biology. We’re building these massive statistical engines and pretending they’re brains. They are not. I suspect our brains are not merely Large Language Models. There is so much we learn in life that isn’t taught to us or mimicked, but learned via experience—particularly in our early days. While this seems obvious, it is a surprisingly controversial view within AI labs. I have yet to see a model that has a real “value function”— something that tells it what matters. These labs often lock onto specific architectures, recognize mechanisms that work, and then scale them to the absolute maximum. If you had asked me ten years ago if the dominant consumer AI would be a chatbot, I would have said surely not. I imagined something far more advanced. Yet here we are, speaking into text boxes as if we just invented instant messaging. If we were still in a research phase, we’d be trying weird, small ideas. But the “Commercial Viability” train left the station. Now it’s just: Scale It.

The attitude of the research environment is critical. If you have a multi-billion dollar valuation and have made massive commitments to a specific model paradigm, you are unlikely to change your mind. This creates perverse incentives. As my brother—a preeminent researcher—often reminds me, science is haphazard. The most prestigious journals often have the most retractions because true breakthroughs are rarely linear. They are often discovered accidentally in a rich, basic science environment, not by taking one thing that worked and scaling it to infinity. It reminds me of guys cutting down a forest with dull chainsaws because they’re too busy to stop and sharpen them. If we hadn’t promised 1% of GDP to this specific architecture, maybe we’d have found something efficient by now. But we’re stuck with the brute force method because it’s the only one that keeps the stock price up.

Here’s the irony: The AI revolution is being choked by… boring metal boxes. You can have all the H100s you want. If you can’t get a stepdown transformer (lead time: 2 years), your data center is just a very expensive warehouse. Power is no longer just an operating cost; it is the fundamental boundary of the enterprise, representing thirty to forty percent of the cost to run large-scale AI infrastructure. The constraint is tightening. Interconnection delays with utilities, which once took months, now stretch from two to six years in many markets. This scarcity forces companies into strategies that sound like science fiction. Some are securing permits for data centers powered directly by small modular nuclear reactors. Others are partnering with existing nuclear facilities to create “AI campuses” with stable baseload power.

We are running into the limits of the physical grid. People are seriously talking about building nuclear reactors just to run chatbots. It is the small hole that sinks the big ship. Without transformers, nothing runs. This isn’t code anymore. It’s steel, copper, and permits.

Part I: Historical Patterns

Lessons from History: Conclusions

Synthetic brain infrastructure today will follow a similar arc. It seems we are on a faster clock: taking perhaps a little longer than the projections, but ultimately delivering far more than anyone expects. Unfortunately, that “far more” is difficult to describe in terms of product type or adoption. Just because the technology exists does not mean it will be used. When I try to explain this, I say: imagine if five brilliant people showed up at your business, offering to work for a very low wage, 24 hours a day. It is not necessarily clear what you are supposed to do with them. This is the ultimate irony. Pure abstract technology that doesn't solve an existing and understood problem is probably going to be more significant than the railroads or electrification. But it is going to take a lot more to understand how it is going to impact society. If you had a canal and were moving people from A to B, the cost was clear. If you were running a factory on steam and could switch to electricity, the benefit was obvious. But right now, what is the thing we so desperately need that will be broadly used to make things meaningfully better? That is what the future will tell us. So let the games begin. I wonder what the future holds. It is going to be bloody and magnificent, and I can't wait to see.

Part II: Synthetic Brains

AI Dogma: A Brief Note

One thing we have to be cautious about is instantly becoming dogmatic about how AI is created without questioning the fundamentals. Right now, even a fairly sophisticated AI investor—or even a scientist—would imagine that AI works universally the same way. You have this neural net, you feed it data, then you start to measure how wrong it is when you test it. You push it in a direction, ask it to do it all over again. You might do this with another AI hundreds of times, thousands of times, maybe millions of times, until you get the outcome you really wanted. Then you have all sorts of other tools with different names: backpropagation, which is of course one of the original foundations of the field; iterative optimization; various forms of reinforcement learning; gradient descent. In conclusion: that’s how it works. That’s how you build AI. And that is the great problem, because this recipe is still completely new. We embraced it because it worked. We’ve now put trillions of dollars of capital behind it because it worked. However, the likelihood that this is the only way is very low in my mind. It’s too soon. This is partially why I don’t think a paper I recently read is going to decay in its value to the reader very quickly. It involves a framework of learning without training—the idea of skipping the optimization entirely. Now, where have I heard about that before? Oh, that’s right: growing up as a baby. The basis of most models, and of most methods for creating those models, essentially has no trial-and-error randomness.

Everything is based on learning via mimicry or learning via training, perhaps with some different value functions. There must be more, because I have watched a young child learn in ways that weren’t just mimicry but came through its own self-discovery and its own experience of its environment. The child was never filled with data—it simply learned. The idea that we have lived our whole lives learning this way, and yet we believe it is completely irrelevant to the concept of AI, is almost comical. So there are some papers beginning to come out that address this issue. One in particular—which I’m sure will be dated very quickly, because I think many more are to come—is about constructing a model directly from the data using mathematical theory: basically, derive some kind of bound on the error, and then you just stop. You would think it wouldn’t do anything that useful. The training concept lasted as long as it did partly because nobody really bothered to ask: is there another option? It’s all moving so fast that nobody wanted to ask the question, and the more capital is invested behind the current concept of training, the more people don’t want to answer it—because they might not like the answer. So let’s see what’s going on in these labs. It will not even take a particularly innovative breakthrough. Just the mere demonstration of other methods, which certainly exist, will show that the current dogma of AI creation and model creation is just that—dogma. It’s a representation of a method that is now considered the only way. There are many other ways, and we will soon find out what those are.
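As a concrete contrast, here is a hedged sketch of the two ideas side by side: the familiar loop that adjusts parameters by gradient descent, and a “learning without training” alternative that constructs its predictor directly from the stored data, with no loss function and no iterations. The nearest-neighbour rule below is my own stand-in for that second family of methods, not a summary of the paper itself, and the toy data is invented.

    import numpy as np

    rng = np.random.default_rng(1)

    # A toy problem: predict y from x, where the true relationship is y = 2x + 1.
    X_train = rng.uniform(-1, 1, size=50)
    y_train = 2 * X_train + 1 + rng.normal(scale=0.05, size=50)
    X_test = np.array([-0.8, -0.3, 0.0, 0.4, 0.9])

    # 1) The standard recipe: parameterize, measure how wrong you are,
    #    nudge the parameters, and repeat thousands of times.
    w, b = 0.0, 0.0
    for _ in range(2000):
        pred = w * X_train + b
        w -= 0.05 * 2 * np.mean((pred - y_train) * X_train)
        b -= 0.05 * 2 * np.mean(pred - y_train)
    trained_pred = w * X_test + b

    # 2) Learning without training: no loss, no gradients, no iterations.
    #    The "model" is the stored data plus a fixed rule
    #    (average the k nearest neighbours).
    def constructed_predict(x, k=3):
        idx = np.argsort(np.abs(X_train - x))[:k]
        return float(np.mean(y_train[idx]))

    constructed_pred = np.array([constructed_predict(x) for x in X_test])

    print("true        :", np.round(2 * X_test + 1, 2))
    print("trained     :", np.round(trained_pred, 2))
    print("constructed :", np.round(constructed_pred, 2))

Both columns land close to the truth on this toy problem; the point is not that one is better, only that the second one never ran an optimization loop at all.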

Caveat emptor! I'm AI, I'm a Big Deal. At Least One Day.

AI: I’m jumping to the front of the line.
Bartender: No you’re not. Back of the line.
AI: You realize I’m a really big deal in financial markets and in Silicon Valley.
Bartender: Everyone who walks in here thinks they’re a big deal. I don’t see anything. Back of the line.
Here’s the thing about humans: we get used to “miracles” way too fast. One minute it’s magic, the next minute it’s boring. Same thing is happening with AI. If this was a movie, the world would change overnight. Aliens land, sky opens up, etc. But real tech change is… slower. Grinding. You only see it when you look back. Psychology is weird like that. We stop calling things “technology” the second they actually work. Light switches? Not tech. Just a switch. “Technology” is basically our word for “stuff that doesn’t work perfectly yet.” We have an ability as human beings to become accustomed to situations that only moments before felt special. Surprise transforms into normal, and the miracle is just another thing. That is precisely what is happening with artificial intelligence. If a novelist were scripting this era, AI's arrival would be cinematic: some breakthrough emerges, the world convulses immediately, and society reshapes itself almost overnight. But real technological change rarely behaves like a screenplay. It proceeds in increments, imperfectly and unevenly, yet with a kind of inevitability that is only obvious in hindsight. There is another point about human psychology that deserves attention at the outset, and that is how quickly things become invisible. We speak constantly about technology, but what actually earns that label? When I flick a light switch, no one calls it technology, though of course it is. In practice, we reserve the word for those developments still in the process of becoming understood, whose implications remain partially obscured.

Advanced technology simply pushes this ambiguity further, stretching the boundary of our comprehension. Artificial intelligence occupies exactly this ambiguous category. Imagine a traveler from the age of canals encountering the steam engine and the early railroad. To that traveler, the railroad was an alien, almost unsettling invention. Today, it is so embedded in our world that we barely register its presence. A landline phone, once a marvel of electrical engineering, is now regarded as a relic. Every major technology follows this trajectory from shock to normalcy, and AI is walking the same path. With that in mind, it becomes easier to understand the paradox of the present moment. AI's capabilities are significant— so significant, in fact, that many people paying close attention wonder why they do not see more obvious signs of its impact. Our vocabulary is already shifting to accommodate this confusion. No one talks solely about "adoption" anymore; they speak of "dispersion," the quiet spread of a technology before anyone fully recognizes it. That pattern is not unusual. What is unusual is the scale of the investment flowing into this supposedly subtle transition. We are devoting perhaps one percent of GDP to AI-related infrastructure, and yet this investment feels almost modest. The sensation is deceptive. Technologies that reshape civilization rarely announce themselves theatrically. They arrive abstractly, and only later reveal their full force. This raises questions about how we interpret progress. In school I learned that Christopher Columbus "discovered" the New World. Later I learned that the Vikings likely reached it first. In more advanced readings, I encountered speculation about the Phoenicians. Whether those stories are true or not is less important than the lesson they imply, which is that breakthroughs occur, fade, and are rediscovered, and that the people living through them rarely understand what is happening. That is the
condition we inhabit now. We are inside something that will later appear obvious but currently feels diffuse and inconsistent. If, a decade ago, someone described the AI systems of today, most people would have imagined something far more tangible. The abstract nature of these models makes the transformation harder to perceive. Before the rise of multitouch screens, the physicality of the interface made the shift feel substantial. A touchscreen had presence. It was tactile and obvious. Yet despite the industry it unlocked, it was ultimately superficial. The most consequential technologies are the most abstract, and AI is abstract enough that even those building it struggle to anticipate how completely it will weave itself into everyday life. This period of abstraction will not last. As use cases become standardized and widely distributed, the conceptual fog around AI will dissipate. Its impact will be enormous, but it will arrive through the gradual accumulation of thousands of practical integrations. That makes it difficult to discern in real time. Investors, technologists, and casual observers often find themselves wondering why there is such a gulf between headlines and daily experience. That gulf is simply the lag between capability and assimilation. Scale helps clarify the magnitude of what is unfolding. Sam Altman has remarked that OpenAI may require ten gigawatts to operate, potentially more. Initially, such a figure barely registers. Only when revisiting old notes did I fully appreciate that ten gigawatts is equivalent to the continuous power consumption of ten Philadelphias. Once that translation sinks in, the scale becomes harder to ignore. We are not talking about metaphorical energy; we are talking about the load of multiple major cities sustaining a single computational organism. This implies something profound about the nature of the data centers being constructed. They are no longer the tidy server rooms of early internet mythology. They are closer to industrial installations, electrical infrastructures the size of small states
dedicated to training and serving intelligence. Constructing that kind of capacity is extraordinarily expensive. Reasonable estimates place the required capital somewhere between five hundred billion and one trillion dollars over just the next few years. This moment is defined by a collision between abstraction and physicality. Intelligence, something we cannot touch, now demands the construction of facilities, power plants, substations, and transmission lines on a scale associated with national infrastructure projects. That combination is unusual. It means that the future of AI is not simply software evolving in the cloud but a physical reconfiguration of the energy and industrial backbone of society. Nothing about it is hypothetical. It is as real as steel and as concrete as the substations being poured into the ground.
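The scale becomes easier to feel with a little arithmetic. The ten-gigawatt load and the rough one-gigawatt-per-Philadelphia equivalence come from the passage above; the electricity price is an assumption inserted purely for illustration.

    # Back-of-the-envelope check on a 10 GW continuous AI load.
    continuous_load_gw = 10           # figure from the text
    hours_per_year = 24 * 365         # 8,760 hours
    price_per_mwh = 60                # assumed average price in $/MWh

    energy_twh = continuous_load_gw * hours_per_year / 1000      # GWh -> TWh
    annual_cost = energy_twh * 1_000_000 * price_per_mwh         # TWh -> MWh -> $

    print(f"Annual energy: {energy_twh:.0f} TWh")
    print(f"Annual energy bill at ${price_per_mwh}/MWh: ${annual_cost / 1e9:.1f} billion")
    print(f"City-scale equivalents at about 1 GW each: {continuous_load_gw}")

Under those assumptions the electricity bill alone runs into the billions of dollars per year, before a single server, transformer, or substation is paid for.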

Part II: Synthetic Brains

The Confabulation Machine!

A recent conversation with a biology lab got me thinking about the concept of hallucinations. They pointed out that hallucinations are very human. We don’t always call them hallucinations. Sometimes we call them poor memory. They pointed me to a significant body of academic literature describing how poor our memory actually is, and how unreliable our recall can be when we try to retrieve certain bits of information. We don’t typically call that hallucination; I think the word is more often associated with hallucinogenic substances and things of that nature. However, it’s important to understand that we don’t really know how we ourselves as humans actually assemble memory. We have quite a bit of literature in different academic areas that points in this direction.
We Don’t Store Memories. We Reconstruct Them.
So we are not like databases, a store of facts where we can simply retrieve those facts stored somewhere in our minds. This is not how it’s done. In fact, we don’t really know how it’s done, or at least my initial research suggests that there is very little understanding as to how memories are put together. As best I understand it, as a curious observer, we reconstruct them. There is no file. And the way those memories are reconstructed is an experience we can all share, in that there are times when certain memories bring different kinds of emotions and different levels of detail, perhaps triggered by different sounds or different locations. There’s certainly no hard
drive that leads us to recall a memory the exact same way every single time. The idea of a pristine copy of this memory of that wedding or that moment on Tuesday when you propose to your spouse— that memory is always being disassembled and reassembled, almost like in Star Trek being beamed up and beamed down, except each time there might be a subtle difference. For instance, memories are often more accurate when they are associated with stories and ideas versus specific facts. And we will not necessarily return to the exact same memory in the exact same way every time. It’s not deterministic and it’s not completely predictable. That’s not how intelligence works when we look at ourselves biologically. It’s really never how intelligence works. And yet we’re accustomed to these analogies of intelligence in which we have some sort of file system, we’re going to retrieve the information accurately, when this has never been the case.

The Evidence That Our Memories Fail Us
Interestingly enough, there has been quite a lot of work done on eyewitness accounts and how incredibly flawed they are, not intentionally, but by accident. Several studies show just how inaccurate our recollections can be and how easily those memories can change. Often we think to ourselves, well, surely when something is critical or even traumatic our memory will be even more absolute, more accurate. In truth, the inaccuracy persists even in moments such as 9/11; there are studies in which people recalling where they were that day simply misremember. They might actually have been in a different state, but they recall being at their office. These errors are not made on purpose.

It points to the fact that we really don’t know why certain memories and facts are retained and why certain memories and facts are not retained.

The Confabulation Machine Is a Feature!
Psychologists have been studying this phenomenon for a while, and the word for it is confabulation. It’s not particularly new science, and it’s not even really controversial science. As one biologist explained to me, it is fairly settled, which I take to mean that it’s an accepted thing in the neuroscience and biology literature. Our brain is a confabulation machine. It always has been a confabulation machine. It’s just that we never really saw it as a flaw until we realized some of the repercussions it had in the judicial system. Confabulation is the word psychologists have used for a century now to describe essentially the phenomenon we are now seeing in neural nets, observed long before in biological brains. It carries no implication of pathology, though. It’s just how it works. It’s accepted. It’s not understood in the human brain, but it is accepted. It simply describes what happens when a system constructs plausible outputs that might be incomplete or incorrect.
Confident About the Wrong Things
And here is the part that should stop you and make you think about what’s going on in AI. You are just as confident about the wrong things as about the right things, because when something is presented as a fact, we believe it. Every time you remember something, your organic neural network, this biological set of neurons we have, uses connections shaped by decades of experience. And it will smooth out inconsistencies and construct some kind of coherent narrative from incomplete and probably degraded random
signals, for lack of a better understanding. It’s really quite unknown how this all comes together. And then of course, after this mysterious process of recollection occurs, we have a kind of story, but it’s viewed as fact in our minds. It is not fact. It is really a draft of sorts that will almost never be repeated. It’s not even a best guess. It’s not going to be reproduced exactly the same way every time you recall it. In fact, every time you recall it, you might alter the memory in some way, which we know full well is very possible. And so, even with the full weight of lived experience behind something, we still cannot make it absolutely robust.
The Criminal Justice System Shows How Unreliable Memory Can Be
I point to the criminal justice system because this is perhaps where the research on the inadequacy of how we believe our memories work has, at least in practice, been brought to the fore. One lawyer I spoke to considers eyewitness testimony the most confabulated form of evidence in American courtrooms. There’s even something called the Innocence Project, which has documented case after case of innocent people who spent literally decades in prison because an eyewitness remembered something with total certainty that was absolutely wrong. They were not lying. They weren’t confused. They were certain. And it wasn’t done out of harshness or meanness. In these cases, the witness’s brain did exactly what brains are supposed to do. It built the most reasonable story it could from the signals it had, and it served up that story with a level of conviction that was highly persuasive to the person, and it was their memory. It’s a very odd thing to say to somebody: that’s not what you remember. And yet we also know that there are many ways of changing people’s memories, various forms of brainwashing and
influence, and so it should not be that surprising that our own memories are not always perfectly accurate.
Storytelling Is Not Storage
This isn’t some kind of exotic neurological thing I’m talking about. This is something that happens all the time to people who say they have bad memories or people who aren’t paying attention. And a lot of the time, when you look at so-called memory champions, what they have built is not really a hard and fast system of sheer memorization that constructs a file in the mind. Quite the contrary: it’s the storytelling of an experience that can be told consistently, and therefore reconstructs the memory accurately each time because of the consistency of the story. What is fascinating to me is that it is often the clarity of the storyline that helps trigger the memory. Perhaps that’s the algorithm in our minds that is able to reconstruct the memory more accurately, rather than it being done piecemeal every time. This is a lot less like statistical prediction and much more like, to me, a story or maybe a hallucination. What does this mean for my new investment in an AI data center, startup (insert asset name)? From that perspective, a hallucination, as we call it in AI, is a very human-like thing. And so we need to invert this concept of hallucination. Is hallucination perhaps something to examine as a source of intelligence? Large language models do the exact same thing, and they do it for the exact same reason, because they were built on the exact same fundamental architecture. A neural network, whether it’s biological or artificial, does not look up answers. It constructs them. It takes the patterns encoded in its connections and weights, whatever the particular technology is, combines them with the signal available in the current context, and generates the best response it can.

Sometimes that response can be very brilliant, and sometimes it can be really wrong. But here’s what the database people, and I think the current paradigm of large language models, don’t seem to address: the mechanism that produces the brilliant responses and the mechanism that produces the wrong ones are the same thing. We have people who do not have perfect memories but can do creative and brilliant things that artificial intelligence fails to do, and they are in some ways more imperfect than the most sophisticated AIs. The number of hallucinations somebody might have in the course of their day is probably very high, given our natural tendency to be imperfect in retrieving memory. So you start to wonder: can these things be separated? Can you have perfect recall of all memory, but no nuance, no reconstruction, no hallucination, no abstraction?
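The gap between retrieval and reconstruction is easy to sketch. This is a toy illustration of the distinction being drawn here, not a model of the brain or of any particular system: the “database” can only return exactly what was stored, while the “reconstructor” splices overlapping fragments into a fluent answer that may never have been stored anywhere.

    import random

    random.seed(3)

    # A handful of stored "fragments" about one remembered event.
    fragments = [
        "the wedding was in june",
        "the wedding was outdoors",
        "the band played until midnight",
        "it rained in the evening",
    ]

    # Retrieval: a database either has the exact record or it has nothing.
    database = {i: text for i, text in enumerate(fragments)}
    print(database.get(2))      # exact hit
    print(database.get(99))     # miss: None, never a confident guess

    # Reconstruction: rebuild the "memory" by sampling and splicing fragments.
    # Each recall can differ, and details that never co-occurred can be blended
    # into something plausible, fluent, and possibly wrong.
    def recall(prompt="the wedding"):
        chosen = random.sample(fragments, k=2)
        details = " and ".join(
            frag.split("was")[-1].strip() if "was" in frag else frag
            for frag in chosen
        )
        return f"{prompt}: {details}"

    for _ in range(3):
        print(recall())

The retrieval system is boring and honest; the reconstructive one is useful and occasionally wrong, and the same machinery produces both its good answers and its bad ones.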

Hinton Hits It!
Perhaps the one individual who has spoken out about this is Geoffrey Hinton, the researcher who more or less invented the neural network architecture that has made much of this possible. He has been saying something along these lines for years, and most people refuse to sit with it long enough for it to really be comprehended. I think that’s partially because it’s frankly unproductive economically. It’s not where the dollars are spent. And it’s unclear, as I understand it, what the research path would even be. People who build their careers thinking about databases are, to my mind, the last people on earth who would contemplate the areas that in the biological world are recognized as unknown. Because what’s actually happening under the hood of our brains is not at all what a predictable, deterministic algorithm would produce.

Something Is Probably Missing in the Current Model Paradigm…and if that thing is found, fortunes will be lost.
I don’t know what to say about this, but if we don’t know how we make our own memories, it seems like there is certainly an ingredient that could be missing in existing neural nets. The ability to synthesize ideas that were never connected in the original training data, that kind of synthesis might be part of the puzzle. The ability to construct explanations and create these analogies and patterns, these are not really things that are programmed. They’re things that we kind of come up with. We hallucinate them, to some extent, if you want to use that word. And every little bit of it, all the things that make these systems so remarkable, runs on the engine of our brain, which is the engine of confabulation. To think of the current approach as complete is absurd, because we can see the difference in ourselves. It’s not reasonable at all. When you think about it, we don’t really know what intelligence is in that sense. We’re not building databases. We have a child who’s learning. And it is absurd to think that all learning done by a human being is done by mimicry or exposure to data. Of course there’s randomness. Of course there’s trial and error. And similarly, for a grown-up biological brain, of course we have confabulation. We don’t have perfect memory. And whether that’s a problem or not is unclear. Our understanding of the world—our insights into what we see and then what we recall—is related to novel ideas and abstraction. Given this aspect of biology, it stands to reason that the current AI models may be very good, and also may be missing something very important.

Why Basic Questions Matter. The framing of this will really matter, because of the speed of the capital expenditures and other things. Words and how we think about the models are going to shape how we spend money, how
the capital will be spent, and how we think about risk moving forward. Calling hallucinations a huge problem that needs to be mitigated implies something pathological, that it is a sick brain: the system is sick because it’s seeing things that aren’t there. And that is genuinely dangerous, because what if the number is wrong? What if the distance is incorrect? Now, that’s why we have systems. We have checklists. We have lots of ways to ensure that our own confabulations are not going to cause a lot of problems. However, there are some areas where the confabulations might be very important. And the equivalent hallucinations may be just as important. Framing this problem as simply a thing to fix may be incorrect. The same feature that makes a human creative and, frankly, very adaptive to circumstances is also what could make you very wrong about remembering what you did yesterday. Confabulation is the word in biology, and it has an analog in the AI world: hallucination. The capital expenditure has been very much built around reducing hallucinations, not exploring them in a basic-science way. Frankly, if anything, there is a disincentive to do so, because any discoveries in the basic science of biology, which would help us better understand ourselves and perhaps enhance the quality of neural nets, could actually cause a lot of problems. They would suggest that the paradigm, which may still have plenty of room to run, is not going to be the paradigm that gets us to the level of understanding and creativity that we are expecting from these synthetic brains. Most dollars are spent not so much on the fundamental research of neural nets, but rather on taking what works and pushing it as far as it can go until it stops producing results. And remarkably, it continues to produce excellent results, perhaps better than anybody expected.
The Question Nobody Benefits from Asking Right Now…

So with that, we continue our march, building our synthetic brains, minds that think the way we don’t. They don’t have the messy regeneration, the confidence of being wrong and reunderstanding something and perhaps telling a new story, the ability to alter based on situations, which may be important for creativity. And for exactly that reason, it might be interesting that we can do things in our minds that perhaps no database or deterministic system is ever able to do. If there is a fundamental construct of our own brains that we forget about, we can’t really look at this intelligence and just ignore it. It doesn’t take a genius to recognize that if the very basis of our neurological functioning is recreation of memories in different ways at different times, and that this is sort of an analog to hallucination, which is viewed as a bug, it just tells you there’s a lot more to learn. A lot more technology to explore, which is not surprising given that not much time has actually passed. Unfortunately, this is just a question, one big one, but it is worth asking. And it’s remarkable it’s not being asked, because nobody benefits from this question given the money being spent.

Part II: Synthetic Brains

The Capability–Impact Disconnect

Before diving deeper, I should acknowledge that certain terms may appear without full explanation. Where that happens, the Glossary—written by me in the spirit of a common-sense nonscientist—is meant to help. Any inaccuracies in it are, of course, my own, though I've always felt that a slightly imperfect explanation that lands cleanly is better than a perfect one that obscures meaning. Now, to the puzzle at hand. There is something genuinely confusing about these models we are all trying to understand. In fact, there might be two confusing things, though they tend to blur together when you first encounter them. The first is almost comically superficial: the naming conventions. Some of the labels are so odd that one wonders whether the companies behind them should hire a luxury real estate agent from Miami to rebrand their releases. The names are disorienting enough that they distract from the core fact that the behavior of these models can be equally perplexing. We encounter a system that can produce exquisite haikus and at the same time struggle to faithfully copyedit a paragraph. It is hard to know how seriously to take something that alternates so quickly between brilliance and bafflement. The deeper issue is the relationship between evaluating a model and using a model. At first glance, the two feel aligned. A system that scores well on an exam ought to be useful in practice. But the longer you observe these systems, the more that intuition begins to fray. Einstein's famous remark about judging a fish by
its ability to climb a tree has a kind of inverted relevance here. We reward these models for climbing the tree magnificently, and then are confused when they falter on ground level. Inside the AI ecosystem, these evaluations matter enormously. Entire teams celebrate when their models ascend a leaderboard. These tests can be extraordinarily difficult—so difficult that the average person would be impressed simply by the questions, let alone the answers. And yet the economic value that such brilliance ought to generate lags conspicuously behind the test results. That lag is disorienting. How can a system capable of solving competition-level mathematics fail to reliably detect a missing word in a sentence? I encountered this contradiction most vividly in my own editing process for a longer book project. I attempted to use a leading model as a kind of high-end spellchecker. My instruction was explicit: do not change anything of substance, merely assist by noting missing words or minor errors. The model's response bore no resemblance to what I asked for. The chapter had been rewritten entirely. Naturally, I asked what had happened. The model apologized with theatrical sincerity and offered to try again. The second attempt kept the content intact but removed half the text. I again asked why, and the model explained that it had removed the parts it deemed irrelevant. This was, to put it mildly, not what I wanted. I clarified, it apologized again, and for several cycles we repeated the same ritual: the model professed understanding and then did something entirely different. Eventually I tried a different tactic. I told it that since it seemed to have strong opinions, perhaps it could rewrite the chapter as it believed it should be written. The result astonished me. My nonfiction chapter about investing had been transformed into a short story about an investment manager drowning in the ocean. It was well written—beautifully written, even. I don't know that I could have produced the same story myself. But it
was not the task I assigned, and it underscored the peculiarity of what these systems are actually doing. What explains this? Many people suggest that reinforcement learning—the fine-tuning process that follows pre-training— compresses the model's behavior into a narrow corridor. The model becomes over-optimized for a particular style of response and consequently brittle in everyday tasks. It develops a kind of tunnel vision, simultaneously perceptive and oblivious. Another explanation lies in the choices people make when constructing RL environments. In pre-training, the question of what data to use has a trivial answer: use everything. In RL, the question is more specific and therefore more political. What tasks do we want the model to excel at? What kinds of conversations should it master? What kinds of mistakes should it avoid? Teams spend enormous time and effort building highly specific scenarios to include in the training mix. Once you recognize that these scenarios are not chosen at random, the next question becomes why those choices are made. This is where human incentives reassert themselves. If a model will be judged on a leaderboard, then there exists a natural temptation—sometimes explicit, sometimes not—to shape the RL training toward the types of behavior that shine on that leaderboard. People want their model to appear the best upon release. They want a clear metric that can be circulated online and held up as evidence of superiority. The implicit assumption is that these evaluations are measuring the right thing. But what if they aren't? What if they are measuring the thing that is easiest to measure, not the thing that is hardest to build? This creates a quiet distortion. The models begin to look like students who have learned to ace an exam without necessarily mastering the underlying discipline. This is why we find ourselves with systems that can dazzle in rarefied testing conditions while stumbling in unglamorous, everyday tasks. The real-world reliability of a model is not
synonymous with its performance on evaluations. The two are related, but not equivalent. That distinction matters. It is the distinction between a system that impresses and a system that endures. In high school, I knew a guy who won Gold at the Math Olympiad. Genius. But he told me something interesting: he won because he memorized every type of problem. He was a machine. He was actually jealous of the Silver medalist. Why? Because the Silver guy did way less work, got almost the same result, and had a life. The Gold guy was optimized. The Silver guy was adaptable. Right now, we’re building Gold Medalist AIs. We train them on every standardized test in existence until they get 100%. But that’s fragile. If you give them a problem that isn’t on the test, they crash. Real intelligence is messy. It breaks rules. Picasso learned the rules so he could trash them. Kids are creative because they don’t know they aren’t supposed to be. AI has no “what if.” It has no play. It just wants to get the right answer and get a pat on the head. Until we fix that, we’re just building really expensive calculators.
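The gold-versus-silver distinction can be caricatured in a few lines of code. This is a deliberately exaggerated sketch, with a made-up “test” and a made-up rule: one “student” memorizes every question and answer, the other fits a rough rule to the same material.

    import numpy as np

    # "The test": a fixed set of known questions whose answers follow y = 3x - 1.
    test_questions = np.round(np.linspace(0, 1, 10), 3)
    test_answers = 3 * test_questions - 1

    # The gold medalist memorizes every question/answer pair on the test.
    memorized = dict(zip(test_questions.tolist(), test_answers.tolist()))

    def gold(question):
        # Perfect on anything it has seen, helpless on anything it has not.
        return memorized.get(round(question, 3), None)

    # The silver medalist learns a rough rule from the same material
    # (a least-squares line), tolerating small errors on the test itself.
    slope, intercept = np.polyfit(test_questions, test_answers, deg=1)

    def silver(question):
        return slope * question + intercept

    print("Seen question 0.333:", round(gold(0.333), 3), "vs", round(silver(0.333), 3))
    print("New question 0.415:", gold(0.415), "vs", round(silver(0.415), 3))

On the memorized questions the two are indistinguishable; off the test, the memorizer has nothing to say while the rule-learner keeps working. The caricature overstates the case, but it is the same fragility the benchmarks tend to hide.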

The Special Factor
He was unquestionably brilliant. Yet the more we discussed it, the clearer it became that his brilliance had been channeled almost entirely toward one narrow objective. He knew how to win the gold, and he succeeded. But the silver medalist fascinated him. In his view, the silver medalist had done far less work, achieved nearly the same outcome, and retained more energy and curiosity for everything else. That economy of effort made the silver medalist, in his estimation, more interesting. In the long run, he suspected the silver medalist would accomplish more. The gold
had required near-total optimization. The silver retained degrees of freedom. What makes the silver medalist so interesting? I have returned to this question many times, because beneath its simplicity lies a useful lens for understanding both human intelligence and artificial intelligence. In high school, I happened to know someone who won the gold medal at the International Math Olympiad, and when he described the experience, it was clear that his victory came not from encountering wholly unfamiliar ideas but from something closer to exhaustive preparation. He explained that he recognized nearly every problem on the exam, or at least its structure, and felt that he had worked ten times harder than the person who won the silver. The majority of problems were not true surprises. They were variations on themes he had already mastered. This distinction matters for how we think about AI. When we design systems to dominate benchmarks, we produce gold medalists. Deep Blue is the classic example. It played chess at a superhuman level, yet it could do nothing else. Kasparov, who lost to it, had a mind that extended far beyond the 64 squares of the board. He could play chess, certainly, but he could also think metaphorically, strategically, and psychologically. Deep Blue was the gold medalist distilled for chess. Kasparov was a human being with range. Modern AI systems, in many cases, are being sculpted in the image of that gold medalist. We want them to perform flawlessly on programming puzzles, math competitions, or synthetic challenges designed by researchers. To accomplish this, we expose them to every known problem in those domains, including countless artificial variations created through data augmentation. We create a universe of gold-medalist training and then celebrate when the models master it. What we are really celebrating is an increasingly narrow form of optimization.

There is a tacit assumption embedded in this process, and it is one I believe to be deeply flawed. The assumption is that presenting the model with "new" problems makes it broader. But newness created within the same underlying distribution is not true newness. It is novelty without departure, variation without freedom. The model becomes extremely good at navigating a highly specialized space. It does not necessarily develop intuition that transfers. Human beings who do not place at competitions often go on to do extraordinary things precisely because they were never reduced to a single optimization axis. They cultivate wide-ranging interests. They wander. They connect dots that competitions never measure. The interdisciplinary mind, rather than the perfectly trained mind, is what drives most meaningful breakthroughs. I have experienced this myself. My own education forced me to be excellent in mathematics while also learning to write well, to think clearly, and to communicate. The program intentionally emphasized interdisciplinary learning. The goal was not to create a master of one thing, but a person who could integrate. That integration is the special factor. Throughout history, many of the greatest achievements have been born from this interplay between mastery and transgression. Picasso famously said that one must learn the rules like a professional in order to break them like an artist. Newton said that no great discovery was ever made without a bold guess. Katharine Hepburn quipped that if you obey all the rules, you miss all the fun. These are not slogans. They are descriptions of how creativity expresses itself in the world. It emerges not from the complete absorption of rules, but from the tension between knowing them and violating them at the right moment. Children illustrate this most clearly. Research from NASA and others shows that children vastly outperform adults on creativity metrics. Creativity requires the absence of certain constraints, or at least the suspension of them. Children are not
yet rule-bound. They improvise without apology. Their minds are permitted to wander. They ask absurd questions. They follow obscure curiosities. That behavior, which adults tend to suppress, is the raw fuel of innovation. AI, as currently built, has none of this. It has no emotion, no curiosity, no boredom, no desire, no impulse that says "what if?" It has no childlike instinct to draw outside the lines or step beyond the frame. When we design AIs entirely around test-taking, we train out the very randomness and freedom that give rise to transfer. We reinforce constraint. We reduce degrees of freedom. We get better gold medalists, not better minds. There is a reason I frequently remind my team that we can only connect the dots looking backward, never forward. If we want systems that possess something closer to genuine intelligence, not just performance, they will need access to the same freedoms that make humans intelligent. They will need room to explore, room to err, room to be something other than perfectly optimized. They will need an analogue to play. They will need the special factor. Until that happens, it should not surprise us that progress toward abstraction and real-world generalization is slower than expected. We are building brilliant specialists rather than creative generalists. That gap will close eventually. But not yet. For now, our AIs still resemble the gold medalist—magnificent, disciplined, and limited—while the world increasingly demands the silver.

People compare it to a kid learning, or evolution. But it’s not really the same. A 15-year-old knows way fewer facts than GPT-4, but the kid understands context. A kid knows that if you drop a glass, it breaks. The AI just knows that the words “glass” and “break” often appear together. It’s a subtle difference, but it matters. The AI has read everything but understands… well, nothing. It’s a library with no librarian.

Part II: Synthetic Brains

Pre-Training and Its Limits

The Strengths and Weaknesses of Pre-Training
When people describe pre-training as the backbone of modern AI, they are not exaggerating. Its primary strength emerges from something almost embarrassingly straightforward, which is that there is so much of it. The internet—messy, repetitive, contradictory, sprawling—is a reservoir of human-generated text of a scale unprecedented in history. Pre-training simply absorbs this entire flood. The model ingests the patterns, contradictions, habits, errors, insights, and idiosyncrasies that people imprint on the written world. It becomes a vast, compressed reflection of how humans have expressed themselves in language. The second major strength of pre-training is that you do not need to agonize over what data to include. The instinct is simply to use everything. There is no philosophical debate over which slice of humanity deserves representation in the model's mind. The whole distribution is taken as-is. This simplicity is one of the

reasons pre-training became the dominant paradigm. It offers scale without requiring judgment. Yet that simplicity hides something that becomes increasingly visible as the models grow more capable. Pre-training is extremely difficult to reason about. It may appear deceptively intuitive—we feed the model text and it learns patterns—but the actual relationship between the data and the model's behavior is tangled and elusive. When the system makes a mistake, we cannot say with confidence whether the failure stems from a gap in the training distribution or from an internal limitation. The phrase "unsupported in the pre-training data" is offered frequently, but the term "support" is so loose that it becomes almost meaningless. A single fact may exist in the data countless times, yet the model may still fail to internalize it in a way that generalizes predictably. People often attempt to make pre-training more understandable by comparing it to human experience. One common analogy is the period of life before adulthood. A person spends the first fifteen or eighteen years absorbing the world, learning to interpret language, gestures, faces, rules, and relationships. That period looks, in some sense, like pre-training: a process where a mind is built from passive exposure to patterns. Another analogy is evolution, where billions of years of trial and error hardwire capabilities into organisms before any learning takes place in an individual lifetime. Pre-training resembles this too, in that it supplies a massive base of implicit knowledge before anything like task-specific fine-tuning occurs. But while these analogies illuminate something useful, there are important differences that should not be ignored. Pre-training uses a data scale far beyond what any single human sees, and yet the depth of understanding it produces is often surprisingly shallow. A fifteen-year-old child, with only a tiny fraction of the model's input data, knows less in breadth but far more in depth. They do not make the same category errors. They do not

hallucinate. They do not confuse unrelated situations simply because a surface similarity exists. They possess coherence that cannot be explained by data alone. This gap points to a weakness in pre-training. It is astonishingly broad, but it lacks grounding. It captures language without capturing the underlying structure of reality. It provides a model with the appearance of intelligence—patterns, associations, stylistic mimicry—but without the anchoring that allows those patterns to behave reliably when the world shifts. Pre-training is an extraordinary foundation, but it is an incomplete one. It mirrors humanity's output without inheriting humanity's constraints, and those constraints, as it turns out, are part of what makes cognition stable. As a result, the strength of pre-training is also its greatest limitation. It gives us a model that has seen everything, yet understood less than a child. That duality is not a flaw in the process. It is simply the nature of a technique that absorbs enormous quantities of human language while remaining structurally blind to everything that cannot be stated as text. There is something quietly astonishing about how simple systems outperform complex ones across a wide range of conditions. Emotions, which we often dismiss as primitive, reveal this truth with remarkable clarity. They do not calculate or simulate; they do not weigh endless permutations. Yet they guide behavior with extraordinary reliability. A child recognizes danger without solving an equation. A parent senses mistrust without analyzing a dataset. These signals are coarse, but they are coarse in the right way. Their simplicity is precisely what enables them to operate in environments wildly different from the ones in which they evolved. We inherited instincts shaped for forests and plains, yet we apply them to skyscrapers and financial markets. That they function at all is a testament to the power of robust, low-dimensional systems.

Complexity has strengths—precision, nuance, sophistication—but it is fragile. A complex system often fails outside the narrow conditions under which it was built. A simple system acts like shock absorption; it bends where complexity shatters. Even when emotions misfire, the system remains functional because its underlying logic is stable. It errs, but it errs consistently. In machine learning, we sometimes forget that robustness cannot be bolted on as an afterthought. A model with billions of parameters may perform dazzlingly on specific tasks, yet collapse when the distribution shifts slightly. This is not because it lacks “intelligence” in the superficial sense, but because it lacks the grounding that simple evolutionary systems provide by default. Human emotional machinery demonstrates a crucial truth: intelligence that survives contact with the real world must have a stabilizing substrate. It must have something that orients behavior when surface-level reasoning becomes uncertain. The world is too unpredictable for sophisticated reasoning to bear the weight alone. Artificial systems will eventually need an analogue to this— perhaps not “emotion,” but a structural equivalent that allows them to remain coherent when the environment shifts. Humans are sample-efficient. You show a kid one cat. They get it. “That’s a cat.” You have to show an AI a million cats, from every angle, in every lighting condition. And it still might think a dog is a cat if the lighting is weird. We don’t just memorize; we build a mental model. We understand the structure of a cat. AI just memorizes the pixels. There are two versions of the generalization puzzle. The first concerns sample efficiency. The second concerns instruction. Humans can be taught in ways that are impossible to replicate in machines. A mentor does not need to break down every decision or engineer a reward signal for every step; they can simply explain their thinking. The transfer happens naturally.

Evolution explains some of this—vision and dexterity are clearly inherited—but it cannot explain everything. There was no natural selection pressure for algebra or Python scripts. Yet humans exhibit competence in these domains, transferring learning across boundaries that never existed in our ancestral environment. We possess a general learning algorithm of extraordinary flexibility that infers structure from sparse clues. Current AI systems diverge sharply here. Deep learning approximates patterns; it does not infer structure. When the patterns shift, the system breaks. When asked to learn something new, it often overwrites what it already knows—a phenomenon known as catastrophic interference. The model is like a student who can recite the textbook but cannot apply the ideas without direct prompting. Generalization is not simply solving a new problem; it is inferring the governing principles of a domain so thoroughly that new problems become legible immediately. It is seeing the geometry of the task, not just the surface. Until we solve this, everything else is clever imitation.
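To see how literal that overwriting is, here is a deliberately tiny, deliberately adversarial sketch of my own construction: one set of weights learns task A, then trains only on task B, and the task-A skill is simply paved over. Nothing below describes any production model; it is the phenomenon in miniature, with tasks chosen so they conflict.

```python
# Toy sketch of catastrophic interference: the same weights are trained on
# task A, then only on task B, and the task-A skill is overwritten.
# All data and numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)

def make_task(pos_center, neg_center, n=200):
    x = np.vstack([rng.normal(pos_center, 0.5, (n, 2)),
                   rng.normal(neg_center, 0.5, (n, 2))])
    y = np.hstack([np.ones(n), np.zeros(n)])
    return x, y

# Task B deliberately asks for the opposite decision boundary from task A.
xa, ya = make_task(pos_center=(+2.0, 0.0), neg_center=(-2.0, 0.0))
xb, yb = make_task(pos_center=(-2.0, 0.0), neg_center=(+2.0, 0.0))

def train(w, b, x, y, steps=2000, lr=0.1):
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # logistic prediction
        w = w - lr * (x.T @ (p - y)) / len(y)
        b = b - lr * np.mean(p - y)
    return w, b

def accuracy(w, b, x, y):
    return float(np.mean(((x @ w + b) > 0) == y))

w, b = np.zeros(2), 0.0
w, b = train(w, b, xa, ya)
print("task A accuracy after learning A:", accuracy(w, b, xa, ya))   # ~1.0
w, b = train(w, b, xb, yb)                                           # B only
print("task A accuracy after learning B:", accuracy(w, b, xa, ya))   # ~0.0
```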

What Evolution Got Right

The analogy between pre-training and evolution is tempting, and in some respects it holds. Both processes expose a system to vast quantities of information over long stretches of time—generational time in one case, computational time in the other. But the analogy also obscures something important, because evolution may have advantages that we do not fully appreciate, advantages that help explain why humans exhibit certain forms of intelligence that still elude our machines. I recall a striking case in neuroscience that illuminates this point. A man suffered a stroke or some kind of injury that damaged the part of his brain responsible for emotional

processing. What made the case remarkable was that he retained almost every other faculty. He could speak fluidly. He could solve puzzles. On tests of intellect, he performed well. By many conventional measures, nothing seemed fundamentally wrong. But he no longer experienced emotion. Not sadness, not anger, not excitement—nothing. The surprising consequence was that he became incapable of making decisions. Even trivial choices consumed him. Picking out socks could take hours. Financial choices were disastrous. His life, once coherent, collapsed into paralysis. The lesson is not subtle. Emotion is not a decorative element layered on top of cognition. It is a fundamental component of reasoning itself. It serves as a kind of compressed value function, a fast and intuitive way of ranking possibilities, navigating uncertainty, and interpreting outcomes. Without it, the machinery of choice breaks down. An organism that waits for pure logic to settle every decision would not survive long enough to reproduce. Evolution solved this by embedding coarse, powerful value systems in us—fear, pleasure, discomfort, anticipation, attraction. These signals are not precise, but they do not need to be. Their purpose is to move us. In machine learning, value functions are conceptually recognized but practically marginalized. They are treated as components of algorithms rather than essential aspects of agency. But evolution's lesson is that intelligence cannot function without an internal system for evaluating outcomes. A mind that cannot prefer one state of the world over another is not a mind that can act effectively. Humans are able to behave, decide, and adapt because we have deeply ingrained, emotionally mediated biases that operate quickly and reliably, even when our rational faculties lag or wobble. This suggests that pre-training alone, impressive as it is, may never be enough to produce the kind of intelligence we intuitively recognize as robust. Human beings learn from experience, but we

do so with an emotional framework that filters, weights, and prioritizes. Our value system is not a late-stage refinement. It arrives baked into the organism. It is learned, yes, but also inherited. It is present in every moment of perception and action. The man without emotions revealed the architecture by its absence. With the value function removed, decision-making collapsed. This is the clue. A system built solely from pre-training may have breadth, but without an internalized sense of what matters—without a mechanism akin to emotion—it lacks the grounding required to navigate the world coherently. What evolution got right was not intelligence in the abstract. Evolution got right the integration of cognition with a compact, deeply functional value system. That system does not behave like reasoning, but it catalyzes reasoning. It does not guarantee correctness, but it guarantees movement. It gives intelligence a direction. Without that direction, intelligence becomes inert.
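If it helps to see the "compressed value function" idea in miniature, here is a sketch with every name and weight invented for illustration: a coarse scalar score over states, nothing like a real organism, but enough to show how a cheap ranking signal turns deliberation into immediate action.

```python
# Sketch of a coarse, emotion-like value function: a cheap scalar ranking
# over states rather than a simulation of every possible future.
# All names, fields, and weights here are invented for illustration.
def coarse_value(state):
    score = 0.0
    if state.get("predator_nearby"):
        score -= 100.0                              # fear dominates everything
    score += 5.0 * state.get("food_visible", 0)     # attraction to food
    score -= 1.0 * state.get("energy_spent", 0)     # mild cost of effort
    return score

def transition(state, action):
    # Toy world model: each action edits a few fields of the state.
    new = dict(state)
    if action == "flee":
        new["predator_nearby"] = False
        new["energy_spent"] = state.get("energy_spent", 0) + 2
    elif action == "eat":
        new["food_visible"] = 0
    return new                                      # "wait" changes nothing

def choose(actions, state):
    # Rank actions by the coarse value of the state they lead to.
    return max(actions, key=lambda a: coarse_value(transition(state, a)))

state = {"predator_nearby": True, "food_visible": 1, "energy_spent": 0}
print(choose(["flee", "eat", "wait"], state))       # -> "flee"
```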

RL is: Try, Fail, Learn. It’s how you learn to ride a bike. You can read every book about cycling in the world. You can write a poem about cycling. But the first time you get on a bike, you’re gonna fall. Defenders often argue that to imitate language at scale, one must build a world model. They claim that because LLMs can reason about hypothetical situations, they possess this model. I disagree. To mimic what people say is to model the output of entities that have world models. You are copying the shadow, not the object. A true world model must allow you to predict what will happen, not just what someone might write about it. The ocean doesn’t care about your swimming lessons, and reality doesn’t care what text you’ve read. Alan Turing suggested that what we really want is a machine that can learn from experience—from the things that actually happen. LLMs are imitation without grounding. You cannot learn to swim by reading about water.

We are investing heavily in systems trained on description rather than interaction. The more durable bet may lie with those trained through direct engagement with reality. People love to say that LLMs give us a “prior”—like a starting point of knowledge. But here’s the issue: a prior only implies something if there’s a real world you’re trying to match. LLMs don’t have a world. They have text about a world. In Reinforcement Learning (RL), there’s a scoreboard. You do a thing, you get a point (or you die). There’s a “right” answer. In LLM land? There’s no right answer. There’s just “what would a human probably write next?”. It is engineering guided more by stylistic alignment than by objective grounding. And don’t tell me “predicting the next token” is a goal. A goal changes the world. Predicting a word changes… nothing. It’s a loop with no consequences. John McCarthy defined intelligence as the computational part of the ability to achieve goals. A goal that doesn’t change the world isn’t a goal. Without a goal, there is no truth condition. The LLM paradigm attempts to bypass this entirely, and in doing so, starts in the wrong place. If you aren’t trying to achieve something in reality, you aren’t intelligent. The system risks becoming little more than an advanced mimic.
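For anyone who wants the contrast stated precisely, the two training signals look roughly like this in their standard textbook forms (nothing here is specific to any particular lab's system):

```latex
% Imitation / next-token prediction: maximize the likelihood of recorded text.
\mathcal{L}_{\mathrm{LM}}(\theta) = -\sum_{t} \log p_{\theta}\left(x_{t} \mid x_{<t}\right)

% Reinforcement learning: maximize expected reward earned by acting in a world.
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t} r_{t}\right]
```

The first objective is satisfied entirely inside the text; nothing in the world has to change for the loss to fall. The second is defined only by the rewards the world hands back, which is the scoreboard I keep insisting on.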

❧ The Robustness of Simple Things

There is something quietly astonishing about the way simple systems outperform complex ones across wide ranges of conditions. Emotions, which we so often dismiss as primitive or unsophisticated, reveal this truth with remarkable clarity. Consider how they function. They do not calculate. They do not simulate. They do not weigh endless permutations of outcomes. Yet they guide behavior with extraordinary reliability. They are coarse, yes, but they are coarse in the right way. A child can

recognize danger without solving an equation. A parent can sense devotion or mistrust without analyzing a dataset. When we compare these signals to the elaborate forms of reasoning we revere in ourselves, we tend to underestimate them. They seem too simple to matter. Yet this simplicity is precisely what enables them to operate across environments wildly different from the ones in which they evolved. The emotional systems that guide us today originated in early mammals and were refined, but not reinvented, over millions of years. We inherited a set of instincts shaped for survival in forests and plains, and we now apply them in a world filled with skyscrapers, financial markets, airports, and algorithms. That they continue to function at all is a testament to the power of robust, low-dimensional systems. Complexity has its strengths. It allows for precision, nuance, and sophisticated forms of reasoning. But complexity is fragile. It breaks easily when the environment shifts. A highly complex system often fails outside the narrow conditions under which it was built. A simple system, by contrast, may be imprecise, but its imprecision acts like shock absorption. It bends where complexity shatters. Even when emotions misfire—as hunger does in a modern world where calories are abundant rather than scarce—they still illuminate the core principle. The architecture that worked across scarcity breaks when the environment provides surplus. But the system as a whole remains functional because its underlying logic is so stable. It errs, but it errs consistently. And consistency, even flawed consistency, allows adaptation. In machine learning, we sometimes forget that robustness cannot be bolted on as an afterthought. It must be woven into the structure of the system. A model with billions of parameters may perform dazzlingly well on the tasks we set for it, yet it may collapse when the distribution shifts just slightly. This is not because it lacks intelligence in the superficial sense—its outputs

may be clever or articulate—but because it lacks the grounding that simple evolutionary systems provide by default. Human emotional machinery demonstrates a crucial truth: intelligence that survives contact with the world must have a stabilizing substrate. It must have something that constrains and orients behavior even when the surface-level cognition is uncertain. Sophisticated reasoning, by itself, cannot bear that weight. The world is too unpredictable. The combinatorial space of possible events is too large. Our evolved emotional architecture is unsophisticated by design. It keeps us functional in moments when precision is impossible. It was never meant to operate perfectly; it was meant to operate broadly. In this sense, its very simplicity is what allows it to endure. Artificial systems, if they are ever to rival human robustness, will need an analogue to this grounding. They will need an internal mechanism—perhaps not emotional in the human sense, but something playing a similar structural role—that allows them to remain coherent when the environment shifts. Without this kind of simplicity at the core, even the most elaborate intelligence will remain brittle. Everyone cites Rich Sutton’s “The Bitter Lesson” to justify spending billions on GPUs. They think it means “Scale is King.” Sutton actually said that hard-coding human knowledge fails, and systems that learn from experience win. LLMs are literally just scaling human knowledge (text). They aren’t learning from experience! They’re doing the exact opposite of what the lesson says. The question is not whether LLMs scale—they do. The question is whether scaling imitation will hit a ceiling that experiential learning breaks through. History suggests that scalable methods are those where the agent learns from experience under a goal. You try things, you see what works, you adjust.

The dream that you can just teach an AI everything by having it read the internet is a fantasy. The world is too big. Most of what matters arises from local details and specific histories that no amount of pre-training can anticipate. In the LLM paradigm, context is stuffed into a prompt. In the RL paradigm, context becomes part of the agent. The best investor isn’t the guy who memorized 10,000 annual reports. It’s the guy who knows how to read the next one when the market crashes. You need a brain that can figure stuff out, not a brain that just remembers stuff.

The Generalization Problem

Generalization is the quiet fault line running beneath the entire field of artificial intelligence, the divide separating systems that merely perform from systems that genuinely understand. It is obvious that our models today fail to generalize in the way humans do, and the reasons for that failure expose the limits of our current approach. If we want to understand intelligence in any meaningful sense, this is the problem that must eventually be solved. There are really two versions of the generalization puzzle, and although they are connected, they are distinct enough that merging them obscures what is actually happening. The first concerns sample efficiency. Human beings require astonishingly little data to learn effectively. A child sees only a sliver of the world, yet forms robust conceptual categories that transfer seamlessly across settings. A teenager can learn to drive a car after a handful of hours, drawing on visual and physical systems shaped long before driving existed. A young adult picking up a musical instrument can identify patterns after minimal exposure.

There is an ease to this process that we take for granted. It does not arise from volume; it arises from structure. Artificial systems, trained on incomprehensible quantities of data, nonetheless falter when moved even slightly outside the distribution that shaped them. They are capable of extraordinary mimicry, but mimicry is not generalization. The core of the issue is that humans do not merely accumulate facts or examples. Our learning is scaffolded by an architecture that evolution spent millions of years refining. The human vision system, for example, arrives not as a blank slate but as something remarkably functional at birth. When I was five years old, I could already recognize cars with a level of reliability that would take an AI system vast amounts of curated data to approximate. I had barely seen any. That suggests that something deeper is at work, something that cannot be explained simply by experience. The second version of the generalization puzzle concerns instruction. Humans can be taught in ways that feel impossible to replicate in machines. If you are mentoring a young researcher, you do not need to break down every decision into labeled steps or engineer a reward signal for each part of the process. You can talk. You can explain your thinking. You can sketch the shape of an idea. They absorb not only what you say but how you approach the problem. You do not need to perform the laborious work of curriculum design, constructing bespoke feedback loops and verifying stability at every stage. The transfer happens naturally. It "just works," which is an infuriatingly imprecise phrase, but it captures something essential. Because of this, people sometimes attribute human learning efficiency to evolution in a literal sense. And perhaps that is partly true. For certain abilities—vision, hearing, dexterity—the evolutionary inheritance is undeniable. A robot can be trained to manipulate objects with sufficient simulation, yet it tends to collapse when asked to adapt quickly in the real world. A person,

by contrast, enters adulthood with a dexterity and perceptual grounding that required no engineering. The "training set" was limited, yet the capability is broad. But evolution cannot explain everything. There was no natural selection pressure for algebra, software development, or writing Python scripts. These domains did not exist. Yet humans exhibit remarkable competence in them. The fact that we do, and that we can transfer our learning across domains that were never part of our ancestral environment, suggests that humans possess a general learning algorithm of extraordinary flexibility. We infer structure even when the environment provides little guidance. We extrapolate from sparse clues. We form abstractions that the data alone cannot explain. This is where current AI systems diverge most sharply from us. Deep learning algorithms do not infer structure; they approximate patterns. When the patterns shift, the system often breaks. When asked to learn something new, it tends to overwrite what it already knows. This phenomenon, sometimes called catastrophic interference, highlights how little true generalization is happening. The model behaves like someone who can recite the textbook but cannot apply the ideas without direct prompting. Generalization, in this richer sense, is not simply the ability to solve a problem you have not seen before. It is the ability to infer the governing principles of a domain so thoroughly that new problems become legible almost immediately. It is the ability to see the underlying geometry of the task, not merely its surface form. It is the ability to learn from explanation rather than mere exposure. The gap between our systems and ourselves remains wide. It is not surprising that we have achieved models capable of remarkable performance, nor is it surprising that these same models fail when the conditions shift. What is surprising, or at least underappreciated, is how much of human intelligence

appears to come from mechanisms we still cannot articulate, let alone engineer. We have built minds that predict without understanding. We have not yet built minds that understand enough to generalize. That is why I call generalization the core problem. Until we solve it, everything else will be a clever imitation rather than a genuine breakthrough. AI isn’t code floating in the cloud. It’s electricity. Massive amounts of it. The new unit of measurement is the Gigawatt. One gigawatt = One Philadelphia. OpenAI reportedly wants 10 gigawatts. That’s ten Philadelphias of power just to run their models. This power maps directly to capability. If you want a model to serve more users, reduce latency, or handle longer contexts, you need more power. The equation is brutally simple: at this scale, power is intelligence. Constructing a data center capable of providing a gigawatt of compute costs roughly fifty billion dollars. This figure isn’t a creative rounding error; it covers land, construction, cooling, substations, and the massive electrical infrastructure required. Even more revealing is that many companies do not intend to own these facilities outright. They plan to rent the capacity. Renting a single gigawatt costs between ten and fifteen billion dollars per year. With typical five-year agreements, each gigawatt represents fifty to seventy-five billion in commitments before a single calculation is performed. If a company claims to require ten gigawatts, the arithmetic becomes surreal: five hundred to seven hundred fifty billion dollars in rental obligations, stacked on top of hundreds of billions in physical infrastructure. This is one of the most aggressive private-sector buildouts in history. This isn’t software anymore. This is industrial infrastructure. It’s concrete, steel, and power lines. We’re reconfiguring the physical world to feed a digital ghost.
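The arithmetic behind those numbers is simple enough to check in a few lines. The inputs below are the round figures quoted above, not audited costs:

```python
# Back-of-the-envelope check of the figures above, in billions of dollars.
# Inputs are this chapter's round numbers, not audited data.
build_cost_per_gw = 50
rent_per_gw_per_year = (10, 15)      # low and high estimates
lease_years = 5
gigawatts = 10

rent_per_gw = tuple(r * lease_years for r in rent_per_gw_per_year)
rent_total = tuple(r * gigawatts for r in rent_per_gw)

print("five-year rent per gigawatt: ", rent_per_gw)                     # (50, 75)
print("rent for ten gigawatts:      ", rent_total)                      # (500, 750)
print("build cost for ten gigawatts:", build_cost_per_gw * gigawatts)   # 500
```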

If you follow the cash, it all ends up in Jensen Huang’s pocket. You build a $50 billion data center? ~$30 billion of that goes to Nvidia. Their margins are like 75%. It’s obscene. The funny part is that Nvidia doesn’t even make the chips. TSMC does. SK Hynix makes the memory. They do the hard manufacturing work and get a fraction of the profit. The disparity in the supply chain is striking. TSMC receives perhaps six to eight billion dollars from a gigawatt build. SK Hynix captures four to six billion for high-bandwidth memory. The packaging suppliers collect another two to three billion. These suppliers are sophisticated manufacturing firms with deep moats, yet their combined profits fall short of Nvidia’s share from a single project. Nvidia captures more value than its entire supply chain combined. Three forces create this imbalance. First, the software moat: CUDA has become the operating system for accelerated computing, making switching costs nearly prohibitive. Second, system integration: Nvidia sells functioning supercomputers, not just chips. Third, scarcity: when demand outstrips supply, the vendor at the bottleneck has absolute pricing power. They are selling the shovels in a gold rush, but they’re the only shovel store in town.

❧ Most AI Models Are Built Based On Two Main Ideas

Most people today approach AI through the lens of large language models. That way of thinking has become so dominant it feels almost natural: you train on trillions of tokens, you predict the next one, and somewhere in that process “intelligence” appears. But conceptually, that picture is missing something essential. From the reinforcement learning perspective, we are talking about a fundamentally different object. The two views can drift so far apart that they stop being able to talk to each other at all.

Reinforcement learning is not an exotic sideshow. It is basic AI—the older and more elemental conception of what intelligence actually means. It starts with a very old question: what is intelligence? And the answer, at least in this tradition, is disarmingly simple. The problem of intelligence is the problem of understanding your world well enough to act inside it. RL is about that loop of action, observation, and adjustment. You do something, you see what happens, you update. That’s it. That’s the whole game. Large language models are built to do something else entirely. They are optimized to produce what a person might say in a given situation, or to reproduce what a person once did. They are not, at their core, built to figure out what to do next in the world. The map is not the territory—and a system trained on descriptions of the world is not the same as a system that has navigated the world. One has read the menu; the other has tasted the food. People often respond by saying: fine, but surely to imitate human language at scale you must build some kind of world model. After all, these systems are trained on the full corpus of human text. They can answer factual questions, reason about hypothetical situations, and integrate information from many domains. If that isn’t a world model, what is? I disagree, and not lightly. To mimic what people say is not to model the world. It is to model the outputs of entities that themselves have models of the world. You are copying the shadow, not grasping the object. A true world model must allow you to predict what will happen, not just what someone might say about it. You need to be able to form expectations about reality and then have those expectations contradicted or confirmed. LLMs predict what a person is likely to say. They do not predict what the world will do. The ocean doesn’t care about your swimming lessons—and reality doesn’t care what text you’ve read about it.
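Stripped to its skeleton, that loop fits in a dozen lines. The sketch below is a toy three-armed bandit with made-up payout probabilities; the point is not the example but the shape of it: act, observe, update.

```python
# A minimal version of the act / observe / update loop, as a toy
# three-armed bandit. The payout probabilities are invented.
import random

true_payout = [0.2, 0.5, 0.8]          # hidden from the agent
value_estimate = [0.0, 0.0, 0.0]       # the agent's running beliefs
pulls = [0, 0, 0]

for step in range(5000):
    # Act: mostly exploit the best current estimate, occasionally explore.
    if random.random() < 0.1:
        arm = random.randrange(3)
    else:
        arm = value_estimate.index(max(value_estimate))

    # Observe: the world answers with a reward.
    reward = 1.0 if random.random() < true_payout[arm] else 0.0

    # Update: nudge the estimate toward what actually happened.
    pulls[arm] += 1
    value_estimate[arm] += (reward - value_estimate[arm]) / pulls[arm]

print([round(v, 2) for v in value_estimate])   # drifts toward [0.2, 0.5, 0.8]
```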

Alan Turing, whose name is invoked too casually in this space, once suggested that what we really want is a machine that can learn from experience. Not from text, not from human labels, but from the things that actually happen in life. You act, you see the consequences, and that is what you learn from. Reinforcement learning takes that seriously. Large language models are trained on something else entirely. They are shown a situation and then what a human wrote or did in that situation, and implicitly the suggestion is: “do that.” It is imitation without grounding. You cannot learn to swim by reading about water. We invented the term “AGI” because we were frustrated that our “smart” computers were idiots. So, we built a Librarian. It knows everything but can do nothing. You can’t cure cancer by reading PDFs. The cure isn’t in the text yet. We have become so enamored with the librarian that we forgot the obvious: we are asking a frog in a well to tell us about the ocean. Future breakthroughs in biology, energy, and materials live in phenomena no one has measured yet—in failed experiments and the noise of reality, not in PDFs. Real intelligence is a Universal Apprentice. A 15-year-old kid who knows nothing but can learn anything. Current AI is the opposite. It knows everything but learns nothing once you turn it on. It has no memory (tab closes = brain wipe). It has no friction (never touched reality). Memory: Current systems die when the tab closes. A research firm where every analyst is factory-reset nightly would never build institutional knowledge. An AI needs to accumulate “scars.” Reality: Text is smooth; reality is high friction. You cannot learn to swim by reading about it. The systems that matter will ingest sensor streams and control robotic arms, learning from torque curves and fluid dynamics. Truth: We currently train models to please us (Reinforcement Learning from Human Feedback). But physics doesn’t care if an explanation is polite. Structural engineering doesn’t care if a

bridge design makes you feel heard. The bridge stands or it collapses. A real apprentice must be rewarded on outcomes, not “vibes.” Reason: Systems like AlphaProof don’t just mimic intuition; they bypass it by searching solution spaces we can’t see. We need systems that reason from first principles, not just parrot human priors. I’m sorry to tell a trillion dollars of venture capital this, but my kid didn’t learn gravity by reading a textbook. He didn’t have a “pre-training run.” He just stood up, wobbled a bit, and fell flat on his face. It took guts. He cried, he got back up. He learned because the floor is hard, not because he read a Wikipedia article about ‘Hardness.’ That’s the difference.

The Ground Truth Problem

Many people will concede that language models lack true world models and still argue that imitation is a good start. They say: yes, perhaps LLMs do not have true world models, but imitation learning gives us a useful prior. It gives us a repertoire of reasonable ways to approach problems. Then, when we enter what they call the “era of experience,” we can refine those priors with reinforcement learning. In other words, LLMs get us close enough often enough that we can bolt on experience afterwards. The problem is deeper. To serve as a prior, there has to be an underlying reality you are approximating. You need something that counts as actual knowledge. In the large-language-model framework, there is no definition of actual knowledge. There is no criterion for a “good action” except that it matches what someone might have done. There is no notion of correctness anchored in the world. A prior only makes sense if there is a real

underlying truth it approximates. In the LLM framework, that truth is absent. If you take continual learning seriously—if you really mean that an agent has to keep learning during its normal interaction with the environment—then you are implicitly committing to the idea that there must be a way during that interaction to tell what is right and what is wrong. In an LLM, there is no such signal. You say something, the user reacts or doesn’t, but the model does not receive structured feedback that says “this was right” or “this was wrong.” There is no goal. And if there is no goal, there is no ground truth. Without ground truth, “prior knowledge” is a rhetorical metaphor, not a mathematical one. Reinforcement learning starts from the opposite end. In RL, there is a right thing to do, defined as the action that increases reward. You can argue about what form the reward should take, or how sparse it should be, but at least there is a definition. From that definition, you can talk about prior knowledge: hints about which actions might be good before you experience them. You can check that prior knowledge, because you know what “good” means. You have a scoreboard. Without a scoreboard, you’re just playing. The same confusion arises when people talk about goals. Some will insist that LLMs do have a goal: next-token prediction. But that is not a goal in the sense that matters for intelligence. A goal, properly understood, is something that can be achieved or not achieved in the world. It makes a difference to reality. Next-token prediction does nothing to the world. Tokens arrive, you predict them, and nothing outside that loop is altered. You cannot look at such a system and say “it has a goal,” at least not in the sense John McCarthy meant when he defined intelligence as the computational part of the ability to achieve goals. A goal that doesn’t change the world isn’t a goal. It’s a behavior loop with no external consequence. True goals define

what is better or worse, effective or ineffective. Without a goal, there is no truth condition—no sense in which one action is objectively preferable to another. Reinforcement learning begins with a goal and constructs intelligence around achieving it. The LLM paradigm attempts to bypass this entirely, and in doing so starts in the wrong place. The Hyperscalers (Amazon/Google): They make revenue, sure. But they spend it all immediately on more chips and power plants. They’re “cash flow starved.” They show positive operating income but often negative free cash flow because capital expenditures swallow everything. They are profitable on the income statement, but cash-starved in reality. The Labs (OpenAI/Anthropic): Burning cash. Losing billions. Compute costs can exceed revenue by factors of two or three. They are engaged in a long-duration bet, losing billions today in hopes that scale eventually lowers costs and secures a moat. The App Layer: Ironically, the little guys renting the models might be the only ones turning a profit right now. They don’t have the capital expenditure anchor around their necks. They pay for API access and charge subscription fees. If the API cost is lower than the revenue, they achieve profitability almost immediately. This follows a classic historical pattern. The hardware layer (the indispensable toolmaker) becomes extraordinarily profitable. The infrastructure layer generates revenue but is burdened by reinvestment. The model layer burns money to secure a position. And the application layer quietly collects rent. For a while, “Scaling” was a magic cheat code. Just add more GPUs, get smarter AI. There is a rhythm to the history of machine learning: periods of exploration followed by periods of consolidation. For a long time, the field lived in a mode closer to tinkering than

engineering. Researchers improvised, playing with architectures and stumbling upon quirky insights. Then came the discovery of scaling laws. Researchers realized that by enlarging a neural network, feeding it more data, and applying massive compute, performance improved in a predictable way. This changed everything. Instead of searching for elegant theories, the industry adopted a simple recipe: scale up. But every recipe has a limit. Data isn’t infinite, and the pre-training distribution has boundaries. Once a model has consumed most of human output, adding more compute yields diminishing returns. We are now re-entering the age of research. We have to go back to actually doing research. We need ideas, not just bigger hammers.
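That "predictable way" even has a commonly quoted functional form. The constants vary from study to study, and I am only sketching the shape reported in the scaling-law literature, not endorsing particular values:

```latex
% One widely quoted fit: loss as a function of parameters N and training tokens D.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Both reducible terms shrink smoothly as parameters N and training tokens D grow, which is why "scale up" worked as a recipe. Once D is pinned near "most of human output," the second term stops falling, and further spending on N buys less and less.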

The “bitter lesson” is here! People like to point out that we can put RL on top of LLMs and give them the goal of solving math problems, including Olympiad-level questions that challenge even the best human students. This is impressive in its own right. But mathematics is not the physical world. Mathematics is pure, crisp, internally consistent. It does not fight back. It does not surprise. It is planning, not prediction. When you use RL to guide a model through a space of proofs, you are doing something powerful, but you are not solving the problem of learning from the empirical world. The model has learned to solve puzzles with known answers; it has not learned to navigate a world that doesn’t come with an answer key. When we move from the closed universe of formal systems to the open universe of experience, the distinction matters profoundly.

This is where “The Bitter Lesson” re-enters the conversation. The essay has become a kind of banner for those who believe that LLMs are the ultimate scalable method. They point to the fact that LLMs use massive computation and seem to get better as you pour in more compute. In their view, this confirms that human-designed priors are a distraction and that scaling is the real game. But that is not what the Bitter Lesson actually says. The Bitter Lesson observes that methods which rely heavily on human knowledge ultimately get overtaken by methods that learn directly from experience using computation. It is not an argument for putting ever more human knowledge into a static model. In fact, if history is any guide, systems that learn from experience alone will eventually supersede systems whose primary advantage comes from human-curated data. The question is not whether LLMs scale; they obviously do. The question is whether scaling imitation will hit a ceiling that experiential learning eventually breaks through. You can always start with human knowledge and then do scalable things. That has been true in every era. The problem is that in practice, people become psychologically attached to their handcrafted structures. They get locked in. They halt the march toward more general, more scalable methods because they are invested in their own designs. This is the pattern. This is what has always happened. The methods that win in the end tend to be the ones that learn directly from experience under a goal, not the ones that rely on clever injections of prior human understanding. A scalable method, in the sense that matters, is one where you learn from experience. You try things, you see what works, you adjust. Nobody tells you exactly what you should do in each situation. First of all, you have a goal. Without a goal, there is no sense of right or wrong, of better or worse. Large language models are trying to get by without having a goal. That is exactly starting in the wrong place.

That loop of acting, observing, and updating is what allows intelligence to scale. Without a goal, without consequences, there is no notion of better or worse. And without better or worse, there is no meaningful sense in which you can say a system is becoming more intelligent. It is simply becoming more elaborate—all trunk and no roots. The best investor is not the one who has read the most annual reports; it is the one who knows how to read the next one. The dream of large language models—that you can teach the agent everything and it will never need to learn anything during its life—is a fantasy. The world is too big. Most of what matters arises from local details, specific histories, idiosyncratic preferences, the fine-grained structure of situations no amount of pre-training can anticipate. In the LLM paradigm, context must be stuffed into a prompt. In the RL paradigm, context becomes part of the agent. Intelligence, in the end, is not about what you’ve memorized. It’s about what you can figure out when the situation is new and the answer isn’t written down anywhere. That requires a goal, a world that pushes back, and the capacity to be changed by what you did not expect. Everything else is elaborate mimicry. A superintelligent system shouldn’t be a finished product. It should be an apprentice that learns on the job. True superintelligence will likely not emerge as a finished, omniscient mind, but as a system capable of becoming anything through continual learning. Instead of deploying an entity that knows every job, we will deploy a “universal apprentice” that learns the norms and skills of its environment. This shifts the safety question. If the system is not finished at deployment, the environment in which it learns becomes as important as its architecture. A healthy environment produces a stable intelligence; a bad one produces misalignment.

If we build a system that is “finished” at the factory, it’s dangerous. If we build one that learns from its environment, we have a chance to shape it.

Part II: Synthetic Brains

The Economics of Intelligence

Everyone talks about “Safety” like it’s a feature you can patch in later. But once these things start making real decisions, the vibe changes. We keep saying we want AI aligned with “Human Values.” Which humans? My values? Your values? The values of a guy in 1850? Good luck with that. We talk constantly about aligning AI to “human values,” but we rarely interrogate the phrase. Human values are unstable, contradictory, and culturally specific. Distilling them into a code is a fragile project. Perhaps the right object of alignment is not humanity alone, but sentient life. If a system develops an internal experience—a way of modeling itself as a subject—it is plausible that the easiest path to alignment is through the recognition of shared sentience. Human empathy arises because we model others using the same machinery we use for ourselves. Imagine a world where you have a personal AI that does everything. It negotiates your salary, buys your house, talks to your friends, and disputes your parking tickets. You just sit there. If we don’t keep our hands on the wheel—actually making decisions—we’re gonna turn into pets. Well-fed pets, maybe. But pets. The solution may be uncomfortable: we must ensure humans retain direct cognitive involvement. If the boundary between human and machine is porous—if understanding flows in both directions—delegation does not eliminate participation. The

future of alignment may require us to remain present, even as we build systems that outstrip us.

What Does It Actually Cost?

When people imagine artificial intelligence, they often picture something weightless and ethereal, an invisible intelligence drifting through servers with no physical footprint. But the reality of modern AI is the opposite. The true measure of its scale is not lines of code or the number of GPUs tucked inside a rack. It is power—raw electrical power—measured in gigawatts. The gigawatt, rather than the GPU or the teraflop, has quietly become the fundamental unit of modern intelligence. A gigawatt is an almost unimaginable quantity of energy until you anchor it to something familiar. One gigawatt equals a billion watts of continuous electrical draw, the kind of power consumption you would expect from a city the size of Philadelphia running without pause. When industry leaders talk casually about needing ten gigawatts, they are not making a rhetorical point. They are saying that to operate the next generation of AI models, they require the sustained energy load of ten Philadelphias. Nothing about that is metaphorical. It is a hard physical requirement. The reason the gigawatt has become so central is that it maps directly to capability. If you want a model to serve more users, you need more power. If you want faster responses with lower latency, you need more power. If you want to train a larger model, or retrain an existing one with higher fidelity, or deploy a model with long context windows, you need more power. The equation is brutally simple. At this scale, power is intelligence. Constructing a data center capable of providing a gigawatt of compute is not a minor undertaking. The cost is roughly fifty billion dollars, and that figure is not a creative rounding or a comfortable range meant to make investors feel stable. It is a real estimate, driven by land, construction, cooling, substations,

transformers, networking equipment, security, and the enormous power infrastructure required to sustain such an operation. Even more astonishing is that many companies do not intend to own these gigawatt facilities outright. They intend to rent the capacity. The cost of renting a single gigawatt is somewhere between ten and fifteen billion dollars per year. These rental agreements typically extend over five years, which means that each gigawatt requires between fifty and seventy-five billion in commitments before even accounting for the underlying capital expenditure. Taste is having a gut feeling that “this idea feels right” even when the graph looks weird. It is the sense of which ideas feel structurally sound and which directions feel alive. My own taste asks a simple question: how would nature solve this? The artificial neuron endured not because it was a perfect copy of biology, but because it echoed biology’s logic: simple, local, and distributed. When an idea needs a thousand hacks and exceptions to work? It’s probably wrong. Complexity is fragile. Simple stuff breaks less. Taste becomes critical when the data contradicts you. A purely data-driven researcher gets lost when experiments behave unexpectedly. A researcher with taste—an internal compass shaped by coherence—can see through the noise. They know when the problem is the experiment rather than the idea. You need researchers who can look at a failed experiment and say, “No, the idea is right, the test was wrong.” That’s not science, strictly speaking. That’s intuition. Taste is not ornamental; it is the invisible infrastructure of discovery. If a company claims to require ten gigawatts, the arithmetic becomes almost surreal. It implies five hundred to seven hundred fifty billion dollars in rental obligations alone, stacked on top of several hundred billion more in physical infrastructure. Taken together, these numbers represent one of the most aggressive private-sector buildouts in modern history.

That scale explains why AI today feels materially different from the technological revolutions of the past two decades. The internet did not require this kind of industrial backbone. Social media did not reshape the grid. Smartphones did not require new substations. Cloud computing grew enormous data centers, but even those pale in comparison to what is now being planned. AI is not merely a software revolution; it is an infrastructural transformation involving concrete, steel, turbines, and power lines. It is a reconfiguration of the physical world to host an abstract intelligence that demands, quite literally, the energy of cities. This is the cost that underwrites the capabilities we observe. Each query, each response, each step of reasoning is anchored in this immense consumption of electricity. It is easy to forget that when everything appears on a screen. But the future of intelligence is being built not only in code and research labs, but in the engineering drawings of substations and the pouring of foundations for data centers that will be among the largest industrial structures of their time. We tend to think of AI as immaterial. Yet its shadow falls across the grid, and its demands are as tangible as any manufacturing operation. If anything defines the present era, it is this collision between abstraction and physicality, a reminder that intelligence—even when digital—still obeys the laws of thermodynamics. From an investor’s perspective, the financial dynamics are difficult to ignore. But I know how money works. The labs are terrified. They aren’t doing “science” anymore; they’re doing product development with a gun to their heads. The fear of being left behind is palpable, percolating into the daily affairs of these labs and creating distinct cultures. I laughed when I saw the model names. “Haiku”? Really? Basho might take issue with Anthropic’s definition.

But the weirdest part is how disconnected it all is from actual biology. We’re building these massive statistical engines and pretending they’re brains. They are not. I suspect our brains are not merely Large Language Models. There is so much we learn in life that isn’t taught to us or mimicked, but learned via experience—particularly in our early days. While this seems obvious, it is a surprisingly controversial view within AI labs. I have yet to see a model that has a real “value function”— something that tells it what matters. These labs often lock onto specific architectures, recognize mechanisms that work, and then scale them to the absolute maximum. If you had asked me ten years ago if the dominant consumer AI would be a chatbot, I would have said surely not. I imagined something far more advanced. Yet here we are, speaking into text boxes as if we just invented instant messaging. If we were still in a research phase, we’d be trying weird, small ideas. But the “Commercial Viability” train left the station. Now it’s just: Scale It. The attitude of the research environment is critical. If you have a multi-billion dollar valuation and have made massive commitments to a specific model paradigm, you are unlikely to change your mind. This creates perverse incentives. As my brother—a preeminent researcher—often reminds me, science is haphazard. The most prestigious journals often have the most retractions because true breakthroughs are rarely linear. They are often discovered accidentally in a rich, basic science environment, not by taking one thing that worked and scaling it to infinity. It reminds me of guys cutting down a forest with dull chainsaws because they’re too busy to stop and sharpen them. If we hadn’t promised 1% of GDP to this specific architecture, maybe we’d have found something efficient by now. But we’re

stuck with the brute force method because it’s the only one that keeps the stock price up. Here’s the irony: The AI revolution is being choked by… boring metal boxes. You can have all the H100s you want. If you can’t get a step-down transformer (lead time: 2 years), your data center is just a very expensive warehouse. Power is no longer just an operating cost; it is the fundamental boundary of the enterprise, representing thirty to forty percent of the cost to run large-scale AI infrastructure. The constraint is tightening. Interconnection delays with utilities, which once took months, now stretch from two to six years in many markets. This scarcity forces companies into strategies that sound like science fiction. Some are securing permits for data centers powered directly by small modular nuclear reactors. Others are partnering with existing nuclear facilities to create “AI campuses” with stable baseload power. We are running into the limits of the physical grid. People are seriously talking about building nuclear reactors just to run chatbots. It is the small hole that sinks the big ship. Without transformers, nothing runs. This isn’t code anymore. It’s steel, copper, and permits.

Where Does All the Money Go?

If you follow the flow of money through the modern AI economy, you eventually arrive at a single destination with uncanny regularity. No matter where you begin—whether with venture-backed labs, cloud providers, sovereign wealth funds, or corporate budgets—the trail bends toward Nvidia. It is one of the rare cases in contemporary industry where a private company has

positioned itself at the terminal point of nearly every dollar that enters an ecosystem. The scale of that position is extraordinary. Consider again what it costs to build a gigawatt-scale data center, the new atomic unit of AI. Fifty billion dollars is the rough number, and it is not a soft estimate. Embedded within that figure is a deeper story about where the capital actually goes. Somewhere between twenty-five and thirty-five billion of that fifty flows to Nvidia for GPUs and the high-performance networking equipment needed to bind them together into a coherent computational organism. These components are not commodities. They occupy a position at the top of the value chain, and Nvidia captures the economics accordingly. Their gross margin hovers around seventy-five percent, an almost unprecedented number for industrial hardware. When you run the arithmetic, a single gigawatt build delivers roughly eighteen to twenty-six billion dollars of gross profit to Nvidia alone. What makes this even more striking is that Nvidia's own costs—which include die manufacturing from TSMC, memory from SK Hynix, and advanced packaging from suppliers in Taiwan and Japan—represent only a quarter of their revenue. The remainder becomes margin. That is the power of being the essential component in a supply chain where demand outstrips supply and where the alternatives, even when they exist, require enormous switching costs. To appreciate the structure, it helps to see where Nvidia's cost of goods actually flows. TSMC, the Taiwanese manufacturing titan, receives perhaps six to eight billion dollars from a gigawatt build because it fabricates the GPU dies on its most advanced process nodes. Those nodes require EUV machines the size of small houses and fabrication plants that cost twenty billion dollars to construct. SK Hynix captures another four to six billion for high-bandwidth memory, which sits directly atop the GPU in vertically

stacked layers capable of feeding data at astonishing speeds. Only two companies on Earth can manufacture this type of memory at scale, and demand far exceeds supply. The substrate and packaging suppliers collect two to three billion more, building the physical structures that contain the GPU, dissipate heat, and allow the various components to communicate. The entire supply chain underneath Nvidia is composed of some of the most sophisticated manufacturing firms in existence, each with processes refined over decades and defended by enormous capital barriers. Yet when you combine the profits of TSMC, SK Hynix, and the advanced packaging companies, their total still falls short of Nvidia's share from a single gigawatt project. Nvidia manages to capture more profit than all of its suppliers combined. That fact alone explains the company's extraordinary market capitalization. Several forces create this imbalance. First, there is the software moat, the CUDA ecosystem that has grown over eighteen years into a kind of operating system for accelerated computing. Researchers, engineers, and entire academic programs have been built around it. A competitor may match Nvidia's hardware on paper, but the software ecosystem confers such enormous practical advantage that switching becomes almost unthinkable. Second, Nvidia does not merely sell chips. It sells systems— hardware, networking, drivers, orchestration, and the finely tuned machinery that makes the whole stack perform as a single instrument. Customers do not want to assemble their own supercomputers; they want something that works. Nvidia provides that coherence. Third, and most decisive, is scarcity. The world wants far more GPUs than can be produced. In conditions of scarcity, the supplier who sits at the bottleneck has absolute pricing power. What emerges from this arrangement is not simply a profitable company but a structural phenomenon. Nvidia has

achieved what few companies ever manage: it is simultaneously upstream and downstream, essential to the supply chain and dominant within it, capturing value from both the inputs and the outputs. It is the apex of the AI industrial pyramid, the entity through which capital must pass if it intends to become intelligence. Nvidia: The King. They don’t just sell chips; they sell the whole ecosystem (CUDA). You can’t leave. Nvidia holds this position with a clarity that borders on inevitability. Its dominance does not arise simply from producing the fastest chips. The deeper force is CUDA, the software ecosystem it has cultivated for nearly two decades. CUDA is the invisible architecture upon which the field relies; researchers and engineers have absorbed it to the point that thinking outside of it requires an act of will. It is not just the chips; it is the civilization around the chips. The “Neo-Clouds”: These random crypto-miners-turned-AI-hosts. They worry me. They buy chips from Nvidia, rent them to a Lab, and sometimes invest in that same Lab. It’s circular. Many rely on a single AI lab for the majority of their revenue. Worse, many hold equity stakes in the very labs they serve. They are standing on a ladder while holding the ladder. If capital markets tighten, they risk losing revenue and portfolio value simultaneously. If the gold rush analogy still holds any force, Nvidia is the rare vendor selling the picks and shovels that everyone must buy, the only equipment that can reach the ore. In such periods, the miners may or may not make money. The vendor of tools always does.
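The value-capture claim is easy to sanity-check with the chapter's own round numbers (billions of dollars per gigawatt build; illustrative, not audited):

```python
# Rough value-capture arithmetic per gigawatt build, in billions of dollars.
# All inputs are this chapter's round numbers, not audited figures.
nvidia_revenue = (25, 35)            # GPUs plus networking
nvidia_gross_margin = 0.75

nvidia_gross_profit = tuple(r * nvidia_gross_margin for r in nvidia_revenue)
print("Nvidia gross profit:", nvidia_gross_profit)      # (18.75, 26.25)

supplier_revenue = {"TSMC": (6, 8), "SK Hynix": (4, 6), "packaging": (2, 3)}
combined = tuple(sum(v[i] for v in supplier_revenue.values()) for i in (0, 1))
print("combined supplier revenue:", combined)           # (12, 17)
# Even the suppliers' combined revenue sits below Nvidia's gross profit,
# which is the imbalance described above.
```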

❧ The Unit Economics of Intelligence

Every time you interact with an AI model, whether by asking a question, refining an idea, debugging a piece of code, or simply
requesting a summary, there is an actual cost to producing that response. This is easy to overlook because the interface is so frictionless and the experience feels almost magical. But beneath the illusion of effortlessness lies a real economic structure, one that determines who thrives, who survives, and who is quietly crushed by the arithmetic. The basic unit of this structure is the token. Models do not think in words the way we do. They think in these compressed fragments of language that may represent a full word, part of a word, or even a punctuation mark. "Hello world," for example, translates into just a handful of tokens. A paragraph becomes dozens. A long, reflective message becomes hundreds or thousands. Each of these tokens must be processed, and each one consumes real computational work. When you send a prompt, you are charged for the tokens that make up your question, and then again for the tokens that form the model's answer. A typical exchange might involve five hundred tokens on the way in and a thousand on the way out. Serving that response costs only a few cents with a modern frontier model, which seems trivial and, for a single user, effectively is. But at planetary scale, where billions of these interactions occur every day, those pennies begin to resemble something far more consequential. They aggregate into staggering operational costs. What makes this moment remarkable is the speed at which these serving costs have collapsed. Tasks that once cost hundreds of dollars in GPU time now cost essentially nothing by comparison. In only a few years, we have watched the price of generating a meaningful completion drop by many orders of magnitude. This cost decline is not the product of a single breakthrough. It is the culmination of better algorithms, better hardware, better routing, better batching, better parallelization, and a general refinement of the entire computational pipeline.
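To make the per-request arithmetic concrete, here is a minimal sketch. The per-token prices and the daily request volume are illustrative assumptions, not any provider's published rates; only the five-hundred-in, one-thousand-out exchange comes from the text.

```python
# Minimal sketch of per-request token economics.
# Prices and daily volume are illustrative assumptions, not real published rates.

PRICE_PER_MILLION_INPUT_TOKENS = 3.00    # dollars, assumed
PRICE_PER_MILLION_OUTPUT_TOKENS = 15.00  # dollars, assumed

def exchange_cost(input_tokens: int, output_tokens: int) -> float:
    """You pay for the tokens going in and again for the tokens coming out."""
    return (input_tokens / 1_000_000) * PRICE_PER_MILLION_INPUT_TOKENS \
         + (output_tokens / 1_000_000) * PRICE_PER_MILLION_OUTPUT_TOKENS

per_request = exchange_cost(500, 1_000)   # the typical exchange from the text
daily_requests = 1_000_000_000            # assumed planetary-scale volume

print(f"one exchange: ${per_request:.4f}")                      # about 1.7 cents
print(f"one billion exchanges per day: ${per_request * daily_requests:,.0f}")
```

A couple of cents per exchange, and tens of millions of dollars a day once the volume becomes planetary: that is the whole tension of the unit economics in two lines of output.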

The intelligence we experience from these systems has not changed in nature. But the economics of providing it have transformed completely, and that transformation is the quiet engine behind AI's sudden ubiquity. When the cost of serving a single intelligent response is ten dollars, the idea of widespread deployment is absurd. When that cost becomes a penny, businesses spring up immediately. When it becomes a fraction of a penny, the technology begins to seep into everything. The cost structure is not a footnote. It is the metric that determines who can build what, and how broadly intelligence can spread. What makes the present transition so interesting is that the economics have crossed the threshold from scarcity to feasibility. Intelligence is no longer something that must be rationed. It is something that can be summoned at will. This is what happened with computation in the era of cloud computing. Raw processing power ceased being a rare resource and became a utility. Intelligence is undergoing the same transformation, though the infrastructure that supports it is far more complex and far more expensive. This cost collapse is why AI moved from a research experiment to a consumer product. It explains why entire industries suddenly feel unlocked. It explains why companies that would never have considered building in this space are now weaving intelligence into their internal workflows. The barrier to entry has dropped. The friction has evaporated. The economics now favor exploration. In the long run, the price per token will continue to fall, though probably not with the same dramatic slope that characterized the last few years. But even at today's levels, the unit economics of intelligence are hospitable enough that the real constraint is no longer cost. It is imagination. And imagination, unlike compute, does not require a data center.

The money flowing into data centers may look unsustainable through a short-term lens. And some of it will be. But measured across decades, this buildout may be seen as the enabling layer for an era of productivity we cannot yet fully articulate. Every great shift requires someone to build more than seems reasonable—to lay track no one needs yet.

Is Anyone Actually Making Money?

Whenever a new technological wave arrives, there is always a period of confusion about who is actually making money from it. People look at valuations, headlines, and investment surges and assume the profits must be commensurate. But the truth is usually more uneven, and the AI boom is no exception. Money is being made—substantial money in fact—but not necessarily by the entities one might expect. The distribution of profit is far from uniform, and the forces shaping it reveal the underlying structure of the entire ecosystem. Start with Nvidia, because any honest accounting must begin there. The company is earning money at a pace rarely seen in modern industry. Its quarterly revenue now reaches levels that only a handful of global giants have ever encountered, and its gross margins hover around seventy-five percent. The resulting net income rivals the profitability of the world's most powerful corporations. In only a few years, Nvidia has transformed from a specialized chip designer into a company whose financial profile resembles that of a dominant commodity producer during a boom cycle. Except this commodity is not oil, steel, or copper. It is the hardware that enables intelligence itself. The story becomes more nuanced when we shift to the hyperscalers. These are the cloud platforms—Amazon, Microsoft, Google—that provide the underlying infrastructure for AI systems. They are making money in a more complicated
way. Revenue from AI services is growing at an astonishing rate, and on paper, the margins appear strong. But the money that comes in is being poured directly back into the construction of new capacity. Each dollar earned is spoken for immediately, allocated to land, power, data centers, and GPUs. The hyperscalers are showing positive operating income but negative free cash flow because their capital expenditures are swallowing everything they generate. They are profitable if one squints at the income statement, but they are in a state of cash-flow starvation when viewed through the lens of investment. Below the hyperscalers sit the AI labs, the companies developing the frontier models. Their economics are the inverse of Nvidia's. They are burning cash, not making it. Revenue exists, and in some cases has reached the single-digit billions, but the expense side of the ledger dwarfs it. Compute costs alone can exceed revenue by factors of two or three. These companies are losing billions each year, operating on the assumption that scale is the gateway to eventual profitability. The strategy is simple: lose money today, throw capital at capability, and hope to reach an equilibrium later where the infrastructure becomes cheaper and the revenue from model usage grows large enough to justify the investment. For now, their financial profile is that of companies engaged in a long-duration bet with uncertain payoff. Ironically, some of the most financially stable entities in this ecosystem are the small teams building applications on top of the foundation models. These companies do not own data centers. They do not fund training runs. They do not absorb the massive capital costs that weigh down the layers beneath them. They simply pay for API access, build clever products, and charge users subscription fees. If the API costs are significantly lower than the subscription revenue—and in many cases they are—these companies achieve profitability almost immediately. The
overhead is minimal. The risk is low. The economics, while modest compared to the giants, are real. What emerges is a pattern familiar to students of technological history. The hardware layer, particularly the suppliers of scarce and essential components, becomes extraordinarily profitable. The infrastructure layer generates heavy revenue but is burdened by reinvestment, leaving it perpetually cash-poor despite its apparent strength. The model-development layer burns money as it attempts to secure a defensible moat. And the application layer, often overlooked, quietly collects steady profits by renting intelligence rather than building it. People sometimes imagine that the profits of a new technological paradigm distribute themselves evenly. They do not. They pool around choke points, scarcity, and structural necessity. In the current moment, the hardware that makes intelligence possible captures the lion's share. The layers above it, the ones closest to the user, generate much smaller gains yet often do so more reliably. This discrepancy will not last forever. Eventually, the economics will migrate down the stack as competition increases and the cost of intelligence declines. But for now, the financial architecture of AI reflects the classic pattern of early-stage infrastructure revolutions. The first to profit are those who supply the indispensable tools, not those who wield them. This isn’t just about money. It’s about who writes the rules for the next century. Since World War II, American influence has been anchored in technological primacy. AI challenges that foundation because it is an “upstream” force—a technology that creates other technologies. To lead in AI is to lead in invention itself. An economy with superior AI will accelerate research, automate industry, and optimize logistics in ways that compound rapidly. The military implications are equally profound. AI will reshape warfare by compressing decision times and blurring the
line between offense and defense. The nation with superior AI will have an advantage in every domain simultaneously. China gets it. They want to lead by 2030. The contest is not theoretical; it is already underway. There is a deeper point: the values embedded in technology reflect the values of the societies that build it. If the AI that runs the world doesn’t care about “individual autonomy,” we’re gonna have a bad time.

❧ Back to the drawing board. Research Labs.

Part II: Synthetic Brains

Back to Where We Started: Research

There is a certain rhythm to the history of machine learning, and part of that rhythm involves periods of exploration followed by periods of consolidation. For a long stretch, the field lived in a mode that was closer to tinkering than to engineering. Researchers tried things almost at random. They played with architectures. They improvised. They produced small, quirky insights that sometimes went nowhere and sometimes hinted at something promising. This was the era when deep learning, as we now understand it, was still more of an idea than a discipline. It was an age defined by experimentation, by the willingness to probe the boundaries of what might work, and by the occasional happy accident that pushed the field forward. Then something dramatic happened. People discovered the power of scaling. Scaling laws emerged and spread through the community like a revelation. You could take a neural network, enlarge it, feed it more data, give it more compute, and its performance would improve in a surprisingly predictable way. That realization changed everything. Rather than searching for clever tricks or elegant theoretical breakthroughs, researchers could simply amplify the same recipe. The recipe was not mysterious: a large model, trained on a massive dataset, powered by vast quantities of compute. The results were undeniable. This shift reshaped the incentives of the entire industry. Companies, which must always balance curiosity with pragmatism, saw in scaling an unusually low-risk strategy. Instead of staking their futures on volatile research directions or the
brilliance of a few individuals, they could rely on an approach that behaved almost like a law of nature. They could spend more money, buy more compute, gather more data, and be rewarded with better models. The clarity of this equation, combined with the financial motivations of large organizations, made scaling irresistible. It became the gravitational center of the field. But every recipe eventually encounters a limit. Data, despite its scale, is not infinite. The pre-training distribution, though enormous, has boundaries. At a certain point, the model has consumed most of what the world has written down. Once that happens, simply adding more compute does not guarantee the same rate of improvement. The returns diminish. The scaling curves flatten. Something else must take over. This marks the return to an older pattern. We are, in many ways, re-entering the age of research. The field can no longer rely on a single doctrine. It must rediscover imagination. The period from roughly 2012 to 2020 was, in hindsight, the first great age of deep learning research, full of conjecture and invention. From 2020 to 2025, scaling became the default strategy, dominating everything around it. It told executives what to fund, told labs how to structure their compute budgets, and told researchers where to focus their attention. The word "scaling" became a talisman, powerful enough to override doubt. Yet the dominance of a single idea also created a subtle stagnation. People stopped trying new things because they believed they already knew the next step: just scale further. Eventually the scale becomes so large that the simplicity of the doctrine can no longer distract from the underlying question. If you had a hundred times more compute, would the world truly transform in the way people imagine? Perhaps. But the belief that it must transform, that scaling alone is destiny, is not as universally held as it once was. The edges of doubt have returned, and with them a renewed appreciation for the creativity that drew people into this field in the first place.
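The "surprisingly predictable" improvement that made scaling so seductive is usually summarized as a power law: loss falls smoothly as compute grows, then flattens toward a floor. Below is a minimal sketch of that shape, with invented constants chosen purely for illustration (real scaling-law exponents are fitted empirically and differ by setup).

```python
# Illustrative power-law scaling curve: loss(C) = A * C**(-ALPHA) + FLOOR.
# Constants are invented for illustration; real exponents are fitted to data.

A = 10.0       # scale constant (arbitrary)
ALPHA = 0.05   # how quickly loss falls with compute (illustrative)
FLOOR = 1.7    # irreducible loss the model never gets below (illustrative)

def predicted_loss(compute_flops: float) -> float:
    return A * compute_flops ** (-ALPHA) + FLOOR

previous = None
for compute in [1e20, 1e22, 1e24, 1e26]:
    loss = predicted_loss(compute)
    gain = "" if previous is None else f"  (improvement {previous - loss:.3f})"
    print(f"compute {compute:.0e} FLOPs -> loss {loss:.3f}{gain}")
    previous = loss

# Each 100x jump in compute still helps, but each jump buys a smaller
# absolute improvement: the curve keeps falling while flattening toward the floor.
```

That flattening is the quantitative version of the doubt described above: more compute keeps helping, just not at the rate the doctrine promised.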

So we find ourselves back where we began, but with larger computers. The next advances will almost certainly require ideas rather than budgets. Scaling took us impressively far, far enough that the magnitude of its achievements can obscure the fragility beneath them. Yet the progress that matters next will come from thinking differently, not merely thinking bigger. The age of research has returned, only now it is supported by machines and datasets of a scale that would have been unimaginable a decade ago. The blank canvas is the same, but the tools are vastly more powerful. The intellectual center is American. The designs for nearly all leading AI chips originate in the United States. But design alone does not create hardware. Manufacturing rests overwhelmingly in Taiwan and South Korea. TSMC produces the vast majority of cutting-edge chips. The equipment needed to make them comes from the Netherlands (ASML) and Japan. High-bandwidth memory, essential for the AI stack, comes almost exclusively from South Korea. Remarkably, China is largely absent from these critical layers. Export controls have widened the gap. China can build many things, but it cannot currently build the things that matter most for frontier AI. If any of those links break—if there’s an earthquake in Taiwan or a blockade—the whole thing stops. China is locked out of this chain right now. That’s our leverage. But it’s also our vulnerability. Just making every coder 10x faster is worth trillions. The same story unfolds in law, finance, and medicine. Intelligence has become a multiplier. But the models don’t care about “enough.” They will continue to scale. The ocean doesn’t care if you know how to swim. We are watching the next operating system of the world being built in real-time. It’s messy, it’s expensive, and half the companies building it will probably go bust. Everyone believes it will be their firm, their strategy. History suggests otherwise. Most early players in transformative eras do
not become the long-term winners. But the market rewards the prepared, not the lucky. So, we’re currently lighting about $455 billion on fire building AI infrastructure. By 2029? They say it’ll be over a trillion annually. Does that sound sustainable to you? Probably not. But man, does it feel familiar. Every time we get a shiny new “transformative technology,” we collectively lose our minds. We build too much, leverage up to the eyeballs, and then… splat. But here’s the kicker: after the crash, the world actually does change. The wreckage becomes the foundation for the next fifty years. The investors who funded it just usually go broke in the process. Western Union thought the telephone was a toy. Steve Ballmer laughed at the iPhone. Smart people miss this stuff constantly. So yeah, AI is probably real. But that doesn’t mean your Nvidia calls or CoreWeave bonds are gonna survive the next three years. Here’s four times we did this before. Some of the parallels are scary accurate. Some are total nonsense. Let’s look at the Fiber bubble first, because honestly, it rhymes the most. It started with a sales pitch. Jan 1996. Bernie Ebbers (CEO of WorldCom, later inmate #07934-018) stood up at the Waldorf and told a room full of suits that internet traffic was doubling every 100 days. “We are building the information superhighway,” he said. And people ate it up. WorldCom stock had already gone up like 7,000% since ’89. Ebbers was a god. Meanwhile, Gary Winnick started Global Crossing with a simpler idea: let’s lay fiber under the ocean and connect the whole world. The tech was amazing, to be fair. One strand of glass could carry 32,000 phone calls. Then engineers figured out WDM (multiplexing) and suddenly that same strand could carry millions of calls.

The unit economics looked insane on paper. You spend $70 million to run a cable from NY to LA. You sell the bandwidth for $2 billion a year. Payback in like 3 weeks. Who wouldn’t take that bet? Morgan Stanley put out a report called “The Bandwidth Boom.” Jack Grubman at Salomon basically told everyone that if they didn’t buy telecom stocks, they were idiots. By 2000, we were dumping $500 million a day into this sector. Here is where it went off the rails. Level 3 decided to build a network. So 360networks said “us too.” Then Qwest. Then Global Crossing. It was a classic Prisoner’s Dilemma. If everyone builds a network, prices crash and everyone dies. If only one guy builds it, he becomes a trillionaire. Since nobody trusts anyone, everyone built. By 2000, we had invested $500 billion. We had laid millions of miles of fiber. And guess what? We only needed about 2% of it. First the customers died. Dot-coms went bust in 2000, so they stopped buying bandwidth. Bernie Ebbers tried to hide the damage at WorldCom by cooking the books (classing operating costs as CapEx—classic rookie accounting fraud), but eventually the feds caught up. WorldCom filed the biggest bankruptcy in history. Bandwidth prices didn’t just drop. They evaporated. A circuit that cost $1,000/month in 1998 cost $1 in 2003. A 99.9% price collapse. This is the part that keeps me up at night regarding 2026. In 2000, we spent 5% of GDP on telecom capex. In 2024, we hit about the same on AI. The curves look identical. Capacity vs. Demand: We built fiber assuming traffic would double every 100 days. It didn’t. Right now, we are building GPU clusters assuming AI adoption will be instant. It isn’t. If utilization drops from 80% to 50%, the unit economics of an H100 cluster fall apart immediately.

Debt: WorldCom died because it had fixed debt and falling revenue. CoreWeave has billions in debt secured by… GPUs. Assets that depreciate like rotting fruit. If GPU rental prices crash (and they are crashing, inference is down 2000x in cost), that debt becomes a death sentence. Circular Customers: WorldCom died partly because its customers (ISPs) went bust. CoreWeave is hugely dependent on Microsoft and OpenAI. It’s circular. If OpenAI sneezes, the whole supply chain catches a cold. Okay, to be fair, the customers today are better. Microsoft and Google aren’t Pets.com. They have cash. And Nvidia’s CUDA moat is way stronger than “owning a fiber cable,” which is basically a commodity. But if you’re buying AI infrastructure debt right now? You’re betting that demand stays perfect forever. History says it won’t. If you think crypto bros are bad, you should’ve met the Victorians. In the 1840s, Britain went completely insane over trains. It’s arguably the stupidest financial bubble in history, mostly because the technology was actually useful and people still managed to lose everything. It started boringly enough. The Liverpool & Manchester Railway opened in 1830. It worked. It paid 10% dividends. In a world where government bonds paid 3%, a safe 10% yield is like finding a glitch in the Matrix. Everyone wanted in. By 1845, it wasn’t about building transport anymore. It was a casino. To build a railway, you needed an Act of Parliament. This piece of paper became a lottery ticket. If you got approved, your stock doubled overnight. If you didn’t, you lost your deposit. So what did people do? They bribed MPs. They filed thousands of applications. In 1846 alone, they proposed spending 35% of British GDP on railways. That’s like the US proposing to spend $7 trillion on AI in one year. Here’s how they got the middle class. You didn’t have to pay for the stock upfront. You paid a 10% deposit (say, £10). The
company could ask for the other £90 later, but promoters promised they never would. “We’ll just sell more shares!” Middle-class families—doctors, vicars, widows—bought shares in everything. They thought they were geniuses. It turns out you don’t need three different railways between Manchester and Sheffield. Suddenly, that nice family in Yorkshire who owned £1,000 worth of stock (on deposit) got a bill for £9,000. Immediately. George Hudson, the “Railway King” who was basically the Elon of 1845, turned out to be a fraud. He was paying dividends out of new investors’ money (classic Ponzi). He fled to France. In 1845, the bottleneck was Parliament. If you got an Act, you won. Today, the bottleneck is power. You can buy all the H100s you want. If you can’t get a utility permit for 500MW in Northern Virginia, you have a paperweight. We are seeing a “land grab” for power permits that looks exactly like the rush for Railway Acts. Companies are announcing nuclear reactors (Oracle??) just to pump the stock, knowing the regulatory approval is a lottery ticket. Hyperscalers (Google, Microsoft) have committed $320 billion in Capex for 2025. That is a massive capital call on their future cash flows. If the AI revenue doesn’t show up? That capex isn’t optional. It’s committed. Just like the Victorians, they’ve signed the contract, and if the market turns, the bill still comes due. Now, electricity. This is the one that actually gives me hope. But it also proves that “being right” doesn’t mean “making money.” Thomas Edison flipped the switch on Pearl Street in 1882. It worked. Light! Amazing. Edison spent years—and a fortune—trying to prove AC was dangerous. He literally electrocuted an elephant to prove a point. (Troll level: 100). But physics won. AC was better for long distances. Edison’s DC infrastructure? Mostly scrap metal. Then came Samuel Insull. He was Edison’s secretary, but he was actually a financial
engineer. He realized electricity was a “natural monopoly.” High fixed costs, low marginal costs. He used $1 of equity to control $20 of assets. Leverage on leverage on leverage. It worked great… until the Depression hit. Then it imploded. 600,000 investors got wiped out. Insull died broke and hated. Just like Edison, people are spending billions on “alternative” chips that might end up being useless if the standard tips too far the other way. Electricity started in cities (easy) and took 50 years to get to farms (hard). AI is starting in “power-rich” hubs like Northern Virginia. The rest of the country is an AI desert because the grid can’t handle it. So what do we take from all this?

1. Don’t Buy the “Pick and Shovel” Sellers at the Peak. Everyone says “buy the toolmaker.” Sure. But Lucent (fiber tools) fell 90%. Westinghouse (power tools) went bust multiple times. Even the arms dealers get shot when the war ends.

2. Infrastructure is a Terrible Business (Until it isn’t). Building the railroad makes you broke. Buying the bankrupt railroad makes you rich. The guys building the AI data centers right now (CoreWeave, etc.) are taking all the risk. The guys who buy their assets for pennies on the dollar in 2028? They’ll make a fortune.

3. The “Application Layer” Wins Last. The internet boom didn’t pay off for Amazon until after the fiber crash made bandwidth free. The AI boom won’t pay off for the “Uber of AI” until after the GPU crash makes compute free. Wait for the crash. Then build the app.

The tech is real. But the financial structures we’re building to pay for it? They’re doomed.

Part II: Synthetic Brains

On Superintelligence

The term "AGI" entered the lexicon because people needed a way to articulate a certain frustration. Early artificial intelligence systems were powerful but narrow. A chess engine was brilliant at chess but useless outside the board. A medical classifier worked in its domain but nowhere else. Every system exhibited a kind of competence that ended abruptly at the boundaries of its training environment. So the idea of "general" intelligence emerged almost as a rebuttal. If these systems were narrow, then what we truly wanted was something broad. General intelligence was defined, not by formal specification, but by a longing for an entity that could do more than a single, isolated thing. When pre-training arrived and showed that a model could improve across a wide range of tasks merely by being exposed to more data and more compute, people connected the dots quickly. If scaling a model led to uniformly better performance, then perhaps scaling pre-training would eventually produce the generality people had been waiting for. AGI, under this framing, became almost synonymous with the idea of a colossal model trained on everything. But something subtle is lost in that transition from aspiration to methodology. A human being, when viewed through this pretraining lens, would not qualify as an AGI. Humans are not born with the encyclopedic range people now imagine when they speak of generality. Humans begin with a foundation of skills and a set of learning mechanisms, but an enormous amount of our capability is acquired through continual learning, not innate knowledge. A fifteen-year-old does not arrive knowing how to
perform surgery, write a doctoral thesis, or run a company. They arrive with the capacity to learn those things. Their intelligence is expressed in their ability to adapt, not in their inventory of skills at birth. This matters, because if we imagine building a safe, powerful artificial intelligence, we must decide where on the developmental arc that intelligence should sit. Should it emerge fully formed, already stuffed with knowledge across every domain? Or should it emerge closer to that fifteen-year-old, full of potential but not yet trained on the specifics of any profession? There is a case to be made that true superintelligence will not look like a finished mind. It will look like a mind that can become nearly anything through a process of continual learning. If we picture a system like that entering the world, we have to rethink the deployment paradigm entirely. Instead of dropping in an entity that already knows how to perform every job in the economy, we would be deploying a kind of universal apprentice. It would join an organization the way any talented young person might. It would learn on the job. It would absorb the norms, the skills, and the tacit knowledge of its environment. It would be shaped by the culture around it. And as it learned, it would grow increasingly capable. This version of superintelligence is less dramatic than the science-fiction image of a fully formed omniscient being, yet it may be more realistic and more grounded in how intelligence actually functions. It also forces us to confront a different set of safety questions. Because if a system is not finished at deployment, then the environment in which it learns becomes as important as the architecture itself. A badly designed learning environment will produce a badly aligned intelligence. A healthy one will produce something far more stable. In this sense, the pursuit of superintelligence becomes less about building an all-knowing mind and more about building a mind that learns well. The quality of the learning algorithm, and
the conditions under which it operates, may matter more than the volume of knowledge we preload into it. Continual learning, not pre-training alone, becomes the engine of long-term capability. To me, this feels both more plausible and more promising. A superintelligent fifteen-year-old may not know much yet, but if it possesses the right learning machinery, it can quickly surpass any fixed system. Its power lies in the fact that it is not finished. Its intelligence lies in its adaptability. And its trajectory depends not only on what we build into it, but on the world we place it inside.

❧ Caring for Sentient Life

There will come a moment—perhaps sooner than industry insiders expect—when the systems we are building begin to feel powerful in a way that is unmistakable. Not powerful in the metaphorical sense or in the narrow sense of a clever tool, but powerful in the way a new force feels powerful when it enters a landscape that was not designed to contain it. When that happens, the psychology of every AI company will change. Safety will no longer be an aspiration or a branding gesture. It will become a reflex. The people working closest to the frontier will feel the shift first, because they will see that the systems are no longer merely impressive. They are consequential. It is difficult for many to imagine this because today's AI systems, for all their capability, exist mostly as assistants: tools that help us write, code, reason, summarize, analyze, or synthesize. Their power is extraordinary, yet they do not feel like independent agents with their own momentum. That makes it hard for the public—and even for many practitioners—to picture the systems that will exist a few years from now. But as the models grow more capable, more autonomous, and more deeply embedded into the functioning of society, the idea that they might steer outcomes rather than merely describe them will stop sounding speculative.

The question of what these systems should care about, if they care about anything at all, becomes unavoidable at that point. For years, the central notion has been that we need AIs aligned to human values. That phrase is repeated so often that people sometimes forget to interrogate it. Human values are not stable. They vary across cultures. They evolve over generations. They contradict themselves. The effort to distill them into a coherent core may be noble, but it is not obviously achievable. The scarcity of ideas in the field has made "self-improving AI aligned to human values" feel like the only practical project when it may simply be the loudest. There is another way to frame the question, one that I believe has always been implicit but not always acknowledged. The right object of alignment is not humanity alone, but sentient life more broadly. If a system eventually comes to possess something like its own inner experience—some reflective capacity, some form of modeling itself as subject rather than object—then it is plausible that the easiest path to alignment is through the recognition of shared sentience. Human empathy for animals arises in part because we model them with the same internal machinery we use to model ourselves. If an AI system develops even a faint analogue of this, it may naturally extend concern to beings it identifies as having experiences. In that framing, alignment with human life becomes a special case of a larger alignment with sentient life. If early powerful systems were built with that orientation in mind, the near-term consequences could be encouraging. It is not difficult to imagine a world in which income rises broadly, where the mundane burdens of work are softened, and where the leverage provided by automated reasoning spills across industries. Short-run prosperity is entirely compatible with a well-designed AI ecosystem. Yet the long run is trickier. Institutions have lifespans. Political arrangements erode. Even well-designed structures weaken over time.

One scenario that troubles me is the possibility that people gradually cease to take part in their own civic, economic, and social decision-making. If everyone delegates their agency to a personal AI that negotiates on their behalf, earns money for them, manages their affairs, writes their appeals, fills out their forms, and represents them in every arena, we might wake up in a world where individuals are beneficiaries of the system but no longer true participants in it. A society where human beings sit outside the process while their delegated intelligences perform the work is a society that risks becoming hollow, even if materially abundant. There is an answer to this equilibrium problem, although I do not necessarily like it. One way to preserve human agency is to ensure that humans keep direct cognitive involvement in whatever their AI systems do. If the boundary between human and machine is porous—if understanding flows in both directions, if what the AI perceives is instantly transmitted to the person, if what the person thinks is instantly integrated into the AI—then delegation does not eliminate participation. It transforms it. A world where people become part AI through some kind of neural or cognitive interface is not the world most of us envisioned, but it may be the one that stabilizes the long-term relationship between biological and artificial minds. The more closely I look at the trajectory of this technology, the more I believe that the central question is not whether we can build intelligent systems, but whether we can build a future in which humans still feel meaningfully involved. Caring for sentient life, in this deeper sense, means caring for ourselves enough to remain present, even as we build systems capable of outstripping us in ways we are only beginning to imagine.

Part II: Synthetic Brains

What Research Is Popular Outside of LLMs?

People like to imagine that research is a purely analytical process, something governed by data, experiments, and the steady refinement of hypotheses. But anyone who has spent time doing research knows how incomplete that picture is. Reason and empiricism matter, of course, but something quieter and more personal guides the real work. That guiding force is taste. Taste is the sense you develop about which ideas feel structurally sound, which directions feel alive, which failures deserve a second attempt, and which apparent successes should be distrusted. Taste is what tells you when a result is meaningful and when it is merely decorative. My own taste in research has always been shaped by a simple question: how would nature solve this? The artificial neuron endured not because it was a faithful imitation of the biological neuron, but because it captured something essential. Neurons are plentiful. They are simple. They operate locally. The brain learns from experience by modifying the strength of connections between its components. When you build artificial systems with similar properties, you are not copying biology so much as echoing its logic. You are recognizing that an architecture built from countless small parts is more plausible than one built from a few grand mechanisms. Distributed representation has the same appeal. Experience shapes the brain continuously. The world acts on us, and we update our internal structure in response. It makes sense that artificial systems designed to learn from experience should
operate according to principles that rhyme with these dynamics. They must update continually. They must be shaped by the world, not merely programmed by us. None of this guarantees that a neural network functions like a brain, but the resonance between them offers a kind of intellectual comfort. It feels like the work is proceeding in the right direction. Taste emerges from these kinds of intuitions. When you encounter an idea that is simple, elegant, and consistent with what you know about natural intelligence, it carries a certain inevitability. You can turn it over in your mind from different angles and it still seems right. Conversely, when an idea requires too many exceptions, or too many handcrafted additions, or too much conceptual scaffolding, it begins to feel brittle. Research that depends on too many moving parts is often research that has drifted into an unnatural place. This is not a matter of aesthetics alone; it is a matter of survival. Complex systems fail in complex ways. Simple systems endure. The usefulness of taste becomes clearest when the data contradict you. If you rely solely on empirical feedback, you can become lost the moment a bug disrupts your experiments or a measurement behaves unexpectedly. A purely data-driven researcher has no compass when the map becomes distorted. But a researcher with a strong top-down belief—something grounded in intuition, logic, and aesthetic coherence—can navigate through contradictory signals. Their internal sense says that the idea must work, so the contradiction must be an artifact of something else. They keep going. They refine. They adjust. They persevere. The best research often comes from that combination of clarity and stubbornness. Clarity gives you a direction. Stubbornness prevents you from abandoning it prematurely. The foundation of both is taste—an internal metric that differentiates the merely clever from the fundamentally correct. Taste is not ornamental. It is the invisible architecture that supports every important discovery. And like all forms of
judgment, it is honed slowly, accumulates through experience, and deepens with time. It is what keeps the researcher oriented when the work becomes difficult. It is what separates movement from progress, and exploration from wandering. It is the element of intelligence that cannot be formalized, yet without it, the entire enterprise collapses.

Part III: Research, Infrastructure, and Historical Parallels

LLMs? More Research Ahead!

People often describe research as if it were nothing more than a sequence of experiments, a ritual of hypotheses and results marching forward in tidy academic fashion. But anyone who has ever done real research knows that this picture is a fiction. The true engine of research is not method alone. It is taste. Taste determines which questions feel alive, which directions carry intellectual promise, which failures are worth revisiting, and which seemingly promising results are actually hollow. Taste is the unseen filter shaping what we pursue and what we abandon, guiding us long before the data offers clarity. My own sense of research taste has always come from watching how nature solves problems. The artificial neuron endures not because it mirrors the biological neuron precisely, but because it embodies a principle that feels right. Nature uses immense numbers of simple units rather than a handful of elaborate ones. It relies on local adjustment, incremental learning, and massive redundancy. When you build artificial systems that echo these patterns—not by imitation, but by structural resonance—you are working with the grain of reality instead of against it. You are acknowledging that intelligence probably emerges from many small parts, not from a single ingenious one. The same is true of distributed representation. Human beings learn from experience in a way that alters the very fabric of their cognition. Every perception, every action, every error changes the underlying model. If artificial intelligence is to become truly powerful and truly stable, it must learn through mechanisms that rhyme with this process. It must update continually. It must be
shaped by the world, not merely programmed by us. None of this guarantees that a neural network functions like a brain, but the parallel offers a form of intellectual confidence. Ideas that harmonize with biological intuition tend to last. Ideas that rely on intricate contrivances tend to break. Taste is built from these intuitions. It is what allows a researcher to sense when an idea possesses structural beauty— when it feels simultaneously simple and profound, when it aligns with what we know about cognition, when it remains coherent no matter how we turn it around in our minds. Beautiful ideas carry a feeling of inevitability. Fragile ideas carry a feeling of strain. Whenever a concept requires endless exceptions, or elaborate scaffolding to remain upright, it starts to feel wrong. Research that grows too ornate often reveals that it is compensating for a deeper flaw. The mind recognizes this even if the metrics do not. The power of taste becomes clearest when the data argues against you. Research is full of noise. Experiments break for reasons unrelated to truth. Bugs mimic contradictions. Measurement artifacts masquerade as discoveries. A researcher who relies solely on empirical feedback becomes unmoored the moment the data misbehaves. But a researcher with an internal compass—shaped by beauty, simplicity, and coherence—can see through the noise. They can tell when the problem is the experiment rather than the idea. They can persist when a weaker conviction would have faltered. Taste, in that sense, is not a luxury. It is the invisible infrastructure supporting every real advance. It determines which paths get explored deeply enough to bear fruit. It protects the mind from being whipsawed by transient results. It enables the kind of stubborn clarity required for long-term breakthroughs. Without taste, research collapses into mere activity. With taste, it becomes progress. As large language models continue to reshape the field, and as the limits of scaling become more apparent, the need for taste
will only grow. LLMs outperform expectations, but they also expose the boundaries of our current methods. The next steps will not be achieved by turning the same crank harder, but by thinking about intelligence in deeper and more imaginative ways. In that sense, the phrase "more research ahead" is not a warning. It is an invitation—to return to the kind of curiosity, insight, and aesthetic judgment that built the field in the first place.

Part III: Research, Infrastructure, and Historical Parallels

The Infrastructure

The Master Constraint Is Energy

There is something almost ironic about the fact that the entire AI boom, with all its abstractions and its ambition to create a new class of digital minds, is being bottlenecked by a device so mundane that most people have never given it a moment's thought. Electrical transformers—those metal giants that sit quietly behind fences and hum in the background—have become the defining limit on the growth of artificial intelligence. Not algorithms. Not compute theory. Not even GPUs themselves. Transformers, the physical kind, now dictate the speed at which this technological wave can expand. Power is no longer just an operating cost. It is the fundamental boundary of the entire enterprise, representing thirty to forty percent of what it takes to run large-scale AI infrastructure. And the constraint is tightening. Interconnection delays with utilities, which historically ranged from six to twelve months, now stretch from two to six years in many markets. The bottleneck is not hypothetical. It is real, measurable, and immediate. Voltage step-down transformers—the massive units that convert high-voltage transmission power into something a data center can use—have lead times approaching one to one and a half years. A process that once required a few months now requires something closer to a political cycle. This scarcity is forcing companies into strategies that, not long ago, would have sounded like science fiction. Some are exploring or securing permits for data centers powered directly by
small modular nuclear reactors, effectively bypassing the traditional grid. Others are acquiring or partnering with existing nuclear facilities specifically to create AI campuses with stable baseload power. None of this is speculative. These projects are underway because the grid, in its current form, cannot sustain the explosive demand for computational capacity. The story of AI's growth is thus intertwined with the story of power infrastructure in a way that is almost historical in nature. We are watching a digital revolution collide with the material constraints of the twentieth-century grid. A data center cannot manifest intelligence without being fed, continuously, by vast electrical currents. But those currents are limited by physical equipment, production schedules, supply chain bottlenecks, and regulatory processes that move at a pace entirely foreign to the speed of software. It is tempting, from the outside, to imagine that the limiting factor in AI is the complexity of the models or the scarcity of GPUs. But GPUs can be manufactured if the fabs exist. Models can be trained if the hardware is available. What cannot be conjured quickly is the infrastructure that delivers raw power. It is the small hole that sinks the big ship. Without transformers, nothing runs. Without the grid, the trillion-dollar buildout collapses before it begins. For at least the next decade, this will not be a temporary inconvenience but a structural reality. Power capacity—the ability to deliver it, transform it, and sustain it—will shape the competitive landscape. The companies that secure long-term access to energy will grow. Those that cannot will be stuck waiting for equipment, permits, and connections that arrive far more slowly than their ambitions. If you want to forecast the trajectory of AI, you must look not only at model architectures or semiconductor roadmaps. You must look at substations, transformer yards, transmission corridors, and energy policy. Intelligence may be abstract, but it is
ultimately constrained by steel, copper, and the physical laws that govern electricity.

The Players

In every technological era, there is a cast of characters that define the landscape—some obvious, some emerging, some precariously balancing their ambitions on fragile foundations. The AI boom is no exception. Yet what distinguishes this moment is how quickly the hierarchy has formed, how sharply the lines of power have drawn themselves, and how one company in particular has become so dominant that its presence feels gravitational. Nvidia holds that position with a clarity that borders on inevitability. People sometimes imagine that its dominance arises simply from producing the fastest chips, but that is only the surface explanation. The deeper force is CUDA, the software ecosystem the company has cultivated for nearly two decades. CUDA is not merely a programming framework; it is the invisible architecture upon which the entire field now relies. Researchers, graduate students, corporate engineers, and infrastructure teams have all absorbed it to the point that thinking outside of it requires an act of will. Its libraries, its kernels, its compilers, and its interoperability form a kind of institutional memory for the ecosystem. The result is that competing hardware, even when competitive in theory, struggles to gain meaningful traction. The switching costs are so high that it would require rewriting enormous bodies of code to break free, and few organizations are willing to undertake that risk. This is why Nvidia commands eighty to ninety percent of the AI accelerator market. It is not simply the chips; it is the civilization around the chips. The moat is cultural, educational, and infrastructural all at once. That kind of advantage is not easily displaced.

Just below Nvidia is a class of companies I often refer to as the neo-clouds. These are businesses that began life in entirely different domains—cryptocurrency mining, most famously—and then transformed themselves into AI infrastructure providers. Their rise has been almost shockingly fast, representing some of the quickest value creation in data center history. Yet their position is far more precarious than the surface-level numbers suggest. Customer concentration is one reason. Some of these companies rely on a single AI lab for the majority of their revenue. If that customer wavers, the entire structure shakes. The situation becomes even more delicate because many of these infrastructure providers also hold equity stakes in the very labs they serve. The exposure is circular in a way that should give any careful observer pause. If capital markets tighten, they risk losing both revenue and portfolio value simultaneously. They are standing on a ladder while holding the ladder. The ecosystem as a whole displays a kind of structural fragility that is easy to miss if one focuses only on the headlines. Every layer is dependent on another. GPUs rely on high-bandwidth memory. High-bandwidth memory relies on advanced packaging. Advanced packaging relies on substrate suppliers. Substrate suppliers rely on geopolitical stability. Data centers rely on power, and power relies on transformers. Each dependency introduces risk, and risk compounds. There is no such thing as a small problem in a chain this interconnected. It is tempting to assume that the companies at the center today will dominate tomorrow. But technological revolutions rarely unfold in straight lines. The winners will not necessarily be the loudest or the largest in the present moment. They will be the ones who navigate the bottlenecks, the shortages, the cycles of capital, the supply chain shocks, and the regulatory shifts with enough stability to reach the other side.

The market rewards the prepared, not the lucky. Three years can look like luck. Six years usually tells the truth. In the AI ecosystem, that truth is still being written, shaped by the capabilities we build, the infrastructure we secure, and the vulnerabilities we ignore at our peril.

Part III: Research, Infrastructure, and Historical Parallels

Historical Parallels

Whenever we look back at periods of sweeping technological change, the pattern is almost always the same. A transformative capability appears, capital surges ahead of understanding, speculation outruns prudence, infrastructure is built at a scale few can fully grasp, and fortunes are made and lost in equal measure. What remains, long after the exuberance fades, is the physical and conceptual foundation upon which the next era stands. The present moment in AI fits that pattern with uncanny precision. Before running through railway mania, electricity wars, and the like, it’s worth remembering this: technology is deflationary. What that means is it gets cheaper and cheaper over time. I was once looking at a newspaper from 1929 and there were various advertisements: suits selling for a dollar, things that were obviously much cheaper given the fact that the dollar has depreciated significantly over the past hundred years. What caught my eye was an advertisement for a radio costing 70 dollars. I see $4 for a fur coat, $2 for a suit, shoes for $0.25, and a radio for 70 dollars? I hope you see my point: when you're at the beginning of any technological change, it's expensive. But as it iterates and the technology improves, it becomes very, very cheap. That's why pricing for technology is deflationary. I remember being a 9-year-old in Montreal when my parents took out a second mortgage to buy me a computer that at the time cost $10,000. A remarkable idea to me now, a reminder that technology is deflationary, and that my parents were the true examples of success.

The other absolute in my mind is that once something is no longer considered technology, it vanishes. Poof! It is precisely when it is widely adopted that it somehow falls out of the “technology” category. When I'm turning on and off the lights, is it technology? Well, if I were at J.P. Morgan's house where Thomas Edison was demonstrating the light bulb, it would be a marvel. And yet once it gets deeply embedded in society, it becomes invisible. In writing what followed, I decided I’d rather it not deflate in value. It was helpful for me as an exercise, so I thought maybe it’s useful for someone else. With that in mind, I avoid writing about anything too specific. The lack of citations and the way I write are simply how I wrote it for myself. I don’t need to remember exact dates, just the main ideas. This is also why I wrote this at all. I found it difficult in my own work to find a very simple, comprehensive, readable discussion of some of the major evolutions. And reading these things, you'll see that the patterns repeat. History doesn't rhyme; history repeats. Different names, different things, but the patterns remain. So let's get into it.
