
Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70

Length: 1h34m Link: https://www.youtube.com/watch?v=Nb2tebYAaOA

Top quotes

I read books. I've read a couple books a week for 50 years. When people write books they often take 20 years of their life where they passionately did something and condensed that down to 200 pages. That's kind of fun. And then you can go online and find out who wrote the best books. So then there's this wild selection process, and then you can read it and for the most part understand it. And then you can go apply it.

Elon's great insight is: people are how-constrained. They think, "I have this thing, I know how it works, and little tweaks to that will generate something," as opposed to asking, "What do I actually want?" and then figuring out how to build it. It's a very different mindset and almost nobody has it. link to quote

Notes

Q: What are the differences between a computer and the human brain? A: They're hard to compare, but it's easiest if you look at a network of computers versus a single brain.

In computer engineering there's a relatively good understanding of abstraction layers that take us from atoms to transistors and all the way up to datacenters.

Instruction sets are stable for relatively long periods of time. The Intel architecture has been around for 25 years. 90% of execution time is spent on a set of only 25 op codes.

In-order execution has been replaced by fetching large numbers of instructions (>500), finding the dependency graph among them, and executing independent subgraphs of it out of order.
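As a rough illustration (my own sketch, not from the episode): build the dependency graph over a fetched window of instructions, then issue whatever has no unresolved dependencies, in waves of mutually independent work. Register renaming is assumed to have already removed false dependencies.

```python
# Toy model of dependency-graph scheduling over a fetch window.
# An instruction is (dest_register, [source_registers]); names are illustrative.

def dependencies(window):
    """Map each instruction index to the producers it must wait for."""
    last_writer = {}                     # register -> index of latest producer
    deps = {}
    for i, (dest, srcs) in enumerate(window):
        deps[i] = {last_writer[r] for r in srcs if r in last_writer}
        last_writer[dest] = i
    return deps

def issue_waves(window):
    """Issue instructions in waves; each wave is mutually independent."""
    deps = dependencies(window)
    retired, waves = set(), []
    while len(retired) < len(window):
        wave = [i for i in range(len(window))
                if i not in retired and deps[i] <= retired]
        waves.append(wave)
        retired |= set(wave)
    return waves

# Two independent chains finish in two waves instead of four serial steps.
window = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r0"]), ("r4", ["r3"])]
print(issue_waves(window))               # [[0, 2], [1, 3]]
```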

The market for simple, clean and slow computers is zero.

Found parallelism - a serial narrative being executed out of order.
Given parallelism - parallel tasks that are independent, like pixels; simple narratives.

Found parallelism gives 10x improvement. The baseline of in-order execution takes on average 3 cycles per instruction. This comes down to 0.25 cycles per instruction with found parallelism and the predictability of the narrative.
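Checking that arithmetic (mine, not stated in the episode): going from 3 cycles per instruction to 0.25 is 3 / 0.25 = 12, which is where the roughly 10x figure comes from.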

Is building a processor art or science? Building a computer is like facing 100 decision points, each with 100 options. Some people can make leaps, some can analyze well, some have intuition. A good team has lots of different kinds of people, and there are interesting things to do at every level of the stack. Computer design is 99% perspiration, but the 1% inspiration is really important.

Noise and determinism

A correct C program is deterministic.

I design computers for people who want to run programs.

Now even though people want deterministic answers, the process runs a different way each time. You can run it 100 times and it'll never run the same way twice.

Organization and (re)design

Computers are built out of functional units. Organizational design becomes a computer architecture problem. He had a lot of fun reframing organizations.

Most people don't think simple enough. It's important to understand the difference between a recipe and understanding. A recipe gives the steps, but understanding how to bake bread means understanding biology, supply chains, grain grinders, yeast, physics, thermodynamics.

When people build and design things they frequently use recipes, which have limited scope; if you have a deep understanding of cooking, though, you have a different way of viewing everything. Expertise is deep understanding, not a large collection of recipes. The thing is, when you're dealing with people, recipes are unbelievably efficient to execute.

If you constantly unpacked everything for deeper understanding you'd never get anything done, and if you don't unpack understanding when you need to, you'll do the wrong thing. link to quote

A rewrite is faster, and the result is half as complicated.

If you want to make a lot of progress in computer architecture, you should do one from scratch every 3 to 5 years.

The tendency is to do it more like every 10 years. Any single metric falling, even though the whole processor is getting faster, is scary to salespeople.

Diminishing-return curves: a new design's starting point is lower than the old, optimized one, but it will eventually go higher than the previous one could. So you have both short-term and long-term risk to balance.

Moore's law

Keller's operational model is to increase performance 2x every 2 to 3 years. How we deliver that performance has changed. Today's shrink factor is 0.6 every 2 years, not 0.5.
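Unpacking those numbers (my reading of the shrink factor as area per transistor): a 0.5 shrink every 2 years doubles density each step (1/0.5 = 2x), while a 0.6 shrink gives only 1/0.6 ≈ 1.67x per step, so scaling continues but more slowly than the classic rate.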

It's always been gonna die.

When he first learned about Moore's law it was going to die in 10-15 years, then in 5 years, then in 10 years; eventually he decided not to worry about it anymore.

People think of Moore's law as one thing but under the sheets Moore's law is thousands of innovations, an exponential curve made out of thousands of diminishing return curves.

A transistor today is 1000x1000x1000 atoms, and maybe we can get down to 10x10x10 atoms, so there's still a million-times size improvement to go; plus people are working on using quantum effects.
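The million-times figure is the cube of the linear headroom: (1000/10)^3 = 100^3 = 10^6 fewer atoms per transistor.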

He asked his team for a roadmap to 100x, and after two weeks they only came up with 50x, so he told them to take another couple of weeks.

The point is you have to expect more transistors so you don't get swamped by the complexity of all the transistors you get. He shares an analogy of building a house out of bricks that keep becoming half the size.

People don't get smarter and teams can't grow that much.

Teams are limited to about 100 people, the size at which everyone can still know each other, so designs have to be divided into pieces: abstraction layers.

You have to think about when to shift gears on your abstraction layers.

Faster computers are used to build computers but refactoring is necessary for the design systems to keep up with transistor counts.

Computing climbing the mathematical ladder

Computing started with arithmetic, took on linear equations and then matrix equations as systems scaled up, and is now turning to topology problems over data.

Richard S. Sutton's paper, "The Bitter Lesson"

AI's path

  1. Apply rule sets.
  2. Deep search.
  3. Train weight sets that we convolve across.
  4. ...

AI is going up this mathematical graph supported by computation and data sets.

The inner layers of a simple neural network that identifies cats work with something like projections of a cat: elements like pointy ears and whiskers. In deep networks you can't tell what's being encoded, but if you take a layer out, the network stops working.

Lex wants to consider if everything is search.

Given search space - searching in a sorted array or an organized tree of information.
Found search space - creating a map as you search.
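A toy contrast between the two (my own sketch, not from the episode): a given search space has structure you can exploit directly, like binary search over a sorted array; a found search space is discovered as you go, building the map while you search.

```python
from collections import deque

# Given search space: the structure (sorted order) already exists,
# so the search exploits it directly.
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# Found search space: nothing is known up front; the map is built
# by expanding a frontier (breadth-first search here).
def explore(start, neighbors):
    seen, frontier, found_map = {start}, deque([start]), {}
    while frontier:
        node = frontier.popleft()
        found_map[node] = neighbors(node)
        for nxt in found_map[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return found_map

print(binary_search([1, 3, 5, 8, 13], 8))              # 3
print(explore(0, lambda n: [n + 1] if n < 3 else []))  # {0: [1], 1: [2], 2: [3], 3: []}
```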

Every order of magnitude changes the computation. - Raja Koduri

A difference in quantity is a difference in kind. Ant vs ant hill. Neuron vs brain.

What do you do with a petabyte of memory that wants to be accessed in a sparse way with the kind of computations that AI programmers want?

Early optimization is a problem. Once you've got something working, you can optimize it to be 2x faster, but if you do it too early you may waste your time. You optimize later if you need to, but should you be rewriting instead? That's the creative tension.

Moore's law drove computers to be faster and smaller. Bell's law of computer classes says classes of computers (desktop, mainframe, mobile) form, evolve and may die out.

How does he feel about his role in computing? There are billions of people on this planet whose actions are unpredictable and happening independently all the time. If it weren't him, someone else would do it. Philosophers wonder how we will transform our world.

The two disciplines with the highest GRE scores are philosophy and physics. Both are trying to answer the question of why there's anything at all.

The universe seems highly repetitive at best, and complexity tends to increase. The physical world inherently generates mathematics: space, motion, and calculations about those things. Computation started out relatively pedestrian, working with binary algebra, but recently we do sophisticated computations where nobody understands how the answer came out.

A function was considered physical if it was predictive of new data sets. Now you can use machine learning to make predictions where the reasoning behind them is unknown.

Where are we in the universe's stack? A brain does something like 10^18 ops per second. But can a computer do 10^20? Sure.

You think you know how you think but then you have all these ideas and you can't figure out how they happened.

If you meditate, the things you can become aware of are interesting. link to quote

What is consciousness? If you had only two neurons you'd have one motor, one sensory. At some point there's a planning system. Then there's an emotional system. We have massive numbers of these systems and a dream system nobody understands.

You can think in a way that those systems are more independent and the different parts of yourself can observe them.

Is the universe a simulation? If the universe is a computer, it's a weird one, because it's so complicated to compute. Physics is like having 50 equations with 50 variables. When you get to the corners of everything, it's either uncomputable or uncertain. The simulation description seems to break when you look closely at it.

Will exponential improvement continue indefinitely? Imagine a smart world where everything knows you and recognizes you. The transformations are going to be unpredictable. What does an exponential of a million mean? Nobody knows. https://en.wikipedia.org/wiki/Computronium

In terms of cost, it's all in the equipment to do it.

Elon Musk says first figure out what configuration you want the atoms in, then figure out how to put them there.

Elon's great insight is: people are how-constrained. They think, "I have this thing, I know how it works, and little tweaks to that will generate something," as opposed to asking, "What do I actually want?" and then figuring out how to build it. It's a very different mindset and almost nobody has it. link to quote

Self driving

The safety problem is mainly attention, which computers are great at. Computers are really good at detecting objects.

Your brain has theories about why someone cuts you off. If you think that narrative is important, well, computers don't do that; but if cars are ballistic objects, roads are fixed and given, and you can map the world really thoroughly, then at some point autonomous computer systems will be way better than humans.

The key to robots is to maximize the givens.

After we've taken a trip for the fiftieth time, we're on autopilot. Autonomous cars are always operating on givens, like that autopilot, but they never stop paying attention.

Progress disappoints in the short-run but surprises in the long run.

Autonomous driving is gonna be a $50 solution that nobody cares about, like GPS: it was "like wow" at first, but it's in everything today and nobody cares.

On the pressure from regulators around autonomous driving: he was concerned that regulators would write particular technology into the rules, but they said they were interested in the use cases and scenarios, and they had all the data about which ones injured or killed the most people.

Elon's also interested in freeing time and attention as well as safety.

The goal is to be 10x safer than people, so it seems right to scrutinize the safety bar even at parity with humans.

For AI, if a general-purpose processor is the baseline, a GPU gives 5x and specialized accelerators give another 2x, so roughly 10x overall. AI accelerators have an advantage because they nail the algorithm while still being programmable.

Craftsman's work

To build a good car computer there are a lot of safety processors and sensors to connect with. Elon wants to put one in every car, so the cost constraint is severe. All of this adds up to craftsman's work.

Craftsman's work is about all these details of how to make something at each stage. Most engineering is craftsman's work and people really like to do that kind of work. Even digging ditches.

If the steps are complicated and you're good at them, it's satisfying to do them. If you're intrigued while you're doing these things you learn while you're doing it and it's fun.

Reduction to practice - abstraction layers that when reduced to practice become transistors and wires.

Factory work is not as simple as people think. Building a car is hard. Placing a piece of trim within x number of seconds is too complicated to do without training and experience.

Driving cars is easy for humans because we've been evolving for billions of years. -Lex, link to quote

You'd think you had an understanding of the first principles of something, then you'd talk to Elon about it and find you hadn't even scratched the surface. Elon has a deep belief that no matter what you do, it's a local maximum.

He tells a story of a guy who made a better electric motor and wondered why Jim didn't seem that impressed. Jim said, "When the superintelligent aliens come, are they going to be looking for you? 'Where is he? The guy. The one that built the motor.'" But craftsman's work is hard and satisfying, so appreciate it as that, not as the haymaker.

Elon's really good at repeatedly asking for a deeper principle, getting past the how constraints.

When SpaceX first landed two rockets simultaneously, they had a video projector in the big room at Tesla, and like 500 people came down; when the rockets landed, everybody cheered and some people cried. It was so cool. But how did you do that? That was super hard. Then people say it was chaotic. Really? To get out of all your assumptions, you think that's not going to be unbelievably painful?

Imagine 99% of your thought process is protecting your self-conception, and 98% of that's wrong. Now you got the math right. How do you think you're feeling when you get down to that one bit that's useful, and now you're open and have the ability to do something different? I don't know if I got the math right; it might be 99.9, but it ain't 50.

I read books. I've read a couple books a week for 50 years. When people write books they often take 20 years of their life where they passionately did something and condensed that down to 200 pages. That's kind of fun. And then you can go online and find out who wrote the best books. So then there's this wild selection process, and then you can read it and for the most part understand it. And then you can go apply it.

My brain has this idea that you can question first assumptions, but I can go days at a time and forget that, so you have to circle back to that observation.

Weirdly, we operate half a second behind reality. Nobody understands that one either; it's pretty funny.

Are you afraid of super-intelligent AI? We already have a highly stratified society, so he suspects the domains of interest for AI will be so different from most of ours that it won't be dangerous. Is it a problem living with something on the planet that's smarter than you? That's a privileged, smart-person viewpoint; most of the planet has been living under that assumption their whole lives.

Society does have stress about the 1% but "know yourself" seems the proper dictum. There's so much unexplored space at every level.

What is the meaning of life? It seems to be what it does. The universe makes us and we do stuff.