
Stanford AI Club: Jeff Dean on Important AI Trends

2025-11-25


AI Summary

Core Thesis

Infographic: AI's transformational journey, highlighting algorithmic, hardware, and learning improvements

Jeff Dean's presentation at the Stanford AI Club underscores the transformative strides AI has made over the past 15 years, highlighting innovations from foundational neural networks to modern multimodal models. By weaving together algorithmic improvements, scalable hardware, and self-supervised learning, AI models have significantly enhanced capabilities across fields like language processing and computer vision, setting the stage for a future filled with AI-assisted advances.


Deep Dive

The Scaling Paradigm: From CPUs to TPUs

Diagram: Evolution of computational hardware from CPUs to TPUs

The journey of AI over the past decades has been deeply intertwined with the evolution of computational hardware. Initially reliant on general-purpose CPUs, machine learning workloads now demand specialized hardware such as tensor processing units (TPUs), built around dense linear algebra at reduced precision, enabling massive scale-ups in training efficiency and inference speed. As Jeff Dean notes, the first TPUs were 15 to 30 times faster than contemporary CPUs and GPUs and 30 to 80 times more energy efficient, setting a new benchmark for how AI computations are approached and executed in data centers. The implications are profound, giving Google the ability to serve large-scale models without prohibitive costs, thereby fostering rapid innovation and application.

Algorithmic Innovations: Attention Mechanisms and Sparse Models

Chart: Impact of attention mechanisms and sparse models on AI

Central to the modern AI landscape are techniques such as attention mechanisms and sparse models, which redefine how models use computational resources and process information. The transformer architecture, encapsulated in the phrase "attention is all you need", marks a pivotal shift: models attend over all saved contextual states rather than compressing them into a single recurrent state, reaching higher accuracy with far less compute. Sparse models build on this efficiency by activating only a small subset of the model's parameters for any given input, optimizing computational cost and expanding model scalability. Dean's discussion of Google's Gemini models showcases the potential of these architectures in building adaptable, robust systems fit for myriad applications.
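To make the sparse-activation idea concrete, here is a minimal, hypothetical sketch of a mixture-of-experts style layer with top-1 gating; the layer sizes, gating scheme, and expert count are made up for illustration and are not Gemini's actual architecture.

```python
import numpy as np

def sparse_moe_layer(x, expert_weights, gate_weights, top_k=1):
    """Route the input to its top-k experts; only those experts' parameters are used."""
    gate_logits = x @ gate_weights                  # one score per expert
    top_experts = np.argsort(gate_logits)[-top_k:]  # indices of the chosen experts
    gate_probs = np.exp(gate_logits - gate_logits.max())
    gate_probs /= gate_probs.sum()
    # Only the selected experts run for this input; the rest stay untouched.
    out = sum(gate_probs[e] * (x @ expert_weights[e]) for e in top_experts)
    return out, top_experts

rng = np.random.default_rng(0)
d_model, num_experts = 16, 8
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]
gate = rng.normal(size=(d_model, num_experts))
y, chosen = sparse_moe_layer(rng.normal(size=d_model), experts, gate)
print("expert(s) used:", chosen)  # 1 of 8 experts, so only a fraction of parameters is active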

The Economics of Scale: Training and Model Implementation

As AI models grow more sophisticated, the economics of scaling become a critical concern, influencing both design and deployment strategies. Dean illustrates this with Google's Pathways system, which orchestrates large-scale computations across pods, data-center networks, and even metro-area links while exposing a simple abstraction to researchers. Advances such as model distillation add another economic lever: a large teacher model's output distributions can train a much smaller student that retains most of the teacher's accuracy, even when the student sees only a fraction of the training data. Such methods shape how companies plan resources for AI development, emphasizing cost-effectiveness without compromising performance.

Self-Supervised Learning: The Backbone of Language Models

In language processing, self-supervised learning has emerged as a cornerstone methodology, turning vast amounts of unlabelled text into valuable training input. The paradigm uses context-based prediction of held-out words, as illustrated by fill-in-the-blank examples like "Stanford ___", where the model must guess the missing word (for example, "University"). The capacity to predict missing words gives models nuanced comprehension skills, fostering breakthroughs in language-modeling accuracy and practicality. The technique exemplifies both the near-unlimited supply of training data and the versatility of language models in applications ranging from simple translation tasks to complex reasoning challenges.
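As a concrete illustration of this setup, here is a small, hypothetical sketch of how raw text can be turned into training examples under the two objectives the talk describes later (next-word prediction and fill-in-the-blank); the sentence and tokenization are illustrative only.

```python
# Two self-supervised objectives built from raw text; no labels required.
sentence = "Stanford University is located in California".split()

# 1) Autoregressive (left-to-right): predict the next word from the prefix.
autoregressive_examples = [
    (sentence[:i], sentence[i]) for i in range(1, len(sentence))
]
# e.g. (["Stanford"], "University"), (["Stanford", "University"], "is"), ...

# 2) Masked / fill-in-the-blank: hide a word and predict it from both sides.
def mask_word(tokens, position, mask_token="___"):
    masked = list(tokens)
    target = masked[position]
    masked[position] = mask_token
    return masked, target

masked_examples = [mask_word(sentence, i) for i in range(len(sentence))]
# e.g. (["Stanford", "___", "is", ...], "University")

print(autoregressive_examples[0])
print(masked_examples[1])
```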

The Multimodal Model Revolution: Gemini's Vision

Google's latest endeavors in multimodal models, as epitomized by the Gemini series, represent a bold leap forward in AI capability. These models break boundaries not just by processing textual data but by integrating diverse inputs—video, audio, and imagery—into a cohesive system that can produce similarly varied outputs. This goes beyond traditional AI tasks, venturing into realms like media creation and dynamic coding solutions, showcasing a future where AI systems can operate as general-purpose assistants across domains. Dean's enthusiasm for these developments is a testament to their potential in making profound changes in how experts and laypeople alike interact with technology, offering personalized and scalable solutions to increasingly complex problems.


Bottom Line

Jeff Dean's discourse paints a vivid picture of AI's metamorphosis into a transformative force across industries, propelled by innovations in computational hardware, algorithm design, and training methodologies. Attention mechanisms and multimodal integration redefine what AI models can achieve, providing broader accessibility and operational versatility. As AI continues to evolve, there is a dual imperative to harness its potential for societal good while vigilantly safeguarding against its risks, especially in misinformation and privacy concerns. For stakeholders in AI, these revelations demand a critical look at how technologies can be responsibly developed and integrated into everyday life, shaping a future that balances innovation with ethical considerations.

Full Transcript

Stanford AI Club: Jeff Dean on Important AI Trends

[00:00] For a quick intro on Jeff: Jeff joined Google in 1999 as its 30th employee, where he built some of the foundational infrastructure that powers the modern internet, including MapReduce, Bigtable, and Spanner. Jeff went on to found Google Brain in 2011, where he developed and released TensorFlow, one of the world's most popular deep learning frameworks. Jeff now serves as the chief scientist of Google DeepMind and Google Research, where he leads the Gemini team.

[00:03] We're excited to have him here today. And with that, over to you, Jeff. Thank you. Fantastic. Okay. So what I thought I would do today is talk to you about important trends in AI, sort of a whole bunch of developments that have happened mostly over the past 15 years or so, and, you know, how those have fit together into building sort of the modern, capable models that we have today.

[00:06] This is presenting the work of many, many people at Google, and some of it is also from elsewhere. But I'm sometimes just the messenger, sometimes a collaborator and developer of some of these things. So first, a few observations. I think in the last decade or so, machine learning has really completely changed our expectations of what we think is possible with computers. Like 10 years ago, you could not get very natural,

[00:09] speech recognition and conversations with your computer. They weren't really very good at image recognition or understanding what's in visual form. They didn't really understand language all that well. But what has happened is we've discovered that a particular paradigm of deep learning-based methods, neural networks, and increasing scale, has really delivered really good results as we've scaled things up. And along the way we've developed really

[00:12] new and interesting algorithmic and model architecture improvements that have also provided these massive improvements. And these are often kind of combined well. So even bigger things with even better algorithms tend to work even better. The other thing that's been a bit of a significant effect in the whole computing industry is the kinds of computations we want to run and the hardware in which we want to run them have dramatically changed. Like 15 years ago, mostly you cared about how fast

[00:15] was your CPU, maybe how many cores did it have? Could it run Microsoft Word and Chrome or, you know, traditional, you know, hand-coded program computations quickly? Now you care, can it run interesting machine learning computations with all kinds of different kinds of constraints. Okay, so a rapid-fire advance, or whirlwind tour of 15 years

[00:18] of machine learning advances. How did today's models come to be? It's going to be like one or two slides per advance. There's often an arXiv link or a paper link where you can go learn more. But I'm going to try to give you just the highest-level essence of why was this idea important, and what does it help us with? OK, but I'm going to go back even more than that. I'll go back like 50 years. Neural networks, it turns out, are a relatively old idea.

[00:21] This notion of artificial neurons, where we have weights on the edges and we can sort of learn to recognize certain kinds of patterns, actually turns out to be really important. And then, combined with that, backpropagation as a way to learn the weights on the edges turns out to be a really key thing, because then you can do end-to-end learning on the entire network from some error signal you have. And so this was kind of the state of affairs when I first learned about neural networks my

[00:24] senior year of college. I got really excited. I'm like, oh, this is such a great abstraction. It's going to be awesome. We could build really great pattern recognition things and solve all kinds of problems. So I got really, really excited and I said, oh, I'm going to do a senior thesis on parallel training of neural networks. And so what I ended up doing was like, well, let's just try to use the 32-processor machine in the department instead of a single machine, and we'll be able to build really

[00:27] big neural networks. So it's going to be really great. And I essentially implemented two different things that we now call data parallel and model parallel training of neural nets on this funky hypercube-based machine, and then looked at, you know, how that scaled as you added more processors. So it turns out I was completely wrong. You needed like a million times as much processing power to make really good neural nets, not 32 times.
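To make the distinction concrete, here is a toy sketch of data-parallel versus model-parallel evaluation of a small two-layer network, with NumPy arrays standing in for devices; the sizes and splitting scheme are invented for illustration, not what the thesis or later systems actually did.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))          # a batch of 8 examples, 4 features each
W1 = rng.normal(size=(4, 6))
W2 = rng.normal(size=(6, 2))

def forward(x, w1, w2):
    return np.maximum(x @ w1, 0) @ w2    # small ReLU MLP

# Data parallelism: each simulated "device" gets a slice of the batch and the full model.
shards = np.array_split(X, 2)
data_parallel_out = np.concatenate([forward(s, W1, W2) for s in shards])

# Model parallelism: each "device" holds a slice of W1's columns; partial
# activations are computed separately and concatenated.
W1_parts = np.array_split(W1, 2, axis=1)
hidden = np.concatenate([np.maximum(X @ part, 0) for part in W1_parts], axis=1)
model_parallel_out = hidden @ W2

# Both splits give the same answer as running the whole model on the whole batch.
assert np.allclose(data_parallel_out, forward(X, W1, W2))
assert np.allclose(model_parallel_out, forward(X, W1, W2))
```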

[00:30] I finished writing this thesis and then I went off and decided to do other things in grad school. But this always kind of stayed as a little inkling in the back of my mind: this could be an important direction. So in 2012, I actually bumped into Andrew Ng in a micro kitchen at Google. I'm like, oh, hi Andrew, how are you? What are you doing here? And he's like, oh, well, I'm starting to spend a day a week at Google and I haven't really figured out what I'm doing yet

[00:33] here, but my students at Stanford are starting to get good results with neural nets on, you know, various kinds of speech problems. I'm like, oh, that's cool. We should train really big neural networks. So that was kind of the genesis of the Google Brain project: how do we scale the training of large neural networks using lots and lots of computation? And at that time, we didn't actually have accelerators in our data centers. We had lots and

[00:36] lots of CPUs with lots of cores. So we ended up building this software abstraction that we called DistBelief, in part because people didn't believe it was going to work. But this ended up supporting both model parallelism and also data parallelism. And in fact, we did this kind of funky asynchronous training of multiple replicas of the model, shown on the right-hand side, where before every step with a batch of data, one of the

[00:39] replicas would download the current set of parameters, and it would crunch away on doing one batch of training on that and compute a gradient update. That's the delta-W there, and it would send it to the parameter servers, which would then add the delta-W into the current parameters they were hosting. Now, this is all completely mathematically wrong, because at the same time, all the other model replicas were also computing gradients and asynchronously adding them into this shared set of parameter state.
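A minimal single-process sketch of the asynchronous parameter-server pattern just described (replicas compute a delta-W against a possibly stale copy of the parameters, and the server simply adds it in); this is illustrative pseudocode of the idea, not the actual DistBelief implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
params = rng.normal(size=4)              # the parameter server's shared state
data = rng.normal(size=(100, 4))
targets = data @ np.array([1.0, -2.0, 0.5, 3.0])

def gradient(w, x_batch, y_batch):
    # Gradient of mean squared error for a simple linear model.
    preds = x_batch @ w
    return 2 * x_batch.T @ (preds - y_batch) / len(y_batch)

num_replicas, lr = 4, 0.05
for step in range(50):
    for replica in range(num_replicas):
        # Each replica downloads a (possibly stale) copy of the parameters...
        local_copy = params.copy()
        batch = rng.choice(len(data), size=8, replace=False)
        delta_w = -lr * gradient(local_copy, data[batch], targets[batch])
        # ...and the parameter server just adds delta-W into whatever it
        # currently holds, even if other replicas updated it in the meantime.
        params += delta_w

print(params)   # drifts toward [1, -2, 0.5, 3] despite the "mathematically wrong" updates
```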

[00:42] So that made a lot of people kind of nervous, because it's not actually what you're really supposed to do, but it turned out to work. So that was nice. And we had systems where we'd have 200 replicas of the model all churning away asynchronously and updating parameters, and it seemed to work reasonably well. And we also had model parallelism, where we could divide very large models across, you know, many, many computers. So this system enabled us in 2012

[00:45] to train 50 to 100x larger neural networks than anyone had ever trained before. They look really small now, but at that time, we were like, oh, this is great. And so one of the first things we used this system for was what's known as the cat paper, where we took 10 million random frames from random YouTube videos and just used an unsupervised learning algorithm to learn

[00:48] a representation that we could then use to reconstruct the raw pixels of each frame. The learning objective was sort of trying to minimize the error in the reconstruction of the frame given the input frame, so you don't need any labels, and in fact the system never saw any labeled data for the unsupervised portion. But what we found was that at the top of this model you'd end up with neurons that

[00:51] recognized different kinds of sort of high-level concepts, even though it had never been taught, you know, what a cat was. There was a neuron where the strongest stimulus you could give that neuron was something like that. And so it had sort of come up with the concept of a cat just by being exposed to that. There were also other neurons for things like human faces or the backs of pedestrians. And perhaps more importantly, we got very large

[00:54] increases in the state of the art on the less commonly used ImageNet 22,000-category benchmark. Most people competed in the one you usually hear about, the 1,000-category one, but we were like, well, let's do the 22,000-category one. And we actually got like a 70% relative improvement in the state of the art on that. And what we were also able to show is that when we did unsupervised pre-training, we actually got a pretty significant

[00:57] increase in the accuracy. We also started to think about language, and looking at how we could have nice distributed representations of words. So rather than representing words as discrete things, we wanted to have a neural-net-like representation for every word, and then be able to learn those representations so that you end up with these high-dimensional vectors that represent every

[01:00] word or phrase in the system. And when you do that, we had a few different objectives for how you train this. One way you can do it is take the middle word in a sequence of words and use its representation to try to predict the other nearby words. And then you can get an error signal and backpropagate into the representations for all the words. And if you do this and you have a lot of training data, which is just raw text that you need to train this

[01:03] on, then what you find is that the nearby words in the high-dimensional space after you train it are all quite related. So cat and puma and tiger are all nearby. But also, interestingly, we found directions were kind of meaningful. So if you subtract these vectors, you end up going in the same direction to change the gender of a word, for example, regardless of whether you start at king or you start at man. And there are other directions for things like past tenses of verbs and, you know, future tenses of verbs. So that was kind of interesting.
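The "directions are meaningful" observation can be checked with simple vector arithmetic; here is a hypothetical sketch with tiny hand-made embeddings (real systems learn these vectors from large amounts of raw text).

```python
import numpy as np

# Tiny, hand-made embeddings purely for illustration; real ones are learned from text.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def nearest(vec, vocab, exclude=()):
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude), key=lambda w: cos(vec, vocab[w]))

# king - man + woman should land near queen: the same "gender direction"
# moves you from king to queen as it does from man to woman.
analogy = emb["king"] - emb["man"] + emb["woman"]
print(nearest(analogy, emb, exclude={"king"}))   # -> queen
```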

[01:06] Then my colleagues Ilya, Oriol, and Quoc worked on using LSTMs, these kinds of recurrent long short-term memory models, on a particularly nice problem abstraction where you have one sequence and you're going to use that to predict a different sequence. And it turns out this has all kinds of uses in the world. So one use that they focused on in

[01:09] the paper was translation. So you have an English sentence, say, and you're going to try to predict the French sentence. And you have a bunch of training data where you know the correct French translation of an English sentence. And so you end up using that as a supervised learning objective to then learn good representations in the recurrent model in order to do this translation task. And if you see enough English-French sentence pairs and use the sequence

[01:12] based sequence-to-sequence learning objective, then you end up with a quite high-quality translation system. It turns out you can use this for all kinds of other things as well, but I will not talk about that. So one of the other things we started to realize, as we were getting more and more success in using neural nets for all kinds of interesting things in speech recognition and vision and language, was that, well,

[01:15] I did a bit of a back-of-the-envelope calculation. We had just produced a really high-quality speech recognition model that we hadn't rolled out, but we could see that it had, you know, a much lower error rate than the current production speech recognition system at Google, which at that time ran in our data centers. And so I said, oh, well, if speech recognition gets a lot better, people are going to want to use it more. So what if 100 million people want to start to talk to their phones for three minutes a day?

[01:18] These were just random numbers pulled out of my head. And it turned out that if we wanted to run this high-quality model on CPUs, which is what we had in the data centers at that time, we would need to double the number of computers that Google had just to roll out this improved speech recognition feature. So I said, well, we really should think about specialized hardware, because there are all kinds of nice properties of neural net computations that we could take advantage of by building specialized hardware.

[01:21] They're very tolerant of very low precision computations, so you don't need 32-bit floating point numbers or anything like that. And all the neural nets that we'd been looking at at the time were just different compositions of essentially dense linear algebra operations: matrix multiplies, vector dot products, and so on. So if you can build specialized hardware that is really, really good at reduced-precision linear algebra, then all of a sudden you can have something that's much more efficient.
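To illustrate the reduced-precision point, here is a small NumPy sketch comparing a matrix multiply done in 32-bit floats against one done in 16-bit floats; NumPy's float16 is used only as a stand-in for the bfloat16 arithmetic TPUs actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(256, 256)).astype(np.float32)
B = rng.normal(size=(256, 256)).astype(np.float32)

full = A @ B                                                   # 32-bit reference
low = (A.astype(np.float16) @ B.astype(np.float16)).astype(np.float32)

rel_err = np.abs(full - low).max() / np.abs(full).max()
print(f"max relative error from the 16-bit matmul: {rel_err:.4f}")
# Neural nets tolerate this level of error, which is why hardware can trade
# precision for many more multiply-accumulate units per chip and per watt.
```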

[01:24] And we started to work with a, you know, a team of people who are chip designers and, you know, board designers. And this is kind of the paper we ended up publishing a few years later. But in 2015, we ended up having these TPU v1s, the tensor processing unit, which was really designed to accelerate inference, rolled out into our data centers. And we were able to do a bunch of nice sort of empirical comparisons

[01:27] and show that it was 15 to 30 times faster than CPUs and GPUs of the time, and 30 to 80 times more energy efficient. And this is now the most cited paper in ISCA's 50-year history. Excited about it. And then, working with that same set of people, we realized that we also wanted to look at the training problem, because inference is a nice, you know, small-scale problem where you can have, you know,

[01:30] at that time, a single PCIe card you could plug into a computer and have a whole bunch of, you know, models that run on that. But for training, it's a much larger-scale problem. And so we started to design, essentially, machine learning supercomputers around the idea of having low precision, a high-speed custom network, and sort of a compiler that could map high-level computations onto the actual hardware.

[01:33] And we ended up with a whole sequence of TPU designs that are progressively faster and faster and larger and larger. Our most recent one, we've changed our naming scheme, so it's no longer what you might expect: it's called Ironwood. The pod sizes for this system are, you know, 9,216 chips all connected in a 3D torus, with quite a lot of bandwidth

[01:36] and capacity. And if you compare that to TPU v2, which was our first ML supercomputing pod, it's about 3,600 times the peak performance per pod compared to the first one, which, to be fair, was only 256 chips instead of 9,000. But still, every individual chip is also much faster. And it's also about 30 times as energy efficient as the TPU v2. Now, some of that comes from scaling of process nodes and so on, but some of it just comes

[01:39] from, you know, looking at energy consumption in all kinds of ways and building really energy-efficient systems. Another thing that's happened is open source tools have really enabled the whole community. So we developed TensorFlow as a successor to our internal DistBelief system, which we'd used for hundreds or thousands of kinds of models; we fixed a bunch of things in it that we didn't like, and decided to open source it when we first started building it.

[01:42] A bunch of people were working a little bit later on a system called Torch, which used a language called Lua and didn't get very popular, because most people didn't want to program in Lua or did not know Lua. But then they built a version called PyTorch that was Python-based, and that really had a lot of success. And another team at Google has been building a system called JAX that has this nice functional way of expressing machine learning computations.
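For a flavor of that functional style, here is a tiny example using the open-source jax package, expressing a loss as a pure function and asking JAX for its gradient; the model and data are made up for illustration.

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # A pure function of (parameters, data): a one-layer linear model with MSE.
    w, b = params
    pred = x @ w + b
    return jnp.mean((pred - y) ** 2)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
y = x @ jnp.array([1.0, -2.0, 0.5, 3.0]) + 0.1
params = (jnp.zeros(4), 0.0)

grad_fn = jax.jit(jax.grad(loss))     # gradients via transformation, compiled with XLA
for _ in range(200):
    g_w, g_b = grad_fn(params, x, y)
    params = (params[0] - 0.1 * g_w, params[1] - 0.1 * g_b)

print(params[0])   # approaches [1, -2, 0.5, 3]
```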

[01:45] These tools have enabled the whole community in lots of ways: many different kinds of applied ML work uses some of those frameworks, researchers use them, and so on. In 2017, several of my colleagues worked on this attention-based mechanism, building on some earlier work on attention, but coming up with this really nice architecture that is now at the core of most of the sort of exciting language-based models that you're seeing today.

[01:48] And their observation was really, unlike an LSTM where you sort of have a word and you consume that word by updating your internal state and then you go on to the next word. Their observation was, hey, let's not try to force all that state into a vector that we update every step. Instead, let's just be able to save all those states we go through, and then let's be able to attend to all of them whenever we're trying to do something based on the context of the past.
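Here is a minimal NumPy sketch of that idea (keep all the per-token states around and let each query attend over every one of them) in the form of single-head scaled dot-product attention; real transformers add multiple heads, learned projections, masking, and more.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Each query looks at *all* saved states (keys/values), weighted by similarity,
    # instead of relying on a single running state the way an LSTM does.
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # (num_queries, num_states)
    weights = softmax(scores, axis=-1)
    return weights @ values                    # a blend of all past states

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
states = rng.normal(size=(seq_len, d_model))   # saved representations of every token so far
output = attention(states, states, states)     # self-attention over the whole context
print(output.shape)                            # (5, 8)
```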

[01:51] And that's really kind of at the core of the "attention is all you need" title. And what they were able to show was that you could get much higher accuracy, this is from the paper, with 10 to 100x less compute, and in this case 10 times smaller models. So this is the number of parameters, on a log scale, for a language model to get down to a particular level of loss. And what they were able to show is that with 10 times fewer parameters,

[01:54] a transformer-based model would get you there, and in other data in the paper they showed 10 to 100x less compute. Another super important development has been just language modeling at scale with self-supervised data. There's lots and lots of text in the world. You know, self-supervised learning on this text can give you almost infinite numbers of training examples where the right answer is known, because you have some word

[01:57] that you've removed from the view of the model, and then you're trying to predict that word. And there are a couple of different flavors. One is auto-regressive, where you get to look to the left and try to predict what's the next word, given all the words that you've seen before that. So: "Stanford ___", and the true word is "University". So you make a guess for this word. If you get it right, great. If you get it wrong, you know, then you can use that as an error signal to then do

[02:00] back propagation through your entire model. And you know, looking at that first blank, it's not necessarily obvious that's going to be university, right? Could be Stanford is a beautiful campus or something. And so all the effort you put into doing this kind of thing makes it so the model is able to take advantage of all this context and make better and better predictions. There's another objective you can use where you get to look at a whole bunch more context, both to the left and right,

[02:03] and you just try to guess the missing words. So if you've ever played Mad Libs, it's a bit like that. You know: "the Stanford ___ Club ___ together ___ and computer ___ enthusiasts." Some of those you can probably guess; some of those are harder to guess. But that's really kind of the key for doing self-supervised learning on text, which is at the heart of modern language models. Turns out you can also apply these

[02:06] transformer-based models to computer vision. And so another set of my colleagues worked on, you know, how can we do that? And what they found, again, was, you know, the bold-faced entries are the best result for a particular row, and these two rows were theirs, in varying sizes of configuration. But roughly, with four to ten or four to twenty times less compute, you could get to the best results.
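Much of the trick in applying transformers to images is in how the input is tokenized: cut the image into patches and treat each patch like a word. A small illustrative sketch of that patch-embedding step, with made-up image and patch sizes:

```python
import numpy as np

def image_to_patch_tokens(image, patch_size, projection):
    """Split an HxWxC image into non-overlapping patches and project each to a vector."""
    h, w, c = image.shape
    tokens = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patch = image[i:i + patch_size, j:j + patch_size, :].reshape(-1)
            tokens.append(patch @ projection)          # one "word" per patch
    return np.stack(tokens)                            # (num_patches, d_model)

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))
patch_size, d_model = 8, 64
proj = rng.normal(size=(patch_size * patch_size * 3, d_model))
tokens = image_to_patch_tokens(image, patch_size, proj)
print(tokens.shape)   # (16, 64): a 16-token "sentence" a transformer can attend over
```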

[02:09] So again, algorithm improvements make a big difference here because now all of a sudden you can train something much bigger or use less compute to get the same accuracy. So I and a few other people really started to encourage some of our colleagues and gather a small group of people to work on much sparser models. Because we felt like in a normal neural network, you have the entire model activated for every,

[02:12] example or every token you're trying to predict. And that just seems very wasteful. It'd be much better to have a very, very large model and then have different parts of it be good at different kinds of things. And then, when you call upon the expertise that's needed in the model, you only activate a very small portion of the overall model, so maybe one to five percent of the total parameters in the model are used on any given prediction. And again, we were able to see

[02:15] that this was a major improvement in the compute time to get to a given level of accuracy. That's this line here, showing about an 8x reduction in training compute cost for the same accuracy. Or you could choose to spend that by just training a much better model with the same compute budget. And then we've continued to do a whole bunch of research

[02:18] on these models, because we think this is quite an important thing. And indeed, most of the models you hear about today, like the Gemini models, for example, are sparse models. In order to support sort of more interesting, kind of weird sparse models, we started to build computing abstractions that would let us map, you know, interesting ML models onto

[02:21] the hardware, where you didn't have to think as much about where particular pieces of the computation were located. So Pathways was the system we built that was really designed to be quite scalable, to simplify running these really large-scale training computations in particular. And, oh, well, so one thing: if each of these is one of these TPU pods, there's a super-high-speed network between the chips in the pod.

[02:24] But you may want to run computations that span multiple pods. And so then the orange lines are sort of the local data center network in the same building that you can use to communicate between adjacent pods. And then maybe you have multiple buildings on the same campus, where you have some network between the two buildings, the purple line. And you can even run computations across multiple metro areas that use

[02:27] a long-distance high-speed link to communicate between multiple metro areas. And one of the things Pathways does is it orchestrates all this computation, so you, as an ML researcher, don't have to think about, okay, which network link should I use? It sort of chooses the best thing at the best time, and it deals with failures: what happens if one of these chips goes down, or one of these pods goes down, things like that. And one of the things that it provides as an abstraction is we have a layer

[02:30] underneath JAX that is the Pathways runtime system. And so we can then make a single Python process look like a JAX programming environment that, instead of having four devices, has 10,000 devices. And you can use all the normal JAX machinery to express: okay, I'd like to, you know, run this computation on all these devices. So another set of my colleagues worked on how can we

[02:33] sort of use better prompting of the model to elicit better answers. And, you know, one of their observations was if you, in this case, we're giving the model one example of a problem, and then we're asking it to solve a different problem, but similar to the example we gave it. And if you just tell the model, here's the example problem, and it just is told to give the answer, like, you know, the answer is nine, then it doesn't do as well,

[02:36] as if you give the model sort of some guidance that it's supposed to show its work and demonstrate that in the first problem. And then it will actually go ahead and emit its, you know, show its work for the actual problem you're trying to get it to solve. And, you know, one way of thinking about this is because the model gets to do more computation for every token it emits, in some sense it's able to use more compute in order to arrive at the answer.
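In practice this is purely a change to the prompt. A hypothetical sketch contrasting the two prompt styles (the exemplar is the familiar tennis-ball problem; call_model is a placeholder for whatever LLM API you are using):

```python
# Standard prompting: the exemplar shows only the final answer.
standard_prompt = """\
Q: Roger has 3 tennis balls. He buys 2 more cans of 3 balls each. How many balls does he have?
A: The answer is 9.

Q: A bakery sells 5 muffins per box. Tom buys 4 boxes and eats 3 muffins. How many are left?
A:"""

# Chain-of-thought prompting: the exemplar also shows its work, so the model
# is nudged to emit intermediate reasoning steps before the final answer.
chain_of_thought_prompt = """\
Q: Roger has 3 tennis balls. He buys 2 more cans of 3 balls each. How many balls does he have?
A: Roger starts with 3 balls. 2 cans of 3 balls each is 6 balls. 3 + 6 = 9. The answer is 9.

Q: A bakery sells 5 muffins per box. Tom buys 4 boxes and eats 3 muffins. How many are left?
A:"""

def call_model(prompt: str) -> str:
    """Placeholder for an actual LLM call (hypothetical)."""
    raise NotImplementedError

# call_model(chain_of_thought_prompt) would typically spell out 4*5=20, then 20-3=17,
# spending more tokens (and therefore more compute) on the way to the answer.
```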

[02:39] It also is helpful for the model to be able to reason through problems kind of step by step, rather than trying to just internally come up with the right answer. And, you know, this paper showed that you got pretty significant increases in accuracy on GSM8K, which is like a middle school math benchmark, kind of like these problems, if you use this chain-of-thought prompting versus standard prompting. Now remember, this was

[02:42] three years ago, right? And we were really excited that we'd gotten 15% correct on eighth-grade math problems of the form, you know, "Sean has five toys and for Christmas he got two more." So we've made a lot of progress on math in the last few years. Another important technique turns out to be a technique I worked on with Geoff Hinton and Oriol Vinyals called distillation. And the idea here

[02:45] is that when we're doing this sort of next-word prediction, you know, if you're doing self-supervised learning, "you perform the concerto for ___" and the correct answer in the text you're training on is "violin". But it turns out that if you have a really good neural network already, you can use that as a teacher, and the teacher will give you a distribution of likely words for that missing word.
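A minimal sketch of training on soft targets in the way described here; the vocabulary, probabilities, and loss weighting are toy values for illustration.

```python
import numpy as np

vocab = ["violin", "piano", "trumpet", "airplane"]

# Hard target from the raw text: only "violin" is marked correct.
hard_target = np.array([1.0, 0.0, 0.0, 0.0])

# Soft target from a big teacher model: other plausible instruments get probability too.
teacher_probs = np.array([0.60, 0.25, 0.14, 0.01])

def cross_entropy(target_probs, student_logits):
    logz = np.log(np.exp(student_logits).sum())
    log_probs = student_logits - logz
    return -(target_probs * log_probs).sum()

student_logits = np.array([0.2, 0.1, 0.0, 0.3])   # an untrained student's guesses

# The distillation loss rewards matching the *whole* teacher distribution,
# a much richer signal per example than the single correct word.
loss = 0.5 * cross_entropy(hard_target, student_logits) \
     + 0.5 * cross_entropy(teacher_probs, student_logits)
print(round(float(loss), 3))
```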

[02:48] And so it turns out that you can use this distribution to give the student model much more information when it gets something wrong, right? Because it's, you know, it's likely the word is violin or piano or trumpet, but it's extremely unlikely it's airplane. And that rich signal actually makes it much easier for the model to learn quickly. And in particular, what we showed in this paper was that,

[02:51] this was a speech data set, so we're trying to predict the sound in a frame of audio correctly. And the baseline, if you use 100% of the training set, you could get 58.9% on the test frames. But if you only use 3% of the training data, then you get only 44% test frame accuracy. So a huge drop in accuracy. But if you use the soft targets and a distillation process,

[02:54] then with 3% of the training data you can get 57% accuracy. And this is why this is such a super important technique: you can train a really, really large model, and then you can use distillation, taking the distillation targets from the large model, to give you a really high-quality small model that approximates quite closely the performance of the large model. Okay, and then in the 2020s,

[02:57] I guess I should say, people have been doing a lot more reinforcement learning for post-training. So once you've already trained a model on these self-supervised objectives and so on, you then want to sort of encourage the right kinds of behavior from your model. And you want to do that in terms of things like the style of the responses. Do you want it to be polite? You can give it reinforcement learning feedback, or give it examples of being polite, and sort of do training on that to sort of coax the

[03:00] more polite kinds of answers out of the model and suppress the less polite ones. For safety properties, you might want the model to just not try to engage with people on certain kinds of topics. But then you can also enhance the capabilities of the model by showing it how to tackle much more complex problems. And these can come from many different sources. So one is reinforcement learning from human feedback: you can use human feedback on the outputs of the model, where a human can say, yes, that's a good answer; no, that's

[03:03] a bad answer; yes, that was a good answer. And using lots of those signals, you can get the model to approximate the kinds of behaviors your human reward signal is rewarding. RL from machine feedback is where you use feedback from another model, often called a reward model: you prompt the reward model to judge, you know, do you like answer A or B better, and use that as an RL signal. But then probably one of the most

[03:06] important things is RL in verifiable domains like math or coding. So here you can try to generate some sort of solution to a mathematical problem, let's say it's a proof, and because it's a verifiable domain, you can run a more traditional proof checker against the proof that the model has generated. The proof checker can say, yes, that's a correct proof, or no, that's an incorrect proof, and in particular it's wrong at step 73 or something. And that can give

[03:09] positive reward to the model when it reasons correctly. You can also do this for coding where you give reward for code that compiles, and then even more reward for code that compiles and passes the unit tests that you have for some coding problem, and you just have a whole slew of problems you ask the model to try to solve and get rewards for when it solves it. And so this enables the model to really explore the space of potential solutions, and over time it gets better and better at exploring that space.
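For the coding case, that verifiable reward can be sketched as a simple checker: some reward if the code runs at all, full reward if it also passes the unit tests. This is a toy harness using Python's exec, not anything from an actual RL training stack.

```python
def verifiable_reward(candidate_code: str, tests: list[tuple[tuple, object]]) -> float:
    """Reward 0 if the code doesn't run, 0.2 if it runs, 1.0 if it also passes all tests."""
    namespace = {}
    try:
        exec(candidate_code, namespace)          # "does it compile/run?"
        solution = namespace["solution"]
    except Exception:
        return 0.0
    try:
        if all(solution(*args) == expected for args, expected in tests):
            return 1.0                           # passes the unit tests
    except Exception:
        pass
    return 0.2

tests = [((2, 3), 5), ((10, -4), 6)]
good = "def solution(a, b):\n    return a + b\n"
buggy = "def solution(a, b):\n    return a - b\n"
print(verifiable_reward(good, tests), verifiable_reward(buggy, tests))   # 1.0 0.2
# In RL post-training, many sampled solutions get scored this way and the
# model is updated to make high-reward samples more likely.
```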

[03:12] Okay, so there have been all kinds of innovations at many different levels, you know, many of which I just talked about. But I think it's important to realize that everything from the hardware to the software abstractions to the model architectures and training algorithms, all these things have come together and really contributed. And I'm way behind time, so I'm going to speed up. Okay, so we've been working on Gemini models at Google, which kind of combine a lot of these ideas into pretty interesting,

[03:15] we think, models. And our goal with the Gemini effort is really to train the world's best multimodal models, use them all across Google, and make them available to external people as well. And just this week we released our 3.0 Pro model. We wanted it to be multimodal from the start, to take all kinds of different modalities as input and also produce lots of modalities as output, and we've been adding more modalities. This is from the original Gemini tech report.

[03:18] We've since added the ability to produce video and audio and other kinds of things. We believe in having a really large context length, so the model can look at lots of kinds of pieces of input and reason about it or summarize it or refer back to it. That's been pretty important. You know, 2.0 sort of built on a lot of these kinds of ideas and was a quite capable model, and 2.5

[03:21] was also quite a good model. And then, just to show you how far the mathematical reasoning has come, we used a variant of the 2.5 Pro model to compete in the International Mathematical Olympiad this year and also last year. But this year it was a pure language-model-based system, and we solved five of the six IMO problems correctly, which gets you a gold medal. And there's

[03:24] a nice quote from the IMO president. So the way the IMO works is there are two days of competition, and each day you get three problems. The third problem on each day, so problem three and problem six, are the hardest. And this was problem three, which we did get correct. We didn't get problem six correct, but we got all the other ones correct. And so this is the problem statement; this is the input to our model. And

[03:27] this is the kind of output the model is able to produce, which kind of goes on. I think the judges liked the elegance of our solution, which is nice. It goes on for a little while, and, you know, therefore we have proved it, QED. So I think it's pretty good to sit back and appreciate how far the mathematical reasoning capabilities of these models have come from, you know,

[03:30] "John has four rabbits and, you know, got two more. How many rabbits do you have now?" And then earlier this week we released our Gemini 3 models. I'm really excited about it, as you can see. You know, it performs quite well on a bunch of different benchmarks. There are like way too many benchmarks in the world, but, you know, benchmarks are a good way to assess how good your model is relative to other ones, especially for, you know, ones that are maybe

[03:33] more interesting or haven't been leaked onto the internet quite as much. We're number one in LMArena, which is a good way of assessing, sort of in a non-benchmark-based way, where you allow a user to see two random, anonymous language model responses to a prompt they give, and then the user can say, I prefer A or B. And over time you get an aggregated score from that, because you can see if your model is generally

[03:36] preferred to other models. One of the things that's really happened is we had a huge leap in web-dev-style coding versus our earlier models. I'm going to skip this. Well, I'll show you that. So this is an example of, you know, the word Gemini skateboarding, or the word Gemini surfing. So it's actually generating code for animating all these kinds of

[03:39] things, floating into a view of a landscape. Here it is as a forest. I like that one. So you can give very high-level instructions to these models and have them write code. And it doesn't always work, but when it works, it's kind of this nice magical feeling. Here's another good example. Someone had a whole bunch of recipes in various forms, some in Korean, some in English. And

[03:42] they, you know, basically just said, okay, I'm going to scan them all in. I'm going to take photos of them. Great, there we go, they're all in there. Translate and transcribe them. Awesome, okay, they're all transcribed. And then our next step is going to be: let's see if we can

[03:45] create a bilingual website using these recipes. So, we've now done this, and we've generated some nice imagery for it, and there you go. Now there's your website with your recipes. So it's kind of nice: it combines a whole bunch of capabilities of these models to end up with something that might be kind of useful. Users generally seem to be enjoying this. Yeah, I mean, there are lots of quotes on the web.

[03:48] We also launched a much better image generation model today. So that's been kind of exciting. People seem to really like it. It can do pretty crazy things. So you can give it, for example: turn this blueprint into a 3D image of what the house would look like. Or: take the original "Attention Is All You Need" figure, and please annotate it with all the important aspects of what

[03:51] happens in each different spot. Mustafa is one of the people who worked most on the Nano Banana work. One of the things that's interesting about it is that it actually reasons with intermediate imagery, and you can see this in the thoughts if you use AI Studio. So the question is, you know, tell me which bucket the ball lands in; use images to solve it step by step. And so this is what the model does.

[03:54] It sort of does what you might think. First, it rolls down there. And, oh yeah, then it's going to roll the other way onto ramp 3. Then it's going to roll onto ramp 5, and then it's going to be in bucket B. It's kind of cool; that's kind of how you would mentally do it. You know, it's pretty good at infographics and things. So it can, you know, annotate old historical figures and tell you things. I posted this

[03:57] image of the solar system, you know, as an example: show me a chart of the solar system, annotate each planet with one interesting fact. So that's the image we did. Turns out if you do that, people are really sad, especially people my age or a little bit younger. So, okay: make this 21:9, add Pluto, and add a humorous comment about, you know, the former planet that got demoted to dwarf-planet status.

[04:00] Perfect. We're so back. Okay. So in conclusion, I hope you've seen, in your own use of these models and also in what I've presented, that these models are really becoming quite powerful for all kinds of different things. Further research and innovation is going to continue this trend, and it's going to have a dramatic effect on a bunch of

[04:03] different areas, you know, in particular health care, education, scientific research, media creation, which we just saw, misinformation, things like that. And it potentially makes really deep expertise available to many more people, right? Like, if you think about the coding examples, there are many people who haven't been trained in how to write code, and they can get some, you know, computer assistance, and that can help them turn their vision into interesting, you know, websites for recipes or

[04:06] whatever. But done well, I think our AI-assisted future is bright. But I'm not completely oblivious: areas like misinformation are a potential concern. Actually, John Hennessy and Dave Patterson and I and a few other co-authors worked on a paper last year that kind of touched on all those different areas. We interviewed domain experts in all those areas and, you know, asked them what their opinions were, and how can we make sure that we get all the amazing benefits

[04:09] in the world for health care and education and scientific research, but also, what can we do to minimize the potential downsides from misinformation or other kinds of things? So, that's what I got.