OpenAI’s Deep Research Team on Why Reinforcement Learning is the Future for AI Agents
OpenAI’s Isa Fulford and Josh Tobin discuss how the company’s newest agent, Deep Research, represents a breakthrough in AI research capabilities by training models end-to-end rather than using hand-coded operational graphs. The product leads explain how high-quality training data and the o3 model’s reasoning abilities enable adaptable research strategies, and why OpenAI thinks Deep Research will capture a meaningful percentage of knowledge work. Key product decisions that build transparency and trust include citations and clarification flows. By compressing hours of work into minutes, Deep Research transforms what’s possible for many business and consumer use cases.
Hosted by: Sonya Huang and Lauren Reeder, Sequoia Capital
Mentioned in this episode:
- Yann LeCun’s Cake: An analogy Meta AI’s leader shared in his 2016 NIPS keynote
- Published Feb 25, 2025
- Uploaded Jun 11, 2026
Full transcript
AI-generated transcript with timestamped sections.
[00:00] A lesson that I've seen [00:01] people learn over and over again in this field is like, you know, we think that we can [00:05] do things that are smarter than what the models do by writing it ourselves. But as the field progresses, the models come up with [00:11] better solutions to things than humans do. The probably number one lesson in machine learning is you get what you optimize for. And so if you're able to set up the system such that you can optimize directly for the outcome that you're looking for, [00:24] the results are going to be much, much better than if you sort of try to glue together models that are not optimized end to end for the tasks that you're trying to have them do. [00:33] My long-term guidance is that I think reinforcement learning tuning on top of models is probably going to be a critical part of how the most powerful agents get built. [00:44] *music* [01:01] We're excited to welcome Isa Fulford and Josh Tobin, who lead the Deep Research product at OpenAI. [01:07] Deep Research launched three weeks ago and has quickly become a hit product used by many tech luminaries like the Collisons for everything from industry analysis to medical research to birthday party planning. [01:18] Deep Research was trained using end-to-end reinforcement learning on hard browsing and reasoning tasks, and is the second product in a series of agent launches from OpenAI, with the first being Operator. [01:28] We talked to Isa and Josh about everything from Deep Research's use cases to how the technology works under the hood to what we should expect in future agent launches from OpenAI.
[01:38] Isa and Josh, welcome to the show. [01:42] Thank you. Thank you so much for joining us. Excited to be here. Thank you for having us. So maybe let's start with what is Deep Research? Tell us about the origin story and what this product is doing. So Deep Research is an agent that is able to [01:55] search many online websites and it can create very comprehensive reports. It can do tasks that would take humans many hours to complete. It's in ChatGPT and it takes like five to 30 minutes to answer you, and so it's able to do much more in-depth research and answer your questions with much more detail and specific sources than regular ChatGPT responses would be able to do. [02:19] It's one of the first agents that we've released. So we released Operator pretty recently as well. [02:25] Deep Research is the second agent and, you know, we'll release [02:28] many more [02:29] in future. [02:31] What's the origin story behind Deep Research? Like, when did you choose to do this? What was the inspiration and how many people work on it? Like, what did it take to bring this to fruition? Good question. This is before my time. Oh, yeah. So I think maybe around a year ago, we were seeing a lot of success internally with this new reasoning paradigm and training models to think before responding. And we were focusing a lot on math and science domains. But one of the things that this kind [03:01] of new reasoning model, um, [03:04] regime unlocks is the ability to do longer-horizon tasks that involve, like, agentic kind of, you know, abilities.
[03:11] And so we thought, you know, a lot of people do tasks that require a lot of online research or a lot of external context. And that involves a lot of reasoning and discriminating between sources. And you have to be quite creative to do those kinds of things. [03:24] And I think we finally had models, or a way of training models, that would allow [03:29] us to be able to tackle some of those tasks. So we decided to try and start training models to do first browsing tasks, [03:39] using the same methods that we use to train reasoning models, but on more real-world tasks. Was it your idea? And Josh, how'd you get involved? At first, it was me and Yash Patil, who is at OpenAI. He's working on a similar project that [03:55] will be released at some point, which we're very excited about. [03:57] And we built an original demo. And then also with Thomas Dimson, who's one of those people who [04:04] just [04:05] is an amazing engineer. Like, [04:07] will [04:08] dive into anything and just, you know, get loads of things done. So it was very fun. Yeah, and I joined more recently. [04:14] I rejoined OpenAI about six months ago from my startup. [04:19] I was, uh, at OpenAI in the early days, and, um, [04:21] was looking around at projects when I rejoined and [04:24] got very interested in some of our agentic efforts, including this one, and got involved with that. [04:30] Amazing. Well, tell us a little about who you built it for. [04:33] Yeah, I mean, it's really for anyone who does knowledge work as part of their day-to-day job or really as part of their life. [04:41] So we're seeing
[04:43] a lot of the usage come from people using it for work, doing things like, [04:47] you know, research as part of their jobs, [04:51] for understanding [04:53] markets, companies, [04:56] uh, [04:56] real estate. A lot of scientific research, medical. I think we've seen a lot of medical examples as well. [05:03] Um, [05:04] and one of the things we're really excited about as well is that this style of, like, [05:08] I just need to go out and spend [05:10] many hours doing something, you know, where I have to do a bunch of web searches and collate a bunch of information, is not just a work thing, but it's also useful for shopping and travel as well. [05:21] So we're excited for the Plus launch so that more people will be able to [05:25] try Deep Research and maybe we'll see some new use cases as well. It's definitely one of the products I've used the most over the last couple weeks. It's been amazing. [05:31] Using it for work? For work, definitely. Also for fun. What are you using it for? Oh, for me? Oh my goodness. [05:37] Um, so I was thinking about buying a new car and I was trying to figure out when the next model was going to be released for the car. [05:43] And there's all these speculative blog posts, like there's patents from the manufacturer. And so I asked Deep Research, can you break down all the gossip about this car and then all of the facts about what they've done, what this automaker said before? And [05:55] it put together an amazing report that told me maybe wait a couple months, but [05:58] this year, [05:59] like in the next few months, it should come out. Yeah. Like one of the things that's really cool about it is it's not just for going broad and gathering all of the information about a source, but it's also [06:09] really good at finding, like, [06:11] very obscure, [06:12] like, weird facts on the internet.
[06:14] Like if you have something very specific you want to know that might not just turn up in the first page of search results, it's good at that kind of thing too. [06:21] So that's cool. [06:22] What are some of the surprising use cases that you've seen? [06:24] Ooh. I think the thing I've been most surprised by is how many people [06:30] are using it for coding. Yeah. Which wasn't really a use case I'd considered, but I've seen a lot of people [06:35] on Twitter and in various places where we get feedback [06:39] using it for coding and code search, and also for finding the latest documentation on a certain package or something and helping them write a script or something. So yeah, I'm kind of embarrassed that we didn't think of that as a use case, because, you know, for ChatGPT users it seems so obvious, but, um, it's impressive how well it works. [06:56] How do you think the balance of business versus individual use cases will evolve over time? Like, you mentioned the Plus launch that's happening. You know, in a year's time or two years' time, would you guess this is mostly a business tool or mostly a consumer tool? [07:09] I would say hopefully both. I think it's a pretty general capability, [07:14] and I think it's something that we do both in work and in personal life. Yeah, I'm excited about both. I think the magic of it is, like, [07:22] um, [07:23] it just saves people a lot of time. [07:25] Um, [07:26] you know, if there's [07:27] something that might have taken you hours, or in some cases we've heard, like, days, [07:31] um, [07:32] people can just put it in here and get, you know, [07:35] 90% of what they would have come up with on their own. Um, [07:39] and so, [07:40] yeah, [07:41] I tend to think there's more tasks like that in business than there are in personal, but
[07:47] I mean, I think for sure it's going to be [07:49] part of people's lives in both. [07:51] It's really become the majority of my usage for chat. I just always pick Deep Research rather than normal. [07:57] So what are you seeing in terms of consumer use cases, and what are you excited about? [08:01] I think a lot of [08:02] shopping, [08:03] travel recommendations. [08:06] I personally used [08:07] the model a lot. I've been using it for months to do these kinds of things. We were in Japan [08:13] for the launch of Deep Research. So it was very helpful in finding restaurants with very specific [08:18] requirements and [08:20] finding things that I wouldn't have necessarily found. [08:23] Yeah, and I found it like when you have [08:25] something... [08:27] It's like the kind of thing where, you know, if you're shopping maybe for something expensive or you're planning a trip that [08:32] is special or, [08:35] yeah, [08:36] that you want to spend a lot of time thinking about, [08:39] for me, you know, I might go and spend hours and hours, like, trying to read everything on the internet about this one product that I'm interested in buying, um, [08:48] like scouring all of the reviews and the forums and stuff like that. And Deep Research can put together kind of something like that [08:56] very quickly. [08:58] And so it's really useful for that kind of thing. The model is also very good at instruction following. So if you have [09:04] a query with many different parts or many different questions, [09:08] so if you, [09:09] you want the information about the product, but you also want comparisons to all other products, and you also want information about reviews from,
[09:18] you know, Reddit or something like that, you can give loads of different requirements and it will do all of them for you. [09:23] Another tip is just ask it to format it in a table. It'll usually do that anyway, but it's really helpful to have a table with a bunch of citations and things like that for all the categories of things that you want to research. Yeah, there are also some features that hopefully will get into the product at some point, but the [09:42] underlying model is able to embed images, so it can find images of the products. And it's also, this is [09:48] not a consumer use case, but it's able to create graphs as well and then embed those in its response. So hopefully that will come to ChatGPT soon as well. [09:58] Nerdy consumer use case. Yeah. Speaking of nerdy consumer use cases, personalized education is also a really interesting use case. Like, [10:10] if there's a topic that you've been meaning to learn about, [10:12] you know, if you [10:14] need to brush up on your biology or, you know, you want to learn about some world event, um, it's really good at that: you put in all the information about what you feel like you don't understand, what aspects of it you want it to go do research on, and it'll put together a nice report for you. [10:31] One of my friends is [10:33] considering starting a CPG company. And he's been using it so much to find similar products, to see if specific names are already, you know, if the domain's already taken, market sizing, like all of these different things. So that's been fun. He'll share the reports with me and I'll read them. So it's been pretty fun to see. Another, like, fun use case is it's really good at finding, like,
[10:56] a single, like, obscure fact [10:58] on the internet. Like if there's, like, [11:00] you know, an obscure TV show or something that you want to, you know, find one particular episode of or something like that, it'll go deep and find [11:11] like one reference to it on the web. [11:13] Oh, yeah, my brother's friend's dad had this very specific [11:19] fact. It was about some Austrian general who was in power during a certain, a death of someone during a battle, like a very niche question. And apparently ChatGPT had previously answered it wrong, and he was very sure that it was wrong. So he went to the public library and found a record and found that it was wrong. [11:36] And so then Deep Research was able to get it right. So we sent it to him and he was excited. What is the rough mental model for, you know, what Deep Research is excellent at today? And, you know, where should people be using [11:52] the o-series of models, and where should they be using Deep Research? [11:55] What Deep Research really excels at is if you have [11:59] a sort of detailed description of what you want [12:03] and, [12:03] in order to get the best possible answer, it requires reading a lot of the internet. [12:07] Um, [12:08] if you have kind of like more of a big question, [12:12] it'll help you kind of clarify what you want, but it's really at its best when there's a specific set of information that you're looking for. [12:19] And I think it's very good at synthesizing [12:22] information it encounters. It's very good at finding [12:25] specific, like hard-to-find information,
[12:29] but it's maybe less... and it can make kind of [12:33] some new insights, I guess, from what it encounters, but I don't think it's making new scientific discoveries yet. And then I think, [12:42] using the o-series model, for me, if I'm asking for something to do with coding, usually it doesn't [12:51] require knowledge outside of what the model already knows from [12:54] pre-training, so [12:56] I would usually use o1 pro or o1 for coding, [13:00] or o3-mini-high. [13:02] And so Deep Research is a great example of where some of the new product directions for OpenAI are going. I'm curious, [13:08] to the extent you can share, how does it work? [13:11] The model that powers Deep Research is a fine-tuned version of o3, which is our most advanced reasoning model. [13:19] And we specifically trained it on [13:22] hard browsing tasks that we collected, as well as other reasoning tasks. [13:28] And so [13:29] it also has access to a browsing tool and a Python tool. So through training end to end on those tasks, it learned strategies to solve them, [13:39] um, [13:39] and the resulting model is good at [13:42] online search and analysis. [13:44] Yeah, like intuitively the way you can think about it is, [13:47] um, [13:47] you make [13:49] this request, ideally a detailed request about what you want. [13:52] The model thinks hard about that. [13:54] It searches for information. It pulls that information and it reads it [13:58] and understands how it relates to that request and then decides
[14:01] what to search for next in order to get kind of closer to the final answer that you want. [14:06] And it's trained to do a good job of [14:08] pulling together [14:10] all that information into a nice tidy report with citations that point back to the original information that it found. [14:17] Yeah, I think what's [14:18] new about [14:19] Deep Research as [14:21] an agentic capability is that, [14:24] because we have the ability to train end to end, there are a lot of things [14:29] that you have to do in the process of doing research that you couldn't really predict beforehand. So I don't think it's possible to write some kind of language model program or script that would be as flexible as what the model is able to learn through training, where it's actually reacting to live web information and, based on something it sees, it has to change its strategy and things like that. So we actually see it doing pretty creative searches. [14:53] You can read the chain of thought summary and I'm sure you can see sometimes it is [14:59] very smart about how it comes up with the next thing to look for. So John Collison had a tweet that went somewhat viral. You know, how much of the magic of Deep Research is, you know, real-time access to web content? And how much of the magic is in kind of chain of thought? Yeah. [15:15] Can you maybe shed some light on that? [15:18] Um, [15:19] I think [15:21] it's definitely a combination. [15:22] I think you can see that because there are other search products available [15:25] that [15:26] weren't necessarily trained end to end, so [15:31] won't be as flexible in responding to
[15:34] um, [15:35] information they encounter, won't be as creative about how to [15:39] solve specific problems, because they weren't specifically trained for that purpose. [15:43] So it's definitely a combination. I mean, it's a fine-tuned version of o3. o3 is a very [15:47] smart and powerful model. A lot of the analysis capability, um, [15:52] is also from the underlying o3 model training. [15:56] So I think it's definitely a combination. [15:59] Before OpenAI, I was working at a startup and we were [16:04] dabbling in building agents [16:06] kind of the way that I see most people describe building agents [16:11] on the internet, um, which is essentially, you know, you, uh, [16:15] construct this graph of operations, and some of the nodes in that graph are language models. [16:20] And so the language model can decide what to do next, but the overarching logic of the sequence of steps that happen is [16:27] defined by a human. [16:29] What we found is that [16:30] it's a powerful way of building things to get quickly to a prototype, but it falls down pretty quickly in the real world, because it's very hard to anticipate all the scenarios [16:41] that the model might face [16:43] and think about all the different branches of the path that you might want to take. In addition to that, the models often are not the best decision makers at nodes in that graph, because [16:54] they weren't trained to make those decisions. They were trained to do things that look similar to that. And so I think the [17:00] thing that's really powerful about this model
[17:04] is that it's trained directly end-to-end to solve the kinds of tasks [17:08] that, uh, [17:10] users are using it to solve. So you don't have to set up a graph or [17:14] make those node-like decisions on the architecture on the back end. It's all driven by the model itself. Yeah. Can you say more about this? Because, you know, it seems like that's one of the very opinionated decisions that you've made, and clearly it's worked. There's so many companies that are building on your API, kind of prompting to, you know, solve specific tasks for specific users. Do you think a lot of those applications would be better served by kind of having, [17:40] you know, models trained end-to-end for their specific workflows? [17:44] I think if you have a very specific workflow that is quite predictable, [17:49] it makes a lot of sense to do something like Josh described. But if you have something that [17:54] has a lot of edge cases or, [17:57] um, [17:58] needs to be quite flexible, then I think [18:00] something similar to Deep Research is probably a better approach. Yeah, I think, like, the guidance I give people is, [18:07] the [18:08] one thing that you don't want to bake into the model is, like, kind of hard-and-fast rules. Um, [18:13] like if you have, you know, a database that you don't want the model to touch or something like that, it's better to encode that in [18:19] human-written logic. But [18:21] I think it's kind of like [18:23] a lesson that I've seen [18:24] people learn over and over again in this field is, like, um, [18:27] you know, we think that we can [18:29] do things that are smarter than what the models do by writing it ourselves. But, uh, [18:34] in reality, usually, as the field progresses, the models come up with
[18:38] better solutions to things than humans do. [18:42] And [18:42] also, like, [18:44] you know, probably the number one lesson in machine learning is, like, you get what you optimize for. And so if you're able to set up the system such that you can optimize directly for the outcome that you're looking for, [18:56] the results are going to be much, much better than if you sort of try to glue together models that are not optimized end to end for the tasks that you're trying to have them do. [19:05] So my, like, long-term guidance is that, um, [19:07] I think reinforcement learning tuning on top of models is probably going to be a critical part of how the most powerful agents get built. [19:16] What were the biggest technical challenges along the way to making this work? Well, I mean, maybe I can say as, like, an observer rather than someone who was involved in this from the beginning. But it seems like one of the things that Isa and the rest of the team worked really, really hard on, and was kind of like one of the [19:32] hidden keys to success, was, like, [19:34] making really high-quality data sets. [19:36] Um, [19:37] it's, you know, another one of those, like, age-old lessons in machine learning that people keep relearning. But the quality of the data that you put into the model is probably the biggest determining factor in the quality of the model that you get on the other side. [19:48] And then have someone like [19:50] Edward Sun, who's another person who works on the project, who just... [19:56] Any data set, he will optimize. So that's a secret to success. Find your Edward. Great machine learning model training. [20:06] How do you make sure that it's right?
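For readers who want something concrete, the two ideas discussed between [13:44] and [18:19] can be put side by side in a minimal Python sketch: the model chooses each next step (search, read, decide what to look for next, then write a report with citations), while a hard rule such as "never touch this database" stays in human-written code outside the model. Everything named here is a hypothetical stand-in and not OpenAI's implementation or API: `stub_policy` plays the role of the trained model, `fake_search` plays the role of a browsing tool, and `PROTECTED_TOOLS` is an illustrative deny-list.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: an illustration, not OpenAI's code or API.
PROTECTED_TOOLS = {"prod_database"}  # hard rule kept in human-written logic


class ToolNotAllowed(Exception):
    """Raised when the policy asks for a tool the human-written rules forbid."""


@dataclass
class Note:
    url: str
    text: str


@dataclass
class State:
    request: str
    notes: list[Note] = field(default_factory=list)


def call_tool(name: str, query: str, tools: dict) -> Note:
    # The check lives outside the model, so learned behavior cannot bypass it.
    if name in PROTECTED_TOOLS or name not in tools:
        raise ToolNotAllowed(name)
    return tools[name](query)


def research(request: str, policy, tools: dict, max_steps: int = 10) -> str:
    state = State(request)
    for _ in range(max_steps):
        action, arg = policy(state)  # the policy, not a fixed graph, picks the next step
        if action == "finish":
            break
        state.notes.append(call_tool(action, arg, tools))
    # Build a report whose citations point back to the sources that were read.
    lines = [f"Report for: {state.request}"]
    lines += [f"- {n.text} [{i}]" for i, n in enumerate(state.notes, 1)]
    lines += [f"[{i}] {n.url}" for i, n in enumerate(state.notes, 1)]
    return "\n".join(lines)


def stub_policy(state: State):
    # Stand-in for the trained model: search, refine the query once, then stop.
    if len(state.notes) == 0:
        return "search", state.request
    if len(state.notes) == 1:
        return "search", state.request + " release date"
    return "finish", ""


def fake_search(query: str) -> Note:
    # Stand-in for a browsing tool; returns a made-up source for the query.
    return Note(url="https://example.com/?q=" + query.replace(" ", "+"),
                text=f"result for '{query}'")


if __name__ == "__main__":
    print(research("next car model", stub_policy, {"search": fake_search}))
```

In this sketch, swapping `stub_policy` for a learned policy changes how the steps are chosen without touching the guardrail in `call_tool`, which is the division of labor described in the conversation.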
[20:08] Yeah. So [20:09] that's obviously a core part of [20:12] this [20:14] model and product, is that we want [20:16] users to be able to trust the outputs. [20:19] So [20:20] part of that is we have citations. [20:23] And so users are able to see where the model is citing its information from. And [20:31] during training, that's something that we actually, [20:34] like, try and make sure is correct, but it's still possible for [20:38] the model to make mistakes or hallucinate or trust a source that maybe isn't the most [20:43] trustworthy source of information. So that's definitely an active area where we want to continue improving the model. [20:49] How should we think about this together with, you know, o3 and Operator and other different releases? Like, does this use Operator? Do these all build on top of each other, or are they all [20:59] kind of a series of different applications of o3? [21:02] Today, these are pretty disconnected, but you can imagine where we're going with this, which is, like, [21:11] the ultimate agent [21:13] that [21:14] Yeah. [21:15] people will have access to [21:17] at some point in the future should be able to do, [21:19] um, [21:20] not just web search or using a computer or any of the other types of actions that you'd want, [21:26] like, kind of a human assistant to do, but should be able to fuse all of these things in a more natural way. [21:31] Any other design decisions that, you know, you've taken that are maybe not obvious at first glance? [21:37] I think one of them is
[21:39] the clarification flow. So if you've used Deep Research, [21:42] the model will ask you questions before starting its research. [21:46] And usually ChatGPT, maybe it'll ask you a question at the end of its response, but it usually doesn't have [21:52] that kind of behaviour up front. [21:54] And that was intentional, because [21:56] you will get the best response from the Deep Research model if the prompt is very well specified and detailed. And I think that it's not the natural user behavior to give all of the information in [22:09] the first prompt. [22:10] So we wanted to make sure that if you're going to wait five minutes, 30 minutes, that your response is [22:17] as detailed and [22:18] satisfactory as possible. So, [22:20] um, [22:21] we added this additional step to make sure that the user provides all the detail that we would need. And I've actually seen a bunch of people on Twitter saying that they have this flow where they will talk to o1 or o1 pro [22:33] to help, [22:35] um, [22:36] make their prompt more detailed. And then once they're happy with the prompt, they'll send it to Deep Research, which is interesting. Um, [22:42] so people are finding their own workflows for how to use this. [22:46] So there's been three different deep research products launched in the last few months. Tell us a little about what makes you guys special and how we should think about it. [22:56] And they're all called deep research, right? They're all called deep research. Yeah, not a lot of naming creativity in this field. [23:02] Um, [23:03] I think people should try all of them for themselves and get a feel. I think they all have pros and cons, but I think the difference in quality will be clear.
[23:14] Um, [23:15] but what that comes down to is just the way that [23:18] this model was built, and [23:21] the effort that went into constructing the data sets, and then the engine that we have with the o-series models, which allows us to just, [23:31] um, [23:31] optimize [23:33] models to make things that are really smart and really high quality. [23:36] We had the o1 team on the podcast last year, and we were joking that OpenAI is not that good at naming things. I will say this is your best-named product. Deep Research is. At least it describes what it does, I guess. [23:48] That's great. [23:49] So I'm curious to hear a little bit about where you want to go from here. You have Deep Research today. What do you think it looks like a year from now? And what maybe are complementary things you want to build along the way? [23:57] Well, I'm [23:59] excited to [24:00] expand the [24:02] data sources that the model has access to. We've trained a model that's generally very good at browsing [24:08] public information, but it should also be able [24:12] to search private data as well. [24:14] And then I think just pushing the capabilities, [24:17] um, [24:18] for this. It could be better at browsing, it could be better at analysis. [24:22] And then thinking about how this fits into our agent roadmap more broadly, [24:26] I think the recipe here is, [24:29] um, [24:30] something that's going to scale to a pretty wide range of use cases, things that, [24:36] uh, [24:37] can surprise people how well they work. But this idea of you take a state-of-the-art reasoning model, you give it access to the same tools
[24:45] that humans can use [24:47] to do their jobs or to go about their daily lives. And then you optimize directly [24:52] for the kinds of outcomes [24:54] that you want the agent to be able to do. That recipe, there's, like, really nothing stopping that recipe from scaling to more and more complex tasks. So I feel like, yeah, AGI is like an operational problem now. And I think, yeah, a lot of things to come in that general formula. [25:12] So Sam had a pretty striking quote that Deep Research will kind of take over a single-digit percentage of all economically valuable tasks in the world. How should we think about that? [25:24] I think of it as, like, [25:26] it's, [25:28] Deep Research is not [25:29] capable of [25:31] doing all of what you do. [25:33] But it is capable of [25:35] saving you, like, hours or, in some cases, days at a time. [25:39] And so I think, like, [25:41] what we're hopefully relatively close to is [25:44] Deep Research, and the agents that we build [25:47] next and the agents that we build on top of it, [25:50] giving you, you know, [25:52] 1, 5, 10, 25% of your time back, [25:55] depending on the type of work that you do. [25:57] Amen. [25:59] I think you've already automated 80% of what I do, so it's definitely on the higher end for me. We just need to start writing checks, I guess, yeah. [26:07] Are there entire job categories that you think are kind of more, at risk is the wrong word, but, like, more in the strike zone for what Deep Research is exceptional at? So, for example, I'm thinking consulting. But, like, are there specific categories that you think are more in the strike zone? Yeah, I used to be in consulting. I don't think any jobs are at risk. Like, I don't really think of this as, like, a labor replacement kind of thing at all. Like, it's...
[26:30] But for these types of knowledge work jobs where you are spending a lot of your time kind of looking through information, making conclusions, I think [26:39] it's going to give people superpowers. [26:41] I'm very excited about a lot of the [26:43] medical use cases, [26:45] just the ability to, [26:47] um, [26:48] find all of the literature or all of the recent cases, um, [26:52] for a certain condition. I think I've already seen [26:55] a lot of [26:56] doctors posting about this, or they've reached out to us and said, oh, we used it for this thing. We used it to help find a clinical trial for this patient or something like that. So just people who are already so busy, just saving them [27:07] some time, or, um, [27:10] it's maybe something that they wouldn't have had time to do, and now they are able to have that information. Yeah. And I think the, like, [27:17] the impact of that is maybe a little bit more profound than it sounds on the surface, right? [27:22] It's not just, like, uh, you know, getting 5% of your time back, but it's [27:26] the type of thing that might have taken you four hours or eight hours to do, now you can do for, [27:32] you know, um, [27:33] a ChatGPT subscription in five minutes. [27:36] And so, like, what types of things [27:38] would you do if you had infinite time [27:40] that now maybe you can do, like, [27:42] many, many copies of? [27:43] So, like, you know, should you do research on every single possible startup that you could invest in instead of just the ones that you have time to meet with, things like that? Or on the consumer side, one thing that I'm thinking of is, you know, the working mom that's too busy to plan a birthday party for her toddler. Like, now it's doable. So I agree with you. It's way more important than 5% of your time.
[28:03] It's all the things you couldn't do before. Exactly. [28:06] What does this [28:08] change about education and the way we should learn? And, you know, what will you be teaching your kids now that we're in a world of agents and Deep Research? [28:15] Education has been, like, one of the top few things that people use it for. Um, [28:20] I mean, this is true for ChatGPT generally. It's like learning things by [28:26] talking to, um, [28:28] an AI system that is able to personalize [28:31] uh, [28:32] the information that it gives you based on what you tell it or, uh, [28:36] maybe in the future, what it knows about you. [28:38] Um, [28:38] it feels like a much more efficient way to learn, and a much more engaging way to learn, than, like, reading textbooks. [28:45] We have some lightning round questions. [28:47] All right. [28:47] Okay, your favorite Deep Research use case. [28:51] I'll say, yeah, personalized education, just learning about anything I want to learn about. [28:55] I've already mentioned this, but I think [28:57] a lot of the [28:59] personal stories that people have shared about finding information about [29:03] a diagnosis that they've received, or someone in their family received, have been really great to see. [29:10] Thank you. [29:11] Okay, we saw a few application categories break out last year. So, for example, coding being an obvious one. What application categories do you think will break out this year? [29:20] I mean, clearly agents. Agents, I was going to say too. [29:24] Okay. 2025 is the year of the agent. [29:26] I think so. [29:28] And then, [29:29] what piece of content would you recommend people read to learn more about agents or where the state of AI is going?
[29:38] Could be an author, too. Training Data. Yeah, this podcast. Not biased. [29:44] I think it's, um, it's so hard to keep up with the state of the art in AI. [29:50] I think the general advice I have for people is, like, [29:54] um, [29:54] pick one or two subtopics that you're really interested in and curate a list of people who you think are saying interesting things [30:01] about them. And as for how to find those one or two things that you're interested in, [30:04] um, maybe that's actually a good Deep Research use case. Like, you know, go use it to go deep on things that you want to learn more about. [30:12] This is [30:14] a bit old now, but I think a few years ago I watched the, [30:17] I think it's called, like, Foundations of... [30:20] RL or something like this, from Pieter Abbeel. [30:23] And [30:25] it's a few years old, but I think that it was a [30:28] good introduction to reinforcement learning. Yeah, I would definitely second any content by Pieter Abbeel, my grad school advisor. [30:35] Oh yeah. [30:38] Okay. [30:38] Reinforcement learning, you know, it kind of went through a peak, then felt like it was in the doldrums for a bit, and is peaking again. Is that the right read on what's happening with RL? [30:47] It's so back. Yeah. So back. Yeah. [30:50] Why now? [30:51] Because everything else is working. [30:53] Mm-hmm. [30:54] Like, I think [30:56] maybe people who've been following the field for a while will remember the Yann LeCun cake analogy. If you're building a cake, [31:03] um, [31:04] then most of the cake is [31:06] the cake, and then there's a little bit of frosting, and then there's a few cherries on top. And the analogy was that unsupervised learning is the cake,
[31:14] supervised learning is the frosting, and [31:16] reinforcement learning is the cherries on top. [31:18] When we in the field were working on reinforcement learning back in 2015, 2016, it was kind of like, [31:23] and I think Yann LeCun's analogy is probably correct in retrospect, we were trying to add the cherries before we had the cake. But now we have language models that are pre-trained on massive amounts of data and are incredibly capable. [31:37] Um, [31:38] we know how to, uh, [31:39] do supervised fine-tuning on those language models to make them good at instruction following and generally doing the things that people want them to do. [31:47] And so now that that works really well, the time is ripe [31:50] to tune those models for any kind of use case that you can define a reward function for. [31:56] Great. Okay. So from this lightning round, we got that agents will be, you know, the breakout category in 2025, and reinforcement learning is so back. [32:04] I love it. [32:06] Thank you guys so much for joining us. We loved this conversation. Thanks for having us. Congratulations on what was an incredible product, and we can't wait to see what comes of it. Thank you.