Transcript of ‘How Notion Cofounder Simon Last Builds AI for Millions of Users’

The latest episode of ‘AI & I’

The transcript of AI & I with Simon Last is below.

Timestamps

  1. Introduction: 00:01:57
  2. How AI changes the way we build the foundational elements of software: 00:02:28
  3. Simon’s take on the impact of AI on data structures: 00:10:07
  4. The way Simon would rebuild Notion with AI: 00:13:05
  5. How to design good interfaces for LLMs: 00:23:39
  6. An inside look at how Notion ships reliable AI systems at scale: 00:28:22
  7. The tools Simon uses to code: 00:35:41
  8. Simon’s thoughts on scaling inference compute as a new paradigm: 00:38:16
  9. How the growing capabilities of AI will redefine human roles: 00:49:10 
  10. Simon’s AGI timeline: 00:50:28

Transcript

Dan Shipper (00:01:59)

Simon, welcome to the show.

Simon Last (00:02:00)

Hey, thanks for having me.

Dan Shipper (00:02:01)

So for people who don't know you, you are the cofounder of Notion. This is, I think, at least as far as I could find, the first interview that you've done outside of internal interviews for Notion. So I really appreciate you coming on.

Simon Last (00:02:13)

Yeah, of course. I tend to keep a low profile, but I'm happy to do it.

Dan Shipper (00:02:15)

Great. And you're leading the AI initiatives at Notion. As far as I can tell, you were also really pushing AI internally before it became a thing, which is really interesting. And the place where I want to start with you is this: Notion is really well known for building thinking tools, and you were building thinking tools before there were even thinking machines. The way that you went about that is you created a text editor and hooked it up to a relational database, and you thought a lot about how to create the right primitives to allow people to interact with that in a really flexible way, to build whatever they wanted to build or think however they wanted to think. And that was in a pre-AI era. So where I wanted to start with you is to ask what you think the right primitives are for thinking with AI.

Simon Last (00:03:01)

Yeah, that's a good question. It's probably helpful to start with what the new primitives are. The way I think about it is we've got the foundation model itself, which I think of as a thinking box: you can give it a bunch of context and some task, and it goes and does one thing for you. That could involve some reasoning, and it could involve formatting the output as an action, so it actually does something. And then the other new primitive is embeddings: really good semantic search. Those are the primitives that didn't really exist before.

I think a lot of the same primitives still matter. Obviously, a relational database is a pretty fundamental concept. If you're trying to track any information, it's pretty useful to have one. You don't just want to shove everything into a text file; you want it in a structured format that's consistent, that you can query, and that can connect things. The good news is all the old primitives still matter. But now you can plug these thinking boxes in on top to automate some of the tasks that a human would have done in the past, especially the things that are cumbersome and that you don't want to do.

The way I think about the primitives that connect to AI: you've got databases, and you have a UI around the database that a human can look at and the AI can use. The permission model is really important as well. There are a lot of coding agent tools coming out, which is super cool, but one issue is that you don't really want the agent to just spin up a Postgres database for you every time. What's the permission model? Who can read or write? Who can see the schema? It's actually really important to have a permission model that the user can understand, so they can control what the AI can read or write. So a lot of the same primitives really matter, and I think about what we're adding on top. Before, your database might have been essentially manual data entry, or some lightweight integration. Now you can put this reasoning box on top and much more fluidly transform information, pipe it in and out, or run reasoning steps over it.
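
To make the permission idea concrete, here is a minimal TypeScript sketch. It is only an illustration, not Notion's actual model; every type and name in it (AIPermissions, canAccess, agentWrite) is hypothetical. The point is that the user grants the AI explicit read/write scopes, and every agent operation is checked against them.

```typescript
// Hypothetical permission model: the user grants the AI explicit
// read/write scopes per database, and every operation is checked.
type Scope = "read" | "write";

interface AIPermissions {
  // databaseId -> scopes the user has granted the AI
  grants: Map<string, Set<Scope>>;
}

function canAccess(perms: AIPermissions, databaseId: string, scope: Scope): boolean {
  return perms.grants.get(databaseId)?.has(scope) ?? false;
}

// The agent's write path refuses anything outside the user's grants.
function agentWrite(perms: AIPermissions, databaseId: string, row: object): void {
  if (!canAccess(perms, databaseId, "write")) {
    throw new Error(`AI lacks write access to database ${databaseId}`);
  }
  console.log(`writing to ${databaseId}:`, row);
}

// Usage: the user granted read-only access to "crm", so writes fail loudly.
const perms: AIPermissions = { grants: new Map([["crm", new Set<Scope>(["read"])]]) };
console.log(canAccess(perms, "crm", "read")); // true
try {
  agentWrite(perms, "crm", { name: "Acme" });
} catch (e) {
  console.log((e as Error).message); // AI lacks write access to database crm
}
```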

Dan Shipper (00:05:06)

What do you think about chat as one of the primitives? And do you think that's going to continue to be a main way that we interact with these tools? Or are there other primitives that are going to become more important?

Simon Last (00:05:13)

Yeah, I think some version of chat is probably here to stay. The human interface is just so intuitive: you just talk to it. The big issue with chat is that you get this empty text box and most people don't know what to type in there. It's really great for someone who wants to explore the system and figure out what it can do, but not so great if you just want it to do some task for you. And this is true not just of chat, but of anything. It's actually one of Notion's biggest challenges: there are a lot of features, and it takes a little bit of exploration to figure them out. We call those users tool-makers, people who are interested in exploring the boundaries of the tool and making their own little custom software. But one big discovery for us over the years is that most people just don't care about that. They just want a solution to the problem they have, and they want it presented to them. They don't really have the patience to go figure out a complex tool, which is totally understandable. So I think chat is a low-level primitive that makes sense to have, but the real goal is to connect people to some workflow or use case that solves their problem. And chat is probably not the best interface all the time.

Dan Shipper (00:06:31)

Yeah. We do a lot of work with big companies and I see that all the time: probably 5 to 10 percent of their people want to play around with chat and learn how all the AI stuff works, and everyone else is like, let me just do my job. And usually I think what works is letting those 5 to 10 percent find the workflows and then giving the workflows to everybody else, so that they don't have to chat with it, or they can start with a chat that's pre-filled with the common things they're doing. One of the interesting things I think about with chat, and I'm curious what your thoughts are: often in UI pre-AI, you had to make the updates to the state of the application yourself. Checking a radio box, whatever it is, it's discrete: it's either checked or it's not checked. And it's also usually along one dimension. But with chat, you can move in a fuzzier, more continuous way through multiple dimensions at a time. Have you thought about that change from discrete to continuous, or from single-dimension to multidimensional? How do you think those things work together best?

Simon Last (00:07:33)

Setting aside cases like embedding sliders, a make-it-funnier sort of thing, where the actual parameter is continuous, my mental model for this is: you have your software state, which you can think of as a JSON blob, and then you have UI controls to manipulate it. Like you said, a control typically just edits one key to be false instead of true, and the user can only do one thing at a time. Then I think of the AI as something you can give a high-level instruction, and it goes and executes a sequence of commands, a cascade of things, turning lots of the knobs. So the user's mental model can be fuzzier, but ultimately it still maps all the way down to: what are the knobs it's turning? Maybe the user has a fuzzier understanding of it, and the AI goes and does 10 things for you, but it still works the same way. It also introduces this new challenge of explaining to the user what happened, especially if it's a complex state.
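
As a rough illustration of that mental model (a hypothetical sketch, not Notion's architecture), the state is a plain object of knobs, the UI turns one knob per interaction, and the AI expands a fuzzy instruction into a cascade of the same low-level commands:

```typescript
// The app state as a JSON blob of "knobs".
type AppState = Record<string, unknown>;

// Both the UI and the AI speak the same low-level command: set one key.
interface SetCommand {
  key: string;
  value: unknown;
}

function apply(state: AppState, cmd: SetCommand): AppState {
  return { ...state, [cmd.key]: cmd.value };
}

// A human clicks one control at a time...
let state: AppState = { darkMode: false, fontSize: 14, sidebar: "open" };
state = apply(state, { key: "darkMode", value: true });

// ...while the AI takes a fuzzy instruction ("make this easier to read
// at night") and expands it into a cascade of knob turns. In a real
// system this list would come from a model call.
const aiCascade: SetCommand[] = [
  { key: "darkMode", value: true },
  { key: "fontSize", value: 18 },
  { key: "sidebar", value: "collapsed" },
];
state = aiCascade.reduce(apply, state);

// The remaining challenge is explaining the resulting diff to the user.
console.log(state); // { darkMode: true, fontSize: 18, sidebar: "collapsed" }
```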

Dan Shipper (00:08:30)

Yeah. What do you think about that? Or what have you found in doing that?

Simon Last (00:08:33)

I think about it as: what is the thing that's changing, and what is the most efficient, understandable way to present that? One case we've explored in the past is asking it to do edits across multiple documents. We essentially came up with, and it's nothing too crazy, a UX that groups the edits by page and then shows you the diff for each one. Then you can zoom in and look at the ones you care about. But it's pretty tough. It's just a fundamentally hard problem: if it's doing something complicated, then explaining the complicated thing is hard too.

Dan Shipper (00:09:07)

Yeah, that makes sense. One of the things I find is that even if you get it to summarize what it did, the summaries are so high level that it's saying a lot without saying anything at all. And getting it to be concrete enough, but not too detailed, is a really difficult challenge for some reason.

Simon Last (00:09:25)

Yeah. I think that's probably just fundamentally hard, especially if the thing is complicated: you're not going to fully understand it until you read the whole thing. Depending on the use case, though, you can probably go pretty far by calibrating the prompt to the appropriate level of granularity. If you were to pick the problem apart, there's the problem of summarizing at the right level of granularity, where maybe it's just missing an important detail you actually wanted included. And then there's the more fundamental problem: you do want to reduce the information, so it makes sense to drop some things.

Dan Shipper (00:10:08)

I want to go back to the relational database point. The mental model I have for relational databases, and you may have a different one, is that it's more effective to have a schema when you know what the data is going to be used for. For example, it's easier to have a relational database for a CRM because I know I'm going to use it to keep track of customers, so I have a customer table. What's interesting about embeddings is that they capture so many more dimensions of what a piece of information is relevant to that you can use them to store information in situations where you don't know what it will be used for in the future. Obviously, so far Notion has had to solve for using a relational database to store information whose future use is unknown. I'm curious how you think embeddings change that picture, if at all.

Simon Last (00:11:00)

That's a really good question. I'll first address the point that it's hard to design a schema when you don't know what it's for. I think that's a really good pointer: don't design schemas that you don't know the purpose of yet. This is something I've been playing around with: AI helping you design schemas. We've tried versions where it just comes up with all the properties you might want, and it can come up with a lot of things, but not all of them are useful. I've had a lot more success with generating only the minimal schema that's required for the actual tasks the user currently cares about. Each property should have a purpose. That really focuses the task and makes it more effective.

In terms of how I think about embeddings vs. deterministic querying, which I think is what you're getting at, I just think of them as two different tools in your toolbox. Ideally you have both, and you can even combine them. This is something we're working on a lot: Q&A over databases. When do you turn to a deterministic SQLite query, and when do you turn to an embedding? I think it really depends on the question. Sometimes you want one, sometimes the other, and a lot of it is performance cost, like latency concerns.

You could just make everything embeddings, or you could map a model over every row of the database, every time. Then you don't need embeddings or SQL either, right? Everything's unstructured. But I think that would be undesirable from a performance perspective, and it also wouldn't be fully deterministically accurate, which people probably care about. If you're asking how many sales we did last quarter, do you really want the model?

Dan Shipper (00:12:28)

It can make it up a little bit and get close, but it won't actually be right.

Simon Last (00:12:34)

Yeah. It seems a bit scary. So it depends on the question. Let's say I have a customer database and I ask: how many sales last quarter? I really do want a column with the amount, and then to sum over it. But if I'm asking whether we have any customers in the entertainment space, maybe I want to be flexible on that. So I just think of these as tools in the toolbox, and you want both. Then the challenge is in defining that routing or mapping layer: figuring out which tool is best for the job, combining them, and presenting the user with the best result.
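
A minimal sketch of that routing layer, with everything hypothetical: an exact-aggregate question routes to a deterministic SQL query, and a fuzzy question falls back to embedding search. A real router would use a model call to classify the question; a keyword heuristic stands in for it here.

```typescript
// Two tools in the toolbox: deterministic SQL for exact aggregates,
// embedding search for fuzzy semantic questions.
type Route = { kind: "sql"; query: string } | { kind: "embedding"; query: string };

// A toy router. In practice a model call would classify the question.
function route(question: string): Route {
  const wantsExactNumber = /how many|sum|total|average|count/i.test(question);
  if (wantsExactNumber) {
    // Deterministic: you really do want a column summed, not a guess.
    return { kind: "sql", query: "SELECT SUM(amount) FROM deals WHERE quarter = 'Q3'" };
  }
  // Fuzzy: "any customers in the entertainment space?" is a semantic match.
  return { kind: "embedding", query: question };
}

console.log(route("How many sales did we do last quarter?"));
// { kind: "sql", query: "SELECT SUM(amount) FROM deals WHERE quarter = 'Q3'" }
console.log(route("Do we have any customers in the entertainment space?"));
// { kind: "embedding", query: "Do we have any customers in the entertainment space?" }
```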

Dan Shipper (00:13:05)

You very famously, a couple of years into Notion's life, I think, went to Kyoto, stripped it all down, and pivoted the company into what Notion is today. As a thought experiment, let's assume you're going to have a second Kyoto: you're going to strip away everything that Notion currently is and rebuild it with AI. How would you do it? How would you think about it from scratch? What would you do differently now that these tools are here?

Simon Last (00:13:36)

Yeah. That's how I operate. When I'm thinking of a new project, I like to be pretty unencumbered by the way things currently work. But the magic is taking that unencumbered, crazy idea and also finding an incremental roadmap for it. There are a lot of details in there. I don't just want to make up crazy ideas; I want to actually ship stuff incrementally, but still get to the crazy place. The really key, exciting thing to me is this thinking box. There are plenty of knowledge work tasks that people don't really want to be doing, or that are too expensive because you have to hire humans to do them. Can we automate that stuff? One big principle would probably be fewer humans touching the database: the AI should be managing it for you. We were talking about customers, so let's say you have a CRM-style thing. Ideally, you never need to update any of the fields, right? If the deal closes, it should know the amount based on your email. If someone mentions in Slack that the deal's at risk, that should be in the structure somewhere. You shouldn't need to update stuff. In the AI world, the database becomes more of an implementation detail, and hopefully the user interacts more with its processed outputs than with the raw database itself. Maybe for sales, you really care about a daily progress bar, or seeing something about the productivity of your retail people. Those should all be presented to you directly, and the database is just a background thing implementing what you care about.
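
One way to picture the database as an implementation detail (purely a hypothetical sketch, not how Notion works) is a thinking-box call that turns an unstructured signal, like an email or a Slack message, into structured field updates, so a human never edits the row:

```typescript
// An unstructured signal from email or Slack.
interface Signal {
  source: "email" | "slack";
  text: string;
}

// A structured update the AI proposes against a CRM row.
interface FieldUpdate {
  dealId: string;
  field: "amount" | "status";
  value: string | number;
}

// Stand-in for a model call: extract field updates from the signal.
// A real system would send the text plus the schema to an LLM and
// parse a structured response; regexes stand in for that here.
async function extractUpdates(signal: Signal): Promise<FieldUpdate[]> {
  if (/at risk/i.test(signal.text)) {
    return [{ dealId: "acme-42", field: "status", value: "at risk" }];
  }
  const amount = signal.text.match(/closed .*\$([\d,]+)/i);
  if (amount) {
    return [
      { dealId: "acme-42", field: "status", value: "closed" },
      { dealId: "acme-42", field: "amount", value: Number(amount[1].replace(/,/g, "")) },
    ];
  }
  return [];
}

// The human never edits the row; the pipeline keeps it current.
extractUpdates({ source: "slack", text: "heads up, the Acme deal is at risk" })
  .then((updates) => console.log(updates));
// [{ dealId: "acme-42", field: "status", value: "at risk" }]
```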

Dan Shipper (00:15:09)

I love that. The first point is that you shouldn't have to interact with the database. What it reminds me of is this constant thing with Notion, and with any tool like this, where, especially inside a company, you're always asking: is this up to date? There's the 5 percent of things that make it into Notion, but then there's the 95 percent that's completely unwritten. And I think companies operate better when more of that stuff is legible, written down, and kept updated. I've always thought companies should have librarians who are just responsible for that, having worked in a big company myself. My previous company was a co-browsing company, and after I sold it I was the co-browse guy internally at Pega, a big public enterprise software company with a huge sales force. I had written all these documents about how the product should be sold and what the details were. And even though I'd written everything down, all the salespeople would still message me: you're the co-browse guy, what about this question? I'd be like, see my doc. But one, discoverability was really poor for them. And two, there's always that thing in your head where you wonder, is this up to date? It seems like what you're saying is that there's an opportunity now, without someone having to do it manually, to take a lot of the stuff that would ordinarily not be written down and get it into a format where it's recorded for other people to use. Is that kind of what you're saying?

Simon Last (00:16:27)

We're definitely excited about that as a use case. With the current Q&A in Notion and third-party connectors, if the salesperson has a question, they can at least ask the AI. That's pretty cool. But a lot of the time the doc wasn't written in the first place, and once you write it, you have to maintain it. Those are both really interesting use cases, and I think they'd be super exciting. A fun thing about these thinking boxes is that now you can treat a knowledge base like a database, where the operations on it can be semantic. That's pretty exciting. It means thinking about: how can pieces of information conflict with each other, and how would you resolve that?

Dan Shipper (00:17:02)

Yeah, that’s super interesting. I love the term thinking box, and it makes me wonder: what is thinking? What do you think the boundaries are of what that thinking box can and can't do?

Simon Last (00:17:13)

Yeah, I don't think there are that many boundaries. The abstraction is pretty complete in a way already; it's more that the models still kind of suck. You give it vision, assuming it's multimodal, and there's not much more to it. Maybe there are robot actuation commands, but those can be represented in the same model too. The abstractions are already complete. The really critical shape of it is: it has some context, it has these tools it can use, those tools produce observations, and then you just loop on that. That's an agent, and it can do anything, assuming the model actually works. They don't yet. Depends on the use case, but—
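
That shape, context plus tools plus observations in a loop, is easy to write down. Here is a bare-bones sketch in TypeScript with hypothetical types; the model call and the tool are stubs:

```typescript
// A tool takes arguments and returns an observation string.
interface Tool {
  name: string;
  run: (args: string) => Promise<string>;
}

// The model either calls a tool or finishes with an answer.
type ModelStep =
  | { action: "tool"; name: string; args: string }
  | { action: "finish"; answer: string };

// Stub for the model; a real agent would call an LLM with the context.
async function think(context: string[]): Promise<ModelStep> {
  return context.length < 2
    ? { action: "tool", name: "search", args: "lasagna recipe" }
    : { action: "finish", answer: "Found a recipe." };
}

// The loop: think, act, observe, repeat.
async function runAgent(task: string, tools: Tool[]): Promise<string> {
  const context = [task];
  for (let step = 0; step < 10; step++) { // cap the steps to stay safe
    const next = await think(context);
    if (next.action === "finish") return next.answer;
    const tool = tools.find((t) => t.name === next.name);
    if (!tool) throw new Error(`unknown tool: ${next.name}`);
    const observation = await tool.run(next.args);
    context.push(`observed: ${observation}`);
  }
  return "gave up";
}

runAgent("find a lasagna recipe", [
  { name: "search", run: async (q) => `results for ${q}` },
]).then(console.log); // Found a recipe.
```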

Dan Shipper (00:17:55)

I don't know if you saw, but Anthropic dropped a new computer use model for Claude the other day. One of the things they touted in that release is that you don't have to explicitly identify tools. Instead, the model just understands that there's a browser in front of it with certain things in it, and that the computer has applications that let it do things. So the tools are implicit rather than explicit. What do you think the trade-offs are there? Are explicit tools actually what's best, or should they be implicit, and when?

Simon Last (00:18:25)

Yeah, super excited to see that. I mean, on a technical level it's still tools. It's just that the tools are: click this coordinate, type this. So it's still implemented as tools; it's just that the click tool is pretty powerful. With click and type you can do a lot of stuff, and then the observation is seeing what happened afterwards. I was super interested to see it. It's something I've been expecting to start working, and it seems like it doesn't quite work yet. These are early signs, and it's cool that they're showing it to the world.

The way I'm thinking about it is that you want to give the AI the most convenient way possible to do the task. There's some quality constraint around whether it can do it at all, there's a performance and latency constraint, and then maybe something around how users can observe and control it. Computer use is going to be very open-ended. Like you said, at least currently the quality seems much lower than if you were to give it a more specialized tool, and the latency is very bad, super slow. You could get much better results by giving it a code API. For example, if your goal is to download a recipe, you can have it go to Google Search and find the recipe. Or, if you give it a recipe search API it can call directly, that's going to be done in less than a second. And then there's the controllability thing, which is pretty important, especially if it's doing something autonomous for you. The shape of this I feel bullish on: it doesn't seem that interesting to me to have it control your computer while you're watching.

The interesting thing is that you ask it to do something, it goes off, and it comes back to you when it's done, while you go do something else. It has its own computer. And I want to be able to control what it has access to; I think that's pretty important. If you're giving it a computer, that's pretty open-ended, so we need to develop some controls around it. But I'm excited about it. Ultimately I think of it as just another tool in the toolbox, and the answer probably looks like a mixture: when you can get an API, that's much better, and it will always be better. And when you can't, it's nice to have this escape hatch where it does stuff on a computer.
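
One way to express that mixture (again a hypothetical sketch) is to register both a fast specialized API and a slow, open-ended computer-use fallback, and dispatch each task to the cheapest handler that claims it:

```typescript
// Two ways to satisfy the same task, with very different costs.
interface TaskHandler {
  name: string;
  canHandle: (task: string) => boolean;
  estimatedSeconds: number;
  run: (task: string) => Promise<string>;
}

const recipeApi: TaskHandler = {
  name: "recipe-search-api",
  canHandle: (task) => /recipe/i.test(task),
  estimatedSeconds: 1, // sub-second API call
  run: async (task) => `recipe results for: ${task}`,
};

const computerUse: TaskHandler = {
  name: "computer-use",
  canHandle: () => true, // open-ended escape hatch: can try anything
  estimatedSeconds: 120, // slow: screenshots, clicks, typing
  run: async (task) => `browsed the web to do: ${task}`,
};

// Prefer the cheapest handler that claims the task; the general-purpose
// computer is the fallback when no specialized tool matches.
async function dispatch(task: string, handlers: TaskHandler[]): Promise<string> {
  const candidates = handlers
    .filter((h) => h.canHandle(task))
    .sort((a, b) => a.estimatedSeconds - b.estimatedSeconds);
  return candidates[0].run(task);
}

dispatch("download a lasagna recipe", [computerUse, recipeApi]).then(console.log);
// Uses recipe-search-api; "book a dentist appointment" would fall back.
```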

Dan Shipper (00:20:36)

That makes a lot of sense. I think that's totally right. For tasks that are repeated and that you know are going to happen inside your application, having a specific API that just does them really quickly is great. Just like you have muscle memory for picking up a glass, maybe there's a tool in your head that's really tuned for picking up a glass to drink from. And then you fall back to this slower, more open-ended thing that can do much more, for tasks the more specific tools can't handle.

Simon Last (00:21:05)

Yep. I think another interesting angle is the market dynamics angle. We might start seeing people shutting that down or not wanting it. There's going to be a race where people who use computer-use agents are going to want them to access all their stuff, and the companies that manage those tools might not want that, since it's a third party. There's already a whole industry around preventing bots from accessing websites. But now bots are useful for real work, so what are we going to do about that? I'm not sure how it's going to play out, but I think it'll be really interesting.

Dan Shipper (00:21:40)

What would be your guess?

Simon Last (00:21:42)

I think people are definitely going to want to do this, and they're going to have legitimate reasons to do so, unlike in the past, where maybe it was scamming or hacking. Now it's: no, I'm actually trying to perform this task, and I'm paying for your software, so you should let me do that. I think that makes sense. Probably the ideal outcome is a world where everyone allows it, but they get paid for it in some way. I don't know what the shape of that is exactly, but I think that's ideal: if you make software that's valuable and people are using it in this way, somehow value should accrue to you.

Dan Shipper (00:22:21)

Do you think we're going to see a world where there are interfaces specifically for verified humans, and then LLM-friendly ones? I guess that's an API, or something a bit different from a traditional API: interfaces built specifically for LLMs.

Simon Last (00:22:36)

Yeah. I think so. I feel like my job description is to design those. There are all these quirks, and the quirks will go away over time. I saw someone tweet the other day that you're going to have an alternate form of your website that's just plain HTML with divs and buttons. I love that idea. I think it's a tricky race, because on the one hand the current models are not good at many things, so you do need to design those custom interfaces. But as the models get better, maybe you need a bit less of that, and maybe they can also just build their own. I think eventually the model can just build its own scaffolding, right? You give it something and it's like: alright, I'm going to make a whole Python code repo, and inside of that it's going to figure out the problem I was just describing. Which things can I use code for? That's better. And which things do I need to call out to some browser for? That's way less ideal, but I'll do it if I need to. I feel like the ultimate abstraction just closes over all of this. As a human, whether it's using a code API or a browser is an implementation detail.

Dan Shipper (00:23:42)

You said that part of your job is figuring out good interfaces for LLMs. What are the current properties of a good LLM interface?

Simon Last (00:23:49)

That's super fun. Okay, so there are a bunch of principles. One is that you want to align with what the model has been trained on as much as possible. What does that mean? As a concrete example, one way we've tried representing a Notion page is as an XML tree, which is much more faithful to the way it's persisted. But the model just wants to speak Markdown. And so—
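
To illustrate the principle with a toy sketch (not Notion's real persistence format), here is a nested block tree flattened into the Markdown the model wants to speak:

```typescript
// A toy block tree, loosely shaped like how a page might be persisted.
interface Block {
  type: "heading" | "paragraph" | "bullet";
  text: string;
  children?: Block[];
}

// Render the tree as Markdown, since that's what the model has seen
// most of in training, rather than the raw XML-ish persistence format.
function toMarkdown(blocks: Block[], depth = 0): string {
  return blocks
    .map((b) => {
      const line =
        b.type === "heading" ? `# ${b.text}` :
        b.type === "bullet" ? `${"  ".repeat(depth)}- ${b.text}` :
        b.text;
      const kids = b.children ? "\n" + toMarkdown(b.children, depth + 1) : "";
      return line + kids;
    })
    .join("\n");
}

const page: Block[] = [
  { type: "heading", text: "Launch plan" },
  { type: "paragraph", text: "Goals for the quarter:" },
  { type: "bullet", text: "Ship Q&A", children: [{ type: "bullet", text: "Evals first" }] },
];

console.log(toMarkdown(page));
// # Launch plan
// Goals for the quarter:
// - Ship Q&A
//   - Evals first
```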

Dan Shipper (00:24:10)

Interesting. But they're prompted in XML?

Simon Last (00:24:15)
