Episode 3: transcript

[00:00:01] Connor Leahy: When a human says something, there’s all these hidden assumptions. If I tell my robot to go get me coffee, the only thing the robot wants to do is to get coffee, hypothetically. It wants to go and get the coffee as quickly as possible, so it’ll run through the wall, run over my cat, throw grandma out of the way to get to the coffee machine as fast as possible. Then, if I run up, “No, no, no. Bad robot.” I try to shut it off. What will happen? The robot will stop me from hitting the off button, not because it’s conscious or it has a will to live. No, it will simply be because the robot wants to get the coffee. If it’s shut off, it can’t get me coffee. It will resist. It will actively fight me to get me coffee, which is of course silly.


[00:00:45] Stefano Maffulli: Welcome to Deep Dive AI, a podcast from the Open Source Initiative. We’ll be exploring how artificial intelligence impacts free and open-source software, from developers to businesses, to the rest of us.


[00:00:59] SM: Deep Dive AI is supported by our sponsor, GitHub. Open-source AI frameworks and models will drive transformational impact into the next era of software, evolving every industry, democratizing knowledge, and lowering barriers to becoming a developer. As this evolution continues, GitHub is excited to engage and support OSI’s deep dive into AI and open source and welcomes everyone to contribute to the conversation.

[00:01:26] ANNOUNCER: No sponsor had any right or opportunity to approve or disapprove the content of this podcast.


[00:01:31] SM: Welcome, Connor Leahy. Thanks for taking the time. Connor is one of the founders of EleutherAI, a collective group of researchers of artificial intelligence. He’s also Founder and CEO of Conjecture, a startup that is doing some interesting research on safety of AI. We’ll talk more about this. Welcome, Connor.

[00:01:50] CL: Thanks so much for having me.

[00:01:52] SM: Let’s start by explaining a little bit the history of EleutherAI. How they did it came to be and how did you come up with this idea?

[00:02:02] CL: The true story of how EleutherAI came about, during the pandemic, back in 2020, so everyone’s bored to tears, stuck at home. I was hanging out on an ML Discord server. It’s a chat server. There’s some paper that got published talking about GPT-3 models, big model training or whatever. I basically said like, “Hey, guys. Wouldn’t it be fun to do this?” Then someone else replied, “This, but unironically,” and the rest was history.

It very much started as just a fun hobby project of just some bored hackers. They were just hanging around, looking for something fun to do. At the time, the GPT-3 model was becoming well known. The paper was actually published a bit earlier. Now, the API was now becoming accessible to some people, so people were noticing this is really cool. We can always create things with it. It was very interesting with GPT-3 as a very specific AI model. it was just unprecedentally large. It had this huge supercomputer to build this model. That was a very interesting technical challenge.

It was also very interesting – the model, the final model, GPT-3 was really interesting for a lot of reasons. Basically, it started as more of a joke. Just like, we’re bored, let’s just mess around. We don’t have big supercomputers, so we didn’t expect to get very far. Yeah, things went a lot further than expected. We started to get more and more interest and awesome resources. We started to gather some models and stuff. Frequently, we started taking this more seriously. We thought, thinking more seriously about, what do we actually want to do? This is actually a good thing to do, etc., etc.

[00:03:40] SM: You basically put together a band of hackers and programmers, researchers from different places all around the idea of creating an alternative to OpenAI’s models?

[00:03:51] CL: That’s not how I would describe it. No, no. The name is very much tongue-in-cheek. Of course, we were all sad that we didn’t have access to GPT-3. Because GPT is cool. OpenAI is a big for-profit company with billions of dollars research and money and computers, and whatever. The goal of EleutherAI very much was always to be a group of independent researchers doing interesting work and, hopefully, useful work for the world. In particular, one of the reasons we thought this work was very promising is me and many other people at EleutherAI think that artificial intelligence is the most important technology of our time and, as it becomes more and more powerful and it can do more and more tasks, it will become a more and more powerful and dominant force in our society. It is very important to understand this technology.

That’s one of the reasons I now have a startup, where we work on researching safety of AI systems, how to make them more reliable, how to make them safer, how to make them not do things we don’t want them to do, which is a big problem with AI and only it will become more of a big problem. With EleutherAI, we saw, basically, was an arbitrage opportunity. We saw that there is a lot of cool research to be done with large models, and also, important research trying to understand these models, how do they work internally? How do they fail? And so on. There’s a lot of these opportunities.

Building an actual model like this is extremely expensive and technically difficult. You need very specific kinds of engineering skillsets. It’s very, very expensive. But, once you have built such a model, using it for experience is much, much cheaper, like magnitudes of order. We saw this opportunity that we could pay this one-time cost, in order to make this technology more accessible for academic researchers, safety researchers, people with less resources, that might be able to do valuable research with this kind of artifact.

[00:05:51] SM: Help me understand a little bit better what’s going on. There’s always this myth that only the very large corporations, or research institutes like, I don’t know, NASA-CERN can have the processing power and the money and the data and the knowledge to train these large models. You started from a large model, right? How did you get the first model built? Then, how do you progress?

[00:06:17] CL: It’s three main things that go into building a large model, which is data, engineering, and compute. Depending on what model you’re building, data may or may not be a bottleneck. For the kinds of models we were building, these are language models. Data really is much of a bottleneck. It’s still a pain to get the data together and whatnot and this is why we have pulled together. Our data center’s got compiled, which, and we also released. It’s not really a bottleneck.

The engineering can be a bottleneck in the sense that it’s not trivial, especially back then. Nowadays, there’s more open-source libraries and stuff that make this training of large models easier.

[00:06:53] SM: Wait. When you say back then –

[00:06:54] CL: Two years ago.

[00:06:55] SM: Two years ago. Okay, we’re not talking 30 years ago.

[00:06:58] CL: No. Like two years ago, this was very difficult. Even one year ago, this was still more difficult than it is today.

[00:07:04] SM: What changed?

[00:07:05] CL: Companies like Nvidia and Microsoft released a lot of the code, with libraries such as DeepSpeed and Megatron that make this stuff easier. It’s still not easy. Also, Facebook released the FSDP library and Fairseq, which helps training large models. It’s still not at all easy. The engineering is less hard than it was at the time. At the time, there was a few dozen people in the world who really knew how to make these and they existed only in these large corporations. I think, that’s still the case, that there’s maybe a few 100 people who really have experience building large models, hands-on and have a – a lot of ML, it’s like alchemy. It’s like dark magic. You have to know all the secret tricks to make things work. It’s getting better, but it’s still quite tricky.

The third component that goes into these kinds of models, compute is actually the biggest bottleneck. The amount of computation that goes into building something like GPT-3 is massive. It’s not like you can just run this on your CPUs. You need massive clusters of GPUs, all interconnected with high-end supercomputing grade hardware. You can’t do this on standard hardware. You need the supercomputer grade stuff, which is very expensive and quite tricky to use sometimes.

With EleutherAI, we had moved several, I’d say, phases. The first phase, we got our compute from what is called the TPU research cloud, which is a project from Google to give academic access to some of the TPU chips, which are specific chips for training ML models. They were quite generous with us with giving us access to pretty large amounts of these chips for doing our research. Our first models that we released, GPT Neo models were trained on this, including also, GPT-J, which was a later model that was also done by [inaudible 00:08:50] who was a AI contributor at the time.

We then, later, started working with a cloud company named CoreWeave, who are specialized GPU provider. We basically had a deal that we will help them test their hardware, debug things and stuff. In return, they’ll let us train our models on some of the hardware they were building. That resulted in the GPT Neo X model, which is the largest model we’ve released at this time. We have some potential new partnerships going on in the background right now. We’ll see if anything comes of that or not.

[00:09:26] SM: If I understand correctly, you’re saying that the engineering pieces are becoming more simpler, commoditized, almost, because of the releases of the big companies. They’re releasing code. Data, you mentioned, it’s not that much of an issue. It’s hard, but we’re talking text for these models. Yeah, we don’t get into the multiple petabytes of storage necessary. Then the third is seeing the hardware pieces. From the text perspective, the data that goes into the model, the training, how do you acquire it? What kind of volumes we’re talking about? Where do you get the data to start from?

[00:10:05] CL: For text data in particular, I’m not too familiar with other modalities. You need truly stupendous amounts of text. Rule of thumb is, you want a terabyte of raw text, which is a truly, unimaginably large amount of text. That is billions and billions –

[00:10:25] SM: Compressed? Compressed or uncompressed?

[00:10:28] CL: This is uncompressed. This is uncompressed. It’s a terabyte of uncompressed text is what you want to aim for or something. I think, the pile is uncompressed about 800 gigabytes of text, which is enough.

[00:10:39] SM: The pile is the starting point, the data, the raw data.

[00:10:43] CL: That is the dataset that we build for training our models, and we released. If you need to get 800 gigabytes of text of various sorts, that’s a place you can get it quite easily. The way we build the pile was a lot of it comes from Common Crawl, which is a huge just dump of Internet sites. I forgot who made it, but it’s massive petabytes of scraped websites and stuff that we then post-process, which we filter out spam, and then filter out the text from the HTML, and stuff like that. Then, the other part is a massive amount of curated data sets. We took lots of datasets that existed.

For example, we took data sets from Payton’s, or with the – or from various chat rooms, or whatever. I don’t remember everything’s in there. There’s a lot of medical texts, just like all PubMed. There’s a large amount of publicly available scientific documents, papers in biomedicine. Also, we took from arXiv, which is this pre-publication server, which has a huge amount of physics, math, computer science papers. We scraped all of that. Furlough that into text. The pile compared to other datasets is weighted more heavily towards scientific, technical data, less on the social media chats. There’s some of that too, but it’s much less. It’s not focused on that. A lot of it is very technical documents and such.

[00:12:06] SM: Right. You didn’t take Wikipedia, or material books from –

[00:12:09] CL: Oh, no. Wikipedia is in there, too. Yeah, there’s all kinds of stuff. You can read the paper, which, I think 40 datasets, 20, or 40 datasets are in there from all kinds of sources. Wikipedia, I think in total is a few gigabytes, maybe, of text, maybe four or something. You need a lot.

[00:12:28] SM: Right. You store it somewhere on the cloud. That’s not a big deal for now.

[00:12:32] CL: Funny story about that. It is currently hosted by BI, which is a pirate data hosting service. There’s literally a guy called The Archivist. I don’t know what his real name is. He’s completely anonymous. I think he might be an international fugitive. I don’t know. He’s just like, whenever we need to host something, we just tell him and he’s like, “Yeah, no problem.” Just hosted for us. That’s how we host our datasets, at least in our models, because he just has infinite storage. It’s a fun hacker story.

[00:13:01] SM: Sort of underlying the nature of the group. You have now, the pile, you have the trained model, you have the hardware, of course, you’ve got the competence to do all of this, and you have created a bunch of models that are somewhat replicating, or playfields and alternatives in some extent to OpenAI. You’re doing a completely different approach compared to them, though. They’re not releasing their models. They are keeping them behind an API for safety reasons. Or at least, that’s the story. Why are you releasing it? Are you not afraid?

[00:13:36] CL: That is a very good question, and the answer is, of course. Of course, you should be concerned when you build a new technology that has unprecedented capabilities and you use it, either as an API or you deploy it or you make it public, all of these things are things that should be considered. There is this meme that sometimes exists inside the scientific community that, as a scientist, you’re like, you have no obligations to the downstream effect of your work.

I think, that’s obviously bullshit. In my heart, I’m just like, I just want to science all the time, just build all the things and who cares? It’s fine. Let the politicians sort out how to use it. That’s just not how the world works. It’s not a good way to think about this. There’s sometimes this belief that people think EleutherAI’s stance is all things should be public, all the time, always. That is not our stance, and it’s never been our stance. I understand why people are confused about this. There are also several other groups that are vaguely associated with us, that is their opinion. I strongly disagree with them.

It’s always been the case with EleutherAI that we think there are some specific things in this specific instance, which we think it is net positive for these specific reasons to be released. I think in this specific instance of these specific things going on right now, it is more net positive for these models, various language models of various sizes, to be accessible for researchers to do certain types of research, than it would be not to be. We said from the very beginning, if I, for example, had access to a quadrillion parameter model, or something that’s completely unprecedented, we would not release it. Because who knows what that thing can do?

It does not seem a good idea to just dump something that no one knows about. There’s a very specific argument that we believe 99% of the damage done by GPT-3 was done the moment the paper was published. As the saying goes, the only secret about the atomic bomb was that it was possible. Then people are like, “Well, what if Russian disinformation agents use it?” I’m like, the paper is out there. If a few hackers in a cave can build these kinds of models, you think the Russian government can’t? Of course they can. Of course, they can just buy a supercomputer and train this stuff.

I think there are downstream effects of EleutherAI that may not have been a good idea. The way I see things and people disagree with me about this, is I think, very powerful AI is coming very, very soon. Human level AI is coming quite soon. I expect it, for various technical reasons, to have many properties in common with the models we’re seeing today. I don’t expect a full shift. I very much disagree with people who say like, “Oh, we’ve made no progress towards human level AI. These things aren’t intelligent. They won’t make any progress.” I fully disagree. I think these people are not paying attention, or are confused about what these things are actually capable of.

I expect that – studying these technologies that currently exist is very, very important. This is arbitrage opportunity. The models we released are much smaller, and much less capable than GPT-3. GPT-3 and many other groups have also had models of similar capacity internally and such. Now, there’s open-source models of GPT-3 size anyways, like the OPT models and the Blue model. It’s always been a very contingent truth is that we’re like, okay, releasing models like this will have some unknown consequences. People might use them for spam. People might use them for something I hadn’t even thought about before. Maybe someone will come up with some new use of this model that I had never thought about that was actually bad. Maybe they’ll come up with very positive uses. I don’t know.

I think, reasoning about how new technologies will affect the world is very hard ahead of time. I think there’s two conflicting parts inside for me. The one part is that, historically speaking, generally, every new technology, people are afraid of but then, when it’s actually deployed, it’s actually good. It’s in retrospect, really, I’m glad this technology was – Imagine if people tried to make electricity illegal, because, well, people could shock themselves, so we should have a license to have electricity in your home, or something like that. Obviously, that would have sucked.

[00:17:30] SM: There was that debate.

[00:17:30] CL: That debate did exist. I think we as modern people are quite happy that the optimists won that one. That’s a fully legitimate argument. That’s not a silly argument to be made. I think that’s a good argument to be made. There’s also the other one, which is the – there’s the argument to be made, hey, there’s some very specific risks we can see that are not hypothetical. Now, we can debate about how do these risks measure up against each other? For this very specific technology, right now, of these language models, I think the optimist’s side, in my opinion, has somewhat of an upper hand. That doesn’t mean that this applies to all technology.

Some people would say, “Well, okay, so you like electricity, Connor? Well, what about nukes? Those use electricity, right? You okay with those?” I’m like, whoa, whoa, whoa, whoa. Slow down. Yes, those do use electricity, but that’s a whole different class of thing. That’s what I mean when I say, it was never my or EleutherAI’s stance that everything always should be released all the time. Because who knows? Maybe tomorrow, OpenAI creates some model that has some crazy capability or some really dangerous capability that is super scary, and they shouldn’t release that.

Basically, at some point, someone will create an AI system that is truly dangerous, that’s actually dangerous, Not just spam or something, but is truly dangerous. I don’t know what that system is going to look like. I don’t know who’s going to make it, but it’s going to exist. I think, it would be great if it’s not impossible for them to not release that. I think, it would be great if we can accept that maybe some things we should be careful about. Whether or not it applies to this specific situation that we have in front of us today.

[00:19:14] SM: It’s probably not easy to imagine, but it could be something that slips out of a lab, the same way that the first internet worms were not contained by mistake and ended up creating new carriers for infections inside computers. When we chatted another time, you mentioned scary scenarios of AI is unleashed with a pile of money attached to them to do maximization of shareholder’s value.

[00:19:41] CL: Yeah. That’s one of the scenarios, for example, I take relatively seriously. What does AI do? Generally, the way, for example, we have game playing AI, right? Usually, this is what’s called reinforcement learning. The way this usually works is we have some functions that can score, and you can train the AI to do whatever actions maximizes the score. We get a high score in [inaudible 0:20:02] or whatever. If we just straightforwardly extrapolate this, where we’re going right now. Look at the AI technologies today versus two or three years ago. Nowadays, we have AI that can just type in a sentence and they’ll generate a full photorealistic image of anything you can imagine.

You have these GPT systems that can write full stories, or chat with you like a person, or you can have – or like the Minerva system recently, which can solve incredibly difficult math problems. It’s as good as a human, or even better. This incredible sophistication that did not exist two or three years ago. Now, let’s say that just continues. Let’s just take a naive attempt. We’ll say, okay, things went this fast the last two years. Let’s just imagine the next two years go just as fast, or faster, and then the next two years after that, and the next two years after that, and the next two years after that. Something has to give.

Either progress is going to slow down for some reason, or we’re going to see some crazy systems really, really soon. Systems that can optimize for very complex goals, that we can have assistance that we tell them to do a thing, and then they can log on to the Internet and just do those tasks. Now, we imagine we have these systems, they’re more and more powerful. Now, say we have some really big corporation, Google, or OpenAI, or whatever. The biggest system of these kinds ever, something that’s so powerful, it’s smarter than humans. It runs a million times faster. It’s read all books in history. It can do perfect IMO gold medal in mathematics, etc., etc. Then you give it some goal like, okay, make maximum profit. What will such a system do?

I think, if you meditate on that question a bit, the obvious things are not always good, or even mostly not good. I mean, if you’re trying to maximize share price of your company, well, why not just hack the stock exchange? Why not just put a gun to the stock exchange CEO’s head and say, “Increase my price right now.” Why not do all kinds of crazy things, blackmail people, or manipulate people, or that create a huge propaganda campaign?

[00:22:05] SM: Right. For humans, we have set norms and laws to prevent that.

[00:22:09] CL: Those also don’t always work. We have corporations doing illegal and bad things all the time. Well, now, let’s imagine we have a corporation that’s also a 1,000 times smarter than any other human. It’s better at hacking. It’s better at coming up with plans. It’s better at propaganda. It can generate images and videos and the voices, can impersonate anyone. These are all things that AIs already do. None of this is really – except the planning. None of this is really science fiction. We can already imitate voices. We can already generate arbitrary images.

There’s already hacking tools that use AI. There’s already math solving AI. All this stuff is already real. Now, we just have this put the parts together in our head and extrapolate. Then, it’s pretty clear that this could pretty easily lead to some pretty scary scenarios really quickly.

[00:22:55] SM: Absolutely. Yes. Without going even, as you were saying, too far out in the future, there are already cases where we can’t really distinguish the actions of a human versus the ones of an AI.


[00:23:11] SM: Deep Dive is supported by our sponsor, DataStax. DataStax is the real-time data company. With DataStax, any enterprise can mobilize real-time data and quickly build the smart, highly scalable applications required to become a data-driven business and unlock the full potential of AI. With AstraDB and Astra streaming, DataStax uniquely delivers the power of Apache Cassandra, the world’s most scalable database, with the advanced Apache pulsar streaming technology in an open data stack available on any cloud.

DataStax lives the open-source cycle of innovation every day, in an emerging AI everywhere future. Learn more at datastax.com.

[00:23:51] ANNOUNCER: No sponsor had any right or opportunity to approve or disapprove the content of this podcast


[00:23:56] SM: Now, I wanted to go back a little bit to the power of AI and the risks that it poses and talk about the mitigations. What you think we should be doing as a society to make sure that these systems don’t spin out of control, but don’t stop the progress. We can’t say, don’t do AI anymore.

[00:24:14] CL: Telling people not to do AI is hopeless. People can’t coordinate our own stuff like that. It’s way too profitable. There’s this archetype of the scientist where a problem, even if it’s bad, it’s just too sweet and has to be solved. There’s quite a number of people in the AI community, who themselves have admitted at several points in time, where they’re like, “Yeah, this might be dangerous, but I can’t help myself. It’s just too cool. I have to do it.” John von Neumann is quite known for having said stuff like that. Several modern AI people I’m not going to name have said similar things in public in the past.

Obviously, just shutting down AI or something, it’s neither feasible nor desirable. AI is also the most powerful technology of our time to improve our lives, to allow us to address tons of problems that we are currently facing. I think a massive amount, maybe even the majority of problems in our society, the bottlenecks to solving them is more intelligence. If we could just solve science faster, if we could just develop cures faster, if we could just do all these things faster and more efficiently, we could improve society immeasurably.

Imagine if our scientists just worked a 100 times faster. That would be insane. We would live in just in such an incredible world. That would make the world so much better, more than almost anything else in the history of mankind, if we could do this. Clearly, as tempting as that is, it is a double-edged sword. AI is a tool. It is not good nor evil. It’s just a tool. It’s just a technology. It is a system that can be both good and bad. I think abuse is definitely a possibility, but I’m far more concerned about the accident-type scenario.

We have totally well-intentioned people trying to build, like an AI scientist or something. They just aren’t careful, or just not aware that this could go wrong in some way, and accidentally build a system that does something totally different. Before they notice that something’s wrong, it’s already escaped onto the internet.

[00:26:08] SM: We have Skynet. No, that’s a movie.

[00:26:11] CL: The government is already – and militaries are already eyeing AI everywhere. Think of this as, first, there is a technical solvable problem, which is something called the alignment problem, which is the problem of how do we get an AI system to actually do what we want? This sounds trivial, but it’s actually really hard. Because what a human actually wants isn’t usually what he says, when a human says something.

There’s all these hidden assumptions. If I tell my robot to go get me coffee, the only thing the robot wants to do is to get coffee, hypothetically. It wants to go and get the coffee as quickly as possible, so it’ll run through the wall, run over my cat, throw grandma out of the way to get to the coffee machine as fast as possible. Then if I run up, “No, no, no. Bad robot,” and I try to shut it off, what will happen? The robot will stop me from hitting the off button. Not because it’s conscious or it has a will to live. No, it will simply be because the robot wants to get the coffee. If it’s shut off, it can’t get me coffee. It will resist. It will actively fight me to get the coffee, which is of course, silly. Of course, it’s silly.

You can imagine how systems that are deployed in the wild could get this property of, well, if it’s maximizing profit, well, shutting it down will not maximize profits. It better have a few backup copies running in the cloud, so it can’t be shut off. Then you have all these kinds of scary scenarios. What I personally focus on, it’s also what I do at Conjecture, is focusing on this technical problem. Okay, how could I even build a system that understands, do not do those things, that lets itself be shut off? That understands that, when I say, “Get the coffee,” I also mean, “Don’t run over grandma,” that understands that these are what I mean by that. It doesn’t do crazy, insane things whenever I ask for normal things. This is a really hard technical problem. Really hard. It sounds easy, but the deeper you go into it, the more you’re like, “Oh, shit. This is genuinely difficult and confusing as hell.” Because humans are confusing, right?

[00:28:28] SM: Right. Yeah, yeah.

[00:28:29] CL: We want all kinds of weird things. Humans are confusing, and the world is confusing. Yeah, things are complicated. The one thing I would want is just have a few more smart people working on this problem. I’m not even saying everyone should work on this. I’m not even saying everyone should drop everything to work on this. A few top professors could consider working on this problem. It’s a pretty cool problem. It’s an important problem. It’s clearly something that top AI professors would be perfectly suited to work on, and somehow very few are working on this. There’s a very small number of people working on this problem.

[00:29:06] SM: If I understand correctly, you’re thinking of the Asimov’s law of robotics embedded inside AI?

[00:29:14] CL: Unfortunately, the law, the three laws of robotics are selected for making interesting stories, not for actually working.

[00:29:20] SM: Absolutely.

[00:29:22] CL: Obviously, that was [inaudible 00:29:22]. Something like that that does work would be great.

[00:29:27] SM: Something that works that can prevent, if I may try to summarize to see if I understand correctly, you’re basically thinking at solving a problem of embedding some safeguards inside the code itself, inside the machines themselves, so that we can predict and we can expect that, if the lever of shutdown is pulled, it actually shuts down.

[00:29:49] CL: That’s what’s known as the stop button problem. If someone would find a solution to stop button problem, I would be over the moon. I would be so happy, because it’s actually genuinely very hard. It’s very, very hard to make a robot that is truly indifferent to being shut off. Because usually, what happens is either they try to resist being shut down, or they become suicidal and instantly shut themselves off. It’s very hard. No one knows how to do this. No one knows how to build a robot that doesn’t care, that will let you shut it down, that will not resist you, but also, would shut itself down. No one knows how to do this currently, mathematically.

I don’t think that’s the whole problem. I think there’s more problems, like inferring human preferences, like all these unspoken things. Having the conservatism, avoiding robots doing some crazy things, whatever. To be clear, I say robots, I don’t actually expect it to be robots. Like, artificial systems that are GPT-3 programs. Just robots is more evocative. Yeah, I think there’s a bunch of problems here that we just really don’t have answers to, but it seems like we should be working on.

The stop button problem is a pretty clear problem that just – more people should be trying to solve this. I think, in the whole world, there’s maybe 200 people working on this problem in total, as far as I’m aware, which seems like, there should be a few more people take this problem seriously. If they find out there’s a simple solution to it, great, awesome. Then I’m the happiest man alive. Let’s go. Currently, the way things look, there’s a lot of these problems that we don’t have answers for, and that’s kind of scary.

[00:31:19] SM: Yeah, it’s interesting, it’s probably not as sexy as others.

[00:31:21] CL: Yeah. It’s much funner to build the bigger system that solves all the problems, and it’s faster than all the other ones, and you make a lot of money off of and you raise a lot of VC money. Of course, it’s more fun to build bigger and bigger and bigger things. I totally get that. I am guilty of this myself in the past. Ultimately, if the thing doesn’t do what you want it to do, that’s going to be a problem.

[00:31:46] SM: Sounds a little bit like the same problem that computer software has with security. It’s always an afterthought, because it’s net cost rather than something that’s immediately perceived as bringing value.

[00:31:56] CL: Yeah. It’s funny, you bring that up. If I had one message to the wider world who may or may not listen to this, I think one of the biggest things, like one group of people that I wish would work on this problem, and as far as I could tell, aren’t going to work this problem, is security hackers. People working computer security applying their minds to AI safety is a clear fit. It’s a classic security problem. How do we get these systems to behave the way we want them to and not the way we don’t want them to? It’s a classic security problem. It’s a very novel, hard problem. You have to solve all these kinds of new challenges here. It seems like a perfect fit for the computer security community. I would love to see more people from the computer security world trying to tackle this problem.

[00:32:38] SM: How, for example, hackers should fix a bug inside a model?

[00:32:42] CL: Well, currently, we don’t know. Someone should try.

[00:32:45] SM: Okay.

[00:32:46] CL: We’re at the point where we have these super complicated systems, GPT, used in the wild and whatever, we just have no idea what’s going on inside of them. We have some ideas. It’s not like we can look at the code. That doesn’t tell us anything. Not really. There’s all these weird things happening internally. What is the computation internally doing? It’s not possible currently that we say, “Oh, we see a failure case in our model.” And we’re like, “Oh, that’s not good.” Then we’d go into the model and fix it. We can’t currently do this.

I don’t think this is a fundamental problem. I think if we develop the tools and the technologies, this is a thing we could learn how to do. There’s already some very early work in this direction. David Bau’s lab at MIT, for example, has published a paper not too long ago, where they managed to edit memories of language models. They, for example, made a GPT model believe that the Eiffel Tower is in Rome, instead of Paris, which is incredibly cool. That’s incredibly cool.

This is obviously how we should develop tools. Tools like this, where we can look at the memories, or edit them, and we can see how the internals of these models work. That’s a lot of what we do with Conjecture. We work on interpretability research. We try to take the inner parts of these networks, decompose them into understandable bits, and then see how can we see where failure modes come from? How can we edit these things? How can we manipulate them? How can we test them for safety features and so on?

This is a very early work. If you’re a young career researcher looking for some low-hanging fruit that haven’t yet been plucked, there is just an orchard. There’s a massive orchard of low-hanging fruit in interpretability and AI safety. I have such a huge list of projects that I wish we could do. I just don’t have enough time and I don’t have enough engineers to do. I think, it’s incredibly promising.

[00:34:32] SM: This is pretty awesome because you’re basically leaving us with a positive note by saying that one of the concerns that other speakers and people I’ve talked to have highlighted is how incredibly opaque these systems are. Once you build the model, you have a hard time, unless you retrain, which could be expensive. I don’t know if your output from the GPT-3 like is too sarcastic, or abuses of commas and does not know how to use punctuation correctly. How do you fix it without having to retrain the whole thing? Which is, as you were saying, it’s expensive. You’re basically saying that there are ways; there’s research going into the direction of looking into these artificial synapses, connections and tweak them in a way that we can predict or fix.

[00:35:23] CL: Yeah. There’s this meme that’s been around in the AI community for quite a while that neural networks are complete black boxes. It’s impossible to understand what’s going on. That is just false. That is just not true. I have overwhelming evidence that is just completely false. There is so much structure inside of neural networks. There are so many things you can understand inside of them. It doesn’t mean it’s easy. This is a very nascent level of research. I was skeptical about this too, two or three years ago. Now that I’ve actually worked on the problem for a while, and I’ve seen other people, I’m like, wow. Every time we put the effort into it and try to take apart and look at the different parts, there’s so many low-hanging fruit. There’s so much to be found. There’s so few people working on this problem.

There’s really just a handful of groups in the whole world really saying like, “Nope. We’re just going to try. We’re going to try to take these things apart.” I expect, over the next couple years, including some of the work, hopefully, from Conjecture, will show the computational primitives and the internal structure of these things that will allow us to look much more selectively understand what is going inside them. Can we edit these things as such? Will it be perfect? No, probably not.

[00:36:31] SM: Probably not.

[00:36:31] CL: I think there’s a massive amount of promise here that we’re just starting to unearth. I think, there’s real reason for optimism there. Will this solve the whole safety problem? Of course not.

[00:36:44] SM: Right. It’s a step.

[00:36:45] CL: It’s a really promising step forward. It’s like, we at Conjecture work on this problem quite a lot. We’re hoping to be actually publishing one of our results pretty soon, which I’m pretty excited about. We just really try to look at it and we found all this structure and all these pieces that you can understand and you can take apart. There are some groups out there that really are taking it seriously. I’m very optimistic that we will understand neural networks as white boxes, or way more, very soon.

[00:37:10] SM: That’s great to hear. What kind of resources do you need in order to understand that problem? We were saying that to train, you need a lot of data, engineering capacity, and compute to investigate inside the neural network. What do you need?

[00:37:26] CL: Creativity. You need to be creative because it’s a new field of research. Every time we come into a new field of research, you need to be creative. You need to come up with new ways of thinking about a problem. Luckily, you need much less resources than to train these models, because you use pre-trained models. You will need some GPU or something to do some of the research. That’s unfortunately, just the nature of ML research.

There’s a ton of research you can do by using EleutherAI models, for example, so you don’t have to retrain them from scratch. You can just use the feeder, and you can study the internal parts and do lots of interesting operations. The one thing you will definitely need is you need to actually learn the math. You need to actually know linear algebra, and you have to actually look at the internals of the model.

In the ML, we’ve gotten lazy. We’ve gotten lazy. We let the PyTorch handle all the linear algebra and stuff, and then we lose a real deep tacit understanding what’s going inside the model. Actually looking inside the models, what did these numbers mean? How do they combine? What’s the linear algebra here? This is not grad school level math. This is all undergrad level math, but really understanding what is going on inside the network and what is actually going on inside. Being comfortable with undergrad level, linear algebra and stuff like that is, I think, just incredibly undervalued. If you’re a postdoc, massive IQ, statistician, or algebraist, why not just take a shot at neural networks and see what the structure is inside of them? If you’re a formal systems PhD, and you have all this knowledge about formal languages and computability and whatever, why don’t you take a look at a GPT network and see what’s a transformer’s encode internally? Where the complexity properties? All these things I expect to lead to very interesting research.

[00:39:01] SF: Wonderful. This is a call for the next generation of geeks out there, math and linear algebra is what will prevent us from getting into Skynet situation. Alright. Connor, thank you.

[00:39:15] CL: My one shout out is: try your shot at linear algebra, interpretability, understanding neural networks. Shout out to the computer security world out there. Your skills are more required than ever, and I think this is going to be an extremely valuable field. Conjecture is not currently hiring. Hopefully, in the near future, we will be hiring. If you’re someone who’s very interested in interpretability, safety and/or an experienced computer security expert, you would be someone we might be want to talk to. Please feel free to reach out to us, conjecture.dev.


[00:39:49] SF: Thanks for listening. Thanks to our sponsor, Google. Remember to subscribe on your podcast player for more episodes. Please review and share. It helps more people find us. Visit deepdive.opensource.org, where you’ll find more episodes, learn about these issues, and you can donate to become a member. Members are the only reason we can do this work. If you have any feedback on this episode, or on Deep Dive AI in general, please email contact@opensource.org.

This podcast was produced by the Open Source Initiative, with help from Nicole Martinelli. Music by Jason Shaw of audionautix.com, under a Creative Commons Attribution 4.0 International license. Links in the episode notes.

[00:40:31] ANNOUNCER: The views expressed in this podcast are the personal views of the speakers and are not the views of their employers, the organizations they are affiliated with, their clients, or their customers. The information provided is not legal advice. No sponsor had any right or opportunity to approve or disapprove the content of this podcast.


The views expressed in this podcast are the personal views of the speakers and are not the views of their employers, the organizations they are affiliated with, their clients or their customers. The information provided is not legal advice. No sponsor had any right or opportunity to approve or disapprove the content of this podcast.

Keep up with Open Source

    We’ll never share your details and you can unsubscribe with a click! See our privacy policy.

    Other Episodes

    Episode 6: transcript

    EPISODE 6: How to secure AI systems “BD: Now we're in this stage of, 'Oh my, it works.' Defending AI was moot 20 years ago. It didn't do anything that was worth attacking. Now that we have AI systems that really are remarkably powerful, and that are jumping from...

    Episode 5: transcript

    “MZ: In order to train your networks in reasonable time schedule, we need something like GPU and the GPU requires no free driver, no free firmware, so it will be a problem if Debian community wants to reproduce neural networks in our own infrastructure. If we cannot...