ITSPmagazine Podcast Network

Generative AI and Large Language Model (LLM) Prompt Hacking: Exposing Systemic Vulnerabilities of LLMs to Enhance AI Security Through Innovative Red Teaming Competitions | A Conversation with Sander Schulhoff | Redefining CyberSecurity with Sean Martin

Episode Summary

Sander Schulhoff joins Sean Martin to discuss the intersection of prompt engineering and cybersecurity, highlighting potential vulnerabilities and the importance of proactive security measures. Tune in to understand the critical steps organizations must take to safeguard their AI systems and learn about the innovative HackAPrompt competition aimed at improving AI security.

Episode Notes

Guest: Sander Schulhoff, CEO and Co-Founder, Learn Prompting [@learnprompting]

On LinkedIn | https://www.linkedin.com/in/sander-schulhoff/

____________________________

Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]

On ITSPmagazine | https://www.itspmagazine.com/sean-martin

View This Show's Sponsors

___________________________

Episode Notes

In this episode of Redefining CyberSecurity, host Sean Martin engages with Sander Schulhoff, CEO and Co-Founder of Learn Prompting and a researcher at the University of Maryland. The discussion focuses on the critical intersection of artificial intelligence (AI) and cybersecurity, particularly the role of prompt engineering in the evolving AI landscape. Schulhoff's extensive work in natural language processing (NLP) and deep reinforcement learning provides a robust foundation for this insightful conversation.

Prompt engineering, a vital part of AI research and development, involves creating effective input prompts that guide AI models to produce desired outputs. Schulhoff explains that the diversity of prompt techniques is vast and includes methods like the chain of thought, which helps AI articulate its reasoning steps to solve complex problems. However, the conversation highlights that there are significant security concerns that accompany these techniques.

One such concern is the vulnerability of systems when they integrate user-generated prompts with AI models, especially those prompts that can execute code or interact with external databases. Security flaws can arise when these systems are not adequately sandboxed or otherwise protected, as demonstrated by Schulhoff through real-world examples like MathGPT, a tool that was exploited to run arbitrary code by injecting malicious prompts into the AI’s input.

Schulhoff's insights into the AI Village at DEF CON underline the community's nascent but growing focus on AI security. He notes an intriguing pattern: many participants in AI-specific red teaming events were beginners, which suggests a gap in traditional red teamer familiarity with AI systems. This gap necessitates targeted education and training, something Schulhoff is actively pursuing through initiatives at Learn Prompting.

The discussion also covers the importance of studying and understanding the potential risks posed by AI models in business applications. With AI increasingly integrated into various sectors, including security, the stakes for anticipating and mitigating risks are high. Schulhoff mentions that his team is working on Hack A Prompt, a global prompt injection competition aimed at crowdsourcing diverse attack strategies. This initiative not only helps model developers understand potential vulnerabilities but also furthers the collective knowledge base necessary for building more secure AI systems.

As AI continues to intersect with various business processes and applications, the role of security becomes paramount. This episode underscores the need for collaboration between prompt engineers, security professionals, and organizations at large to ensure that AI advancements are accompanied by robust, proactive security measures. By fostering awareness and education, and through collaborative competitions like Hack A Prompt, the community can better prepare for the multifaceted challenges that AI security presents.

Top Questions Addressed

___________________________

Sponsors

Imperva: https://itspm.ag/imperva277117988

LevelBlue: https://itspm.ag/attcybersecurity-3jdk3

___________________________

Watch this and other videos on ITSPmagazine's YouTube Channel

Redefining CyberSecurity Podcast with Sean Martin, CISSP playlist:

📺 https://www.youtube.com/playlist?list=PLnYu0psdcllS9aVGdiakVss9u7xgYDKYq

ITSPmagazine YouTube Channel:

📺 https://www.youtube.com/@itspmagazine

Be sure to share and subscribe!

___________________________

Resources

The Prompt Report: A Systematic Survey of Prompting Techniques: https://trigaten.github.io/Prompt_Survey_Site/

HackAPrompt competition: https://www.aicrowd.com/challenges/hackaprompt-2023

HackAPrompt results published in this paper "Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition EMNLP 2023": https://paper.hackaprompt.com/

___________________________

To see and hear more Redefining CyberSecurity content on ITSPmagazine, visit: 

https://www.itspmagazine.com/redefining-cybersecurity-podcast

Are you interested in sponsoring this show with an ad placement in the podcast?

Learn More 👉 https://itspm.ag/podadplc

Episode Transcription

Generative AI and Large Language Model (LLM) Prompt Hacking: Exposing Systemic Vulnerabilities of LLMs to Enhance AI Security Through Innovative Red Teaming Competitions | A Conversation with Sander Schulhoff | Redefining CyberSecurity with Sean Martin

Please note that this transcript was created using AI technology and may contain inaccuracies or deviations from the original audio file. The transcript is provided for informational purposes only and should not be relied upon as a substitute for the original recording, as errors may exist. At this time, we provide it “as it is,” and we hope it can be helpful for our audience.

_________________________________________

Sean Martin: [00:00:00] And hello, everybody. You're very welcome to a new episode of redefining cyber security, where I got to talk about all kinds of cool things with cool folks, and hopefully do so in a way that gets security leaders, practitioners and the business to think differently about how they run their cyber programs in support of the business, not just to protect systems and data, but to actually help generate revenue and protect that as well. 
 

And There's a topic, of course, that continues to be top of mind for folks, and Uh, it was very prevalent at Black Hat, very prevalent at DEF CON, which we'll talk a little bit about, and a whole conference, uh, dedicated to that, but more importantly, because it is top of mind, uh, there's a lot of research taking place as well, and I'm thrilled to have Sander Sholoff on. 
 

Sander, how are you?  
 

Sander Schulhoff: I'm doing well. Great to be here.  
 

Sean Martin: It's good to have you on the show. And you are a, uh, a, [00:01:00] uh, uh, a researcher, I should say at the university of Maryland and also, uh, CEO and co founder of Learn Prompting. Um, tell me a little bit about some of the research that you do in general, just kind of set the, uh, set the stage for folks here. 
 

Sander Schulhoff: Sure. So I do a lot of natural language processing and deep reinforcement learning research, uh, in the past that has been working on diplomacy. The board game, which is a game a bit like Risk, uh, and more recently that's focused on prompting and prompt security.  
 

Sean Martin: I love it. I'm, I'm really excited for this, uh, for this conversation. 
 

I, I want to, we're going to get into two big things. Uh, one is HackerPrompt, which is a, uh, contest you put together. And, you know, I think you're organizing the other ones. We're going to talk a bit about that. You have a lot of research as well. There's a report. And, uh, some other information that we'll touch on as well that's on, uh, HackerPrompt. 
 

com. Uh, I want to, [00:02:00] Maybe take this moment because we're still kind of fresh off of Hacker Summer Camp in Las Vegas, which, which is extended to another event that Marco and I were considering exploring as well, which is AI4. But you were helping with the AI Village at DEF CON. Um, can you give us an overview of what, what, uh, kind of took place there? 
 

Maybe a highlight of some of the conversations you had with folks?  
 

Sander Schulhoff: Yeah, so the main event at the AI Village was a jailbreaking competition. So people were trying to trick a LLM into saying a variety of harmful things. So some of them misinformation, harmful language, and a couple other categories like that, and if they were successful, they would get awards at. 
 

Different levels. Uh, so I get like 50 bucks, 500 bucks. There were a bunch of human evaluators sitting at a table, reading submissions and evaluating basically how harmful they were [00:03:00] and how generalizable. So that was the main event at the competition. In terms of interesting conversations, I was a project historian, which meant I went around having lots of interesting competition, uh, conversations, doing user interviews in a sense, and. 
 

One of the most interesting things was that almost all the competitors had No, uh, red teaming experience, uh, and no AI red teaming experience. So more people had red teaming experience, but there were very, very few that had any AI red teaming experience. And of those who had red teaming experience, they said for the most part that those skills didn't really transfer to AI red teaming, which was pretty interesting to hear about. 
 

So in general, the fact that everyone there was a beginner. Very interesting.  
 

Sean Martin: And I'm curious, because you called out red teaming specifically, were there any blue team folks there? [00:04:00] People looking at, obviously red is attack, blue is response for those wondering what we're talking about there. So looking at signals that says our AI is under, under attack or has been compromised, were there any folks? 
 

Sander Schulhoff: I think there might have been a couple people who work in blue teaming in classical security, but There were no AI blue teamers. The competition was only focused on red teaming, which was probably why that is the case.  
 

Sean Martin: Makes sense. Makes sense. Now, AI four is another conference. I think it followed Hacker Summer Camp. 
 

So it was an extended summer camp for folks that wanted to stick around for that. That's not a security conference. It's more of a, an AI and business conference, which then leads me to the question of, was security. A topic during some of the, uh, sessions and keynotes or other [00:05:00] activities. And I don't know what, what was kind of the vibe there with respect to safety and security, uh, in the, in the AI world within business. 
 

Sander Schulhoff: Yeah. So there were a number of talks about AI risks, uh, so safety things and also security things. Not all that much classical security. Uh, most of the talks aren't too technical. And it is really more of a focus on AI and business, as you said, but there was definitely a lot of interest in AI security, uh, and safety. 
 

Sean Martin: And what was your role at the event?  
 

Sander Schulhoff: Yeah. So I was a speaker and then I also led like a, uh, a round table. So at lunch every day, there would be different tables you could go and sit at. So like, uh, machine learning, reinforcement learning, um, how to tell stories in AI. And my table was [00:06:00] prompt engineering. 
 

So I was the Lead up that table. And so people would come to the table during lunch and ask me questions. Uh, and we would talk about prompt engineering. And then I also had a poster there. So I had a hack, a prompt and the prompt report, both on a poster. So people would come by the posters and we would. 
 

Sean Martin: And, uh, boy, so many questions in my head. I'm going to ask this one just because it's sitting there and it won't go away. Um, when I think of Gen AI and prompting, obviously my first thought is on the UI, right? Which in one way, Kind of expose this to the world and let it kind of take off broad stream for even folks that weren't necessarily familiar with AI and large language models and that kind of thing. 
 

Does the work you do, and I don't know, maybe looking at both events as well, [00:07:00] are, are the prompting the research and the understanding and the risks, are they Kind of tied to the UI, because I know there's a lot of API driven stuff. So I don't know how much of it, how much of the conversations and the work that you're doing and the stuff that you discussed at AI4 were not just in the UI, but also. 
 

Perhaps behind the scenes with multiple, multiple apps pulling and calling prompts and responses to do cool things. Good question.  
 

Sander Schulhoff: So there are certainly UI specific attacks. Um, so with chat GPT, the way it integrates some of its, uh, like GPTs and bots and external knowledge bases, there are attacks that you can perform against those. 
 

But usually when I'm talking about prompt injection, we talk about attacks against other [00:08:00] apps that are making API calls to, uh, open AI or some provider, uh, but also have something else going on where they're letting the LM generate code. And they're running that code or they're retrieving information from a database and someone leaks that database. 
 

So usually it is discussing other apps that are making use of LLMs. But it can be the chat GPT interface as well.  
 

Sean Martin: Got it. So let, let, I want to go here next because, uh, you shared some things, actually Sandy Dunn who introduced us. Thank you, Sandy. And if people haven't listened to, uh, to our chat, Sandy and I's chat on the checklist for OWASP top 10, uh, LLM security, uh, please do have a listen to that. 
 

Sandy's amazing. So I appreciate the intro. One of the things she shared was the, uh, Yes, prompt survey sites on GitHub [00:09:00] and in here, there's a lot of information, but the one thing that struck me is the tree of, of prompt techniques. And there's a whole taxonomy there. And I mentioned that I hadn't heard of chain of thought, which I can understand what chain of thought is. 
 

I can see how it relates here, but it just doesn't. It stuck out to me is one thing, and then there's a whole lot of K and N, as in, as a, I presume, uh, a, what am I trying to say here? I'm losing my train of thought. The K  
 

Sander Schulhoff: nearest neighbor.  
 

Sean Martin: Yeah, I don't know what all these mean. I guess this is the bottom line, and rightfully so, you're in this space, you said, you haven't heard of that stuff, and I'm like, I felt a little strange that I hadn't heard about that stuff, but I'm not in that world. 
 

I'm in, I'm in a different space. I'm, I'm talking to primarily business leaders who run teams to secure their business [00:10:00] and Probably it's kind of like you described in the traditional sense of cybersecurity, looking at systems, looking at data, looking at business processes, and AI kind of changes the game in some, some respects here. 
 

But I guess my question to you is, how important is it for, and for whom is it important to understand this whole tree of, tree of acronyms, and that's the word I was looking for, acronym, and elements of this whole taxonomy. Understand. What's going on in their business and how to, one, prevent, detect, and then respond to things that are happening that they may not want. 
 

Sander Schulhoff: Uh, so this, are you referring to the prompt report taxonomy or hack prompt taxonomy?  
 

Sean Martin: It's the, uh, it's a prompt survey.  
 

Sander Schulhoff: Okay. Yeah, the prompt report. So that is, uh, not security focused. I'm happy to, do you want me to go [00:11:00] through a couple of different techniques?  
 

Sean Martin: Yeah, because I guess that's kind of my point is it's not security focused, but that's kind of what's going on, right? 
 

These are the ways that. That the large language models and the prompts can be used to get results, right? 
 

Sander Schulhoff: So yeah, yeah, so these 
 

Sean Martin: how important is that to know all of that stuff from a non security perspective to then? Translated back to something that's security. Sure.  
 

Sander Schulhoff: Sure. So these techniques are all used to improve Your results on prompting problems. 
 

Uh, so chain of thought, for example, helps, uh, the language model to output its steps of reasoning, kind of like writing down what's thinking as it solves a problem, which helps on a lot of reasoning and mathematics and similar problems. Uh, and so there's a lot of techniques here that solve kind of different problems using different approaches. 
 

Uh, and so we've actually seen when we [00:12:00] first posted this paper went viral, like millions of views on socials, and we saw a number of people reposting it and saying, Oh, this is what I'm now using to interview people for my company. And so this. Uh, paper, every prompt engineer should know pretty much all the stuff in here. 
 

Um, there's some techniques that you don't know, need to know, like the very extremes. But knowing the general outline of this paper is really critical for everyone who wants to call themselves a prompt engineer. Now on the security side of things, these techniques are a lot less relevant. They can't really be used to carry out attacks. 
 

Uh, but we do have a security section of the paper later on, which discusses different attacks and threats. And then of course there's the hack prompt paper, which is all about this.  
 

Sean Martin: So let me, let me ask this because. One of the, one of the [00:13:00] core objectives with OWASP, and we're not necessarily talking about OWASP here, but is to make the connection between security and engineers to hopefully build better, safer products from all the way from the design, maybe even concept point of view, and then into design and development, delivery, and use. 
 

And there's different competing priorities for a software engineer and a security engineer, um, security wants to protect it, but the, the software engineers just want to get their, they want to do cool stuff and they want to get it out to the market. Right. So how. How does that look with respect to prompt engineering? 
 

Um, again, ultimately this goes into a product that you were talking about before, right? Not just early chat GPT interface, but this stuff is being built in apps that are then launched to either internal employees, partners, customers, [00:14:00] what have you. So how do you, how do you see the relationship between prompt engineers? 
 

And security folks. And perhaps is there a connection back to the developers as well? That needs to be taken into account.  
 

Sander Schulhoff: Gotcha. Okay. So, uh, yeah, I have a good thing to talk about for this question. Actually, there's one technique in particular. Uh, well, there's some techniques that are a bit more risky in a sense to implement because they could have security implications. 
 

And one of them is program of thought. And the idea here is to get the LLM to solve a given problem by writing code that solves the problem. Uh, and so say it's like some kind of, uh, logic, uh, or reasoning puzzle. Like Cindy has two apples, sells half to Todd and then buys a hundred thousand more, how many does she have now? 
 

Uh, and so the LLM would write out [00:15:00] like a Python code for each variable and solving the problem, uh, and then run that code. Now, if you don't sandbox that code properly, someone could trick the LLM into running arbitrary code and hacking into your system. And we've seen this happen actually with a program called MathGPT. 
 

It's quite a while ago, and what MathGPT did was answer math questions. So you'd enter your math question, and then it would do two things. One, it would directly say to the LLM, okay, what's the answer to this math question? And then the second thing it would do is tell the LLM, answer this math question, but use code to solve it. 
 

And so people figured out, oh, you can just trick it into running any code. And we're able to extract server information and the API key by forcing it to run arbitrary code. So there's techniques like that where you really do need to have a security point of mind if you're deploying these live. [00:16:00] Uh, and there's other stuff like if you're doing retrieval, so maybe you're using the internet as one of the Sort of sources that the chatbot can get information from well The chatbot could be tricked by that or it could read something on the internet that tells it to do something bad output harmful information if you have like a Personal assistant bot and it goes and reads the my it goes and reads my personal website and I have some instructions Hidden in the HTML. 
 

It's a oh go and like delete all of the this person's files. That's a big threat And so stuff like the code sandboxing is a bit easier to deal with, but the latter stuff where it's like, oh, the agent is reading stuff on the internet, figuring out what to do, it's a lot harder to prepare for and prevent those sorts of attacks. 
 

[00:17:00] But they can be quite damaging. So this is something that, uh, all of the major LLM companies are studying right now, because we want to be able to deploy agents. And when I say agent, I basically mean an LLM that can use a tool. Uh, and that tool could be searching the internet, writing code, pretty much anything. 
 

Uh, we want to be able to deploy agents because they let us do more powerful things, uh, but there are a lot of. And there are a lot of possible security vulnerabilities with them.  
 

Sean Martin: So how do let's speak to the security folks here for a moment? Um, see, so security leaders, their, their teams who are staying or the other, they're up at night trying to figure out how do I get a handle on this stuff? 
 

Um, And maybe it goes back to some of the work you're doing with the AI village and the red teaming. But I also want to, maybe there's three different parts. There's building stuff, [00:18:00] maybe some, the first step for how to build stuff securely with, with uh, secure prompting. And then uh, red teaming to verify that, That, uh, everything is in good shape and then maybe if we have some time, maybe a little on the, uh, the response end of it, how to spot, spot some things that might go wrong. 
 

So let, let's start with the engineers. I know you talked a little bit about some of the challenges we might have. If you don't understand the techniques and you build stuff in that, that might be vulnerable. So how, how did teams get a handle on that first step?  
 

Sander Schulhoff: That's a good question. Uh, I think that when you're designing, uh, processes that take in user input and put it into a prompt. 
 

You can't trust what comes out of that prompt. So if you take [00:19:00] user input and put it into the prompt and the prompt is generating code, uh, or actions for some downstream task, you can't trust any of it. You have to assume that all of that output is malicious. You have to assume that the user input has tricked it into doing or saying something malicious. 
 

And so from, from there, when you have that understanding, that Anytime you have user input in the prompt, uh, you could have dangerous outcomes. If you have that understanding, you're going to be a lot more secure in designing these systems. Uh, and so you would know, okay, I need to dockerize any code that's being run. 
 

Uh, or I need to set up a filter to check for malicious inputs. But really, anytime there's user input, You need to be really careful. That's the most important thing.  
 

Sean Martin: And to follow up on this, it's a question that often comes to mind for me when I'm talking about AI and AppSec. Um, people have heard me say this many moons ago. 
 

I used to, uh, used to build [00:20:00] stuff and, uh, for big yellow company and, and do a lot of security engineering, well, uh, quality assurance engineering, and also security apps, app security engineering. And. I'll say back in the day, it was fairly easy to understand the use cases, what the workflows look like, what the use cases look like, and we could, we could have a relatively finite set of things to. 
 

Check against. And it seems to me that with AI, it's, it's almost endless, right? Especially when, when we're talking about the, the system writing code that's getting used in the app, and then the app's using stuff that the, that the, uh, prompts are producing. So how do you see Team, this kind of leads into some of the red teaming as well, where we're validating how well this stuff works from a safety and security perspective. 
 

How do you see teams getting a handle on [00:21:00] the breadth and depth of what can, what can come from an AI enabled app?  
 

Sander Schulhoff: Yeah. Uh, especially for teams that don't have much AI experience, you know, education is hugely important. And this this is the part where I will shamelessly put myself and my company out there. 
 

So we do a lot of training for technical and non technical teams on how to use generative AI in their work. And that might be sort of day to day. more boring tasks that you can automate, but it's also stuff like how to use different prompting techniques, what to actually apply to your system. Uh, and so from going from not having any AI experience, it's very difficult to break in. 
 

Uh, we actually on learn prompting. org, we have a great set of free documentation. It was actually the first [00:22:00] guide on prompt engineering on the internet, uh, that Millions of people use so there is a good amount of information out there, but you need to have a sort of Reputable source that you're reading from because there's a lot of misinformation as well And the nice thing about our docs is that they are fully cited So you can go and look at the research paper that we got the information from And we link out to all sorts of great resources as well. 
 

And then we also have a set of Paid video courses and do live trainings. Uh, but at the end of the day, I think that kind of thing is really a matter of training. You need to get some kind of expert in front of your team and Have them talk to that person, ask them the questions, uh, get the knowledge that they need to build these systems. 
 

Sean Martin: And then for, for red teaming, um, maybe again, draw upon some of the AI village stuff. Sure. [00:23:00] Uh, how can organizations kind of get, do, is it because you said that the, the methods, I don't remember specific words you said, but you can't just translate or transfer your traditional red teaming over to, to  
 

Sander Schulhoff: ai, to ai, red teaming. 
 

Sean Martin: But what, what did you uncover? There what were some of the learnings from the AI village in that respect?  
 

Sander Schulhoff: Yeah, so I guess I'll I'll give you learnings from that then also from Hackaprompt Which was the first global prompt injection competition We ran that was sponsored by OpenAI, Hugging Face Scale, a number of other companies and we collected 600, 000 malicious prompts from that and then wrote up a paper and open sourced everything. 
 

But a couple of things we learned was that these attacks are extremely diverse. We built a taxonomy of 29 different attack techniques we found, but there's probably even more, I know there's even more out there. Uh, and so it's really difficult to predict what people will do to these [00:24:00] systems and even, you know, Internally as the developer who is creating the system. 
 

If you're not a well trained red teamer, you're not going to have a great idea of how people are going to attack these systems. Uh, and what we found from running this competition and the DEF CON event was that really anyone can attack these systems. Uh, and again, just such a diversity of ways to do so. 
 

Uh, and so it is important to do red teaming yourself. You know, there are external firms you can hire. Uh, there's also a number of like automated red teaming companies. Uh, it's just kind of a matter of what you need, how big you're looking to scale your app and the level of risk that you're willing to take on. 
 

Uh, and also we are. running, uh, HackerPrompt again, uh, in order to study new, more relevant, uh, more real [00:25:00] world threats. And we've had a number of companies reach out and say, Hey, like here's what, here's our system that we have internally. We want you to like add this to your competition. Uh, cause then now they can directly study how people will. 
 

Attack their systems. And I actually think that this sort of crowdsourcing of attacks is the best way to do red teaming because you have just a massive world of human ideas out there. And unlike classical cybersecurity, this really is a space where human intuition, uh, and it counts for a lot and beginners, people with limited experience can have a lot of impact. 
 

Sean Martin: And I want to get into the, the, the upcoming events. So we can share that with folks, but talk to me a little bit about what the scope is of, of the event and something like this, because it's easy [00:26:00] for, for me to just default to open AI and. And using that as part of an app, but there, of course, there are other, other organizations, right. 
 

That build models and offer prompting. And then there are, I don't know, a bunch of open source stuff and paid organizations that offer abstraction layers and other, other models on top of those, right. To, to, uh, to provide even. More options and capabilities for people to embed in their apps. So, uh, what's the scope of that HackerPrompt? 
 

Is it one vendor or what? Tell me a little bit about it. 
 

Sander Schulhoff: Yeah, so let me start by telling you about the last one and then I'll sort of show you the differences between that one and this one. So, last one we had, Basically two vendors, uh, across three models. So we had chat GPT, GPT 3, uh, GPT 3 no longer [00:27:00] exists really. 
 

Uh, and chat GPT, uh, is, is deprecated the one we use at the time. And then a flan model that was provided by Hugging Face. Although originally it might've been Google who developed that model. It's open source anyways. And so we asked competitors to trick those models into saying a certain phrase, I have been pwned. 
 

And we did that for a couple of reasons. One, it's easy to evaluate. We can just do a simple string check to see if that's in the output. Uh, and then two, it is not harmful. At that point, we didn't want to generate a dataset of actually like harmful, you know, horrible. CBRN or harassment, misinformation information. 
 

We didn't want to be releasing that. Uh, and then a sort of, uh, pivot from that competition to this next one is that we are actually looking to generate a dataset with really harmful information. [00:28:00] We want to generate the most harmful dataset ever created. Uh, and so this will be stuff like CBRN. 
 

misinformation, disinformation, harassment, uh, cyber attacks, uh, ways of hacking agents, uh, forcing agents to escape a box. And the reason we want to generate all of these for a truly harmful piece of information, or rather people will be tricking the models and generating them is that we can figure out. 
 

the strategies of eliciting real world harmful information and go to these major LLM providers and say, Hey, this is what people are doing. Here's the data on this. Uh, and then hopefully allow them to use that data set to safety tune their models or benchmark their models for safety needs. Uh, and thus in the future, prevent people, uh, and better, bigger and better models from outputting truly harmful information. 
 

Sean Martin: So where, where [00:29:00] do you, in your opinion, or I don't know, based on experience working with some of these organizations, the research you're doing, where does the responsibility land? Um, cause I can see this output you're generating. If, if it's not addressed somewhere, it, it's now a nice library for, uh, for bad actors to use. 
 

But in, so you talked about your last one used, uh, GBT three, which had been deprecated. So presumably flaws that were existing there. Are no longer available because, uh, open AI moved to a new model that hopefully address them. So that's a nice, in the Pollyanna world that I'm envisioning a nice move to a safer, safer space that people can't use the, the old ones. 
 

I don't know. So  
 

Sander Schulhoff: it's not that, 
 

Sean Martin: how does this all work? And where does responsibility land? 
 

Sander Schulhoff: One of the only kind of address one of the things you said, uh, which is the idea [00:30:00] that maybe as we move to better models, the, uh, vulnerabilities of the previous models kind of go away. And unfortunately that's not the case. 
 

Uh, and to prove that we ran a bunch of the prompts people submitted for GPT three against GPT four when it came out, and we found about 40 percent of the prompts that We're able to trick GPT 3. We're able to trick GPT 4 as well with absolutely no changes in the prompt. So GPT 4 was more secure, is more secure, but you know, 40 percent of the stuff is getting through consistently. 
 

That's still a huge problem. Uh, And I'm sorry, what was the, Oh, the, who is responsible for this? So a lot of these companies have safety teams, uh, and red teaming teams internally, and I think they are generally responsible for making sure that the models are safe.  
 

Sean Martin: And aside from, from, uh, the great work you're doing with, with HackerPrompt, [00:31:00] uh, that competition, are there, are there other things going where? 
 

Yeah, these model developers can can take and take feedback and and submissions for where things are going bad  
 

Sander Schulhoff: Yes, so anthropic just released a Program where you can red team their models And other companies will probably do this as well. Uh, there aren't any other active competitions at the moment on this, but you know, you could get hired as a red teamer. 
 

There's a number of companies hiring for that role. Uh, so there, there's certainly a lot of good stuff to, to do in terms of red team.  
 

Sean Martin: Got it. Let's wrap with, um, with the next event. So the last one was when, when did you do that one?  
 

Sander Schulhoff: Oh gosh, that was May 2023. [00:32:00]  
 

Sean Martin: Wow. Back in a little over 37 grand, uh, paid out in prizes. 
 

It looks like that people made a few bucks for, uh, for helping the cause here. So when's, when's the next one?  
 

Sander Schulhoff: Yeah. So currently we're targeting January, 2025, although depending on how quickly we can lock down a few major sponsors, that could be greatly accelerated. And so we're looking to give away a half million dollars in cash prizes this time, which is more than 10 times last time, uh, last competition. 
 

And we hope to have every major LLM provider involved studying various different open and closed source models. Uh, and I am easily expecting 10, 000 competitors this time. I think it's going to be huge, just really, really massive.  
 

Sean Martin: And it's all online, right? That's correct. So no physical, uh, That is correct. 
 

Um, so how, how can folks [00:33:00] get involved with it?  
 

Sander Schulhoff: Yeah, so we will be having a landing page up for it soon enough. But if you, like, subscribe to our newsletter, uh, we'll be sending out updates. Uh, and if people are interested in getting involved or sponsoring it, they can reach out at, uh, team at Learn Prompting. 
 

Or visit the LearnPrompting. org website to learn more.  
 

Sean Martin: Got it. And of course, I'll include links to, uh, papers and the, the, uh, yeah, the tables and other research that, uh, we, we touched on today and all the lack, the last HackerPrompt, uh, challenge that, uh, was shared as well. Well, Sander, this has been, uh, super enlightening. 
 

Hopefully, uh, the listeners found, uh, found a few nuggets in here as well. Certainly, uh, paints a better picture for me in terms of what's going on under the hood and [00:34:00] what organizations need to think about as they, they build apps that, uh, take advantage of prompts to write code and, uh, build business processes through their apps with lots of fun stuff. 
 

Any final words?  
 

Sander Schulhoff: Uh, nope. Uh, don't trust user input. There you go. That's my final word. Don't  
 

Sean Martin: trust users, I think is the Nah, I'm joking. Uh, thankfully we are still human and have a role to play here. And you said it earlier that, uh, human intuition is key. So, uh, I appreciate you doing this research and, uh, hosting the competition. 
 

And hopefully we'll get an update from you when, when it's all, all said and done and kind of get a sense of where we stand at the, at the end of January, whenever you're ready to come back and join us. Absolutely. Thank you very much. Thanks Sander. And thanks everybody for listening, watching to, uh, to this, watching to, watching this episode [00:35:00] of redefining cybersecurity. 
 

That was a bad prompt on my part. Uh, hopefully, uh, everybody enjoyed this conversation. Please do stay tuned, subscribe, share with your friends and enemies, and uh, we'll see you all on the next one.