ITSPmagazine Podcasts

OWASP Top 10 For Large Language Models: Project Update | An OWASP 2024 Global AppSec San Francisco Conversation with Steve Wilson | On Location Coverage with Sean Martin and Marco Ciappelli

Episode Summary

In this On Location episode, Sean Martin and Steve Wilson, Project Lead for the OWASP Top 10 for Large Language Model AI Applications, discuss the newest security challenges revolving around large language models (LLMs) and the significant insights from the OWASP Top 10 project. Learn about the key issues like prompt injection attacks, supply chain risks, excessive agency, and more, while getting a sneak peek into the upcoming updates to the OWASP Top 10 for LLMs.

Episode Notes

Guest: Steve Wilson, Chief Product Officer, Exabeam [@exabeam] & Project Lead, OWASP Top 10 for Larage Language Model Applications [@owasp]

On LinkedIn | https://www.linkedin.com/in/wilsonsd/

On Twitter | https://x.com/virtualsteve

____________________________

Hosts:

Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]

On ITSPmagazine | https://www.itspmagazine.com/sean-martin

Marco Ciappelli, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining Society Podcast

On ITSPmagazine | https://www.itspmagazine.com/itspmagazine-podcast-radio-hosts/marco-ciappelli

____________________________

Episode Notes

In this episode of the Chat on the Road On Location series for OWASP AppSec Global in San Francisco, Sean Martin hosts a compelling conversation with Steve Wilson, Project Lead for the OWASP Top 10 for Large Language Model AI Applications. The discussion, as you might guess, centers on the OWASP Top 10 list for Large Language Models (LLMs) and the security challenges associated with these technologies. Wilson highlights the growing relevance of AppSec, particularly with the surge in interest in AI and LLMs.

The conversation kicks off with an exploration of the LLM project that Wilson has been working on at OWASP, aimed at presenting an update on the OWASP Top 10 for LLMs. Wilson emphasizes the significance of prompt injection attacks, one of the key concerns on the OWASP list. He explains how attackers can craft prompts to manipulate LLMs into performing unintended actions, a tactic reminiscent of the SQL injection attacks that have plagued traditional software for years. This serves as a stark reminder of the need for vigilance in the development and deployment of LLMs.

Supply chain risks are another critical issue discussed. Wilson draws parallels to the Log4j incident, stressing that the AI software supply chain is currently a weak link. With the rapid growth of platforms like Hugging Face, the provenance of AI models and training datasets becomes a significant concern. Ensuring the integrity and security of these components is paramount to building robust AI-driven systems.

The notion of excessive agency is also explored—a concept that relates to the permissions and responsibilities assigned to LLMs. Wilson underscores the importance of limiting the scope of LLMs to prevent misuse or unauthorized actions. This point resonates with traditional security principles like least privilege but is recontextualized for the AI age. Overreliance on LLMs is another topic Martin and Wilson discuss.

The conversation touches on how people can place undue trust in AI outputs, leading to potentially hazardous outcomes. Ensuring users understand the limitations and potential inaccuracies of LLM-generated content is essential for safe and effective AI utilization.

Wilson also provides a preview of his upcoming session at the OWASP AppSec Global event, where he plans to share insights from the ongoing work on the 2.0 version of the OWASP Top 10 for LLMs. This next iteration will address how the field has matured and new security considerations that have emerged since the initial list.

Be sure to follow our Coverage Journey and subscribe to our podcasts!

____________________________

This Episode’s Sponsors

Are you interested in sponsoring our event coverage with an ad placement in the podcast?

Learn More 👉 https://itspm.ag/podadplc

____________________________

Follow our OWASP 2024 Global AppSec San Francisco coverage: https://www.itspmagazine.com/owasp-2024-global-appsec-san-francisco-cybersecurity-and-application-security-event-coverage

On YouTube: 📺 https://www.youtube.com/playlist?list=PLnYu0psdcllTcqoGpeR1rdo6p47Ozu1jt

Be sure to share and subscribe!

____________________________

Resources

OWASP Top 10 for Large Language Models: Project Update: https://owasp2024globalappsecsanfra.sched.com/event/1g3YF/owasp-top-10-for-large-language-models-project-update

Safeguarding Against Malicious Use of Large Language Models: A Review of the OWASP Top 10 for LLMs | A Conversation with Jason Haddix | Redefining CyberSecurity with Sean Martin: https://itsprad.io/redefining-cybersecurity-190

OWASP LLM AI Security & Governance Checklist: Practical Steps To Harness the Benefits of Large Language Models While Minimizing Potential Security Risks | A Conversation with Sandy Dunn | Redefining CyberSecurity Podcast with Sean Martin: https://itsprad.io/redefiningcybersecurity-287

Hacking Humans Using LLMs with Fredrik Heiding: Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models | Las Vegas Black Hat 2023 Event Coverage | Redefining CyberSecurity Podcast With Sean Martin and Marco Ciappelli: https://itsprad.io/redefining-cybersecurity-208

Learn more about OWASP 2024 Global AppSec San Francisco: https://sf.globalappsec.org/

____________________________

Catch all of our event coverage: https://www.itspmagazine.com/technology-cybersecurity-society-humanity-conference-and-event-coverage

To see and hear more Redefining CyberSecurity content on ITSPmagazine, visit: https://www.itspmagazine.com/redefining-cybersecurity-podcast

To see and hear more Redefining Society stories on ITSPmagazine, visit:
https://www.itspmagazine.com/redefining-society-podcast

Are you interested in sponsoring our event coverage with an ad placement in the podcast?

Learn More 👉 https://itspm.ag/podadplc

Want to tell your Brand Story as part of our event coverage?

Learn More 👉 https://itspm.ag/evtcovbrf

Episode Transcription

OWASP Top 10 For Large Language Models: Project Update | An OWASP 2024 Global AppSec San Francisco Conversation with Steve Wilson | On Location Coverage with Sean Martin and Marco Ciappelli

Please note that this transcript was created using AI technology and may contain inaccuracies or deviations from the original audio file. The transcript is provided for informational purposes only and should not be relied upon as a substitute for the original recording, as errors may exist. At this time, we provide it “as it is,” and we hope it can be helpful for our audience.

_________________________________________

Sean Martin: [00:00:00] And hello, everybody. You're very welcome to a new on location episode here on ITSB Magazine. I'm flying solo today. No Marco Ciapelli joining me, as many folks know when we're doing these conversations. Marco usually chats with me with our guests as we cover events, but he says when it's technical, leave me out of it.

You guys have fun. So here we are. I'm thrilled to have Steve Wilson on. Steve, how are you?

Steve Wilson: Doing great, Sean. Thanks for having me on.

Sean Martin: Yeah, good to have you on and, uh, and it's a topic near and dear to my heart. Broad, Broad's looking at AppSec is something that I love, love talking about. I think there's a lot of opportunity to, to do, uh, some good things in the space.

And of course there's no lack of interest in AI. And LLMs, and you're presenting on this topic around the project that you've, you've been working on at OWASP. You're presenting an update of the LLM [00:01:00] project at OWASP in San Francisco soon. So I'm excited to have a chat with you and understand a little bit more about the project and what people can expect with your session.

Before we get into that though, maybe a. A couple words on some of the things you work on, Olafsman otherwise, uh, to kind of set the stage.

Steve Wilson: Yeah, so, um, uh, I have a lot of things that I like working on, but I'd say all of them right now pivot around the crux of some combination of A. I. and cyber security. So, uh, my day job is I'm the chief product officer at X beam, which is a leader in A.

I. driven security operations. The company's been using. A. I. and machine learning and various. Formats for 10 years to detect anomalies and, and cybersecurity attacks. And, you know, that experience in the cybersecurity field using AI. You know, that kind of led me [00:02:00] as the discovery of all of, you know, the explosion of these large language models led me into getting interested in that and that led me to getting much more involved in OWASP last year and starting the OWASP top 10 for large language models project, which we'll talk about.

That in turn led me to a point where O'Reilly approached me about writing a book about LLM and AI security, which I just finished up and should be out next month. That's called the Developers Playbook for Large Language Model Security.

Sean Martin: I love it. And with any luck, I'll have, uh, have you back on to, uh, dig deeper into the book and what people can expect there.

Um, I, I've been fortunate enough to have a few conversations, uh, around the OWASP Top 10 for, uh, LLMs. When I was with, uh, Jason Haddox, we went through, we went through the Top 10 things and, and discussed what the impact [00:03:00] is to organizations if they don't address them and how the OWASP Top 10 helps with that.

And I also had the pleasure of speaking with, uh, Sandy Dunn, who does the, uh, Did the checklist, I believe for, for this as well to kind of help organizations take the top 10 to the next level. And so I'll, I'll include links to those, both, both of those conversations. Cause I think they're, they're very important.

Um, for the moment now, though, people haven't listened to those yet. Kind of describe to me why I think we all know why, but I want to hear it from you. Why, why, why we needed a top 10, uh, for. LLMs from OWASP. How did that whole, whole thing get started?

Steve Wilson: Yeah, it's, it's interesting. I mean, you, you have to rewind in the time machine a little bit and remember what the world was like in early 2023.

Um, you know, chat GPT had just come out. It went from zero to the world's most popular SAS [00:04:00] application in history in the matter of a few weeks. There was a ton of interest around this and, and people were just, Running off, starting to develop things, but working in the cyber security industry, I started doing some research on what does it mean to secure these things?

And, and people had written papers and there were a little scattered things, but frankly, there was nothing that was well organized on the topic. And. You know, working in the security industry, I was familiar with OWASP. I worked with people who were really in the OWASP community. And, um, and I was working actually closely with Jeff Williams at the time, who wrote the first OWASP top 10 list.

And, uh, I floated the idea with him of, What if we wrote a top 10 or large language models? And he actually encouraged me to pursue that and helped me get introduced to the right people at OWASP. Um, what was interesting is going back to when we started this and let's call it spring last [00:05:00] year was I announced it on my LinkedIn page.

I hoped I would find 10 or 15 like minded people in the world who are interested in this and we could build a little working group and discover some fun stuff. Uh, You know, just a random post on my LinkedIn page got tens of thousands of views. I had 200 people on a Slack channel within the first few days, all volunteering to help.

And it just grew from there. And, um, one of the things that we decided early on was, We had to do this quickly. Like there was, there was no guidance out there. The standards bodies, you know, the nists and the miters and all those as important as they are, they take a year, two years to do anything. We kind of put together a roadmap and said, let's do something in six weeks.

We got a lot of smart people here. Um, let's put something together quick and, and sort of the combination of the, the interest and the timing, it really blew [00:06:00] up and I, I don't know how many people have read it, but I guarantee you it's in the hundreds of thousands at this point.

Sean Martin: Yeah. And I think, uh, it's an important point.

I mean, I'm a huge fan of a wasp and, uh, To your point, there are tremendous numbers of incredibly smart people doing cool things and giving back through projects, small, big, large, all over the map, touching on different things. So I was thrilled to see this one come to bear. It interests me, or, I don't know if it interests me, but I find it Strange.

Well, I don't know. It's honestly, I can leave it at interesting. I guess the point that, this is, when Chat2BT came around with the interface where you could prompt through a UI. That's kind of what was new and took the world by storm. People were already building AI and large language model stuff before that.

I think, so we, we had the [00:07:00] risk there. I think the, the, the prompt based. Uh, exposure and then the wild success that the, that the UI driven, uh, chat GBT had kind of highlighted the fact that we really need to take, take some action here now.

Steve Wilson: Yeah. The, the way that I, I talk to people about it is. People have been developing stuff, real useful stuff with AI for decades.

I started my first AI company in 1992 and, you know, I sold software to Citibank and John Deere tractor, and they were using it to do interesting things, but, um, you know, the interesting thing about these AI applications is they were very much what I would call back office stuff. And, you know, it was the kind of stuff where the security is, you know, Could be very much traditional security where it's like, it's, hey, it's behind a firewall.

People don't see it. People don't touch it. Um, and the security research was, was actually a lot [00:08:00] of it was very academic about data poisoning, and it was more worried about Russian spies, implanting data that might affect, you know, U. S. defense. initiatives and things like that. The idea that these, these chatbots come out and be front of front and center.

Um, there'd been places where this had happened, you know, Microsoft put out a chatbot in sort of 2016. It was like a, supposed to be a cute, fun teen entertainment thing. And. People immediately hacked it using things like prompt injection and data poisoning and turned it into some sexist Nazi, um, was a PR disaster at the time.

So we've known those vulnerabilities were there, but with, with chat GPT immediately, every enterprise is thinking, how do I attach this to my enterprise data stores? How do I put this on my website? How do I give customers access to it? All that back office stuff went away. It's now frontline for your business.

And all of a sudden it went, it just shot to the [00:09:00] top of every CISO's list in terms of like, crap, how am I going to secure this?

Sean Martin: And the, and the growth of apps. And let's remember that if you're using an LLM, it's usually driven by an API that's used as part of some other app, which, so if you build an LLM driven service, you're probably Putting it into a bunch of things in the organization, uh, further exposing it as well.

Um, just for sake of clarity, I don't, I can run down them or, or you can either way, the, the, the current top 10, just to kind of paint that picture for folks.

Steve Wilson: Yeah, I don't think we need to read off the top 10, but I'll hit, I'll hit some of the ones that were, that were really key in the original top 10 and that do get a lot of interest.

Um, the one at the top, uh, for folks that are OWASPy, uh, the name will certainly sound familiar is it's called Prompt Injection and there's been some kind of injection attack on [00:10:00] almost every top 10 list for every technology going back to the very first top 10 list that had, you know, SQL injection at the top of the list.

Um, but Prompt Injection really in this case is, is anything where the attacker is using. I call a, you know, crafty prompt to get the bot to do something that's out of alignment with your wishes. And that could be between jailbreaking it and turning off all its guardrails. So you could use it for nefarious purposes or tricking it into giving you information that you shouldn't have.

Um, some of the other ones, uh, one of the ones that I talk about more and more, and we'll, we'll come back to this when we talk about the next iteration of the list is supply chain risk. Again, it's something OWASP people have become really familiar with. You look at things like Log4j, which was probably the biggest AppSec event of the last decade.

It was a supply chain issue. [00:11:00] Um, the AI software supply chain is a dumpster fire right now. Um, you know, it's this, this ecosystem has developed so fast that things like hugging face has exploded to become the get hub of AI stuff, but it's, it's grown up so fast, there isn't a lot of infrastructure for knowing what you're getting is from a good place.

And, um, you know, people have been doing studies. There's, there's thousands of tainted models and tainted trade. Training data sets and things that are out there. So where you're getting this stuff becomes very important. And how do you track the providence of what you're getting? And what are you putting into your application?

The basic problem set is the same as it has been, but whole new set of components. Um, some of them very different than, you know, sort of. Here's the Linux packages or the Python packages that I'm putting into my app.

Sean Martin: Um, and just [00:12:00] quickly, I think that I don't know, I don't know if I could count the number of times that this particular point came up in conversations that we had during Black Hat.

We did, I don't know, 24 podcasts for Black Hat and I don't know, maybe 15%, 20 percent of them had, had some, some connection to supply chain and APIs and NLMs and yeah, it's, it's definitely on top of mind for a lot of fans,

Steve Wilson: but I'll give you, I'll give you two more, um, which are the, the ones that have been, I'll call them controversial in certain ways because they're not the ones that sound like.

All the ones from the other top 10 lists. Um, one of them is what we call excessive agency. And, um, you could say with a little bit of stretch, it's kind of like least privilege, but really as, as people move from building, you know, chat bots to co pilots to autonomous agents, the [00:13:00] level of risk that you're taking on by providing responsibility for your LLM to take actions.

Goes up and up and up. And when we talk about excessive agency, um, it can be as much of a product management issue as it is an app sec issue. It's like, what are you designing your LLM to do? If your LLM is designed to execute a stock trade, then it's going to execute stock trades and. Uh, if, for example, someone uses a prompt injection to trick your bot into doing something, it might execute a trade that you didn't want it to execute.

And, you know, I take this all the way to the extreme, which is the example I use. Everybody's seen 2001, at least anybody who's a nerd. Um, You know, at the end of the movie, Hal turns off the life support systems for most of the crew. Um, why could Hal turn off the life support systems with no human in the loop?

Because they [00:14:00] gave him the agency to do that. And he probably shouldn't have had it. Um, and, you know, it is funny to look at 2001 now and go and watch it in a world where you use chat GPT every day. Boy, how's not science fiction at all anymore. It's like right in line. Um, last one I'll hit is what we call over reliance.

And this has to do with sort of the nature of hallucinations and people just believing stuff that LLMs Tell them that they shouldn't believe, and it gets you into a shocking amount of trouble if you don't manage this.

Sean Martin: So I want to, I like that last one as well, but I want to go back to the previous one and get your thoughts on this because obviously OWASP is designed to bring security and risk management back to the world of DevOps and engineering and application development.

And. In the old, I used to be a QA engineer and building stuff, testing stuff when it wasn't actually called AppSec, [00:15:00] but in the old days, I do have gray hair, in the old days, we could fairly easily define the scenarios that we knew we wanted this thing to function within, and therefore write code. Test cases and user scenarios to validate that what it's supposed to do it does what it's not supposed to do it doesn't to me when we when we throw in the LLM stuff, it's almost endless scenarios that are possible.

It seems to me. So I don't know your thoughts on that.

Steve Wilson: So this is this is one of the reasons that that I do. First off, when, when we talk about who the audience is for the top ten list, and then some of the other works that we've created from the group, like our CISO checklist, there's been so much interest from parties that aren't classically people who came to OWASP for guidance, but we've been out there on the front lines providing guidance, so these other audiences [00:16:00] Come and want to listen and want to get advice.

So we do put out a lot of guidance that's intended for, you know, CISOs or maybe even not really security people like product managers. And one of the first pieces of advice I do give people is. You want to limit the scope of what you're doing with the LLM, the more that you can constrain the scope of what it does, the less worries you have about what it's going to take on from an agency perspective, right?

If I really restrict its permissions, then it can do less. I tell people, not joking at all, that you need to treat your LLM as something between a confused deputy and an enemy sleeper agent in the middle of your app. And if you take that very skeptical attitude with that component, um, you know, kind of look where it sits in the trust boundaries.

You, you have to really scrutinize what goes on there. Um, and then, you know, you get to the point where you're thinking about from a [00:17:00] product management perspective, you know, how do I give the LLM enough data so that it knows how to do its job effectively and doesn't just hallucinate all the time because it's making stuff up, but how do I balance that with really restricting the data that it has access to so that it can't give the wrong data to the wrong person because it gets confused.

Sean Martin: Or wrong data to, uh, An authorized system.

So let's, let's shift over to your session. Uh, so all wasp global app sec in San Francisco, September 23 through 27. You're speaking on Thursday, the 26th at three 30, um, basically giving an update on the project. What can you share with us here? That will get people to join you in San Francisco for that session.

Steve Wilson: So, you know, we put out the first version of the list last summer, we updated it last fall, [00:18:00] um, and at that point we decided, look, we have some solid guidance out there, we're going to let it marinate a little bit. We're going to get a lot of feedback from people. We're also going to let this field mature a little bit and see where it goes.

And so we focused the early part of this year on mostly just doing evangelism, you know, going out and telling that story and, and bringing, bringing the guidance to more and more people, which was great. But earlier this year, we started what we call the 2. 0 project, which is What's the next major revision of the list going to be?

And, you know, we're aiming for that to come out later this fall. And so at the session, we're going to, we're going to talk about some of the things that we've learned since the first version of the list, what are the big topics that are cropping up? And what's maybe some of the ways that the list is evolving.

And we'll, we'll give a sneak peek at that, um, during the session.

Sean Martin: And how much, no specifics, cause we want people to hear the [00:19:00] whole session and maybe you can come back on and elaborate on some of the things that you presented and also heard from, from the group when you, when you had a chance to connect with them, there is the.

The feedback you're getting related to the top 10 specifically and, or are you hearing things of, well, here's how we made some assumptions, creating the top 10 based on how LLMs work, that's changed completely, how organizations are using them, that's changed completely as well, or significantly, I should say.

Yeah, there's just the, here's the feedback and what you did. And then also. Things, things have moved dramatically since the last time we put this up.

Steve Wilson: The good news is that the feedback on the first versions of the list has been just incredibly positive. People are really appreciative of the guidance and the fact that we managed to create a fairly tight document that people can understand and digest.

We've really seen [00:20:00] it get taken up by the industry and it's become kind of an underpinning to a lot of other standards bodies work. People like NIST and MITRE and things have taken it in and put it into their slower running standards bodies. And I think that's awesome. But what we have seen is the development patterns that people are using for LLMs have matured a lot.

Um, you know, not surprisingly, given how many people are doing it, how fast the space is moving. And, um, you know, just to tease a couple of the things that have kind of risen up the list, um, you know, we've talked about this idea of agency and wanting to limit that, but at the same time, people are putting a lot more autonomous agents.

Into practice. And so what does that mean? How do you, how do you approach that when you really do want to give it agency? What's the best way to do that? Um,

Sean Martin: uh, does that lead to people need to start thinking about response as [00:21:00] well? Looking for anomalies and then figuring out how to,

Steve Wilson: absolutely. I mean, I think that, um, You know, monitoring your LLM and logging everything that it's doing and looking at it almost like you do users with things like user behavior analytics.

That's, you know, analyzing what's going on in your LLM is incredibly important. Um, same thing when you're looking at what's going on inside the app. One of the other just huge topics and shifts in terms of how people develop this is, um, you know, in classic. AI security. A lot of your focus is on your training process and your training data.

And what we're finding is most of the people using LLMs right now do no training. They took a pre trained transformer and they're using patterns like retrieval augmented generation, or what you call rag. To give it data and context. And it turns out a lot of [00:22:00] security considerations with how you do that.

And so I think that's another thing finding its way to sort of top of consciousness for the expert group right now. Nice

Sean Martin: one. Well, I'm excited to, uh, to hear more during this session and, uh, continue to watch the progress of this, uh, of this project, I appreciate you did putting it out there to start and getting so many people involved in getting, getting something out quickly.

Sure. Continuing to invest in, uh, in updating it. Everybody should, uh, go to AppSecGlobal2024, San Francisco, September 23, 27, catch Steve for the project, project update for OWASP top 10 for LLMs on Thursday, the 26th at 3. 30. And of course I'll, I'll include links to, uh, to the session, to your book, which touches on this as well, um, as, as that's available and a couple other chats that I mentioned at the beginning.

Please do connect with Steve. [00:23:00] See everybody at OWASP Apps at Global in San Francisco very, very soon. Thanks, Steve, for joining me.

Steve Wilson: Thanks, Sean, for having me and I look forward to seeing everybody in San Francisco soon.

Sean Martin: Perfect. Thanks, everybody, for listening and watching. Please stay tuned. I, uh, I have some more, more things up my sleeve, uh, for OWASP AppSec Global in San Francisco and many more events that, uh, Mark and I plan to, uh, cover this year and into the beginning of next.

Please stay tuned to ITSB Magazine. Thanks, everybody.