LLM01: Prompt Injection

REDACTED, Mon Sep 04 2023 • developers cybersecurity OWASP Top 10

Our team is still analyzing all the data and identifying new vulnerabilities! So while we work on that, we wanted to provide a glimpse into what cyber security professionals currently view as the top vulnerabilities within the LLM/ Artificial Intelligence space.

The Open Worldwide Application Security Project (OSWAP) Released version 1.0.1 of their OWASP Top 10 for LLMs list. This is an update from version 1.0, release August 1, 2023. In less than a month, there was an update, but substantial changes between each release were not apparent. The OWASP Top 10 for LLMs list is built using the collective expertise of over 500 experts internationally. The core team is comprised of community members most representing organizations but some did personally.

The for the most up to date official list click here.

Here is a link to their github for the most current list.

Version 1.0, consisted of the following Top 10 LLM Vulnerabilities, as found in the first versions PDF

Vulnerability	Explanation
LLM01: Prompt Injection	This manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.
LLM02: Insecure Output Handling	This vulnerability occurs when an LLM output is accepted without scrutiny, exposing backend systems. Misuse may lead to severe consequences like XSS, CSRF, SSRF, privilege escalation, or remote code execution.
LLM03: Training Data Poisoning	This occurs when LLM training data is tampered, introducing vulnerabilities or biases that compromise security, effectiveness, or ethical behavior. Sources include Common Crawl, WebText, OpenWebText, & books.
LLM04: Model Denial of Service	Attackers cause resource-heavy operations on LLMs, leading to service degradation or high costs. The vulnerability is magnified due to the resource-intensive nature of LLMs and unpredictability of user inputs.
LLM05: Supply Chain Vulnerabilities	LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks. Using third-party datasets, pre- trained models, and plugins can add vulnerabilities.
LLM06: Sensitive Information Disclosure	LLM’s may inadvertently reveal confidential data in its responses, leading to unauthorized data access, privacy violations, and security breaches. It’s crucial to implement data sanitization and strict user policies to mitigate this.
LLM07: Insecure Plugin Design	LLM plugins can have insecure inputs and insufficient access control. This lack of application control makes them easier to exploit and can result in consequences like remote code execution.
LLM08: Excessive Agency	LLM-based systems may undertake actions leading to unintended consequences. The issue arises from excessive functionality, permissions, or autonomy granted to the LLM-based systems.
LLM09: Overreliance	Systems or people overly depending on LLMs without oversight may face misinformation, miscommunication, legal issues, and security vulnerabilities due to incorrect or inappropriate content generated by LLMs.
LLM10: Model Theft	This involves unauthorized access, copying, or exfiltration of proprietary LLM models. The impact includes economic losses, compromised competitive advantage, and potential access to sensitive information.

Version 1.0.1, had no changes to the Top 10 LLM Vulnerabilities, as found here

EXAMPLES

Direct prompt injections overwrite system prompts.
Indirect prompt injections hijack the conversation context.
A user employs an LLM to summarize a webpage containing an indirect prompt injection.

PREVENTION

Enforce privilege control on LLM access to backend systems.
Implement human in the loop for extensible functionality.
Segregate external content from user prompts.
Establish trust boundaries between the LLM, external sources, and extensible functionality.

ATTACK SCENARIOS

An attacker provides a direct prompt injection to an LLM-based support chatbot.
An attacker embeds an indirect prompt injection in a webpage.
A user employs an LLM to summarize a webpage containing an indirect prompt injection.

LLM02: Insecure Output Handling

EXAMPLES

LLM output is entered directly into a system shell or similar function, resulting in remote code execution.
JavaScript or Markdown is generated by the LLM and returned to a user, resulting in XSS.

PREVENTION

Apply proper input validation on responses coming from the model to backend functions.
Encode output coming from the model back to users to mitigate undesired code interpretations.

ATTACK SCENARIOS

An application directly passes the LLM-generated response into an internal function responsible for executing system commands without proper validation.
A user utilizes a website summarizer tool powered by a LLM to generate a concise summary of an article, which includes a prompt injection.
An LLM allows users to craft SQL queries for a backend database through a chat-like feature.

LLM03: Training Data Poisoning

EXAMPLES

A malicious actor creates inaccurate or malicious documents targeted at a model’s training data.
The model trains using falsified information or unverified data which is reflected in output.

PREVENTION

Verify the legitimacy of targeted data sources during both the training and fine-tuning stages.
Craft different models via separate training data different use-cases.
Use strict vetting or input filters for specific training data or categories of data sources.

ATTACK SCENARIOS

Output can mislead users of the application leading to biased opinions.
A malicious user of the application may try to influence and inject toxic data into the model.
A malicious actor or competitor creates inaccurate or falsified information targeted at a model’s training data.
The vulnerability Prompt Injection could be an attack vector to this vulnerability if insufficient sanitization and filtering is performed.

LLM04: Model Denial of Service

EXAMPLES

Posing queries that lead to recurring resource usage through high volume generation of tasks in a queue.
Sending queries that are unusually resource-consuming.
Continuous input overflow: An attacker sends a stream of input to the LLM that exceeds its context window.

PREVENTION

Implement input validation and sanitization to ensure input adheres to defined limits, and cap resource use per request or step.
Enforce API rate limits to restrict the number of requests an individual user or IP address can make.
Limit the number of queued actions and the number of total actions in a system reacting to LLM responses.

ATTACK SCENARIOS

Attackers send multiple requests to a hosted model that are difficult and costly for it to process.
A piece of text on a webpage is encountered while an LLM-driven tool is collecting information to respond to a benign query.
Attackers overwhelm the LLM with input that exceeds its context window.

LLM05: Supply Chain Vulnerabilities

EXAMPLES

Using outdated third-party packages.
Fine-tuning with a vulnerable pre-trained model.
Training using poisoned crowd-sourced data.
Utilizing deprecated, unmaintained models.
Lack of visibility into the supply chain is.

PREVENTION

Vet data sources and use independently audited security systems.
Use trusted plugins tested for your requirements.
Apply MLOps (Machine Learning Operations) best practices for own models.
Use model and code signing for external models.
Implement monitoring for vulnerabilities and maintain a patching policy.
Regularly review supplier security and access.

ATTACK SCENARIOS

Attackers exploit a vulnerable Python library.
Attacker tricks developers via a compromised PyPi package.
Publicly available models are poisoned to spread misinformation.
A compromised supplier employee steals IP.
An LLM operator changes T&Cs to misuse application data.

LLM06: Sensitive Information Disclosure

EXAMPLES

Incomplete filtering of sensitive data in responses.
Overfitting or memorizing sensitive data during training.
Unintended disclosure of confidential information due to errors.

PREVENTION

Use data sanitization and scrubbing techniques.
Implement robust input validation and sanitization.
Limit access to external data sources.
Apply the rule of least privilege when training models.
Maintain a secure supply chain and strict access control.

ATTACK SCENARIOS

Legitimate user exposed to other user data via LLM.
Crafted prompts used to bypass input filters and reveal sensitive data.
Personal data leaked into the model via training data increases risk.

LLM07: Insecure Plugin Design

EXAMPLES

Plugins accepting all parameters in a single text field or raw SQL or programming statements.
Authentication without explicit authorization to a particular plugin.
Plugins treating all LLM content as user-created and performing actions without additional authorization.

PREVENTION

Enforce strict parameterized input, perform type, and range checks.
Conduct thorough inspections and tests including SAST, DAST, and IAST.
Use appropriate authentication identities and API Keys for authorization and access control.
Require manual user authorization for actions taken by sensitive plugins.

ATTACK SCENARIOS

Attackers craft requests to inject their own content with controlled domains.
Attacker exploits a plugin accepting free-form input to perform data exfiltration or privilege escalation.
Attacker stages a SQL attack via a plugin accepting SQL WHERE clauses as advanced filters.

LLM08: Excessive Agency

EXAMPLES

An LLM agent accesses unnecessary functions from a plugin.
An LLM plugin fails to filter unnecessary input instructions.
A plugin possesses unnecessary permissions on other systems.
An LLM plugin accesses downstream systems with high-privileged identity.

PREVENTION

Limit plugins/tools that LLM agents can call, and limit functions implemented in LLM plugins/tools to the minimum necessary.
Avoid open-ended functions and use plugins with granular functionality.
Require human approval for all actions and track user authorization.
Log and monitor the activity of LLM plugins/tools and downstream systems and implement rate-limiting to reduce the number of undesirable actions.

ATTACK SCENARIOS

An LLM-based personal assistant app with excessive permissions and autonomy is tricked by a malicious email into sending spam. This could be prevented by limiting functionality, permissions, requiring user approval, or implementing rate limiting.

LLM09: Overreliance

EXAMPLES

LLM provides incorrect information.
LLM generates nonsensical text.
LLM suggests insecure code.
Inadequate risk communication from LLM providers.

PREVENTION

Regular monitoring and review of LLM outputs.
Cross-check LLM output with trusted sources.
Enhance model with fine-tuning or embeddings.
Implement automatic validation mechanisms.
Break tasks into manageable subtasks.
Clearly communicate LLM risks and limitations.
Establish secure coding practices in development environments.

ATTACK SCENARIOS

AI fed misleading info leading to disinformation.
AI's code suggestions introduce security vulnerabilities.
Developer unknowingly integrates malicious package suggested by AI.

LLM10: Model Theft

EXAMPLES

Attacker gains unauthorized access to LLM model.
Disgruntled employee leaks model artifacts.
Attacker crafts inputs to collect model outputs.
Side-channel attack to extract model info.
Use of stolen models for adversarial attacks.

PREVENTION

Implement strong access controls, authentication, and monitor/audit access logs regularly.
Implement rate limiting of API calls.
Watermarking framework in LLM lifecycle.
Automate MLOps deployment with governnce.

ATTACK SCENARIOS

Unauthorized access to LLM repository for data theft.
Leaked model artifacts by disgruntled employee.
Creation of a shadow model through API queries.
Data leaks due to supply-chain control failure.
Side-channel attack to retrieve model information.

Conclusion

It is important to realize this is not the FULL list of attacks vectors within the LLM/ AI space in relation to Cybersecurity.
Remember the organizations charter:

The OWASP Top 10 for LLM Applications Working Group is dedicated to developing a Top 10 list of vulnerabilities applicable to applications leveraging Large Language Models (LLMs). This initiative aligns with the broader goals of the OWASP Foundation to foster a more secure cyberspace and is in line with the overarching intention behind all OWASP Top 10 lists. - OWASP

They are only providing the top 10 most critical vulnerabilities, overall, without your organization's environment. Thus, it is important to ensure you complete an internal review within the context of your environment/ organization.

If you need help, please reach out to us at info@orlabs.tech or reach out to me!