Securing Your AI Applications: Navigating Contemporary Threats and Solutions
The remarkable evolution of Artificial Intelligence (AI) has become a cornerstone of tech innovation. From Waymo's self-driving triumphs to Google DeepMind's historic victory over Go world champion Lee Sedol, the AI landscape has radically transformed. Until recently, however, these feats were the exclusive domain of tech behemoths with deep pockets and unparalleled resources. The narrative shifted when OpenAI introduced ChatGPT, which amassed a million users within a mere five days. ChatGPT, a stellar example of Large Language Models (LLMs), opened the doors to Generative AI tools capable of creating text, images, and videos that rival human-made content. This democratization catalyzed a wave of GenAI-driven disruption across sectors like Marketing (Jasper, Bertha, Copy.ai), Sales (Gong), CRM (Lavender, Outplay), and Software Development (GitHub, Replit). McKinsey predicts that Generative AI will contribute $2.6–4.4 trillion to the global economy, ushering in a transformative era analogous to the impacts of software and cloud technologies. A Morgan Stanley report underscores that two-thirds of CIOs are poised to invest in AI in 2023.
Yet, burgeoning AI risks have left security experts on edge, and several companies, including Samsung and Apple, have banned ChatGPT usage. A data breach at OpenAI heightened security concerns, with existing security tools playing catch-up.
In this post, we’ll delve into the architectural blueprint and security architecture of AI applications, highlight the top AI risks and threats, and furnish pragmatic solutions along with a wealth of resources to mitigate them.
Enterprise AI Application Architecture
LLMs are trained on a plethora of data in a non-continuous, offline setup. For instance, GPT-3.5's training data ends in September 2021, rendering it incapable of referencing events after that point or data beyond its training set, such as non-public enterprise information. To bridge this gap, the Retrieval Augmented Generation (RAG) design pattern is gaining traction. RAG was invented by Meta and has been leveraged by leading companies such as Microsoft and Salesforce to build Microsoft 365 Copilot, GitHub Copilot, and Einstein GPT. It furnishes LLMs with use-case-specific information not present in their training data as part of the input (Figure 1). Documents are chunked and converted to vectors (arrays) of numbers, called embeddings, where each dimension corresponds to a feature or attribute of the language. A retrieval system then performs keyword and semantic searches, using these embeddings, over an enterprise knowledge base to identify relevant documents. These documents, along with the user query, are forwarded to the LLM, which processes this information to generate a contextually informed response.
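To make the flow concrete, here is a minimal, illustrative sketch of the RAG pipeline described above. The in-memory store and cosine similarity stand in for a real vector database, and embed_fn / call_llm are hypothetical hooks for your embedding model and LLM client of choice; this is a sketch, not a production implementation.

```python
# Minimal, illustrative RAG sketch. embed_fn and call_llm are hypothetical
# stand-ins for a real embedding model and LLM client; the in-memory store
# shows only the retrieval mechanics.
import math
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class InMemoryVectorStore:
    """Stores (embedding, text) pairs and retrieves chunks by similarity."""
    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], text: str) -> None:
        self._items.append((embedding, text))

    def search(self, embedding: list[float], top_k: int = 4) -> list[str]:
        ranked = sorted(self._items, key=lambda it: cosine(it[0], embedding), reverse=True)
        return [text for _, text in ranked[:top_k]]

def rag_answer(query: str,
               store: InMemoryVectorStore,
               embed_fn: Callable[[str], list[float]],
               call_llm: Callable[[str], str],
               top_k: int = 4) -> str:
    """Retrieve relevant chunks and pass them to the LLM alongside the query."""
    context = "\n\n".join(store.search(embed_fn(query), top_k))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```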
Secure AI Application Architecture
Three components are needed to secure AI applications: input validation, output validation, and secure retrieval. Input validation is applied to the user query (input), while the AI response (output) from the LLM goes through output validation. Secure retrieval enforces access control on document retrieval from the knowledge base using the user's identity and ensures that the AI response matches factual inputs from the knowledge base. This secure architecture, comprising these three components on top of the RAG architecture (Figure 1), is called SecureRAG (Figure 2). Several open-source and commercial solutions provide frameworks for implementing RAG, but we have not found any that implement all three security components of SecureRAG: input validation, output validation, and secure retrieval.
In the following sections, we will discuss the top AI application threats and how to mitigate them with SecureRAG (Figure 2).
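As a rough sketch of how the three SecureRAG components wrap the basic RAG flow, the snippet below chains input validation, identity-aware retrieval, and output validation around the LLM call. All of the hooks (validate_input, secure_retrieve, call_llm, validate_output) are hypothetical placeholders for the components discussed in the following sections.

```python
# Illustrative SecureRAG request flow: input validation, identity-aware
# retrieval, then output validation. The four hooks are hypothetical stand-ins
# for the components described in this post.
from typing import Callable

def secure_rag_answer(user_id: str,
                      query: str,
                      validate_input: Callable[[str], str],
                      secure_retrieve: Callable[[str, str], list[str]],
                      call_llm: Callable[[str], str],
                      validate_output: Callable[[str, list[str]], str]) -> str:
    """Run a user query through the three SecureRAG checks around the LLM call."""
    safe_query = validate_input(query)                  # 1. input validation
    documents = secure_retrieve(user_id, safe_query)    # 2. access-controlled retrieval
    context = "\n\n".join(documents)
    raw_answer = call_llm(f"Context:\n{context}\n\nQuestion: {safe_query}")
    return validate_output(raw_answer, documents)       # 3. output validation + grounding check
```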
Threats & Mitigations
Prompt injections
Similar to SQL injection, a nefarious tactic used against traditional software systems, prompt injections are executed by threat actors crafting malicious user queries to deceive models into unintended actions, such as producing abusive content or disclosing sensitive data. Known instances include the attacks published by Carnegie Mellon and Dropbox.
To mitigate, implement the following security checks (input validation) on user input (Figure 3); a simple sketch follows the list:
- Filtering: Employ heuristics and guardrails to filter out malicious input.
- Canonicalization & Text Encoding: Rephrase user queries through an AI model to transform potentially malicious input into a safe, canonical form.
- Allowlist & Blocklist: Utilize AI models to identify query intent and align user queries with the application domain and permitted actions, akin to security allowlist techniques, and block known malicious inputs.
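A minimal sketch of such input validation might look like the following, combining a small blocklist of known injection phrases with an allowlist intent check. The patterns and the classify_intent hook are illustrative assumptions, not a complete defense; production systems typically pair heuristics like these with model-based guardrails.

```python
# Illustrative input-validation pass: blocklist of known injection phrases
# plus an allowlist intent check. Patterns and classify_intent are hypothetical.
import re
from typing import Callable

BLOCKLIST_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"reveal (the )?(system|hidden) prompt",
    r"disregard .* guardrails",
]

ALLOWED_INTENTS = {"product_question", "billing_question", "troubleshooting"}

def validate_input(query: str, classify_intent: Callable[[str], str]) -> str:
    """Reject queries that match known-bad patterns or fall outside allowed intents."""
    lowered = query.lower()
    for pattern in BLOCKLIST_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError("Query rejected: matches a known prompt-injection pattern")
    if classify_intent(query) not in ALLOWED_INTENTS:
        raise ValueError("Query rejected: intent outside the application domain")
    return query.strip()
```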
A variety of open-source projects, such as ProtectAI's Rebuff, NVIDIA's NeMo Guardrails, and LLM Guard, along with companies like Cadea.AI building commercial solutions, aim to counteract prompt injections.
Malicious LLM output
Because they are trained on untrustworthy internet data, LLMs might generate harmful outputs that enable SQL injection, phishing, cross-site scripting (XSS), CSRF, and other attacks.
To mitigate, implement the following security checks (output validation) on the LLM/AI response (Figure 4):
- Output Validation: Treat LLM output as untrusted and ensure you have an output validation pipeline in place. We recommend malicious content removal along with output canonicalization and encoding.
The resources listed for input validation, with the exception of Rebuff, are applicable to output validation as well.
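As an illustration, a simple output validation step might flag script-like content and encode the response before it is rendered in a web page. The patterns below are examples only and assume the response will be displayed as HTML.

```python
# Illustrative output-validation step: treat the LLM response as untrusted,
# flag obvious active content, and HTML-encode before rendering.
# The patterns are examples, not an exhaustive filter.
import html
import re

SUSPICIOUS_OUTPUT_PATTERNS = [
    r"<script\b",      # inline scripts (XSS)
    r"javascript:",    # javascript: URLs
    r"on\w+\s*=",      # inline event handlers such as onclick=
]

def validate_output(response: str) -> str:
    """Reject script-like content, then HTML-encode the response for safe display."""
    lowered = response.lower()
    for pattern in SUSPICIOUS_OUTPUT_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError("LLM response rejected: contains potentially active content")
    return html.escape(response)
```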
Unauthorized Data Access
LLMs, unlike SQL databases, lack the granularity in provisioning distinct access rights to different users or groups. Granular access rights in an SQL database ensure that sensitive information is only accessible to authorized personnel.
Instead, LLMs typically operate on an all-or-nothing access principle, making it challenging to implement nuanced access controls. This is due to their inherent architecture, which processes the entire training data set without the capability to distinguish between different data segments or user permissions at inference time. This highlights a significant security challenge, as it can lead to unauthorized access to sensitive or critical information within the LLM's knowledge base, especially in scenarios where diverse user roles require differentiated access permissions. The absence of granular access control in LLMs remains an active blocker for many organizations.
In addition, if you are using enterprise data with third-party LLMs, you are also susceptible to those providers using your data to train their models and potentially leaking it to their other LLM users.
To mitigate:
- Access Control & Prompt Engineering: Exclude sensitive data from training datasets, enforce access control using the secure retrieval framework at inference time (Figure 2), and instruct LLMs to use only the information provided in the input. A similar mechanism can be used to segregate one tenant's data from another's; see the sketch after this list.
- Enterprise LLMs: Utilize enterprise-grade versions of third-party LLM services such as OpenAI's ChatGPT Enterprise, Azure OpenAI Service, AWS Bedrock, and Google Vertex AI for additional security and privacy assurances, including guarantees against the use of your data for model training.
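To illustrate the secure retrieval idea, the sketch below filters retrieved documents against the requesting user's groups before anything reaches the LLM. The Document shape and the lookup_groups hook (for example, backed by an identity provider such as Okta or Azure AD) are hypothetical assumptions.

```python
# Illustrative access-controlled retrieval: documents carry an ACL of allowed
# groups, and only chunks the requesting user may see are returned to the LLM.
# Document and lookup_groups are hypothetical stand-ins.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Document:
    text: str
    allowed_groups: set[str] = field(default_factory=set)

def secure_retrieve(user_id: str,
                    candidates: list[Document],
                    lookup_groups: Callable[[str], set[str]]) -> list[str]:
    """Filter retrieved documents down to those the user's groups may access."""
    user_groups = lookup_groups(user_id)  # e.g., resolved from an identity provider
    return [doc.text for doc in candidates if doc.allowed_groups & user_groups]
```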
Cadea
Cadea’s security platform provides end-to-end security for AI applications. It mitigates the threats outlined above, including prompt injection and unsafe content, by employing guardrails on inputs and outputs. Our SecureRAG solution integrates easily with existing identity management systems like Okta, OneLogin, and Azure AD, inheriting their access control policies. Additionally, SecureRAG connects to data stores like Google Drive, Jira, and wikis through 300+ connectors, enabling secure and efficient AI development.
Cadea is dedicated to reinforcing your AI security stance against major threats. For more information or to schedule a demo, visit Cadea or contact us at info@cadea.ai.