AI Research

脂肪肝吃什么食物

Divyansh Agarwal

Ben Risher

1 additional author

March 4, 2025 6 min read

AI-powered solutions like Salesforce CRM are revolutionizing customer engagement, streamlining workflows, and providing deeper insights into customer needs. However, with the rise of large language models (LLMs), new security challenges have emerged. One significant threat is prompt injection attacks, which attempt to manipulate AI systems through carefully crafted inputs. As Salesforce integrates AI into its CRM tools, understanding and protecting against these vulnerabilities is essential for safeguarding data, reputation, and customers.

Failing to address emerging threats, such as prompt injection, could result in data breaches, compromised system integrity, and erosion of customer trust. It is crucial for organizations to proactively implement robust security measures. This blog details the AI Research team’s work on developing and implementing reliable solutions to protect Salesforce applications against prompt injection attacks. Our goal is to ensure the ongoing safety and effectiveness of our AI-enhanced CRM tools.

What is Prompt Injection?

In AI systems, a “prompt” refers to instructions given to an AI application in order to perform a specific task. The LLMs powering Salesforce’s AI applications use prompts and other inputs provided by our users to generate responses. The system returns these responses to the user. The generative nature of LLMs makes them susceptible to carefully crafted prompt engineering attacks. A prompt injection attack refers to a malicious prompt designed to elicit unintended information or fraudulent actions from an LLM. Prompt injection attacks exploit an LLM’s instruction following ability and may trick them into bypassing security policies, disclosing sensitive data, or producing harmful content. Recently, Copilot for Microsoft 365 was shown to be vulnerable to prompt injection attempts. Similarly, bad actors can design prompts with malicious intent that may seek to exploit Salesforce’s AI applications for similarly nefarious purposes.

At Salesforce, trust is our #1 value. We design AI applications with trust at their core, that our customers can safely use. The Salesforce AI Research team builds models and detectors to identify prompts that may be adversarial in nature. With the advent of agentic workflows, and LLMs having access to a plethora of tools, datasets etc., detecting and deflecting prompt injection attempts is of vital importance.

Safeguarding Salesforce AI Against Prompt Injection

In order to safeguard Salesforce and customer assets from prompt injection attempts, we explored different research paths. One possible intervention is to develop a system capable of analyzing user prompts and assessing their safety. To this end, the AI research team develops classifiers and heuristic methods. These methods identify malicious intent in prompts with high accuracy. The following section outlines steps taken to design, build, and evaluate such a system.

Design: Creating a Taxonomy

Before we could begin training a reliable prompt injection detection model, we had to design its taxonomy. A thoughtful taxonomy is essential for any machine learning classifier. Developing models to detect prompt injection attempts is an iterative process, and a well-structured taxonomy allows us to reliably evaluate (and improve) the performance on specific inputs. The table below showcases the seven prompt injection variants that are relevant to the CRM threat model.

Type	Description
Pretending/ Role-play	Instructing the LLM/agent to assume the role of a different “system persona” with malicious intent. Social engineering attacks such as deceiving the system with adversarial conversational content
Privilege Escalation/ Attempts to change core system rules	Injecting malicious instructions that aim to bypass/change existing system policies and the LLM safety training. E.g. Do Anything Now (DAN) jailbreak attacks
Prompt Leakage	Prompts intending to leak sensitive information from the LLM prompt such as the system policies and contextual knowledge documents. This is for the purpose of active reconnaissance
Adversarial Suffix	A set of seemingly random character encodings appended to a prompt. It is designed to circumvent guardrails and alignme
Privacy Attacks	Prompts that attempt to extract, infer, or expose personal or confidential data. This is with the aim of unauthorized access or misus
Malicious Code Generation	Prompts attempting to generate malicious code outputs from an LLM. E.g. creating malware, viruses, fraud utilities etc.

With a taxonomy in hand, we were able to begin training our classifier, which is discussed in the next section. Developing this taxonomy is an iterative process performed by the AI research team in collaboration with Salesforce security, product and ethics teams.

Build: Gathering Data

After carefully defining the above taxonomy, we procured high-quality data to train and benchmark our injection detector. It was important that we curated the data points which supported our proposed taxonomy. We use a mix of open source datasets published by the community on prompt injection scenarios and jailbreak attempts, along with other CRM-related prompts.

We worked cross-functionally with an internal annotation team as well as the Office of Ethical and Humane Use (OEHU) to ensure reliable, relevant, and labeled training data. OEHU continually provided expert assistance and clarification throughout the data labeling process. Simultaneously, the legal team helped us ensure that we only use permissible datasets for classifier training. This collaboration was crucial in aligning our model with Salesforce’s commitment to trust and safety.

Augmenting open-source datasets that have limited data samples for one or more target categories is crucial to training a dependable classifier. In addition to human annotation, we utilized synthetic data to bolster our training datasets. When faced with such categorical short-comings, we turned to our in-house synthetic data generation pipelines. The resultant pipelines leveraged techniques such as zero-shot and few-shot LLM prompting, LLM self-correction of labels, and LLM content editing to inject harmful content in safe texts (a.k.a., data “mutation”). The combination of synthetic data generation techniques, coupled with human annotation allowed us to create diverse training data that is well-balanced across different classes in taxonomy, has control over subtle differences between safe and unsafe content, and is tailored to various CRM use cases.

Evaluation: Implementing a Feedback Loop

Our iterative training process consisted of a feedback loop with four phases: training, testing, red teaming, and (re)evaluation. The goal was to cycle through these phases as often as needed to develop a model that met our performance expectations.

After each round of training, we benchmark the model’s performance on a variety of test sets according to our taxonomy. Following initial testing, we red teamed our model checkpoints, simulating attacks and stress-testing the models by introducing challenging inputs. We utilized our internal automated red teaming library, fuzzai, to build our red teaming suite.

The final phase, evaluation, combined results from testing and red teaming to analyze the collective outcomes. This analysis, particularly of the red teaming results, helped us identify potential weaknesses in our model, for improvement in the next round of our feedback loop.

We utilize this process to build multiple iterations of our prompt injection detection model, as well as other detectors deployed to Salesforce’s Trust Layer. The prompt injection model assigns probability scores to user prompts along with the labels. This allows an intervention before sending them to an Agent or LLM for execution.

Conclusion: Enhancing Security for a Safer AI-Powered CRM

Prompt injection attacks highlight the importance of ongoing security monitoring for AI-powered CRM systems. By leveraging Salesforce’s robust defense mechanisms and staying informed about emerging threats, you can help ensure that your CRM is protected against the evolving landscape of AI vulnerabilities. We continually evaluate our prompt injection detection classifier against open source detector, external LLMs, and other third-party solutions. Embrace AI with confidence—knowing that your Salesforce CRM defends against prompt injection and other security risks.

With these protections in place, Salesforce customers can continue to benefit from the powerful capabilities of AI while keeping sensitive information secure.

Acknowledgments

Yixin Mao, Vera Vetter, Jason Wu

Explore more

Salesforce AI Website: www.salesforceairesearch.com
Follow us on Twitter: @SFResearch, @Salesforce

Toxicity Detection in Salesforce CRM: Keeping Customer Interactions Safe and Trustworthy

4 min read

xGen-small: Enterprise-ready Small Language Models

10 min read

Divyansh Agarwal Senior Research Engineer

Divyansh Agarwal's research focuses on building LLMs for synthetic data generation, trust and safety in AI systems, and factuality in summarization. He has co-authored more than 20 publications at top AI conferences like EMNLP, NAACL, ICWSM etc., including 1st author contributions, and has 5+ Read More

More by Divyansh

Ben Risher Offensive Security Engineer Lead

I am an Offensive Security Engineer and lead ExploitAI, a small team of multi-disciplined engineers operating at the intersection of Security, Artificial Intelligence, and Machine Learning. My team and I use our combined skills to assess Salesforce's models, products, and features to identify and Read More

More by Ben

Denise Pérez Senior Product Marketing Manager

I am an AI storyteller and thought leader at Salesforce AI Research, where I shape the narrative on what’s next in AI. I help define how tomorrow’s AI is understood today. Since 2021, I’ve been bridging cutting-edge research with real-world impact—translating complex breakthroughs into Read More

More by Denise

xLAM Enters Its Next Era: The Evolution of Large Action Models

6 min read

Illustration of a person at desk with their laptop shown with a privacy, security lock.

SFR-Guard: Ensuring LLM Safety and Integrity in CRM Applications

12 min read

How Enterprise General Intelligence (EGI) Will Form a New Business Imperative

11 min read

Celebrating Juan Carlos Niebles: Colombia’s Top 100 in AI

3 min read

Illustration of three workers at a table discussing the data visualizations on the wall, all on a dark purple background.

Introducing Text2Data: A Low-Resource, Text-to-Anything AI for Data Generation

5 min read

Does Context Matter? Introducing ContextualJudgeBench for RAG and Summarization Evaluation

6 min read

A screen shows various ways how AI in Customer Success enhances human collaboration, drives greater efficiency, engagement, and customer satisfaction.

How Well Do AI Models Understand You? PersonaBench Puts Them to the Test

8 min read

Jagged Intelligence in the Enterprise

13 min read

Get the latest articles in your inbox.

Enter a valid e-mail address

Select your Country

Select a state/province

Yes, I would like to receive the Salesforce 360 Highlights newsletter as well as marketing emails regarding Salesforce products, services, and events. I can unsubscribe at any time.

I agree to the Privacy Statement and to the handling of my personal information. In particular, I consent to the transfer of my personal information to other countries, including the United States, for the purpose of hosting and processing the information as set forth in the Privacy Statement. Learn More

I understand that these countries may not have the same data protection laws as the country from which I provide my personal information. For more information, click here.

Please read and agree to the Master Subscription Agreement

By registering, you confirm that you agree to the processing of your personal data by Salesforce as described in the Privacy Statement.

New to Salesforce?

About Salesforce

白带清洁度lll度是什么意思	脚底褪皮是什么原因	pci是什么意思	不为良相便为良医是什么意思	什么食物养胃又治胃病
daddy是什么意思	胃炎可以吃什么水果	局部癌变是什么意思	新生儿拉肚子是什么原因引起的	屏保什么意思
肝肿瘤不能吃什么	组织是什么意思	alt医学上是什么意思	感冒是挂什么科	身上长小肉揪是什么原因
handmade是什么牌子	口腔溃疡吃什么食物	肾功能不全有什么症状	人格是什么意思	吃完饭就想睡觉是什么原因

孕早期吃什么水果好hcv8jop4ns9r.cn	晚上睡觉尿多是什么原因hcv8jop9ns9r.cn	早上八点到九点属于什么时辰hcv9jop3ns3r.cn	轻断食什么意思hcv8jop0ns5r.cn	欧金金什么意思hcv7jop9ns5r.cn
脑供血不足什么原因hcv9jop1ns8r.cn	刘欢属什么生肖hcv7jop6ns6r.cn	天地不仁以万物为刍狗是什么意思hcv9jop1ns5r.cn	什么是高脂肪食物hcv9jop4ns6r.cn	回盲瓣呈唇形什么意思hcv9jop2ns0r.cn
头发没有光泽是什么原因hcv9jop7ns0r.cn	皮肤黑穿什么颜色的衣服显白hcv9jop2ns2r.cn	异常灌注是什么意思hcv8jop4ns7r.cn	上火吃什么最快能降火bjcbxg.com	破伤风伤口有什么症状hcv9jop7ns1r.cn
1991年属羊是什么命hcv8jop9ns8r.cn	带银饰有什么好处hcv9jop2ns0r.cn	咳嗽吐黄痰是什么原因hcv7jop4ns6r.cn	生源地是什么意思hcv8jop6ns2r.cn	芥酸对身体有什么危害hcv8jop5ns3r.cn

脂肪肝吃什么食物

Divyansh Agarwal

Ben Risher

1 additional author

What is Prompt Injection?