"Hallucinations" affect "reliability"! Salesforce executive says "trust in large models has declined" and usage has already been reduced.

Executives at enterprise software giant Salesforce have acknowledged that their confidence in large models has declined over the past year. The company is reducing its reliance on generative AI in Agentforce, its main AI product, and turning instead to more basic “deterministic” automation to improve software reliability.

On Monday, The Information reported that Sanjna Parulekar, Senior Vice President of Product Marketing at Salesforce, said: "All of us were more confident in large language models a year ago." The company now uses deterministic automation based on predefined instructions in Agentforce, rather than relying entirely on the reasoning and interpretation capabilities of AI models.

This strategic adjustment aims to address technical failures such as “hallucinations” that occur when large models handle precise tasks, ensuring that critical business processes follow exactly the same steps every time. Salesforce's website now highlights Agentforce's ability to “eliminate the inherent randomness of large models.”

Salesforce is one of the most valuable software companies, and its partial retreat from large models may affect the thousands of businesses that use this technology. Agentforce is currently expected to generate more than $500 million in annual revenue.

Technical Reliability Challenges Drive Strategic Shift

Salesforce has encountered multiple technical challenges with large models in practical applications. Agentforce CTO Muralidhar Krishnaprasad noted that when more than eight instructions are given to a large model, it begins to omit instructions, which is not ideal for tasks requiring precise handling.

The experience of home security company Vivint illustrates these issues. Vivint uses Agentforce to handle customer support for 2.5 million customers but has run into reliability problems. For example, despite being instructed to send a customer satisfaction survey at the end of every interaction, Agentforce sometimes fails to send it, for reasons that remain unclear.

To solve such problems, Vivint worked with Salesforce to set up "deterministic triggers" within Agentforce to ensure surveys are sent every time. Using this basic form of automation not only reduces operating costs but also allows the company to offer customers lower prices.
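The idea behind a deterministic trigger can be illustrated with a minimal Python sketch. It is not Agentforce's API or Vivint's actual setup; the names `Interaction`, `SupportSession`, `responder`, and `send_survey` are hypothetical. The point it shows is that the survey is sent by fixed code when an interaction closes, instead of depending on a model remembering an instruction.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Interaction:
    """Hypothetical record of one customer support conversation."""
    customer_id: str
    transcript: List[str] = field(default_factory=list)
    closed: bool = False
    survey_sent: bool = False


class SupportSession:
    """Wraps a possibly non-deterministic responder with a fixed close-out step."""

    def __init__(self, responder: Callable[[str], str],
                 send_survey: Callable[[str], None]) -> None:
        self.responder = responder      # stand-in for a model-backed agent
        self.send_survey = send_survey  # stand-in for a survey delivery service

    def handle_message(self, interaction: Interaction, message: str) -> str:
        # The reply may vary from run to run; nothing here affects the survey.
        reply = self.responder(message)
        interaction.transcript.extend([message, reply])
        return reply

    def close(self, interaction: Interaction) -> None:
        # Deterministic trigger: runs on every close, exactly once per interaction.
        if interaction.closed:
            return
        interaction.closed = True
        self.send_survey(interaction.customer_id)
        interaction.survey_sent = True


if __name__ == "__main__":
    sent: List[str] = []
    session = SupportSession(responder=lambda m: "Thanks, we are on it.",
                             send_survey=sent.append)
    chat = Interaction(customer_id="customer-001")
    session.handle_message(chat, "My door sensor is offline.")
    session.close(chat)
    print(sent)
```

Because the survey call lives in `close` and not in the prompt, it cannot be dropped the way a model may drop one instruction among many.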

Addressing AI "Drift" Phenomena

Salesforce executive Phil Mui described another key challenge in an October blog post: AI “drift.” According to Mui, the company’s “most complex clients” face difficulties when using AI: “when users ask irrelevant questions, AI agents lose focus on their main objectives.”

For instance, an AI chatbot programmed to guide customers to fill out forms may “lose focus” when customers ask unrelated questions. To address this issue, Salesforce has developed the Agentforce Script system, which minimizes the “unpredictability” of large language models by identifying tasks that can be handled by agents that do not use large models.

This system is currently in the testing phase, and aims to ensure that AI agents remain focused on core tasks when faced with off-topic queries.
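One way to picture how a scripted, non-model flow avoids drift is the small Python sketch below. It does not describe how Agentforce Script works internally, which the report does not detail. The class `FormFlow`, the fixed reply text, and the crude rule that treats any input ending in a question mark as off-topic are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical fixed reply used whenever the input looks off-topic.
OFF_TOPIC_REPLY = "I can only help with this form right now."


class FormFlow:
    """Scripted form-filling flow that keeps its place when the user digresses."""

    def __init__(self, fields: List[str]) -> None:
        self.fields = list(fields)
        self.answers: Dict[str, str] = {}
        self.index = 0  # position of the field currently being collected

    def is_complete(self) -> bool:
        return self.index >= len(self.fields)

    def current_prompt(self) -> Optional[str]:
        if self.is_complete():
            return None
        return f"Please enter your {self.fields[self.index]}."

    def step(self, user_input: str) -> str:
        if self.is_complete():
            return "The form is complete."
        text = user_input.strip()
        # Crude stand-in for an off-topic check (an assumption of this sketch):
        # empty input or a question does not advance the form.
        if not text or text.endswith("?"):
            return f"{OFF_TOPIC_REPLY} {self.current_prompt()}"
        self.answers[self.fields[self.index]] = text
        self.index += 1
        next_prompt = self.current_prompt()
        return next_prompt if next_prompt is not None else "The form is complete."


if __name__ == "__main__":
    flow = FormFlow(["name", "email"])
    print(flow.step("What is the weather like today?"))
    print(flow.step("Ada Lovelace"))
    print(flow.step("ada@example.com"))
```

The flow's state is an explicit index, so an unrelated question cannot move it off the current field; a production system would need a far better off-topic check than the question-mark rule used here.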

Adjustments and Optimization in Real-World Application

Salesforce has also adjusted how heavily it relies on large models within its own operations. CEO Marc Benioff previously stated that Agentforce, which partly relies on OpenAI’s large models, now handles the majority of Salesforce’s customer service inquiries, enabling the company to cut about 4,000 customer service roles. Even so, the company now appears to have reduced the use of large models by its customer service agents.

For example, last week when responding to requests for assistance with Agentforce technical issues, the company provided a list of blog post links rather than asking for more information or discussing possible issues. This approach resembles the method enterprises have used for years to handle customer or website visitor queries via basic chatbots.

A Salesforce spokesperson said that this year the company has "refined topic structures, reinforced safeguards, improved retrieval quality, and adjusted responses to be more specific, more contextually relevant, and better suited to real customer needs." The spokesperson stated that more customer problems are being resolved by support agents than ever before, and the number of resolved conversations is expected to increase by 90% in the fiscal year ending in January.

This trend reflects challenges faced across the industry. Earlier this month, a chatbot at Gap Inc. powered by enterprise AI startup Sierra answered questions about adult products and Nazi Germany, highlighting the common problem of large models straying from their intended use.
