How to Test Agentforce Agents Using Testing Center?

Table of Content

Author

Vikas Deep
Vikas Deep

Date

Vikas Deep
Jul 2, 2026

How to Test Agentforce Agents Using Testing Center?

Many Agentforce teams run a few checks in Agent Builder, feel confident, and move the agent to production.

Then the edge cases appear: the wrong topic gets selected, an action does not run, or the response misses the user’s request. Finding these issues after launch is costly and quickly damages trust.

Agentforce Testing Center helps catch them earlier. You can upload test scenarios, run them in bulk, and review clear pass-or-fail results for topic selection, action execution, and response quality.

This guide explains how Testing Center works, how to create test cases that uncover real failures, and how to add testing to your release process, from initial setup to production deployment.

What is Agentforce Testing Center?

Agentforce Testing Center is a batch testing environment built into Salesforce Setup. You supply a set of test cases, each pairing a user utterance with expected outcomes, and the Testing Center runs them all in parallel against your agent. 

It returns pass/fail results across three dimensions: topic selection, action execution, and response quality.

It is not a monitoring system. It does not watch production traffic or flag live failures. It runs in a sandbox only. Every test executes in a non-production environment, which means agent interactions can modify CRM data without affecting live records. 

That constraint is intentional,  and important to plan around before you build your test suite.

How to Access Agentforce Testing Center? 

Agentforce Testing Center is available through Salesforce Setup:

Setup > Einstein > Einstein Generative AI > Agent Studio > Testing Center

Alternatively, open any agent in Agent Builder and click Batch Test to reach the Testing Center directly from the agent you are working on. The below are the required access: 

  • Lightning Experience,
  • Enterprise, Performance, Unlimited, or Developer edition,
  • Agentforce licence: no separate add-on required,
  • Sandbox or non-production org: production is not supported,
  • Admin or developer permissions with Agentforce access.

Test Case Structure

Each test case contains six fields. Only the utterance is required, but at least one other field must be populated — empty values are treated as test failures.

Field What to Enter Notes
Utterance The user input you want to test Required. At least one other field must also be filled.
Expected Topic Topic API name (e.g. BillingIssues) Must be the API name, not the label.
Expected Actions Actions the agent should execute (e.g. QueryRecords) Multiple actions per case are supported.
Expected Response Description of the intended agent output Plain language — not exact text matching.
Expected Subagent Which subagent should handle the utterance Optional. Only relevant for multi-subagent setups.
Conversation History Prior agent and user messages for multi-turn tests The last message in history must be from the agent, not the user.

One important detail on topic API names: using the topic label instead of the API name is one of the most common reasons test cases fail incorrectly. Check the API name in Agent Builder before building your CSV.

Running Tests in Testing Center

The Testing Center accepts test cases in two ways.

Option 1: Upload a CSV

Download Salesforce's test case template from Testing Center, populate it with your utterance-outcome pairs, then upload the file, name the test, select the agent, and run. This is the standard method for structured test suites built by a developer or QA team.

Option 2: AI-Generated Test Cases

Click Generate Test Cases in the Testing Center. Provide a test name, select the agent, and write a plain description of the scenarios to cover, for example, 'Test customer account lookup queries with different phrasings.' Salesforce generates utterances automatically and produces a downloadable CSV.

AI-generated cases give you a fast starting point. They do not replace deliberate edge-case and negative-case design. Salesforce generates test cases via AI and loads them on the screen, the downloadable CSV is one option that comes after the test suite is loaded in the system. So the correct point would be to review the AI generated test suite and once they have the status ready to run, click Run test suite.

Reading Test Results

Summary metrics display at the top of the results view: total duration, Topic Pass %, Action Pass %, and Response Pass %. These three percentages are your primary health indicators.

Column What It Shows What Failure Means
Topic Test Result Whether the agent selected the expected topic Topic definition, scope, or classification wording needs adjustment
Action Test Result Whether the agent executed the expected actions Action order, conditions, or instructions are misconfigured
Outcome Test Result Whether the actual response matched the expected description Response instructions, output format, or grounding is off

You can filter results by All, Passed, or Failed. Download the full results as CSV for sharing or tracking progress across test runs.

On non-deterministic results: LLM-based agents do not always produce the same output for the same input. The same utterance can route to different topics on different runs. This is not a Testing Center bug, it is a characteristic of how large language models work. If you see the same test case passing and failing inconsistently, the agent's topic instructions are not specific enough. That is where to fix it, not in the test case.

Multi-Turn Conversation Testing

Testing Center supports conversation history, which lets you validate agent behaviour across a sequence of exchanges, not just single isolated messages. This matters for any agent that maintains context across a conversation: a service agent that needs account verification before taking action, or a sales agent that gathers qualification data across several turns.

To set up multi-turn tests: include prior messages in the Conversation History column, alternating between user and agent messages. The last message in history must always come from the agent. Each user message in the sequence is evaluated independently against its expected response, with the full preceding context carried forward.

Test-Driven Agent Development: The Workflow

A disciplined test cycle has five steps. Running them in order prevents the most common mistake, promoting an agent that passed happy-path tests but fails on realistic edge cases.

  • Create test cases: Write or generate utterances covering normal usage, edge cases, and negative cases. Define expected topics and actions for each. Do not skip negative cases, testing that the agent correctly declines or redirects is as important as testing that it succeeds.
  • Run the initial test suite: Upload the CSV to the Testing Center, execute, and review failures. Do not adjust the agent before you have a baseline result.
  • Refine the agent: Open failing utterances in Agent Builder. Adjust topic definitions, guardrails, or action instructions. Adjust Expected Response descriptions if they were imprecise, not only when the agent fails.
  • Retest: Rerun the suite after each round of changes. Verify that fixes resolve failures without introducing new ones.
  • Iterate: Add new test cases as you identify gaps. Before production deployment, all critical test paths should be passed consistently across multiple runs.

Constraints to Know Before You Start

1. Sandbox Only

Agentforce Testing Center runs exclusively in non-production environments. This is not a configuration choice, you cannot run tests against production agents. Tests modify CRM data during execution, which is why production access is blocked. Build sandbox parity with production agent configuration before running tests.

2. Credit Consumption

Each test case run consumes Agentforce credits. Large suites of hundreds of cases add up. Monitor consumption through Digital Wallet in Setup, and prioritise test cases that cover high-risk scenarios rather than building exhaustive suites indiscriminately.

3. CSV Validation

Incorrect topic API names or action names cause failures that look like agent problems but are actually test case problems. Before running any suite, verify that every API name in your CSV matches exactly what is configured in Agent Builder. Use Setup > Agent Studio to cross-check.

Example: Service Agent Validation

A customer support team builds a service agent handling account, billing, and subscription requests. They create a CSV with 50 utterances across those three topic areas, including paraphrase variants and edge cases.

Initial results:

  • Topic Pass %: 96%
  • Action Pass %: 92%
  • Response Pass %: 88%

Two failure patterns emerge: utterances phrased as 'unsubscribe' route to the wrong topic instead of Subscription, and balance queries consistently miss the RetrieveAccount action in the expected sequence. 

The team adjusts the Subscription topic's classification description and adds the missing action to the balance query flow. A second test run confirms both issues are resolved. The agent goes to production with documented pass rates rather than anecdotal confidence.

That last point matters. Documented pass rates give the deployment decision a defensible basis. Anecdotal confidence does not.

Testing Center in the Agent Lifecycle

Testing Center validates behaviour before deployment. It is not a substitute for post-deployment monitoring.

After production release, Agentforce Analytics tracks live topic selection accuracy and response patterns. Utterance Analysis shows how the agent handles specific real-world inputs. 

Production feedback from these tools feeds your next test cycle, you add new test cases based on observed failure modes, run them in a sandbox, refine, and redeploy.

The Testing Center is the pre-flight check. Production observability is the ongoing instrument panel. Both are required.

Conclusion

Getting your agent into production is one milestone. Getting it to behave reliably in production is the real work.

The Testing Center gives you the structure to validate that behaviour before users encounter it. The teams that use it well are not the ones with the most test cases, they are the ones who treat testing as part of the build process, not a final check before go-live.

If you are building or deploying an Agentforce agent and want a review of your test approach before go-live, MIDCAI’s AI implementation services covers agent architecture, topic and action design, and test-driven deployment processes.

Talk to a MIDCAI Agentforce specialist about your deployment and testing setup.

No items found.

About the Author

Vikas Deep

5+ years of experience delivering Salesforce solutions across Marketing Cloud, Sales Cloud, Data Cloud, Agentforce, and Marketing Cloud Intelligence. At MIDCAI, I help shape functional approaches that simplify complex requirements and support stronger business outcomes.

Similar Blogs

Ready to future-proof your business?

Get in touch with us for any enquiries and questions

Get in touch

Define your goals and identify areas where technology can add value to your business

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Join minds that move technology

We are looking for passionate people to join us on our mission.

Let’s build what’s next

where your skills fuel innovation and your growth powers ours

Salesforce Technical Lead
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Let’s work through it together.

CRM services that bring your data, teams, and

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.