Copilot Studio Tools Upgraded to Bring AI Tests Into Alignment With Human Evaluations

By Tom Smith | February 5, 2026

Microsoft this week filled in details on enhancements to the PowerCAT Copilot Studio Kit, which augments Copilot Studio to help develop, govern, and test custom AI agents.

Foremost among the new features is Rubrics Refinement, which is used to create, test, and improve evaluation standards, or rubrics, that measure the quality of AI agents and the responses they generate.

Rubrics Refinement helps ensure that AI grading of an agent’s responses aligns with human judgment and therefore measures up to organizational quality standards. It helps address the growing need among enterprises for AI agents built with robust quality, testing, and validation, in line with traditional enterprise software standards.

Other new features in PowerCAT Copilot Studio Kit include a series of enhancements that similarly focus on measuring the quality of agents and their outputs, as well as broader agent governance.

Rubrics: How They’re Used, How They’re Advancing

In an AI agent context, a rubric is a set of natural-language grading instructions that an AI judge uses to evaluate the quality of an agent's responses, describing what constitutes a quality response and how grades should be assigned.

An AI judge, in the form of an LLM, produces not only a grade but also a rationale explaining its assessment. A human grades the same responses, and users know the rubric is working as intended when the AI and human grades align. When they don't, the rubric needs improvement: an AI judge's grading quality depends on the rubric's quality.
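To make the rubric-plus-judge idea concrete, here is a minimal sketch of how a rubric might be combined with a question/answer pair into an evaluation prompt for an LLM judge. The rubric text, grade scale, and `build_judge_prompt` helper are illustrative assumptions, not the Kit's actual API.

```python
# Hypothetical sketch: assembling an LLM-as-judge prompt from a rubric.
# Everything here (rubric wording, pass/fail scale, helper name) is an
# illustrative assumption, not taken from the Copilot Studio Kit.

RUBRIC = """\
Grade the agent's answer on accuracy and tone.
- pass: factually correct, cites the right policy, polite tone
- fail: incorrect, missing required policy details, or rude tone
Return a grade and a one-sentence rationale."""

def build_judge_prompt(rubric: str, question: str, answer: str) -> str:
    """Combine the rubric with one question/answer pair for the AI judge."""
    return (
        f"You are an evaluator. Apply this rubric:\n{rubric}\n\n"
        f"Question: {question}\n"
        f"Agent answer: {answer}\n"
        "Grade:"
    )

prompt = build_judge_prompt(
    RUBRIC,
    "What is the return window?",
    "You can return items within 30 days of delivery.",
)
print(prompt)
```

In practice, the resulting prompt would be sent to an LLM, whose graded output (grade plus rationale) is then compared against a human grade for the same response.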

In the absence of systematic evaluation criteria, organizations struggle to define standards, compare grades, and identify where rubric instructions need improvement.

That’s where Rubrics Refinement comes in, with the goal of maximizing alignment between AI and human evaluations.

It does so through:

  • Reusable evaluation standards, which define rubrics once and reuse them across agents and tests
  • Alignment with human judgment, systematically minimizing disagreement between AI and human graders
  • Quality assurance that establishes durable assets in the form of organizational quality standards
  • Confidence in AI evaluation, building trust through transparent, iterative refinement

Rubrics Refinement involves these steps:

  1. Defining a rubric and evaluation criteria
  2. Running tests using the rubric to generate AI grades
  3. Reviewing agent responses and providing human grades
  4. Comparing AI and human assessments to identify misalignment

Following these steps, users refine and repeat as needed.

Response optimization — the actual steps to improve the quality of an agent’s answers — takes place in Copilot Studio itself. Rubrics Refinement focuses purely on ensuring the organization’s evaluation criteria accurately reflect human judgment so that automated grading results can be trusted.

Microsoft noted that rubrics are used for two distinct “levels”: the test case level, for test automation with custom grading, and the test run level, for iteratively refining and improving the rubric. Rubrics Refinement supports both levels.

The company said Rubrics Refinement is designed for use by quality assurance teams, agent builders, enterprises, and anyone seeking trustworthy AI evaluations.

Additional Agent Quality Features

Several additional features in the PowerCAT Copilot Studio Kit address agent quality, response quality, and governance. They include:

Compliance Hub, which defines and enforces governance policies for Copilot Studio agents, continuously evaluating agent configurations against risk thresholds and automatically creating compliance cases when violations are detected.

Conversation KPIs, which are designed to track and analyze the performance of custom agents, making conversation outcomes easier to understand by providing aggregated data in Dataverse rather than requiring analysis of complex conversation transcripts.

Agent Inventory, which provides a tenant-wide view into all Copilot Studio custom agents, including the features those agents use, their knowledge sources, and more. Agent Inventory ships with a dashboard, and the data it captures can be exported for use in other applications.

Conversation Analyzer, which allows users to review the conversations of their custom agents using custom prompts to get additional insights.

Future releases of the Copilot Studio Kit, the company said, will include enhanced diagnostics and analytics, and governance features for approvals, lifecycle management, and publishing.

More on AI Agent Testing and Governance:

  • Microsoft Advances Copilot With Automated Agent Test Tools, Built-In Collaboration in Teams App
  • Microsoft Fills in Agent 365 Management and Governance Details

AI Agent & Copilot Summit is an AI-first event to define opportunities, impact, and outcomes with Microsoft Copilot and agents. Building on its 2025 success, the 2026 event takes place March 17-19 in San Diego. Get more details.

