
Microsoft is expanding its Researcher AI agent with features that use multiple AI models to deliver higher-quality outputs in terms of accuracy, depth, and user confidence.
Researcher, which is designed to manage complex research activities, is being enhanced with a function called Critique, which uses one AI model to review and optimize the output of another AI model. The second new feature, Council, delivers multiple models’ responses side by side within the agent experience.
How Critique Works
Critique uses a combination of models from Anthropic and OpenAI to separate the generation of outputs from the evaluation of those outputs' quality. With Critique, one model leads the generation phase, planning the task, iterating through retrieval, and producing an initial draft, while a second model focuses on review and refinement, acting as an expert reviewer before the final report is produced.
Critique is intended to address a shortcoming common to AI-powered research features: they rely on a single model to handle planning, sourcing, synthesis, and writing. By giving evaluation equal weight with generation, Critique creates a feedback loop that delivers results with stronger factual accuracy and analytical breadth.
Critique follows a review process similar to those used in academic and professional research. It’s built around structured reviews that focus on strengthening the report without turning the reviewer into a second author. The reviewer model scrutinizes the report from several angles and then generates an enhanced report, optimizing for:
- Source Reliability — The reviewer emphasizes the use of reputable, authoritative, and domain‑appropriate sources, assigning priority to evidence that is verifiable and suitable for the context
- Completeness — The reviewer evaluates whether the final report comprehensively satisfies the intent of the person's request with relevant and unique insights
- Evidence Grounding Controls — The reviewer requires every key claim to be supported by reliable sources with precise citations, strengthening factual accuracy, reliability, and trust in the final output
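The generate-then-review flow described above can be sketched as a simple pipeline. The sketch below is illustrative only, not Microsoft's implementation: `call_model` is a hypothetical stand-in for any chat-completion API (stubbed here so the example runs offline), and the model names and prompts are assumptions.

```python
# Hypothetical sketch of a generate/critique pipeline; not Microsoft's code.
# call_model is a stub standing in for a real model-provider API call.

REVIEW_CRITERIA = [
    "Source reliability: prefer reputable, domain-appropriate sources.",
    "Completeness: does the report fully satisfy the request's intent?",
    "Evidence grounding: every key claim needs a precise citation.",
]

def call_model(model: str, prompt: str) -> str:
    # Stub: a real implementation would call the provider's API here.
    return f"[{model}] response to: {prompt[:40]}..."

def research_with_critique(query: str) -> str:
    # Phase 1: the generator model plans, retrieves, and drafts.
    draft = call_model(
        "generator-model", f"Plan, research, and draft a report on: {query}"
    )
    # Phase 2: a second model reviews the draft against the criteria,
    # critiquing rather than co-authoring.
    criteria = "\n".join(f"- {c}" for c in REVIEW_CRITERIA)
    critique = call_model(
        "reviewer-model",
        f"Review this draft against:\n{criteria}\n\nDraft:\n{draft}",
    )
    # Phase 3: the generator revises to address the critique.
    return call_model(
        "generator-model",
        f"Revise the draft to address the critique.\n"
        f"Draft:\n{draft}\nCritique:\n{critique}",
    )

report = research_with_critique("multi-model pipelines for research agents")
```

The key design point is the separation of roles: the reviewer shapes the output through structured feedback, but the generator remains the sole author of the final report.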
Using an industry research benchmark, Microsoft said Researcher enhanced with Critique yielded a 32% improvement in breadth and depth of analysis and a 46% improvement in presentation quality when compared to Researcher with a single model.
Microsoft said these results are possible because Critique pushes to identify missing analytical angles, close coverage gaps, sharpen formulations, and produce responses with stronger organization and clearer narrative flow.
Critique will be the default experience in Researcher.
How Council Works
Council produces multiple model responses and presents them side by side for evaluation in the Researcher experience. Council runs an Anthropic model and an OpenAI model simultaneously, with each model producing a complete, standalone report, surfacing facts, citations, and analysis that the other may overlook or weigh differently.
Once both reports are generated, a dedicated judge model evaluates them to create a summary of key findings, highlights where the models agree or diverge (including differences in magnitude, framing, or interpretation), and identifies unique contributions from each model.
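The parallel-drafts-plus-judge pattern can be sketched as follows. Again, this is a hypothetical illustration, not Microsoft's implementation: `call_model`, the model names, and the prompts are all assumptions, with the model call stubbed so the example runs offline.

```python
# Hypothetical sketch of a council-style run; not Microsoft's code.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Stub standing in for a real model-provider API call.
    return f"[{model}] report on: {prompt}"

def council(query: str) -> dict:
    models = ["model-a", "model-b"]  # e.g. one Anthropic, one OpenAI model
    # Each model produces a complete, standalone report, run concurrently.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        reports = list(pool.map(lambda m: call_model(m, query), models))
    # A judge model summarizes agreements, divergences, and what each
    # report uniquely contributes.
    judge_prompt = (
        "Compare these reports. Note where they agree or diverge and what "
        "each uniquely contributes.\n\n" + "\n\n".join(reports)
    )
    summary = call_model("judge-model", judge_prompt)
    return {"reports": dict(zip(models, reports)), "summary": summary}

result = council("state of multi-model research agents")
```

Because each report is generated independently, neither model's framing can bias the other; only the judge sees both outputs.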
The feature is available when Council is selected in the Researcher model picker.



