OpenRouter Introduces Fusion To Combine Multiple AI Models For Stronger Deep Research Performance

OpenRouter introduced Fusion, a new tool designed to synthesize outputs from multiple AI models and produce stronger responses than any individual model can typically generate.

The company said Fusion enables users and developers to select a panel of participant models, along with a judge model that reviews the outputs, identifies agreements and disagreements, and helps generate a final answer grounded in the combined analysis.

OpenRouter said its testing showed that panels of models consistently outperformed individual models on deep research tasks. The company also said frontier model panels can achieve beyond-frontier performance, while lower-cost budget model panels can outperform some frontier models and approach the performance of higher-end systems at a lower cost.

Fusion is available through OpenRouter’s chat interface and via the company’s API. Developers can call Fusion directly using the OpenRouter Fusion model slug, customize a model panel through plugins, or add Fusion as a server tool so a model can decide when to use multiple perspectives for a more complex task.

To evaluate Fusion, OpenRouter used 100 deep research tasks from the DRACO benchmark, which tests reasoning, tool use, knowledge, citation quality, and the ability to synthesize complex information. The benchmark spans domains such as academic research, finance, law, medicine, technology, UX design, general knowledge, personalized assistance, and product comparisons.

In OpenRouter’s testing, a fused panel consisting of Fable 5 and GPT-5.5, synthesized by Opus 4.8, scored 69.0%, outperforming every individual model tested. Fable 5 alone scored 65.3%, while GPT-5.5 scored 60.0% and Opus 4.8 scored 58.8%.

OpenRouter also said a budget panel consisting of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro scored 64.7%, surpassing GPT-5.5 and Opus 4.8 while coming within 1% of Fable 5’s score at about half the cost.

The company said Fusion works by sending a prompt to multiple models in parallel, with each model using the same tools, including web search, web fetch, and bash. A judge model then evaluates the outputs and produces structured analysis covering consensus points, contradictions, partial coverage, unique insights, and blind spots. The final response is then written based on that combined analysis.

OpenRouter noted that Fusion is not intended to replace every model or workflow. Instead, the company positioned it as a way to get more thorough answers for complex questions where multiple perspectives, independent tool use, and synthesis can improve the final result.

The company also said Fusion can be useful for coding-related workflows when a coding model needs help on higher-level questions, such as architecture decisions or research into best practices.

OpenRouter said Fusion responses may take longer when invoked, often running two to three times longer than a standard model call because the system sends the prompt to multiple models, waits for responses, and then processes the results. The company said this approach is designed to balance standard model speed with the availability of deeper answers when needed.