Traceloop announced its general availability and the close of a $6.1 million seed round led by Sorenson Capital and Ibex Investors, with additional participation from Y Combinator, Samsung NEXT, and Grand Ventures.
Public model benchmarks do not reveal how a new AI model will perform in a business setting. Today, teams typically learn that an agent has gone wrong only after it fails in front of users, then spend days tweaking prompts in a tedious trial-and-error process. The software engineering practices of Continuous Integration and Continuous Deployment (CI/CD) have yet to be established for AI agents, and the result is customer churn as users disengage from unpredictable, buggy assistants.
New agent frameworks and protocols from OpenAI, Anthropic, and Google make it easier to connect AI systems to external data and trigger autonomous actions. However, as these agents grow more complex, developers face two persistent challenges. First, they lack visibility into how decisions are made. Second, they lack a reliable method for evaluating performance in real-world conditions. Public benchmarks and testing methods, with their varying and often arbitrary criteria, fall short once applications are deployed in production. When AI agents misfire, whether by hallucinating, taking the wrong action, or producing unpredictable outputs, users do not file bug reports. They disengage.
Built on the open-source OpenLLMetry project, Traceloop is now available as a commercial platform that helps teams test, troubleshoot, and improve AI agents before they reach users. By replacing manual “vibe checks” with automated evaluations, it gives teams the tools to reduce guesswork, catch quality issues before they reach production, and deploy changes more frequently with data-backed confidence, enabling faster iteration, more reliable outputs, and greater trust in every release.
Traceloop was founded by a veteran team with combined experience in machine learning, artificial intelligence, and enterprise software development. Co-founder and CEO Nir Gazit spent four years at Google, where he led a team of engineers building models to predict user engagement and retention using internal LLMs. Co-founder and CTO Gal Kleinman previously led the development of Fiverr’s machine learning platform and data infrastructure.
The company’s open-source technology, OpenLLMetry, already powers critical AI systems at enterprise scale. IBM, for example, uses OpenLLMetry with its Instana platform to monitor the performance of large language models running on services like Amazon Bedrock and IBM watsonx.ai, helping teams understand how AI applications behave in real-world conditions. Altogether, OpenLLMetry sees half a million monthly installs across its open-source packages, with 5,600 GitHub stars, more than sixty contributors, and 50,000 weekly active installations of the SDK.
How the funding will be used: to accelerate product development, expand go-to-market efforts, and support Traceloop’s mission to make AI agents production-ready and enterprise-grade.
KEY QUOTES:
“Prompt engineering shouldn’t be a guessing game or have to rely on ‘vibes’ to be successful. It should be like the rest of engineering – observable, testable, and reliable. When we bring the same rigor to AI that we expect from the rest of our stack, we unlock its full potential.”
Nir Gazit, co-founder and CEO, Traceloop
“Trust but verify. It’s no secret that LLMs represent a step-function improvement in how humans interact with data. But their confidence — and potential for inaccuracy — makes AI agents that much more dangerous. IBM, Cisco, Dynatrace, and others already rely on Traceloop’s core technology for agent observability and verification, ensuring that AI agents function as intended. I expect the adoption of verification tools to outpace that of LLMs themselves.”
Aaron Rinberg of Ibex Investors
“Miro Insights processes millions of conversations across their platform. At that scale, edge cases appear almost instantly. We can’t assume that what works in testing will behave the same way in production. Traceloop gives us real-world performance visibility, flags critical edge cases, and helps us confidently experiment with and migrate to new models like GPT-4.1 without disrupting the user experience.”
Eu-Tak Kong, AI Engineer at Miro
“Agents are rapidly becoming AI’s de facto customer-facing technology, but teams are still relying on customer feedback to iterate. Traceloop has the right technology at the right moment to offer substantial value immediately.”
Vidya Raman, partner at Sorenson Capital