Protege: $30 Million Raised To Expand Real-World Data Access For AI Development

By Amit Chowdhry • Today at 7:24 AM

Protege, an AI data platform focused on unlocking trusted, real-world datasets at scale, has raised a $30 million Series A round led by Andreessen Horowitz (a16z). The financing expands the company’s previously announced $25 million Series A from August 2025 and brings total funding to $65 million since Protege’s founding in 2024. Returning investors include Footwork, CRV, Bloomberg Beta, Flex Capital, Shaper Capital, and others.

Protege helps AI teams source and operationalize private and proprietary datasets across domains and formats—including media content, audio recordings, de-identified health records, and medical imaging—through licensing agreements with data providers. The company also supports dataset curation and optimization for training and evaluation workflows, and says it works with AI companies and institutions worldwide, including most of the “Magnificent Seven.”

Protege said it expanded its data partner network to hundreds of organizations in 2025, curating and packaging datasets from across that network and providing revenue-share payouts to partners when their data is used. The new capital will be used to accelerate product development, broaden the company’s data network into additional domains and formats, deepen institutional partnerships, and scale the team and infrastructure required to deliver AI-ready, rights-protected access to real-world data.

KEY QUOTES:

“Across industries, we’re seeing demand for real-world data grow faster than the market’s ability to supply it responsibly. At the same time, data is highly fragmented, and neither data holders nor AI builders are set up to operationalize it at scale. Protege serves as a trusted source of curated, AI-ready data while unlocking new revenue streams for data providers. Partnering with Andreessen Horowitz allows us to scale this model and deliver high-quality, use-case-specific data that AI research teams can trust.”

Bobby Samuels, CEO and Co-founder, Protege

“Access to data is the biggest bottleneck to the advancement of AI. The next phase of AI will be driven by real-world, proprietary data generated through everyday human activity. Protege is pioneering ways to safely access this information across data sources and compensate data owners to unlock AI’s potential.”

Travis May, Chairman and Co-founder, Protege

“The next era of AI will be shaped by who can responsibly unlock access to the world’s most valuable data. Protege has built a platform that respects the complexity of real-world data across industries while making it usable for modern AI development. Their momentum reflects a broader shift in the market, and we’re proud to support the team as they scale this critical layer of the AI ecosystem.”

Daisy Wolf, Partner, Andreessen Horowitz