Inferact Launches With $150 Million In Funding At An $800 Million Valuation To Commercialize vLLM As Inference Demand Surges

By Amit Chowdhry

Inferact, a new AI infrastructure startup founded by creators and core maintainers of the open-source vLLM inference engine, has launched with $150 million in seed funding at an $800 million valuation, positioning the company to push performance gains in large language model serving while keeping vLLM’s development in the open. The seed round was co-led by Andreessen Horowitz and Lightspeed Venture Partners, with participation from Sequoia Capital, Altimeter Capital, Redpoint Ventures, and ZhenFund. Inferact also listed additional backers, including The House Fund, Striker Venture Partners, Laude Ventures, Databricks Ventures, GC&H, and the UC Berkeley Chancellor’s Fund.

Inferact’s pitch centers on a growing bottleneck in AI: inference, the compute required to serve models in production. While model capabilities have advanced rapidly, the company argues that the systems required to deploy those models efficiently are struggling to keep up, especially as architectures diversify to include mixture-of-experts models, multimodal systems, and longer-running agentic workflows that increase “test-time compute.”

vLLM, originally developed out of UC Berkeley-affiliated research, has become a widely adopted open-source inference engine that sits between model developers and hardware platforms, optimizing how models run across different accelerators. Inferact says that position gives it early visibility into new model architectures and new silicon, with the stated goal of making “serving AI” feel closer to an infrastructure utility than a bespoke engineering effort.
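For context on what that developer-facing layer looks like, the sketch below uses vLLM’s public offline-inference API. The model name, prompt, and sampling settings here are illustrative choices, not details from the announcement.

```python
# A minimal sketch of generating text with vLLM's offline API.
# The model and sampling settings below are placeholders; any
# Hugging Face-compatible model identifier works the same way.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() accepts a batch of prompts and returns one result per prompt.
outputs = llm.generate(["Explain what an inference engine does."], params)
for out in outputs:
    print(out.outputs[0].text)
```

The same engine sits behind vLLM’s OpenAI-compatible server, which is how most production deployments consume it; the library handles scheduling and memory management underneath either entry point.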

Andreessen Horowitz, in announcing its investment, framed the bet around a sharp increase in inference workloads as AI applications expand and agentic systems run longer tasks, producing more tokens and increasing concurrent demand. The firm said that inference grows harder at scale because efficient serving depends on batching, caching, and low-level kernel and operator optimizations that vary across models and chips.
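The batching point is worth unpacking. One scheduling idea vLLM helped popularize is continuous batching, in which new requests join the running batch as soon as finished requests free a slot, rather than waiting for an entire batch to drain. The toy scheduler below illustrates the concept only; it is a simplified sketch, not vLLM’s actual implementation.

```python
# Toy illustration of continuous batching. Simplified: one "step" stands
# in for one decode iteration that produces one token per active request.
from collections import deque

def continuous_batching(requests, max_batch=4):
    """requests: list of (request_id, tokens_to_generate) pairs."""
    waiting = deque(requests)
    running = {}  # request_id -> tokens still to generate
    steps = 0
    while waiting or running:
        # Admit new requests into any free batch slots immediately.
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        # One decode step advances every running request by one token.
        steps += 1
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # slot is freed for the next waiting request
    return steps

# Eight requests of mixed lengths finish in fewer steps than static
# batching, which would idle slots until the longest request in a batch ends.
print(continuous_batching([(f"r{i}", 2 + (i % 5)) for i in range(8)]))
```

In a real engine the scheduler also has to manage the KV cache memory each request holds, which is where vLLM’s PagedAttention design comes in; the hard part at scale is doing this across heterogeneous models and accelerators.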

Inferact said it intends to “supercharge” adoption of vLLM while contributing optimizations back to the open-source community, emphasizing that vLLM’s development model will remain open rather than shifting behind proprietary tooling. The company is also recruiting engineers and researchers to work on inference performance, broaden support for emerging architectures, and expand coverage across new hardware targets.