Tensormesh: $20 Million Raised To Scale KV Caching Infrastructure For Enterprise AI Inference

By Amit Chowdhry • Today at 1:55 PM

Tensormesh announced $20 million in new funding from investors including AMD’s AMD Ventures, CoreWeave, NVIDIA’s NVentures, Valley Capital Partners, and Laude Ventures, extending its seed round and bringing total funding to $24.5 million.

Alongside the funding announcement, the company launched the general availability of Tensormesh Inference, its SaaS inference platform designed to reduce AI infrastructure costs through KV caching technology. The platform addresses a major challenge in enterprise AI workloads, where inference requests repeatedly recompute the same context and prompts, consuming GPU resources and increasing latency and costs.

Tensormesh said its platform stores and reuses previously computed results through KV caching, enabling enterprises to reduce latency and GPU spend by up to 10x. The company noted that the technology is especially valuable for multi-step agentic AI workflows, where repeated processing of conversation history, system prompts, and tool definitions can significantly increase inference expenses.

The company said the participation of AMD Ventures, CoreWeave, and NVentures reflects growing industry interest in KV caching as a foundational layer of AI infrastructure.

Tensormesh Inference is now generally available through two deployment models. The first is a serverless inference offering that provides OpenAI-compatible API access to frontier AI models without requiring infrastructure management. The second is reserved deployments for enterprises needing dedicated capacity, predictable performance, and customized SLA support.

The platform also introduces a pricing structure where cached input tokens served from the KV cache are billed at zero cost across all serverless deployments. Tensormesh said the model is intended to directly reflect the efficiency gains produced by caching technology.

In addition to cost reductions, the company emphasized transparency features within the platform, including visibility into cache hit rates, token-level cost breakdowns, KV cache usage ratios, and real-time performance analytics. Tensormesh said customers can continuously optimize deployments using metrics such as time to first token, inter-token latency, throughput, and GPU compute utilization.

The new funding will be used to accelerate product development, expand integrations with hardware and AI cloud providers, and deepen the company’s work on LMCache, its open-source KV caching project. LMCache currently has more than 8,000 GitHub stars and integrations across platforms including vLLM, SGLang, TensorRT, NVIDIA Dynamo, AWS SageMaker, and Oracle OCI Data Science.

Tensormesh was founded by faculty members, researchers, and alumni from the University of Chicago, University of California, Berkeley, and Carnegie Mellon University. The company is led by co-founder and CEO Junchen Jiang.

KEY QUOTES:

“Tensormesh offers a new vision on the significance of the intermediate data that LLMs generate when processing prompts. Behind the term KV cache is a whole concept of AI interpretation of the question it is asked. This makes it a whole new class of data and a category Tensormesh is uniquely positioned to define. We’re excited to keep building.”

Junchen Jiang, Co-Founder And CEO, Tensormesh

“As enterprises scale AI workloads, maximizing every GPU cycle is critical. Software innovations like KV caching are a powerful complement to raw accelerator performance. Paired with AMD Instinct™ GPUs, Tensormesh’s platform can help customers drive value from their infrastructure investments.”

Ramine Roane, Corporate Vice President, AI At AMD

“Tensormesh is working to solve infrastructure challenges that will ultimately impact the economics and scalability of AI. Their work advancing KV caching can help make inference faster and more efficient at scale, and it reflects exactly the kind of foundational innovation CoreWeave Ventures is committed to backing.”

Brannin McBee, Co-Founder And Chief Development Officer, CoreWeave

“KV caching represents one of the most consequential and underexplored opportunities in AI infrastructure today. Tensormesh has built the only platform that makes this technology production-ready for the enterprise, and we believe it will become a critical part of how every serious AI deployment is run.”

Steve O’Hara, Founder And Managing Partner, Valley Capital Partners

“As AI workloads grow, intelligent reuse of cached state has become one of the most powerful levers for performance and cost efficiency. Tensormesh’s LMCache is built to take full advantage of next-generation storage, and we look forward to our continued collaboration to push the boundaries of what’s possible across the AI stack.”

Leno Park, Vice President Of NAND Product Planning, Samsung Electronics

“Inference economics will define what is possible for the next generation of AI products. Tensormesh is tackling one of the most important challenges in AI infrastructure: helping companies reduce GPU spend without requiring changes to application code. The combination of meaningful cost savings and simple deployment is rare, it positions Tensormesh to become a critical layer in the AI infrastructure stack.”

Hui Zhang, CTO And Co-Founder, Conviva, And Advisor To Tensormesh

“What started as a research project around KV caching is becoming a critical part of the AI stack. Tensormesh understood early that enterprises were paying AI systems to recompute the same work again and again, and built foundational infrastructure to eliminate that inefficiency and dramatically improve price-performance. The team has paired deep systems expertise with real open-source credibility to build infrastructure enterprises can actually rely on.”

Pete Sonsini, Co-Founder And General Partner, Laude Ventures