Neural Magic: Reducing The Computation And Memory Needed For Neural Networks, Which Can Save Companies Billions

By Amit Chowdhry • Nov 14, 2023

Neural Magic builds software that makes machine learning execution simple and efficient. From research to code, the company uses model sparsity to accelerate open-source LLMs and bring operational simplicity to GenAI deployments. Pulse 2.0 interviewed Neural Magic CEO Brian Stevens to learn more.

Brian Stevens’ Background


Stevens studied computer science at the University of New Hampshire (UNH), arriving just a year after the school introduced its CS curriculum. Stevens said:

“UNH had a relationship with Digital Equipment Corporation back then. I joined DEC as a member of their technical staff and was in the right place at the right time. I had access to DEC’s cutting-edge hardware and software and learned from so many smart mentors.”

“I later left DEC to join a company focused on making Linux ready for the enterprise. Then along came 2001. Venture money dried up, and I decided to join Red Hat, where I spent 12 years as EVP and CTO of Worldwide Engineering.”

“When I joined Red Hat, the company was about 200 people. I started hiring and building as best as I could in 2001, with an eye toward making Red Hat Linux ready for the enterprise. I didn’t like the business model where every six months we were selling a $60 product, and revenue bumps came from a sense of urgency to get a new product out, whether customers needed it or not. Instead, I wanted our team to bring continuity to the enterprise with what became software subscriptions. At the time, one of our customers told us it took $1 million and six months to put a new version of Linux into production, and we wanted it to be seamless.”

“In 2014, Google recruited me to scale Google Cloud. I joined as vice president of product and went on to lead the division as CTO. When I joined, revenues were less than $50 million, and it’s a $30 billion business today. Some of the most interesting work I did at Google Cloud was to switch our sales focus toward large enterprises and build out the capability they needed.”

“After Google, I had planned on focusing on advisory roles. I joined a couple of public boards to stay engaged, and around the time COVID hit, I started working with some Boston VCs, helping them with diligence. That’s how I met Nir Shavit, founder of Neural Magic. As I spent more time with him, I fell in love with Neural Magic’s approach and what it could mean for the future of AI deployments. The business value AI can bring when integrated well is enormous, but it’s constrained today. It should be commoditized, and I feel Neural Magic helps free up customers to run AI successfully on commodity hardware.”

“I first started at Neural Magic as an Executive Chairman, where I primarily worked with the team on product development and strategy. Now, as CEO, I work across the organization, with a focus on building solutions that make it as easy as possible to deploy AI models on commodity hardware. Of course, a big focus of our work this past year has been on Large Language Models (LLMs) and how to optimize them to run efficiently on commodity hardware.”

Favorite Memory

What has been Stevens’ favorite memory working for the company so far? Stevens shared:

“Today, we’re at an early stage of AI: we’ve seen many technological breakthroughs over the years, but the hardware needed to run AI models is still specialized and expensive. Over time, hardware for emerging AI applications will commoditize, and software will get more sophisticated.”

“If AI is as important as we think it is, it can’t be a niche, expensive capability. It needs to be a commodity capability available to everybody, and I think Neural Magic has an opportunity to be really instrumental in that journey.”

“With this in mind, one of my favorite memories working for the company was when I led the team through a transition to open source and open research in 2021.”

“The grassroots adoption that happens in the early phases of a new technology is often led by open source, and that is very much what’s happening in AI today. That’s why I felt it was so important for us to build a community and make it easy for developers to access and contribute to Neural Magic’s work. I’m thrilled that today we have thousands of people, and growing, in our Neural Magic community.”

Core Products

What are Neural Magic’s core products and features? Stevens explained:

“Here’s a brief overview of our offerings today:

1.) DeepSparse is an inference runtime that delivers GPU-class performance on commodity CPUs, purely in software, anywhere. DeepSparse achieves its performance using algorithms that reduce the computation and memory needed for neural network execution and accelerate the resulting computation.

2.) SparseZoo is a repository of pre-optimized models, including computer vision, natural language processing, and large language models, to get started with DeepSparse and accelerate customers’ AI journey on CPUs.

On our Neural Magic blog, our team regularly publishes posts on how to integrate DeepSparse into AI initiatives to optimize the performance, accuracy, and operational efficiency of machine learning workloads. Most recently, we wrote a summary of our latest published research paper on how to build sparse LLM applications on CPUs with DeepSparse.”
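For readers who want a concrete starting point, below is a minimal sketch of the DeepSparse Pipeline API described above, assuming a standard pip install of deepsparse on a supported CPU host. The SparseZoo model stub is an illustrative placeholder; current stubs for pre-optimized models are listed on sparsezoo.neuralmagic.com.

# Minimal sketch: running a pre-optimized SparseZoo model on a CPU.
# Assumes: pip install deepsparse, on a supported x86 or ARM host.
from deepsparse import Pipeline

# The "zoo:" stub below is an illustrative placeholder; browse
# sparsezoo.neuralmagic.com for the stubs of real pre-optimized models.
pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path=(
        "zoo:nlp/sentiment_analysis/distilbert-none/pytorch/"
        "huggingface/sst2/pruned80_quant-none-vnni"
    ),
)

# Inference runs entirely on the host CPU; no GPU is required.
print(pipeline(sequences=["DeepSparse runs sparse models fast on CPUs."]))

Because the model arrives from SparseZoo already pruned and quantized, the runtime can exploit its sparsity directly; swapping in a different task or model stub follows the same pattern.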

Challenges Faced

Has the company faced any specific bottlenecks in its sector recently? Stevens acknowledged:

“The vast majority of enterprises are still at the very early stages of evaluating how to leverage AI. In the coming months and years, as more enterprises solidify their use cases for AI and move towards large-scale implementations, I expect to see very broad adoption of Neural Magic’s solutions.”

Evolution Of Neural Magic’s Technology

How has the company’s technology evolved since launching, and what have been some of the company’s most significant milestones? Stevens cited:

“Neural Magic’s team is always working on improving the performance of its software. I’m very proud of the benchmark results Neural Magic’s DeepSparse runtime has achieved in MLPerf Inference, the AI performance benchmark run by MLCommons. In 2022, we showed a 175x boost in CPU performance on popular deep learning models. Six months later, Neural Magic demonstrated a further 6x improvement, bringing the cumulative CPU speedup to roughly 1,000x.”

“Another significant milestone is our growing partner ecosystem, which makes Neural Magic’s solutions available through Intel and AMD, as well as cloud providers such as GCP and AWS.”

Customer Success Stories

When asked for specific customer success stories, Stevens highlighted:

“One of our customers, a generative AI platform for automated document processing, had been using GPUs to get the level of model performance they required. As part of their commitment to client satisfaction and their constant drive to upgrade their services, they had been exploring additional avenues for deploying their solution, recognizing that not all clients have access to GPUs.”

“With Neural Magic, they can leverage cost-efficient commodity CPUs to reduce infrastructure spending and achieve 4-6x faster performance than GPUs. Our inference runtime, DeepSparse, provides the flexibility they need to scale their operations efficiently and be better positioned in the market.”

Funding

When asked about the company’s funding, Stevens revealed:

“Neural Magic has raised $55 million in venture funding. Our last round was a $35 million Series A in 2021, led by NEA, with participation from Andreessen Horowitz, VMware, Verizon, Amdocs, Comcast, Pillar, and Ridgeline Ventures.”

Total Addressable Market

What total addressable market (TAM) size is the company pursuing? Stevens assessed:

“There are many ways to look at the TAM of our space. The TAM for the AI industry is expected to be in the hundreds of billions of dollars, and the TAM of GPUs is in the tens of billions of dollars. What excites me more than these current market sizes is their potential growth rates, which will be exponential in the coming years.”

Differentiation From The Competition

What differentiates the company from its competition? Stevens affirmed:

“There are some companies trying to optimize models. Neural Magic’s aim is much more ambitious: we want to offer frictionless deployment of AI models on any CPU.”

“Within the optimization space, some companies optimize models for specific types of hardware, but that adds a lot of complexity for operations teams to manage. Neural Magic is unique in that we optimize the model once, and then it can be deployed anywhere. Our optimization for the hardware type is done seamlessly at deployment time. We give people the optionality of all three commodity hardware vendors: Intel, AMD, and ARM. We take advantage of any high-performance instruction sets when they’re available, and account for resources like CPU cores and memory at deployment time.”
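To illustrate the deploy-anywhere flow Stevens describes, here is a minimal sketch based on DeepSparse’s documented lower-level engine API; the model path is a hypothetical placeholder for a model that was optimized once, up front.

# Sketch of "optimize once, deploy anywhere": the same script runs
# unchanged on Intel, AMD, or ARM hosts.
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "model.onnx"  # hypothetical placeholder for the optimized model
batch_size = 1

# compile_model inspects the host at deployment time and generates code for
# whatever high-performance instruction sets it finds (e.g., AVX2, AVX-512,
# or VNNI on x86); the calling code is identical on every vendor's hardware.
engine = compile_model(onnx_filepath, batch_size=batch_size)

# Dummy tensors shaped from the model's declared inputs, to exercise the engine.
inputs = generate_random_inputs(onnx_filepath, batch_size)
outputs = engine.run(inputs)
print([out.shape for out in outputs])

The split of responsibilities Stevens describes is visible here: the model artifact carries only the one-time optimization, while all hardware-specific tuning happens inside the engine at deployment time.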

Future Company Goals

What are some of the company’s future goals? Stevens concluded:

“The LLMs that you hear about today are hundreds of times bigger than the AI language models people were working with before. We’ve all seen the value that these LLMs bring, and also the excessive costs and specialty hardware resources that these LLMs take to run.”

“Neural Magic is building some extraordinary new solutions for running LLMs more efficiently on CPUs. We’re going to build and run some of the biggest LLMs on a single machine, models that today would take many GPUs to run. We look forward to making a number of LLM-focused announcements, the first of which will come in the next few months.”