DataCebo: $8.5 Million Raised To Help Increase Developer Productivity At Enterprises By Utilizing Generative AI

By Amit Chowdhry • Dec 7, 2023

Today DataCebo emerged with SDV Enterprise, a commercial offering of the popular open-source product Synthetic Data Vault (SDV). DataCebo also announced that it has raised $8.5 million in seed funding co-led by Link Ventures and Zetta Venture Partners, with participation from Uncorrelated Ventures. The company plans to use the funding to advance its product and to build a go-to-market team.

With SDV Enterprise, developers can easily build, deploy, and manage sophisticated generative AI models for enterprise-grade applications when real data is limited or unavailable.

SDV Enterprise’s models can create higher-quality synthetic data that is statistically similar to the original data, so developers can effectively test applications and train robust ML models. SDV Enterprise is currently in beta with Global 2000 organizations, which today have about 500 to 2,000 applications for which they need to create synthetic data 12 times a year.
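To make "statistically similar" concrete, here is a minimal Python sketch that fits the mean and standard deviation of one numeric column and samples new values from a normal distribution with those parameters. It is only an illustration under stated assumptions: the column of order amounts is invented, the function names are hypothetical, and this single-column Gaussian fit is not the modeling approach used by SDV or SDV Enterprise, which handle full tables and relationships between them.

```python
import random
import statistics


def fit_column(values):
    """Summarize a numeric column by its mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def sample_column(mean, stdev, n, rng):
    """Draw n synthetic values from a normal distribution with the fitted parameters."""
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Seeded generator so the sketch is reproducible.
rng = random.Random(0)

# Hypothetical stand-in for a "real" column (for example, order amounts).
real = [rng.gauss(100.0, 15.0) for _ in range(5000)]

# Fit on the real column, then generate the same number of synthetic values.
real_mean, real_stdev = fit_column(real)
synthetic = sample_column(real_mean, real_stdev, len(real), rng)

# Compare summary statistics: they should be close, although no synthetic
# value is copied from the real column.
syn_mean, syn_stdev = fit_column(synthetic)
print(f"real:      mean={real_mean:.2f} stdev={real_stdev:.2f}")
print(f"synthetic: mean={syn_mean:.2f} stdev={syn_stdev:.2f}")
```

The synthetic column can stand in for the real one in tests that depend only on its overall distribution. Real enterprise data also has correlations between columns and references between tables, which a per-column fit like this one does not capture.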

DataCebo’s co-founders, Kalyan Veeramachaneni (CEO) and Neha Patki (vice president of product), created SDV at MIT’s Data to AI Lab. SDV enables developers to build a proof-of-concept generative AI model for small tabular and relational datasets with simple schemas and to create synthetic data from it. SDV has been downloaded more than a million times and has the largest community around synthetic data. DataCebo was founded in 2020 to revolutionize enterprise developer productivity by utilizing generative AI.

DataCebo’s first product, SDV Enterprise, builds on SDV and, according to the company, provides every team with synthetic data 1000x faster with 10x the quality. The key features include:

— Scalability – Developers can train a generative AI model with much larger datasets and complex schemas with hundreds of interconnected tables

— Deep Data Understanding – Developers can train models that understand the deeper meaning behind real-world data concepts like the structure of a phone number and which geographical areas it represents

— Programmability – Developers can fine-tune the generative AI model stack using low-code APIs by supplying their data schema, business logic and evaluation criteria

— Integration – Developers can deploy synthetic data applications by ingesting and exporting data in a variety of different formats

— Management – Developers can manage multiple synthetic data applications, track changes and update their generative AI models as their applications grow and change

KEY QUOTES:

“Our developers spend a lot of time creating data manually to test their applications. We had been looking for generative AI-based solutions that can automate and create high-quality synthetic data for our needs. Of all the solutions we looked at, DataCebo’s SDV Enterprise was the best fit to handle the complexity of our data. With SDV Enterprise, we were able to generate synthetic data within hours – what otherwise took our developers days or weeks in some cases. Currently, we have used SDV Enterprise in 13 applications, and the demand is growing exponentially.”

— Wim Blommaert, product owner of AI-powered synthetic data generation at ING Belgium, one of the biggest banks in the world

“Synthetic data can help companies reduce the bias and privacy risks that are common with real-world data. DataCebo helps data teams generate synthetic tabular datasets using generative adversarial networks (GANs) in a rapid, accurate way, so they can train more ML models in a given timeframe.”

— Kevin Petrie, vice president of research at Eckerson Group

“The ability to build generative models on-prem is critical for enterprises. Their data is proprietary and is very specific. In our first year, we quickly learned that this unique capability that SDV Enterprise provides is a massive enabler for them. Our customers often ask whether they need massive hardware or specific hardware requirements to use SDV Enterprise. They are often surprised that with SDV Enterprise, they can train generative models on a single machine. This opens up a new horizon of possibilities for training and using these models and applying them to a variety of use cases. As one customer said, if we have to spend $100,000 to train a model, it simply reduces the number of use cases we can use it for.”

— DataCebo co-founder Kalyan Veeramachaneni (CEO)

“We are thrilled to support this world-class MIT team. Their leadership in generative modeling for complex enterprise data is unlike others in the synthetic data industry. We are confident in this team’s ability to lead the category, enabling the next users of AI models to connect statistical to computational outcomes and sew the fabric of open to closed source synthetic data generation.”

— Dave Blundin, co-founder and managing partner at Link Ventures and a DataCebo board member

“The huge enthusiasm of the open source community and the ROI enjoyed by early commercial adopters have shown DataCebo to be a product leader in the emerging field of generative AI for synthetic data. It is rare to find a company whose products serve as both pathbreakers and standard-bearers, and we are very excited to invest in this amazing team from MIT, knowing that they will continue to push the envelope.”

— Mark Gorenberg, founder and managing director at Zetta Venture Partners and a DataCebo board member