Tripo AI: Interview With Chief Scientist Dr. Yanpei Cao About the AI 3D Foundational Model Company

By Amit Chowdhry Jun 4, 2026

Tripo AI is a world-leading general-purpose artificial intelligence company dedicated to democratizing creation through cutting-edge algorithmic research and the practical application of AI 3D foundation models and world models.

Pulse 2.0 interviewed Tripo Chief Scientist Dr. Yanpei Cao to learn more.

Dr. Yanpei Cao

Please tell me more about your background.

“I’m Yanpei Cao, Chief Scientist at Tripo AI. My background sits right at the intersection of AI, computer graphics, and spatial intelligence. Before co-founding Tripo, I led 3D generative research at Tencent’s ARC Lab and AI Lab. But I also have an entrepreneurial background: earlier in my career, I was the CTO of Owlii, a volumetric capture startup that was later acquired by Kuaishou. That was really where I learned what it takes to build and scale 3D generation systems for the real world.

Academically, I’ve spent years publishing at venues like SIGGRAPH, CVPR, and NeurIPS. But for me, the goal has never been just about hitting academic milestones. The real driving force behind my work is figuring out how to teach machines to genuinely understand and generate the world. We aren’t just trying to generate static meshes that look good from a certain angle; we are tackling the underlying physics, topology, and interactive structures of 3D environments.

Now, at Tripo, I direct our R&D across multimodal 3D generation and generative world models. My core focus is taking foundational AI breakthroughs and turning them into industrial-grade production pipelines. I want to completely eliminate the traditional bottlenecks of interactive content creation. Ultimately, my vision is to build the simulation infrastructure for our world, so that any developer or creator can go directly from a concept to a fully rigged, deployable, and interactive environment in a matter of seconds.”

Could you tell me more about Tripo’s role in the ecosystem?

“Tripo positions itself as the foundational bedrock and the spatial computing infrastructure for next-generation interactive entertainment and global UGC ecosystems.

Unlike traditional AI plug-ins, Tripo is explicitly not designed as a mere efficiency tool for game studio artists. While efficiency tools offer incremental cost reduction, their commercial ceiling is strictly capped. Tripo’s ultimate objective is to provide the underlying infrastructure that blurs the line between players and creators, fundamentally altering how digital content is authored and interacted with.”

Could you tell me more about the company’s corporate vision?

“Empowering every individual to architect interactive, complex worlds with the ease of natural expression.”

“The core mandate is radical democratization. By utilizing world models and generative AI, Tripo eliminates the technical barriers of manual topology and physics coding. We enable users with zero background in 3D modeling or software engineering to seamlessly build rich, logically sound, and interactive physical experiences.”

What is the company’s role beyond pure tooling?

“Positioning a company solely around cost reduction yields diminishing returns. Tripo aims to transform the creative workflow entirely. The goal is to empower ordinary users to orchestrate, simulate, and publish interactive spaces across any accessible platform. We want to make the creation of an interactive, physics-based environment as intuitive and frictionless as typing a sentence or uploading a video today.”

Tripo recently launched Project Eden. Can you tell us more about it?

Project Eden

“Project Eden distinguishes itself from conventional industry approaches (such as ‘action-conditioned video generation’ and ‘static 3D scene generation’) by achieving a native architecture that decouples underlying state evolution from visual rendering.

This breakthrough enables a true world model capable of environmental persistence and deterministic state control. It naturally unlocks disruptive capabilities, including long-term object permanence, reusable scenes, and concurrent multi-agent interaction. Project Eden aims to become the foundational engine for next-generation interactive content creation, while simultaneously providing a logically consistent training and evaluation environment for embodied AI.

Project Eden utilizes a three-layer decoupled architecture:

Layer 1: Evolving Structured State: Instead of hiding the world in a pixel history, we maintain a globally shared, continuous world state. It is a compact, implicit/structured representation that governs the underlying geometry, object semantics, and the physical consequences of any action inputs.
Layer 2: State-to-Observation Interface: When a specific viewpoint is queried, the system translates the evolving state into geometric and semantic constraints tailored for that camera angle. This fundamentally guarantees cross-camera and cross-perspective physical consistency.
Layer 3: Generative Rendering: Relying on the objective state constraints, this layer dynamically renders high-fidelity visual frames on demand, filling in textures, lighting, and high-frequency dynamic details without blindly guessing the structure.
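To make the three layers concrete, here is a minimal Python sketch. It is an illustration under heavy simplifying assumptions, not Tripo’s implementation: the world is a set of named objects on a 1D line, and the names WorldState, Camera, observe, and render are hypothetical. The point it shows is the separation of concerns: the state evolves on its own, a view query derives per-camera constraints from it, and the renderer only decorates structure it is handed.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class WorldState:
    """Layer 1 (toy): a shared, structured state of named objects on a 1D line."""
    positions: dict[str, float] = field(default_factory=dict)

    def step(self, action: tuple[str, float]) -> None:
        """Apply an action (object name, displacement) to the underlying state."""
        name, delta = action
        self.positions[name] += delta


@dataclass(frozen=True)
class Camera:
    center: float
    half_width: float


def observe(state: WorldState, cam: Camera) -> dict[str, float]:
    """Layer 2 (toy): derive per-view constraints, here camera-relative offsets
    of the objects that fall inside the camera's field of view."""
    return {
        name: pos - cam.center
        for name, pos in state.positions.items()
        if abs(pos - cam.center) <= cam.half_width
    }


def render(constraints: dict[str, float]) -> str:
    """Layer 3 (toy): stand-in for a generative renderer. It never decides
    where things are; it only turns given structure into an output frame."""
    parts = [f"{name}@{offset:+.1f}" for name, offset in sorted(constraints.items())]
    return ", ".join(parts) or "<empty>"


state = WorldState({"jeep": 0.0, "crate": 3.0})
cam = Camera(center=0.0, half_width=2.0)

frame_before = render(observe(state, cam))  # crate is outside the view
state.step(("crate", -1.5))                 # state evolves independently of rendering
frame_after = render(observe(state, cam))   # crate is now inside the view
```

In this toy version, swapping the string renderer for any other renderer would not change where objects are, because position lives only in the state layer.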

Project Eden’s Three Core Capabilities

Object Permanence and Viewpoint Consistency: In our world model, objects that leave the camera’s frustum do not disappear; they continue to exist in the underlying state. When a user looks away and turns back, the model queries a confirmed objective state rather than hallucinating from scratch.
Reusable Worlds and Deterministic Control: Unlike traditional video generation models, which act as irreversible ‘blind boxes,’ Project Eden allows users to repeatedly intervene, control, and modify an evolving base state. It acts as a reusable, modular sandbox.
Native Multi-Agent Interaction: Because state evolution and rendering are decoupled, a single underlying world state can be shared and updated synchronously across multiple agents. The system only needs to render each agent’s view from that agent’s own local coordinates, making concurrent, multi-perspective interaction computationally economical and mathematically possible.”
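The three capabilities can be illustrated with a small shared-state toy in Python. This is a sketch under stated assumptions, not a description of Project Eden’s internals: the SharedWorld class, its grid coordinates, the Manhattan-distance view test, and the snapshot and restore helpers are all hypothetical. It shows an object persisting while out of view, a saved state being reused after an intervention, and two agents querying one state from their own positions.

```python
import copy


class SharedWorld:
    """Toy shared world state queried by several agents (hypothetical design)."""

    def __init__(self):
        self.objects = {"key": {"pos": (5, 0), "held_by": None}}
        self.agents = {}

    def add_agent(self, name, pos, view_radius):
        self.agents[name] = {"pos": pos, "view_radius": view_radius}

    def apply(self, agent, action):
        """Deterministic state transition for a (kind, argument) action."""
        kind, arg = action
        me = self.agents[agent]
        if kind == "move":
            me["pos"] = (me["pos"][0] + arg[0], me["pos"][1] + arg[1])
        elif kind == "pick":
            obj = self.objects[arg]
            if obj["held_by"] is None and obj["pos"] == me["pos"]:
                obj["held_by"] = agent

    def view(self, agent):
        """Loose objects within the agent's view radius (Manhattan distance)."""
        me = self.agents[agent]
        ax, ay = me["pos"]
        return sorted(
            name
            for name, obj in self.objects.items()
            if obj["held_by"] is None
            and abs(obj["pos"][0] - ax) + abs(obj["pos"][1] - ay) <= me["view_radius"]
        )

    def snapshot(self):
        return copy.deepcopy((self.objects, self.agents))

    def restore(self, snap):
        self.objects, self.agents = copy.deepcopy(snap)


world = SharedWorld()
world.add_agent("alice", (4, 0), view_radius=2)
world.add_agent("bob", (0, 0), view_radius=2)

# Multi-agent: one state, two local views.
views_start = {name: world.view(name) for name in ("alice", "bob")}

# Object permanence: the key leaves Alice's view but stays in the state.
world.apply("alice", ("move", (-4, 0)))
view_away = world.view("alice")
key_still_in_state = "key" in world.objects
world.apply("alice", ("move", (4, 0)))
view_back = world.view("alice")

# Reusable world: intervene, then return to the saved state.
snap = world.snapshot()
world.apply("alice", ("move", (1, 0)))
world.apply("alice", ("pick", "key"))
holder_after_pick = world.objects["key"]["held_by"]
world.restore(snap)
holder_after_restore = world.objects["key"]["held_by"]
```

Each view call reads the same state, so adding an agent adds one more view query rather than a second copy of the world.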

Could you describe the company’s world model technical roadmap?

“Our technical roadmap is defined by the Decoupling of State and Generative Rendering.

Tripo’s core competitive advantage lies in recognizing that predicting physical transitions and drawing pixels are two entirely different tasks. By forcing models to specialize—one tracking the structural state and the other rendering the visual output—we maximize compute efficiency and deliver superior visual fidelity while natively supporting multi-user synchronization.”

Could you describe the rationale for decoupling over end-to-end monolithic architectures?

“End-to-end monolithic video models face significant capacity bottlenecks because they attempt to compress space, time, and physical events into autoregressive pixel trajectories. Tripo’s decoupled approach offers several key advantages:

Native Multi-User Interaction: Because the state layer remains globally unified and persistent, multiple users can interact within the exact same space-time matrix simultaneously without exponential compute costs.
Environmental Permanence: Virtual environments exist continuously and can be revisited at any time without resetting, mirroring traditional game server states rather than video clips.
Compute Optimization: Separating state evolution from visual rendering drastically improves training efficiency and prevents the system from wasting massive compute power on hallucinating off-screen geometries.”

What are the company’s productization roadmaps and milestones?

“Our roadmap is structured as a progression from static asset creation to fully interactive world generation.

Current Foundation: Tripo Studio

Our productization is already firmly established today with Tripo Studio. Currently, Tripo Studio serves as a professional-grade workspace where creators, developers, and studios leverage our P1.0 and H3.1 models to generate, refine, and export pipeline-ready 3D assets in seconds. It has successfully productized our core AI 3D models, drastically reducing the time required for traditional modeling, retopology, and texturing workflows.

Short-Term Product Strategy (2026 Rollout)

Building upon the Tripo Studio ecosystem, our short-term strategy involves merging our native 3D asset generation engine with large language models to deploy a new interactive 3D platform. Users will be able to input natural language or reference imagery to automatically build playable games and environments with pristine topology, without writing a single line of code.

Long-Term Architectural Vision

The ultimate objective is to trigger the true ‘ChatGPT Moment’ for interactive media. A user could prompt: ‘A post-apocalyptic escape room containing a weathered jeep, with the ignition key hidden inside a wooden crate,’ and the system will instantly compile a fully realized, playable sandbox. Because all logic, physics, and state transitions will run natively on our world model architecture, the engineering cost of defining digital world rules drops to zero.”
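The escape-room prompt implies more than a list of assets: “hidden inside a wooden crate” is a containment rule that the compiled world has to enforce. As a rough illustration of what such a structured scene might look like, here is a Python sketch. The Entity and Scene classes and the visibility rule are hypothetical, written by hand rather than produced by any Tripo system, and only one level of nesting is handled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    name: str
    container: Optional[str] = None  # name of the entity this one sits inside, if any
    is_open: bool = False            # only meaningful for containers


@dataclass
class Scene:
    theme: str
    entities: dict = field(default_factory=dict)

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def open(self, name: str) -> None:
        self.entities[name].is_open = True

    def visible(self) -> list:
        """Top-level entities, plus the contents of open containers.
        Only one level of nesting is considered in this sketch."""
        return sorted(
            e.name
            for e in self.entities.values()
            if e.container is None or self.entities[e.container].is_open
        )


scene = Scene(theme="post-apocalyptic escape room")
scene.add(Entity("weathered jeep"))
scene.add(Entity("wooden crate"))
scene.add(Entity("ignition key", container="wooden crate"))

visible_before = scene.visible()  # the key is hidden inside the closed crate
scene.open("wooden crate")
visible_after = scene.visible()   # opening the crate reveals the key
```

The key exists in the scene from the start; opening the crate changes only whether a player can see it.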

What is the company’s commercialization strategy?

Monetizing AI 3D Pipelines

“The Tripo P and H-series product lines are fully commercialized, generating steady cash flow and providing user data loops. These pipelines service global game development studios and 3D printing vendors by delivering pipeline-ready assets that directly reduce studio overhead and outsourcing expenses.

Monetizing the World Model

For the world model, Tripo intends to operate as an infrastructure-as-a-service (IaaS) provider. Much like modern game engines or cloud computing utilities, Tripo will provide the underlying ‘Interactive Runtime.’ Developers, UGC platforms, and robotics companies will utilize our APIs for state-transition computations and real-time generation, effectively making Tripo the foundational utility grid for next-generation virtual worlds.

Along the way, components of the generative rendering layers will be spun off and commercialized as standalone, high-performance generative AI renderers tailored for the advertising and animation industries.”

Could you tell me more about the company’s open source and cultural foundations?

Open-Source Philosophy

“Tripo embraces a strategic open-source model designed to mobilize the global research community and solve industry-wide challenges. For instance, our recent release of TripoSplat introduces Density-Sampled Gaussians (DeG), using a policy-gradient approach to smartly allocate an arbitrary number of 3D Gaussians directly from data. This strategy accelerates scientific progress and serves as a powerful mechanism to aggregate top-tier global talent. Meanwhile, our core commercial barriers, such as our proprietary data curation pipelines and end-to-end industrial engineering, remain heavily protected.

Internal Engineering Culture

Tripo operates on a first-principles engineering culture. Rather than indexing against competitor roadmaps or following short-term hype cycles, our team focuses entirely on solving the foundational mathematics of spatial computing. This independent, research-driven mindset is exactly what led Tripo to abandon traditional serialization and build our unique, decoupled world model architecture.”