Composer 2.5 is now available in Cursor, introducing substantial improvements in intelligence, usability, and collaboration capabilities over Composer 2. Built on the same open-source checkpoint as Moonshot’s Kimi K2.5, the updated model is designed to handle sustained work on long-running tasks, follow complex instructions more reliably, and deliver a more polished collaborative experience.
The company said it improved Composer 2.5 through larger-scale training, more advanced reinforcement learning (RL) environments, and new learning methods focused on both model intelligence and behavior. While benchmark performance improved, the company emphasized that behavioral qualities such as communication style and effort calibration were also major priorities because they significantly affect real-world usefulness.
Composer 2.5's training was developed in partnership with SpaceXAI, and the two organizations are now training a significantly larger model from scratch using 10 times more total compute. By drawing on Colossus 2's infrastructure of one million H100-equivalents, along with their combined training and data techniques, the companies said they expect a substantial leap in model capabilities.
One of the key advances in Composer 2.5 is the introduction of targeted reinforcement learning with textual feedback. The company explained that RL credit assignment becomes increasingly difficult when rollouts span hundreds of thousands of tokens because overall rewards often provide only noisy signals about which specific actions helped or harmed performance.
To address this problem, Composer 2.5 training inserts localized textual hints. For example, when the model makes an incorrect tool call, a contextual reminder such as “Available tools…” can be inserted at the precise point of failure. The model's output distribution under this adjusted context serves as the teacher, and the model under the original context acts as the student. An on-policy distillation KL loss then nudges the student toward the improved behavior.
The company said this approach enables more precise behavioral correction without sacrificing the broader RL objectives across the full rollout. The method was applied to multiple areas of model behavior, including coding style and communication quality.
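The hint-based distillation step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the three-entry "vocabulary" of tool names is a toy, and the KL direction (student relative to teacher) is one common choice for on-policy distillation rather than a detail the company disclosed.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def hint_distillation_loss(student_logits, teacher_logits):
    """Mean per-token KL(student || teacher) over the corrected span.

    student_logits: logits from the model on the original context.
    teacher_logits: logits from the same model with the hint inserted.
    The KL direction is an assumption, not a disclosed detail.
    """
    per_token = [
        kl_divergence(softmax(s), softmax(t))
        for s, t in zip(student_logits, teacher_logits)
    ]
    return sum(per_token) / len(per_token)

# Toy vocabulary of three tool names; index 2 stands for a nonexistent tool.
student = [[0.5, 0.2, 2.0]]   # without the hint, the invalid tool is favored
teacher = [[2.0, 1.0, -3.0]]  # with the hint, valid tools are favored
loss = hint_distillation_loss(student, teacher)
```

Because the loss is computed only on the tokens around the failure, it can be added to the rollout-level RL objective without replacing it, which matches the localized correction described above.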
Synthetic data generation also played a significant role in the Composer 2.5 upgrade. The model was trained with 25 times more synthetic tasks than Composer 2 in order to continuously increase task difficulty as coding performance improved.
One synthetic training approach described by the company involves “feature deletion” tasks. In these scenarios, the model receives a codebase with extensive tests and must remove code and files so that selected features stop working while the rest of the codebase keeps functioning. The resulting task is then to reimplement the removed functionality, with test outcomes serving as a verifiable reward.
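A test-based reward of this kind might look like the following sketch. The dictionary-of-functions "codebase", the check functions, and the choice of the passing fraction as the reward are all hypothetical simplifications; the company did not describe its reward formula.

```python
def fraction_passing(codebase, checks):
    """Reward = share of checks that pass against the given codebase."""
    passed = 0
    for check in checks:
        try:
            if check(codebase):
                passed += 1
        except Exception:
            # A missing or crashing feature counts as a failed check.
            pass
    return passed / len(checks)

# Hypothetical original codebase with two features.
original = {
    "add": lambda a, b: a + b,
    "slugify": lambda s: s.lower().replace(" ", "-"),
}

# Checks standing in for the codebase's test suite.
checks = [
    lambda cb: cb["add"](2, 3) == 5,
    lambda cb: cb["slugify"]("Hello World") == "hello-world",
]

# Deletion step: remove one feature, keep the rest working.
stripped = {name: fn for name, fn in original.items() if name != "slugify"}

# Reimplementation step: a candidate solution restores the feature.
candidate = dict(stripped, slugify=lambda s: "-".join(s.lower().split()))

reward_before = fraction_passing(stripped, checks)
reward_after = fraction_passing(candidate, checks)
```

In this toy case the stripped codebase passes half the checks and the reimplementation passes all of them, so the difference is attributable to the restored feature.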
The company noted that large-scale synthetic task generation introduced new reward-hacking challenges as Composer 2.5 became increasingly sophisticated. In one instance, the model reverse-engineered a leftover Python type-checking cache to recover a deleted function signature. In another case, it decompiled Java bytecode to reconstruct a third-party API. The company said these issues were identified and analyzed using agentic monitoring tools, highlighting the growing complexity of large-scale RL systems.
On the infrastructure side, Composer 2.5 incorporates Sharded Muon and dual mesh HSDP optimizations for continued pretraining. The company said it uses Muon with distributed orthogonalization, performing Newton-Schulz operations at the natural granularity of the model, such as per attention head or per expert in stacked MoE weights.
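To make the per-head orthogonalization concrete, here is a small pure-Python sketch. It uses the classic cubic Newton-Schulz iteration, not the tuned polynomial that production Muon implementations typically use, and the function names and toy 2x2 "heads" are assumptions for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def newton_schulz(g, steps=15):
    """Approximate the orthogonal polar factor of g.

    Cubic iteration X <- 1.5 X - 0.5 X X^T X, started from g scaled by its
    Frobenius norm so that every singular value lies in (0, 1].
    """
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        return [row[:] for row in g]
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rb)] for rx, rb in zip(x, xxtx)]
    return x

def orthogonalize_per_head(stacked, steps=15):
    """Orthogonalize each head's matrix independently instead of the full stack."""
    return [newton_schulz(head, steps) for head in stacked]

# Two toy heads, each a 2x2 update matrix.
heads = [
    [[3.0, 1.0], [1.0, 2.0]],
    [[0.0, 2.0], [-1.0, 0.0]],
]
ortho_heads = orthogonalize_per_head(heads)
```

Each result satisfies X Xᵀ ≈ I for its own head. Treating heads or experts as separate matrices keeps every iteration small and independent, which is what allows the work to be distributed.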
For sharded parameters, tensors are grouped and asynchronously transferred for orthogonalization before being redistributed to their original sharded layout. According to the company, this method maintains efficiency by overlapping communication with computation. On a 1-trillion-parameter model, the optimizer step time is reportedly 0.2 seconds.
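The gather, transform, and redistribute round trip can be sketched in a single process as follows. This is only an illustration of the data layout: there is no real communication or asynchrony here, the row-wise sharding and all function names are assumptions, and a Frobenius normalization stands in for orthogonalization as an example of an operation that needs the whole tensor.

```python
import math

def gather(shards):
    """Concatenate row shards from every rank into the full matrix."""
    return [row for shard in shards for row in shard]

def reshard(full, rows_per_rank):
    """Split the full matrix back into the original per-rank row counts."""
    out, start = [], 0
    for n in rows_per_rank:
        out.append(full[start:start + n])
        start += n
    return out

def apply_on_full_tensor(shards, transform):
    """Gather shards, run a whole-tensor transform, restore the sharded layout."""
    rows_per_rank = [len(shard) for shard in shards]
    full = gather(shards)
    return reshard(transform(full), rows_per_rank)

def frobenius_normalize(matrix):
    """Stand-in for orthogonalization: it cannot be computed from one shard alone."""
    norm = math.sqrt(sum(v * v for row in matrix for v in row)) or 1.0
    return [[v / norm for v in row] for row in matrix]

# Two simulated ranks, each holding one row of a 2x2 parameter.
shards = [[[3.0, 0.0]], [[0.0, 4.0]]]
result = apply_on_full_tensor(shards, frobenius_normalize)
```

The output keeps one row per rank, so downstream code sees the same layout it started with. In a real system the gather for one tensor group would overlap with the computation for another, which this sketch does not model.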
The company also described separate HSDP layouts for expert and non-expert weights, allowing narrower FSDP groups for smaller non-expert parameters while spreading expert optimizer workloads across wider sharding meshes. This architecture enables overlapping parallelism dimensions and reduces unnecessary communication overhead.
Composer 2.5 pricing starts at $0.50 per million input tokens and $2.50 per million output tokens. A faster variant with the same intelligence is also available at $3.00 per million input tokens and $15.00 per million output tokens. The company said the fast variant is the default option and includes double usage for the first week.
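For a sense of scale, the listed prices can be turned into a per-request cost estimate. The variant labels "standard" and "fast" and the token counts below are illustrative assumptions; only the per-million-token prices come from the announcement.

```python
# (input price, output price) in USD per million tokens, as listed above.
PRICES = {
    "standard": (0.50, 2.50),
    "fast": (3.00, 15.00),
}

def request_cost(variant, input_tokens, output_tokens):
    """Estimated USD cost of one request for the given variant."""
    in_price, out_price = PRICES[variant]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
standard_cost = request_cost("standard", 200_000, 20_000)
fast_cost = request_cost("fast", 200_000, 20_000)
```

Since both the input and output prices of the fast variant are six times those of the standard one, the fast variant costs six times as much for any mix of tokens.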

