Kimi K2: The Open-Source Trillion-Parameter Model Behind Cursor's AI

Aan Team · March 23, 2026 · 3 min read

Moonshot AI, a Chinese AI company backed by Alibaba, released Kimi K2 — a one-trillion-parameter Mixture of Experts model with open weights under a modified MIT license. Only 32 billion parameters activate per token, making it efficient despite its massive total size. It was trained on 15.5 trillion tokens and holds the top open-source score on SWE-bench Verified, LiveCodeBench v6, and several math benchmarks.

The timing matters because in March 2026, Cursor revealed that their Composer 2 coding model was built on top of Moonshot's Kimi architecture. When the tool that millions of developers use to write code is powered by an open-weights model from a Beijing startup, the dynamics of AI competition have fundamentally changed.

One trillion parameters, 32 billion active

K2 uses a Mixture of Experts (MoE) architecture with 384 expert modules, selecting 8 per token plus 1 shared expert. This means each generation step uses only 32 billion parameters, while the full model holds a trillion parameters' worth of specialized knowledge. The architecture uses Multi-Head Latent Attention, derived from the same scaling law analysis behind DeepSeek-V3.
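The routing scheme above can be sketched in a few lines. This is a toy illustration of top-k expert selection — the router, softmax gating, and dimensions are generic MoE conventions, not Moonshot's actual implementation:

```python
import math
import random

# Toy MoE router: score 384 experts, activate the top 8 per token,
# and always apply 1 shared expert on top of the routed ones.
NUM_EXPERTS = 384
TOP_K = 8

def route(token_logits):
    """Pick the TOP_K highest-scoring experts for one token."""
    ranked = sorted(range(NUM_EXPERTS), key=lambda i: token_logits[i], reverse=True)
    chosen = ranked[:TOP_K]
    # Softmax over the chosen experts' logits gives mixing weights.
    exps = [math.exp(token_logits[i]) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]
    return chosen, weights  # the shared expert runs unconditionally

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
experts, weights = route(logits)
print(len(experts))  # 8 routed experts (plus 1 shared = 9 active per token)
```

Because only the selected experts' weights participate in each forward pass, compute per token scales with the 32B active parameters rather than the 1T total.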

The model was trained with MuonClip, which extends the Muon optimizer with techniques to prevent training instability at this scale. Moonshot reports zero training instabilities during the entire pre-training run — a significant engineering achievement for a trillion-parameter model. The base context window is 128K tokens, extending to 256K for the Thinking and K2.5 variants.
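The stabilization idea reported for MuonClip can be sketched as a logit cap: when the largest attention logit in a step exceeds a threshold, the query and key projection weights are rescaled so logits shrink back under it. The threshold value and the even split of the rescaling factor across the two matrices below are assumptions for illustration; Moonshot's implementation differs in detail:

```python
import math

TAU = 100.0  # assumed cap on the maximum attention logit

def qk_clip(wq, wk, max_logit):
    """If max_logit > TAU, scale Wq and Wk each by sqrt(TAU / max_logit),
    so every q·k product (hence every logit) scales by TAU / max_logit."""
    if max_logit <= TAU:
        return wq, wk
    gamma = math.sqrt(TAU / max_logit)
    scaled_wq = [[gamma * x for x in row] for row in wq]
    scaled_wk = [[gamma * x for x in row] for row in wk]
    return scaled_wq, scaled_wk

wq = [[2.0, 0.0], [0.0, 2.0]]
wk = [[3.0, 0.0], [0.0, 3.0]]
wq2, wk2 = qk_clip(wq, wk, max_logit=400.0)
# gamma = sqrt(100/400) = 0.5: weights halve, logits shrink by 4x
print(wq2[0][0], wk2[0][0])  # 1.0 1.5
```

Capping logits this way prevents the attention-score blowups that commonly destabilize very large training runs.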

Benchmark performance

On LiveCodeBench v6, K2 scores 53.7%, surpassing Claude Sonnet 4 at 48.5% and GPT-4.1 at 44.7%. On SWE-bench Verified with multi-attempt agentic evaluation, it reaches 71.6% — behind Claude Sonnet at 80.2% but far ahead of DeepSeek-V3 at 38.8%. On math, K2 scores 97.4% on MATH-500, the highest among all models tested, and 49.5% average on AIME 2025, beating Gemini 2.5 Flash at 46.6%.

On reasoning and logic, K2 scores 89.0% on ZebraLogic, compared to 73.7% for Claude Sonnet 4 and 58.5% for GPT-4.1. On GPQA-Diamond, a graduate-level science benchmark, it scores 75.1%, ahead of all compared models. These results put K2 in direct competition with proprietary frontier models — and it is fully open-weights.

Why open weights matter here

K2 is released under a modified MIT license that allows commercial use. The weights are available on HuggingFace in block-fp8 format, and deployment is supported via vLLM, SGLang, KTransformers, and TensorRT-LLM. This means any company can run this model on their own infrastructure without sending data to a third-party API.
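For vLLM, self-hosting reduces to pointing its serving command at the published weights. A minimal sketch, assuming the `moonshotai/Kimi-K2-Instruct` HuggingFace repo name and that your GPU count supports the required tensor parallelism (the `16` below is illustrative, not a stated requirement):

```shell
# Serve Kimi K2 behind vLLM's OpenAI-compatible API, sharding the
# model across GPUs with tensor parallelism.
vllm serve moonshotai/Kimi-K2-Instruct --tensor-parallel-size 16
```

Once running, any OpenAI-compatible client can target the local endpoint instead of a third-party API.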

The practical impact is significant. Cursor built their Composer 2 on this architecture. Any coding tool, IDE integration, or development assistant can use these weights as a foundation. At $0.57 per million input tokens and $2.40 per million output tokens through the Moonshot API — or free if self-hosted — K2 offers frontier-level coding performance at a fraction of proprietary pricing.
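The quoted API prices make cost estimates straightforward. A back-of-envelope calculation using the figures above — the workload numbers are made up for illustration:

```python
# Moonshot API pricing quoted above, in USD per million tokens.
INPUT_PER_M = 0.57
OUTPUT_PER_M = 2.40

def monthly_cost(input_tokens, output_tokens):
    """API cost in USD for a given monthly token volume."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# e.g. a coding assistant consuming 200M input / 50M output tokens a month:
print(f"${monthly_cost(200e6, 50e6):.2f}")  # $234.00
```

Self-hosting trades this per-token cost for fixed infrastructure cost, which is the calculation Cursor-scale deployments actually face.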

The model family

Moonshot released several K2 variants. K2-Base is the foundation model for custom fine-tuning. K2-Instruct is the chat-optimized version for direct use. K2-Thinking adds step-by-step reasoning with tool calling, and K2-Turbo-Preview offers high-speed inference at 60 to 100 tokens per second. In January 2026, Moonshot released K2.5, which extends the base with multimodal capabilities — text, vision, and video understanding — trained on an additional 15 trillion multimodal tokens.

The full family is available on HuggingFace under the moonshotai organization, and the GitHub repository has accumulated over 10,500 stars. For developers evaluating which model to use for coding tasks, K2 is now the strongest open-weights option available, and the gap with proprietary models continues to narrow.