Can India build a sovereign model, or do we even need to?

The GPU embargo forced China to innovate on:

1. Multi-head Latent Attention (MLA)

2. Multi-Token Prediction (MTP)

3. Smarter Mixture of Experts

4. Custom CUDA acceleration

5. FP8 low-precision training

6. GRPO (Group Relative Policy Optimization)
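Of these, GRPO is the simplest to illustrate: instead of training a separate value/critic model, it scores each sampled completion against the other completions in its own group. A minimal sketch in plain Python (the function name is mine, not DeepSeek's code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO core idea: the advantage of each sampled completion is its
    reward standardized against the other samples in the same group,
    removing the need for a learned value model."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Example: 4 completions sampled for one prompt, scored by a reward model
print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

Completions better than their group mean get a positive advantage and are reinforced; worse ones get a negative advantage, all without a critic network.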

The major AI players in India currently are:
* Krutrim by Bhavish Aggarwal
* Sarvam AI
* Lossfunk by Paras Chopra (who recently sold his company Wingify for $200 million)
* Two Platforms by Pranav Mistry (a Silicon Valley-based company primarily working on Indian datasets; Jio invested $15 million for a 25% stake)
* AI4Bharat (research lab in IIT Madras)

TWO Platforms

Pranav Mistry left Samsung in 2021 to start TWO Platforms. They have introduced:
* SUTRA-V1: a multilingual chat model with a dense, 36B-parameter architecture.
* SUTRA-R0: a reasoning model with a dense architecture, available in 8B and 36B parameters. Beats o1 and R1 on MMLU in most Indian languages.
* SUTRA-P0 (coming soon): a time-series predictive AI model that forecasts the future from historical time-series data.
* Dual-transformer architecture.
* MMLU benchmark: SUTRA achieves 20-25% better MMLU performance in Hindi, Korean, Gujarati, Japanese, and Arabic compared to GPT-4o.
* Tokenization: the SUTRA models use 4-8x fewer tokens for non-Romanized languages, and they claim to outperform all other LLMs in tokenization across the 22 official Indian languages.
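The tokenization gap is plausible because English-centric tokenizers often fall back toward raw UTF-8 bytes for scripts like Devanagari, where every character costs 3 bytes; a script-aware vocabulary avoids that inflation. A quick pure-Python illustration of the underlying byte cost (no tokenizer library needed):

```python
# Each Devanagari character is 3 bytes in UTF-8, so a byte-level
# fallback tokenizer can spend several tokens per character,
# while a script-aware vocabulary spends far fewer.
for text in ["Hello", "नमस्ते"]:  # "Namaste" in Devanagari
    chars = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    print(f"{text!r}: {chars} chars -> {utf8_bytes} UTF-8 bytes")
```

"Hello" is 5 characters and 5 bytes, while "नमस्ते" is 6 code points but 18 bytes, so byte-level fallbacks pay roughly 3x more per character before any merging happens.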

Krutrim AI

Models available on Krutrim's Hugging Face account as of February 2025 are Dhwani-1, Krutrim-2, Krutrim-Translate, Chitrarth-1, and Vyakhyarth-1.
* Krutrim-1 (7B model) (Jan 2024)
* Krutrim-2 (Feb 2025)
* Chitrarth-1 (vision-language model built on top of Krutrim-1)
* Dhwani-1 (ASR model built on top of Krutrim-1)
* Vyakhyarth-1 (embedding model for RAG & search)
* Krutrim-Translate-1 (translation model)
* BharatBench, a benchmark to evaluate Indic performance
* Promised to build the largest supercomputer in India by the end of the year.
* Deployed DeepSeek-R1 on Krutrim Cloud.
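An embedding model like Vyakhyarth-1 powers RAG and search by mapping queries and documents into vectors and retrieving by cosine similarity. A generic sketch of that retrieval step (the toy 2-D vectors and function names here are illustrative, not real Vyakhyarth embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=1):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy embeddings: document 1 points the same way as the query
docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(retrieve([0.6, 0.8], docs, k=2))  # -> [1, 2]
```

In a real pipeline the retrieved documents are then stuffed into the LLM prompt, which is why a strong Indic embedding model matters for Indian-language RAG.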

Lossfunk

* Exploring methods to uncensor DeepSeek using abliteration
* Paras has talked about using less data for training and more data for testing, mirroring how human intelligence works

Sarvam AI

They have open-sourced Shuka-v1 and Sarvam-2B on their Hugging Face page.
* Sarvam-1 (October 2024): a 2B-parameter Indic model built on 10 Indic languages. Better performance on the MMLU, ARC-C, TriviaQA, and BoolQ benchmarks.
* Shuka-v1 (Aug 2024): India's first open-source audio language model, which uses Llama as the decoder.
* Mayura: translation
* Saaras: ASR with translation
* Bulbul: TTS
* Saarika: STT

AI4Bharat

* IndicTrans2 (May 2023): machine translation across the 22 official Indian languages.
* Aksharantar: the largest publicly available transliteration dataset for Indian languages.