Can India build a sovereign model, and does it even need to?
The GPU embargo forced China to innovate on:
1. Multi-head Latent Attention (MLA)
2. Multi-Token Prediction (MTP)
3. Smarter Mixture of Experts
4. Custom CUDA acceleration
5. FP8 low-precision training
6. GRPO (Group Relative Policy Optimization)
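GRPO, the last item on the list, replaces PPO's learned value network with advantages computed relative to a group of sampled completions for the same prompt. A minimal sketch of that advantage computation, with hypothetical reward values:

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# For each prompt, several completions are sampled; each completion's
# advantage is its reward standardized against the group's statistics,
# so no learned value (critic) network is needed.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for 4 sampled completions of one prompt (toy values):
# completions with above-average reward get positive advantage.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is just the group mean, the memory cost of a separate critic model is avoided, which matters under GPU constraints.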
Major AI players in India currently are:
* Krutrim by Bhavish Aggarwal
* Sarvam AI
* Lossfunk by Paras Chopra (who recently sold his company Wingify for $200 million)
* Two Platforms by Pranav Mistry (Silicon Valley-based company primarily working on Indian datasets; Jio invested $15 million for a 25% stake)
* AI4Bharat (research lab at IIT Madras)
TWO Platforms
Pranav Mistry left Samsung in 2021 to start TWO Platforms. They introduced:
* SUTRA-V1: Multilingual chat model. Dense architecture, 36B parameters.
* SUTRA-R0: Reasoning model. Dense architecture, available in 8B and 36B parameters. Beats o1 and R1 on MMLU in most Indian languages.
* SUTRA-P0 (coming soon): Time-series predictive AI model, i.e. a model that forecasts future values from historical time-series data.
* Dual transformer architecture
* MMLU benchmark: SUTRA has achieved 20-25% better performance on the MMLU benchmark in Hindi, Korean, Gujarati, Japanese, and Arabic compared to GPT-4o.
* Tokenization: The SUTRA models use 4-8x fewer tokens for non-Romanized languages. They claim to outperform all other LLMs in tokenization across the 22 official Indian languages.
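The tokenizer-efficiency gap is easiest to see at the byte level: a tokenizer whose merges are learned mostly on Romanized text falls back toward raw bytes for Indic scripts, and UTF-8 already penalizes those scripts before any merges are learned. A rough illustration (not SUTRA's actual tokenizer):

```python
# Rough illustration of why non-Romanized scripts are expensive for
# byte-level tokenizers: each Devanagari character takes 3 bytes in
# UTF-8, while ASCII takes 1, so a byte-level vocabulary starts at a
# ~3x disadvantage on Indic text before any merges are learned.
english = "India"   # 5 characters, Latin script
hindi = "भारत"      # 4 characters, Devanagari script

print(len(english.encode("utf-8")))  # 5 bytes
print(len(hindi.encode("utf-8")))    # 12 bytes -> 3 bytes per character
```

An Indic-focused tokenizer closes this gap by learning merges over Indic scripts directly, which is what the 4-8x claim amounts to.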
Krutrim AI
Models available on Krutrim's Hugging Face account as of February 2025 are Dhwani-1, Krutrim-2, Krutrim-Translate, Chitrarth-1, and Vyakhyarth-1.
* Krutrim-1 (7B model) (Jan 2024)
* Krutrim-2 (Feb 2025)
* Chitrarth-1 (vision-language model built on top of Krutrim-1)
* Dhwani-1 (ASR model built on top of Krutrim-1)
* Vyakhyarth-1 (embedding model for RAG & search)
* Krutrim-Translate-1 (translation model)
* BharatBench to evaluate Indic performance
* Promised to build the largest supercomputer in India by the end of the year.
* Deployed DeepSeek-R1 on Krutrim Cloud.
Lossfunk
* Exploring methods to uncensor DeepSeek using abliteration
* Paras mentioned using less data for training and more data for testing, similar to how human intelligence works
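Abliteration, mentioned above, works by estimating a "refusal direction" in the model's activation space and orthogonalizing hidden states (or weights) against it, so the model can no longer represent refusal along that direction. A minimal sketch of the projection step, with toy vectors rather than real activations:

```python
# Minimal sketch of the core operation in abliteration: projecting a
# "refusal direction" out of a hidden-state vector. In practice the
# direction is estimated from activation differences between harmful
# and harmless prompts; here both vectors are toy values.
import math

def ablate(hidden, direction):
    """Remove the component of `hidden` along unit-normalized `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    dot = sum(h * u for h, u in zip(hidden, unit))
    return [h - dot * u for h, u in zip(hidden, unit)]

hidden = [0.5, -1.0, 2.0]
refusal_dir = [0.0, 1.0, 0.0]   # toy refusal direction
ablated = ablate(hidden, refusal_dir)
# The component along the refusal direction is now zero
```

Applied at every layer (or folded into the weight matrices), this single projection is what removes the refusal behavior without any retraining.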
Sarvam AI
They have open-sourced Shuka-v1 and Sarvam-2B on their Hugging Face page.
Sarvam-1 (October 2024): Indic 2B-parameter model trained on 10 Indic languages. Better performance on the MMLU, ARC-C, TriviaQA, and BoolQ benchmarks.
Shuka-v1 (Aug 2024): India's first open-source audio language model, which uses Llama as the decoder.
* Mayura: Translation
* Saaras: ASR with translation
* Bulbul: TTS
* Saarika: STT
AI4Bharat
IndicTrans2 (May 2023): Machine translation covering the 22 official Indian languages.
Aksharantar: Largest publicly available transliteration dataset for Indian languages.