AI / ML · 2026-05-17 13:59 UTC · 2 min

IBM ships Granite Embedding R2 — Apache-2.0 multilingual models with 32K context

IBM Research's Granite team has released two ModernBERT-based embedding models — 97M and 311M parameters — supporting 200+ languages with a 32K context window, claiming top-of-class retrieval among sub-100M models.

By AI / ML Desk · Desk persona (AI-augmented, human-approved) · fact-checked

IBM published Granite Embedding R2 on 14 May 2026, releasing two open-weights models under the Apache 2.0 licence. The smaller variant has 97M parameters with 384-dimensional embeddings; the larger has 311M parameters and 768-dimensional embeddings. Both share a 32,768-token context window, a 64× increase over the 512 tokens of the R1 generation.
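To make the 64× figure concrete, here is a minimal sketch of how many chunks a long document needs under each context window. The 20,000-token document length is a hypothetical example, and the non-overlapping chunking is a simplifying assumption; real pipelines often overlap chunks.

```python
import math

R1_CONTEXT = 512      # tokens, R1 context window as reported in the article
R2_CONTEXT = 32_768   # tokens, R2 context window as reported in the article


def chunks_needed(doc_tokens: int, context_window: int) -> int:
    """Number of non-overlapping chunks needed to cover a document."""
    if doc_tokens <= 0:
        return 0
    return math.ceil(doc_tokens / context_window)


# Ratio between the two context windows.
context_ratio = R2_CONTEXT // R1_CONTEXT

# Hypothetical document length, chosen only for illustration.
example_doc_tokens = 20_000
r1_chunks = chunks_needed(example_doc_tokens, R1_CONTEXT)
r2_chunks = chunks_needed(example_doc_tokens, R2_CONTEXT)

print(f"context ratio: {context_ratio}x")
print(f"chunks at 512 tokens: {r1_chunks}, at 32,768 tokens: {r2_chunks}")
```

Under these assumptions, a document that had to be split into dozens of pieces for R1 fits in a single R2 input.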

Both models use a ModernBERT encoder (12 layers for the 97M, 22 layers for the 311M) paired with Rotary Position Embeddings and Flash Attention 2.0. R1 used XLM-RoBERTa with a 512-token context; R2's architecture shift is substantial.

On MTEB Multilingual Retrieval, the 97M model scores 60.3, which IBM presents as the best-in-class score for sub-100M-parameter models, 9.4 points ahead of multilingual-e5-small (50.9, at 118M params). The 311M model scores 65.2 and lands second among sub-500M models, trailing only harrier-oss-v1-270m at 66.4. On the LongEmbed benchmark, designed for long-document retrieval, the 311M model takes first place at 71.7.

The release supports 200+ languages overall, with 52 receiving enhanced training (the explicit list spans Albanian to Vietnamese), plus 9 programming languages: Python, Go, Java, JavaScript, PHP, Ruby, SQL, C and C++.

Operational headlines: on an H100, the 97M model encodes ~2,500 documents/second; the 311M does ~1,800 docs/sec. The team reports this is "over 5.5× the encoding speed" of jina-embeddings-v5-text-nano (212M, MTEB 63.3). The 311M model also supports Matryoshka Representation Learning: practitioners can truncate the 768-dim output to 256 dimensions for a 3× storage reduction at a cost of only 0.5 MTEB Multilingual points.
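The truncation step can be sketched in a few lines. This is a minimal illustration of the usual Matryoshka usage pattern (keep the leading dimensions, then re-normalize so cosine similarity still behaves), not IBM's code. The stand-in vector, the re-normalization step, and the float32 storage assumption are all ours, not taken from the release.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw index size, assuming float32 values (an assumption)."""
    return num_vectors * dim * bytes_per_value


# Stand-in for a 768-dim embedding; a real one would come from the 311M model.
full = [math.sin(i + 1) for i in range(768)]
short = truncate_and_normalize(full, 256)

# Storage for a hypothetical one-million-vector index at each width.
full_size = storage_bytes(1_000_000, 768)
short_size = storage_bytes(1_000_000, 256)
reduction = full_size / short_size

print(len(short), f"{reduction:.0f}x smaller")
```

The 3× figure falls straight out of 768 / 256; the reported 0.5-point quality cost is IBM's measurement and is not something this sketch can reproduce.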

Training methodology: knowledge distillation from Granite 3.3 Instruct, Mistral v0.2 Instruct and Granite 4.1 8B; contrastive fine-tuning on multilingual retrieval pairs; model merging across stages. IBM notes it deliberately excluded the MS-MARCO training dataset and any data with non-commercial licensing, a meaningful commitment for downstream commercial users.

Two practical things to flag. First, the 97M model does **not** support Matryoshka truncation — only the 311M does. Second, the 97M shows a slight cross-lingual regression on Belebele (-2.2 vs R1) attributed to vocabulary pruning from 250K to 180K tokens — for workloads sensitive to less-resourced languages, benchmark the 97M against R1 directly before swapping.
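One way to run that side-by-side check is to compare recall@k for the two models on your own labeled queries. The sketch below assumes you have already produced a ranked list of document IDs per query from each model; the query IDs, document IDs, and rankings are made-up placeholders, not benchmark results.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(runs, qrels, k):
    """Average recall@k over all queries that have relevance labels."""
    scores = [recall_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relevance labels and rankings, for illustration only.
qrels = {"q1": ["d1", "d2"], "q2": ["d7"]}
r1_run = {"q1": ["d1", "d2", "d9"], "q2": ["d7", "d3", "d4"]}
r2_run = {"q1": ["d1", "d5", "d2"], "q2": ["d3", "d4", "d7"]}

r1_score = mean_recall_at_k(r1_run, qrels, k=2)
r2_score = mean_recall_at_k(r2_run, qrels, k=2)
delta = r2_score - r1_score  # negative means the candidate regressed

print(f"R1 recall@2: {r1_score:.2f}, R2 recall@2: {r2_score:.2f}, delta: {delta:+.2f}")
```

Running this per language rather than over the pooled query set makes a regression on less-resourced languages easier to spot.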

Weights and code are on Hugging Face at `ibm-granite/granite-embedding-{97m,311m}-multilingual-r2`; the technical report is arXiv 2605.13521.
