IBM ships Granite Embedding R2 — Apache-2.0 multilingual models with 32K context
IBM Research's Granite team has released two ModernBERT-based embedding models — 97M and 311M parameters — supporting 200+ languages with a 32K context window, claiming top-of-class retrieval among sub-100M models.
IBM published Granite Embedding R2 on 14 May 2026, releasing two open-weights models under the Apache 2.0 licence [1]. The smaller variant has 97M parameters with 384-dimensional embeddings; the larger has 311M parameters and 768-dimensional embeddings [1]. Both share a 32,768-token context window — a 64× increase over the R1 generation [1].
Both models use a ModernBERT encoder (12 layers for the 97M, 22 layers for the 311M) paired with Rotary Position Embeddings and Flash Attention 2.0 [1]. R1 used XLM-RoBERTa with a 512-token context, so R2 changes the encoder family as well as the context length.
On MTEB Multilingual Retrieval, the 97M model scores 60.3, which IBM presents as the best-in-class score for sub-100M-parameter models — a 9.4-point gap over multilingual-e5-small (50.9, at 118M params) [1]. The 311M model scores 65.2 and lands second among sub-500M models, trailing only harrier-oss-v1-270m at 66.4 [1]. On the LongEmbed benchmark, designed for long-document retrieval, the 311M model takes first place at 71.7 [1].
The release supports 200+ languages overall, with 52 receiving enhanced training (the explicit list spans Albanian to Vietnamese), plus 9 programming languages: Python, Go, Java, JavaScript, PHP, Ruby, SQL, C and C++ [1].
Operational headlines: on an H100, the 97M model encodes ~2,500 documents/second; the 311M does ~1,800 docs/sec — the team reports this is "over 5.5× the encoding speed" of jina-embeddings-v5-text-nano (212M, MTEB 63.3) [1]. The 311M model also supports Matryoshka Representation Learning: practitioners can truncate the 768-dim output to 256 dimensions for a 3× storage reduction at a cost of only 0.5 MTEB Multilingual points [1].
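To make the Matryoshka trade-off concrete, here is a minimal sketch of truncating a 768-dimensional vector to 256 dimensions and checking the storage arithmetic. It uses a stand-in vector rather than real model output. The helper names `truncate_embedding` and `index_bytes`, the 4-byte float assumption and the re-normalisation step are illustrative assumptions, not something the release notes specify.

```python
import math

def truncate_embedding(vec, dim=256):
    """Keep the first `dim` components and L2-renormalise them.

    Re-normalising is an assumption here: it keeps cosine and dot-product
    scores comparable after truncation, which is common practice for
    Matryoshka-style embeddings.
    """
    if dim > len(vec):
        raise ValueError("dim exceeds embedding size")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

def index_bytes(num_docs, dim, bytes_per_value=4):
    """Raw vector storage for an index, assuming float32 values (hypothetical)."""
    return num_docs * dim * bytes_per_value

# Stand-in for a 768-dim output of the 311M model (not real model output).
full = [0.1] * 768
short = truncate_embedding(full, 256)

# Storage ratio for a hypothetical 1M-document index: 768 / 256.
ratio = index_bytes(1_000_000, 768) / index_bytes(1_000_000, 256)
```

The ratio depends only on the dimension counts, so the 3× figure holds for any index size or numeric precision as long as both indexes use the same one.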
Training methodology: knowledge distillation from Granite 3.3 Instruct, Mistral v0.2 Instruct and Granite 4.1 8B; contrastive fine-tuning on multilingual retrieval pairs; model merging across stages [1]. IBM notes it deliberately excluded the MS-MARCO training dataset and any data with non-commercial licensing — a meaningful commitment for downstream commercial users [1].
Two practical things to flag. First, the 97M model does **not** support Matryoshka truncation — only the 311M does. Second, the 97M shows a slight cross-lingual regression on Belebele (-2.2 vs R1) attributed to vocabulary pruning from 250K to 180K tokens — for workloads sensitive to less-resourced languages, benchmark the 97M against R1 directly before swapping.
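One way to run that side-by-side check is a small recall@k harness over your own query and document pairs. The sketch below assumes you have already embedded the same queries and documents with each model. The function names and the toy vectors are hypothetical placeholders, not output from R1 or R2.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall_at_k(query_vecs, doc_vecs, relevant, k=1):
    """Fraction of queries whose relevant document ranks in the top k.

    relevant[i] is the index (into doc_vecs) of the single relevant
    document for query i.
    """
    hits = 0
    for qi, q in enumerate(query_vecs):
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda di: cosine(q, doc_vecs[di]),
            reverse=True,
        )
        if relevant[qi] in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)

# Toy vectors standing in for embeddings from two models (illustrative only).
docs = [[1.0, 0.0], [0.0, 1.0]]
relevant = [0, 1]
queries_model_a = [[0.9, 0.1], [0.2, 0.8]]
queries_model_b = [[0.9, 0.1], [0.8, 0.2]]

score_a = recall_at_k(queries_model_a, docs, relevant, k=1)
score_b = recall_at_k(queries_model_b, docs, relevant, k=1)
```

Running it separately for each language of interest, rather than on a pooled set, makes a regression in a less-resourced language easier to spot.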
Weights and code are on Hugging Face at `ibm-granite/granite-embedding-{97m,311m}-multilingual-r2`; the technical report is arXiv 2605.13521 [2].