At a Glance
- TRIBE v2 trained on 451.6 hours of fMRI recordings from 25 subjects, then evaluated across 1,117.7 hours from 720 subjects
- Zero-shot predictions work across new individuals, unseen languages, and entirely different tasks without any retraining
- Spatial resolution jumps 70-fold over previous brain-encoding baselines
- Open-source release includes model weights, codebase, paper, and interactive demo under a CC BY-NC license
Predicting how your brain fires in response to a film clip, a podcast, or a block of text, without running a single scan on you, is what TRIBE v2 actually does. Meta’s FAIR team released this tri-modal foundation model on March 26, 2026, and the architecture’s implications stretch from neurological disease research into how AI systems might eventually validate their cognition against biological baselines. Most coverage focuses on the 70x resolution figure. The zero-shot generalization across subjects, languages, and tasks is the capability that fundamentally changes what neuroscience labs can now do.
Why 70x Resolution Is Not the Whole Story
A seventy-fold resolution improvement over the prior state of the art is not a marginal refinement. But many analyses underweight something rarer: TRIBE v2 predicts brain responses for subjects it has never encountered, in languages it was never trained on, with no additional calibration.
That zero-shot capability means a research lab can run virtual fMRI experiments on entirely new subject populations and new stimulus languages without acquiring fresh scan data. No competing model published before March 2026 demonstrates this cross-lingual, cross-subject generalization at this scale.
From Four Subjects to 720: The Data Gap TRIBE v2 Closes
Meta’s predecessor model, which won the Algonauts 2025 challenge, was trained on low-resolution fMRI recordings from just four individuals. TRIBE v2 trained on 451.6 hours of fMRI data from 25 subjects across four naturalistic studies covering movies, podcasts, and silent videos.
Evaluation then scaled to a much broader corpus: 1,117.7 hours from 720 subjects. That two-stage design (deep training on a small, densely scanned cohort; wide evaluation across a diverse one) is what separates TRIBE v2’s generalization performance from prior narrow-paradigm models.
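The deep-versus-wide contrast is easiest to see in per-subject scan hours. A quick back-of-the-envelope calculation from the figures above:

```python
# Per-subject scan hours in TRIBE v2's two-stage design (figures from the release).
train_hours, train_subjects = 451.6, 25
eval_hours, eval_subjects = 1117.7, 720

hours_per_train_subject = train_hours / train_subjects   # deep: ~18 h per subject
hours_per_eval_subject = eval_hours / eval_subjects      # wide: ~1.6 h per subject

print(f"training:   {hours_per_train_subject:.1f} h/subject")
print(f"evaluation: {hours_per_eval_subject:.1f} h/subject")
```

Roughly 18 hours per training subject against under two per evaluation subject: the model learns individual response structure in depth, then is tested on breadth.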
How the Architecture Actually Works
TRIBE v2 stands for TRansformer for In-silico Brain Experiments, version 2. Three frozen foundation models serve as feature extractors: V-JEPA2-Giant for video (64-frame segments spanning 4 seconds per time-bin), Wav2Vec-BERT 2.0 for audio (resampled to 2 Hz), and LLaMA 3.2-3B for text (the preceding 1,024 words prepended to each word for temporal context).
All three streams compress into a shared 384-dimensional space and are concatenated into a 1,152-dimensional multimodal time series. A Transformer encoder with 8 layers and 8 attention heads then processes this across a 100-second window. A subject-specific prediction block at the output maps latent representations onto 20,484 cortical vertices and 8,802 subcortical voxels.
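The tensor shapes in that pipeline can be sketched end to end with random weights standing in for the real modules. Dimensions are from the description above; the 2 Hz time-bin rate is an assumption inferred from the audio resampling, so treat the window length as illustrative:

```python
import numpy as np

# Shape walkthrough of the TRIBE v2 pipeline described above, with random
# matrices standing in for the real frozen extractors, encoder, and head.
rng = np.random.default_rng(0)

T = 200                      # 100-second window at an assumed 2 Hz -> 200 time-bins
D_SHARED = 384               # per-modality shared dimension
N_CORTICAL, N_SUBCORTICAL = 20_484, 8_802

# Each frozen extractor's features, already compressed to 384 dims per bin.
video = rng.standard_normal((T, D_SHARED))   # stands in for V-JEPA2-Giant features
audio = rng.standard_normal((T, D_SHARED))   # stands in for Wav2Vec-BERT 2.0 features
text = rng.standard_normal((T, D_SHARED))    # stands in for LLaMA 3.2-3B features

# Concatenate modalities -> (T, 1152) multimodal time series.
fused = np.concatenate([video, audio, text], axis=-1)
assert fused.shape == (T, 3 * D_SHARED)

# Stand-in for the 8-layer/8-head Transformer encoder: any sequence-to-sequence
# map that preserves the (T, 1152) shape works for this check.
encoder_out = fused @ rng.standard_normal((3 * D_SHARED, 3 * D_SHARED))

# Subject-specific linear head -> per-bin predictions for every vertex/voxel.
head = rng.standard_normal((3 * D_SHARED, N_CORTICAL + N_SUBCORTICAL))
prediction = encoder_out @ head
print(prediction.shape)      # (200, 29286)
```

The output per time-bin is a full brain volume: 20,484 + 8,802 = 29,286 prediction targets.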
Zero-Shot Performance: The Number That Matters
On the Human Connectome Project (HCP) 7T dataset, TRIBE v2 achieved a group correlation near 0.4, a two-fold improvement over the median individual subject’s group-predictivity score. In practical terms, the model’s zero-shot prediction of group-averaged brain responses is more accurate than the actual fMRI recordings of many individual participants within that group.
Fine-tuning changes the picture further. Give TRIBE v2 at most one hour of fMRI data from a new participant and fine-tune for a single epoch. The result is a two- to four-fold improvement over linear models trained from scratch on the same data.
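The paper's exact metric definition isn't reproduced here, but a common formulation of group correlation for encoding models correlates the predicted timecourse with the group-averaged measured response, vertex by vertex, then averages. A minimal sketch on synthetic data:

```python
import numpy as np

# Sketch of a group-correlation score of the kind reported above. This is a
# common encoding-model metric; the release's exact definition may differ.
rng = np.random.default_rng(1)
T, V, S = 300, 50, 10        # time bins, vertices, subjects (toy sizes)

signal = rng.standard_normal((T, V))                      # shared stimulus-driven component
subjects = signal + 2.0 * rng.standard_normal((S, T, V))  # noisy individual recordings
group_mean = subjects.mean(axis=0)                        # group-averaged response
prediction = signal + 0.5 * rng.standard_normal((T, V))   # toy model output

def mean_vertex_correlation(pred, target):
    pz = (pred - pred.mean(0)) / pred.std(0)
    tz = (target - target.mean(0)) / target.std(0)
    return float((pz * tz).mean(0).mean())                # Pearson r per vertex, averaged

score = mean_vertex_correlation(prediction, group_mean)
print(round(score, 2))
```

Averaging across subjects suppresses individual noise, which is why a good model's prediction can correlate with the group mean better than any single noisy recording does.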
In-Silico Neuroscience: What Virtual Experiments Actually Recover
Running virtual experiments replaces physical fMRI scans for hypothesis screening. On the Individual Brain Charting dataset, TRIBE v2 correctly recovered the fusiform face area (FFA) and parahippocampal place area (PPA) for vision processing, Broca’s area for syntax, and the temporo-parietal junction (TPJ) for emotional processing, all through computational simulation alone.
Applying Independent Component Analysis to the model’s final layer revealed something the team didn’t explicitly train for. TRIBE v2 naturally organized its internal representations into five well-known functional networks: primary auditory, language, motion, default mode, and visual. That emergent biological structure in a deep learning model is the detail most reviewers pass over entirely, and it matters more than the headline resolution number.
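The team used ICA itself (in practice something like `sklearn.decomposition.FastICA`); as a lightweight, dependency-free stand-in, this sketch shows the labeling step that follows component extraction: assign each latent unit to the functional network whose reference timecourse it best correlates with. All data here is synthetic:

```python
import numpy as np

# Label latent units by their best-correlated functional network. Synthetic
# stand-in for the post-ICA labeling step; real component extraction would
# run FastICA on the model's final-layer activations first.
rng = np.random.default_rng(2)
networks = ["auditory", "language", "motion", "default_mode", "visual"]
T, N_UNITS = 400, 40

refs = rng.standard_normal((len(networks), T))      # reference network timecourses

# Synthetic latent units: each is a noisy copy of one network's timecourse.
true_label = rng.integers(0, len(networks), size=N_UNITS)
units = refs[true_label] + 0.3 * rng.standard_normal((N_UNITS, T))

def zscore(x):
    return (x - x.mean(-1, keepdims=True)) / x.std(-1, keepdims=True)

corr = zscore(units) @ zscore(refs).T / T           # (units, networks) Pearson r
assigned = corr.argmax(axis=1)                      # best-matching network per unit

print("recovery accuracy:", (assigned == true_label).mean())
```

With clean synthetic units the assignment is near-perfect; the interesting empirical finding is that TRIBE v2's real components cluster into these networks without being trained to.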
Scaling Laws: What This Predicts for Future Versions
The research team observed log-linear scaling throughout their experiments. More fMRI training data produces predictably higher encoding accuracy, with no performance plateau visible in current benchmarks.
That means TRIBE v2’s ceiling is an open question, not a fixed limit. If neuroimaging repositories expand from hundreds of subjects to thousands, the architecture should scale with them.
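Log-linear scaling means encoding accuracy grows linearly in the logarithm of training hours, so each doubling of data adds a roughly fixed increment. A minimal fit on hypothetical data points (the release reports the trend, not these values):

```python
import numpy as np

# Fit accuracy ~ a * log10(hours) + b. The data points are hypothetical,
# chosen only to illustrate how a log-linear scaling law is extrapolated.
hours = np.array([25.0, 50.0, 100.0, 200.0, 450.0])
accuracy = np.array([0.22, 0.27, 0.31, 0.36, 0.40])   # hypothetical correlations

a, b = np.polyfit(np.log10(hours), accuracy, deg=1)

gain_per_doubling = a * np.log10(2)                   # fixed gain per data doubling
projected_900h = a * np.log10(900.0) + b              # naive extrapolation

print(f"gain per doubling ~ {gain_per_doubling:.3f}")
print(f"projected at 900 h ~ {projected_900h:.3f}")
```

The extrapolation is only as good as the assumption that no plateau appears, which is exactly the open question the benchmarks leave unanswered.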
Where It Falls Short
Trade-Offs Worth Knowing
TRIBE v2 predicts population-averaged brain responses with high accuracy, but the model’s zero-shot layer was validated against normative cohorts. Subjects with atypical neurological profiles fall outside the training distribution, which limits direct applicability for clinical research on conditions like acquired brain injury or neurodevelopmental disorders without targeted fine-tuning. Meta has not published inference latency benchmarks outside research-environment computing conditions.
5 Research Applications TRIBE v2 Opens Right Now
- Pre-screen neuroimaging study hypotheses in silico before committing to scanner sessions
- Predict cross-lingual brain responses without recruiting new subject pools for each language
- Fine-tune on one hour of new participant data to achieve two- to four-fold gains over baseline linear models
- Generate training signals for AI systems by comparing predicted vs. biological neural activation patterns
- Recover established functional landmarks like FFA, PPA, and Broca’s area through virtual experiments on the IBC dataset
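The "baseline linear models" in the fine-tuning comparison are commonly ridge-regression encoding models fit from stimulus features to voxel responses. A minimal closed-form sketch on synthetic data (not the paper's exact baseline):

```python
import numpy as np

# Ridge-regression encoding baseline of the kind fine-tuned TRIBE v2 is
# compared against: closed-form L2-regularized map from stimulus features
# to voxel responses. All data here is synthetic.
rng = np.random.default_rng(3)
T_train, T_test, F, V = 600, 200, 128, 500   # time bins, features, voxels

W_true = rng.standard_normal((F, V)) / np.sqrt(F)
X = rng.standard_normal((T_train + T_test, F))
Y = X @ W_true + 0.5 * rng.standard_normal((T_train + T_test, V))
X_tr, X_te, Y_tr, Y_te = X[:T_train], X[T_train:], Y[:T_train], Y[T_train:]

lam = 10.0                                   # ridge penalty (would be cross-validated)
W = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(F), X_tr.T @ Y_tr)

pred = X_te @ W
r = ((pred - pred.mean(0)) / pred.std(0)
     * (Y_te - Y_te.mean(0)) / Y_te.std(0)).mean(0)
print(f"mean test correlation: {r.mean():.2f}")
```

With only an hour of scan data, such a model has few time bins to fit from scratch, which is why warm-starting from TRIBE v2's pretrained representations yields the reported two- to four-fold gains.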
Open-Source Release: What Researchers Actually Get
Meta released the model weights on Hugging Face, the full codebase on GitHub, the research paper, and an interactive demo at aidemos.atmeta.com/tribev2, all on March 26, 2026. The CC BY-NC license permits research use and modification without a commercial agreement.
But open weights are not the same as accessible compute. Running inference on high-resolution fMRI datasets at the scale TRIBE v2 operates requires significant GPU infrastructure. For smaller research institutions, that remains the practical constraint between access and adoption.

