Sparse attention – definition
• A way for AI models to look at only the most useful parts of text instead of everything at once.
• It skips less relevant tokens to save time and memory (see the sketch after this list).
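As a rough illustration, here is a minimal sliding-window ("local") sparse attention sketch in NumPy. The function name, sequence length, and window size are illustrative choices, not any specific model's or library's implementation; the point is simply that each token only attends to a small neighborhood instead of every other token.

```python
# Minimal sketch of local (sliding-window) sparse attention, assuming toy
# sizes chosen only for demonstration.
import numpy as np

def local_attention(q, k, v, window=2):
    """Each query attends only to keys within `window` positions (sparse),
    rather than to all positions (dense)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                 # raw attention scores
    idx = np.arange(n)
    # Keep only pairs within the local window; mask out the rest.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the window
    return weights @ v                              # weighted sum of values

# Toy usage: 8 tokens, 4-dim embeddings, window of 2 neighbors on each side.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
out = local_attention(q, k, v, window=2)
print(out.shape)  # (8, 4), same output shape as dense attention
```

With a window of 2, each token computes scores against at most 5 others instead of all 8; at long context lengths that gap is what saves time and memory.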
Why it matters
- Sparse attention can make long-context tasks faster and cheaper without a large drop in quality. Summarizing a 100-page PDF, for example, can finish sooner and use less GPU or device memory. Many modern long-context models rely on it to scale beyond 100K tokens. The trade-off: if the selection is too sparse, the model may miss details or nuance.
Also called:
- sparse transformer, block-sparse attention, local-global attention
Short excerpt:
- Sparse attention speeds up long-context AI by focusing compute on the most relevant tokens.