
Here’s a counterintuitive truth: as AI context windows grow larger, the smartest systems are getting better at using less context, not more.
While models now support context windows of up to 1 million tokens, Cursor just achieved a 46.9% token reduction in its coding agents, and improved performance in the process. The secret? Dynamic Context Discovery, an approach that treats information like lazy-loaded resources rather than dumping everything into the AI’s attention span at once.
This shift represents a fundamental change in how we think about AI context management, with profound implications for anyone building knowledge systems that work with AI.
The Context Paradox
Traditional AI systems suffer from what we might call “context obesity.” They front-load massive amounts of information into every conversation, leading to:
- Token budget bloat - paying for unused information in every API call
- Information overload - critical details get buried in noise
- Reduced performance - more context doesn’t always mean better results
- Higher costs - every unused token still costs money
The assumption was simple: bigger context windows mean better AI. But recent research reveals the opposite. Context engineering—the art and science of curating what goes into the limited context window—often matters more than raw capacity.
How Dynamic Context Discovery Works
Instead of including everything upfront, dynamic context discovery treats information as discoverable resources that only consume tokens when actually accessed. Think of it as “lazy loading” for AI context.
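To make the pattern concrete, here’s a minimal Python sketch of a lazily loaded context resource. This is our illustration of the principle, not Cursor’s code: the prompt carries only a cheap descriptor until the content is explicitly read.

```python
from pathlib import Path

class LazyResource:
    """A context item that costs tokens only when it is actually read."""

    def __init__(self, path: Path, description: str):
        self.path = path                # where the full content lives on disk
        self.description = description  # cheap one-line summary for the prompt

    def descriptor(self) -> str:
        # This short string is all that enters the context window by default.
        return f"[resource: {self.path.name}] {self.description}"

    def read(self) -> str:
        # Full content is loaded (and billed as tokens) only on demand.
        return self.path.read_text()

# The agent's default context holds descriptors, not content.
resources = [
    LazyResource(Path("meeting-notes.md"), "Notes from the Q3 planning meeting"),
    LazyResource(Path("api-spec.md"), "REST API spec for the billing service"),
]
prompt_context = "\n".join(r.descriptor() for r in resources)
```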
Cursor’s implementation reveals five key techniques:
Converting Tool Outputs to Files: Rather than injecting large data directly into prompts, write information to files and give agents tools to selectively read what they need. Files enable lazy loading—information exists and is discoverable, but doesn’t consume tokens until accessed.
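In code, the pattern might look something like the following sketch. The function names and pointer format are our assumptions, not Cursor’s actual API: the tool writes its full output to disk and hands back only a small reference, plus a read tool for selective access.

```python
import json
from pathlib import Path
from tempfile import NamedTemporaryFile

def tool_output_to_file(tool_name: str, output: str) -> str:
    """Persist a large tool output; return only a small pointer for the prompt."""
    with NamedTemporaryFile("w", prefix=f"{tool_name}-", suffix=".txt",
                            delete=False) as f:
        f.write(output)
        path = f.name
    # Only this short summary enters the context window.
    return json.dumps({"tool": tool_name, "file": path, "bytes": len(output)})

def read_file_slice(path: str, start: int = 0, end: int | None = None) -> str:
    """A selective read tool: the agent pays tokens only for the lines it needs."""
    lines = Path(path).read_text().splitlines()
    return "\n".join(lines[start:end])
```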
Chat History as Files: Previous conversations get stored as files rather than included in every new context window.
Selective Tool Loading: Instead of loading full descriptions for dozens of tools upfront, agents initially receive only tool names, then look up full descriptions only when needed. This single change drove that 46.9% token reduction.
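A hedged sketch of the same idea, with a hypothetical tool registry: the prompt lists tool names only, and a lookup tool returns a full description when the agent decides it needs one.

```python
# Hypothetical registry; real tool descriptions can run to hundreds of tokens each.
TOOLS = {
    "grep": "Search files by regular expression. Args: pattern (str), path (str)...",
    "read_file": "Read a file or line range. Args: path (str), start (int)...",
    "run_tests": "Run the project's test suite. Args: target (str, optional)...",
}

def initial_tool_context() -> str:
    """What the agent sees up front: names only, a few tokens apiece."""
    return "Available tools: " + ", ".join(TOOLS)

def describe_tool(name: str) -> str:
    """A lookup tool the agent calls only when it decides to use something."""
    return TOOLS.get(name, f"Unknown tool: {name}")
```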
Agent Skills on Demand: Supporting structured skills that are loaded when relevant rather than by default.
Terminal Sessions as Files: Treating long-running outputs as files rather than inline context.
Beyond Token Savings: Better Results
The surprising discovery isn’t just efficiency—it’s that less context often produces better results.
Windsurf found that combining multiple retrieval techniques (embedding search, grep, knowledge graphs, AST parsing) achieved 3x better retrieval accuracy than any single method alone. But the key insight was selectivity: only retrieve what’s relevant for the current task.
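Windsurf hasn’t published its exact fusion logic, but one standard way to combine rankers is reciprocal rank fusion, sketched below: each retriever produces its own ranked list, and documents that several methods agree on rise to the top.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked results from several retrievers into a single ranking.

    A document scores 1 / (k + rank) in each list that contains it, so items
    that several methods agree on outrank any single method's favorite.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc in enumerate(results):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Each retriever returns its own view of "relevant":
grep_hits = ["auth.py", "session.py", "login.py"]
embedding_hits = ["session.py", "tokens.py", "auth.py"]
fused = reciprocal_rank_fusion([grep_hits, embedding_hits])
# session.py and auth.py float to the top; retrieve only those.
```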
JetBrains Research discovered that simple observation masking (hiding irrelevant previous outputs) wasn’t just 52% cheaper—it boosted solve rates by 2.6% compared to unmanaged context. Sometimes the simplest approach beats sophisticated AI-powered summarization.
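Setting aside JetBrains’ exact mechanism, the basic move is simple enough to sketch: keep the most recent tool outputs verbatim and collapse older ones to a one-line placeholder.

```python
def mask_old_observations(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Collapse all but the most recent tool outputs to a cheap placeholder."""
    tool_turns = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_mask = set(tool_turns[:-keep_last]) if keep_last else set(tool_turns)
    return [
        {"role": "tool", "content": f"[output of {m.get('name', 'tool')} elided]"}
        if i in to_mask else m
        for i, m in enumerate(messages)
    ]
```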
The Knowledge Management Connection
This research validates something we’ve believed at Basic Memory: the future of AI-human collaboration isn’t about feeding AI everything you know. It’s about creating systems where AI can intelligently discover what it needs from your knowledge when it needs it.
Consider how this maps to personal knowledge management:
Files as First-Class Context: When your knowledge lives in readable files (like Markdown), AI can selectively retrieve relevant information rather than processing your entire knowledge base every time. Your notes become lazy-loadable context.
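For example, even a simple grep-style search over a notes folder gives an agent selective access. A sketch of the principle, not Basic Memory’s implementation:

```python
import re
from pathlib import Path

def search_notes(notes_dir: str, pattern: str, context_lines: int = 2) -> list[str]:
    """Return only the matching slices of Markdown notes, never whole files."""
    snippets = []
    for note in Path(notes_dir).glob("**/*.md"):
        lines = note.read_text().splitlines()
        for i, line in enumerate(lines):
            if re.search(pattern, line, re.IGNORECASE):
                lo, hi = max(0, i - context_lines), i + context_lines + 1
                snippets.append(f"{note.name}:{i + 1}\n" + "\n".join(lines[lo:hi]))
    return snippets

# The agent sees a handful of relevant snippets, not the whole knowledge base.
hits = search_notes("notes/", r"context window")
```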
Semantic Connections Over Brute Force: Instead of dumping everything into context, smart systems help AI find relevant connections between your ideas. The goal is intelligent retrieval, not information overload.
Local-First Advantages: Research shows that local caching provides 10x cost reduction when properly implemented. When your knowledge lives locally, AI can access it efficiently without API costs for storage, and you maintain complete control over what gets shared.
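The caching side of this is straightforward to illustrate with a minimal disk-backed memoization sketch; the 10x figure comes from the research above, not from this snippet. The expensive call runs once per unique key, and repeats are free.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".context_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached(key: str, compute):
    """Disk-backed memoization: the expensive call runs once per unique key."""
    slot = CACHE_DIR / (hashlib.sha256(key.encode()).hexdigest() + ".json")
    if slot.exists():
        return json.loads(slot.read_text())
    result = compute()
    slot.write_text(json.dumps(result))
    return result

# e.g. run semantic search only on cache misses (hypothetical function):
# results = cached(f"search:{query}", lambda: semantic_search(query))
```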
What This Means for Your Workflow
The implications extend far beyond coding agents. If you’re using AI for research, writing, or knowledge work, dynamic context discovery principles can transform your workflow:
Start with Structure: Organize your knowledge in discoverable files rather than monolithic documents. Break large topics into focused, interconnected pieces.
Make Context Explicit: Instead of hoping AI remembers everything from a long conversation, create reference documents it can selectively access. Your meeting notes, project requirements, and research findings become retrievable resources.
Think Connections, Not Collections: Focus on how your ideas connect rather than just accumulating information. Semantic relationships help AI find relevant context without processing everything.
Embrace Transparency: Use systems where you can see what context AI is accessing. Black-box vector databases make it impossible to understand or debug AI’s knowledge retrieval.
The Bigger Picture
Dynamic Context Discovery represents a maturation in AI system design. We’re moving from “feed the AI everything” to “help the AI find what it needs.” This shift mirrors broader trends in software engineering—from monolithic applications to microservices, from loading everything upfront to just-in-time resource allocation.
For knowledge workers, this means the tools that will thrive are those that make your knowledge discoverable and selectively accessible, not those that try to cram everything into every conversation.
The future isn’t bigger context windows. It’s smarter context discovery.
Your knowledge deserves a system that works as intelligently as you do—one that helps AI find the right information at the right time, without overwhelming either of you in the process.
Download Basic Memory for free at GitHub or try Basic Memory Cloud free for 7 days at basicmemory.com.
Further Reading: Dynamic Context Discovery for Production Coding Agents - Cursor’s original blog post by Jediah Katz