System Architecture
CiteMap runs a three-stage pipeline that turns documentation into compact, high-signal context for LLMs.
1. The Miner (Ingest)
The Miner is a headless browser service (Playwright) or a static file parser.
- Input: URL list or Local Markdown directory.
- Operation: It traverses the content, identifying "Integration Nodes" (e.g., "How to use X with Y").
- Output: Raw Content Blocks.
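As a rough illustration of the static-file path, the sketch below walks a local Markdown directory and collects one raw block per heading that looks like "How to use X with Y". The heading regex, the block fields (`source`, `title`, `x`, `y`, `body`), and the function names are assumptions made for this example; the source does not specify how Integration Nodes are detected.

```python
import re
from pathlib import Path

# Hypothetical heuristic: a Markdown heading of the form
# "How to use X with Y" marks the start of an Integration Node.
INTEGRATION_HEADING = re.compile(
    r"^#{1,6}\s+(how to use\s+(?P<x>.+?)\s+with\s+(?P<y>.+?))\s*$",
    re.IGNORECASE,
)


def mine_markdown(text, source):
    """Return raw content blocks for each integration heading in one document."""
    blocks = []
    current = None      # block currently being filled, if any
    in_fence = False    # True while inside a fenced code block
    for line in text.splitlines():
        if line.lstrip().startswith("`" * 3):
            in_fence = not in_fence
        match = None if in_fence else INTEGRATION_HEADING.match(line)
        if match:
            current = {
                "source": source,
                "title": match.group(1),
                "x": match.group("x"),
                "y": match.group("y"),
                "body": [],
            }
            blocks.append(current)
        elif not in_fence and line.startswith("#"):
            current = None  # any other heading closes the current block
        elif current is not None:
            current["body"].append(line)
    for block in blocks:
        block["body"] = "\n".join(block["body"]).strip()
    return blocks


def mine_directory(root):
    """Mine every .md file under root, in sorted path order."""
    blocks = []
    for path in sorted(Path(root).rglob("*.md")):
        blocks.extend(mine_markdown(path.read_text(encoding="utf-8"), str(path)))
    return blocks
```

Calling `mine_directory("docs/")` would return a flat list of block dictionaries. The URL path would replace the file read with a Playwright page fetch and is not shown here.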
2. The Engine (Optimize)
The Engine processes the raw blocks into a Semantic Manifest.
- Token Optimization: Strips HTML tags, CSS classes, and non-content DOM elements.
- Vectorization (optional): Embeds the content for semantic drift detection.
- Output: manifest.json.
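The token-optimization step could look roughly like the following, using only the Python standard library. The list of skipped tags, the manifest fields (`version`, `entries`, `id`, `title`, `text`), and the input shape (`title` plus `html`) are all hypothetical; the real manifest.json schema is not described in the source. Optional vectorization is left out.

```python
import json
from html.parser import HTMLParser

# Assumption: these elements carry no answer content.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text, dropping tags, attributes, and skipped elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_markup(html):
    """Return plain text; text fragments are joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_manifest(raw_blocks):
    """Turn raw blocks (dicts with 'title' and 'html') into a manifest dict."""
    entries = []
    for index, block in enumerate(raw_blocks):
        entries.append(
            {
                "id": f"block-{index}",
                "title": block["title"],
                "text": strip_markup(block["html"]),
            }
        )
    return {"version": 1, "entries": entries}


def manifest_to_json(manifest):
    """Serialize the manifest as it might be written to manifest.json."""
    return json.dumps(manifest, indent=2, ensure_ascii=False)
```

Joining fragments with spaces is a simplification: it keeps words from adjacent elements apart, but it can leave a stray space before punctuation that directly follows an inline tag.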
3. The Broadcaster (Serve)
The Broadcaster is the final stage. It serves the optimized content to agents at two paths:
- /llms.txt: The "Robots.txt for Agents".
- /contexts/*.md: The actual Answer Surfaces.
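To make the two paths concrete, here is a minimal sketch that renders an index for /llms.txt and resolves request paths to content. The index layout loosely follows the public llms.txt proposal (a title, a short summary, then a list of links); the exact file CiteMap emits is not given in the source, and `render_llms_txt`, `route`, and their parameters are names invented for this example.

```python
def render_llms_txt(site_name, summary, context_files):
    """Build the /llms.txt body.

    context_files is an iterable of (title, filename) pairs, where each
    filename lives under /contexts/.
    """
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Contexts", ""]
    for title, filename in context_files:
        lines.append(f"- [{title}](/contexts/{filename})")
    return "\n".join(lines) + "\n"


def route(path, llms_txt, contexts):
    """Return the body for a request path, or None if nothing matches.

    contexts maps a filename such as 'stripe-nextjs.md' to its Markdown text.
    """
    if path == "/llms.txt":
        return llms_txt
    prefix = "/contexts/"
    if path.startswith(prefix) and path.endswith(".md"):
        return contexts.get(path[len(prefix):])
    return None
```

A real deployment would put `route` behind an HTTP server or write the files to static hosting. Keeping the routing as a pure function makes it easy to test without a network.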