Content Ingestion
Your vault doesn’t have to grow one pattern at a time. Content ingestion lets you feed entire documents into the agent — articles, meeting transcripts, PDF books, documentation pages — and the agent extracts knowledge items, classifies them, deduplicates against your existing vault, and stores what’s new.
Ingesting a URL
Found an article worth remembering? Feed it directly:
You: “Ingest this article: https://example.com/distributed-systems-patterns”
Agent: Fetched and processed. 4 entries extracted, 1 duplicate skipped.
- Circuit Breaker Pattern (pattern, distributed-systems)
- Bulkhead Isolation (pattern, distributed-systems)
- Retry with Exponential Backoff (pattern, distributed-systems)
- Timeout Best Practices — already in vault
The agent fetches the page, extracts text, sends it through an LLM for classification, and checks each extracted item against your vault’s content hashes. Duplicates are skipped automatically.
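The exact hashing scheme is internal to the agent, but the duplicate check can be sketched roughly like this. The normalization step and both function names are illustrative, not the agent's real API:

```python
import hashlib

def content_hash(text: str) -> str:
    """Collapse whitespace and lowercase before hashing, so trivial
    formatting differences don't defeat deduplication."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def filter_new(extracted: list[str], vault_hashes: set[str]) -> list[str]:
    """Keep only extracted items whose hash isn't already in the vault."""
    return [item for item in extracted if content_hash(item) not in vault_hashes]
```

An item that differs only in spacing or casing from an existing entry hashes to the same value and is skipped.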
You can specify a domain and tags to organize the results:
You: “Ingest https://example.com/k8s-security with domain: infrastructure, tags: kubernetes, security”
Agent: 3 entries extracted and tagged.
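Conceptually, a domain and tags supplied at ingestion time are stamped onto every extracted entry. A minimal sketch of that propagation, using a hypothetical helper (the real agent does this internally):

```python
def tag_entries(entries: list[dict], domain: str, tags: list[str]) -> list[dict]:
    """Apply an ingestion-time domain and tag set to every extracted entry,
    merging with any tags the entry already carries."""
    return [
        {**e, "domain": domain, "tags": sorted(set(e.get("tags", [])) | set(tags))}
        for e in entries
    ]
```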
Ingesting text
For content that isn’t at a URL — meeting notes, copied text, transcripts:
You: “Ingest this transcript from our architecture review meeting…”
Agent: Processed as transcript. 5 entries extracted.
Source types help the LLM classify content more accurately:
| Source type | Use for |
|---|---|
| article | Blog posts, published articles |
| transcript | Meeting recordings, podcast transcripts |
| notes | Personal notes, quick captures |
| documentation | Technical docs, API references, READMEs |
The agent uses source type as context for extraction — a transcript might yield decisions and action items, while documentation yields patterns and conventions.
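One plausible way to picture this (the hint strings and function name are illustrative, not the agent's actual prompts) is a per-type instruction prepended to the extraction prompt:

```python
# Hypothetical mapping from source type to extraction guidance for the LLM.
EXTRACTION_HINTS = {
    "article": "Extract patterns, anti-patterns, and key claims.",
    "transcript": "Extract decisions, action items, and open questions.",
    "notes": "Extract each quick capture as a separate knowledge item.",
    "documentation": "Extract patterns, conventions, and API usage rules.",
}

def build_extraction_prompt(text: str, source_type: str) -> str:
    """Prepend type-specific guidance; unknown types fall back to 'article'."""
    hint = EXTRACTION_HINTS.get(source_type, EXTRACTION_HINTS["article"])
    return f"{hint}\n\nSource ({source_type}):\n{text}"
```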
Batch ingestion
When you have multiple items to ingest at once:
You: “Ingest these three items:
- Our coding standards doc (text: '…')
- The accessibility checklist (text: '…')
- Meeting notes from sprint retro (text: '…')”
Agent: Batch complete: 3 sources processed, 11 entries extracted, 2 duplicates skipped.
Each item in a batch has its own title, source type, domain, and tags. Items are processed sequentially so deduplication works across the batch — if item 2 would create a duplicate of something item 1 just added, it’s caught.
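The cross-batch behavior can be sketched as a single seen-set shared across items, seeded with the vault's existing hashes. This is a simplified model, not the agent's implementation:

```python
import hashlib

def _hash(text: str) -> str:
    """Normalized content hash, a stand-in for the vault's real scheme."""
    return hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()

def ingest_batch(items: list[str], vault_hashes: set[str]) -> tuple[list[str], int]:
    """Process items in order; each stored item's hash joins the seen-set,
    so a later duplicate within the same batch is caught."""
    seen = set(vault_hashes)
    stored, skipped = [], 0
    for item in items:
        h = _hash(item)
        if h in seen:
            skipped += 1
        else:
            seen.add(h)
            stored.append(item)
    return stored, skipped
```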
Ingesting books (PDF)
For longer documents like PDF books, the agent uses a chunked pipeline:
Step 1: Start the ingestion job
You: “Ingest this book: /path/to/design-systems.pdf, title: ‘Design Systems Handbook’, domain: design, author: ‘A. Smith’”
Agent: Job created: job-abc123. 24 chunks ready (10 pages each). Process chunks to extract knowledge.
The PDF is parsed, hashed (so re-ingesting the same file is detected), and split into page-window chunks. Nothing is extracted yet — this step just prepares the pipeline.
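The page-window split itself is simple. A sketch, assuming inclusive 1-based page ranges and the 10-page window from the example above (the function name is hypothetical):

```python
def make_chunks(num_pages: int, window: int = 10) -> list[tuple[int, int]]:
    """Split a document into inclusive (start_page, end_page) windows.
    A 235-page book with window=10 yields 24 chunks, the last one partial."""
    return [
        (start, min(start + window - 1, num_pages))
        for start in range(1, num_pages + 1, window)
    ]
```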
Step 2: Process chunks
Process chunks in batches — this is where the LLM extracts and classifies knowledge:
You: “Process 5 chunks of job-abc123”
Agent: Processed chunks 1-5. 18 entries extracted, 3 duplicates. 19 chunks remaining.
You can process all chunks at once or in smaller batches. Smaller batches give you a chance to review intermediate results and stop early if the quality drops.
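The job bookkeeping behind this step can be modeled as a cursor over the chunk list. A sketch with an injected `extract` callable standing in for the LLM call; the class and field names are illustrative:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class IntakeJob:
    job_id: str
    chunks: list[tuple[int, int]]
    processed: int = 0
    entries: int = 0

    def process(self, n: int, extract: Callable[[tuple[int, int]], int]) -> dict:
        """Process up to n remaining chunks; extract() returns the number
        of knowledge items found in one chunk."""
        batch = self.chunks[self.processed:self.processed + n]
        for chunk in batch:
            self.entries += extract(chunk)
        self.processed += len(batch)
        return {
            "processed": self.processed,
            "remaining": len(self.chunks) - self.processed,
            "entries": self.entries,
        }
```

Calling `process(5, …)` on a fresh 24-chunk job leaves 19 chunks remaining, matching the dialogue above.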
Step 3: Check status
You: “What’s the status of intake jobs?”
Agent: 1 job:
- job-abc123 — Design Systems Handbook: 5/24 chunks processed, 18 entries extracted
Step 4: Preview before committing
If you want to see what the pipeline would extract from a specific page range without storing anything:
You: “Preview pages 50-60 of design-systems.pdf”
Agent: Preview (not stored): 4 potential entries found in pages 50-60.
What happens during ingestion
Every ingestion path follows the same core pipeline:
- Extract text — from URL, raw text, or PDF pages
- Classify via LLM — identify patterns, anti-patterns, decisions, conventions
- Deduplicate — content-hash comparison against existing vault entries
- Store — new entries go into the vault with domain, tags, and source metadata
The LLM does the heavy lifting of turning unstructured text into structured knowledge items. You don’t need to manually tag or categorize — the agent infers type, severity, and domain from context.
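The four steps above can be sketched end to end. Here `classify` stands in for the LLM call and the vault is a plain dict keyed by content hash; everything about this shape is an assumption for illustration:

```python
import hashlib

def ingest(raw_text: str, classify, vault: dict, domain=None, tags=()) -> dict:
    """Illustrative pipeline: classify -> deduplicate -> store.
    classify() returns a list of dicts, each with a 'content' key."""
    items = classify(raw_text)                                   # classify via LLM
    stored, skipped = [], 0
    for item in items:
        h = hashlib.sha256(item["content"].encode()).hexdigest() # deduplicate
        if h in vault:
            skipped += 1
            continue
        vault[h] = {**item, "domain": domain, "tags": list(tags)}  # store
        stored.append(item["content"])
    return {"stored": stored, "skipped": skipped}
```

Running the same source through twice stores nothing the second time: every item hits the dedup check.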
Tips for good ingestion
- Set a domain — it gives the LLM classification context and keeps your vault organized
- Use accurate source types — a transcript is processed differently than documentation
- Add tags — tags applied at ingestion time propagate to all extracted entries
- Preview first for books — check a small page range before processing the whole thing
- Don’t worry about duplicates — the dedup pipeline handles them automatically
Related guides
- Building a Knowledge Base — understand patterns and anti-patterns before bulk ingestion
- Entry Linking & Knowledge Graph — link ingested entries to existing knowledge for better discovery
- Knowledge Review Workflow — submit ingested entries for team review before they go live
- Capabilities — full list of ingestion operations
- API Reference — parameter details for ingest_url, ingest_text, ingest_batch, intake_ingest_book
Previous: Entry Linking & Knowledge Graph — connect entries with typed links. Next: Knowledge Review Workflow — team quality control for vault entries.