Skip to content

Knowledge

Portable SQLite knowledge bases with hybrid FTS5+vector search and AI synthesis. Short alias: kb.

Highlights

  • Hybrid FTS5+vector search via knowledge.search() — keyword, semantic, or combined (hybrid) modes
  • AI synthesis via knowledge.ask() — retrieves relevant chunks then synthesises a concise answer with source citations
  • Personal annotations via knowledge.write() — store rules, notes, and mistakes alongside indexed content
  • Link-graph traversal via knowledge.related() — follow markdown hyperlinks between topics

Functions

Function Description
knowledge.write(topic, content, db, ...) Write a personal annotation to the knowledge database
knowledge.read(topic, db, ...) Read a single entry by topic
knowledge.update(topic, db, ...) Update an existing entry
knowledge.append(topic, db, ...) Append content to an existing entry
knowledge.delete(topic, db, ...) Delete an entry by topic
knowledge.search(q, db, ...) Hybrid FTS5+vector search (mode: hybrid/semantic/keyword)
knowledge.ask(q, db, ...) Retrieve relevant chunks and synthesise an AI answer
knowledge.grep(pattern, db, ...) Regex/text search across all content
knowledge.related(topic, db, ...) Find entries linked from or to a given topic
knowledge.list(db, ...) List entries (returns meta only, no content)
knowledge.toc(db, ...) Display table of contents for a database or topic prefix
knowledge.slice(topic, db, ...) Extract a section from a large entry
knowledge.stats(db) Chunk counts, embedding coverage, and file size
knowledge.info(db) Database metadata, path, and version
knowledge.dbs() List all configured knowledge databases

Key Parameters

Parameter Type Description
q str Search or question text
db str Database name (as configured under tools.knowledge.kb)
topic str Entry topic path (e.g. python/tips/generators)
mode str Search mode: "hybrid" (default), "semantic" (vector-only), "keyword" (FTS5-only)
k int Max results (default: config.search_limit)
category str Entry category filter — one of: reference, rule, note, mistake
source str Filter by meta.source prefix
direction str For knowledge.related(): "out" (links from topic), "in" (links to topic), "both"
depth int For knowledge.related(): traversal depth (default: 1)

Requires

  • OPENAI_API_KEY in secrets.yaml (for embeddings and AI synthesis)
  • onetool-mcp[util] extra (provides sqlite-vec and python-frontmatter)

Configuration

Required

  • OPENAI_API_KEY must be set in secrets.yaml for embeddings and knowledge.ask().

Optional

Key Type Default Description
tools.knowledge.model string "" OpenAI embedding model. Falls back to llm.embedding_model; built-in default: text-embedding-3-small.
tools.knowledge.base_url string "" OpenAI-compatible API base URL. Empty = inherit from top-level llm.base_url.
tools.knowledge.dimensions int 1536 Embedding dimensions. Must match the configured model.
tools.knowledge.max_embedding_tokens int 8191 Max tokens per embedding input.
tools.knowledge.embedding_batch_size int 200 Texts per embeddings API call. Range: 1-2048.
tools.knowledge.search_limit int 10 Default max search results. Range: 1-100.
tools.knowledge.search_extract int 300 Character limit for content extract in search results (0 = full).
tools.knowledge.enrich_model string "" LLM model for knowledge.ask() synthesis. Empty = falls back to top-level llm.model.
tools.knowledge.min_chunk_chars int 200 Minimum body characters per chunk. Chunks below threshold are merged. 0 disables.

Project registry (under tools.knowledge.kb):

tools:
  knowledge:
    model: text-embedding-3-small
    base_url: ""
    dimensions: 1536
    search_limit: 10
    search_extract: 300
    enrich_model: ""
    min_chunk_chars: 200
    kb:
      docs:
        db:
          path: kb/docs.db
          description: Scraped documentation
          embeddings_enabled: true
        scrape:
          output_base_dir: /path/to/scraped/docs
          sources:
            python:
              url: https://docs.python.org/3/
              url_prefix: /3/

Defaults

  • If tools.knowledge.base_url is empty, it inherits from the top-level llm.base_url.
  • If tools.knowledge.model is empty, it inherits from llm.embedding_model.
  • If tools.knowledge.enrich_model is empty, it falls back to llm.model.

Examples

# Search a knowledge base (hybrid FTS5+vector)
knowledge.search(q='context managers', db='docs')

# Keyword-only search with more results
knowledge.search(q='yield generator', db='docs', mode='keyword', k=20)

# AI synthesis — retrieves relevant chunks then answers
knowledge.ask(q='How do I configure authentication?', db='docs')

# Write a personal annotation
knowledge.write(topic='python/tips/loops', content='Use enumerate() for index access', db='docs', category='rule')

# Grep for a pattern across all content
knowledge.grep(pattern='def __init__', db='docs')

# Follow related topics via link graph
knowledge.related(topic='python/asyncio/tasks', db='docs', direction='out', depth=2)

# List all configured databases
knowledge.dbs()

# Check database stats
knowledge.stats(db='docs')

# Read a specific entry
knowledge.read(topic='python/tips/loops', db='docs')

CLI

The onetool kb command group handles offline knowledge base operations (scraping, indexing, and maintenance). All subcommands auto-detect onetool.yaml from the current directory.

Global options

Option Description
-c, --config PATH Path to onetool.yaml (auto-detected from ./onetool.yaml or .onetool/onetool.yaml)
-s, --secrets PATH Path to secrets file (auto-detected alongside config if omitted)

onetool kb scrape

Crawl all sources in a scrape project. Requires the onetool-mcp[scrape] extra and playwright install chromium.

onetool kb scrape <project> [OPTIONS]
Option Description
--only TEXT Comma-separated source names to run (runs all if omitted)
--resume Resume each source from .state.json if present
--max-pages INT Hard limit on pages written per source (overrides config)
--flat-files / --no-flat-files Write flat :: -separated files instead of subdirectories
--debug Write per-page debug artifacts (cleaned.html, raw.html, screenshot.png, meta.json) to ._debug/<slug>/
onetool kb scrape docs
onetool kb scrape docs --only python,stdlib --max-pages 200
onetool kb scrape docs --resume

onetool kb index

Index a project's scraped content into the knowledge database.

onetool kb index <project> [OPTIONS]
Option Description
--path PATH Directory to index (overrides project's output_base_dir)
--overwrite TEXT skip (default) or update
onetool kb index docs
onetool kb index docs --overwrite update
onetool kb index docs --path /tmp/scraped

onetool kb reindex

Backfill missing embeddings for all chunks in an existing database.

onetool kb reindex <db>
onetool kb reindex docs

onetool kb stats

Print chunk counts, embedding coverage, and file size.

onetool kb stats <db>

onetool kb info

Print database metadata, path, and version.

onetool kb info <db>

onetool kb export

Export all chunks (or a filtered subset) to a JSON file.

onetool kb export <db> --output <path> [OPTIONS]
Option Description
-o, --output PATH Output JSON file path (required)
--category TEXT Filter by category
--topic TEXT Filter by topic prefix
onetool kb export docs --output docs-dump.json
onetool kb export docs --output rules.json --category rule