
Webfetch

Extracts main content from web pages, filtering navigation, ads, and boilerplate.

Short alias: wf

Highlights

  • Clean content extraction that filters out navigation and ads
  • Multiple output formats (markdown, text, json)
  • Batch processing with concurrent execution
  • Output truncation with max_length parameter
  • URL validation with helpful error messages
  • JSON-structured errors when using json output format
  • Optional HTTP response metadata
  • Non-HTML content (plain text, JSON, XML, CSV) returned directly without extraction
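As an illustration of the URL validation highlight, here is a minimal sketch of the kind of up-front check involved: require an http(s) scheme and a host. The helper name `validate_url` and its messages are hypothetical. webfetch's actual rules and error wording are not documented here and may differ.

```python
from typing import Optional
from urllib.parse import urlparse


def validate_url(url: str) -> Optional[str]:
    """Return an error message if url is unusable, else None (hypothetical helper)."""
    candidate = (url or "").strip()
    if not candidate:
        return "URL is empty"
    parsed = urlparse(candidate)  # urlparse lowercases the scheme
    if parsed.scheme not in ("http", "https"):
        # "example.com" parses with an empty scheme, so it lands here
        return f"Unsupported or missing scheme in {candidate!r}; use http:// or https://"
    if not parsed.netloc:
        # e.g. "https://" has a scheme but no host
        return f"No host in {candidate!r}"
    return None
```

Returning a message instead of raising lets a caller put the text into a plain error string or into a JSON-structured error, whichever output format is in use.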

Functions

Function                         Description
webfetch.fetch(url, ...)         Fetch and extract content from a URL
webfetch.fetch_batch(urls, ...)  Fetch multiple URLs concurrently
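The batch function's concurrency can be pictured with the sketch below. `fetch_batch_sketch` and its `fetch_one` argument are hypothetical stand-ins, not webfetch's API. Using a thread pool is an assumption, as is recording per-URL errors rather than aborting the batch; the documentation does not say how fetch_batch schedules work or reports failures.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def fetch_batch_sketch(
    urls: List[str],
    fetch_one: Callable[[str], str],
    max_workers: int = 4,
) -> List[Tuple[str, str]]:
    """Run fetch_one over every URL concurrently; return (url, result) pairs.

    Hypothetical helper: fetch_one stands in for a single-URL fetch, and
    recording per-URL errors instead of aborting is an assumption.
    """

    def safe_fetch(url: str) -> str:
        try:
            return fetch_one(url)
        except Exception as exc:  # one bad URL should not sink the whole batch
            return f"Error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even though the calls overlap
        results = list(pool.map(safe_fetch, urls))
    return list(zip(urls, results))
```

Returning a list of pairs, not a dict, keeps one entry per input URL in the original order, even if the same URL appears twice.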

Key Parameters

Parameter           Type       Description
url                 str        URL to fetch content from (fetch)
urls                list[str]  URLs to fetch concurrently (fetch_batch)
output_format       str        Output format: "markdown" (default), "text", or "json"
include_links       bool       Include links in output
include_images      bool       Include image references
include_tables      bool       Include tables in output (default: True)
include_comments    bool       Include comments section
include_formatting  bool       Preserve headers/lists (default: True)
include_metadata    bool       Include HTTP response metadata in JSON output
favor_precision     bool       Prefer accuracy over completeness
favor_recall        bool       Prefer completeness over accuracy
fast                bool       Skip fallback extraction for speed
target_language     str        Filter by ISO 639-1 language code (e.g. "en")
max_length          int        Truncate output to this length
timeout             float      Request timeout in seconds (defaults to config)
use_cache           bool       Use cached pages (default: True)

Note: favor_precision and favor_recall are mutually exclusive.
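The constraints in the parameter table can be expressed as a short sketch: output_format must be one of the three listed values, favor_precision and favor_recall cannot both be set, and max_length cuts the output. `check_options` and `truncate` are hypothetical helpers. Raising ValueError is an assumption; so is a plain character slice with no truncation marker. webfetch may report these cases differently.

```python
from typing import Optional

VALID_FORMATS = {"markdown", "text", "json"}


def check_options(
    output_format: str = "markdown",
    favor_precision: bool = False,
    favor_recall: bool = False,
) -> None:
    """Reject option combinations the parameter table rules out (hypothetical helper)."""
    if output_format not in VALID_FORMATS:
        raise ValueError(
            f"output_format must be one of {sorted(VALID_FORMATS)}, got {output_format!r}"
        )
    if favor_precision and favor_recall:
        raise ValueError("favor_precision and favor_recall are mutually exclusive")


def truncate(content: str, max_length: Optional[int]) -> str:
    """Cut content to at most max_length characters; None means no limit."""
    if max_length is None or len(content) <= max_length:
        return content
    return content[:max_length]
```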

Configuration

Required

  • No required tools.webfetch settings.

Optional

Key                        Type   Default  Description
tools.webfetch.timeout     float  30.0     Request timeout in seconds. Range: 1.0-120.0.
tools.webfetch.max_length  int    50000    Max extracted content length in characters. Range: 1000-500000.

Example configuration:

tools:
  webfetch:
    timeout: 30.0
    max_length: 50000

Defaults

  • If tools.webfetch is omitted, webfetch uses the built-in defaults: a 30.0-second timeout and a max_length of 50000 characters.
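How the documented defaults and ranges combine can be shown with a small sketch that merges a parsed configuration mapping (for example, the YAML example loaded into a dict) with the built-in values. `resolve_webfetch_config` is a hypothetical helper. The choices to reject out-of-range values rather than clamp them, and to ignore unknown keys, are assumptions and not documented webfetch behavior.

```python
from typing import Any, Dict, Optional

# Built-in defaults and allowed ranges, as listed in the configuration table
DEFAULTS: Dict[str, Any] = {"timeout": 30.0, "max_length": 50000}
RANGES: Dict[str, tuple] = {"timeout": (1.0, 120.0), "max_length": (1000, 500000)}


def resolve_webfetch_config(config: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return effective tools.webfetch settings (hypothetical helper)."""
    # Missing or empty "tools" / "webfetch" sections fall back to the defaults
    section = ((config or {}).get("tools") or {}).get("webfetch") or {}
    resolved = dict(DEFAULTS)
    for key, value in section.items():
        if key not in DEFAULTS:
            continue  # this sketch only models timeout and max_length
        low, high = RANGES[key]
        if not (low <= value <= high):
            raise ValueError(f"tools.webfetch.{key}={value} is outside {low}-{high}")
        resolved[key] = value
    return resolved
```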

Examples

# Fetch single URL
webfetch.fetch(url="https://example.com/article")

# Markdown output (the default, shown explicitly)
webfetch.fetch(url="https://docs.python.org/3/tutorial/", output_format="markdown")

# Fast mode: skip fallback extraction for speed
webfetch.fetch(url="https://example.com/page", fast=True)

# JSON output with metadata
webfetch.fetch(
    url="https://example.com/article",
    output_format="json",
    include_metadata=True
)

# Precision mode for cleaner extraction
webfetch.fetch(url="https://example.com/page", favor_precision=True)

# Batch fetch multiple URLs
webfetch.fetch_batch(urls=[
    "https://example.com/page1",
    "https://example.com/page2"
])

# Batch with extraction options
webfetch.fetch_batch(
    urls=["https://example.com/page1", "https://example.com/page2"],
    include_links=True,
    favor_precision=True,
    fast=True
)

# Fetch plain text or JSON files (returned directly without extraction)
webfetch.fetch(url="https://example.com/data.json")
webfetch.fetch(url="https://example.com/robots.txt")
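The pass-through behavior for non-HTML files can be sketched as a dispatch on the response's Content-Type. `handle_response` and the `extract_html` callback are hypothetical. The sketch sends only HTML types through extraction and returns everything else unchanged, which covers the documented plain text, JSON, XML, and CSV cases. How webfetch treats other types (images, PDFs) is not documented, and passing them through here is an assumption.

```python
from typing import Callable

HTML_TYPES = {"text/html", "application/xhtml+xml"}


def handle_response(
    content_type: str, body: str, extract_html: Callable[[str], str]
) -> str:
    """Route a response body by its Content-Type header (hypothetical helper)."""
    # Drop parameters such as charset: "text/html; charset=utf-8" -> "text/html"
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in HTML_TYPES:
        return extract_html(body)  # HTML: run boilerplate-removing extraction
    return body  # plain text, JSON, XML, CSV, ...: returned as-is
```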