Webfetch¶

Extracts main content from web pages, filtering navigation, ads, and boilerplate.

Short alias: wf

Highlights¶

Clean content extraction filtering navigation and ads
Multiple output formats (markdown, text, json)
Batch processing with concurrent execution
Non-HTML content (plain text, JSON, XML, CSV) returned directly without extraction

Functions¶

Function	Description
`webfetch.fetch(url, ...)`	Fetch and extract content from a URL
`webfetch.fetch_batch(urls, ...)`	Fetch multiple URLs concurrently

Key Parameters¶

Parameter	Type	Description
`url`	str	URL to fetch content from
`output_format`	str	"markdown" (default), "text", "json", "html"
`include_links`	bool	Include links in output
`include_images`	bool	Include image references
`include_tables`	bool	Include tables in output (default: True)
`include_comments`	bool	Include comments section
`include_formatting`	bool	Preserve headers/lists (default: True)
`include_metadata`	bool	Include HTTP metadata in JSON output
`favor_precision`	bool	Prefer accuracy over completeness
`favor_recall`	bool	Prefer completeness over accuracy
`fast`	bool	Skip fallback extraction for speed
`target_language`	str	Filter by ISO 639-1 language code
`max_length`	int	Truncate output to this length
`timeout`	float	Request timeout in seconds (defaults to config)
`use_cache`	bool	Use cached pages (default: True)

Note: favor_precision and favor_recall are mutually exclusive.

Configuration¶

Required¶

No required tools.webfetch settings.

Optional¶

Key	Type	Default	Description
`tools.webfetch.timeout`	float	`30.0`	Request timeout in seconds. Range: `1.0-120.0`.
`tools.webfetch.max_length`	int	`50000`	Max extracted content length in characters. Range: `1000-500000`.

tools:
  webfetch:
    timeout: 30.0
    max_length: 50000

Defaults¶

If tools.webfetch is omitted, web fetch uses the built-in timeout and max length shown above.

Examples¶

# Fetch single URL
webfetch.fetch(url="https://docs.python.org/3/library/json.html")

# Fetch with markdown output
webfetch.fetch(url="https://docs.python.org/3/tutorial/", output_format="markdown")

# Fast mode without fallback
webfetch.fetch(url="https://fastapi.tiangolo.com/tutorial/", fast=True)

# JSON output with metadata
webfetch.fetch(
    url="https://docs.astral.sh/uv/getting-started/",
    output_format="json",
    include_metadata=True
)

# Precision mode for cleaner extraction
webfetch.fetch(url="https://pydantic-docs.helpmanual.io/concepts/models/", favor_precision=True)

# Batch fetch multiple URLs
webfetch.fetch_batch(urls=[
    "https://docs.python.org/3/library/asyncio.html",
    "https://fastapi.tiangolo.com/tutorial/first-steps/"
])

# Batch with all options
webfetch.fetch_batch(
    urls=["https://docs.python.org/3/library/typing.html", "https://docs.pydantic.dev/latest/"],
    include_links=True,
    favor_precision=True,
    fast=True
)

# Fetch plain text or JSON files (returned directly without extraction)
webfetch.fetch(url="https://pypi.org/pypi/requests/json")
webfetch.fetch(url="https://docs.python.org/robots.txt")

Based on¶

This tool is based on trafilatura by Adrien Barbaresi, licensed under Apache 2.0.