Web Fetch

Clean content from any URL. No noise.

Extracts main content from web pages, filtering navigation, ads, and boilerplate.

Highlights

  • Clean content extraction that filters out navigation and ads
  • Multiple output formats (markdown, text, json)
  • Batch processing with concurrent execution
  • Output truncation with max_length parameter

Functions

Function                    Description
web.fetch(url, ...)         Fetch and extract content from a URL
web.fetch_batch(urls, ...)  Fetch multiple URLs concurrently

Key Parameters

Parameter           Type  Description
url                 str   URL to fetch content from
output_format       str   "markdown" (default), "text", "json"
include_links       bool  Include links in output
include_images      bool  Include image references
include_tables      bool  Include tables in output
include_formatting  bool  Preserve headers/lists (default: True)
fast                bool  Skip fallback extraction for speed
max_length          int   Truncate output to this length
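
To make the max_length behaviour concrete, here is a minimal sketch of output truncation. The helper name truncate_output is hypothetical, and the sketch assumes the limit is counted in characters and that no limit means no truncation; the tool's actual unit and cut-off rule are not stated here.

```python
from typing import Optional


def truncate_output(text: str, max_length: Optional[int] = None) -> str:
    """Hypothetical helper: cut text to at most max_length characters.

    Assumption: the limit is measured in characters. With no limit,
    or when the text is already short enough, it is returned unchanged.
    """
    if max_length is None or len(text) <= max_length:
        return text
    return text[:max_length]


if __name__ == "__main__":
    sample = "Extracted article body goes here."
    print(truncate_output(sample, max_length=9))
```

A plain slice like this can cut a word or a markdown construct in half, so a real implementation might prefer to back up to the nearest whitespace or line break.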

Examples

# Fetch single URL
web.fetch(url="https://example.com/article")

# Fetch with markdown output set explicitly (markdown is the default)
web.fetch(url="https://docs.python.org/3/tutorial/", output_format="markdown")

# Fast mode without fallback
web.fetch(url="https://example.com/page", fast=True)

# Batch fetch multiple URLs
web.fetch_batch(urls=[
    "https://example.com/page1",
    "https://example.com/page2"
])
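
For readers curious how a concurrent batch fetch could be organised, the sketch below runs a per-URL fetch function on a thread pool and returns results keyed by URL. It is an illustration under stated assumptions, not the tool's implementation: fetch_batch_sketch, fake_fetch, and max_workers are hypothetical names, and the stand-in fetcher performs no network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_batch_sketch(
    urls: List[str],
    fetch_one: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Hypothetical sketch: apply fetch_one to every URL concurrently.

    Executor.map yields results in input order, so zipping them back
    onto the URL list pairs each URL with its own content.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_one, urls))
    return dict(zip(urls, results))


def fake_fetch(url: str) -> str:
    """Stand-in for a real fetcher (assumption): returns a placeholder."""
    return f"content of {url}"


if __name__ == "__main__":
    pages = fetch_batch_sketch(
        ["https://example.com/page1", "https://example.com/page2"],
        fake_fetch,
    )
    for url, body in pages.items():
        print(url, "->", body)
```

Threads suit this workload because fetching is I/O-bound. In this sketch an exception raised for one URL propagates to the caller when results are collected, so a production version would likely catch errors per URL instead.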

Source

trafilatura