Crawl

Author: tavily-ai
Version: 1.0.0
License: MIT
Token count: ~1,272
Updated: Jun 5, 2026

Extract and save website content as markdown files for offline access and analysis. Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against performance. Includes path filtering via regex patterns to focus on specific sections and exclude irrelevant content. Offers two modes: full-page extraction for data collection, or semantic chunking with natural language instructions for feeding results into LLM context. Provides a companion Map API for URL...
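To make the interaction between depth, breadth, page caps, and regex path filters concrete, here is a minimal sketch of a breadth-first crawl over a small in-memory link graph. It is an illustration only, not Tavily's implementation: the `SITE` graph, the `crawl` function, and the parameter names `max_depth`, `max_breadth`, `page_cap`, `select`, and `exclude` are all hypothetical and are not taken from this page.

```python
import re
from collections import deque

# Hypothetical in-memory site: each path maps to the links found on that page.
SITE = {
    "/": ["/docs", "/blog", "/pricing"],
    "/docs": ["/docs/install", "/docs/api", "/blog"],
    "/docs/install": ["/docs/api"],
    "/docs/api": ["/docs/api/crawl"],
    "/docs/api/crawl": [],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": [],
    "/pricing": [],
}


def crawl(start, max_depth=2, max_breadth=20, page_cap=50,
          select=None, exclude=None):
    """Breadth-first crawl with depth, breadth, and page limits plus regex filters."""
    select_re = re.compile(select) if select else None
    exclude_re = re.compile(exclude) if exclude else None

    def allowed(path):
        # Exclusion wins over selection; with no select pattern, everything passes.
        if exclude_re and exclude_re.search(path):
            return False
        return select_re is None or bool(select_re.search(path))

    visited = [start]          # pages in the order they were discovered
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < page_cap:
        path, depth = queue.popleft()
        if depth >= max_depth:
            continue           # depth limit: do not follow links from this page
        # Breadth limit: consider at most max_breadth links per page.
        for link in SITE.get(path, [])[:max_breadth]:
            if link in seen or not allowed(link):
                continue
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
            if len(visited) >= page_cap:
                break          # page cap reached
    return visited
```

In this sketch, `crawl("/", max_depth=2, select=r"^/docs")` returns `["/", "/docs", "/docs/install", "/docs/api"]`: the start page is always kept, the filter drops `/blog` and `/pricing`, and the depth limit stops the crawl before `/docs/api/crawl`. Lowering `page_cap` or `max_breadth` trades coverage for a smaller, faster crawl.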

Install

Quick install

via npx skills · works with 57+ agents
npx skills add https://github.com/tavily-ai/skills/tree/HEAD/skills/crawl
Or pick an agent:
npx skills add tavily-ai/skills --skill crawl --agent claude-code
npx skills add tavily-ai/skills --skill crawl --agent cursor
npx skills add tavily-ai/skills --skill crawl --agent codex
npx skills add tavily-ai/skills --skill crawl --agent opencode
npx skills add tavily-ai/skills --skill crawl --agent github-copilot
npx skills add tavily-ai/skills --skill crawl --agent windsurf
More install options

Shorthand — useful for multi-skill repos:

npx skills add tavily-ai/skills --skill crawl

Manual — clone the repo and drop the folder into your agent's skills directory:

git clone https://github.com/tavily-ai/skills.git
cp -r skills/skills/crawl ~/.claude/skills/
How to use: Once installed, ask your agent to "use the crawl skill" or describe what you want (e.g. "Extract and save website content as markdown files for offline access and analysis"). Requires Node.js 18+.

More skills from tavily-ai

extract by tavily-ai
Extract clean content from specific URLs using Tavily's extraction API. Supports up to 20 URLs per request with optional query-based reranking to focus on relevant content chunks. Two extraction modes: basic for fast text extraction, advanced for JavaScript-rendered pages and structured data. Automatic OAuth authentication via browser on first run, or manual API key configuration in settings. Returns markdown or plain text format with optional image URLs and configurable timeout up to 60 seconds.

research by tavily-ai
Comprehensive research on any topic with automatic source gathering, analysis, and citations. Conducts multi-source web research with explicit citations, ideal for comparisons, current events, market analysis, and detailed reports. Offers three model options: mini for targeted single-topic research (~30s), pro for comprehensive multi-angle analysis (~60-120s), and auto for API-driven complexity detection. Authenticates via OAuth through Tavily MCP server with automatic browser-based login on...

search by tavily-ai
Web search with LLM-optimized results, relevance scoring, and flexible filtering. Supports four search depth modes (ultra-fast, fast, basic, advanced) with configurable latency and relevance tradeoffs. Includes domain filtering, time range constraints, date ranges, country boosting, and raw content extraction. Returns results with title, URL, content snippet, and relevance score; optional image results and favicons. Automatic OAuth authentication via Tavily MCP server or API key configuration;...

tavily-best-practices by tavily-ai
Web search API for LLMs with real-time data access, content extraction, site crawling, and AI-powered research. Five core methods: search() for web results, extract() for URL content, crawl() for site-wide extraction, map() for URL discovery, and research() for end-to-end AI synthesis. Supports Python and JavaScript SDKs with async clients for parallel queries and configurable search depth (ultra-fast/fast/basic/advanced). Crawl method accepts semantic instructions to focus extraction on...

tavily-cli by tavily-ai
Web search, content extraction, site crawling, and deep research via Tavily CLI. Five command modes covering search, extraction, URL discovery, bulk crawling, and multi-source research with citations. All commands support JSON output and file saving for structured, agentic workflows. Escalation pattern guides you from simple search through extraction, mapping, crawling, to comprehensive research based on your needs. Requires tavily-cli installation and API key authentication via tvly login.

tavily-crawl by tavily-ai
Multi-page website crawler with semantic filtering and markdown export. Crawl entire site sections with depth and breadth control; filter by path regex, domain, or natural language instructions to focus results. Save each page as local markdown files via --output-dir, or return structured JSON for agentic processing. Use semantic instructions with chunk extraction to prevent context bloat when feeding results to LLMs; use full-page extraction for offline documentation downloads. Supports...

tavily-dynamic-search by tavily-ai
Search the web, filter results, and extract content so that raw search data never enters your context window. Only your curated print() output comes back.

tavily-extract by tavily-ai
Extract clean markdown or text from up to 20 URLs, with JavaScript rendering and query-focused chunking support. Handles JavaScript-rendered pages with configurable extraction depth (basic for simple pages, advanced for dynamic SPAs and tables). Supports query-focused extraction to return only relevant content chunks instead of full pages. Returns LLM-optimized markdown by default, with options for plain text format and structured JSON output. Processes up to 20 URLs in a single call;...

---

Source: https://github.com/tavily-ai/skills/tree/HEAD/skills/crawl
Author: tavily-ai
Discovered via: mcpservers.org

SKILL.md source

---
name: crawl
description: Extract and save website content as markdown files for offline access and analysis. Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against perform...
---

# crawl

Extract and save website content as markdown files for offline access and analysis. Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against performance. Includes path filtering via regex patterns to focus on specific sections and exclude irrelevant content. Offers two modes: full-page extraction for data collection, or semantic chunking with natural language instructions for feeding results into LLM context. Provides a companion Map API for URL...


Related skills (6)

caveman

★ Featured

Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra, wenyan-lite, wenyan-full, wenyan-ultra. Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested.

juliusbrussee 167k
Development

secure-linux-web-hosting

★ Featured

Use when setting up, hardening, or reviewing a cloud server for self-hosting, including DNS, SSH, firewalls, Nginx, static-site hosting, reverse-proxying an app, HTTPS with Let's Encrypt or ACME clients, safe HTTP-to-HTTPS redirects, or optional post-launch network tuning such as BBR.

xixu-me 155k
Development

readme-i18n

★ Featured

Use when the user wants to translate a repository README, make a repo multilingual, localize docs, add a language switcher, internationalize the README, or update localized README variants in a GitHub-style repository.

xixu-me 155k
Development

lark-shared

★ Featured

Use when first setting up lark-cli, running auth login, switching user/bot identity (--as), handling permission denied or scope errors, needing to update lark-cli, or seeing _notice in JSON output.

larksuite 155k
Development

improve-codebase-architecture

★ Featured

Find deepening opportunities in a codebase, informed by the domain language in CONTEXT.md and the decisions in docs/adr/. Use when the user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable and AI-navigable.

mattpocock 151k
Development

paper-context-resolver

★ Featured

Optional RigorPilot helper for README-first deep learning repo reproduction. Use only when the README and repository files leave a narrow reproduction-critical gap and the task is to resolve a specific paper detail such as dataset split, preprocessing, evaluation protocol, checkpoint mapping, or runtime assumption from primary paper sources while recording conflicts. Do not use for general paper summary, repo scanning, environment setup, command execution, title-only paper lookup, or replacin...

lllllllama 127k
Development