Development

Crawl

Version: 1.0.0

License: MIT

Token count: ~1,272

Updated: Jun 5, 2026

Extract and save website content as markdown files for offline access and analysis. Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against performance. Includes path filtering via regex patterns to focus on specific sections and exclude irrelevant content. Offers two modes: full-page extraction for data collection, or semantic chunking with natural language instructions for feeding results into LLM context. Provides a companion Map API for URL...
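As a rough illustration of how the depth, breadth, page-cap, and path-filter settings described above fit together, here is a minimal Python sketch. The `CrawlOptions` class, its field names, its default values, and the `mode` label are all hypothetical and chosen for illustration; they are not confirmed names from the skill or from Tavily's API. Only the 1-5 depth range comes from the description.

```python
import re
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CrawlOptions:
    """Hypothetical container for the crawl settings named in the description."""

    url: str
    max_depth: int = 1        # link levels to follow; the description gives 1-5
    max_breadth: int = 20     # assumed default: links followed per page
    limit: int = 50           # assumed default: cap on total pages
    instructions: Optional[str] = None  # natural-language focus, if any
    select_paths: List[str] = field(default_factory=list)   # regexes to keep
    exclude_paths: List[str] = field(default_factory=list)  # regexes to drop

    @property
    def mode(self) -> str:
        # Local label only: instructions imply semantic chunking,
        # no instructions imply full-page extraction.
        return "semantic-chunks" if self.instructions else "full-page"

    def validate(self) -> None:
        if not 1 <= self.max_depth <= 5:
            raise ValueError("max_depth must be between 1 and 5")
        if self.max_breadth < 1 or self.limit < 1:
            raise ValueError("max_breadth and limit must be positive")
        for pattern in self.select_paths + self.exclude_paths:
            re.compile(pattern)  # raises re.error on an invalid regex

    def to_payload(self) -> dict:
        """Return the validated settings as a dict, omitting unset values."""
        self.validate()
        return {k: v for k, v in asdict(self).items() if v not in (None, [])}


if __name__ == "__main__":
    options = CrawlOptions(
        url="https://docs.example.com",
        max_depth=2,
        select_paths=[r"^/docs/.*"],
    )
    print(options.mode, options.to_payload())
```

Validating the bounds before any request is made keeps a mistyped depth or an invalid regex from turning into a wasted crawl.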

Install

Quick install

via npx skills · works with 57+ agents

npx skills add https://github.com/tavily-ai/skills/tree/HEAD/skills/crawl

Or pick an agent:

npx skills add tavily-ai/skills --skill crawl --agent claude-code

npx skills add tavily-ai/skills --skill crawl --agent cursor

npx skills add tavily-ai/skills --skill crawl --agent codex

npx skills add tavily-ai/skills --skill crawl --agent opencode

npx skills add tavily-ai/skills --skill crawl --agent github-copilot

npx skills add tavily-ai/skills --skill crawl --agent windsurf

More install options

Shorthand — useful for multi-skill repos:

npx skills add tavily-ai/skills --skill crawl

Manual — clone the repo and drop the folder into your agent's skills directory:

git clone https://github.com/tavily-ai/skills.git

cp -r skills/skills/crawl ~/.claude/skills/

How to use: Once installed, ask your agent to "use the crawl skill" or describe what you want (e.g. "Extract and save website content as markdown files for offline access and analys"). Requires Node.js 18+.

crawl by tavily-ai

npx skills add https://github.com/tavily-ai/skills --skill crawl

More skills from tavily-ai

extract by tavily-ai
Extract clean content from specific URLs using Tavily's extraction API. Supports up to 20 URLs per request with optional query-based reranking to focus on relevant content chunks. Two extraction modes: basic for fast text extraction, advanced for JavaScript-rendered pages and structured data. Automatic OAuth authentication via browser on first run, or manual API key configuration in settings. Returns markdown or plain text format with optional image URLs and configurable timeout up to 60 seconds.

research by tavily-ai
Comprehensive research on any topic with automatic source gathering, analysis, and citations. Conducts multi-source web research with explicit citations, ideal for comparisons, current events, market analysis, and detailed reports. Offers three model options: mini for targeted single-topic research (~30s), pro for comprehensive multi-angle analysis (~60-120s), and auto for API-driven complexity detection. Authenticates via OAuth through Tavily MCP server with automatic browser-based login on...

search by tavily-ai
Web search with LLM-optimized results, relevance scoring, and flexible filtering. Supports four search depth modes (ultra-fast, fast, basic, advanced) with configurable latency and relevance tradeoffs. Includes domain filtering, time range constraints, date ranges, country boosting, and raw content extraction. Returns results with title, URL, content snippet, and relevance score; optional image results and favicons. Automatic OAuth authentication via Tavily MCP server or API key configuration;...

tavily-best-practices by tavily-ai
Web search API for LLMs with real-time data access, content extraction, site crawling, and AI-powered research. Five core methods: search() for web results, extract() for URL content, crawl() for site-wide extraction, map() for URL discovery, and research() for end-to-end AI synthesis. Supports Python and JavaScript SDKs with async clients for parallel queries and configurable search depth (ultra-fast/fast/basic/advanced). Crawl method accepts semantic instructions to focus extraction on...

tavily-cli by tavily-ai
Web search, content extraction, site crawling, and deep research via Tavily CLI. Five command modes covering search, extraction, URL discovery, bulk crawling, and multi-source research with citations. All commands support JSON output and file saving for structured, agentic workflows. Escalation pattern guides you from simple search through extraction, mapping, crawling, to comprehensive research based on your needs. Requires tavily-cli installation and API key authentication via tvly login.

tavily-crawl by tavily-ai
Multi-page website crawler with semantic filtering and markdown export. Crawl entire site sections with depth and breadth control; filter by path regex, domain, or natural language instructions to focus results. Save each page as local markdown files via --output-dir, or return structured JSON for agentic processing. Use semantic instructions with chunk extraction to prevent context bloat when feeding results to LLMs; use full-page extraction for offline documentation downloads. Supports...

tavily-dynamic-search by tavily-ai
Search the web, filter results, and extract content so that raw search data never enters your context window. Only your curated print() output comes back.

tavily-extract by tavily-ai
Extract clean markdown or text from up to 20 URLs, with JavaScript rendering and query-focused chunking support. Handles JavaScript-rendered pages with configurable extraction depth (basic for simple pages, advanced for dynamic SPAs and tables). Supports query-focused extraction to return only relevant content chunks instead of full pages. Returns LLM-optimized markdown by default, with options for plain text format and structured JSON output. Processes up to 20 URLs in a single call;...

---

Source: https://github.com/tavily-ai/skills/tree/HEAD/skills/crawl
Author: tavily-ai
Discovered via: mcpservers.org

SKILL.md source

---
name: crawl
description: Extract and save website content as markdown files for offline access and analysis. Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against perform...
---

# crawl

Extract and save website content as markdown files for offline access and analysis. Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against performance. Includes path filtering via regex patterns to focus on specific sections and exclude irrelevant content. Offers two modes: full-page extraction for data collection, or semantic chunking with natural language instructions for feeding results into LLM context. Provides a companion Map API for URL...
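The regex path filtering and save-as-markdown behavior described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the skill's actual implementation: the helper names (`keep_url`, `markdown_filename`, `save_pages`), the file-naming scheme, and the page shape (a dict with `url` and `content` keys) are all hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def keep_url(url, select_paths=(), exclude_paths=()):
    """Return True if the URL's path passes the regex filters.

    Exclusions win over selections; an empty select list keeps everything.
    """
    path = urlparse(url).path or "/"
    if any(re.search(pattern, path) for pattern in exclude_paths):
        return False
    return not select_paths or any(re.search(p, path) for p in select_paths)


def markdown_filename(url):
    """Derive a filesystem-safe .md filename from a URL (assumed scheme)."""
    parsed = urlparse(url)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    return f"{slug or 'index'}.md"


def save_pages(pages, output_dir, select_paths=(), exclude_paths=()):
    """Write each kept page to output_dir and return the written paths.

    `pages` is assumed to be an iterable of {"url": ..., "content": ...} dicts.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        if not keep_url(page["url"], select_paths, exclude_paths):
            continue
        target = out / markdown_filename(page["url"])
        target.write_text(page["content"], encoding="utf-8")
        written.append(target)
    return written
```

Filtering on the URL path rather than the full URL keeps patterns such as `^/docs/` independent of the domain being crawled.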


`npx skills add https://github.com/tavily-ai/skills --skill crawl`

