Site Crawler & Audit

How it works

This tool calls a small server-side crawler (a Cloudflare Worker) because a browser cannot read another site's pages directly: the same-origin policy blocks cross-origin fetches unless the target site opts in with CORS headers. Starting from the URL you give it, the crawler follows same-origin links breadth-first and reads each page's <title>, <meta name="description">, and outgoing links.
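
As a rough illustration of the per-page step described above, the sketch below pulls the title, the meta description, and the same-origin links out of one page's HTML. It is written in Python with the standard library only and is not the Worker's actual code; the names PageParser and read_page, and the shape of the returned dictionary, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class PageParser(HTMLParser):
    """Collects the <title> text, the meta description, and raw <a href> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_page(url, html):
    """Hypothetical helper: summarize one page and list the links worth following."""
    parser = PageParser()
    parser.feed(html)
    origin = urlsplit(url)[:2]  # (scheme, host[:port])
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so each page appears once.
        absolute = urldefrag(urljoin(url, href)).url
        if urlsplit(absolute)[:2] == origin and absolute not in links:
            links.append(absolute)
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "same_origin_links": links,
    }
```

Calling read_page("https://example.com/", html) on a fetched page would return the values the audit reports on, plus the list of URLs to queue next. A missing description comes back as None, which is one plausible way to drive a "missing meta tags" report.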

Limits

Each crawl covers up to 40 pages on the same domain and stops there even if more exist, which keeps each run fast and within the Worker's request budget. If your site is larger, the tool shows a partial result and flags it as such. For full coverage of a large site, a dedicated crawler integrated into your build process is a better fit than a one-off browser tool.
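
One way to picture the cap and the partial flag is a breadth-first loop that stops after a fixed number of pages and reports whether anything was left in the queue. This is a minimal Python sketch under assumptions: the function name crawl, the get_links callback, and the result keys are hypothetical, and the real Worker may decide "partial" differently.

```python
from collections import deque

MAX_PAGES = 40  # the per-crawl cap described above


def crawl(start_url, get_links, max_pages=MAX_PAGES):
    """Breadth-first crawl; get_links(url) returns that page's same-origin links."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    # Anything still queued was discovered but never fetched.
    return {"pages": visited, "partial": bool(queue)}
```

With this shape, a site that has exactly 40 reachable pages is reported as complete, while a larger one is flagged as partial because discovered URLs remain unvisited.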

Why some pages might be missing

This crawler reads the HTML a server returns; it does not execute JavaScript. If a site's navigation links are inserted by client-side JS after the page loads (a common pattern for shared headers and footers), the crawler won't see those links and may miss pages that are reachable only through them. Links present in a page's static HTML are always discovered, though pages beyond the 40-page limit are still not crawled.
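
The difference is easy to reproduce with any parser that reads markup without running scripts. The Python sketch below uses a made-up page (the HTML, the LinkCollector class, and the /pricing and /blog paths are all hypothetical) with one link in the markup and one that a script would add at runtime.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records href values of <a> tags that exist in the markup itself."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Hypothetical page: /pricing is static, /blog only exists after the script runs.
HTML = """
<body>
  <a href="/pricing">Pricing</a>
  <div id="nav"></div>
  <script>
    document.getElementById("nav").innerHTML = '<a href="/blog">Blog</a>';
  </script>
</body>
"""

collector = LinkCollector()
collector.feed(HTML)
static_links = collector.hrefs  # only the link written directly in the markup
```

Here static_links contains /pricing but not /blog, because the script body is treated as text and never executed. A crawler working this way would reach /blog only if some other page linked to it in static HTML.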

FAQ

Does this affect my site's real Google indexing?

No — this tool doesn't submit anything anywhere on its own. It generates files for you to review and upload yourself, exactly like the manual sitemap and RSS generators elsewhere on this site.

Will this crawl private or password-protected pages?

No. It can only see what an anonymous visitor would see: no login, cookies, or authentication is used during the crawl.

Why does a crawl sometimes take several seconds?

Each page requires a real network request to fetch and parse, and the crawler makes these requests sequentially. A 40-page crawl means up to 40 HTTP requests server-side before you get a result. That's inherent to crawling, not a performance issue with the tool.
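
A back-of-the-envelope estimate makes the point: when requests run one after another, their times simply add up. The 150 ms per-request figure below is an assumed round number for illustration, not a measurement of this tool, and the function name is hypothetical.

```python
def estimated_crawl_ms(page_count, avg_request_ms):
    """Sequential fetches add up: total is roughly pages times per-request time."""
    return page_count * avg_request_ms


# Assumed figures: a full 40-page crawl at 150 ms per request.
full_crawl_ms = estimated_crawl_ms(40, 150)
```

Under those assumptions a full crawl takes about 6,000 ms, or six seconds, before any result is returned. Slower target servers push that figure up proportionally.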

Site to crawl

Results

Broken links

Missing meta tags