Concurrent web scraper

Requirements

This crawler requires Python 3.5 or later, because it uses the async/await syntax (added to the language in Python 3.5) together with the asyncio module.
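
The pattern async/await enables can be sketched as follows. This is an illustrative sketch, not the code in async_crawler.py: `fetch` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the concurrency cap is assumed to be enforced with an `asyncio.Semaphore`. It also records the peak number of simultaneous fetches so the cap can be checked.

```python
import asyncio


async def fetch(url, semaphore, stats):
    # At most `limit` coroutines can be inside this block at the same time.
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        # Hypothetical stand-in for network I/O; a real crawler would request `url` here.
        await asyncio.sleep(0.01)
        stats["active"] -= 1
        return url


async def crawl(urls, limit=100):
    # Create the semaphore inside the coroutine so it belongs to the running loop.
    semaphore = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    # gather() runs all fetches concurrently and returns results in input order.
    results = await asyncio.gather(*(fetch(u, semaphore, stats) for u in urls))
    return results, stats["peak"]
```

For example, `asyncio.run(crawl(urls, limit=5))` returns the URLs in their original order and a peak of at most 5. Note that `asyncio.run` exists only from Python 3.7; on 3.5 and 3.6 an event loop's `run_until_complete` would be used instead.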

Install the required modules:

pip install -r requirements.txt

Run:

python async_crawler.py -u https://urltocrawl.com [-c 100]

Flags:

  • -u/--url https://url.com
    • The base URL is required.
  • -c/--concurrency 100
    • Specifying a concurrency value is optional (defaults to 100).
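
As a sketch of how these two flags map onto a command-line parser, the standard library's argparse can express them directly. This is an assumption about the interface, not a copy of the script's actual parser; `parse_args` is a hypothetical helper name.

```python
import argparse


def parse_args(argv=None):
    # Mirrors the documented flags; the real script's parser may differ.
    parser = argparse.ArgumentParser(description="Concurrent web scraper")
    parser.add_argument("-u", "--url", required=True,
                        help="base URL to crawl (required)")
    parser.add_argument("-c", "--concurrency", type=int, default=100,
                        help="concurrency value (default: 100)")
    # With argv=None, argparse reads the arguments from sys.argv.
    return parser.parse_args(argv)
```

Calling `parse_args(["-u", "https://urltocrawl.com"])` yields `url="https://urltocrawl.com"` and `concurrency=100`; omitting `-u` makes argparse print a usage error and exit.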

Results

The resulting sitemap is written to sitemap.html in the root of this directory.
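
The README does not describe the layout of sitemap.html, so the following is only a hypothetical sketch of writing a set of crawled URLs to such a file: one link per URL in a plain HTML list, with URLs HTML-escaped so characters like `&` stay valid. `write_sitemap` is an illustrative name, not a function from this project.

```python
import html


def write_sitemap(urls, path="sitemap.html"):
    # Hypothetical layout: one <li><a> per crawled URL, sorted for stable output.
    items = "\n".join(
        '    <li><a href="{0}">{0}</a></li>'.format(html.escape(u, quote=True))
        for u in sorted(urls)
    )
    document = (
        "<!DOCTYPE html>\n<html>\n<head><title>Sitemap</title></head>\n"
        "<body>\n  <ul>\n" + items + "\n  </ul>\n</body>\n</html>\n"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(document)
    return document
```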
