2018-09-19 08:38:49 +01:00


Concurrent web scraper

Requirements

This crawler requires Python 3.5 or later, the first version to support the async/await syntax it uses with asyncio.
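As a rough illustration of what async/await buys a crawler, here is a minimal sketch. It assumes the crawler caps in-flight requests with an asyncio.Semaphore; the README does not say how async_crawler.py actually limits concurrency. `fake_fetch` and `crawl` are hypothetical names, and `fake_fetch` only sleeps in place of a real HTTP request so the example runs offline. It calls asyncio.run, which needs Python 3.7 or later, newer than the 3.5 minimum stated above.

```python
import asyncio


async def fake_fetch(url, sem, stats):
    """Hypothetical stand-in for an HTTP request; it only sleeps."""
    async with sem:  # at most `concurrency` coroutines pass this point at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated network latency
        stats["active"] -= 1
        return url


async def crawl(urls, concurrency=100):
    """Fetch every URL, keeping no more than `concurrency` requests in flight."""
    sem = asyncio.Semaphore(concurrency)
    stats = {"active": 0, "peak": 0}
    # gather() runs the coroutines concurrently and returns results in input order
    results = await asyncio.gather(*(fake_fetch(u, sem, stats) for u in urls))
    return results, stats["peak"]


if __name__ == "__main__":
    urls = ["https://urltocrawl.com/page/{}".format(i) for i in range(20)]
    pages, peak = asyncio.run(crawl(urls, concurrency=5))
    print(len(pages), peak)
```

While one coroutine waits on the network, the event loop switches to another, so a single thread can keep many requests open; the semaphore is what turns that into a fixed upper bound.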

Install the required modules:

pip install -r requirements.txt

Run:

python async_crawler.py -u https://urltocrawl.com [-c 100]

Flags:

  • -u/--url https://url.com
    • The base URL is required.
  • -c/--concurrency 100
    • Specifying a concurrency value is optional (defaults to 100).
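The two flags above map naturally onto Python's argparse. The sketch below is a hypothetical reconstruction, not the actual parser in async_crawler.py: `parse_args` is an illustrative name, and reading -c as "maximum number of concurrent requests" is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser for the flags listed above.

    argv=None makes argparse read sys.argv; passing a list is handy for testing.
    """
    parser = argparse.ArgumentParser(description="Concurrent web scraper")
    # -u/--url: the base URL is mandatory; argparse exits with an error if it is missing
    parser.add_argument("-u", "--url", required=True, help="base URL to crawl")
    # -c/--concurrency: optional integer, defaulting to 100 as the README states
    parser.add_argument(
        "-c", "--concurrency", type=int, default=100,
        help="maximum number of concurrent requests (default: 100)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.url, args.concurrency)
```

For example, `parse_args(["-u", "https://urltocrawl.com", "-c", "50"])` yields an object whose `url` is the given URL and whose `concurrency` is the integer 50.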

Results

The resulting sitemap is written to the root of this directory as sitemap.html.
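The README does not describe the layout of sitemap.html. The sketch below assumes a simple HTML page with one link per crawled URL; `write_sitemap` is a hypothetical helper, not a function from async_crawler.py.

```python
import html


def write_sitemap(urls, path="sitemap.html"):
    """Write an assumed minimal HTML sitemap and return its text."""
    # Escape each URL so characters like & in query strings stay valid HTML;
    # sort for deterministic output regardless of crawl order.
    items = "\n".join(
        '    <li><a href="{0}">{0}</a></li>'.format(html.escape(u, quote=True))
        for u in sorted(urls)
    )
    page = (
        "<!DOCTYPE html>\n<html>\n<head><title>Sitemap</title></head>\n"
        "<body>\n  <ul>\n{}\n  </ul>\n</body>\n</html>\n".format(items)
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
    return page
```

Sorting keeps the file stable between runs even though concurrent fetches finish in an unpredictable order.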