# Concurrent web scraper
## Requirements
This crawler requires Python 3.5 or later, because it uses the `async`/`await` keywords (added to the language in 3.5) together with `asyncio`.
Install required modules:
```bash
pip install -r requirements.txt
```
Run:
```bash
python async_crawler.py -u https://urltocrawl.com [-c 100]
```
Flags:
- `-u/--url https://url.com`
  - The base URL is required.
- `-c/--concurrency 100`
  - The concurrency value is optional (defaults to 100).
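
The sketch below illustrates how the two flags could be parsed and how a concurrency value might cap the number of simultaneous requests. It is not taken from `async_crawler.py`: the function names (`parse_args`, `fake_fetch`, `crawl_all`, `run_demo`) are hypothetical, the use of `asyncio.Semaphore` is an assumption about how the limit is enforced, and `fake_fetch` replaces the real HTTP request with a short sleep.

```python
import argparse
import asyncio


def parse_args(argv=None):
    """Parse the two flags listed above (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Concurrent web scraper")
    parser.add_argument("-u", "--url", required=True,
                        help="base URL to crawl")
    parser.add_argument("-c", "--concurrency", type=int, default=100,
                        help="maximum number of simultaneous requests")
    return parser.parse_args(argv)


async def fake_fetch(url):
    """Stand-in for a real HTTP request; it only yields to the event loop."""
    await asyncio.sleep(0.01)
    return url


async def crawl_all(urls, concurrency):
    """Fetch every URL with at most `concurrency` fetches in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    state = {"in_flight": 0, "peak": 0}  # bookkeeping for the demo only

    async def bounded_fetch(url):
        # Tasks beyond the limit wait here until a slot frees up.
        async with semaphore:
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
            try:
                return await fake_fetch(url)
            finally:
                state["in_flight"] -= 1

    results = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return results, state["peak"]


def run_demo(argv):
    """Parse flags, build ten placeholder URLs, and crawl them."""
    args = parse_args(argv)
    urls = ["{}/page{}".format(args.url, i) for i in range(10)]
    # An explicit event loop keeps the sketch usable on Python 3.5.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(crawl_all(urls, args.concurrency))
    finally:
        loop.close()
```

Calling `run_demo(["-u", "https://example.com", "-c", "3"])` returns the ten placeholder URLs along with the peak number of fetches that were in flight at once, which should never exceed the value passed to `-c`.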
## Results
The resulting sitemap is written to the root of this directory as `sitemap.html`.