web-scraper/notes.md
2018-08-28 22:34:05 +01:00

## Thoughts
For each URL, do the following:
* mark it as crawled
* fetch the page content
  * if that fails, mark the link as invalid
* find all links in the content
* check each link for dupes
  * add new links to the pool, discard the rest
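
The loop above could be sketched roughly like this (a minimal, stdlib-only draft, not the actual scraper; the `crawl` function, the injected `fetch` callable, and the `LinkParser` helper are all assumed names for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl from start_url.

    `fetch(url)` returns the page HTML or raises on failure; it is
    injected so the loop can be tested without hitting the network.
    Returns (crawled, invalid) sets of URLs.
    """
    pool = [start_url]
    crawled, invalid = set(), set()
    while pool:
        url = pool.pop(0)
        crawled.add(url)               # mark it as crawled
        try:
            content = fetch(url)       # get page content
        except Exception:
            invalid.add(url)           # if that fails, mark it invalid
            continue
        parser = LinkParser()
        parser.feed(content)           # find all links in the content
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            # dupe check: discard anything already crawled or queued
            if link not in crawled and link not in pool:
                pool.append(link)      # add to pool
    return crawled, invalid
```

One design choice worth noting: an unfetchable URL still ends up in `crawled` (matching the step order in the notes, where marking happens before fetching), so it is never retried.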