Thoughts

for each URL, do the following (rough sketch after the list):
  • mark it as crawled
  • get page content
    • if that fails, mark the link as invalid
  • find all links in the content
    • check each link for dupes
    • add to pool or discard
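
A minimal Python sketch of that loop, using only the standard library (urllib for fetching, HTMLParser for link extraction). Names like crawl, pool, and LinkParser are placeholders for illustration, not code from this repo.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, limit=100):
    pool = deque([start_url])   # URLs waiting to be fetched
    crawled = set()             # URLs already visited
    invalid = set()             # URLs whose fetch failed

    while pool and len(crawled) < limit:
        url = pool.popleft()
        crawled.add(url)        # mark it as crawled

        try:
            with urlopen(url) as response:              # get page content
                content = response.read().decode("utf-8", errors="replace")
        except (URLError, ValueError):
            invalid.add(url)    # if that fails, mark the link as invalid
            continue

        parser = LinkParser()   # find all links in the content
        parser.feed(content)
        for href in parser.links:
            link = urljoin(url, href)
            # check each link for dupes; add to pool or discard
            if link not in crawled and link not in pool:
                pool.append(link)

    return crawled, invalid
```

Using a deque as the pool makes the crawl breadth-first; swapping it for a stack would make it depth-first.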