Thoughts
for each URL, do the following:
- mark it as crawled
- get page content
- if that fails, mark the link as invalid
- find all links in the content
- check each link for dupes
- add new links to the pool, discard dupes
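
A rough Python sketch of the loop above, not a finished crawler. Everything named here is an assumption made for illustration: `crawl`, `extract_links`, `LinkParser`, the injected `fetch` callable, and the `max_pages` cap are not from these notes. It also assumes "dupes" means URLs that match after resolving relative links and dropping `#fragment` parts.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Find all links in the content, as absolute URLs without fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, h)).url for h in parser.hrefs]


def crawl(seeds, fetch, max_pages=100):
    """Crawl from seeds; fetch(url) returns HTML text or raises on failure.

    max_pages is an assumed safety cap, not part of the original notes.
    """
    pool = deque(seeds)   # URLs waiting to be crawled
    seen = set(seeds)     # every URL ever added to the pool
    crawled = set()
    invalid = set()
    while pool and len(crawled) < max_pages:
        url = pool.popleft()
        crawled.add(url)                  # mark it as crawled
        try:
            content = fetch(url)          # get page content
        except Exception:
            invalid.add(url)              # if that fails, mark the link as invalid
            continue
        for link in extract_links(url, content):
            if link in seen:              # check each link for dupes
                continue                  # discard
            seen.add(link)
            pool.append(link)             # add to pool
    return crawled, invalid


# Usage with an in-memory stand-in for the network (made-up pages).
pages = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> <a href="/a#top">dup</a>',
    "http://example.test/a": '<a href="/">home</a>',
}
crawled, invalid = crawl(["http://example.test/"], lambda url: pages[url])
```

Passing `fetch` in as a parameter keeps the loop testable without network access; a real version could wrap `urllib.request.urlopen` or similar. Keeping `seen` separate from `crawled` means a link that is already queued but not yet visited still counts as a dupe.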