diff --git a/notes.md b/notes.md
new file mode 100644
index 0000000..56c7794
--- /dev/null
+++ b/notes.md
@@ -0,0 +1,9 @@
+## Thoughts
+
+###### for each URL, do the following:
+ * mark it as crawled
+ * get page content
+ * if that fails, mark the link as invalid
+ * find all links in the content
+ * check each link for dupes
+ * add to pool or discard
\ No newline at end of file
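
The per-URL steps in these notes could be sketched roughly as the loop below. This is only an illustration under stated assumptions, not the implementation the notes have in mind: the names `crawl`, `extract_links`, `LinkParser`, the injected `fetch` callable, and the `max_pages` cap are all hypothetical, and "dupes" is interpreted here as "already added to the pool at some point", with `#fragment` suffixes ignored.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Find all links in the content, resolved against the page URL."""
    parser = LinkParser()
    parser.feed(html)
    # Drop #fragments so two links to the same page compare equal.
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(seeds, fetch, max_pages=100):
    """Run the loop from the notes.

    `fetch` is a hypothetical callable taking a URL and returning the page
    content as a string, raising an exception on failure.
    """
    pool = deque(seeds)
    seen = set(seeds)      # every URL ever added to the pool
    crawled = set()
    invalid = set()

    while pool and len(crawled) < max_pages:
        url = pool.popleft()
        crawled.add(url)                      # mark it as crawled
        try:
            content = fetch(url)              # get page content
        except Exception:
            invalid.add(url)                  # if that fails, mark as invalid
            continue
        for link in extract_links(url, content):
            if link in seen:                  # check each link for dupes
                continue                      # discard
            seen.add(link)
            pool.append(link)                 # add to pool

    return crawled, invalid
```

Passing `fetch` in as a parameter keeps the loop independent of any particular HTTP client, so it can be exercised against an in-memory dictionary of pages before it touches the network. The `seen` set is kept separate from `crawled` so that a link queued but not yet visited is still treated as a duplicate.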