From fb096b4468fe05e92b4337eeb84be19e81a4f3af Mon Sep 17 00:00:00 2001
From: Simon Weald
Date: Tue, 28 Aug 2018 22:34:05 +0100
Subject: [PATCH] add scratchpad for notes

---
 notes.md | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 notes.md

diff --git a/notes.md b/notes.md
new file mode 100644
index 0000000..56c7794
--- /dev/null
+++ b/notes.md
@@ -0,0 +1,9 @@
+## Thoughts
+
+###### for each URL, do the following:
+ * mark it as crawled
+ * get page content
+ * if that fails, mark the link as invalid
+ * find all links in the content
+ * check each link for dupes
+ * add to pool or discard
\ No newline at end of file
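
The per-URL steps in the notes could be sketched roughly as follows. This is only one possible reading of the scratchpad, not the project's implementation: the names `crawl`, `fetch`, `LinkParser`, `fetch_page`, and `limit` are all hypothetical, and using the standard library's `html.parser` and `urllib` is an assumption. The fetcher is passed in as a parameter so the loop can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Hypothetical default fetcher: return the page body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, limit=100):
    """Crawl from start_url; return (crawled, invalid) sets of URLs."""
    pool = deque([start_url])   # links waiting to be crawled
    queued = {start_url}        # every link ever added to the pool
    crawled = set()
    invalid = set()

    while pool and len(crawled) < limit:
        url = pool.popleft()
        crawled.add(url)                    # mark it as crawled
        try:
            content = fetch_page(url)       # get page content
        except (OSError, ValueError):
            invalid.add(url)                # if that fails, mark the link as invalid
            continue

        parser = LinkParser()
        parser.feed(content)                # find all links in the content
        for href in parser.links:
            # Resolve relative links and drop fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if link in queued:              # check each link for dupes
                continue                    # discard
            queued.add(link)
            pool.append(link)               # add to pool

    return crawled, invalid
```

Following the order in the notes, a URL is marked as crawled before its content is fetched, so a link that fails to load appears in both returned sets. Tracking `queued` separately from `pool` keeps the duplicate check a constant-time set lookup rather than a scan of the queue.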