| a3ec9451e3 | implement parsing of robots.txt | 2018-09-05 18:56:20 +01:00 |
| 1b9b207a28 | attempt to remove base url with trailing slash (if discovered) | 2018-09-04 13:57:52 +01:00 |
| 05e907ecec | too many changes to make a sensible commit message | 2018-09-04 09:21:26 +01:00 |
| c436016e0c | remove unnecessary function | 2018-08-31 19:16:08 +01:00 |
| 0517e5bc56 | crawler now initialises and populates crawled pool with urls it finds | 2018-08-31 19:02:21 +01:00 |
| 5e0d9fd568 | initial commit of crawler skeleton | 2018-08-31 18:26:49 +01:00 |