Joined on 2018-08-15
simon pushed to master at misc/web-scraper 2018-08-31 18:16:10 +00:00
c436016e0c remove unnecessary function
03554fde80 add docstrings
simon pushed to master at misc/web-scraper 2018-08-31 18:13:00 +00:00
759f965e95 use more explicit names, use urljoin to combine urls
simon pushed to master at misc/web-scraper 2018-08-31 18:02:25 +00:00
0517e5bc56 crawler now initialises and populates crawled pool with urls it finds
1b18aa83eb corrected some small errors and added runner function
simon pushed to master at misc/web-scraper 2018-08-31 17:26:52 +00:00
5e0d9fd568 initial commit of crawler skeleton
915def3a5d rework url sanitiser to use urllib modules, move WebPage object to helpers
simon pushed to master at misc/web-scraper 2018-08-29 21:27:33 +00:00
453331d69d simplified url qualifier
simon pushed to master at misc/web-scraper 2018-08-29 20:50:36 +00:00
2b812da26a simplify UrlPoolManager to use a set instead of a dict
simon pushed to master at misc/web-scraper 2018-08-28 21:34:07 +00:00
fb096b4468 add scratchpad for notes
simon pushed to master at misc/web-scraper 2018-08-28 21:29:38 +00:00
5d94991167 start making the scraper an object
482d23dd4f blank __init__.py
452de87f35 change name of pool management object to be more clear
73cb883151 add a list manager object
5c933fc5c9 initial commit of single-page scraper
simon pushed to master at misc/web-scraper 2018-08-28 16:22:55 +00:00
25f8c4c686 remove testing url with requests and assume that the user is correct
simon pushed to master at misc/web-scraper 2018-08-28 08:12:50 +00:00
0d0438670c adjusted title
simon renamed repository from monzo-scraper to misc/web-scraper 2018-08-28 08:11:27 +00:00
simon pushed to master at misc/web-scraper 2018-08-27 18:38:16 +00:00
8a1fd39dc4 added pycache dirs
79b10798a3 initial commit of utils
fb6b976391 initial commit of utils tests
simon pushed to master at misc/web-scraper 2018-08-27 13:28:32 +00:00
a04de7f4de changed venv name
simon pushed to master at misc/web-scraper 2018-08-23 15:05:26 +00:00
665ec1d7a7 add readme
simon pushed to master at misc/web-scraper 2018-08-23 15:03:47 +00:00
65fc332925 ignore venv and vscode dirs
c6ce63838f bare script file
simon pushed to master at misc/web-scraper 2018-08-23 15:00:21 +00:00
c383fb7ee9 initial requirements file
01a16a998c initial gitignore