update plans to add gzip encoding

2018-09-06 17:33:10 +01:00
parent 164239b343
commit 6a1259aa7d

@@ -12,3 +12,4 @@
* ~~remove base url from initial urls with and without trailing slash~~
* investigate using [tldextract](https://github.com/john-kurkowski/tldextract) to match urls
* ~~implement parsing of [robots.txt](http://docs.w3cub.com/python~3.6/library/urllib.robotparser/)~~
* investigate [gzip encoding](https://stackoverflow.com/questions/36383227/avoid-downloading-images-using-beautifulsoup-and-urllib-request)
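
As a starting point for the gzip investigation, here is a minimal sketch that assumes the crawler fetches pages with `urllib.request`. The helper names `fetch` and `decode_body` are hypothetical and not taken from this repository. `urllib` does not decompress responses on its own, so the sketch sends an `Accept-Encoding: gzip` header and decompresses the body only when the server reports `Content-Encoding: gzip`.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding):
    """Decompress raw bytes when the Content-Encoding header value is gzip."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw)
    # Any other (or missing) encoding is returned unchanged.
    return raw


def fetch(url, timeout=10):
    """Fetch url while advertising gzip support (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        return decode_body(raw, response.headers.get("Content-Encoding"))
```

Keeping the decompression step in its own function means it can be checked without network access, for example by passing it the output of `gzip.compress`.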