simon
0 Followers · 0 Following
Joined on 2018-08-15
simon pushed to master at misc/web-scraper, 2018-09-09 09:21:47 +00:00
d686ae0bc4 update with changes
simon pushed to master at misc/web-scraper, 2018-09-09 09:16:23 +00:00
69f5788745 update notes
b5d644a223 various minor improvements to exception handling
simon pushed to master at misc/web-scraper, 2018-09-09 09:06:26 +00:00
6508156aa4 use lxml as the parser and only find links on a page if we've got the source
simon pushed to master at misc/web-scraper, 2018-09-09 08:57:22 +00:00
738ab8e441 adjust robots handling to deal with 404s and enforce a user agent which allows us to initially obtain the user agent
simon pushed to master at misc/web-scraper, 2018-09-07 11:40:14 +00:00
fdd84a8786 manually retrieve robots.txt to ensure we can set the user-agent
simon pushed to master at misc/web-scraper, 2018-09-07 10:50:55 +00:00
ab0ab0a010 add more thoughts
simon pushed to master at misc/web-scraper, 2018-09-06 16:33:11 +00:00
6a1259aa7d update plans to add gzip encoding
simon pushed to master at misc/web-scraper, 2018-09-06 16:31:14 +00:00
164239b343 more thoughts
ce1f2745c9 update thoughts
simon pushed to master at misc/web-scraper, 2018-09-06 16:25:32 +00:00
e70bdc9ca1 update requirements.txt
simon pushed to master at misc/web-scraper, 2018-09-06 16:21:01 +00:00
d1c1e17f4f report runtime of script in generated sitemap
simon pushed to master at misc/web-scraper, 2018-09-06 16:08:58 +00:00
816a727d79 ignore generated file
simon pushed to master at misc/web-scraper, 2018-09-06 16:08:27 +00:00
84ab27a75e render results as HTML
6d9103c154 improved content-type detection
simon pushed to master at misc/web-scraper, 2018-09-06 15:30:16 +00:00
e57a86c60a only attempt to read html
simon pushed to master at misc/web-scraper, 2018-09-05 17:56:22 +00:00
a3ec9451e3 implement parsing of robots.txt
simon pushed to master at misc/web-scraper, 2018-09-04 14:40:13 +00:00
f2c294ebdb added new ideas to implement
simon pushed to master at misc/web-scraper, 2018-09-04 12:58:08 +00:00
1b9b207a28 attempt to remove base url with trailing slash (if discovered)
simon pushed to master at misc/web-scraper, 2018-09-04 11:52:00 +00:00
6abe7d68e0 updated notes
simon pushed to master at misc/web-scraper, 2018-09-04 09:14:28 +00:00
7d919039b6 removed unnecessary modules
simon pushed to master at misc/web-scraper, 2018-09-04 08:21:56 +00:00
0726bcccb0 removed original file
05e907ecec too many changes to make a sensible commit message
simon pushed to master at misc/web-scraper, 2018-08-31 18:18:02 +00:00
abc628106d added a docstring to the WebPage object