2011/10/31

XKCD2PDF

This one doesn't necessarily fall under the topic of this blog, but since it's a hacky thing I've done, I'm posting it here.

Some time ago I wanted to get all the XKCD comics stored on my iPad for offline viewing. I didn't find an existing solution, so I cooked up my own:


  1. A script that gets the JSON file of each comic and rips the important info
  2. Download the stuff
  3. Build an HTML page describing the comics with their 'alt' part (the good stuff)
  4. Open the HTML in Word, set margins to zero, save as PDF
  5. ...
  6. Profit!
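
Steps 1 and 3 boil down to picking a few fields out of each comic's JSON and writing one HTML entry per comic. Here is a minimal sketch of that part, in Python 3 rather than the Python 2 of the script in the P.S. The function name comic_entry and the sample data are made up for illustration; the field names 'title', 'img' and 'alt' are the ones the script reads. The HTML escaping is an addition that the script itself does not do.

```python
import html
import json


def comic_entry(info, image_name):
    """Build one HTML entry (title, image, alt text) for a single comic.

    `info` is the parsed JSON of one comic; `image_name` is the name of
    the locally saved image file. Both names are hypothetical.
    """
    title = html.escape(info["title"])
    # Keep line breaks in the alt text visible in the rendered page.
    alt = html.escape(info["alt"]).replace("\n", "<br/>")
    return (
        "<hr/>\n"
        + title + "<br/>\n"
        + '<img src="%s"/> <br/>\n' % html.escape(image_name)
        + alt + "<br/>\n"
    )


# Made-up sample data, not a real comic.
sample = json.loads(
    '{"title": "Example", "img": "http://example.invalid/example.png",'
    ' "alt": "Hover text & more\\nsecond line"}'
)
fragment = comic_entry(sample, "1.png")
print(fragment)
```

Keeping the entry builder separate from the downloading makes it possible to try out the page layout without hitting the site at all.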


I am really sorry about the 4th step. I know it's lame, but surprisingly, Word's "Save as PDF" feature gave the best-looking output.

And if you are lazy, you can grab the already generated PDFs here (the first 900 comics, split into 3 chunks):

http://www.filesonic.com/file/2840295035/out.pdf
http://www.filesonic.com/file/2840295055/out2.pdf
http://www.filesonic.com/file/2840295065/out3.pdf


P.S. Here's the script:

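import sys, urllib2, json

Base = 'http://xkcd.com/'
Tail = 'info.0.json'

def main():
    # Usage: python xkcd.py FIRST LAST  (comic numbers, inclusive)
    n = int(sys.argv[1])
    k = int(sys.argv[2])
    out = open("out3.html", "w")
    # Declare UTF-8, since titles and alt texts are written as UTF-8 below
    out.write("<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body>\n")
    for i in range(n, k + 1):
        # Fetch and parse the JSON metadata of comic number i
        url = '%s%d/%s' % (Base, i, Tail)
        print url
        f = urllib2.urlopen(url)
        a = f.read()
        f.close()
        d = json.loads(a)
        # Local image name: comic number plus the last 3 characters of the image URL
        name = '%d.%s' % (i, d['img'][-3:])
        # Write the HTML entry: title, image, alt text
        # (json.loads returns unicode strings, so encode before writing)
        out.write("<hr/>\n")
        out.write(d['title'].encode('utf-8') + "<br/>\n")
        out.write("<img src=\"%s\"/> <br/>\n" % name)
        out.write(d['alt'].encode('utf-8').replace("\n", "<br/>") + "<br/>\n")
        # Download the image itself
        u = urllib2.urlopen(d['img'])
        localFile = open(name, 'wb')
        localFile.write(u.read())
        u.close()
        localFile.close()
        out.flush()
    out.write("</body></html>\n")
    out.close()

if __name__ == '__main__':
    main()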
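
One thing to watch out for: the script takes the last three characters of the image URL as the file extension, which only works for three-letter extensions such as png or jpg. A minimal sketch of a more forgiving variant, in Python 3, could look like this. The helper name local_image_name and the URLs are made up for illustration; this is not part of the script above.

```python
import posixpath
from urllib.parse import urlparse


def local_image_name(comic_id, img_url):
    """Hypothetical helper: build '<id><ext>' from the path of the image URL."""
    path = urlparse(img_url).path      # drops any query string or fragment
    ext = posixpath.splitext(path)[1]  # includes the leading dot, e.g. ".png"
    return "%d%s" % (comic_id, ext)


# Made-up URLs, for illustration only.
print(local_image_name(1, "http://example.invalid/comics/first.png"))
print(local_image_name(2, "http://example.invalid/comics/second.jpeg?x=1"))
```

The same name would then be used both for the img tag in the HTML and for the file the image is saved to, so the two stay in sync.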