Checking links with Python in TeX documents

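As every year, the German documentation for the TeX Live distribution is on my agenda. To check the more than 100 web links in the document, I wrote a small Python script that does the job fairly well. It extracts every URL with a regular expression, requests each one, and prints the HTTP status code or the error details for links that fail.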

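import re
import urllib.error
import urllib.request

# Read the whole TeX source into memory
with open("texlive-de-new.tex") as filehandle:
    text = filehandle.read()

# regexp from http://www.noah.org/wiki/RegEx_Python
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)

# Request each link; print its number and URL, plus the error if the request fails
for i, item in enumerate(urls, start=1):
    print(i, item, sep='\t', end='\t')
    try:
        urllib.request.urlopen(item).close()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404
        print(e.code)
    except urllib.error.URLError as u:
        # The server could not be reached at all (DNS failure, refused connection, ...)
        print(u.args)
    print("\n")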
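
One possible refinement, sketched here under a few assumptions: the character class in the regular expression also matches '.', ',' and ')', so a link at the end of a sentence is picked up together with its trailing punctuation, and a link that appears several times is checked several times. The helper below (extract_links is a hypothetical name, not part of the script above) strips such trailing characters and keeps each link only once. It assumes that no link in the document legitimately ends in one of the stripped characters, which is not true for every URL.

```python
import re

# Same pattern as in the script above, compiled once
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)


def extract_links(text):
    """Return the unique links found in text, in order of first appearance."""
    seen = set()
    links = []
    for match in URL_PATTERN.findall(text):
        # Assumption: trailing '.', ',', ';', ':' and ')' are sentence
        # punctuation rather than part of the link itself.
        url = match.rstrip('.,;:)')
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Made-up sample text for illustration only
sample = (
    r"See \url{https://tug.org/texlive/} and http://www.ctan.org, "
    r"or again https://tug.org/texlive/."
)
print(extract_links(sample))
```

The returned list could then be passed to the checking loop in place of the raw re.findall result.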

Uwe

Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined. If you like my content and would like to thank me for it, consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found under "Spenden für die Dingfabrik" (donations for the Dingfabrik).
