I am trying to do some HTTP / cookie hacking with Python and have run into a problem that I don't know how to work around.
There is a library called ClientCookie for Python (it is external in 2.3 but will be built into 2.4), which is located at http://wwwsearch.sourceforge.net/ClientCookie/
I am using a script as a dead link checker, but it has to jump through an annoying hoop: the site requires you to log in. This is where ClientCookie comes in. I am able to log in successfully and navigate around the site, but I cannot check the files I am interested in.
I am interested in checking if I am able to download ZIP files, to make sure that they exist correctly. When I attempt to grab these files I get a wonderful 403 Forbidden error.
I can access these files via my web browser, so I know the files are there.
The main problem is that I cannot figure out how to see which headers Firefox or IE sends in order to retrieve the files. If you know how to view those headers, I would appreciate help on that. In Firefox I am using the Web Developer 0.8 extension, which allows me to check the headers of all of the other pages, but since the zip file is a download I can't check the headers of the request sent for it.
If you can tell me how to figure out what these headers are I am sure I can use this to figure out exactly what the problem is.
My code resembles the following (it is still partly pseudocode):
code:
import re, urllib
import ClientCookie

# Log in; ClientCookie keeps the session cookie for later requests.
params = urllib.urlencode({'login headers': 'the data'})
f = ClientCookie.urlopen('LoginResultPage', params)

# Fetch the page that links to the ZIP files and extract the links.
f = ClientCookie.urlopen('PageWhichLinksIwantToCheck')
pat = 'regexToExtractNeededLinks'
links = re.findall(pat, f.read())

for link in links:
    try:
        f = ClientCookie.urlopen(link)  # 'LinkToZipFile'
        print 'Download successful'
    except IOError:
        print 'Download failed'
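For comparison, here is a rough sketch of what the script side could do to learn more about the 403: attach browser-like headers to the request and report the status and response headers instead of just failing. It is written against Python 3's standard library (`http.cookiejar` and `urllib.request`) rather than ClientCookie itself. The helper names `make_request` and `check_link`, the example URLs, and the guess that `Referer` or `User-Agent` is what the server checks are all assumptions, not something I have confirmed.

```python
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


def build_opener_with_cookies():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def make_request(url, referer=None, user_agent="Mozilla/5.0"):
    """Build a request with browser-like headers (the values are guesses)."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    if referer is not None:
        # Hypothetical cause: some servers refuse downloads without a Referer.
        req.add_header("Referer", referer)
    return req


def check_link(opener, req):
    """Return (status, response headers); HTTP errors are reported, not raised."""
    try:
        resp = opener.open(req)
        try:
            return resp.status, dict(resp.headers)
        finally:
            resp.close()
    except urllib.error.HTTPError as err:
        # A 403 lands here; its headers may hint at why access was refused.
        return err.code, dict(err.headers)
```

Usage would be along the lines of `check_link(opener, make_request(zip_url, referer=page_url))`, after logging in through the same `opener` so that the session cookie is sent with the download request.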
Dan @ Code Town