The "Cneonction: close" thing is a quirk of NetScaler load balancers. It's done to nullify any "Connection: close" header the web server spits out, because the NetScaler wants to manage connections itself. The header is scrambled instead of removed so that the packet doesn't have to be regenerated (the length stays the same), and it's scrambled semi-randomly so that people don't just assume it's a misspelling and add compatibility for it.
X-Ignore-X is longer than close, which I suppose would mess up the packet length. Or maybe having an unrecognized value for the Connection key would still default to a close? Just guessing here.
TCP checksums are fairly simple: a TCP stack basically just adds up the 16-bit words in a segment (using one's-complement addition) and stores the inverted result in the checksum field. Because addition doesn't care about order, this will not detect 16-bit words being swapped around.
My guess is that the load balancer tried to invalidate the header while preserving the TCP checksum.
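To make the checksum argument concrete, here is a minimal sketch of an RFC 1071 style checksum applied to the two header lines. The function name and the sample byte strings are my own, and a real TCP checksum also covers the pseudo-header and the rest of the segment, but those bytes are identical in both cases and so do not change the comparison.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, inverted (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"Connection: close\r\n"
scrambled = b"Cneonction: close\r\n"

# Same length, so the packet does not need to be resized.
assert len(original) == len(scrambled)

# The scramble only moves bytes between positions of the same parity,
# so the checksum comes out the same regardless of word alignment.
print(internet_checksum(original) == internet_checksum(scrambled))
print(internet_checksum(b"X" + original) == internet_checksum(b"X" + scrambled))
```

Both comparisons print True, which fits the guess that the rewrite was chosen so the checksum would not have to be recomputed.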
The simplest explanation for "OCR is watching you" is an old web attack called "HTTP Response Splitting". It happens when a server builds a response header (a redirect's Location, for example) from user input but doesn't strip or escape newlines (CR/LF), so the attacker can end the header early and inject lines of their own.
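A rough sketch of how response splitting works, assuming a hypothetical redirect handler that copies a user-supplied value into a Location header. Both function names and the X-Injected header are made up for illustration; real frameworks usually do this check for you.

```python
def build_redirect_naive(target: str) -> bytes:
    # Vulnerable: the user-supplied value is copied into the header unchanged.
    return ("HTTP/1.1 302 Found\r\nLocation: " + target + "\r\n\r\n").encode("latin-1")


def build_redirect_safe(target: str) -> bytes:
    # Reject CR and LF so the value cannot terminate the header line early.
    if "\r" in target or "\n" in target:
        raise ValueError("header value contains CR or LF")
    return ("HTTP/1.1 302 Found\r\nLocation: " + target + "\r\n\r\n").encode("latin-1")


def header_lines(response: bytes) -> list:
    # Everything before the first blank line is the status line plus headers.
    return response.split(b"\r\n\r\n")[0].split(b"\r\n")


malicious = "/home\r\nX-Injected: yes"

# The naive builder emits an extra header line chosen by the attacker.
for line in header_lines(build_redirect_naive(malicious)):
    print(line.decode("latin-1"))

# The safe builder refuses the same input.
try:
    build_redirect_safe(malicious)
except ValueError as exc:
    print("rejected:", exc)
```

With two CR/LF pairs in the input, the attacker could go further and supply an entire second response, which is where the name comes from.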
I have a web scraping script written in Python that I want to make multi-threaded. It scrapes web pages from a list, and enters results into a DB. Can someone (the author maybe) show me a simple example of how to make a multi-threaded Python script?
Check out the docs on the "multiprocessing" or "threading" module. In particular multiprocessing.Pool is handy for controlling the number of parallel things you have going on at once.
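# Assume we have functions GetUrls(), which returns a list
# of the urls we want to get, and Download(url), which
# downloads the content of one url and sticks it in the
# database.
import multiprocessing

# The guard keeps worker processes from re-running this block
# on platforms that start workers by re-importing the script.
if __name__ == "__main__":
    urls = GetUrls()
    # Downloads are I/O-bound, so the pool can be larger than the CPU count.
    with multiprocessing.Pool(processes=100) as pool:
        pool.map(Download, urls)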
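Since the question asked for threads specifically, here is a thread-based sketch of the same idea using concurrent.futures from the standard library. The download() body is a hypothetical stand-in (it just returns the length of the url); the real one would fetch the page and write to the DB. The worker count of 20 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def download(url):
    # Hypothetical placeholder: a real version would fetch the page
    # and store the result in the database.
    return len(url)


def download_all(urls, max_workers=20):
    # Threads suit this job because the work is mostly waiting on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the input urls.
        return list(executor.map(download, urls))


if __name__ == "__main__":
    print(download_all(["http://example.com/a", "http://example.com/b"]))
```

If each thread writes to the DB, give every thread its own connection or serialize the writes, since many DB drivers don't allow sharing one connection across threads.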
Go to python.org and read the documentation on threading and concurrency. The mailing list under Community is very good, but they expect you to have read the docs and googled first.
That works; however, some sites return different headers depending on whether they get a HEAD or a GET request. -I sends a HEAD request, while -i sends a GET request.
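If you'd rather compare the two from a script, something like this sketch works with only the standard library. To keep it self-contained it talks to a throwaway local server (DemoHandler, a made-up handler that deliberately answers HEAD and GET differently); point fetch_headers() at a real host to check an actual site. All names here are my own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server that labels its response with the method it saw."""

    def _reply(self, body):
        self.send_response(200)
        self.send_header("X-Handled-As", self.command)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply(b"hello\n")

    def do_HEAD(self):
        self._reply(b"")  # a HEAD response carries headers only

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_headers(host, port, method, path="/"):
    # Send one request with the given method and return the response headers.
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        response = conn.getresponse()
        response.read()
        return dict(response.getheaders())
    finally:
        conn.close()


def compare_head_and_get():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return fetch_headers(host, port, "HEAD"), fetch_headers(host, port, "GET")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head, get = compare_head_and_get()
    for name in sorted(set(head) | set(get)):
        if head.get(name) != get.get(name):
            print(name, "HEAD:", head.get(name), "GET:", get.get(name))
```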