Burp Suite is the tool I'd feel lost without when testing web applications; we even bought the Pro version, since it's a great tool with a low price tag. One of its great features is generating proof-of-concept HTML forms for Cross-Site Request Forgery (CSRF or XSRF) testing, and it usually just works out of the box. Since it operates on raw HTTP POST data, it has no information about the character encoding of that data, so with applications that use accented characters (not a rare thing in Hungary) it generates garbage that needs to be fixed manually; still, that's not a big problem.
However, today I met another limitation: when testing an ASP.NET application with quite a big ViewState (the HTTP POST request was around 150 KB), Burp only output the first 4096 bytes or so and then went on building the next field, without even closing the <input> tag or its value attribute. (It's also obvious from this that it uses string manipulation to serialize data into HTML, which sounds odd for a security-related software product.)
Since I really needed a working solution, I created a simple Python script that parses the XML export of an HTTP request from Burp and creates an HTML page with a form whose fields are preset to the values sent in the request. I used LXML both to parse the input XML and to serialize the HTML output, avoiding the pitfalls Burp ran into. First, I loaded the Burp XML request file and used XPath to get the first item (such exports can store more than one) and to extract the method, URL and request information. The single-element tuple assignment syntax asserts that the right-hand side of the assignment contains one and only one element, which doubles as a sanity check on the input.
from lxml import etree

root = etree.parse(input_file).getroot()
# the export can store more than one item; take the first
item = root.xpath("/items/item")[0]
(method,) = item.xpath("method/text()")
if method.lower() != "post":
    raise ValueError("Only POST requests are supported")
(url,) = item.xpath("url/text()")
(request,) = item.xpath("request")
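To see why the single-element tuple form acts as an assertion: the unpacking fails loudly if the XPath query matched zero or more than one node, instead of silently picking a wrong one. A quick interactive illustration with made-up values:

>>> (method,) = []
ValueError: need more than 0 values to unpack
>>> (method,) = ["POST", "GET"]
ValueError: too many values to unpack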
Burp can encode the request body using Base64 (flagged by the base64 attribute of the request element), so that has to be checked for and decoded if necessary. The resulting content contains the HTTP headers and the encoded POST data, separated by an empty line, so splitting it is pretty straightforward. The second argument of the split method stops it after the first split, and naming the first result with an underscore makes it apparent to both humans and machines that we don't care about that piece of data.
from base64 import b64decode

contents = request.text
# the base64 attribute holds the string "true" or "false",
# so compare explicitly instead of relying on truthiness
if request.get("base64") == "true":
    contents = b64decode(contents)
_, body = contents.split("\r\n\r\n", 1)
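As a quick illustration of that second argument, here's the split on a made-up body that itself contains an empty line; only the first separator is consumed:

>>> "Host: example.com\r\n\r\na=1\r\n\r\nb=2".split("\r\n\r\n", 1)
['Host: example.com', 'a=1\r\n\r\nb=2']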
I wrote a small generator function that yields the names and values of each form field as tuples of Unicode objects. I initially used string manipulation, then discovered that Python had me covered with urlparse.
from urlparse import parse_qsl

def decode_form_urlencoded_values(request_body, encoding):
    for pair in parse_qsl(request_body, keep_blank_values=True):
        yield tuple(i.decode(encoding) for i in pair)
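This is where the accented characters survive: parse_qsl returns byte strings, and the explicit decode turns them into proper Unicode objects. A short example with a hypothetical UTF-8 encoded field (note that keep_blank_values also preserves fields with empty values):

>>> list(decode_form_urlencoded_values("name=J%C3%B3zsef&empty=", "utf-8"))
[(u'name', u'J\xf3zsef'), (u'empty', u'')]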
With this done, I just had to build the resulting HTML. I used LXML's E-Factory and Python's argument list unpacking to make it happen in a more or less readable way.
from lxml import html
from lxml.html import builder as E
import codecs

# lxml takes care of attribute escaping, avoiding the
# string-concatenation pitfall Burp ran into
output = E.HTML(
    E.HEAD(E.META(**{'http-equiv': 'Content-type',
                     'content': 'text/html; charset=' + encoding})),
    E.BODY(
        E.FORM(
            E.INPUT(type="submit"),
            *(E.INPUT(type="hidden", name=name, value=value)
              for name, value in decode_form_urlencoded_values(body, encoding)),
            action=url, method=method
        )
    )
)
with codecs.open(output_file, 'wb', encoding) as html_output:
    html_output.write(html.tostring(output, encoding=unicode))
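The snippets above use input_file, output_file and encoding without defining them; one way to wire them up is from the command line. A minimal sketch (the usage string below is made up for illustration, not necessarily how the published script does it):

import sys

if len(sys.argv) != 4:
    sys.exit("usage: {0} <input.xml> <output.html> <encoding>".format(sys.argv[0]))
input_file, output_file, encoding = sys.argv[1:]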
The complete and working script can be downloaded from my GitHub repository. And in case you've been wondering whether it was worth it: yes, the PoC proved that the target application with the 150 KB ViewState was indeed vulnerable to XSRF.