<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<title>VSzA techblog</title>
	<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/"/>
	<link rel="self" type="application/atom+xml" href="http://techblog.vsza.hu/atom.xml"/>
	<updated>2013-05-06T19:42:24+02:00</updated>
   <generator uri="http://github.com/stef/utterson">utterson v0.4</generator>
   <id>http://techblog.vsza.hu/</id>
	<entry>
		<title>Bootstrapping MySQL for testing</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Bootstrapping_MySQL_for_testing.html"/>
		<updated>2013-05-06T19:42:24+02:00</updated>
      <id>http://techblog.vsza.hu/posts/Bootstrapping_MySQL_for_testing.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>When I created <a href="https://github.com/dnet/registr">registr</a>, I wanted a way to test it on the same RDBMS as the
one I use for <a href="http://www.redmine.org/">Redmine</a>, MySQL. For the purposes of testing, I wanted to
start a fresh instance of <code>mysqld</code> that could be run without superuser
privileges, without affecting other running MySQL instances, and with
minimal resource consumption.</p>

<p>Although the test suite was developed in Python, the idea can be used with any
language that makes it possible to create temporary directories in a manner
that avoids race conditions and spawn processes. The code can be found in the
<a href="https://github.com/dnet/registr/blob/mysql/test_redmine.py">TestRedmineMySQL class</a>, and it follows the steps described below.</p>

<ul>
<li>Create a temporary directory (<code>path</code>)</li>
<li>Create a directory inside <code>path</code> (<code>datadir</code>)</li>
<li>Generate two filenames inside <code>path</code> (<code>socket</code> and <code>pidfile</code>)</li>
<li>Spawn the <code>mysqld_safe</code> binary with the following parameters.
<ul>
<li><code>--socket=</code> and the value of <code>socket</code> makes MySQL accept connections through that file</li>
<li><code>--datadir=</code> and the value of <code>datadir</code> makes MySQL store all databases in that directory</li>
<li><code>--skip-networking</code> disables the TCP listener, thus minimizing interference with other instances</li>
<li><code>--skip_grant_tables</code> disables access control, since we don't need that for testing</li>
<li><code>--pid-file=</code> and the value of <code>pidfile</code> makes MySQL store the process ID in that file</li>
</ul></li>
<li>Do what you want with the database</li>
<li>Open the file named <code>pidfile</code> and read an integer from its only line</li>
<li>Send a <code>SIGTERM</code> to the PID</li>
<li>Wait for the process to finish.</li>
</ul>

<p>The above way worked fine for me, didn't leave any garbage on the system, and
ran as fast as an Oracle product could do. :)</p>
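
<p>The steps above can be sketched in Python 3 roughly as follows. This is a
minimal illustration rather than the code from the linked class: the names
<code>data</code>, <code>mysql.sock</code> and <code>mysql.pid</code> are made up for the example, and
depending on the MySQL version, the data directory may need to be initialized
first, which is not covered here.</p>

<pre><code class="python">import os
import signal
import subprocess
import tempfile


def build_mysqld_command(path, binary='mysqld_safe'):
    """Create datadir inside path and return (command, socket, pidfile)."""
    datadir = os.path.join(path, 'data')        # hypothetical name
    os.mkdir(datadir)
    socket = os.path.join(path, 'mysql.sock')   # hypothetical name
    pidfile = os.path.join(path, 'mysql.pid')   # hypothetical name
    command = [
        binary,
        '--socket=' + socket,
        '--datadir=' + datadir,
        '--skip-networking',
        '--skip_grant_tables',
        '--pid-file=' + pidfile,
    ]
    return command, socket, pidfile


def start_mysqld(path):
    """Spawn the server (requires MySQL to be installed)."""
    command, socket, pidfile = build_mysqld_command(path)
    return subprocess.Popen(command), socket, pidfile


def stop_mysqld(process, pidfile):
    """Read the PID from pidfile, send SIGTERM and wait for the exit."""
    with open(pidfile) as pid_file:
        pid = int(pid_file.readline())
    os.kill(pid, signal.SIGTERM)
    process.wait()
</code></pre>

<p>A test fixture would call <code>tempfile.mkdtemp()</code> to get <code>path</code>, pass it to
<code>start_mysqld</code>, connect through the returned socket, and finally call
<code>stop_mysqld</code> before removing the directory.</p>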
]]></content>
	</entry>
	<entry>
		<title>Single mbox outbox vs. multiple IMAP accounts</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Single_mbox_outbox_vs._multiple_IMAP_accounts.html"/>
		<updated>2013-04-01T14:15:50+02:00</updated>
      <id>http://techblog.vsza.hu/posts/Single_mbox_outbox_vs._multiple_IMAP_accounts.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>As <a href="http://techblog.vsza.hu/posts/Four_free_software_I_started_using_in_2012.html">I've mentioned in February 2013</a>, I started using mutt in December 2012
and as a transitional state, I've been using my three IMAP accounts in on-line
mode, like I did with KMail. All outgoing mail got recorded in an <a href="https://en.wikipedia.org/wiki/Mbox">mbox</a>
file called <code>~/Mail/Sent</code> for all three accounts, which was not intentional
at first, just a configuration glitch. But now I've realized that it has two
positive side effects when I'm using a cellular Internet connection: since
the MUA doesn't upload the message to the <code>Sent</code> folder using IMAP,
about 50% less data is sent, which makes sending mail faster and saves
precious megabytes in my mobile data plan.</p>

<p>However, I still prefer having my sent mail present in the <code>Sent</code> folder of my
IMAP accounts, so I needed a solution to transfer the contents of an mbox file
to IMAP folders based on the <code>From</code> field. I preferred Python for the task as
the standard library had support for both <a href="http://docs.python.org/library/imaplib.html">IMAP</a> and <a href="http://docs.python.org/library/mailbox.html">mbox</a> out of the
box, and I've already had <a href="https://github.com/dnet/dashboard/blob/master/imapflag.py">good experience with the former</a>. Many solutions
I found used Python as well, but none of them had support for multiple IMAP
accounts and many used deprecated classes, or treated the process as a one-shot
operation, while I planned to use this to upload my mbox regularly to IMAP.</p>

<p>So I decided to write a simple script, which took about an hour or two to
complete, did exactly what I needed, and still had no dependencies on anything
that's not part of the standard library. The script supports invocation
from other modules and from the command line as well; the core functionality is
implemented in the <code>process_mbox</code> method of the <code>OutboxSyncer</code> class.
The method gets the <code>Mailbox</code> object and a reference to a database as
parameters; the latter is used to ensure that all messages are uploaded exactly
once, even in case of exceptions or parallel invocations.</p>

<pre><code class="python">for key, msg in mbox.iteritems():
    account, date_time = msg.get_from().split(' ', 1)
    contents = mbox.get_string(key)
    msg_hash = HASH_ALGO(contents).hexdigest()
    params = (msg_hash, account)
</code></pre>

<p>The built-in iterator of the mailbox is used to iterate through messages in a
memory-efficient way. Both <code>key</code> and <code>msg</code> are needed: the former is used to
obtain the raw message as a byte string (<code>contents</code>), while the latter makes
parsed data, such as the sender (<code>account</code>) and the timestamp (<code>date_time</code>),
accessible. The contents of the message are hashed (currently using SHA-256)
to get a unique identifier for database storage. In the last line, <code>params</code> is
instantiated for later use in parameterized database queries.</p>
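
<p>For readers on Python 3, the same loop can be tried in isolation with the
sketch below. It is not taken from the script: <code>fingerprint_messages</code> and
<code>write_sample_mbox</code> are hypothetical helpers, and <code>get_bytes</code> is used
to obtain the raw message as bytes.</p>

<pre><code class="python">import hashlib
import mailbox
import os
import tempfile
import time


def fingerprint_messages(mbox_path):
    """Return a list of (account, date_time, sha256_hex) tuples."""
    results = []
    mbox = mailbox.mbox(mbox_path)
    try:
        for key, msg in mbox.iteritems():
            # The mbox "From " line holds the sender and a timestamp
            account, date_time = msg.get_from().split(' ', 1)
            contents = mbox.get_bytes(key)  # raw message as bytes
            digest = hashlib.sha256(contents).hexdigest()
            results.append((account, date_time, digest))
    finally:
        mbox.close()
    return results


def write_sample_mbox(mbox_path):
    """Demo helper: store one message sent by alice@example.org."""
    msg = mailbox.mboxMessage()
    msg['From'] = 'alice@example.org'
    msg['Subject'] = 'test'
    msg.set_payload('hello\n')
    msg.set_from('alice@example.org', time.gmtime(0))
    mbox = mailbox.mbox(mbox_path)
    try:
        mbox.add(msg)
    finally:
        mbox.close()
</code></pre>

<p>Calling <code>write_sample_mbox</code> on a path inside a temporary directory and then
<code>fingerprint_messages</code> on the same path yields one tuple whose first item
is the sender address.</p>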

<pre><code class="python">with db:
    cur.execute(
        'SELECT COUNT(*) FROM messages WHERE hash = ? AND account = ?',
        params)
    ((count,),) = cur.fetchall()
    if count == 0:
        cur.execute('INSERT INTO messages (hash, account) VALUES (?, ?)',
            params)
    else:
        continue
</code></pre>

<p>By using the context manager of the database object, checking whether the
message is free for processing and locking it are done in a single transaction,
resulting in a <code>ROLLBACK</code> in case an exception gets thrown and in a <code>COMMIT</code>
otherwise. The variable <code>count</code> is assigned this way to assert that
the result has a single row with a single column. If the message is locked
or has already been uploaded, the mailbox iterator is advanced without
further processing using <code>continue</code>.</p>
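
<p>The locking step can be tried on its own with the sketch below. It assumes
that <code>db</code> is an <code>sqlite3</code> connection (the context manager and the <code>?</code>
placeholders are consistent with that, but the snippet above doesn't say so),
and the table schema and the names <code>open_db</code> and <code>try_lock</code> are made up
for the example.</p>

<pre><code class="python">import sqlite3


def open_db(path=':memory:'):
    """Open the database and create the (assumed) messages table."""
    db = sqlite3.connect(path)
    with db:
        db.execute('CREATE TABLE IF NOT EXISTS messages ('
                   'hash TEXT, account TEXT, success INTEGER DEFAULT 0, '
                   'PRIMARY KEY (hash, account))')
    return db


def try_lock(db, msg_hash, account):
    """Return True if the message was free and is now locked by us."""
    params = (msg_hash, account)
    with db:  # COMMIT on success, ROLLBACK if an exception is raised
        cur = db.cursor()
        cur.execute(
            'SELECT COUNT(*) FROM messages WHERE hash = ? AND account = ?',
            params)
        ((count,),) = cur.fetchall()  # exactly one row with one column
        if count != 0:
            return False
        cur.execute('INSERT INTO messages (hash, account) VALUES (?, ?)',
                    params)
        return True
</code></pre>

<p>The first <code>try_lock</code> call for a given hash and account returns <code>True</code>,
and any later call with the same pair returns <code>False</code>, which corresponds to
the <code>continue</code> branch above.</p>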

<pre><code class="python">try:
    acc_cfg = accounts[account]
    imap = self.get_imap_connection(account, acc_cfg)
    response, _ = imap.append(acc_cfg['folder'], r'\Seen',
            parsedate(date_time), contents)
    assert response == 'OK'
</code></pre>

<p>After the message is locked for processing, it gets uploaded to the IMAP
account into the folder specified in the configuration. The class has
a <code>get_imap_connection</code> method that calls the appropriate <code>imaplib</code>
constructors and takes care of connection pooling to avoid connection and
disconnection for every message processed. The response of the IMAP
server is checked to avoid silent failures.</p>

<pre><code class="python">except:
    with db:
        cur.execute('DELETE FROM messages WHERE hash = ? AND account = ?',
            params)
    raise
else:
    print('Appended', msg_hash, 'to', account)
    with db:
        cur.execute(
            'UPDATE messages SET success = 1 WHERE hash = ? AND account = ?',
            params)
</code></pre>

<p>In case of errors, the message lock gets released and the exception is
re-raised to stop the process. Otherwise, the <code>success</code> flag is set to <code>1</code>,
and processing continues with the next message. The source code is available in
<a href="https://github.com/dnet/outbox-sync">my GitHub repository</a> under the MIT license; feel free to fork and send pull
requests or comment on the code there.</p>
]]></content>
	</entry>
	<entry>
		<title>Two tools to aid protocol reverse engineering</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Two_tools_to_aid_protocol_reverse_engineering.html"/>
		<updated>2013-03-14T16:49:50+01:00</updated>
      <id>http://techblog.vsza.hu/posts/Two_tools_to_aid_protocol_reverse_engineering.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>Lately I analyzed a closed-source proprietary thick client application that
rolled its own cryptography, including the one used for the network layer.
To aid the protocol analysis, I needed two tools with a shared input. The
input was the flow of packets sent and received by the application, which I
first tried to extract using the hex output of <a href="http://www.wireshark.org/docs/man-pages/tshark.html">tshark</a>, but I realized
that it displayed data from layers above TCP I didn't need, and on the
other hand, it didn't perform TCP reassembly, which I didn't want to do
by hand or by reinventing the wheel.</p>

<p>So I decided to use the output of the <a href="http://www.wireshark.org/docs/wsug_html_chunked/ChAdvFollowTCPSection.html">Follow TCP stream function of
Wireshark</a>, in hex mode to be precise. It can be saved to a plain text
file with a single click, and it just had what I needed: offsets and easily
parseable hex data. I've written a simple parser based on regular expressions
that can read such a file, starting by defining the actual expressions. The
first one matches a single line, starting with nothing in case of packets
<em>sent</em>, and with whitespace if <em>received</em> (group 1). This is followed by the hex offset
of the row (group 2), the row data encoded as 1 to 16 hex bytes (group 3),
and the ASCII dump of the row data. The latter is padded, so limiting group 3
to 49 characters makes it possible to ignore it. I used the <code>re.I</code> flag so
I could write <code>a-f</code> instead of <code>a-fA-F</code> everywhere.</p>

<pre><code class="python">import re

FLOW_ROW_RE = re.compile(r'^(\s*)([0-9a-f]+)\s+([0-9a-f\s]{1,49})', re.I)
NON_HEX_RE = re.compile(r'[^0-9a-f]', re.I)
</code></pre>

<p>The <code>Flow</code> class itself is a list of entries, so I made the class inherit
from <code>list</code> and added a custom constructor. I also added an inner class called
<code>Entry</code> for the entries and two constants to indicate packet directions.
I used a <a href="http://docs.python.org/2/library/collections.html#collections.namedtuple">namedtuple</a> to provide some formality over using a <code>dict</code>.
The constructor expects the name of a file from Wireshark, opens it and
populates the list using the parent constructor and a generator function
called <code>load_flow</code>.</p>

<pre><code class="python">from collections import namedtuple

class Flow(list):
    Entry = namedtuple('Entry', ['direction', 'data', 'offset'])
    SENT = 'sent'
    RECEIVED = 'received'
    DIRECTIONS = [SENT, RECEIVED]

    def __init__(self, filename):
        with open(filename, 'r') as flow_file:
            list.__init__(self, load_flow(flow_file))
</code></pre>

<p>The <code>load_flow</code> function gets a file object, which it uses as an iterator that
returns each line of the input file. The lines are mapped to regular expression
match objects using <code>imap</code>, and filtered using <code>ifilter</code> to ignore rows that didn't match.
In the body of the loop, all three match groups are parsed, and sanity checks
are performed on the offset to make sure that no bytes were lost during parsing.
For this purpose, a <code>dict</code> is used, initialized to zeros before the loop,
and incremented after each row to track the number of bytes read in both
directions.</p>

<pre><code class="python">from binascii import unhexlify
from itertools import imap, ifilter

def load_flow(flow_file):
    offset_cache = {Flow.SENT: 0, Flow.RECEIVED: 0}
    for m in ifilter(None, imap(FLOW_ROW_RE.match, flow_file)):
        direction = Flow.SENT if m.group(1) == '' else Flow.RECEIVED
        offset = int(m.group(2), 16)
        data = unhexlify(NON_HEX_RE.sub('', m.group(3)))
        last_offset = offset_cache[direction]
        assert last_offset == offset
        offset_cache[direction] = last_offset + len(data)
</code></pre>

<p>The rest of the function is some code that (as of 14 March 2013) needs some
cleaning, and handles yielding <code>Flow.Entry</code> objects properly, squashing
entries spanning multiple rows at the same time.</p>
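
<p>Since that part is not shown here, the following is only one possible way to
finish the generator, written for Python 3 (where <code>map</code> and <code>filter</code> are
already lazy). It assumes that consecutive rows in the same direction belong
to the same entry, which may differ from what the actual code does, and it
uses module-level names instead of the <code>Flow</code> class to stay self-contained.</p>

<pre><code class="python">import re
from binascii import unhexlify
from collections import namedtuple

FLOW_ROW_RE = re.compile(r'^(\s*)([0-9a-f]+)\s+([0-9a-f\s]{1,49})', re.I)
NON_HEX_RE = re.compile(r'[^0-9a-f]', re.I)
Entry = namedtuple('Entry', ['direction', 'data', 'offset'])
SENT = 'sent'
RECEIVED = 'received'


def load_flow(lines):
    """Yield one Entry per run of rows going in the same direction."""
    offset_cache = {SENT: 0, RECEIVED: 0}
    pending = None  # entry being assembled from consecutive rows
    for m in filter(None, map(FLOW_ROW_RE.match, lines)):
        direction = SENT if m.group(1) == '' else RECEIVED
        offset = int(m.group(2), 16)
        data = unhexlify(NON_HEX_RE.sub('', m.group(3)))
        # Sanity check: no bytes were lost in this direction
        assert offset_cache[direction] == offset
        offset_cache[direction] += len(data)
        if pending is not None and pending.direction == direction:
            # Same direction as the previous row: squash into one entry
            pending = pending._replace(data=pending.data + data)
        else:
            if pending is not None:
                yield pending
            pending = Entry(direction, data, offset)
    if pending is not None:
        yield pending
</code></pre>

<p>Passing an open file (or any iterable of lines) to <code>load_flow</code> and wrapping
the result in <code>list</code> gives the entries in capture order.</p>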

<p>As I mentioned in the beginning, there were two kinds of functionality I
needed, both of which use these <code>Flow</code> objects as an input. The first one
is a fake client/server that makes it possible to generate network traffic
quickly by using previously captured flows, called <code>flowfake</code>. It simply
replays flows from a selected viewpoint using plain sockets, either as
a client or a server.</p>

<p>The second one is more interesting and complex (at least for me), as it makes
it possible to view the differences (or similarities, depending on the use case)
between 2 to 4 flows (the latter being an ad-hoc limit based on the colors
defined) using simple algorithms and colors to aid visual analysis. The
screenshot below shows how it works on four
flows. The whole project is available under the MIT license in a <a href="https://github.com/dnet/flowtools">GitHub repo</a>.</p>

<p><img src="http://techblog.vsza.hu/images/flowdiff-screenshot.png" alt="Screenshot of flowdiff" title="" /></p>
]]></content>
	</entry>
	<entry>
		<title>Generating XSRF PoC from Burp with Python</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Generating_XSRF_PoC_from_Burp_with_Python.html"/>
		<updated>2013-02-20T17:23:30+01:00</updated>
      <id>http://techblog.vsza.hu/posts/Generating_XSRF_PoC_from_Burp_with_Python.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p><a href="http://portswigger.net/burp/">Burp Suite</a> is the tool I'd feel lost without when testing web applications;
we even bought the pro version, since it's a great tool with a low price tag.
One of its great features is <a href="http://portswigger.net/burp/help/suite_functions_csrfpoc.html">generating proof-of-concept HTML forms</a>
for <a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">Cross-Site Request Forgery</a> (CSRF or XSRF) testing, and it usually
just works out of the box. As it works using HTTP POST data, it has no
information about the character-level encoding of the data, so when it comes
to applications with accented characters (not a rare thing in Hungary), it
just generates garbage, which needs to be fixed manually, but it's not a big
problem.</p>

<p>However, today I met another limitation: when testing an ASP.NET application
with quite a big <a href="https://en.wikipedia.org/wiki/ASP.NET#View_state">ViewState</a> (the HTTP POST request was around 150 KB),
Burp output only the first 4096 bytes or so, and then continued to build the
next field, without even closing the <code>&lt;input&gt;</code> tag or its <code>value</code> attribute.
(It's also obvious from this that it uses string manipulation to serialize
data into HTML, which sounds odd for a security-related software product.)</p>

<p>Since I really needed a working solution, I created a simple Python script
to parse the XML export of an HTTP request from Burp and create an HTML page
with a form that has the values sent in the request preset. I used <a href="http://lxml.de/">LXML</a> to
both parse the input XML and serialize the HTML output to avoid the pitfalls
Burp met. First, I loaded the Burp XML request file. XPath was used to
get the first item (such exports can store more than one), and to extract
the method, URL and request information. Using the single-element tuple
assignment syntax asserts that the right-hand side of the assignment contains
one and only one element, verifying the sanity of the input.</p>

<pre><code class="python">from lxml import etree

root = etree.parse(input_file).getroot()
item = root.xpath("/items/item")[0]
(method,) = item.xpath("method/text()")
if method.lower() != "post":
    raise ValueError("Only POST requests are supported")
(url,) = item.xpath("url/text()")
(request,) = item.xpath("request")
</code></pre>

<p>Burp can encode the request body using Base64, so it should be checked for and
decoded if necessary. The resulting body contains the HTTP headers and the
encoded POST data, separated by an empty line, so splitting it is pretty
straightforward. The second parameter of the <code>split</code> method stops after the
first split, and naming the first result with an underscore makes it apparent
for both humans and machines that we don't care about that piece of data.</p>

<pre><code class="python">from base64 import b64decode

contents = request.text
if request.get("base64"):
    contents = b64decode(contents)
_, body = contents.split("\r\n\r\n", 1)
</code></pre>

<p>I wrote a small generator function that yields the names and values of each
form field as tuples of Unicode objects. I initially used string manipulation,
then discovered that Python had me covered with <a href="http://docs.python.org/2/library/urlparse.html">urlparse</a>.</p>

<pre><code class="python">from urlparse import parse_qsl

def decode_form_urlencoded_values(request_body, encoding):
    for pair in parse_qsl(request_body, keep_blank_values=True):
        yield tuple(i.decode(encoding) for i in pair)
</code></pre>
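
<p>On Python 3, where <code>urlparse</code> has moved to <code>urllib.parse</code>, the decoding
step becomes even shorter, as <code>parse_qsl</code> accepts the encoding directly. The
sketch below is a port for illustration, not part of the script; the request
and the field names in it are made up.</p>

<pre><code class="python">from urllib.parse import parse_qsl, urlencode


def split_request(raw_request):
    """Return (headers, body), split at the first empty line."""
    headers, body = raw_request.split('\r\n\r\n', 1)
    return headers, body


def decode_form_urlencoded_values(request_body, encoding):
    """Return the (name, value) pairs of a form-encoded body as str."""
    return parse_qsl(request_body, keep_blank_values=True, encoding=encoding)


# Hypothetical request with an accented value and an empty field
fields = [('name', 'árvíztűrő'), ('comment', '')]
raw_request = ('POST /form HTTP/1.1\r\nHost: example.org\r\n\r\n'
               + urlencode(fields, encoding='iso-8859-2'))
_, body = split_request(raw_request)
decoded = decode_form_urlencoded_values(body, 'iso-8859-2')
</code></pre>

<p>Thanks to <code>keep_blank_values</code>, the empty field survives the round trip, which
matters for forms where the server expects every field to be present.</p>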

<p>With this done, I just had to build the resulting HTML. I used LXML's
<a href="http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory">E-Factory</a> and Python's <a href="http://docs.python.org/2/tutorial/controlflow.html#unpacking-argument-lists">argument list unpacking</a> to make it happen in
a more or less readable way.</p>

<pre><code class="python">from lxml import html
from lxml.html import builder as E
import codecs

output = E.HTML(
    E.HEAD(E.META(**{'http-equiv': 'Content-type',
        'content': 'text/html; charset=' + encoding})),
    E.BODY(
        E.FORM(
            E.INPUT(type="submit"),
            *(E.INPUT(type="hidden", name=name, value=value) for name, value
                in decode_form_urlencoded_values(body, encoding)),
            action=url, method=method
            )
        )
    )
with codecs.open(output_file, 'wb', encoding) as html_output:
    html_output.write(html.tostring(output, encoding=unicode))
</code></pre>

<p>The complete and working script can be downloaded from
<a href="https://github.com/dnet/burp-scripts/blob/master/burp-csrf.py">my GitHub repository</a>, and in case you've been wondering if it was worth
it; yes, the PoC proved that the target application with the 150 KB ViewState
was indeed vulnerable to XSRF.</p>
]]></content>
	</entry>
	<entry>
		<title>LibreOffice 4.0 workaround for read-only FS</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/LibreOffice_4.0_workaround_for_read-only_FS.html"/>
		<updated>2013-02-10T14:53:52+01:00</updated>
      <id>http://techblog.vsza.hu/posts/LibreOffice_4.0_workaround_for_read-only_FS.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p><a href="http://www.libreoffice.org/download/4-0-new-features-and-fixes/">LibreOffice 4.0</a> got released on 7<sup>th</sup> February, and since it
offered improved OOXML interoperability, I immediately downloaded and
installed it on my laptop. It worked quite well, but after the next boot, it
just flashed the window for a tenth of a second, and displayed the following
output on the console.</p>

<pre><code>Fatal Python error: Py_Initialize: Unable to get the locale encoding
Traceback (most recent call last):
  File "&lt;frozen importlib._bootstrap&gt;", line 1558, in _find_and_load
  File "&lt;frozen importlib._bootstrap&gt;", line 1525, in _find_and_load_unlocked
  File "&lt;frozen importlib._bootstrap&gt;", line 586, in _check_name_wrapper
  File "&lt;frozen importlib._bootstrap&gt;", line 1023, in load_module
  File "&lt;frozen importlib._bootstrap&gt;", line 1004, in load_module
  File "&lt;frozen importlib._bootstrap&gt;", line 562, in module_for_loader_wrapper
  File "&lt;frozen importlib._bootstrap&gt;", line 854, in _load_module
  File "&lt;frozen importlib._bootstrap&gt;", line 990, in get_code
  File "&lt;frozen importlib._bootstrap&gt;", line 1051, in _cache_bytecode
  File "&lt;frozen importlib._bootstrap&gt;", line 1065, in set_data
OSError: [Errno 30] Read-only file system: '/usr/local/opt/libreoffice4.0/program/../program/python-core-3.3.0/lib/encodings/__pycache__'
</code></pre>

<p>I symlinked <code>/opt</code> to <code>/usr/local/opt</code>, and for many reasons (including
faster boot, storing <code>/usr</code> on an SSD) I mount <code>/usr</code> in read-only mode by
default, and use the following snippet in <code>/etc/apt/apt.conf.d/12remount</code>
to do the magic upon system upgrade and software installs.</p>

<pre><code>DPkg
{
    Pre-Invoke {"mount -o remount,rw /usr &amp;&amp; mount -o remount,exec /var &amp;&amp; mount -o remount,exec /tmp";};
    Post-Invoke {"mount -o remount,ro /usr ; mount -o remount,noexec /var &amp;&amp; mount -o remount,noexec /tmp";};
}
</code></pre>

<p>It seems that LibreOffice 4.0 tries to put compiled Python objects into a
persistent cache, and since it resides on a read-only filesystem, it cannot
even create the <code>__pycache__</code> directories needed for that. My workaround is
the following shell script that needs to be run just once, and works quite
well by letting LibreOffice put its cached <code>pyc</code> files into <code>/tmp</code>.</p>

<pre><code>#!/bin/sh
mount /usr -o rw,remount
find /opt/libreoffice4.0/program/python-core-3.3.0/lib -type d \
    -exec ln -s /tmp {}/__pycache__ \;
mount /usr -o ro,remount
</code></pre>
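
<p>For those who'd rather do the same from Python, the <code>find</code> invocation above
can be approximated as follows. This is just a sketch with a hypothetical
function name; it still needs the filesystem to be remounted read-write
first, and unlike the shell version, it skips directories that already
contain a <code>__pycache__</code> entry.</p>

<pre><code class="python">import os
import tempfile


def link_pycache_dirs(lib_root, target):
    """Symlink __pycache__ in every directory under lib_root to target."""
    created = []
    for dirpath, dirnames, filenames in os.walk(lib_root):
        # Don't descend into cache directories that already exist
        dirnames[:] = [d for d in dirnames if d != '__pycache__']
        link = os.path.join(dirpath, '__pycache__')
        if not os.path.lexists(link):
            os.symlink(target, link)
            created.append(link)
    return created
</code></pre>

<p>With the paths from the script above, the call would be
<code>link_pycache_dirs('/opt/libreoffice4.0/program/python-core-3.3.0/lib', '/tmp')</code>.</p>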
]]></content>
	</entry>
	<entry>
		<title>Complementing Python GPGME with M2Crypto</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Complementing_Python_GPGME_with_M2Crypto.html"/>
		<updated>2012-12-31T20:00:59+01:00</updated>
      <id>http://techblog.vsza.hu/posts/Complementing_Python_GPGME_with_M2Crypto.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>The <a href="http://www.gnupg.org/related_software/gpgme/">GPGME</a> Python bindings provide an interface to most of the
functionality provided by <a href="http://www.gnupg.org/">GnuPG</a>, so I could generate keys and perform
encryption and decryption using them, but I found that it wasn't possible to
list which public keys can decrypt a PGP-encrypted file. Of course, it's
always possible to invoke the <code>gpg</code> binary, but I wanted to avoid
spawning processes if possible.</p>

<p>As <a href="http://stackoverflow.com/a/1042139">Heikki Toivonen mentioned in a Stack Overflow thread</a>, the
<a href="http://chandlerproject.org/Projects/MeTooCrypto">M2Crypto</a> library had a PGP module, and based on some <a href="http://svn.osafoundation.org/m2crypto/trunk/demo/pgp/pgpstep.py">demo code</a>, it
seemed to be able to parse OpenPGP files into meaningful structures, including
<code>pke_packet</code> that contains an attribute called <code>keyid</code>. I installed the module
from the Debian package <code>python-m2crypto</code>, tried calling the PGP parser
functionality, and found that</p>

<ul>
<li>the <code>keyid</code> attribute is called <code>_keyid</code> now, and</li>
<li>after returning the <code>pke_packet</code> instances, it raises an <code>XXXError</code> in
 case of OpenPGP output generated by GnuPG 1.4.12.</li>
</ul>

<p>It's also important to note that the M2Crypto keyid is an 8-byte raw
byte string, while GPGME uses 16-character uppercase hex strings for the
same purpose. I chose to convert the former to the latter format, resulting in
a <code>set</code> of hexadecimal key IDs. After that, I could check which keys available in
the current keyring are able to decrypt the file. The <code>get_acl</code> function thus
returns a <code>dict</code> mapping e-mail addresses to a boolean value that indicates
the key's ability to decrypt the file specified in the <code>filename</code> parameter.</p>

<pre><code class="python">from M2Crypto import PGP
from contextlib import closing
from binascii import hexlify
import gpgme

def get_acl(filename):
    with open(filename, 'rb') as stream:
        with closing(PGP.packet_stream(stream)) as ps:
            own_pk = set(packet_stream2hexkeys(ps))
    return dict(
            (k.uids[0].email, any(s.keyid in own_pk for s in k.subkeys))
            for k in gpgme.Context().keylist())

def packet_stream2hexkeys(ps):
    try:
        while True:
            pkt = ps.read()
            if pkt is None:
                break
            elif pkt and isinstance(pkt, PGP.pke_packet):
                yield hexlify(pkt._keyid).upper()
    except:
        pass
</code></pre>
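
<p>The key ID conversion and the membership test can be checked without either
library. The sketch below targets Python 3, where <code>hexlify</code> returns bytes
that need decoding; <code>keyid_to_hex</code> and <code>can_decrypt</code> are hypothetical
helper names, not part of M2Crypto or GPGME.</p>

<pre><code class="python">from binascii import hexlify


def keyid_to_hex(raw_keyid):
    """Convert an 8-byte raw key ID to a 16-character uppercase hex string."""
    if len(raw_keyid) != 8:
        raise ValueError('expected an 8-byte key ID')
    return hexlify(raw_keyid).decode('ascii').upper()


def can_decrypt(subkey_ids, recipients):
    """Return True if any of the hex subkey IDs is among the recipients."""
    return any(keyid in recipients for keyid in subkey_ids)


recipients = {keyid_to_hex(b'\x01\x23\x45\x67\x89\xab\xcd\xef')}
</code></pre>

<p>Here <code>recipients</code> plays the role of <code>own_pk</code> above, and <code>can_decrypt</code>
mirrors the <code>any(...)</code> expression applied to the subkeys of each key.</p>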
]]></content>
	</entry>
	<entry>
		<title>Connecting Baofeng UV-5R to a Linux box</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Connecting_Baofeng_UV-5R_to_a_Linux_box.html"/>
		<updated>2012-12-23T00:52:15+01:00</updated>
      <id>http://techblog.vsza.hu/posts/Connecting_Baofeng_UV-5R_to_a_Linux_box.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>Ever since I bought my <a href="http://www.universal-radio.com/catalog/ht/0205.html">Baofeng UV-5R</a> handheld VHF/UHF FM transceiver, I
wanted to hook it up to my notebook – partly to populate the channel list
without fighting the crippled UI, partly out of curiosity. First, I had to
build a cable, since I didn't receive one in the package, and it would've cost
at least around 20 bucks to get my hands on one – plus the delay involved in
postal delivery. In a <a href="http://groups.yahoo.com/group/baofeng_uv5r/message/7184">Yahoo! group, jwheatleyus mentioned</a> the following
pinout:</p>

<ul>
<li>3.5mm Plug Programming Pins
<ul>
<li>Sleeve: Mic – (and PTT) Rx Data</li>
<li>Ring: Mic +</li>
<li>Tip: +V</li>
</ul></li>
<li>2.5mm Plug
<ul>
<li>Sleeve: Speaker – (and PTT) Data GND</li>
<li>Ring: Tx Data</li>
<li>Tip: Speaker +</li>
</ul></li>
<li>Connect Sleeve to Sleeve for PTT</li>
</ul>

<p>I took apart the headset bundled with the gear, and verified this pinout in
case of the Mic/Speaker/PTT lines with a multimeter, so I only had to connect
these pins to the notebook. Since I already had an <a href="http://www.ftdichip.com/Products/Cables/USBTTLSerial.htm">FTDI TTL-232R-5V</a> cable
lying around for use with my <a href="http://www.evilmadscientist.com/2010/diavolino/">Diavolino</a> (actually, I won both of them on
the <a href="http://events.ccc.de/congress/2010/wiki/Hardware_Hacking_Area#LoL_Shield_Contest">LoL shield contest at 27C3</a>), I created a breakout board that can be
connected to the radio, and had pin headers just in the right order for the
FTDI cable and two others for speaker and mic lines. The schematic and the
resulting hardware can be seen below.</p>

<p><img src="http://techblog.vsza.hu/images/baofeng_breakout.png" alt="Baofeng UV-5R breakout board" title="" /></p>

<p>With the physical layer ready, I only had to find some way to manipulate the
radio using software running on the notebook. While much of the software available
for this radio is closed and/or available for Windows only, I found
<a href="http://chirp.danplanet.com">Chirp</a>, a FLOSS solution written in Python (thus available for all sane
platforms) which – as of this writing – could access the Baofeng UV-5R in the
experimental <a href="http://chirp.danplanet.com/projects/chirp/wiki/Download#Development-builds">daily builds</a>. Like most Python software, Chirp doesn't
require any install procedure either: downloading and extracting the
tarball led to a functional and minimalistic GUI. First, I set the second
tuner to display the name of the channel, and uploaded a channel list with
the <a href="http://www.ha5kdr.hu/projektek/sstv">Hármashatár-hegy SSTV relay</a> (thus the name HHHSSTV) at position 2,
with the following results.</p>

<p><img src="http://techblog.vsza.hu/images/hhhsstv.jpg" alt="HHHSSTV channel stored on Baofeng UV-5R" title="" /></p>

<p>I could also access an interesting tab named <em>other settings</em> that made it
possible to edit the message displayed upon startup and limit the frequency
range in both bands.</p>

<p><img src="http://techblog.vsza.hu/images/baofeng-bootmsg.png" alt="Other settings and their effect on Baofeng UV-5R" title="" /></p>

<p>Although Chirp states that the driver for UV-5R is still experimental, I
didn't have any problems with it, and as it's written in Python, its code is
readable and extensible, while avoiding cryptic dependencies. It's definitely
worth a try, and if lack of PC connectivity without proprietary software was
a reason for you to avoid this radio, then I have good news for you.</p>
]]></content>
	</entry>
	<entry>
		<title>Leaking data using DIY USB HID device</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Leaking_data_using_DIY_USB_HID_device.html"/>
		<updated>2012-10-29T19:09:01+01:00</updated>
      <id>http://techblog.vsza.hu/posts/Leaking_data_using_DIY_USB_HID_device.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>Years ago, there was a competition where contestants had to extract data out
of a system that was protected by a state-of-the-art anti-leaking solution.
Such challenges are based on the fact that private information should be
available for use on a protected computer, but it must stay within the physical
boundaries of the system. Obvious methods like networking and removable storage
devices are usually covered by these mechanisms, but as with DRM, it's
difficult – if not impossible – to think of every possibility. For example,
in the aforementioned challenge, some guys used the audio output to get a
file off the box – and when I heard about the <a href="http://littlewire.cc/">Little Wire</a> project, I
started thinking about a new vector.</p>

<p>My solution had to be able to extract data out of a</p>

<ul>
<li>Windows 7 box</li>
<li>with no additional software installed</li>
<li>logged in as a non-administrative account</li>
<li>that only allows a display, a keyboard and a mouse to be connected.</li>
</ul>

<p>My idea was to use a USB HID device, since these can be connected to such a
system without additional drivers or special privileges. I've already built
such a device for <a href="http://hsbp.org/fuji">interfacing JTAG</a> using an Arduino clone and <a href="http://www.obdev.at/products/vusb/">V-USB</a>,
so I could reuse both hardware and experience, but avoid using an external
programmer. The V-USB library made it possible for me to create a USB device
without buying purpose-built hardware, by bit-banging the USB protocol with
the general-purpose input and output pins of an AVR ATmega328 microcontroller.
When used correctly, the AVR device shows up as a regular keyboard in the
Windows 7 devices dialog.</p>

<p><img src="http://techblog.vsza.hu/images/usbpwn-win7.png" alt="USBpwn in the Windows 7 devices dialog" title="" /></p>

<p>A keyboard was the logical choice for data extraction, since it is the only part
of the HID specification that has a three-bit-wide output channel that's
controllable without drivers and/or administrative privileges: the NUM, CAPS
and SCROLL lock status LEDs. I've crafted a simple protocol that used NUM and
CAPS as two data bits and SCROLL as a clock signal. When the SCROLL LED was
on, the other two LEDs could be sampled for data. The newline (that could be
achieved by “pressing” the Enter/Return key, since we're already “keyboards”)
was the acknowledgement signal, making the protocol fully synchronous.
For example, the bits <code>1101</code> could be sent in the following way:</p>

<pre><code>            __________________________________
   NUM ____/
                               _______________
  CAPS _______________________/
                ______            ______
SCROLL ________/      \__________/      \_____

                  01                11
</code></pre>

<p>On the Windows host, an extractor agent was needed that performed the transfer
using the following code snippet:</p>

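<pre><code class="cpp">/* frame holds the next two bits of the current byte */
set_lock(NUM,  (frame &amp; 0x01) == 0x01);  /* low bit goes to NUM */
set_lock(CAPS, (frame &amp; 0x02) == 0x02);  /* high bit goes to CAPS */
set_lock(SCROLL, 1);                      /* clock on: data can be sampled */
getchar();                                /* wait for the device to "press" Enter */
toggle_key(SCROLL);                       /* clock off again */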
</code></pre>

<p>Bits were sent from LSB to MSB, and the n bytes of a file were sent from 0 to n-1,
each stored at the matching position in the EEPROM. I tried using an SD card to store the data received,
but it <a href="http://electronics.stackexchange.com/questions/43401/how-can-v-usb-screw-up-the-built-in-spi-of-an-atmega328p">conflicted with the V-USB library</a>, so I had to use the built-in
EEPROM – the MCU I used was the ATmega328, which had 1 kbyte of it, which
limited the size of the largest file that could be extracted.</p>
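<p>To make the framing concrete, here is a minimal Python sketch of the scheme described above: each byte is split into four two-bit frames, least significant pair first, and each frame maps to the NUM and CAPS states. The function names are mine and purely illustrative; the real device code is written in C and is not reproduced here.</p>

```python
def byte_to_frames(value):
    """Split one byte into four 2-bit frames, least significant pair first."""
    return [(value >> shift) & 0x03 for shift in (0, 2, 4, 6)]


def frame_to_leds(frame):
    """Map a 2-bit frame to (NUM, CAPS) states, like the agent snippet does."""
    return bool(frame & 0x01), bool(frame & 0x02)


def leds_to_byte(samples):
    """Rebuild a byte from four (NUM, CAPS) samples taken while SCROLL is on."""
    value = 0
    for index, (num, caps) in enumerate(samples):
        value |= (int(num) | (int(caps) << 1)) << (2 * index)
    return value


def send(data):
    """Host side: the sequence of LED states for a byte string."""
    return [frame_to_leds(f) for b in data for f in byte_to_frames(b)]


def receive(samples):
    """Device side: turn sampled LED states back into bytes."""
    return bytes(leds_to_byte(samples[i:i + 4])
                 for i in range(0, len(samples), 4))
```

<p>With this layout every byte costs four frames, so the device has to acknowledge four times per byte, which goes some way towards explaining the low transfer rate.</p>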

<p>Of course, the aforementioned agent had to be placed on the Windows box before
transmitting file contents. The problem was similar to using dumb bindshell
shellcodes to upload binary content, and most people solved it by using
<code>debug.com</code>. Although it's there on most versions of Windows, it has its
limitations: the output file can be 64 kilobytes at maximum, and it requires
data to be typed using hexadecimal characters, which requires at least two
characters per byte.</p>

<p>In contrast, base64 requires only 4 characters per 3 bytes (33% overhead
instead of 100%), and there's a way to do it on recent Windows systems using
a good old friend of ours: Visual Basic. I created a <a href="https://github.com/dnet/base64-vbs.py/blob/master/base64.vbs">simple VBS skeleton</a>
that decodes base64 strings and saves the binary output to a file, and
another <a href="https://github.com/dnet/base64-vbs.py/blob/master/b64vbs.py">simple Python script</a> that fills the skeleton with base64-encoded
content, and also compresses it (like JS and CSS minifiers on the web).
The output of the minified version is something like the one below.</p>

<pre><code class="vbscript">Dim a,b
Set a=CreateObject("Msxml2.DOMDocument.3.0").CreateElement("base64")
a.dataType="bin.base64"
a.text="TVpQAAIAAAAEAA8A//8AALgAAAAAAAAAQAAaAAAAAAAAAAAAAAAAAA..."
Set b=CreateObject("ADODB.Stream")
b.Type=1
b.Open
b.Write a.nodeTypedValue
b.SaveToFile "foo.exe",2
</code></pre>
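<p>For illustration, a script of this shape could be generated along the following lines. This is a rough sketch and not the linked <code>b64vbs.py</code>: the template is copied from the output above, <code>VBS_TEMPLATE</code> and <code>make_dropper</code> are hypothetical names, and the output filename is assumed not to contain double quotes.</p>

```python
import base64

# Template mirroring the minified output shown above
VBS_TEMPLATE = '''Dim a,b
Set a=CreateObject("Msxml2.DOMDocument.3.0").CreateElement("base64")
a.dataType="bin.base64"
a.text="{payload}"
Set b=CreateObject("ADODB.Stream")
b.Type=1
b.Open
b.Write a.nodeTypedValue
b.SaveToFile "{filename}",2
'''


def make_dropper(data, filename):
    """Return VBS source that recreates `data` as `filename` when run."""
    # The base64 alphabet has no double quote, so the payload
    # can be embedded in a VBS string literal as-is.
    payload = base64.b64encode(data).decode('ascii')
    return VBS_TEMPLATE.format(payload=payload, filename=filename)
```

<p>Encoding 300 bytes this way yields 400 characters of payload, which is the 4-to-3 ratio mentioned earlier.</p>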

<p>The result is a solution that makes it possible to carry a Windows agent
(a simple <code>exe</code> program) that can be typed in from the Flash memory of the AVR,
which, when executed, can leak any file using the LEDs. I successfully
demonstrated these abilities at <a href="https://hacktivity.com/en/hacktivity-2012/programs/usb-universal-security-bug1/">Hacktivity 2012</a>, my slideshow is available
for download <a href="http://sil&#x65;ntsignal.hu/docs/S2_VSzA_Hacktivity2012.pdf">on the Sil&#x65;nt Signal homepage</a>, videos should be posted soon.
The hardware itself can be seen below, the self-made USB interface shield is
the same as <a href="http://vusb.wikidot.com/hardware#toc3">the one in the V-USB wiki hardware page</a>.</p>

<p><img src="http://techblog.vsza.hu/images/usbpwn-parts.jpg" alt="USBpwn hardware" title="" /></p>

<p>The hardware itself is bulky, and I won't try to make it smaller and faster
any time soon, since I've already heard enough people considering it
weaponized. Anyway, the proof-of-concept hardware and software solution</p>

<ul>
<li>can type in 13 characters per second from the flash memory of the AVR,</li>
<li>which results in 10 bytes per second (considering base64 encoding),</li>
<li>and after deploying the agent, it can read LEDs with 1.24 effective bytes per second.</li>
</ul>
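<p>The figures above can be put into perspective with a few lines of Python. The constants are taken from the list and from the EEPROM size mentioned earlier; the helper names are only illustrative.</p>

```python
TYPING_RATE = 13.0   # characters per second typed by the device
LED_RATE = 1.24      # effective bytes per second read back via the LEDs
EEPROM_SIZE = 1024   # bytes of EEPROM in the ATmega328


def base64_byte_rate(chars_per_second):
    """Base64 carries 3 bytes in every 4 characters."""
    return chars_per_second * 3 / 4


def seconds_to_type(agent_size):
    """Time needed to type in an agent of the given size in bytes."""
    return agent_size / base64_byte_rate(TYPING_RATE)


def seconds_to_leak(file_size):
    """Time needed to read a file of the given size through the LEDs."""
    return file_size / LED_RATE
```

<p>Typing at 13 characters per second gives 9.75 bytes per second, which rounds to the 10 quoted above, and filling the whole EEPROM through the LEDs takes about 826 seconds, or roughly 14 minutes.</p>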

<p>All the code is available in my GitHub repositories:</p>

<ul>
<li>the <a href="https://github.com/dnet/usbpwn-device">code running on the device</a>, written in C, released under OBDEV license,</li>
<li>the <a href="https://github.com/dnet/usbpwn-host">Windows agent</a>, also written in C, but released under MIT license, and</li>
<li>the <a href="https://github.com/dnet/base64-vbs.py">base64 VBS encoder</a>, written in Python, also released under MIT license.</li>
</ul>
]]></content>
	</entry>
	<entry>
		<title>DEF CON 20 CTF grab bag 300 writeup</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/DEF_CON_20_CTF_grab_bag_300_writeup.html"/>
		<updated>2012-06-04T13:46:53+02:00</updated>
      <id>http://techblog.vsza.hu/posts/DEF_CON_20_CTF_grab_bag_300_writeup.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>As a proud member of the Hungarian team called “senkihaziak”, I managed to
solve the following chall&#x65;nge for 300 points in the grab bag category on the
<a href="http://ddtek.biz/">20th DEF CON Capture The Flag</a> contest. The description consisted of an
IP address, a port number, a password, and a hint.</p>

<p><img src="http://techblog.vsza.hu/images/dc20-gb300-desc.png" alt="Description of the chall&#x65;nge" title="" /></p>

<p>Connecting with <a href="http://netcat.sourceforge.net/">netcat</a> to the specified IP address and port using TCP and
sending the password followed by a newline triggered the server to send back
the actual chall&#x65;nge, utilizing <a href="https://en.wikipedia.org/wiki/ANSI_escape_code">ANSI escape sequences</a> for colors.</p>

<p><img src="http://techblog.vsza.hu/images/dc20-gb300-netcat.png" alt="Output of netcat after connecting and sending the password" title="" /></p>

<p>As <a href="http://buhera.blog.hu/">Buherátor</a> pointed out, the matrices are parts of a scheme designed
to <a href="http://oao.no/wpe/2010/11/hide-pin-codes-in-random-matrix/">hide PIN codes in random matrices</a> in which only the cardholder knows
which digits are part of the PIN code. The service sent three matrices for
which the PIN code was known and the chall&#x65;nge was to find the PIN code for
the fourth one. As we hoped, the positions of the digits within the matrices
were the same for all four, so all we needed to do was to find a set of valid
positions for each matrix, and apply their intersection to the fourth. I chose
Python for the task, and began with connecting to the service.</p>

<pre><code>from contextlib import closing
import socket

PW = '5fd78efc6620f6\n'
TARGET = ('140.197.217.85', 10435)
PROMPT = 'Enter ATM PIN:'

def main():
  with closing(socket.socket()) as s:
    s.connect(TARGET)
    s.send(PW)
    buf = ''
    while PROMPT not in buf:
      buf += s.recv(4096)
    pin = buffer2pin(buf)
    s.send(pin + '\n')
</code></pre>

<p>The <code>buffer2pin</code> function parses the response of the service and returns the
digits of the PIN code, separated with spaces. First, the ANSI escape sequences
are stripped from the input buffer. Then, the remaining contents are split into
an array of lines (<code>buf.split('\n')</code>), trailing and leading whitespace
gets stripped (<code>imap(str.strip, ...)</code>), and finally, lines that don't contain
a single digit surrounded with spaces get filtered out.</p>

<pre><code>import re
from itertools import ifilter, imap

ESCAPE_RE = re.compile('\x1b\\[0;[0-9]+;[0-9]+m')
INTERESTING_RE = re.compile(' [0-9] ')

def buffer2pin(buf):
  buf = ESCAPE_RE.sub('', buf)
  buf = filter(INTERESTING_RE.search, imap(str.strip, buf.split('\n')))
  ...
</code></pre>
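<p>The same cleaning step can be tried on its own. The snippet below is a Python 3 rendering of it, run against a made-up buffer; the escape codes and the text in <code>sample</code> are invented for illustration and are not the output of the real service.</p>

```python
import re

ESCAPE_RE = re.compile('\x1b\\[0;[0-9]+;[0-9]+m')
INTERESTING_RE = re.compile(' [0-9] ')


def clean_lines(buf):
    """Strip ANSI colour codes and keep only lines with spaced digits."""
    buf = ESCAPE_RE.sub('', buf)
    stripped = map(str.strip, buf.split('\n'))
    return [line for line in stripped if INTERESTING_RE.search(line)]


# Invented sample: one matrix row, the prompt, and one result line
sample = ('\x1b[0;37;40m 3 5 8  4 1 2 \x1b[0;32;40m\n'
          'Enter ATM PIN:\n'
          ' User entered: 4 5 2 7 \n')
print(clean_lines(sample))
```

<p>The prompt line is dropped because it contains no digit with a space on both sides, while the matrix row and the <em>User entered</em> line survive.</p>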

<p>By now, <code>buf</code> contains strings like <code>'3 5 8  4 1 2'</code> and
<code>'User entered: 4 5 2 7'</code>, so it's time to build the sets of valid positions.
The initial sets contain every position (0 to 35), and later, these sets get updated
with an intersection operation. For each example (a matrix with a valid PIN
code) the script joins the six lines of the matrix and removes all spaces.
This results in <code>base</code> holding 36 digits as a string. Finally, the inner <code>for</code>
loop iterates over the four digits in the last line of the current example
(User entered: <em>4 5 2 7</em>) and finds all occurrences in the matrix. The resulting
list of positions is intersected with the set of valid positions for the
current digit (<code>sets[n]</code>). I know that using regular expressions for this
purpose is a little bit of an overkill, but it's
<a href="http://stackoverflow.com/questions/4664850/find-all-occurrences-of-a-substring-in-python">the least evil of the available solutions</a>.</p>

<pre><code>EXAMPLES = 3
DIGITS = 4
INIT_RANGE = range(36)

def buffer2pin(buf):
  ...
  sets = [set(INIT_RANGE) for _ in xrange(DIGITS)]
  for i in xrange(EXAMPLES):
    base = ''.join(buf[i * 7:i * 7 + 6]).replace(' ', '')
    for n, digit in enumerate(ifilter(str.isdigit, buf[i * 7 + 6])):
      sets[n].intersection_update(m.start() for m in re.finditer(digit, base))
  ...
</code></pre>

<p>The only thing that remains is to transform the fourth matrix into a 36 chars
long string like the other three, and pick the digits of the resulting PIN code
using the sets, which – hopefully – only contain one element each by now.</p>

<pre><code>def buffer2pin(buf):
  ...
  quest = ''.join(buf[3 * 7:3 * 7 + 6]).replace(' ', '')
  return ' '.join(quest[digit.pop()] for digit in sets)
</code></pre>
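<p>The intersection idea can also be checked in isolation. What follows is a self-contained Python 3 sketch with synthetic matrices instead of network input; the secret positions, the matrices and the helper names are all made up for the demonstration, and the matrices are constructed so that every position has a unique digit signature.</p>

```python
import re

DIGITS = 4
CELLS = 36


def positions_of(digit, matrix):
    """All indexes at which a digit occurs in a 36 character matrix."""
    return {m.start() for m in re.finditer(digit, matrix)}


def recover_positions(examples):
    """Intersect candidate positions over (matrix, entered digits) pairs."""
    sets = [set(range(CELLS)) for _ in range(DIGITS)]
    for matrix, entered in examples:
        for n, digit in enumerate(entered):
            sets[n] &= positions_of(digit, matrix)
    return sets


def solve(examples, quest):
    """Return the PIN digits for the matrix without a known answer."""
    sets = recover_positions(examples)
    if any(len(s) != 1 for s in sets):
        raise ValueError('positions are still ambiguous')
    return ''.join(quest[next(iter(s))] for s in sets)


# Synthetic data: tens and units digits make each position unique
secret = [7, 12, 25, 30]
matrices = [''.join(str(p // 10) for p in range(CELLS)),
            ''.join(str(p % 10) for p in range(CELLS)),
            ''.join(str(p * 3 % 10) for p in range(CELLS))]
examples = [(m, ''.join(m[p] for p in secret)) for m in matrices]
quest = ''.join(str((p * 7 + 1) % 10) for p in range(CELLS))
print(solve(examples, quest))
```

<p>Unlike the snippet above, this version refuses to answer when a set still holds more than one candidate, which seemed a safer default for a sketch.</p>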

<p>The resulting script worked almost perfectly, but after the first run, we found
out that after sending a correct PIN code, several more chall&#x65;nges were sent,
so the whole logic had to be put in an outer loop. The <a href="https://gist.github.com/665d29f9282644726726">final script</a> can be
found on Gist, and it produced the following output, resulting in 300 points.</p>

<p><img src="http://techblog.vsza.hu/images/dc20-gb300-key.png" alt="Result of a successful run, displaying the key" title="" /></p>
]]></content>
	</entry>
	<entry>
		<title>Mounting Sympa shared directories with FUSE</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Mounting_Sympa_shared_directories_with_FUSE.html"/>
		<updated>2012-03-29T17:35:37+02:00</updated>
      <id>http://techblog.vsza.hu/posts/Mounting_Sympa_shared_directories_with_FUSE.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>The <a href="https://www.db.bme.hu/">database laboratory course</a> at the Budapest University of Technology
and Economics which I collaborate with as a lecturer uses Sympa for mailing
lists and file sharing. The latter is not one of the most used features of this
software, and the web interface feels sluggish, not to mention the many
leftover files in my Downloads directory for each attempt to view one page
of a certain file. I understood that using the same software for these two
tasks made managing user accounts easier, so I tried to come up with a
solution that makes it easier to handle these files with the existing setup.</p>

<p>First, I searched whether an API for Sympa exists and I found that while they
created the <a href="https://www.sympa.org/manual/soap">Sympa SOAP server</a>, it only handles common use-cases related
to mailing lists management, so it can be considered a dead end. This meant
that my solution had to use the web interface, so I selected an old and a new
tool for the task: <a href="http://lxml.de/">LXML</a> for parsing, since I already knew of its power,
and <a href="http://docs.python-requests.org/">requests</a> for handling HTTP, because of its fame. These two tools made
it possible to create half of the solution first, resulting in a <a href="https://github.com/dnet/sympa-python-api/blob/master/sympa.py">Sympa API</a>
that can be used independently of the file system bridge.</p>

<p>Two things I found particularly great about requests were that its handling of
sessions was superior to that of any API I've ever seen, and that it was possible to
retrieve the <a href="http://docs.python-requests.org/en/v0.10.7/user/quickstart/#response-content">results in multiple formats</a> (raw socket, bytes, Unicode text).
Since I only had one Sympa installation to test with, I only hacked the code so
far to make it work, so for example, I had to use regular expressions to strip
the XML <em>and</em> HTML encoding information, since both stated <code>us-ascii</code> while the
output was in ISO-8859-2, correctly stated in the HTTP <code>Content-type</code> header.</p>
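<p>A workaround of that kind might look like the sketch below. It is not the code from the linked repository: the two regular expressions and the <code>decode_page</code> helper are assumptions of mine, and the function expects the raw bytes of the response together with the charset taken from the HTTP header.</p>

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration
XML_ENCODING_RE = re.compile(rb'(<\?xml[^>]*?)\s+encoding=["\'][^"\']*["\']')
# Matches a whole <meta> tag that declares a charset
META_CHARSET_RE = re.compile(rb'<meta[^>]+charset=[^>]*>', re.IGNORECASE)


def decode_page(raw, http_charset):
    """Drop in-document encoding claims and trust the HTTP header instead."""
    raw = XML_ENCODING_RE.sub(rb'\1', raw)
    raw = META_CHARSET_RE.sub(b'', raw)
    return raw.decode(http_charset)
```

<p>The resulting text no longer contradicts itself about its encoding, so it can be handed to a parser as an already decoded string.</p>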

<p>For the second half of the work, I had to create a bridge between the file
system and the API I created, and <a href="http://fuse.sourceforge.net/">FUSE</a> was my natural choice. Choosing the
Python binding was not so easy: as a Debian user, the <code>python-fuse</code> package
seemed like a logical choice, but as <a href="http://stackoverflow.com/users/149482/matt-joiner">Matt Joiner</a> wrote in his answer on a
<a href="http://stackoverflow.com/a/5044703/246098">related Stack Overflow question</a>, <a href="http://code.google.com/p/fusepy/">fusepy</a> was a better choice. Using
one of the examples, I managed to build an experimental version of
<a href="https://github.com/dnet/sympa-python-api/blob/master/sympafs.py">SympaFS</a> with naive caching and session management, but it works!</p>

<pre><code class="no-highlight">&#x24; mkdir /tmp/sympa
&#x24; python sympafs.py https://foo.tld/lists foo@bar.tld adatlabor /tmp/sympa
Password:
&#x24; mount | fgrep sympa
SympaFS on /tmp/sympa type fuse (rw,nosuid,nodev,relatime,user_id=1000,
group_id=1000)
&#x24; ls -l /tmp/sympa/2012
összesen 0
-r-xr-xr-x 1 root root  11776 febr   9 00:00 CensoredFile1.doc
-r-xr-xr-x 1 root root 161792 febr  22 00:00 CensoredFile2.xls
-r-xr-xr-x 1 root root  39424 febr   9 00:00 CensoredFile3.doc
dr-xr-xr-x 2 root root      0 febr  14 00:00 CensoredDir1
dr-xr-xr-x 2 root root      0 ápr    4  2011 CensoredDir2
&#x24; file /tmp/sympa/2012/CensoredFile1.doc
Composite Document File V2 Document, Little Endian, Os: Windows, Version
5.1, Code page: 1252, Author: Censored, Last Saved By: User, Name of
Creating Application: Microsoft Excel, Last Printed: Tue Feb 14 15:00:39
2012, Create Time/Date: Wed Feb  8 21:51:10 2012, Last Saved Time/Date:
Wed Feb 22 08:10:20 2012, Security: 0
&#x24; fusermount -u /tmp/sympa
</code></pre>
]]></content>
	</entry>
	<entry>
		<title>Reverse engineering chinese scope with USB</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Reverse_engineering_chinese_scope_with_USB.html"/>
		<updated>2012-03-04T23:42:00+01:00</updated>
      <id>http://techblog.vsza.hu/posts/Reverse_engineering_chinese_scope_with_USB.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>The members of <a href="http://hsbp.org">H.A.C.K.</a> – one of the less wealthy hackerspaces – felt happy
at first, when the place could afford to buy a slightly used <a href="http://www.uni-trend.com/UT2025B.html">UNI-T UT2025B</a>
digital storage oscilloscope. Besides being useful as a part of the infrastructure,
having a USB and an RS-232 port seized our imagination – one of the interesting
use-cases is the ability to capture screenshots from the device to illustrate
documentation. As I tried interfacing the device, I found that supporting
multiple platforms meant Windows XP and 2000 for the developers, which are not
very common in the place.</p>

<p>I installed the original software in a virtual machine, and tried the serial
port first, but found out that although most of the functionality worked,
taking screenshots was only available over USB. I connected the scope
using USB next, and although the vendor-product tuple was present in the
list of USB IDs, so <code>lsusb</code> could identify it, no drivers in the kernel tried
to take control of the device. So I started looking for USB sniffing software
and found that on Linux, <a href="http://wiki.wireshark.org/CaptureSetup/USB">Wireshark is capable of doing just that</a>.
I forwarded the USB device into the VM and captured a screenshot transmission
for analysis. Wireshark was very handy during analysis as well – just like in
case of TCP/IP – so it was easy to spot the multi-kilobyte bulk transfer among
tiny 64 byte long control packets.</p>

<p><img src="http://techblog.vsza.hu/images/wireshark-usb.png" alt="Wireshark analysis of screenshot transmission via USB" title="" /></p>

<p>I started looking for simple ways to reproduce the exact same conversation
using free software – I've used <a href="http://www.libusb.org/">libusb</a> before while experimenting with
<a href="http://www.obdev.at/products/vusb/index.html">V-USB</a> on the <a href="http://hsbp.org/fuji">Free USB JTAG interface</a> project, but C requires
compilation, and adding things like image processing makes the final product
harder to use on other computers. For these purposes, I usually choose Python,
and as it turned out, the <a href="http://pyusb.sourceforge.net/">PyUSB</a> library makes it possible to access
libusb 0.1, libusb 1.0 and OpenUSB through a single pythonic layer. Using this
knowledge, it was pretty straightforward to modify their
<a href="https://github.com/walac/pyusb/blob/master/docs/tutorial.rst">getting started example</a> and replicate the “PC end” of the conversation.
The core of the <a href="https://github.com/dnet/ut2025b/blob/master/getshot.py">resulting code</a> is the following.</p>

<pre><code class="python">import sys
import usb.core

# ReqType and Endpoint are constants defined elsewhere in the script
dev = usb.core.find(idVendor=0x5656, idProduct=0x0832)
if dev is None:
    print &gt;&gt;sys.stderr, 'USB device cannot be found, check connection'
    sys.exit(1)

dev.set_configuration()
dev.ctrl_transfer(ReqType.CTRL_OUT, 177, 0x2C, 0)
dev.ctrl_transfer(ReqType.CTRL_IN, 178, 0, 0, 8)
for i in [0xF0] + [0x2C] * 10 + [0xCC] * 10 + [0xE2]:
    dev.ctrl_transfer(ReqType.CTRL_OUT, 177, i, 0)

try:
    dev.ctrl_transfer(ReqType.CTRL_OUT, 176, 0, 38)
    for bufsize in [8192] * 4 + [6144]:
        buf = dev.read(Endpoint.BULK_IN, bufsize, 0)
        buf.tofile(sys.stdout)
    dev.ctrl_transfer(ReqType.CTRL_OUT, 177, 0xF1, 0)
except usb.core.USBError:
    print &gt;&gt;sys.stderr, 'Image transfer error, try again'
    sys.exit(1)
</code></pre>

<p>Using this, I managed to get a binary dump of 38912 bytes, which contained
the precious screenshot. From my experience with the original software, I
already knew that the resolution is 320 by 240 pixels – which meant that
4 bits made up each pixel. Using this information, I started generating
bitmaps from the binary dump in the hope of identifying some patterns
visually as I already knew what was on the screen. My first attempt
converted each 4-bit value to a pixel coloured on a
linear scale from 0 = black to 15 = white, and looked like the following.</p>

<p><img src="http://techblog.vsza.hu/images/scope-trial.png" alt="Early version of a converted screenshot" title="" /></p>
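<p>The arithmetic behind the 4-bit guess and the naive conversion can be sketched in a few lines of Python. The nibble order within a byte (high nibble first) is an assumption on my part, and the helper names are illustrative rather than taken from the linked script.</p>

```python
WIDTH, HEIGHT = 320, 240
DUMP_SIZE = 8192 * 4 + 6144   # sizes of the bulk reads, 38912 bytes in total

# About 4.05 bits per pixel, so 4 bits is the only plausible depth
bits_per_pixel = DUMP_SIZE * 8 / (WIDTH * HEIGHT)

# Bytes left over once 320x240 pixels at 4 bits each are accounted for
leftover = DUMP_SIZE - WIDTH * HEIGHT // 2


def unpack_nibbles(data):
    """Split every byte into two 4-bit values, high nibble first (assumed)."""
    pixels = []
    for byte in data:
        pixels.append(byte >> 4)
        pixels.append(byte & 0x0F)
    return pixels


def to_gray(value):
    """Linear mapping from 0 = black to 15 = white on a 0..255 scale."""
    return value * 255 // 15
```

<p>At 4 bits per pixel the image itself needs 38400 bytes, which leaves 512 bytes of the dump that this sketch does not try to interpret.</p>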

<p>Most of the elements looked like they're in the right spot, and both
horizontal and vertical lines seemed intact, apart from the corners.
Also, the linear mapping resulted in an overly bright image, and as it
seemed, the firmware was transmitting 4-bit (16 color) images, even though
the device only had a monochrome LCD – and the Windows software downgraded
the quality before displaying it on the PC on purpose. After some fiddling,
I figured out that the pixels were transmitted in 16-bit words, and the
order of the pixels inside these was 3, 4, 1, 2 (“mixed endian”). After I
added code to compensate for this and created a more readable color mapping
I finally had a <a href="https://github.com/dnet/ut2025b/blob/master/pd2png.py">script that could produce colorful PNGs out of the BLOBs</a>,
see below for an example.</p>

<p><img src="http://techblog.vsza.hu/images/scope-colorful.png" alt="Final version of a converted screenshot" title="" /></p>
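<p>The reordering itself is small enough to show as a sketch. It works on a list of 4-bit pixel values, four of which make up one 16-bit word; <code>reorder_pixels</code> is a hypothetical helper and not the code of the linked <code>pd2png.py</code>.</p>

```python
def reorder_pixels(pixels):
    """Undo the 3, 4, 1, 2 pixel order inside every group of four pixels."""
    if len(pixels) % 4:
        raise ValueError('pixel count must be a multiple of four')
    fixed = []
    for i in range(0, len(pixels), 4):
        first, second, third, fourth = pixels[i:i + 4]
        # The two halves of each 16-bit word simply trade places
        fixed.extend((third, fourth, first, second))
    return fixed
```

<p>Since the permutation only swaps the two halves of each word, applying it twice gives back the original order, so the same function works in both directions.</p>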

<p>In the end, my solution is not only free in both senses and runs on
more platforms, but can capture 8 times as many colors as the original one.
All code is published under MIT license, and further contributions are welcome
both on <a href="https://github.com/dnet/ut2025b">the GitHub repository</a> and the <a href="http://hsbp.org/ut2025b">H.A.C.K. wiki page</a>. I also
gave a talk about the project in Hungarian, the video recording and the
slides can be found at the bottom of the wiki page.</p>
]]></content>
	</entry>
	<entry>
		<title>Mangling RSS feeds with Python</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Mangling_RSS_feeds_with_Python.html"/>
		<updated>2011-10-28T19:57:38+02:00</updated>
      <id>http://techblog.vsza.hu/posts/Mangling_RSS_feeds_with_Python.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>There are blogs on the web that are written/configured in such a way that the RSS
or Atom feed contains only a teaser (or no content at all), and one must open
a link to get the real content – and thus load all the crap on the page,
something RSS feeds were designed to avoid. <a href="http://dittygirl.wordpress.com">Dittygirl</a> has added one of
those sites in her feed reader, and told me that it takes lots of resources on
her netbook to load the whole page – not to mention the discomfort of leaving
the feed reader.</p>

<p>I accepted the chall&#x65;nge, and decided to write a Python RSS gateway in less
than 30 minutes. I chose <a href="http://webpython.codepoint.net/wsgi_application_interface">plain WSGI</a>, something I wanted to play with, and
this project was a perfect match for its simplicity and light weight. Plain
WSGI applications are Python modules with a callable named <code>application</code>, which
the webserver will call every time an HTTP request is made. The callable gets
two parameters,</p>

<ul>
<li>a dictionary of environment values (including the Path of the query,
IP address of the browser, etc.), and</li>
<li>a callable, which can be used to signal the web server about the progress.</li>
</ul>

<p>In this case, the script ignores the path, so only the second parameter is used.</p>

<pre><code>def application(environ, start_response):
  rss = getfeed()
  response_headers = [('Content-Type', 'text/xml; charset=UTF-8'),
                      ('Content-Length', str(l&#x65;n(rss)))]
  start_response('200 OK', response_headers)
  return [rss]
</code></pre>

<p>Simple enough, the function emits a successful HTTP status, the necessary
headers, and returns the content. The list (array) format is needed because a
WSGI application can be a generator too (using a yield statement), which can
be handy when rendering larger content, so the server expects an iterable result.</p>
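<p>For comparison, a generator-style application could look like the sketch below. It is not part of the gateway; the chunk contents are placeholders, and <code>call_app</code> is a hypothetical stand-in for a web server that only exists to try the callable without one.</p>

```python
def application(environ, start_response):
    """Generator-style WSGI app: the body is produced one chunk at a time."""
    headers = [('Content-Type', 'text/plain; charset=UTF-8')]
    start_response('200 OK', headers)
    for number in range(3):
        # Each yielded item has to be a byte string
        yield ('chunk %d\n' % number).encode('utf-8')


def call_app(app, environ=None):
    """Tiny stand-in for a web server, only meant for trying the app out."""
    seen = {}

    def start_response(status, response_headers):
        seen['status'] = status
        seen['headers'] = response_headers

    # Joining the iterable drives the generator to completion
    body = b''.join(app(environ or {}, start_response))
    return seen['status'], seen['headers'], body
```

<p>Note that with a generator, nothing in the function body runs until the server starts iterating, so <code>start_response</code> is only called at that point.</p>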

<p>The real “business logic” is in the <code>getfeed</code> function, which first tries to
load a cache, to avoid abusing the resources of the target server. I chose JSON
as it's included in the standard Python libraries, and easy to debug.</p>

<pre><code>try:
  with open(CACHE, 'rb') as f:
    cached = json.load(f)
  etag = cached['etag']
except:
  etag = ''
</code></pre>

<p>Next, I load the original feed, using the cached ETag value to encourage
<a href="http://ruturajv.wordpress.com/2005/12/27/conditional-get-request/">conditional HTTP GET</a>. The <code>urllib2.urlopen</code> function can operate on a
<code>Request</code> object, which takes a third parameter that can be used to add HTTP
headers. If the server responds with an <code>HTTP 304 Not Modified</code>, <code>urlopen</code>
raises an <code>HTTPError</code>, and the script knows that the cache can be used.</p>

<pre><code>try:
  feedfp = urlopen(Request('http://HOSTNAME/feed/',
      None, {'If-None-Match': etag}))
except HTTPError as e:
  if e.code != 304:
    raise
  return cached['content'].encode('utf-8')
</code></pre>
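<p>On Python 3 the same calls live in <code>urllib.request</code> and <code>urllib.error</code>. The sketch below wraps the conditional request in a function; the <code>fetch_if_changed</code> name and its <code>opener</code> parameter are my own additions, the latter only there so that the logic can be exercised without touching the network.</p>

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_if_changed(url, etag, opener=urlopen):
    """Return (new_etag, body), or None if the server answered with 304."""
    request = Request(url, None, {'If-None-Match': etag})
    try:
        response = opener(request)
    except HTTPError as error:
        if error.code != 304:
            raise
        # Not modified: the caller should fall back to its cache
        return None
    return response.headers.get('ETag', ''), response.read()
```

<p>Returning <code>None</code> for the unchanged case keeps the cache handling in the caller, which is a design choice of this sketch rather than something the gateway does.</p>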

<p>I used <a href="http://lxml.de/">lxml</a> to handle the contents, as it's a really convenient and fast
library for XML/HTML parsing and manipulation. I compiled the <a href="http://www.w3schools.com/xpath/">XPath</a>
queries used for every item at the top of the module for performance reasons.</p>

<pre><code class="python">GUID = etree.XPath('guid/text()')
IFRAME = etree.XPath('iframe')
DESC = etree.XPath('description')
</code></pre>

<p>To avoid unnecessary copying, lxml's etree can parse the object returned by
<code>urlopen</code> directly, and returns an object, which behaves like a DOM on steroids.
The <code>GUID</code> XPath extracts the URL of the current feed item, and the HTML parser
of lxml fetches and parses the page behind it. The actual contents of the post are helpfully put in a
<code>div</code> with the class <code>post-content</code>, so I took advantage of lxml's HTML helper
functions to get the <code>div</code> I needed.</p>

<p>While I was there, I also removed the first <code>iframe</code> from the post, which
contains the Facebook <del>tracker bug</del> Like button. Finally, I
cleared the <code>class</code> attribute of the <code>div</code> element, and serialized its contents
to HTML to replace the useless description of the feed item.</p>

<pre><code>feed = etree.parse(feedfp)
for entry in feed.xpath('/rss/channel/item'):
  ehtml = html.parse(GUID(entry)[0]).getroot()
  div = ehtml.find_class('post-content')[0]
  div.remove(IFRAME(div)[0])
  div.set('class', '')
  DESC(entry)[0].text = etree.CDATA(etree.tostring(div, method="html"))
</code></pre>

<p>There are two things left. First, the URL that points to the feed itself needs
to be modified to produce a valid feed, and the result needs to be serialized
into a string.</p>

<pre><code class="python">link = feed.xpath('/rss/channel/a:link',
  namespaces={'a': 'http://www.w3.org/2005/Atom'})[0]
link.set('href', 'http://URL_OF_FEED_GATEWAY/')
retval = etree.tostring(feed)
</code></pre>

<p>The second and final step is to save the ETag we got from the HTTP response and
the transformed content to the cache in order to minimize the amount of
resources (ab)used.</p>

<pre><code class="python">with open(CACHE, 'wb') as f:
  json.dump(dict(etag=feedfp.info()['ETag'], content=retval), f)
return retval
</code></pre>

<p>You might say that it's not fully optimized, the design is monolithic, and so
on – but it was done in less than 30 minutes, and it's been working perfectly
ever since. It's a typical quick-and-dirty hack, and although it contains no
technical breakthrough, I learned a few things, and I hope someone else might
too by reading it. Happy hacking!</p>
]]></content>
	</entry>
	<entry>
		<title>Secure web services with Python part 1 - UserNameToken</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Secure_web_services_with_Python_part_1_-_UserNameToken.html"/>
		<updated>2011-10-20T14:23:08+02:00</updated>
      <id>http://techblog.vsza.hu/posts/Secure_web_services_with_Python_part_1_-_UserNameToken.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>In 2004 <a href="http://www.oasis-open.org/">OASIS</a> created <a href="https://en.wikipedia.org/wiki/WS-Security">WS-Security</a>, which describes several techniques
for securing web services. The simplest is <a href="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0.pdf">UsernameToken</a> (PDF warning),
which can be thought of as the equivalent of HTTP authentication in the SOAP
world – the client supplies a username and a password, and the latter can be
transmitted either in cleartext or in a digested form.</p>

<p>The digest algorithm is quite simple (<code>Base64(SHA-1(nonce + created + password))</code>)
and by using a nonce, this protocol can prevent replay attacks, while a timestamp
can reduce the memory requirements since nonces can expire after a specified
amount of time. A sample envelope can be seen below; I removed the longish URLs
for the sake of readability, but these can be found in the PDF linked in the
previous paragraph. If you're into security, you can try to guess the password
based on the username, and then try to verify the digest based on that. ;)</p>

<pre><code>&lt;soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"&gt;
 &lt;soap:Header&gt;
  &lt;wsse:Security xmlns:wsse="..." xmlns:wsu="..." soap:mustUnderstand="1"&gt;
   &lt;wsse:UsernameToken wsu:Id="UsernameToken-3"&gt;
    &lt;wsse:Username&gt;admin&lt;/wsse:Username&gt;
    &lt;wsse:Password Type="...#PasswordDigest"&gt;fTI7fNcwD69Z3dOT1bYfvSbQPb8=&lt;/wsse:Password&gt;
    &lt;wsse:Nonce EncodingType="...#Base64Binary"&gt;1DLfpq3fLJ5O8Dlrnr4blQ==&lt;/wsse:Nonce&gt;
    &lt;wsu:Created&gt;2011-05-05T17:20:22.319Z&lt;/wsu:Created&gt;
   &lt;/wsse:UsernameToken&gt;
  &lt;/wsse:Security&gt;
 &lt;/soap:Header&gt;
 &lt;soap:Body&gt;
  ...
 &lt;/soap:Body&gt;
&lt;/soap:Envelope&gt;
</code></pre>
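<p>The digest formula translates to Python quite directly. This is a minimal sketch that assumes the nonce is base64-decoded before hashing while the creation time and the password are hashed as UTF-8 text; <code>password_digest</code> is an illustrative name and not a function of SUDS or sec-wall.</p>

```python
import base64
import hashlib


def password_digest(nonce_b64, created, password):
    """Base64(SHA-1(nonce + created + password)) for a UsernameToken."""
    nonce = base64.b64decode(nonce_b64)  # raw nonce bytes (assumption)
    material = nonce + created.encode('utf-8') + password.encode('utf-8')
    return base64.b64encode(hashlib.sha1(material).digest()).decode('ascii')
```

<p>Feeding it the nonce and creation time from the envelope above together with a password guess gives a value that can be compared with the <code>Password</code> element.</p>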

<p>Python had little support for UsernameToken: <a href="https://fedorahosted.org/suds/">SUDS</a>, the preferred web
service client, mentioned cleartext support in <a href="https://fedorahosted.org/suds/wiki/Documentation#WS-SECURITY">their documentation</a>, so
I set up a simple service using <a href="http://cxf.apache.org/">Apache CXF</a> and tried to access it. As it
turned out, the implementation violated the OASIS standard by not specifying
the <code>Type</code> attribute of the <code>Password</code> element, which would've indicated
whether the password was transmitted in cleartext or in a digested form.</p>

<p>It was a trivial fix, and while I was there, I also <a href="https://github.com/dnet/suds/commit/eb977c620be53caf71a7c3c74c960f7ccaacecae">added standards-compliant
digest support</a>, and tested it with Apache CXF. I <a href="https://lists.fedoraproject.org/pipermail/suds/2011-May/001447.html">sent a patch</a> to the
SUDS mailing list in May 2011, but have received no response since, so I have no
information on if or when this improvement will get into mainline SUDS.</p>

<p>On the server side, things got trickier. The preferred Python web service
implementation is <a href="https://github.com/soaplib/soaplib">soaplib</a>/<a href="https://github.com/arskom/rpclib">rpclib</a>, and I did some research into whether
it's possible to implement UsernameToken support in it. It turned out that
there's a project called <a href="http://sec-wall.gefira.pl/">sec-wall</a> which takes this to a whole new level
by creating a (reverse) proxy, so security can be detached from
the service into another layer, which also satisfies the UNIX part of my mind.</p>

<p><img src="http://techblog.vsza.hu/images/sec-wall-overview.png" alt="Overview of sec-wall" title="" /></p>

<p>I started hacking on sec-wall, first with some <a href="http://bazaar.launchpad.net/~sec-wall-dev/sec-wall/trunk/revision/190">code cleanup</a>, then I
managed to fix up the codebase so that all tests passed on Python 2.7, too.
After getting myself familiar with the project, I created an environment
with Soaplib as a server, sec-wall as the proxy, SUDS as a client, and
tried both UsernameToken configurations. It worked pretty well, with minor
glitches, such as sec-wall expecting a nonce and creation time even when
a cleartext password was used. I helped the developer, <a href="http://www.gefira.pl/blog/">Dariusz Suchojad</a>,
<a href="http://bazaar.launchpad.net/~sec-wall-dev/sec-wall/trunk/revision/192">fix the problem</a>, so in the end, I could create a pure Python
solution utilizing UsernameToken to secure web services.</p>

<p>That previous sentence could be a great one to end this post with, so this
paragraph is kind of an extra for those who kept on reading. The current
<a href="http://bazaar.launchpad.net/~sec-wall-dev/sec-wall/trunk/view/head:/code/src/secwall/wsse.py">WSSE implementation in sec-wall</a> “lets all nonces in”, so I created a
class that overrode this implementation using <a href="http://memcached.org/">memcached</a>. There are two
Python clients for it, so I developed and tested both. Below is the code for
<a href="http://www.tummy.com/Community/software/python-memcached/">python-memcached</a>, which is pure Python, whereas <a href="http://sendapatch.se/projects/pylibmc/">pylibmc</a> uses
native code, but mimics the interface of the former, so only the second line needs
to be changed to switch between the implementations.</p>

<pre><code>from secwall.wsse import WSSE
from memcache import Client

class WSSEmc(WSSE):
  # Template for the memcached key under which a nonce is stored
  keyfmt = 'WSSEmc_nonce_{0}'

  def __init__(self):
    self.mc = Client(['127.0.0.1:11211'], debug=0)

  def check_nonce(self, wsse_nonce, now, nonce_freshness_time):
    if not nonce_freshness_time:
      return False
    key = self.keyfmt.format(wsse_nonce)
    # The nonce is still stored, so it has been seen recently
    if self.mc.get(key):
      return True
    # Remember the nonce until the freshness time runs out
    self.mc.set(key, '1', time=nonce_freshness_time)
    return False
</code></pre>
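
<p>To try the same replay check without a running memcached or sec-wall, the logic
can be sketched with an in-memory dictionary. The class <code>NonceCache</code> and
its <code>clock</code> parameter are hypothetical names for illustration, not part of
sec-wall. The return value follows the class above, where <code>True</code> means
the nonce has already been seen.</p>

```python
import time


class NonceCache(object):
    """In-memory stand-in for the memcached-backed nonce store (a sketch)."""

    def __init__(self, clock=time.time):
        # The clock is injectable so that expiry can be tested without waiting
        self.clock = clock
        # Maps each nonce to the timestamp at which it expires
        self.expires = {}

    def check_nonce(self, nonce, freshness_time):
        """Return True if the nonce was already seen and has not expired."""
        if not freshness_time:
            return False
        now = self.clock()
        # Drop expired nonces so memory use is bounded by the time window
        self.expires = dict(
            (n, e) for n, e in self.expires.items() if e > now)
        if nonce in self.expires:
            return True
        self.expires[nonce] = now + freshness_time
        return False


if __name__ == "__main__":
    cache = NonceCache()
    print(cache.check_nonce("1DLfpq3fLJ5O8Dlrnr4blQ==", 60))  # first use
    print(cache.check_nonce("1DLfpq3fLJ5O8Dlrnr4blQ==", 60))  # replay
```

<p>Unlike memcached, this dictionary lives in a single process, so it only
illustrates the idea and would not work behind several proxy workers.</p>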

<p>I hope to publish at least a second part on this subject, focusing on digital
signatures, in the next two months (it's part of my <a href="https://diplomaterv.vik.bme.hu/Theses/Python-nyelvu-web-szolgaltatasok-kibovitese">Master's thesis</a>,
which is due December 9, 2011).</p>
]]></content>
	</entry>
	<entry>
		<title>Optimizing Django ORM in f33dme</title>
		<link rel="alternate" type="text/html" href="http://techblog.vsza.hu/posts/Optimizing_Django_ORM_in_f33dme.html"/>
		<updated>2011-10-19T12:47:18+02:00</updated>
      <id>http://techblog.vsza.hu/posts/Optimizing_Django_ORM_in_f33dme.html</id>
      <author><name>dnet</name></author>
		<category term="POSTCATEGORY" scheme="http://www.sixapart.com/ns/types#category"/>
		<content type="html" xml:lang="en" xml:base="http://techblog.vsza.hu"><![CDATA[
      <p>As I was hacking around with <a href="https://github.com/dcramer/django-oursql">django-oursql</a> vs <a href="https://github.com/asciimoo/f33dme">f33dme</a>, I started sniffing
the network traffic between the Python process and the MySQL server to follow up
on <a href="https://bugs.launchpad.net/oursql/+bug/838120">a bug</a> in <a href="https://launchpad.net/oursql">oursql</a>. I found that the following queries (yes, plural!)
ran every time a feed item was marked as read.</p>

<pre><code>SELECT &#96;f33dme_item&#96;.&#96;id&#96;, &#96;f33dme_item&#96;.&#96;title&#96;, &#96;f33dme_item&#96;.&#96;content&#96;,
  &#96;f33dme_item&#96;.&#96;url&#96;, &#96;f33dme_item&#96;.&#96;date&#96;, &#96;f33dme_item&#96;.&#96;added&#96;,
  &#96;f33dme_item&#96;.&#96;feed_id&#96;, &#96;f33dme_item&#96;.&#96;score&#96;, &#96;f33dme_item&#96;.&#96;archived&#96;
FROM &#96;f33dme_item&#96; WHERE &#96;f33dme_item&#96;.&#96;id&#96; = ?

SELECT (1) AS &#96;a&#96; FROM &#96;f33dme_item&#96; WHERE &#96;f33dme_item&#96;.&#96;id&#96; = ?  LIMIT 1

UPDATE &#96;f33dme_item&#96; SET &#96;title&#96; = ?, &#96;content&#96; = ?, &#96;url&#96; = ?, &#96;date&#96; = ?,
  &#96;added&#96; = ?, &#96;feed_id&#96; = ?, &#96;score&#96; = ?, &#96;archived&#96; = ?
WHERE &#96;f33dme_item&#96;.&#96;id&#96; = ?
</code></pre>

<p>The above queries not only multiply the round-trip overhead by three, but the
first and the last ones generate quite a bit of traffic by sending the content
of all the fields (including the <code>content</code> which might contain a full-blown blog
post like this) to and from the ORM, respectively. The innocent-looking lines
of code that generated them were the following ones.</p>

<pre><code>item = Item.objects.get(id=item_id)
if not item:
  return HttpResponse('No item found')
item.archived = True
item.save()
</code></pre>

<p>Looking at the queries above first, it's pretty clear that the <code>get</code> method
needs to query all the columns, since later code might access any of the fields.
The same can be said about <code>save</code>, which knows nothing about the contents
of the database – it even has to check if a row with the ID specified
exists to figure out whether to use an <code>INSERT</code> or an <code>UPDATE</code> DML query.</p>

<p>Of course, the developers of the Django ORM met this situation as well, and
even documented it along with nice examples in the <a href="https://docs.djangoproject.com/en/dev/ref/models/querysets/#update">QuerySet API reference</a>.
All I needed was to adapt the code a little bit, and as the documentation states,
the <code>update</code> method even “returns the number of affected rows”, so the check for
the existence of the item can be preserved. The improved code is the following.</p>

<pre><code>if Item.objects.filter(id=item_id).update(archived=True) != 1:
  return HttpResponse('No item found')
</code></pre>

<p>The first line is a bit longish (although still only 62 characters), but it
replaced four lines of the original code. When read carefully, one might even
find it readable, and it produces only the following SQL query in the background.</p>

<pre><code>UPDATE &#96;f33dme_item&#96; SET &#96;archived&#96; = ? WHERE &#96;f33dme_item&#96;.&#96;id&#96; = ?
</code></pre>
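
<p>The same idea can be tried outside Django with the <code>sqlite3</code> module of
the standard library, where <code>cursor.rowcount</code> plays the role of the return
value of <code>update</code>. This is only a sketch: SQLite stands in for MySQL, the
table has just two of the columns, and the function name <code>archive_item</code>
is made up for the example.</p>

```python
import sqlite3


def archive_item(conn, item_id):
    """Mark one item as archived; return True if such a row existed."""
    cur = conn.execute(
        "UPDATE f33dme_item SET archived = 1 WHERE id = ?", (item_id,))
    conn.commit()
    # rowcount is the number of rows the UPDATE statement modified
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE f33dme_item (id INTEGER PRIMARY KEY, archived INTEGER)")
    conn.execute("INSERT INTO f33dme_item (id, archived) VALUES (1, 0)")
    print(archive_item(conn, 1))  # the row with id 1 exists
    print(archive_item(conn, 2))  # there is no row with id 2
```

<p>As with the Django version, a single statement both performs the change and
tells the caller whether the item existed.</p>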
]]></content>
	</entry>
</feed>
