VSzA techblog

Using MS Word templates with LaTeX quickly

2012-09-12

After a successful penetration test, I wanted to publish a detailed writeup about it, but the template we use at the company – which includes a logo and some text in the footer – was created using Microsoft Word, and I prefer LaTeX for typesetting. It would have been possible to recreate the template from scratch, but I preferred to do it quick and, as it turned out, not so dirty.

First, I saved a document written using the template from Word to PDF, opened it up in Inkscape and removed the body (i.e. everything except the header and the footer). Depending on the internals of the PDF export mechanism, it might be necessary to use ungroup one or more times to avoid removing more than needed. After this simple editing, I saved the result as another PDF, called s2bg.pdf.

Next, I created a file named s2.sty with the following lines.

\ProvidesPackage{s2}

\RequirePackage[top=4cm, bottom=2.8cm, left=2.5cm, right=2.5cm]{geometry}
\RequirePackage{wallpaper}
\CenterWallPaper{1}{s2bg.pdf}

The first line sets the package name, while the next three adjust the margins (which I calculated from the ones set in Word plus some trial and error) and put the PDF saved from Inkscape in the background of every page. The wallpaper package is available in the texlive-latex-extra package on Debian systems.

As our company uses a specific shade of orange as a primary color, I also changed the \section command to use this color for section headings.

\RequirePackage{color}
\definecolor{s2col}{RGB}{240, 56, 31}

\makeatletter
\renewcommand{\section}{\@startsection{section}{1}{0mm}
{\baselineskip}%
{\baselineskip}{\normalfont\Large\sffamily\bfseries\color{s2col}}}%
\makeatother

Creating a package comes with the advantage that only a single line needs to be added to a document to use all the formatting described above, just like with CSS. The following two documents differ only in that the one on the right has an extra \usepackage{s2} line in the preamble.

Same document without and with style package

Two documents published with this technique (although written in Hungarian) can be downloaded: the aforementioned writeup about client-side attacks and another one about things we did in 2011.


Installing CAcert on Android without root

2012-08-16

I've been using CAcert for securing some of my services with TLS/SSL, and when I got my Android phone, I chose K-9 Mail over the stock e-mail client because, as the certificate installation page on the official CAcert site stated, installing the CA certificate into the system certificate store required root access. Now, one year and two upgrades (ICS, JB) later, I revisited the issue.

As of this writing, the CAcert site contains another method that also requires root access, but as Jethro Carr wrote in his blog, since at least ICS, it's possible to install certificates without any witchcraft, using not only PKCS12 but also PEM files. Since Debian ships the CAcert bundle, I used that file, but it's also possible to download the files from the official CAcert root certificate download page. Since I have Android SDK installed, I used adb (Android Debug Bridge) to copy the certificate to the SD card, but any other method (browser, FTP, e-mail, etc.) works too.

$ adb push /usr/share/ca-certificates/cacert.org/cacert.org.crt /sdcard/
2 KB/s (5179 bytes in 1.748s)

On the phone, I opened Settings > Security, scrolled to the bottom, and selected Install from storage. It prompted for a name for the certificate and installed it in a second without any further questions asked.

Installing the CAcert root certificate on Android

After this, the certificate can be viewed by opening Trusted credentials and selecting the User tab, and browsing an HTTPS site with a CAcert-signed certificate becomes just as painless and secure as with any other built-in CA.

Using CAcert root certificate on Android


Extending Wireshark MySQL dissector

2012-06-10

I consider Wireshark one of the most successful FLOSS projects, and while it outperforms many tools in the field of packet capture (USB sniffing is one of my favorite examples), it is certainly the best tool I know for packet analysis, supported by its many protocol-specific plugins called dissectors. These fast little C/C++ libraries do one thing and do it well: extracting all available information from packets of a certain protocol in a robust way.

For every network-related problem, Wireshark is one of the top three tools I use, since the analysis of network traffic provides invaluable insight. This was also the case while I was experimenting with oursql, which provides Python bindings for libmysqlclient, the native MySQL client library, as an alternative to MySQLdb. While the latter emulates parameter interpolation on the client side (as JDBC does), the former uses the server-side prepared statements available since at least MySQL 4.0 (released in August 2002). The problem with Wireshark was that although it had dissector support for MySQL packets, dissection of COM_EXECUTE packets resulted in the following message.

Wireshark dissecting a MySQL execute packet before r39483
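As an aside, the difference between the two libraries shows up directly in client code. Here's a rough sketch (not from the original post; the host, credentials and table names are made up) of how the same query is issued with both – MySQLdb splices the parameter into the SQL string on the client, while oursql prepares the statement on the server and sends the parameter in a COM_EXECUTE packet.

import MySQLdb
import oursql

# MySQLdb: the parameter is escaped and interpolated into the query
# string on the client side, so the server only sees a plain COM_QUERY.
conn1 = MySQLdb.connect(host='db.example', user='test', passwd='test', db='test')
cur1 = conn1.cursor()
cur1.execute('SELECT name FROM users WHERE id = %s', (42,))

# oursql: the statement is prepared on the server, and the parameter
# travels separately in the binary protocol (COM_EXECUTE).
conn2 = oursql.connect(host='db.example', user='test', passwd='test', db='test')
cur2 = conn2.cursor()
cur2.execute('SELECT name FROM users WHERE id = ?', (42,))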

As the documentation of the MySQL ClientServer protocol states, COM_EXECUTE packets depend on information exchanged upon preparing the statement, which means that the dissector needed to be transformed to be stateful. It seemed that the original author of the MySQL dissector started working on the problem but then decided to leave it in an incomplete state.

To become familiar with the development of Wireshark, I started with the flags field of the COM_EXECUTE packet. The documentation linked above states that although the field was reserved for future use in MySQL 4.0, it carries a meaningful value in later versions. However, the dissector always decoded the field according to MySQL 4.0, regardless of the version actually used.

At first, I just added a new decoder and submitted a patch in the Wireshark Bug Database that decoded the four possible values according to the protocol documentation. Bill Meier took a look at it, and asked whether I could change the behaviour to treat version 4 and 5 differently, offering pointers with regard to storing per-connection data, effectively making the dissector stateful. I improved the patch and it finally made it into SVN.

Knowing this, I started implementing the dissection of the remaining fields, namely null_bit_map, new_parameter_bound_flag, type and values. The difficulty was that the length and/or offset of these fields depended on the number of placeholders sent when the statement was prepared, and also on the number and index of parameters that were bound by binary streaming (COM_LONG_DATA packets). Since the mysql_conn_data struct of per-connection data already had a GHashTable member named stmts – initialized upon the start of dissection and destroyed upon MYSQL_STMT_CLOSE and MYSQL_QUIT – and a my_stmt_data struct was already declared with an nparam member, I just filled in the gaps by storing the number of parameters for every packet received in response to a COM_PREPARE.

diff --git a/packet-mysql.c b/packet-mysql.c
index 3adc116..9c71409 100644
--- a/packet-mysql.c
+++ b/packet-mysql.c
@@ -1546,15 +1546,22 @@ mysql_dissect_row_packet(tvbuff_t *tvb, int offset, proto_tree *tree)
 static int
 mysql_dissect_response_prepare(tvbuff_t *tvb, int offset, proto_tree *tree, mysql_conn_data_t *conn_data)
 {
+       my_stmt_data_t *stmt_data;
+       gint stmt_id;
+
        /* 0, marker for OK packet */
        offset += 1;
        proto_tree_add_item(tree, hf_mysql_stmt_id, tvb, offset, 4, ENC_LITTLE_ENDIAN);
+       stmt_id = tvb_get_letohl(tvb, offset);
        offset += 4;
        proto_tree_add_item(tree, hf_mysql_num_fields, tvb, offset, 2, ENC_LITTLE_ENDIAN);
        conn_data->stmt_num_fields = tvb_get_letohs(tvb, offset);
        offset += 2;
        proto_tree_add_item(tree, hf_mysql_num_params, tvb, offset, 2, ENC_LITTLE_ENDIAN);
        conn_data->stmt_num_params = tvb_get_letohs(tvb, offset);
+       stmt_data = se_alloc(sizeof(struct my_stmt_data));
+       stmt_data->nparam = conn_data->stmt_num_params;
+       g_hash_table_replace(conn_data->stmts, &stmt_id, stmt_data);
        offset += 2;
        /* Filler */
        offset += 1;

Later, I figured out that using the GUI results in packets of the captured buffers being dissected in a nondeterministic order and count, which could lead to problems if the MYSQL_STMT_CLOSE packet was dissected before the COM_PREPARE, so I removed the destruction of the hashtable.

Having the hashtable filled with the number of parameters for each statement, I could make the dissector ignore the null_bit_map and decode the new_parameter_bound_flag. The former is unnecessary since NULL values are present as NULL-typed parameters, while the latter helps decide whether the packet should contain values or not. If parameters are expected, the ones that are streamed are obviously not present in the COM_EXECUTE packet, which required the following modifications to be made (a rough sketch of this bookkeeping follows the list).

  • The my_stmt_data struct was extended with a guint8* param_flags member.
  • Upon receiving a prepare response (and allocating the my_stmt_data struct), an array of 8-bit unsigned integers (guint8) was allocated and all bytes were set to 0.
  • During the dissection of a MYSQL_STMT_SEND_LONG_DATA packet, a specific bit of the matching byte in the param_flags array was set.
  • The dissector logic of COM_EXECUTE ignored packets that had the corresponding bit set.
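To make the bookkeeping easier to follow, here's a rough Python sketch of what the per-statement parameter flags do (the names and the bit value are illustrative only, not the actual Wireshark code, which does this in C):

nparam = 3                        # taken from the prepare response
param_flags = bytearray(nparam)   # one byte per placeholder, all zero

def long_data_seen(param_id):
    # called when a MYSQL_STMT_SEND_LONG_DATA packet references param_id
    param_flags[param_id] |= 0x01

def skip_in_execute(param_id):
    # COM_EXECUTE dissection skips values for streamed parameters
    return bool(param_flags[param_id] & 0x01)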

All that was left was the dissection of the actual values, which was less documented, as most software depends on the official MySQL code for client functionality. Because of this (and for testing purposes, too) I tried several programming languages and MySQL client libraries and captured the network traffic they generated. Among these were

  • PHP (mysqli with libmysqlclient),
  • Python (oursql with libmysqlclient),
  • C (native libmysqlclient) and
  • Java (JDBC with MySQL Connector/J).

One of the things that wasn't mentioned anywhere was the way strings were encoded. First, I sent foobar in itself, and the bytes that reached the wire were "\x06foobar", which meant that the length of the string was encoded in the first byte. Next, I sent 300 characters and saw that the first byte was 0xfc, followed by the length of the string in two bytes. Finally, I sent 66000 characters and got another magic byte, 0xfd, followed by the length of the string in three bytes. (Luckily, Wireshark has 24-bit little-endian integer reading functionality built in.) Another surprise was that several field type codes were encoded in the same way (a rough sketch of the length decoding follows the list below):

  • 0xf6 (NEWDECIMAL), 0xfc (BLOB), 0xfd (VAR_STRING) and 0xfe (STRING) are all strings encoded in the manner described above.
  • 0x07 (TIMESTAMP), 0x0a (DATE) and 0x0c (DATETIME) are all timestamps consisting of a calendar date and an optional time part.
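The length decoding itself boils down to a few cases. Here's a rough Python sketch of it, covering only the prefixes observed above; the names are made up, and the real dissector does this in C with Wireshark's tvb accessors.

import struct

def read_length_encoded_string(buf, offset):
    first = ord(buf[offset])              # buf is a byte string of the payload
    if first < 0xfc:                      # short string: 1-byte length
        length, offset = first, offset + 1
    elif first == 0xfc:                   # 0xfc: 2-byte little-endian length
        length = struct.unpack_from('<H', buf, offset + 1)[0]
        offset += 3
    else:                                 # 0xfd: 3-byte little-endian length
        low, high = struct.unpack_from('<HB', buf, offset + 1)
        length = low | (high << 16)
        offset += 4
    return buf[offset:offset + length], offset + length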

Finally, after many test captures, I managed to decode 15 different types with 10 different dissector algorithms. A const array of structs was created to keep the type-dissector mapping simple and extensible.

typedef struct mysql_exec_dissector {
    guint8 type;
    guint8 unsigned_flag;
    void (*dissector)(tvbuff_t *tvb, int *param_offset, proto_item *field_tree);
} mysql_exec_dissector_t;

static const mysql_exec_dissector_t mysql_exec_dissectors[] = {
    { 0x01, 0, mysql_dissect_exec_tiny },
    { 0x02, 0, mysql_dissect_exec_short },
    { 0x03, 0, mysql_dissect_exec_long },
    { 0x04, 0, mysql_dissect_exec_float },
    { 0x05, 0, mysql_dissect_exec_double },
    { 0x06, 0, mysql_dissect_exec_null },
    ...
    { 0xfe, 0, mysql_dissect_exec_string },
    { 0x00, 0, NULL },
};

I submitted the resulting patch in the Wireshark Bug Database, and Jeff Morriss pointed out the obvious memory leak caused by the removal of the hashtable destruction. He was very constructive and told me about se_tree structures, which provide the subset of GHashTable functionality I needed but are managed by the Wireshark memory manager, so no destruction was needed. After I modified my patch accordingly, it got accepted and finally made it into SVN. Here's an example screenshot of how a successfully dissected MySQL COM_EXECUTE packet looks in the new version.

Wireshark dissecting a MySQL execute packet after r39483

I hope you found this little story interesting, and maybe learned a thing or two about the way Wireshark and/or MySQL works. The task I accomplished required little previous knowledge of either of them, so all I can recommend is that if you see an incomplete or missing dissector in Wireshark, check out the source code and start experimenting. Happy hacking!


DEF CON 20 CTF urandom 300 writeup

2012-06-04

As a proud member of the Hungarian team called “senkihaziak”, I managed to solve the following challenge for 300 points in the /urandom category at the 20th DEF CON Capture The Flag contest. The description consisted of an IP address, a port number, a password, and a hint.

Description of the challenge

Connecting with netcat to the specified IP address and port using TCP resulted in a password prompt being printed. Sending the password followed by a newline triggered the server to send back what seemed to be an awful lot of garbage, screwing up the terminal settings. A closer inspection using Wireshark revealed the actual challenge.

Network traffic after connecting and sending the password

Brief analysis of the network traffic dump confirmed that after about 500 bytes of text, exactly 200000 bytes of binary data was sent by the service, which equals 100000 unsigned 16-bit numbers (uint16_t). As the text said, both the time and the number of moves available to sort the array in-place were limited, and while I knew that quicksort is unbeatable in the former (it took 35 ms to sort the list on my 2.30GHz Core i3), I knew little to nothing about which algorithms support in-place sorting, and of those, which one requires the least number of exchanges.

KT came up with the idea of building a 64k long array to store the number of occurrences of each value, filling it during reading, and iterating over it to produce the sorted array. While this was a working concept in itself, it didn't give me what the challenge wanted – pairs of indices to exchange in order to reach the sorted state. To overcome this, I improved his idea by storing the position(s) at which each value occurs in a linked list instead of just the number of occurrences. For easier understanding, here's an example of what this array of linked lists would look like for an array of 7 numbers.

Example of an array of linked lists
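To show the idea more compactly than the C code below, here's a rough Python sketch of the same algorithm on a toy array (illustration only; the printed pos:index pairs mirror the exchange format expected by the server):

nums = [3, 1, 3, 0, 2, 1, 0]
MAXNUM = 4                                  # 65536 in the real solution
positions = [[] for _ in range(MAXNUM)]     # value -> list of indices
for i, n in enumerate(nums):
    positions[n].append(i)

j = 0                                       # length of the sorted prefix
for n in range(MAXNUM):
    for k in range(len(positions[n])):
        pos = positions[n][k]
        if pos != j:
            print('%d:%d' % (pos, j))       # exchange sent to the server
            displaced = nums[j]
            nums[j], nums[pos] = n, displaced
            # the displaced value now lives at pos, so update its entry
            positions[displaced][positions[displaced].index(j)] = pos
        j += 1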

Since performance mattered, I chose C and began with establishing the TCP connection and handling the login.

#define PW "d0d2ac189db36e15\n"
#define BUFLEN 4096
#define PWPROMPTLEN 10

int main() {
  int i, sockfd;
  struct sockaddr_in serv_addr;
  char buf[BUFLEN];

  sockfd = socket(AF_INET, SOCK_STREAM, 0);
  memset(&serv_addr, 0, sizeof(serv_addr));

  serv_addr.sin_family = AF_INET;
  serv_addr.sin_port = htons(5601);
  inet_pton(AF_INET, "140.197.217.155", &serv_addr.sin_addr);

  connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));
  for (i = 0; i < PWPROMPTLEN; i += read(sockfd, buf, BUFLEN));
  write(sockfd, PW, strlen(PW));
  ...
}

After login, there were 504 bytes of text to ignore, and then the numbers were read into an array.

#define MSGLEN 504
#define NUMBERS 100000

uint16_t nums[NUMBERS];

for (i = 0; i < MSGLEN; i += read(sockfd, buf, MSGLEN - i));
for (i = 0; i < NUMBERS * sizeof(uint16_t);
  i += read(sockfd, ((char*)nums) + i, NUMBERS * sizeof(uint16_t) - i));

After the numbers were available in a local array, the array of linked lists was built. The end of a linked list was marked with a NULL pointer, so the array was initialized with zero bytes.

typedef struct numpos {
  int pos;
  struct numpos *next;
} numpos_t;

#define MAXNUM 65536 /* one position list per possible uint16_t value */

numpos_t *positions[MAXNUM];

memset(positions, 0, MAXNUM * sizeof(void*));
for (i = 0; i < NUMBERS; i++) {
  numpos_t *newpos = malloc(sizeof(numpos_t));
  newpos->next = positions[nums[i]];
  newpos->pos = i;
  positions[nums[i]] = newpos;
}

The heart of the program is the iteration over this array of lists. The outer loop goes over each number in ascending order, while the inner loop iterates over the linked lists. An auxiliary counter named j tracks the current index on the output array. Inside the loops the current number is exchanged with the one at the current index in the original array, and the positions array of linked lists is also changed to reflect the layout of the output array.

int n, j = 0, tmp;
numpos_t *cur, *cur2;

for (n = 0; n < MAXNUM; n++) {
  for (cur = positions[n]; cur != NULL; cur = cur->next) {
    if (cur->pos != j) {
      sprintf(buf, "%d:%d\n", cur->pos, j);
      write(sockfd, buf, strlen(buf));
      tmp = nums[j];
      nums[j] = n;
      for (cur2 = positions[tmp]; cur2 != NULL; cur2 = cur2->next) {
        if (cur2->pos == j) {
          cur2->pos = cur->pos;
          break;
        }
      }
      nums[cur->pos] = tmp;
    }
    j++;
  }
}

Finally, there's only one thing left to do: send an empty line and wait for the key to arrive.

write(sockfd, "\n", 1);
while (1) {
  if ((i = read(sockfd, buf, BUFLEN))) {
    buf[i] = '\0';
    printf("Got %d bytes of response: %s\n", i, buf);
  }
}

After an awful lot of local testing, the final version of the program worked perfectly the first time it was run on the actual server, and printed the following precious key.

Result of the successful run, displaying the key


DEF CON 20 CTF grab bag 300 writeup

2012-06-04

As a proud member of the Hungarian team called “senkihaziak”, I managed to solve the following challenge for 300 points in the grab bag category at the 20th DEF CON Capture The Flag contest. The description consisted of an IP address, a port number, a password, and a hint.

Description of the challenge

Connecting with netcat to the specified IP address and port using TCP and sending the password followed by a newline triggered the server to send back the actual challenge, utilizing ANSI escape sequences for colors.

Output of netcat after connecting and sending the password

As Buherátor pointed out, the matrices are part of a scheme designed to hide PIN codes in random matrices in which only the cardholder knows which digits are part of the PIN code. The service sent three matrices for which the PIN code was known, and the challenge was to find the PIN code for the fourth one. As we hoped, the positions of the digits within the matrices were the same for all four, so all we needed to do was to find a set of valid positions for each matrix, and apply their intersection to the fourth. I chose Python for the task, and began with connecting to the service.

PW = '5fd78efc6620f6\n'
TARGET = ('140.197.217.85', 10435)
PROMPT = 'Enter ATM PIN:'

def main():
  with closing(socket.socket()) as s:
    s.connect(TARGET)
    s.send(PW)
    buf = ''
    while PROMPT not in buf:
      buf += s.recv(4096)
    pin = buffer2pin(buf)
    s.send(pin + '\n')

The buffer2pin function parses the response of the service and returns the digits of the PIN code, separated by spaces. First, the ANSI escape sequences are stripped from the input buffer. Then, the remaining contents are split into an array of lines (buf.split('\n')), trailing and leading whitespace gets stripped (imap(str.strip, ...)), and finally, lines that don't contain a single digit surrounded by spaces get filtered out.

ESCAPE_RE = re.compile('\x1b\\[0;[0-9]+;[0-9]+m')
INTERESTING_RE = re.compile(' [0-9] ')

def buffer2pin(buf):
  buf = ESCAPE_RE.sub('', buf)
  buf = filter(INTERESTING_RE.search, imap(str.strip, buf.split('\n')))
  ...

By now, buf contains strings like '3 5 8 4 1 2' and 'User entered: 4 5 2 7', so it's time to build the sets of valid positions. The initial sets contain all valid positions, and later, these sets get updated with an intersection operation. For each example (a matrix with a valid PIN code) the script joins the six lines of the matrix and removes all spaces. This results in base holding the 36 digits as a string. Finally, the inner for loop iterates over the four digits in the last line of the current example (User entered: 4 5 2 7) and finds all their occurrences in the matrix. The resulting list of positions is intersected with the set of valid positions for the current digit (sets[n]). I know that using regular expressions for this purpose is a bit of an overkill, but it's the least evil of the available solutions.

EXAMPLES = 3
DIGITS = 4
INIT_RANGE = range(36)

def buffer2pin(buf):
  ...
  sets = [set(INIT_RANGE) for _ in xrange(DIGITS)]
  for i in xrange(EXAMPLES):
    base = ''.join(buf[i * 7:i * 7 + 6]).replace(' ', '')
    for n, i in enumerate(ifilter(str.isdigit, buf[i * 7 + 6])):
      sets[n].intersection_update(m.start() for m in re.finditer(i, base))
  ...

The only thing that remains is to transform the fourth matrix into a 36 chars long string like the other three, and pick the digits of the resulting PIN code using the sets, which – hopefully – only contain one element each by now.

def buffer2pin(buf):
  ...
  quest = ''.join(buf[3 * 7:3 * 7 + 6]).replace(' ', '')
  return ' '.join(quest[digit.pop()] for digit in sets)

The resulting script worked almost perfectly, but after the first run, we found out that after sending a correct PIN code, several more challenges were sent, so the whole logic had to be put in an outer loop. The final script can be found on Gist, and it produced the following output, resulting in 300 points.

Result of a successful run, displaying the key


Extracting DB schema migration from Redmine

2012-04-21

Although I consider keeping the SQL schema versioned a good habit, and several great solutions exist that automate the task of creating migration scripts to transform the schema of the database from version A to B, for most of my projects I find it sufficient to record a hand-crafted piece of SQL in the project/issue log. For the latter, I mostly use Redmine, which offers a nice REST-style API for the issue tracker. Since it returns XML, I chose XSL to do the necessary transformations to extract the SQL statements stored in the issue logs.

For purposes of configuration, I chose something already in the system: Git, my choice of SCM solution. One can store hierarchical key-value pairs in a systemwide, user- or repository-specific way, all transparently accessible through a simple command line interface. For purposes of bridging the gap between Git and the XML/XSL, I chose shell scripting and xsltproc since producing a working prototype is only a matter of minutes.

The end product is a shell script that takes the Git-style history expression from the command line and passes it directly to the git log command, which in turn parses it just like the user would assume. The output is formatted so that the only output is the first line of each commit message in the specified range of commits. If the command fails, the original error message is shown, so the script doesn't need to know anything about git commit range parsing or other internals.

GL=$(git log --pretty=format:"%s" --abbrev-commit )

if [ $? -ne 0 ]; then
  echo "Git error occured: $GL" 1>&2
  exit 1
fi

Since the HTML-formatted issue log messages are double-encoded in the API XML output, two rounds of XSL transformation need to be done. The first round extracts log entries probably containing SQL fragments, and with the output method set to text, it decodes the HTML entities embedded into the XML.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template match="/">
  <xsl:for-each
    select="issue/journals/journal/notes[contains(text(), 'sql')]">
   <xsl:value-of select="text()"/>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

The above XSL takes the output of the XML REST API and produces XHTML fragments for every issue log entry. The following part of the shell script extracts the issue numbers from the commit messages (egrep and sed), calls the issue API (curl) with each ID exactly once (sort -u), passes the output through the first XSL and concatenates the results along with an artificial/fake XML root in order to produce well-formed XML, ready for the second pass.

echo '<?xml version="1.0" encoding="utf-8"?><fakeroot>'
echo "$GL" | egrep -o '#[0-9]+' | sort -u | sed 's/#//' \
  | while read ISSUE; do
    curl --silent "$BASE/issues/$ISSUE.xml?key=$KEY&include=journals" \
      | xsltproc "$DIR/notes.xsl" -
  done
echo '</fakeroot>'

The second pass extracts code tags with the language set to sql, and the output method is again set to text, causing a second expansion of HTML entities. The output of this final XSL transformation is a concatenation of the SQL statements required to bring the database schema in sync with the commit range specified.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template match="/">
  <xsl:for-each select="fakeroot/pre/code[@class = 'sql']">
   <xsl:value-of select="normalize-space(text())"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

While this stylesheet follows almost the same logic as the first one, it's worth noting the usage of normalize-space() and a literal newline, which formats the output in a nice way – SQL fragments are separated from each other by a single newline, no matter if there's any trailing or leading whitespace present in the code. The code is available under MIT license on GitHub.


Unofficial Android app for alldatasheet.com

2012-04-17

In February 2012, I read the Hack a Day article about ElectroDroid, and the following remark triggered a “challenge accepted” in my mind.

A ‘killer app’ for electronic reference tools would be a front end for
alldatasheet.com that includes the ability to search, save, and display
the datasheet for any imaginable component.

First, I checked whether any applications like that existed on the smartphone application markets. I found several applications of high quality, but they were tied to certain chip vendors, such as Digi-Key and NXP. There's also one that claims to be an alldatasheet.com application – it even calls itself Datasheet (Alldatasheet.com) – but as one commenter writes

All this app does is open a web browser to their website.
Nothing more. A bookmark can suffice.

I looked around the alldatasheet.com website and found the search rather easy to automate. Although there's no API available, the HTML output can be easily parsed with the MIT-licensed jsoup library. First I tried to build a separate Java API for the site and a separate Android UI, with the former having no dependencies on the Android libraries. The API can be found in the hu.vsza.adsapi package, and as of version 1.0, it offers two classes. The Search class has a method called searchByPartName, which can be used to access the functionality of the left form on the website. Here's an example:

List<Part> parts = Search.searchByPartName("ATMEGA168", Search.Mode.MATCH);

for (Part part : parts) {
    doSomethingWithPart(part);
}

The Part class has one useful method called getPdfConnection, which returns a URLConnection instance that can be used to read the PDF datasheet of the electronic part described by the object. It spoofs the User-Agent HTTP header and sends the appropriate Referer values wherever necessary to go through the process of downloading the PDF. This can be used like this:

URLConnection pdfConn = selectedPart.getPdfConnection();
pdfConn.connect();
InputStream input = new BufferedInputStream(pdfConn.getInputStream());
OutputStream output = new FileOutputStream(fileName);

byte data[] = new byte[1024];
int count;
while ((count = input.read(data)) != -1) output.write(data, 0, count);

output.flush();
output.close();
input.close();

The Android application built around this API displays a so-called Spinner (similar to combo boxes on PCs) to select the search mode, a text input to enter the part name, and a button to initiate the search. Results are displayed in a list view showing the name and the description of each part. Touching a part downloads the PDF to the SD card and opens it with the default PDF reader (or prompts for selection if more than one is installed).

ADSdroid version 1.0 screenshots

You can download version 1.0 by clicking on the version number link or using the QR code below. It only does one thing (search by part name), and even that functionality is experimental, so I'm glad if anyone tries it and, in case of problems, contacts me by e-mail. The source code is available on GitHub, licensed under MIT.

ADSdroid version 1.0 QR code


Mounting Sympa shared directories with FUSE

2012-03-29

The database laboratory course at the Budapest University of Technology and Economics, which I'm involved in as a lecturer, uses Sympa for mailing lists and file sharing. The latter is not one of the most used features of this software, and the web interface feels sluggish, not to mention the leftover files piling up in my Downloads directory after each attempt to view one page of a certain file. I understood that using the same software for these two tasks made managing user accounts easier, so I tried to come up with a solution that makes it easier to handle these files with the existing setup.

First, I searched whether an API for Sympa exists, and found that while they created the Sympa SOAP server, it only handles common use-cases related to mailing list management, so it can be considered a dead end. This meant that my solution had to use the web interface, so I selected an old and a new tool for the task: LXML for parsing, since I already knew of its power, and requests for handling HTTP, because of its fame. These two tools made it possible to create half of the solution first, resulting in a Sympa API that can be used independently of the file system bridge.

Two things I found particularly great about requests were that its handling of sessions was superior to any API I've ever seen, and that it was possible to retrieve the results in multiple formats (raw socket, bytes, Unicode text). Since I only had one Sympa installation to test with, I only hacked the code as far as was needed to make it work, so for example, I had to use regular expressions to strip the XML and HTML encoding information, since both stated us-ascii while the output was in ISO-8859-2, correctly stated in the HTTP Content-type header.

In the second half of the time, I had to create a bridge between the file system and the API I created, and FUSE was my natural choice. Choosing the Python binding was not so easy: as a Debian user, the python-fuse package seemed like a logical choice, but as Matt Joiner wrote in his answer on a related Stack Overflow question, fusepy was a better choice. Using one of the examples, I managed to build an experimental version of SympaFS with naive caching and session management, but it works!

$ mkdir /tmp/sympa
$ python sympafs.py https://foo.tld/lists foo@bar.tld adatlabor /tmp/sympa
Password:
$ mount | fgrep sympa
SympaFS on /tmp/sympa type fuse (rw,nosuid,nodev,relatime,user_id=1000,
group_id=1000)
$ ls -l /tmp/sympa/2012
összesen 0
-r-xr-xr-x 1 root root  11776 febr   9 00:00 CensoredFile1.doc
-r-xr-xr-x 1 root root 161792 febr  22 00:00 CensoredFile2.xls
-r-xr-xr-x 1 root root  39424 febr   9 00:00 CensoredFile3.doc
dr-xr-xr-x 2 root root      0 febr  14 00:00 CensoredDir1
dr-xr-xr-x 2 root root      0 ápr    4  2011 CensoredDir2
$ file /tmp/sympa/2012/CensoredFile1.doc
Composite Document File V2 Document, Little Endian, Os: Windows, Version
5.1, Code page: 1252, Author: Censored, Last Saved By: User, Name of
Creating Application: Microsoft Excel, Last Printed: Tue Feb 14 15:00:39
2012, Create Time/Date: Wed Feb  8 21:51:10 2012, Last Saved Time/Date:
Wed Feb 22 08:10:20 2012, Security: 0
$ fusermount -u /tmp/sympa
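For reference, the overall shape of a fusepy-based filesystem is quite compact. The following is a minimal read-only sketch in the spirit of SympaFS, but with made-up, hard-coded contents instead of the Sympa API calls – an illustration of fusepy usage, not the actual code:

from errno import ENOENT
from stat import S_IFDIR, S_IFREG
import sys

from fuse import FUSE, FuseOSError, Operations

class HelloFS(Operations):
    files = {'/hello.txt': 'Hello from FUSE\n'}

    def getattr(self, path, fh=None):
        if path == '/':
            return dict(st_mode=S_IFDIR | 0o555, st_nlink=2)
        if path in self.files:
            return dict(st_mode=S_IFREG | 0o444, st_nlink=1,
                        st_size=len(self.files[path]))
        raise FuseOSError(ENOENT)

    def readdir(self, path, fh):
        return ['.', '..'] + [name[1:] for name in self.files]

    def read(self, path, size, offset, fh):
        return self.files[path][offset:offset + size]

if __name__ == '__main__':
    # usage: python hellofs.py /tmp/mountpoint
    FUSE(HelloFS(), sys.argv[1], foreground=True, ro=True)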

Tracking history of docx files with Git

2012-03-27

Just as with PHP, OOXML – and specifically docx – is not my favorite format, but when I do use it, I prefer tracking its history using my SCM of choice, Git. What makes Git perfect for tracking documents is not only the fact that setting up a repository takes one command and a few milliseconds, but also its ability to use an external program to transform artifacts (files) to text before displaying differences, which results in meaningful diffs.

The process of setting up an environment like this is described best in Chapter 7.2 of Pro Git. The solution I found best to convert docx files to plain text was docx2txt, especially since it's available as a Debian package in the official repositories, so it takes only an apt-get install docx2txt to have it installed on a Debian/Ubuntu box.

The only problem was that Git executes the text conversion program with the name of the input file given as the first and only argument, and docx2txt (in contrast with catdoc or antiword, which use the standard output) saves the text content of foo.docx in foo.txt. Because of this, I needed to create a wrapper in the form of the following small shell script.

#!/bin/sh
docx2txt <$1

That being done, the only thing left to do is configuring Git to use this wrapper for docx files by issuing the following commands in the root of the repository.

$ git config diff.docx.textconv /path/to/wrapper.sh
$ echo "*.docx diff=docx" >>.git/info/attributes

End-to-end secure REST service using CakePHP

2012-03-14

While PHP is not my favorite language and platform of choice, I have to admit its ease of deployment, and that's one of the reasons I've used it to build some of my web-related projects, including the REST API and the PNG output of HackSense, and even the homepage of my company. Some of these also used CakePHP, which tries to provide the flexibility and “frameworkyness” of Ruby on Rails while keeping it easy to deploy. It also has the capability of simple and rapid REST API development, which I often prefer to the bloatedness of SOAP.

One of the standardized non-functional services of SOAP is WS-Security, and while it's great for authentication and end-to-end signed messages, its encryption scheme not only has a big overhead, but it was also cracked in 2011, so it cannot be considered secure. That being said, I wanted a solution that can be applied to a REST API, does not waste resources (e.g. by spawning OS processes per HTTP call), and reuses as much existing code as feasible.

The solution I came up with is a new layout for CakePHP that uses the GnuPG module of PHP, which in turn uses the native GnuPG library. This also means that the keyring of the user running the web server has to be used. Also, Debian (and thus Ubuntu) doesn't ship this module as a package, so it needs to be compiled, but that's no big deal. Here's what I did:

# apt-get install libgpgme11-dev php5-dev
# wget http://pecl.php.net/get/gnupg-1.3.2.tgz
# tar -xvzf gnupg-1.3.2.tgz
# cd gnupg-1.3.2 && phpize && ./configure && make && make install
# echo "extension=gnupg.so" >/etc/php5/conf.d/gnupg.ini
# /etc/init.d/apache2 reload

These versions made sense in February 2012, so make sure that libgpgme, PHP and the PHP GnuPG module refer to the latest versions available. After the last command has executed successfully, PHP scripts should be able to make use of the GnuPG extension. I crafted the following layout in views/layouts/gpg.ctp:

<?php

$gpg = new gnupg();
$gpg->addencryptkey(Configure::read('Gpg.enckey'));
$gpg->addsignkey(Configure::read('Gpg.sigkey'));
$gpg->setarmor(0);
$out = $gpg->encryptsign($content_for_layout);
header('Content-Length: ' . strlen($out));
header('Content-Type: application/octet-stream');
print $out;

?>

By using Configure::read($key), the keys used for making signatures and encryption can be stored away from the code; I put the following two lines in config/core.php:

Configure::write('Gpg.enckey', "ID of the recipient's public key");
Configure::write('Gpg.sigkey', 'Fingerprint of the signing key');

And at last, actions that require this security layer only need a single line in the controller code (e.g. controllers/foo_controller.php):

$this->layout = 'gpg';

Make sure to set this as close to the beginning of the function as you can to avoid leaking error messages to attackers triggering errors in the code before the layout is set to the secured one.

And that's it: the layout makes sure that all information sent from the view is protected both from interception and modification. During testing, I favored using armored output, and I only disabled it after moving to production, so if it's needed, only two lines need modification: setarmor(0) should become setarmor(1) and the Content-Type should be set to text/plain. Have fun!


Reverse engineering Chinese scope with USB

2012-03-04

The members of H.A.C.K. – one of the less wealthy hackerspaces – felt happy at first when the place could afford to buy a slightly used UNI-T UT2025B digital storage oscilloscope. Besides being useful as a part of the infrastructure, having a USB and an RS-232 port captured our imagination – one of the interesting use-cases is the ability to capture screenshots from the device to illustrate documentation. As I tried interfacing the device, I found that supporting multiple platforms meant Windows XP and 2000 to the developers, neither of which is very common in the place.

I installed the original software in a virtual machine and tried the serial port first, but found out that although most of the functionality worked, taking screenshots was available only over USB. I connected the scope using USB next, and although the vendor-product tuple was present in the list of USB IDs, so lsusb could identify it, no kernel driver tried to take control of the device. So I started looking for USB sniffing software and found that on Linux, Wireshark is capable of doing just that. I forwarded the USB device into the VM and captured a screenshot transmission for analysis. Wireshark was very handy during analysis as well – just like in the case of TCP/IP – so it was easy to spot the multi-kilobyte bulk transfer among the tiny 64-byte control packets.

Wireshark analysis of screenshot transmission via USB

I started looking for simple ways to reproduce the exact same conversation using free software – I've used libusb before while experimenting with V-USB on the Free USB JTAG interface project, but C requires compilation, and adding things like image processing makes the final product harder to use on other computers. For these purposes, I usually choose Python, and as it turned out, the PyUSB library makes it possible to access libusb 0.1, libusb 1.0 and OpenUSB through a single pythonic layer. Using this knowledge, it was pretty straightforward to modify their getting started example and replicate the “PC end” of the conversation. The core of the resulting code is the following.

dev = usb.core.find(idVendor=0x5656, idProduct=0x0832)
if dev is None:
    print >>sys.stderr, 'USB device cannot be found, check connection'
    sys.exit(1)

dev.set_configuration()
dev.ctrl_transfer(ReqType.CTRL_OUT, 177, 0x2C, 0)
dev.ctrl_transfer(ReqType.CTRL_IN, 178, 0, 0, 8)
for i in [0xF0] + [0x2C] * 10 + [0xCC] * 10 + [0xE2]:
    dev.ctrl_transfer(ReqType.CTRL_OUT, 177, i, 0)

try:
    dev.ctrl_transfer(ReqType.CTRL_OUT, 176, 0, 38)
    for bufsize in [8192] * 4 + [6144]:
        buf = dev.read(Endpoint.BULK_IN, bufsize, 0)
        buf.tofile(sys.stdout)
    dev.ctrl_transfer(ReqType.CTRL_OUT, 177, 0xF1, 0)
except usb.core.USBError:
    print >>sys.stderr, 'Image transfer error, try again'
    sys.exit(1)

Using this, I managed to get a binary dump of 38912 bytes, which contained the precious screenshot. From my experience with the original software, I already knew that the resolution was 320 by 240 pixels – which meant that 4 bits made up each pixel. Using this information, I started generating bitmaps from the binary dump in the hope of identifying some patterns visually, as I already knew what was on the screen. The first results came from converting each 4-bit value to a pixel coloured on a linear scale from 0 = black to 15 = white, and looked like the following.

Early version of a converted screenshot

Most of the elements looked like they were in the right spot, and both horizontal and vertical lines seemed intact, apart from the corners. Also, the linear mapping resulted in an overly bright image, and as it seemed, the firmware was transmitting 4-bit (16 color) images even though the device only had a monochrome LCD – the Windows software downgraded the quality on purpose before displaying it on the PC. After some fiddling, I figured out that the pixels were transmitted in 16-bit words, and the order of the pixels inside these was 3, 4, 1, 2 (“mixed endian”). After I added code to compensate for this and created a more readable color mapping, I finally had a script that could produce colorful PNGs out of the BLOBs; see below for an example.

Final version of a converted screenshot
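For the curious, the reordering itself can be summarized in a few lines. Below is a rough sketch of it; the split of each byte into a high and a low nibble is an assumption of this illustration, and the real script also applies the color mapping and writes a PNG.

def reorder_word(b0, b1):
    # one 16-bit word carries four 4-bit pixels; they arrive as 3, 4, 1, 2
    stream = [b0 >> 4, b0 & 0x0f, b1 >> 4, b1 & 0x0f]
    return [stream[2], stream[3], stream[0], stream[1]]   # back to 1, 2, 3, 4

pixels = []
for i in range(0, len(blob), 2):          # blob holds the captured dump
    pixels.extend(reorder_word(ord(blob[i]), ord(blob[i + 1])))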

In the end, my solution is not only free in both senses and runs on more platforms, but it can also capture 8 times more colors than the original one. All code is published under the MIT license, and further contributions are welcome both on the GitHub repository and the H.A.C.K. wiki page. I also gave a talk about the project in Hungarian; the video recording and the slides can be found at the bottom of the wiki page.


Accented characters in hyperref PDF fields

2012-01-18

I've always found hyperref one of the best features of LaTeX, and although it supports Unicode, certain accented characters (in my case, ő and ű) were treated abnormally in PDF metadata fields, such as author and title. I mostly ignored the issue and reworded the contents, until I met a situation where changing the data was not an option. To illustrate the issue, the following example was saved as wrong.tex and compiled with the pdflatex wrong.tex command.

\documentclass{report}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[unicode, pdftitle={Árvíztűrő tükörfúrógép}]{hyperref}

\begin{document}
foobar
\end{document}

The result could be checked with pdfinfo and was far from what I expected.

$ pdfinfo wrong.pdf | grep Title
Title:          Árvízt¶r® tükörfúrógép

I searched the web, and was disappointed at first, having found unsolved forum threads, such as one written by a fellow Hungarian. Finally, I opened up the TeX section of the Stack Exchange network and started typing a title for my question. Based on this, the site offered a number of probably related posts, and I browsed through them out of curiosity. As it turned out, the solution lay within a post about Polish characters in pdftitle, and in retrospect, it seems obvious – like any other great idea. As Schweinebacke writes, “The optional argument of \usepackage is read by the LaTeX kernel, so hyperref cannot change scanning of the argument”. The problem can be eliminated simply by moving the title setup into a separate \hypersetup command – and behold, the pilcrow and the registered sign are gone, as seen in the following example.

$ diff wrong.tex right.tex
5c5,6
< \usepackage[unicode, pdftitle={Árvíztűrő tükörfúrógép}]{hyperref}
---
> \usepackage[unicode]{hyperref}
> \hypersetup{pdftitle={Árvíztűrő tükörfúrógép}}
$ pdfinfo right.pdf | grep Title
Title:          Árvíztűrő tükörfúrógép

Proxmark3 vs. udev

2012-01-06

In the summer, I successfully made my Proxmark3 work by working around every symptom of bit rot that made it impossible to run in a recent environment. One bit that survived the aforementioned effect was the single udev entry that resolved the conflict between the principle of least privilege and the need for raw USB access. As the official HOWTO mentioned, putting the following line into the udev configuration (/etc/udev/rules.d/026-proxmark.rules on Debian) ensured that the Proxmark3 USB device node would be accessible by any user in the dnet group.

SYSFS{idVendor}=="9ac4", SYSFS{idProduct}=="4b8f", MODE="0660", GROUP="dnet"

However, the SYSFS{} notation became obsolete in newer udev releases, and at first, I followed the instincts of a real programmer by disregarding a mere warning. But a recent udev upgrade removed support for the obsolete notation completely, so I had to face messages like the following on every boot.

unknown key 'SYSFS{idVendor}' in /etc/udev/rules.d/026-proxmark.rules:1
invalid rule '/etc/udev/rules.d/026-proxmark.rules:1'

The solution is detailed on many websites, including the blogpost of jpichon, who also met the issue in a Debian vs. custom hardware situation. The line in the udev configuration has to be changed to something like the following.

SUBSYSTEM=="usb", ATTR{idVendor}=="9ac4", ATTR{idProduct}=="4b8f", MODE="0660", GROUP="dnet"

Mangling RSS feeds with Python

2011-10-28

There are blogs on the web that are written or configured in such a way that the RSS or Atom feed contains only a teaser (or no content at all), and one must open a link to get the real content – and thus load all the crap on the page, something RSS feeds were designed to avoid. Dittygirl has added one of those sites to her feed reader, and told me that it takes lots of resources on her netbook to load the whole page – not to mention the discomfort of leaving the feed reader.

I accepted the challenge and decided to write a Python RSS gateway in less than 30 minutes. I chose plain WSGI, something I had wanted to play with, and this project was a perfect match for its simplicity and light weight. Plain WSGI applications are Python modules with a callable named application, which the webserver calls every time an HTTP request is made. The callable gets two parameters,

  • a dictionary of environment values (including the path of the query, the IP address of the browser, etc.), and
  • a callable, which can be used to signal the web server about the progress.

In this case, the script ignores the path, so only the second parameter is used.

def application(environ, start_response):
  rss = getfeed()
  response_headers = [('Content-Type', 'text/xml; charset=UTF-8'),
                      ('Content-Length', str(len(rss)))]
  start_response('200 OK', response_headers)
  return [rss]

Simple enough, the function emits a successful HTTP status, the necessary headers, and returns the content. The list (array) format is needed because a WSGI application can be a generator too (using a yield statement), which can be handy when rendering larger content, so the server expects an iterable result.

The real “business logic” is in the getfeed function, which first tries to load a cache, to avoid abusing the resources of the target server. I chose JSON as it's included in the standard Python libraries, and easy to debug.

try:
  with open(CACHE, 'rb') as f:
    cached = json.load(f)
  etag = cached['etag']
except:
  etag = ''

Next, I load the original feed, using the cached ETag value to encourage conditional HTTP GET. The urllib2.urlopen function can operate on a Request object, which takes a third parameter that can be used to add HTTP headers. If the server responds with an HTTP 304 Not Modified, urlopen raises an HTTPError, and the script knows that the cache can be used.

try:
  feedfp = urlopen(Request('http://HOSTNAME/feed/',
      None, {'If-None-Match': etag}))
except HTTPError as e:
  if e.code != 304:
    raise
  return cached['content'].encode('utf-8')

I used lxml to handle the contents, as it's a really convenient and fast library for XML/HTML parsing and manipulation. I compiled the XPath queries used for every item in the head of the module for performance reasons.

GUID = etree.XPath('guid/text()')
IFRAME = etree.XPath('iframe')
DESC = etree.XPath('description')

To avoid unnecessary copying, lxml's etree can parse the object returned by urlopen directly, and returns an object, which behaves like a DOM on steroids. The GUID XPath extracts the URL of the current feed item, and the HTML parser of lxml takes care of it. The actual contents of the post is helpfully put in a div with the class post-content, so I took advantage of lxml's HTML helper functions to get the div I needed.

While I was there, I also removed the first iframe from the post, which contains the Facebook tracker bug Like button. Finally, I cleared the class attribute of the div element, and serialized its contents to HTML to replace the useless description of the feed item.

feed = etree.parse(feedfp)
for entry in feed.xpath('/rss/channel/item'):
  ehtml = html.parse(GUID(entry)[0]).getroot()
  div = ehtml.find_class('post-content')[0]
  div.remove(IFRAME(div)[0])
  div.set('class', '')
  DESC(entry)[0].text = etree.CDATA(etree.tostring(div, method="html"))

There are two things left. First, the URL that points to the feed itself needs to be modified to produce a valid feed, and the result needs to be serialized into a string.

link = feed.xpath('/rss/channel/a:link',
  namespaces={'a': 'http://www.w3.org/2005/Atom'})[0]
link.set('href', 'http://URL_OF_FEED_GATEWAY/')
retval = etree.tostring(feed)

The second and final step is to save the ETag we got from the HTTP response and the transformed content to the cache in order to minimize the amount of resources (ab)used.

with open(CACHE, 'wb') as f:
  json.dump(dict(etag=feedfp.info()['ETag'], content=retval), f)
return retval

You might say that it's not fully optimized, the design is monolithic, and so on – but it was done in less than 30 minutes, and it's been working perfectly ever since. It's a typical quick-and-dirty hack, and although it contains no technical breakthrough, I learned a few things, and I hope someone else might too by reading it. Happy hacking!


Erlang HTTPd directory traversal

2011-10-21

During a security-focused week in August, I had the perfect environment to take a look at the security of the libraries bundled with one of my favorite languages, Erlang. The language itself follows a functional paradigm, so the connection with the outside world is made possible by so-called port drivers implemented in C. Surprisingly, very little code is written that way, and even that codebase is elegant and seemed secure.

So I started poking at the libraries implemented in pure Erlang, prioritizing network-facing modules, which are mostly concentrated in the inets OTP application. The latter contains clients and servers for some of the most used internet protocols, such as FTP, TFTP and HTTP. As they are written in pure Erlang, traditional binary vulnerabilities (such as buffer overflows) are almost impossible to find, so I tried looking for logical errors, using my previous experience with such services.

After some time, I started poking at the HTTP server called inets:httpd. One of the most common vulnerabilities of HTTP daemons serving static content is the ability to access files outside the so-called document root, which is called directory traversal. It's called that way because most of the time, these issues are exploited using repeated ../ or ..\ sequences to move higher and higher in the directory tree of the file system, slowly traversing out of the “sandbox” called the document root.

I started following the flow of control in the code, and found an elegant way of handling these issues, and at first sight, everything seemed OK with it. In the November 2010 version of lib/inets/src/http_server/httpd_request.erl the following check is made at line 316 (I whitespace-formatted the code a little bit for the sake of readability).

Path2 = [X || X <- string:tokens(Path, "/"), X =/= "."], %% OTP-5938
validate_path(Path2, 0, RequestURI)

The first line uses a construct called list comprehension, which is syntactic sugar for generating and filtering a list, thus avoiding the boilerplate of creating a new list, some kind of loop, and a bunch of if statements. The string:tokens call splits the requested path at slashes, =/= expresses nonequivalence, so Path2 contains a list (array) of strings that represent the parts of the URL, except the ones that consist of a single dot ("."). For example, a request for /foo/bar/./qux results in Path2 being a list of three strings: ["foo", "bar", "qux"].

The second line passes this list, a zero, and the requested URI to a function called validate_path, which can be found next to it at line 320.

validate_path([], _, _) ->
    ok;
validate_path([".." | _], 0, RequestURI) ->
    {error, {bad_request, {forbidden, RequestURI}}};
validate_path([".." | Rest], N, RequestURI) ->
    validate_path(Rest, N - 1, RequestURI);
validate_path([_ | Rest], N, RequestURI) ->
    validate_path(Rest, N + 1, RequestURI).

In Erlang, a function can be defined with as many clauses as needed, and the runtime tries to match the arguments against them in order of declaration. Being a functional language, Erlang contains no such thing as a loop, but solves most of those problems with recursion. The first two clauses are the two possible exits of the function.

  • If there are no (more) items in the list, the request is OK.
  • If the second argument is zero, and the first item of the list is "..", the request is denied with an HTTP 403 Forbidden message.

The last two declarations are the ones processing the items, one by one.

  • If the item is "..", the second argument is decremented.
  • In any other case, the second argument is incremented.

As you can see, it accepts URLs that contain ".." parts, as long as they don't lead outside the document root, by simply counting how deep the path goes inside the document root. The first line has a comment referring to a ticket number (or similar), and by doing a little web search, I found the Erlang OTP R10B release 10 README from 2006, which implies that this code was designed with directory traversal attacks in mind.

There's only one problem with this solution, and it's called Windows. Erlang runs on a number of platforms, and Windows is one of them – with the exception that it uses backslash to separate paths (although most APIs accept slashes too), which leads to the following ugly result.

Erlang HTTPd started in OTP R14B3 on Windows

$ curl -silent -D - 'http://192.168.56.3:8080/../boot.ini' | head -n 1
HTTP/1.1 403 Forbidden

$ curl -D - 'http://192.168.56.3:8080/..\boot.ini'
HTTP/1.1 200 OK
Server: inets/5.6
Date: Fri, 21 Oct 2011 17:11:45 GMT
Content-Type: text/plain
Etag: dDGWXY211
Content-Length: 211
Last-Modified: Thu, 06 Mar 2008 21:23:24 GMT

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect
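To make the root cause explicit, here's a rough re-implementation of the counting check in Python (an illustration only, not the Erlang code): since only the forward slash is treated as a separator, '..\boot.ini' counts as a single “normal” path element and increments the depth instead of decrementing it.

def validate_path(path):
    depth = 0
    for part in path.split('/'):        # only '/' is treated as a separator
        if part in ('', '.'):
            continue
        if part == '..':
            if depth == 0:
                return 'forbidden'
            depth -= 1
        else:
            depth += 1
    return 'ok'

print(validate_path('/../boot.ini'))    # forbidden
print(validate_path('/..\\boot.ini'))   # ok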

Erlang/OTP R14B04 has been released on October 5, 2011, and it contains the simple fix I wrote that closes the vulnerability. I'd like to thank everyone at hekkcamp 2011 for their help, especially Buherátor who helped me with testing.


