VSzA techblog

How I customized LaTeX beamer output

2012-10-23

Because of a response I got to my post about using MS Word templates with LaTeX, documenting the way I've been generating presentation material for my talks since 2010 got its priority raised. Like I said in the aforementioned post, I prefer using LaTeX for typesetting, and it's the same for creating those frames that appear behind my back during talks.

For the first few talks, I used the default template, then I switched to the Warsaw theme, which I frequently encounter in the beamer-using subset of IT security presenters. In 2011, Lasky – the graphics genius around Silent Signal – designed a presentation template that I actually liked, so I started fiddling with beamer to adapt the new theme.

First, I created a file called s2beamer.sty with the usual header, and two lines that set a sensible base. I had to disable shadow, since it couldn't handle setting background images, as it assumed that the background was a solid color.

\ProvidesPackage{s2beamer}

\useinnertheme[shadow=false]{rounded}
\usecolortheme{whale}

The theme I used included a separate background image for the title slide, and another for the rest (the ones with actual content), so I defined a command to create a title slide, which sets the background before and after the first frame. I added newlines only for readability; they must be removed or commented out in the .sty file.

\newcommand{\titleframe}{
    \usebackgroundtemplate{
        \includegraphics[width=\paperwidth,height=\paperheight]
            {img/titlebg.jpg}
    }
    \frame{\titlepage}
    \usebackgroundtemplate{
        \includegraphics[width=\paperwidth,height=\paperheight]
            {img/slidebg.jpg}
    }
}

After setting the background, the colors needed to be changed too. First, I set the color of the body text and structural elements (such as list bullets). The color of the frame titles was set to the same orangeish one.

\setbeamercolor{normal text}{fg=white}
\usecolortheme[RGB={239,73,35}]{structure}
\setbeamercolor{frametitle}{fg=structure, bg=}

The default list bullets are actual bullets, which looked bad on the background image, so I changed them to little arrows. Also, I hid the navigation controls (placed in the lower-right corner) since they conflicted with the footer of the theme, and most people don't even know what they're for.

\setbeamertemplate{items}[default]
\setbeamertemplate{navigation symbols}{}

URLs and code snippets are written using fixed-size fonts (as they should be), but I prefer using Inconsolata for this purpose, instead of the default.

\RequirePackage{inconsolata}

For code snippets, I prefer using the listings package, and I configured it to use syntax highlighting colors that go well with the theme (the newline in the last line was added for readability only).

\RequirePackage{listings}
\definecolor{s2}{RGB}{240, 56, 31}
\definecolor{s2y}{RGB}{255, 200, 143}
\lstset{basicstyle=\footnotesize\ttfamily, breaklines=true,
    tabsize=2, keywordstyle=\color{s2}, stringstyle=\color{s2y}}

Since I give talks both in English and Hungarian, I added the following expression to set the order of first and last name according to the babel language set by the document. (The \hunnewlabel command is defined when the babel language is set to magyar.) It also turns the e-mail address into a hyperlink that launches the default e-mail client when clicked on.

\ifdefined\hunnewlabel
    \renewcommand{\name}{\lastname\ \firstname}
\else
    \renewcommand{\name}{\firstname\ \lastname}
\fi

\author{\name\\\texttt{\href{mailto:\email}{\email}}}

The above lines require the following commands in the document:

\renewcommand{\email}{vsza@silentsignal.hu}
\renewcommand{\firstname}{András}
\renewcommand{\lastname}{Veres-Szentkirályi}

With these all set, I can create presentation materials that look awesome but are still generated from plain text, and are thus compatible with all sensible editors and SCMs. You can see the snippets above in action by taking a look at my Hacktivity 2012 slides (four of them are below this paragraph), and some of them in the ones I made for Hacktivity 2011.

Four slides from my Hacktivity 2012 talk


Hackish shell 1-liner for SSL session analysis

2012-10-22

Last week, I tried to measure the entropy of the session ID of an SSL/TLS-wrapped web application. I prefer Burp Suite Pro for such tasks, but in this case, it could only gather 5 to 10 session IDs per second. I fired up Wireshark and found that it didn't reuse the SSL/TLS context, but opened a new TCP socket and performed a handshake for every new session, even though the HTTP server set the Connection header to keep-alive.

Since collecting session IDs is not exactly rocket science, I decided that it's faster to roll my own solution instead of waiting for the dead slow Burp sequencer. First, I put a simple HTTP request into a text file, carefully ending the lines Windows-style (\r\n) and putting an empty line at the end.

HEAD / HTTP/1.1
Host: domain.tld
User-Agent: Silent Signal
Connection: keep-alive

I used HEAD so that I could minimize the latency and server load by keeping the server from sending me the actual contents (the session ID got sent in a Set-Cookie header anyways). First, I sent as many requests as I could, completely disregarding the answers.

$ while /bin/true; do cat req.txt; done | \
    openssl s_client -connect domain.tld:443 2>&1 | fgrep Set-Cookie

As it turned out, the server stopped responding after around 100 requests, so I simply reduced the number of requests per connection to 100, and put the whole thing into a while loop, so that it would keep opening new SSL/TLS connections after every 100 requests. I also added a simple sed invocation so that the result can be directly used by Burp for analysis.

$ while /bin/true; do (for i in $(seq 100); do cat req.txt; done | \
    openssl s_client -connect domain.tld:443 2>&1 | fgrep Set-Cookie | \
    sed 's/^[^=]*=\([A-Z]*\);.*$/\1/' >>cookies.txt); done

In another terminal, I started watch -n1 'wc -l cookies.txt', so I also had a sense of progress, as the above shell 1-liner produced the 20000 tokens required by FIPS in a matter of minutes.
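
For readers who find the sed expression hard to read, here is a rough Python sketch of the same substitution. The cookie name and value in the sample line are made up for illustration, since the real ones are not shown above; the pattern itself is the one from the one-liner, translated to Python syntax.

```python
import re

# Same pattern as the sed expression: skip everything up to the first '=',
# capture the run of capital letters after it, and drop the rest of the line.
COOKIE_RE = re.compile(r'^[^=]*=([A-Z]*);.*$')

def extract_token(line):
    """Return the captured cookie value; like sed, leave non-matching lines unchanged."""
    return COOKIE_RE.sub(r'\1', line)

# Hypothetical header line (cookie name and value are assumptions).
sample = 'Set-Cookie: SESSIONID=QWERTYUIOP; path=/; HttpOnly'
print(extract_token(sample))  # QWERTYUIOP
```

Note that the character class only covers capital letters, which matched the tokens of this particular application; other session ID alphabets would need a wider class.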


ADSdroid 1.2 released due to API change

2012-10-18

On October 6, 2012, Matthias Müller sent me an e-mail, telling me that the download functionality of ADSdroid was broken. As it turned out, AllDataSheet changed their website a little bit, resulting in the following exception getting thrown during download.

java.lang.IllegalArgumentException: Malformed URL: javascript:mo_search('444344','ATMEL','ATATMEGA168P');
    at org.jsoup.helper.HttpConnection.url(HttpConnection.java:53)
    at org.jsoup.helper.HttpConnection.connect(HttpConnection.java:25)
    at org.jsoup.Jsoup.connect(Jsoup.java:73)
    at hu.vsza.adsapi.Part.getPdfConnection(Part.java:32)
    at hu.vsza.adsdroid.PartList$DownloadDatasheet.doInBackground(PartList.java:56)
    at hu.vsza.adsdroid.PartList$DownloadDatasheet.doInBackground(PartList.java:48)
    at android.os.AsyncTask$2.call(AsyncTask.java:264)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
    at java.util.concurrent.FutureTask.run(FutureTask.java:137)
    at android.os.AsyncTask$SerialExecutor.run(AsyncTask.java:208)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:569)
    at java.lang.Thread.run(Thread.java:856)
 Caused by: java.net.MalformedURLException: Unknown protocol: javascript
    at java.net.URL.<init>(URL.java:184)
    at java.net.URL.<init>(URL.java:127)
    at org.jsoup.helper.HttpConnection.url(HttpConnection.java:51)
    ... 12 more

The address (href) of the link (<a>) used for PDF download has changed from a simple HTTP one to a JavaScript call that JSoup, the library I used for HTML parsing and doing HTTP requests, couldn't possibly handle. The source of the mo_search function can be found in js/datasheet_view.js. The relevant part can be seen below; I just inserted some whitespace for easier readability.

function mo_search(m1, m2, m3) {
    frmSearch2.ss_chk.value = m1;
    frmSearch2.sub_chk.value = m2;
    frmSearch2.pname_chk.value = m3;
    frmSearch2.action = 'http://www.alldatasheet.com/datasheet-pdf/pdf/'
        + frmSearch2.ss_chk.value + '/' + frmSearch2.sub_chk.value
        + '/' + frmSearch2.pname_chk.value + '.html';
    frmSearch2.submit();
}

That didn't seem that bad, so I wrote a simple regular expression to handle the issue.

import java.util.regex.*;

Pattern jsPattern = Pattern.compile(
    "'([^']+)'[^']*'([^']+)'[^']*'([^']+)'");

final Matcher m = jsPattern.matcher(foundPartHref);
if (m.find()) {
    foundPartHref = new StringBuilder(
        "http://www.alldatasheet.com/datasheet-pdf/pdf/")
        .append(m.group(1)).append('/')
        .append(m.group(2)).append('/')
        .append(m.group(3)).append(".html").toString();
}
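
To try the pattern without an Android build, the same logic can be sketched in Python. The regular expression and the base URL are copied from the Java snippet above, and the sample href is the one from the stack trace; the function name resolve_href is only an illustrative choice.

```python
import re

# Three single-quoted arguments, with anything but a quote allowed between them.
JS_PATTERN = re.compile(r"'([^']+)'[^']*'([^']+)'[^']*'([^']+)'")
BASE = 'http://www.alldatasheet.com/datasheet-pdf/pdf/'

def resolve_href(href):
    """Turn a javascript:mo_search(...) href into the URL the form would submit to."""
    m = JS_PATTERN.search(href)
    if m is None:
        return href  # plain HTTP links are passed through untouched
    return BASE + '/'.join(m.groups()) + '.html'

href = "javascript:mo_search('444344','ATMEL','ATATMEGA168P');"
print(resolve_href(href))
# http://www.alldatasheet.com/datasheet-pdf/pdf/444344/ATMEL/ATATMEGA168P.html
```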

The regular expression is overly liberal on purpose, in the hope that it can handle small changes in the AllDataSheet website in the future without upgrading the application. I pushed version 1.2 to GitHub, and it contains many other optimizations, too, including enabling ProGuard. The resulting APK is 30% smaller than previous versions, and it can be downloaded by using the link in the beginning of this sentence, or using the QR code below. It's also available from the F-Droid Android FOSS repository, which also ensures automatic upgrades.

ADSdroid QR code


Shift vs. division in managed languages

2012-09-27

From time to time, I hear it from low-level coder monkeys posing as either tech gurus or teachers that using the shift operators (<< and >> in C syntax) instead of multiplication and division in cases when the factor/divisor is an integer power of 2 results in faster code. While I've always been skeptical about such speculations – and I've been reassured several times by many sources, including MSDN forums and Stack Overflow – I haven't tried it for myself, especially in managed languages such as C# that are compiled to a byte code first.

Although Mono is not the reference implementation of C# and the .Net virtual machine, it's the one that runs on my notebook and allows for easy static compilation which makes it possible for me to inspect the machine code generated from the .Net executable file. First, I wrote a simple program that reads a byte from the standard input, divides it by 2, and writes the result to the standard output (mainly to avoid optimization that would replace division with compile-time evaluation).

using System;

class Program
{
    static void Main()
    {
        int a = Console.Read();
        int b = a / 2;
        Console.WriteLine(b);
    }
}

I compiled it with the Mono C# compiler and verified that it works (T = 84 in ASCII).

$ mcs monodiv.cs
$ file monodiv.exe
monodiv.exe: PE32 executable (console) Intel 80386 Mono/.Net assembly, for MS Windows
$ echo T | ./monodiv.exe
42

Dumping the .Net bytecode reveals that the first pass of compilation uses division.

$ monodis monodiv.exe
...
.method private static hidebysig 
       default void Main ()  cil managed 
{
    // Method begins at RVA 0x2058
    .entrypoint
    // Code size 17 (0x11)
    .maxstack 2
    .locals init (
            int32   V_0,
            int32   V_1)
    IL_0000:  call int32 class [mscorlib]System.Console::Read()
    IL_0005:  stloc.0 
    IL_0006:  ldloc.0 
    IL_0007:  ldc.i4.2 
    IL_0008:  div 
    IL_0009:  stloc.1 
    IL_000a:  ldloc.1 
    IL_000b:  call void class [mscorlib]System.Console::WriteLine(int32)
    IL_0010:  ret 
} // end of method Program::Main

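Finally, transforming the bytecode into machine code assures us again that premature optimization is the root of all evil, as the code executed by the CPU at runtime contains shifts instead of a division: shr extracts the sign bit so that negative values round toward zero, and the good old arithmetic shift right (sar) does the actual halving.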

$ mono --aot=full monodiv.exe 
Mono Ahead of Time compiler - compiling assembly /home/dnet/_projekt/monodiv/monodiv.exe
Code: 38 Info: 4 Ex Info: 6 Unwind Info: 9 Class Info: 30 PLT: 3 GOT Info: 14 GOT: 48 Offsets: 47
Compiled 2 out of 2 methods (100%)
Methods without GOT slots: 2 (100%)
Direct calls: 0 (100%)
JIT time: 0 ms, Generation time: 0 ms, Assembly+Link time: 0 ms.
$ file monodiv.exe.so
monodiv.exe.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
$ objdump -d monodiv.exe.so

monodiv.exe.so:     file format elf64-x86-64


Disassembly of section .text:

...

0000000000001020 <Program_Main>:
1020:       48 83 ec 08             sub    $0x8,%rsp
1024:       e8 17 00 00 00          callq  1040 <plt_System_Console_Read>
1029:       48 8b f8                mov    %rax,%rdi
102c:       c1 ef 1f                shr    $0x1f,%edi
102f:       03 f8                   add    %eax,%edi
1031:       d1 ff                   sar    %edi
1033:       e8 12 00 00 00          callq  104a <plt_System_Console_WriteLine_int>
1038:       48 83 c4 08             add    $0x8,%rsp
103c:       c3                      retq   
103d:       00 00                   add    %al,(%rax)
    ...
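
The shr/add/sar sequence may look like more work than a single shift, so here is a small Python sketch of what it computes, assuming a 32-bit signed input. A plain arithmetic shift rounds negative numbers down, while integer division in C# truncates toward zero; adding the sign bit first makes the two agree. The compiler can't skip this step here because Console.Read returns -1 when there's no more input.

```python
def div2_like_jit(x):
    """Mimic the generated code for x / 2, with x a 32-bit signed integer."""
    sign = (x & 0xFFFFFFFF) >> 31  # shr $0x1f: 1 for negative x, 0 otherwise
    return (x + sign) >> 1         # add, then sar (>> on a Python int is arithmetic)

for x in (84, 7, -7, -1):
    # value, result of the emulated sequence, result of a bare shift
    print(x, div2_like_jit(x), x >> 1)
```

For 84 both columns give 42, but for -7 the emulated sequence gives -3 (like the division) while the bare shift gives -4.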

Using MS Word templates with LaTeX quickly

2012-09-12

After a successful penetration test, I wanted to publish a detailed writeup about it, but the template we use at the company that includes a logo and some text in the footer was created using Microsoft Word, and I prefer using LaTeX for typesetting. It would have been possible to recreate the template from scratch, but I preferred to do it quick and, as it turned out, not so dirty.

First, I saved a document written using the template from Word to PDF, opened it up in Inkscape and removed the body (i.e. everything except the header and the footer). Depending on the internals of the PDF saving mechanism, it might be necessary to ungroup one or more times to avoid removing more than needed. After this simple editing, I saved the result as another PDF, called s2bg.pdf.

Next, I created a file named s2.sty with the following lines.

\ProvidesPackage{s2}

\RequirePackage[top=4cm, bottom=2.8cm, left=2.5cm, right=2.5cm]{geometry}
\RequirePackage{wallpaper}
\CenterWallPaper{1}{s2bg.pdf}

The first line sets the package name, while the next three adjust the margins (which I calculated by using the ones set in Word and some trial and error) and put the PDF saved in Inkscape to the background of every page. The wallpaper package is available in the texlive-latex-extra package on Debian systems.

As our company uses a specific shade of orange as a primary color, I also changed the \section command to use this color for section headings.

\RequirePackage{color}
\definecolor{s2col}{RGB}{240, 56, 31}

\makeatletter
\renewcommand{\section}{\@startsection{section}{1}{0mm}
{\baselineskip}%
{\baselineskip}{\normalfont\Large\sffamily\bfseries\color{s2col}}}%
\makeatother

Creating a package comes with the advantage that only a single line needs to be added to a document to use all the formatting described above, just like with CSS. The following two documents differ only in that the one on the right has an extra \usepackage{s2} line in the header.

Same document without and with style package

Two documents published with this technique (although written in Hungarian) can be downloaded: the aforementioned writeup about client-side attacks and another one about things we did in 2011.


Installing CAcert on Android without root

2012-08-16

I've been using CAcert for securing some of my services with TLS/SSL, and when I got my Android phone I chose K-9 mail over the stock e-mail client because as the certificate installation page on the official CAcert site stated, it required root access to access the system certificate store. Now, one year and two upgrades (ICS, JB) later, I revisited the issue.

As of this writing, the CAcert site contains another method that also requires root access, but as Jethro Carr wrote in his blog, since at least ICS, it's possible to install certificates without any witchcraft, using not only PKCS12 but also PEM files. Since Debian ships the CAcert bundle, I used that file, but it's also possible to download the files from the official CAcert root certificate download page. Since I have Android SDK installed, I used adb (Android Debug Bridge) to copy the certificate to the SD card, but any other method (browser, FTP, e-mail, etc.) works too.

$ adb push /usr/share/ca-certificates/cacert.org/cacert.org.crt /sdcard/
2 KB/s (5179 bytes in 1.748s)

On the phone, I opened Settings > Security, scrolled to the bottom, and selected Install from storage. It prompted for a name of the certificate, and installed the certificate in a second without any further questions asked.

Installing the CAcert root certificate on Android

After this, the certificate can be viewed by opening Trusted credentials and selecting the User tab, and browsing an HTTPS site with a CAcert-signed certificate becomes just as painless and secure as with any other built-in CA.

Using CAcert root certificate on Android


Extending Wireshark MySQL dissector

2012-06-10

I consider Wireshark one of the most successful FLOSS projects: it outperforms many tools in the field of packet capture (USB sniffing is one of my favorite examples), and it is certainly the best tool I know for packet analysis, supported by its many protocol-specific plugins called dissectors. These fast little C/C++ libraries do one thing, and do that well by extracting all available information from packets of a certain protocol in a robust way.

In case of every network-related problem, Wireshark is one of the top three tools I use, since the analysis of network traffic provides an invaluable insight. This was also the case while I was experimenting with oursql, which provides Python bindings for libmysqlclient, the native MySQL client library, as an alternative to MySQLdb. While the latter emulates client-side parameter interpolation (as JDBC does too), the former uses the server-side prepared statements available since at least MySQL 4.0 (released in August 2002). The problem with Wireshark was that although it had dissector support for MySQL packets, dissection of COM_EXECUTE packets resulted in the following message.

Wireshark dissecting a MySQL execute packet before r39483

As the documentation of the MySQL ClientServer protocol states, COM_EXECUTE packets depend on information exchanged upon preparing the statement, which means that the dissector needed to be transformed to be stateful. It seemed that the original author of the MySQL dissector started working on the problem but then decided to leave it in an incomplete state.

To become familiar with the development of Wireshark, I started with the flags field of the COM_EXECUTE packet. The documentation linked above states that although it was reserved for future use in MySQL 4.0, in later versions, it carries a meaningful value. However, the dissector always decoded the field as MySQL 4.0 would, regardless of the version actually used.

At first, I just added a new decoder and submitted a patch in the Wireshark Bug Database that decoded the four possible values according to the protocol documentation. Bill Meier took a look at it, and asked whether I could change the behaviour to treat version 4 and 5 differently, offering pointers with regard to storing per-connection data, effectively making the dissector stateful. I improved the patch and it finally made it into SVN.

Knowing this, I started implementing the dissection of the remaining fields, namely null_bit_map, new_parameter_bound_flag, type and values. The difficulty was that the length and/or offset of these fields depended on the number of placeholders sent during preparing the statement and also the number and index of parameters that were bound by binary streaming (COM_LONG_DATA packets). Since there already was a GHashTable member named stmts declared in the mysql_conn_data struct of per-connection data, and it was initialized upon the start of dissection and destroyed upon MYSQL_STMT_CLOSE and MYSQL_QUIT, and a my_stmt_data struct was declared with an nparam member, I just filled in the gaps by storing the number of parameters for every packet received in response to a COM_PREPARE.

diff --git a/packet-mysql.c b/packet-mysql.c
index 3adc116..9c71409 100644
--- a/packet-mysql.c
+++ b/packet-mysql.c
@@ -1546,15 +1546,22 @@ mysql_dissect_row_packet(tvbuff_t *tvb, int offset, proto_tree *tree)
 static int
 mysql_dissect_response_prepare(tvbuff_t *tvb, int offset, proto_tree *tree, mysql_conn_data_t *conn_data)
 {
+       my_stmt_data_t *stmt_data;
+       gint stmt_id;
+
        /* 0, marker for OK packet */
        offset += 1;
        proto_tree_add_item(tree, hf_mysql_stmt_id, tvb, offset, 4, ENC_LITTLE_ENDIAN);
+       stmt_id = tvb_get_letohl(tvb, offset);
        offset += 4;
        proto_tree_add_item(tree, hf_mysql_num_fields, tvb, offset, 2, ENC_LITTLE_ENDIAN);
        conn_data->stmt_num_fields = tvb_get_letohs(tvb, offset);
        offset += 2;
        proto_tree_add_item(tree, hf_mysql_num_params, tvb, offset, 2, ENC_LITTLE_ENDIAN);
        conn_data->stmt_num_params = tvb_get_letohs(tvb, offset);
+       stmt_data = se_alloc(sizeof(struct my_stmt_data));
+       stmt_data->nparam = conn_data->stmt_num_params;
+       g_hash_table_replace(conn_data->stmts, &stmt_id, stmt_data);
        offset += 2;
        /* Filler */
        offset += 1;

Later, I figured out that using the GUI results in packets of the captured buffers being dissected in a nondeterministic order and count, which could lead to problems if the MYSQL_STMT_CLOSE packet was dissected before the COM_PREPARE, so I removed the destruction of the hashtable.

Having the hashtable filled with the number of parameters for each statement, I could make the dissector ignore the null_bit_map and decode the new_parameter_bound_flag. The former is unnecessary since NULL values are present as NULL typed parameters, while the latter helps decide whether the packet should contain values or not. If parameters are expected, the ones that are streamed are obviously not present in the COM_EXECUTE packet, which required the following modifications to be made.

  • The my_stmt_data struct was extended with a guint8* param_flags member.
  • Upon receiving a prepare response (and allocating the my_stmt_data struct), an array of 8-bit unsigned integers (guint8) was allocated and all bytes were set to 0.
  • During the dissection of a MYSQL_STMT_SEND_LONG_DATA packet, a specific bit of the matching byte in the param_flags array was set.
  • The dissector logic of COM_EXECUTE ignored packets that had the corresponding bit set.
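
The bookkeeping in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not the dissector code: the class and method names are invented, one flag byte per parameter is assumed, and the STREAMED bit value is a placeholder rather than the constant used in Wireshark.

```python
STREAMED = 0x01  # hypothetical flag bit marking a parameter sent as long data

class StmtData:
    """Per-statement state, loosely modelled on my_stmt_data."""
    def __init__(self, nparam):
        self.nparam = nparam
        self.param_flags = bytearray(nparam)  # all bytes start as 0

class ConnData:
    """Per-connection state: statement ID -> StmtData."""
    def __init__(self):
        self.stmts = {}

    def on_prepare_response(self, stmt_id, nparam):
        self.stmts[stmt_id] = StmtData(nparam)

    def on_send_long_data(self, stmt_id, param_index):
        stmt = self.stmts.get(stmt_id)
        if stmt is not None and param_index < stmt.nparam:
            stmt.param_flags[param_index] |= STREAMED

    def params_in_execute(self, stmt_id):
        """Indices of the parameters whose values are expected in the execute packet."""
        stmt = self.stmts[stmt_id]
        return [i for i in range(stmt.nparam)
                if not stmt.param_flags[i] & STREAMED]

conn = ConnData()
conn.on_prepare_response(stmt_id=1, nparam=3)
conn.on_send_long_data(stmt_id=1, param_index=1)
print(conn.params_in_execute(1))  # [0, 2]
```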

All that was left was the dissection of actual values, which was less documented, as most software depends on official MySQL code for client functionality. Because of this (and for testing purposes, too) I tried several programming languages and MySQL client libraries and captured the network traffic generated. Among these were

  • PHP (mysqli with libmysqlclient),
  • Python (oursql with libmysqlclient),
  • C (native libmysqlclient) and
  • Java (JDBC with MySQL Connector/J).

One of the things that wasn't mentioned anywhere was the way strings were encoded. First, I sent foobar by itself and the bytes that reached the wire were "\x06foobar", which meant that the length of the string was encoded in the first byte. Next, I sent 300 characters and saw that the first byte was 0xfc and then came the length of the string in two bytes. Finally, I sent 66000 characters and got another magic character, 0xfd, followed by the length of the string in three bytes. (Luckily, Wireshark has 24-bit little-endian integer read functionality built-in.) Another surprise was that several field type codes were encoded in the same way:

  • 0xf6 (NEWDECIMAL), 0xfc (BLOB), 0xfd (VAR_STRING) and 0xfe (STRING) are all strings encoded in the manner described above.
  • 0x07 (TIMESTAMP), 0x0a (DATE) and 0x0c (DATETIME) are all timestamps consisting of a calendar date and an optional time part.
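
The string encoding observed above can be summarized with a short Python decoder. It is a sketch covering only the three cases seen in the experiments; the 0xfb cut-off for one-byte lengths comes from the MySQL protocol documentation rather than from these captures, and the function name is made up.

```python
def read_lenenc_string(data, offset=0):
    """Decode one length-prefixed string; return (payload, offset after it)."""
    first = data[offset]
    offset += 1
    if first < 0xfb:        # short string: the first byte is the length
        length = first
    elif first == 0xfc:     # length follows in 2 bytes, little-endian
        length = int.from_bytes(data[offset:offset + 2], 'little')
        offset += 2
    elif first == 0xfd:     # length follows in 3 bytes, little-endian
        length = int.from_bytes(data[offset:offset + 3], 'little')
        offset += 3
    else:
        raise ValueError('prefix 0x%02x is not handled by this sketch' % first)
    return data[offset:offset + length], offset + length

print(read_lenenc_string(b'\x06foobar'))  # (b'foobar', 7)
```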

At last, after many test captures, I managed to decode 15 different types with 10 different dissector algorithms. A const struct was created to keep the type-dissector mapping simple and extensible.

typedef struct mysql_exec_dissector {
    guint8 type;
    guint8 unsigned_flag;
    void (*dissector)(tvbuff_t *tvb, int *param_offset, proto_item *field_tree);
} mysql_exec_dissector_t;

static const mysql_exec_dissector_t mysql_exec_dissectors[] = {
    { 0x01, 0, mysql_dissect_exec_tiny },
    { 0x02, 0, mysql_dissect_exec_short },
    { 0x03, 0, mysql_dissect_exec_long },
    { 0x04, 0, mysql_dissect_exec_float },
    { 0x05, 0, mysql_dissect_exec_double },
    { 0x06, 0, mysql_dissect_exec_null },
    ...
    { 0xfe, 0, mysql_dissect_exec_string },
    { 0x00, 0, NULL },
};

I submitted the resulting patch in the Wireshark Bug Database, and Jeff Morriss pointed out the obvious memory leak caused by the removal of the hashtable destruction. He was very constructive, and told me about se_tree structures which provide the subset of GHashTable functionality I needed but are managed by the Wireshark memory manager, so no destruction was needed. After I modified my patch accordingly, it got accepted and finally made it into SVN. Here's an example screenshot of how a successfully dissected MySQL COM_EXECUTE packet might look in the new version.

Wireshark dissecting a MySQL execute packet after r39483

I hope you found this little story interesting, and maybe learned a thing or two about the way Wireshark and/or MySQL works. The task I accomplished required little previous knowledge in either of them, so all I can recommend is if you see an incomplete or missing dissector in Wireshark, check out the source code and start experimenting. Happy hacking!


DEF CON 20 CTF urandom 300 writeup

2012-06-04

As a proud member of the Hungarian team called “senkihaziak”, I managed to solve the following challenge for 300 points in the /urandom category on the 20th DEF CON Capture The Flag contest. The description consisted of an IP address, a port number, a password, and a hint.

Description of the challenge

Connecting with netcat to the specified IP address and port using TCP resulted in a password prompt being printed. Sending the password followed by a newline triggered the server to send back what seemed to be an awful lot of garbage, screwing up the terminal settings. A closer inspection using Wireshark revealed the actual challenge.

Network traffic after connecting and sending the password

A brief analysis of the network traffic dump confirmed that after about 500 bytes of text, exactly 200000 bytes of binary data were sent by the service, which equals 100000 unsigned 16-bit numbers (uint16_t). As the text said, both the time and the number of moves to sort the array in-place were limited, and while I knew that quicksort is unbeatable in the former (it took 35 ms to sort the list on my 2.30GHz Core i3), I knew little to nothing about which algorithms support in-place sorting, and of those, which one requires the least number of exchanges.

KT came up with the idea of building a 64k long array to store the number of occurrences of each index, filling it during reading, and iterating over it to achieve the sorted array. While this was a working concept in itself, it didn't give me what the challenge wanted – pairs of indices to exchange in order to reach the sorted state. To overcome this, I improved his idea by storing the position(s) on which the index occurs in a linked list instead of just the number of occurrences. For easier understanding, here's an example of what this array of linked lists would look like on an array of 7 numbers.

Example of an array of linked lists

Since performance mattered, I chose C and began with establishing the TCP connection and handling the login.

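#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PW "d0d2ac189db36e15\n"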
#define BUFLEN 4096
#define PWPROMPTLEN 10

int main() {
  int i, sockfd;
  struct sockaddr_in serv_addr;
  char buf[BUFLEN];

  sockfd = socket(AF_INET, SOCK_STREAM, 0);
  memset(&serv_addr, 0, sizeof(serv_addr)); /* zero bytes, not the character '0' */

  serv_addr.sin_family = AF_INET;
  serv_addr.sin_port = htons(5601);
  inet_pton(AF_INET, "140.197.217.155", &serv_addr.sin_addr);

  connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));
  for (i = 0; i < PWPROMPTLEN; i += read(sockfd, buf, BUFLEN));
  write(sockfd, PW, strlen(PW));
  ...
}

After login, there were 504 bytes of text to ignore, and then the numbers were read into an array.

#define MSGLEN 504
#define NUMBERS 100000

uint16_t nums[NUMBERS];

for (i = 0; i < MSGLEN; i += read(sockfd, buf, MSGLEN - i));
for (i = 0; i < NUMBERS * sizeof(uint16_t);
  i += read(sockfd, ((char*)nums) + i, NUMBERS * sizeof(uint16_t) - i));
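
The "keep reading until the expected number of bytes has arrived" pattern used by both loops can be sketched in Python as follows. The helper names are invented for illustration, and little-endian byte order is an assumption based on the C code reading the raw bytes straight into uint16_t on an x86 machine.

```python
import io
import struct

def read_exact(read, n):
    """Call read(remaining) until exactly n bytes have been collected."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = read(remaining)
        if not chunk:
            raise EOFError('stream ended %d bytes early' % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def unpack_uint16(data):
    """Interpret the bytes as little-endian unsigned 16-bit numbers (assumed order)."""
    return list(struct.unpack('<%dH' % (len(data) // 2), data))

# Stand-in for a socket that returns short reads of at most 4 bytes.
stream = io.BytesIO(struct.pack('<3H', 1, 65535, 258))
short_read = lambda n: stream.read(min(n, 4))
print(unpack_uint16(read_exact(short_read, 6)))  # [1, 65535, 258]
```

With a real connection, the recv method of the socket object can be passed as the read argument.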

After the numbers were available in a local array, the array of linked lists was built. The end of a linked list was marked with a NULL pointer, so the array was initialized with 0 bytes.

typedef struct numpos {
  int pos;
  struct numpos *next;
} numpos_t;

#define MAXNUM 65536 /* one slot for every possible uint16_t value */

numpos_t *positions[MAXNUM];

memset(positions, 0, MAXNUM * sizeof(void*));
for (i = 0; i < NUMBERS; i++) {
  numpos_t *newpos = malloc(sizeof(numpos_t));
  newpos->next = positions[nums[i]];
  newpos->pos = i;
  positions[nums[i]] = newpos;
}

The heart of the program is the iteration over this array of lists. The outer loop goes over each number in ascending order, while the inner loop iterates over the linked lists. An auxiliary counter named j tracks the current index on the output array. Inside the loops the current number is exchanged with the one at the current index in the original array, and the positions array of linked lists is also changed to reflect the layout of the output array.

int n, j = 0;
uint16_t tmp; /* value displaced from index j by the current exchange */
numpos_t *cur, *cur2;

for (n = 0; n < MAXNUM; n++) {
  for (cur = positions[n]; cur != NULL; cur = cur->next) {
    if (cur->pos != j) {
      sprintf(buf, "%d:%d\n", cur->pos, j);
      write(sockfd, buf, strlen(buf));
      tmp = nums[j];
      nums[j] = n;
      for (cur2 = positions[tmp]; cur2 != NULL; cur2 = cur2->next) {
        if (cur2->pos == j) {
          cur2->pos = cur->pos;
          break;
        }
      }
      nums[cur->pos] = tmp;
    }
    j++;
  }
}
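
For experimenting with the idea without a network connection, here is a compact Python variation. It isn't a translation of the C code: it keeps the positions of each value in a set instead of a linked list, and it skips slots that already hold the right value, so it shouldn't be read as the exact program that was run against the server.

```python
from collections import defaultdict

def sorting_swaps(nums):
    """Return (i, j) index pairs whose exchanges, applied in order, sort nums."""
    nums = list(nums)              # work on a copy
    positions = defaultdict(set)   # value -> indices still holding that value
    for i, v in enumerate(nums):
        positions[v].add(i)
    swaps = []
    j = 0                          # next index of the sorted output
    for n in sorted(positions):
        for _ in range(len(positions[n])):
            if nums[j] == n:
                positions[n].discard(j)     # already in place, no exchange
            else:
                p = positions[n].pop()      # any remaining occurrence of n
                tmp = nums[j]
                swaps.append((p, j))
                nums[j], nums[p] = n, tmp
                positions[tmp].discard(j)   # the displaced value moved to p
                positions[tmp].add(p)
            j += 1
    return swaps

data = [5, 1, 4, 1, 5, 1, 0]
for a, b in sorting_swaps(data):
    data[a], data[b] = data[b], data[a]
print(data)  # [0, 1, 1, 1, 4, 5, 5]
```

Each exchange puts at least one number into its final place, so the number of pairs never exceeds the length of the array.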

Finally, there's only one thing left to do: send an empty line and wait for the key to arrive.

write(sockfd, "\n", 1);
/* Leave room for the terminating NUL and stop on EOF or error. */
while ((i = read(sockfd, buf, BUFLEN - 1)) > 0) {
  buf[i] = '\0';
  printf("Got %d bytes of response: %s\n", i, buf);
}

After an awful lot of local testing, the final version of the program worked perfectly the first time it was run on the actual server, and printed the following precious key.

Result of the successful run, displaying the key


DEF CON 20 CTF grab bag 300 writeup

2012-06-04

As a proud member of the Hungarian team called “senkihaziak”, I managed to solve the following challenge for 300 points in the grab bag category on the 20th DEF CON Capture The Flag contest. The description consisted of an IP address, a port number, a password, and a hint.

Description of the challenge

Connecting with netcat to the specified IP address and port using TCP and sending the password followed by a newline triggered the server to send back the actual challenge, utilizing ANSI escape sequences for colors.

Output of netcat after connecting and sending the password

As Buherátor pointed out, the matrices are parts of a scheme designed to hide PIN codes in random matrices in which only the cardholder knows which digits are part of the PIN code. The service sent three matrices for which the PIN code was known and the challenge was to find the PIN code for the fourth one. As we hoped, the positions of the digits within the matrices were the same for all four, so all we needed to do was to find a set of valid positions for each matrix, and apply their intersection to the fourth. I chose Python for the task, and began with connecting to the service.

import socket
from contextlib import closing

PW = '5fd78efc6620f6\n'
TARGET = ('140.197.217.85', 10435)
PROMPT = 'Enter ATM PIN:'

def main():
  with closing(socket.socket()) as s:
    s.connect(TARGET)
    s.send(PW)
    buf = ''
    while PROMPT not in buf:
      buf += s.recv(4096)
    pin = buffer2pin(buf)
    s.send(pin + '\n')

The buffer2pin function parses the response of the service and returns the digits of the PIN code, separated with spaces. First, the ANSI escape sequences are stripped from the input buffer. Then, the remaining contents are split into an array of lines (buf.split('\n')), trailing and leading whitespace gets stripped (imap(str.strip, ...)), and finally, lines that don't contain a single digit surrounded with spaces get filtered out.

import re
from itertools import imap

ESCAPE_RE = re.compile('\x1b\\[0;[0-9]+;[0-9]+m')
INTERESTING_RE = re.compile(' [0-9] ')

def buffer2pin(buf):
  buf = ESCAPE_RE.sub('', buf)
  buf = filter(INTERESTING_RE.search, imap(str.strip, buf.split('\n')))
  ...
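
The cleanup step can also be tried in isolation. The following is a minimal Python 3 sketch of the same idea, not the script from the contest: the function name interesting_lines and the sample buffer are made up for illustration, while the two regular expressions are the ones shown above.

```python
import re

# Same patterns as above, written as raw strings for Python 3.
ESCAPE_RE = re.compile(r'\x1b\[0;[0-9]+;[0-9]+m')
INTERESTING_RE = re.compile(r' [0-9] ')


def interesting_lines(buf):
    """Strip ANSI color codes, then keep only lines with a lone digit in them."""
    clean = ESCAPE_RE.sub('', buf)
    stripped = map(str.strip, clean.split('\n'))
    return [line for line in stripped if INTERESTING_RE.search(line)]


# Hypothetical sample resembling the service output.
sample = ('\x1b[0;37;40m 3 5 8 4 1 2 \x1b[0;32;40m\n'
          'some banner\n'
          ' User entered: 4 5 2 7 \n')
print(interesting_lines(sample))
```

The banner line is dropped because it contains no digit surrounded by spaces, which is the only criterion the filter applies.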

By now, buf contains strings like '3 5 8 4 1 2' and 'User entered: 4 5 2 7', so it's time to build the sets of valid positions. The initial sets contain all valid positions, and later, these sets get updated with an intersection operation. For each example (a matrix with a valid PIN code) the script joins the six lines of the matrix and removes all spaces. This results in base holding 36 digits as a string. Finally, the inner for loop iterates over the four digits in the last line of the current example (User entered: 4 5 2 7) and finds all occurrences in the matrix. The resulting list of positions is intersected with the set of valid positions for the current digit (sets[n]). I know that using regular expressions for this purpose is a little bit of an overkill, but it's the least evil of the available solutions.

from itertools import ifilter

EXAMPLES = 3
DIGITS = 4
INIT_RANGE = range(36)

def buffer2pin(buf):
  ...
  sets = [set(INIT_RANGE) for _ in xrange(DIGITS)]
  for i in xrange(EXAMPLES):
    base = ''.join(buf[i * 7:i * 7 + 6]).replace(' ', '')
    # a separate name avoids shadowing the outer loop variable i
    for n, digit in enumerate(ifilter(str.isdigit, buf[i * 7 + 6])):
      sets[n].intersection_update(m.start() for m in re.finditer(digit, base))
  ...

The only thing that remains is to transform the fourth matrix into a 36 chars long string like the other three, and pick the digits of the resulting PIN code using the sets, which – hopefully – only contain one element each by now.

def buffer2pin(buf):
  ...
  quest = ''.join(buf[3 * 7:3 * 7 + 6]).replace(' ', '')
  return ' '.join(quest[digit.pop()] for digit in sets)
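
To see the intersection idea without the parsing around it, here's a small self-contained Python 3 sketch. It is an illustration under assumptions rather than the contest script: the names candidate_positions and read_pin are invented, the matrices are shortened to four cells, and the PIN to two digits.

```python
def candidate_positions(examples, digits=4):
    """Intersect, per PIN digit, the positions where that digit occurs.

    examples is a list of (matrix, pin) pairs: matrix is the flattened
    matrix as a string of digits, pin has one character per PIN digit.
    """
    size = len(examples[0][0])
    sets = [set(range(size)) for _ in range(digits)]
    for matrix, pin in examples:
        for n, digit in enumerate(pin):
            sets[n] &= {pos for pos, char in enumerate(matrix) if char == digit}
    return sets


def read_pin(sets, quest):
    """Pick the PIN digits from the unknown matrix using the positions."""
    if any(len(s) != 1 for s in sets):
        raise ValueError('positions are still ambiguous')
    return ' '.join(quest[next(iter(s))] for s in sets)


# Toy data: the hidden positions are 0 and 2.
examples = [('1213', '11'), ('3452', '35')]
sets = candidate_positions(examples, digits=2)
print(read_pin(sets, '9876'))
```

The first toy example alone leaves both digits ambiguous (1 occurs at positions 0 and 2), and only the second one narrows each set down to a single position, which is why the service had to send several solved matrices.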

The resulting script worked almost perfectly, but after the first run, we found out that after sending a correct PIN code, several more challenges were sent, so the whole logic had to be put in an outer loop. The final script can be found on Gist, and it produced the following output, resulting in 300 points.

Result of a successful run, displaying the key


Extracting DB schema migration from Redmine

2012-04-21

Although I consider keeping the SQL schema versioned a good habit, and several great solutions exist that automate the task of creating migration scripts to transform the schema of the database from version A to B, for most of my projects, I find it sufficient to record a hand-crafted piece of SQL in the project/issue log. For the latter, I mostly use Redmine, which offers a nice REST-style API for the issue tracker. Since it returns XML, I chose XSL to do the necessary transformations to extract the SQL statements stored in the issue logs.

For purposes of configuration, I chose something already in the system: Git, my choice of SCM solution. One can store hierarchical key-value pairs in a systemwide, user- or repository-specific way, all transparently accessible through a simple command line interface. For purposes of bridging the gap between Git and the XML/XSL, I chose shell scripting and xsltproc since producing a working prototype is only a matter of minutes.

The end product is a shell script that takes the Git-style history expression from the command line and passes it directly to the git log command, which in turn parses it just like the user would assume. The output is formatted so that it contains only the first lines of the commit messages in the specified range of commits. If the command fails, the original error message is shown, so the script doesn't need to know anything about Git commit range parsing or other internals.

GL=$(git log --pretty=format:"%s" --abbrev-commit "$@" 2>&1)

if [ $? -ne 0 ]; then
  echo "Git error occurred: $GL" 1>&2
  exit 1
fi

Since the HTML-formatted issue log messages are double-encoded in the API XML output, two rounds of XSL transformation are needed. The first round extracts log entries that probably contain SQL fragments, and with the output method set to text, it decodes the HTML entities embedded into the XML.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template match="/">
  <xsl:for-each
    select="issue/journals/journal/notes[contains(text(), 'sql')]">
   <xsl:value-of select="text()"/>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

The above XSL takes the output of the XML REST API and produces XHTML fragments for every issue log entry. The following part of the shell script extracts the issue numbers from the commit messages (egrep and sed), calls the issue API (curl) with each ID exactly once (sort -u), passes the output through the first XSL and concatenates the results inside an artificial XML root in order to produce well-formed XML, ready for the second pass.

echo '<?xml version="1.0" encoding="utf-8"?><fakeroot>'
echo "$GL" | egrep -o '#[0-9]+' | sort -u | sed 's/#//' \
  | while read ISSUE; do
    curl --silent "$BASE/issues/$ISSUE.xml?key=$KEY&include=journals" \
      | xsltproc "$DIR/notes.xsl" -
  done
echo '</fakeroot>'
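
The egrep, sort and sed part of the pipeline is easy to check on its own. Below is a rough Python equivalent, offered only as an illustration; the function name issue_ids and the sample commit subjects are invented.

```python
import re


def issue_ids(commit_subjects):
    """Return each '#123'-style issue number once, without the hash mark.

    The IDs stay strings and are sorted as text, mirroring what
    `sort -u` does in the shell pipeline.
    """
    return sorted(set(re.findall(r'#([0-9]+)', commit_subjects)))


subjects = 'Fix login form, closes #12\nAdd index, refs #3 and #12\nTypo fix'
print(issue_ids(subjects))
```

Deduplication matters here because several commits usually reference the same issue, and each API call costs a round trip to the Redmine server.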

The second pass extracts code tags with language set as sql, and the method is again set to text, causing a second expansion of HTML entities. The output of this final XSL transformation is a concatenation of SQL statements required to transform the database schema to be in sync with the commit range specified.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template match="/">
  <xsl:for-each select="fakeroot/pre/code[@class = 'sql']">
   <xsl:value-of select="normalize-space(text())"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

While this stylesheet follows almost the same logic as the first one, it's worth noting the usage of normalize-space() and a literal newline, which formats the output in a nice way – SQL fragments are separated from each other by a single newline, no matter if there's any trailing or leading whitespace present in the code. The code is available under the MIT license on GitHub.
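
The double decoding is perhaps easier to follow when both passes sit next to each other. The sketch below redoes the two passes with Python's xml.etree.ElementTree instead of xsltproc, so it's a swapped-in technique and not the published script; the function name sql_from_issues and the sample document are made up, and it assumes that the notes contain well-formed XHTML.

```python
import xml.etree.ElementTree as ET


def sql_from_issues(issue_documents):
    """Two-pass extraction of SQL from double-encoded journal notes."""
    fragments = []
    # First pass: parsing decodes the outer layer of entities, so
    # notes.text holds XHTML markup as plain text.
    for document in issue_documents:
        issue = ET.fromstring(document)
        for notes in issue.iterfind('journals/journal/notes'):
            text = notes.text or ''
            if 'sql' in text:
                fragments.append(text)
    # Second pass: wrap the fragments in an artificial root and parse again,
    # which decodes the entities that were inside the SQL itself.
    fakeroot = ET.fromstring('<fakeroot>' + ''.join(fragments) + '</fakeroot>')
    return [' '.join((code.text or '').split())  # like normalize-space()
            for code in fakeroot.iterfind("pre/code[@class='sql']")]


SAMPLE = (
    '<issue><journals>'
    '<journal><notes>&lt;pre&gt;&lt;code class="sql"&gt;ALTER TABLE foo\n'
    '   ADD bar INT;&lt;/code&gt;&lt;/pre&gt;</notes></journal>'
    '<journal><notes>&lt;p&gt;Nothing to migrate here.&lt;/p&gt;</notes></journal>'
    '<journal><notes>&lt;pre&gt;&lt;code class="sql"&gt;DELETE FROM foo '
    'WHERE bar &amp;lt; 3;&lt;/code&gt;&lt;/pre&gt;</notes></journal>'
    '</journals></issue>'
)
print('\n'.join(sql_from_issues([SAMPLE])))
```

Note how the less-than sign in the second statement is written as &amp;lt; in the API output: it takes both passes to turn it back into a plain < character.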


Unofficial Android app for alldatasheet.com

2012-04-17

In February 2012, I read the Hack a Day article about ElectroDroid, and the following remark triggered challenge accepted in my mind.

A ‘killer app’ for electronic reference tools would be a front end for
alldatasheet.com that includes the ability to search, save, and display
the datasheet for any imaginable component.

First, I checked whether any applications like that exist on the smartphone application markets. I found several applications of high quality but tied to certain chip vendors, such as Digi-Key and NXP. There's also one that claims to be an alldatasheet.com application; it even calls itself Datasheet (Alldatasheet.com), but as one commenter writes

All this app does is open a web browser to their website.
Nothing more. A bookmark can suffice.

I looked around the alldatasheet.com website and found the search to be rather easy. Although there's no API available, the HTML output can be easily parsed with the MIT-licensed jsoup library. First I tried to build a separate Java API for the site, and a separate Android UI, with the former having no dependencies on the Android library. The API can be found in the hu.vsza.adsapi package, and as of version 1.0, it offers two classes. The Search class has a method called searchByPartName which offers the functionality of the left form on the website. Here's an example:

List<Part> parts = Search.searchByPartName("ATMEGA168", Search.Mode.MATCH);

for (Part part : parts) {
    doSomethingWithPart(part);
}

The Part class has one useful method called getPdfConnection, which returns a URLConnection instance that can be used to read the PDF datasheet about the electronics part described by the object. It spoofs the User-Agent HTTP header and sends the appropriate Referer values wherever it's necessary to go through the process of downloading the PDF. This can be used like this:

URLConnection pdfConn = selectedPart.getPdfConnection();
pdfConn.connect();
InputStream input = new BufferedInputStream(pdfConn.getInputStream());
OutputStream output = new FileOutputStream(fileName);

byte[] data = new byte[1024];
int count; // number of bytes returned by the last read
while ((count = input.read(data)) != -1) output.write(data, 0, count);

output.flush();
output.close();
input.close();
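
For readers who'd like to experiment with the same two ideas outside Java, here's a hedged Python sketch of setting the spoofed headers and of the chunked copy loop. It doesn't use the hu.vsza.adsapi package; the function names, the example.invalid URLs and the User-Agent value are all placeholders, and no request is actually sent.

```python
import io
import urllib.request


def build_pdf_request(pdf_url, referer, user_agent='Mozilla/5.0'):
    """Prepare (but don't send) a request with spoofed headers."""
    return urllib.request.Request(
        pdf_url, headers={'User-Agent': user_agent, 'Referer': referer})


def copy_stream(source, target, chunk_size=1024):
    """Copy source to target in small chunks, returning the byte count."""
    total = 0
    while True:
        data = source.read(chunk_size)
        if not data:  # an empty read means end of stream
            break
        target.write(data)
        total += len(data)
    return total


# Placeholder URLs; a real download would pass the request to urlopen().
request = build_pdf_request('https://example.invalid/part.pdf',
                            'https://example.invalid/view')
payload = b'%PDF-' + b'x' * 2495
target = io.BytesIO()
print(copy_stream(io.BytesIO(payload), target))
```

Copying in fixed-size chunks keeps memory usage flat regardless of the size of the datasheet, which is the same reason the Java version reads into a 1024-byte buffer.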

The Android application built around this API displays a so-called Spinner (similar to combo lists on PCs) to select search mode and a text input to enter the part name, and a button to initiate search. Results are displayed in a list view displaying the name and the description of each part. Touching a part downloads the PDF to the SD card and opens it with the default PDF reader (or prompts for selection if more than one are installed).

ADSdroid version 1.0 screenshots

You can download version 1.0 by clicking on the version number link or using the QR code below. It only does one thing (search by part name), and even that functionality is experimental, so I'd be glad if anyone tried it and, in case of problems, contacted me by e-mail. The source code is available on GitHub, licensed under MIT.

ADSdroid version 1.0 QR code


Mounting Sympa shared directories with FUSE

2012-03-29

The database laboratory course at the Budapest University of Technology and Economics, which I collaborate with as a lecturer, uses Sympa for mailing lists and file sharing. The latter is not one of the most used features of this software, and the web interface feels sluggish, not to mention the many leftover files in my Downloads directory for each attempt to view one page of a certain file. I understood that using the same software for these two tasks made managing user accounts easier, so I tried to come up with a solution that makes it easier to handle these files with the existing setup.

First, I searched whether an API for Sympa exists and I found that while they created the Sympa SOAP server, it only handles common use-cases related to mailing lists management, so it can be considered a dead end. This meant that my solution had to use the web interface, so I selected an old and a new tool for the task: LXML for parsing, since I already knew of its power, and requests for handling HTTP, because of its fame. These two tools made it possible to create half of the solution first, resulting in a Sympa API that can be used independently of the file system bridge.

Two things I found particularly great about requests were that its handling of sessions was superior to that of any API I've ever seen, and that it was possible to retrieve the results in multiple formats (raw socket, bytes, Unicode text). Since I only had one Sympa installation to test with, I only hacked the code far enough to make it work, so for example, I had to use regular expressions to strip the XML and HTML encoding information, since both stated us-ascii while the output was in ISO-8859-2, correctly stated in the HTTP Content-type header.
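
The encoding workaround can be pictured roughly as follows. This is a standard-library sketch written for this explanation, not the code of SympaFS: the function name decode_page, the regular expressions and the sample page are assumptions, and the real project feeds the result to LXML.

```python
import re

# Declarations inside the document that may name the wrong encoding.
XML_DECL_RE = re.compile(rb'<\?xml[^>]*\?>')
META_CHARSET_RE = re.compile(rb'<meta[^>]*charset[^>]*>', re.IGNORECASE)
HEADER_CHARSET_RE = re.compile(r'charset=["\']?([\w.-]+)', re.IGNORECASE)


def decode_page(raw, content_type, default='utf-8'):
    """Trust the HTTP header, drop the in-document encoding claims."""
    match = HEADER_CHARSET_RE.search(content_type)
    charset = match.group(1) if match else default
    cleaned = META_CHARSET_RE.sub(b'', XML_DECL_RE.sub(b'', raw))
    return cleaned.decode(charset)


# Made-up page that claims us-ascii but is really ISO-8859-2.
page = ('<?xml version="1.0" encoding="us-ascii"?><html><head>'
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=us-ascii" />'
        '</head><body>összesen</body></html>').encode('iso-8859-2')
print(decode_page(page, 'text/html; charset=ISO-8859-2'))
```

Decoding first and handing Unicode text to the parser sidesteps the conflict, since a parser given bytes would otherwise have to pick between the header and the declarations.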

In the second half of the work, I had to create a bridge between the file system and the API I created, and FUSE was my natural choice. Choosing the Python binding was not so easy: as a Debian user, the python-fuse package seemed like a logical choice, but as Matt Joiner wrote in his answer to a related Stack Overflow question, fusepy was a better choice. Using one of the examples, I managed to build an experimental version of SympaFS with naive caching and session management, but it works!

$ mkdir /tmp/sympa
$ python sympafs.py https://foo.tld/lists foo@bar.tld adatlabor /tmp/sympa
Password:
$ mount | fgrep sympa
SympaFS on /tmp/sympa type fuse (rw,nosuid,nodev,relatime,user_id=1000,
group_id=1000)
$ ls -l /tmp/sympa/2012
összesen 0
-r-xr-xr-x 1 root root  11776 febr   9 00:00 CensoredFile1.doc
-r-xr-xr-x 1 root root 161792 febr  22 00:00 CensoredFile2.xls
-r-xr-xr-x 1 root root  39424 febr   9 00:00 CensoredFile3.doc
dr-xr-xr-x 2 root root      0 febr  14 00:00 CensoredDir1
dr-xr-xr-x 2 root root      0 ápr    4  2011 CensoredDir2
$ file /tmp/sympa/2012/CensoredFile1.doc
Composite Document File V2 Document, Little Endian, Os: Windows, Version
5.1, Code page: 1252, Author: Censored, Last Saved By: User, Name of
Creating Application: Microsoft Excel, Last Printed: Tue Feb 14 15:00:39
2012, Create Time/Date: Wed Feb  8 21:51:10 2012, Last Saved Time/Date:
Wed Feb 22 08:10:20 2012, Security: 0
$ fusermount -u /tmp/sympa

Tracking history of docx files with Git

2012-03-27

Just as with PHP, OOXML, and specifically docx, is not my favorite format, but when I use it, I prefer tracking the history using my SCM of choice, Git. What makes it perfect for tracking documents is not only the fact that setting up a repository takes one command and a few milliseconds, but its ability to use an external program to transform artifacts (files) to text before displaying differences, which results in meaningful diffs.

The process of setting up an environment like this is described best in Chapter 7.2 of Pro Git. The solution I found best to convert docx files to plain text was docx2txt, especially since it's available as a Debian package in the official repositories, so it takes only an apt-get install docx2txt to have it installed on a Debian/Ubuntu box.

The only problem was that Git executes the text conversion program with the name of the input file given as the first and only argument, and docx2txt (in contrast with catdoc or antiword, which use the standard output) saves the text content of foo.docx in foo.txt. Because of this, I needed to create a wrapper in the form of the following small shell script.

#!/bin/sh
docx2txt <"$1"

That being done, the only thing left to do is configuring Git to use this wrapper for docx files by issuing the following commands in the root of the repository.

$ git config diff.docx.textconv /path/to/wrapper.sh
$ echo "*.docx diff=docx" >>.git/info/attributes
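
To illustrate what Git expects from a textconv program (one file name as the only argument, plain text on the standard output), here's a minimal Python sketch. It's not docx2txt and not a replacement for it: it only concatenates the text runs of each paragraph in word/document.xml, ignores everything else in the document, and the function name docx_to_text is made up.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML elements (w:p, w:t).
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def docx_to_text(path_or_file):
    """Return the paragraph text of a docx file, one paragraph per line."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read('word/document.xml'))
    lines = []
    for paragraph in root.iter(W + 'p'):
        lines.append(''.join(run.text or '' for run in paragraph.iter(W + 't')))
    return '\n'.join(lines)


# Git calls the textconv program as: program /path/to/file.docx
if __name__ == '__main__' and len(sys.argv) == 2:
    sys.stdout.write(docx_to_text(sys.argv[1]) + '\n')
```

A script like this could be referenced by diff.docx.textconv directly, without a wrapper, since it already writes to the standard output.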

End-to-end secure REST service using CakePHP

2012-03-14

While PHP is not my favorite language and platform of choice, I have to admit its ease of deployment, and that's one of the reasons I've used it to build some of my web-related projects, including the REST API and the PNG output of HackSense, and even the homepage of my company. Some of these also used CakePHP, which tries to provide the flexibility and “frameworkyness” of Ruby on Rails while keeping it easy to deploy. It also has the capability of simple and rapid REST API development, which I often prefer to the bloatedness of SOAP.

One of the standardized non-functional services of SOAP is WS-Security, and while it's great for authentication and end-to-end signed messages, its encryption scheme not only has a big overhead, but it was also broken in 2011, thus it cannot be considered secure. That being said, I wanted a solution that can be applied to a REST API, does not waste resources (e.g. spawning OS processes per HTTP call), and uses as much existing code as feasible.

The solution I came up with is a new layout for CakePHP that uses the GnuPG module of PHP, which in turn uses the native GnuPG library. This also means that the keyring of the user running the web server has to be used. Also, Debian (and thus Ubuntu) doesn't ship this module as a package, so it needs to be compiled, but it's no big deal. Here's what I did:

# apt-get install libgpgme11-dev php5-dev
# wget http://pecl.php.net/get/gnupg-1.3.2.tgz
# tar -xvzf gnupg-1.3.2.tgz
# cd gnupg-1.3.2
# phpize && ./configure && make && make install
# echo "extension=gnupg.so" >/etc/php5/conf.d/gnupg.ini
# /etc/init.d/apache2 reload

These versions made sense in February 2012, so make sure that libgpgme, PHP and the PHP GnuPG module refer to the latest version available. After the last command has executed successfully, PHP scripts should be able to make use of the GnuPG package. I crafted the following layout in views/layouts/gpg.ctp:

<?php

$gpg = new gnupg();
$gpg->addencryptkey(Configure::read('Gpg.enckey'));
$gpg->addsignkey(Configure::read('Gpg.sigkey'));
$gpg->setarmor(0);
$out = $gpg->encryptsign($content_for_layout);
header('Content-Length: ' . strlen($out));
header('Content-Type: application/octet-stream');
print $out;

?>

By using Configure::read($key), the keys used for signing and encryption can be stored away from the code. I put the following two lines in config/core.php:

Configure::write('Gpg.enckey', 'ID of the recipient\'s public key');
Configure::write('Gpg.sigkey', 'Fingerprint of the signing key');

And at last, actions that require this security layer only need a single line in the controller code (e.g. controllers/foo_controller.php):

$this->layout = 'gpg';

Make sure to set this as close to the beginning of the function as you can to avoid leaking error messages to attackers triggering errors in the code before the layout is set to the secured one.

And that's it, the layout makes sure that all information sent from the view is protected both from interception and modification. During testing, I favored armored output, and only disabled it after moving the code to production. If armoring is needed, only two lines need modification: setarmor(0) should be setarmor(1) and the Content-Type should be set to text/plain. Have fun!


Reverse engineering Chinese scope with USB

2012-03-04

The members of H.A.C.K. – one of the less wealthy hackerspaces – felt happy at first, when the place could afford to buy a slightly used UNI-T UT2025B digital storage oscilloscope. Besides being useful as a part of the infrastructure, having a USB and an RS-232 port seized our imagination – one of the interesting use-cases is the ability to capture screenshots from the device to illustrate documentation. As I tried interfacing the device, I found that supporting multiple platforms meant Windows XP and 2000 for the developers, which are not very common in the place.

I installed the original software in a virtual machine, and tried the serial port first, but found out that although most of the functionality worked, taking screenshots was available only over USB. I connected the scope using USB next, and although the vendor-product tuple was present in the list of USB IDs, so lsusb could identify it, no drivers in the kernel tried to take control of the device. So I started looking for USB sniffing software and found that on Linux, Wireshark is capable of doing just that. I forwarded the USB device into the VM and captured a screenshot transmission for analysis. Wireshark was very handy during analysis as well – just like in case of TCP/IP – so it was easy to spot the multi-kilobyte bulk transfer among tiny 64 byte long control packets.

Wireshark analysis of screenshot transmission via USB

I started looking for simple ways to reproduce the exact same conversation using free software – I've used libusb before while experimenting with V-USB on the Free USB JTAG interface project, but C requires compilation, and adding things like image processing makes the final product harder to use on other computers. For these purposes, I usually choose Python, and as it turned out, the PyUSB library makes it possible to access libusb 0.1, libusb 1.0 and OpenUSB through a single pythonic layer. Using this knowledge, it was pretty straightforward to modify their getting started example and replicate the “PC end” of the conversation. The core of the resulting code is the following.

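import sys
import usb.core

# ReqType and Endpoint are not part of PyUSB, they hold constants
# defined elsewhere in the script
dev = usb.core.find(idVendor=0x5656, idProduct=0x0832)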
if dev is None:
    print >>sys.stderr, 'USB device cannot be found, check connection'
    sys.exit(1)

dev.set_configuration()
dev.ctrl_transfer(ReqType.CTRL_OUT, 177, 0x2C, 0)
dev.ctrl_transfer(ReqType.CTRL_IN, 178, 0, 0, 8)
for i in [0xF0] + [0x2C] * 10 + [0xCC] * 10 + [0xE2]:
    dev.ctrl_transfer(ReqType.CTRL_OUT, 177, i, 0)

try:
    dev.ctrl_transfer(ReqType.CTRL_OUT, 176, 0, 38)
    for bufsize in [8192] * 4 + [6144]:
        buf = dev.read(Endpoint.BULK_IN, bufsize, 0)
        buf.tofile(sys.stdout)
    dev.ctrl_transfer(ReqType.CTRL_OUT, 177, 0xF1, 0)
except usb.core.USBError:
    print >>sys.stderr, 'Image transfer error, try again'
    sys.exit(1)

Using this, I managed to get a binary dump of 38912 bytes, which contained the precious screenshot. From my experience with the original software, I already knew that the resolution was 320 by 240 pixels – which meant that 4 bits made up each pixel. Using this information, I started generating bitmaps from the binary dump in the hope of identifying some patterns visually, as I already knew what was on the screen. The first images came from converting each 4-bit value to a pixel coloured on a linear scale from 0 = black to 15 = white, and looked like the following.

Early version of a converted screenshot

Most of the elements looked like they were in the right spot, and both horizontal and vertical lines seemed intact, apart from the corners. Also, the linear mapping resulted in an overly bright image, and as it seemed, the firmware was transmitting 4-bit (16 color) images, even though the device only had a monochrome LCD – and the Windows software downgraded the quality on purpose before displaying it on the PC. After some fiddling, I figured out that the pixels were transmitted in 16-bit words, and the order of the pixels inside these was 3, 4, 1, 2 (“mixed endian”). After I added code to compensate for this and created a more readable color mapping, I finally had a script that could produce colorful PNGs out of the BLOBs, see below for an example.

Final version of a converted screenshot
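
The pixel reordering can be sketched in a few lines of Python 3. This is an illustration under assumptions, not the published script: it assumes that each 16-bit word arrives as two consecutive bytes with the high nibble first, so that reading the nibbles naively yields pixels 3, 4, 1, 2, and the function names are invented. For reference, 320 × 240 pixels at 4 bits each take 38400 bytes, so the 38912-byte dump holds 512 bytes more than the raw pixel data; the sketch simply treats whatever it's given as pixels.

```python
def unpack_pixels(dump):
    """Turn a byte string into 4-bit pixel values in display order."""
    pixels = []
    for i in range(0, len(dump) - 1, 2):
        first, second = dump[i], dump[i + 1]
        # Naive nibble order is 3, 4, 1, 2, so emit the second byte first.
        pixels.extend((second >> 4, second & 0x0F, first >> 4, first & 0x0F))
    return pixels


def to_gray(value):
    """Linear mapping of a 4-bit value to 0..255 (0 = black, 15 = white)."""
    return value * 17


word = bytes([0x34, 0x12])  # nibbles 3, 4, 1, 2 as they arrive
print(unpack_pixels(word))
print([to_gray(v) for v in unpack_pixels(word)])
```

Under the byte layout assumed here, the reordering amounts to swapping the two bytes of every word before splitting them into nibbles.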

In the end, my solution is not only free in both senses of the word and runs on more platforms, but can also capture 8 times as many colors as the original one. All code is published under the MIT license, and further contributions are welcome both on the GitHub repository and the H.A.C.K. wiki page. I also gave a talk about the project in Hungarian; the video recording and the slides can be found at the bottom of the wiki page.




CC BY-SA