VSzA techblog

Extracting DB schema migration from Redmine


Although I consider keeping the SQL schema versioned a good habit, and several great solutions exist that automate creating migration scripts to transform a database schema from version A to version B, for most of my projects I find it sufficient to record a hand-crafted piece of SQL in the project/issue log. For the latter, I mostly use Redmine, which offers a nice REST-style API for its issue tracker. Since it returns XML, I chose XSL to do the transformations necessary to extract the SQL statements stored in the issue logs.

For configuration, I chose something already in the system: Git, my SCM of choice. It can store hierarchical key-value pairs in a system-wide, user- or repository-specific way, all transparently accessible through a simple command-line interface. For bridging the gap between Git and XML/XSL, I chose shell scripting and xsltproc, since producing a working prototype that way is only a matter of minutes.
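For illustration, the Redmine base URL and API key used later by the script could live in Git's config store. The key names below (redmine.url, redmine.apikey) are made up for this sketch, and an example config file is used instead of the usual global or per-repository store:

```shell
# Hypothetical key names; in practice these would go into the user's
# global config or the repository's own .git/config.
git config --file example.cfg redmine.url "https://redmine.example.com"
git config --file example.cfg redmine.apikey "0123456789abcdef"

# The script can read them back the same way:
BASE=$(git config --file example.cfg redmine.url)
KEY=$(git config --file example.cfg redmine.apikey)
echo "$BASE/issues.xml?key=$KEY"
```

Keeping these values out of the script itself means the same script works against different Redmine instances per repository.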

The end product is a shell script that takes a Git-style history expression from the command line and passes it directly to the git log command, which parses it just as the user would expect. The output format is restricted so that only the first line of each commit message in the specified range is printed. If the command fails, its original error message is shown, so the script doesn't need to know anything about Git commit range parsing or other internals.

GL=$(git log --pretty=format:"%s" --abbrev-commit "$@" 2>&1)

if [ $? -ne 0 ]; then
  echo "Git error occurred: $GL" 1>&2
  exit 1
fi

Since the HTML-formatted issue log messages are double-encoded in the API's XML output, two rounds of XSL transformation need to be done. The first round extracts the log entries that probably contain SQL fragments, and with the output method set to text, it also decodes the HTML entities embedded in the XML.
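To illustrate the double encoding, a simplified, hand-written journal entry in the API output would look something like this, with the HTML markup of the note escaped inside the XML:

```xml
<issue>
 <journals type="array">
  <journal id="1">
   <notes>&lt;pre&gt;&lt;code class="sql"&gt;ALTER TABLE users
ADD COLUMN karma INT;&lt;/code&gt;&lt;/pre&gt;</notes>
  </journal>
 </journals>
</issue>
```

With the output method set to text, the first pass emits the decoded contents of the notes element, turning the escaped markup back into real pre/code tags.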

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template match="/">
  <xsl:for-each
    select="issue/journals/journal/notes[contains(text(), 'sql')]">
   <xsl:value-of select="text()"/>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

The above XSL takes the output of the XML REST API and produces XHTML fragments for every issue log entry. The following part of the shell script extracts the issue numbers from the commit messages (egrep and sed), calls the issue API (curl) with each ID exactly once (sort -u), passes the output through the first XSL, and concatenates the results along with an artificial/fake XML root element in order to produce well-formed XML, ready for the second pass.

echo '<?xml version="1.0" encoding="utf-8"?><fakeroot>'
echo "$GL" | egrep -o '#[0-9]+' | sort -u | sed 's/#//' \
  | while read ISSUE; do
    curl --silent "$BASE/issues/$ISSUE.xml?key=$KEY&include=journals" \
      | xsltproc "$DIR/notes.xsl" -
  done
echo '</fakeroot>'

The second pass extracts code tags whose language is set to sql; the output method is again text, causing a second expansion of HTML entities. The output of this final XSL transformation is the concatenation of the SQL statements required to bring the database schema in sync with the specified commit range.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template match="/">
  <xsl:for-each select="fakeroot/pre/code[@class = 'sql']">
   <xsl:value-of select="normalize-space(text())"/>
   <xsl:text>
</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

While this stylesheet follows almost the same logic as the first one, it's worth noting the use of normalize-space() and a literal newline, which format the output in a nice way – SQL fragments are separated from each other by a single newline, regardless of any leading or trailing whitespace present in the code. The code is available under the MIT license on GitHub.
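The effect of normalize-space() can be sketched in shell as well (illustration only, not part of the script): runs of whitespace collapse into single spaces and both ends are trimmed.

```shell
# Rough shell equivalent of XPath's normalize-space():
# squeeze all whitespace into single spaces, then trim both ends.
echo '   ALTER TABLE users
   ADD COLUMN karma INT;   ' | tr -s ' \t\n' ' ' | sed 's/^ //;s/ $//'
```

This prints the statement on a single clean line, just like the stylesheet does before appending its separating newline.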

Org-mode to RSS and custom HTML


Org-mode is one of the many outliner solutions I've seen, and I prefer it because of its slogan, "Your Life in Plain Text". It allows me to keep track of my life in the form of notes, lists and plans using only a text editor. As the name suggests, it has its origins in Emacs, and these files can be exported to a number of formats. My problem was that I found no easy way to customize the HTML output, and I wanted a solution that'd allow me to generate both an HTML page and an RSS feed from one of my .org files.

My first hack was a really rudimentary solution that tied ugly regular expressions and sed together in a shell script. It had many problems: first of all, it depended heavily on the formatting of the document; even small deviations would've made it fail. It also required less-readable constructs to achieve things that are trivial in any XML-friendly environment, for instance closing a list when there are no remaining items, but before closing the whole document.
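A minimal, made-up sample of the kind of .org file this targets – top-level headlines become sections, second-level link entries become list items, and DONE marks finished ones:

```org
* Articles
** DONE [[https://example.com/foo][Foo, already published]]
** [[https://example.com/bar][Bar, still in progress]]
```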

rm -f $OUTFILE
... HTML header ...
N=1
while read LINE; do
    OUT=$(echo "$LINE" | sed \
        -e 's/^[^*].*$//' \
        -e 's/^\*\ \(.*\)$/<h2>\1<\/h2>/' \
        -e 's/^\*\*\ DONE\ \[\[\([^]]*\)\]\[\([^]]*\)\]\]/<li class="done"><a href="\1">\2<\/a><\/li>/' \
        -e 's/^\*\*\ \[\[\([^]]*\)\]\[\([^]]*\)\]\]/<li><a href="\1">\2<\/a><\/li>/')
    echo "$OUT" | grep -v '/h2' >/dev/null 2>&1 || [ $N -eq 1 ] || echo "</ul>" >>$OUTFILE
    N=$[$N + 1]
    echo "$OUT" >>$OUTFILE
    echo "$OUT" | grep -v '/h2' >/dev/null 2>&1 || echo "<ul>" >>$OUTFILE
done <$INFILE
echo "</ul></body></html>" >>$OUTFILE

I thought of playing around with third-party org-mode parsers, such as Orgnode, a Python one I had used and improved, but that would've been a compromise between a clean solution involving external org parsing and the simple but rude shell script, having few of the pros but the cons of both. In the end I took a look at the export options of org-mode and found that it's capable of creating DocBook output. I hadn't used DocBook before, I'd only heard of it from vmiklos, but it turned out to be an XML-based document markup language.
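For reference, the DocBook output for such an entry looks roughly like this (simplified by hand; the real exporter emits more metadata, but the element paths match what the stylesheets select):

```xml
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:xlink="http://www.w3.org/1999/xlink">
 <section>
  <section>
   <title><link xlink:href="https://example.com/bar">Bar, still in progress</link></title>
  </section>
 </section>
</article>
```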

The remaining task was transforming XML to XML, and XSLT is the most powerful tool for that. I created two stylesheets, one for XHTML output and one for RSS; they worked almost instantly and produced the same (or better) output as the shell script. I also found a bug/feature in the DocBook exporter: it converts URL-quoted special characters to their literal equivalents (such as %C3%B6 to ö), which may cause incompatibilities in some browsers and also makes the RSS invalid.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
  <xsl:output method="xml" />
  <xsl:param name="lbd" />
  <xsl:template match="/">
    <rss version="2.0">
      <channel>
        <lastBuildDate><xsl:value-of select="$lbd" /></lastBuildDate>
        <xsl:for-each select="db:article/db:section/db:section/db:title">
          <xsl:if test="not(contains(text(), 'DONE'))">
            <item>
              <link><xsl:value-of select="db:link/@xlink:href" /></link>
              <guid><xsl:value-of select="db:link/@xlink:href" /></guid>
              <description><xsl:value-of select="db:link" /></description>
              <title><xsl:value-of select="db:link" /></title>
            </item>
          </xsl:if>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>

RSS needs the build date in RFC 2822 format, which is much easier to produce in shell than in some weird XSLT way. I used xsltproc, which allows parameters to be passed on the command line, and the date command can print the date in just the right format. This way the RSS can be published using the following command.

$ xsltproc --stringparam lbd "$(date --rfc-2822)" \
    rss.xsl docbook.xml >feed.rss

Proudly powered by Utterson