VSzA techblog

Single mbox outbox vs. multiple IMAP accounts

2013-04-01

As I've mentioned in February 2013, I started using mutt in December 2012 and as a transitional state, I've been using my three IMAP accounts in on-line mode, like I did with KMail. All outgoing mail got recorded in an mbox file called ~/Mail/Sent for all three accounts, which was not intentional, but a configuration glitch at first. But now I realized that it has two positive side effects when I'm using cellular Internet connection. Since this way, the MUA doesn't upload the message using IMAP to the Sent folder, resulting in 50% less data sent, which makes sending mail faster and saves precious megabytes in my mobile data plan.

However, I still prefer having my sent mail present in the Sent folder of my IMAP accounts, so I needed a solution to transfer the contents of an mbox file to IMAP folders based on the From field. I preferred Python for the task as the standard library had support for both IMAP and mbox out of the box, and I've already had good experience with the former. Many solutions I found used Python as well, but none of them had support for multiple IMAP accounts and many used deprecated classes, or treated the process as a one-shot operation, while I planned to use this to upload my mbox regularly to IMAP.

So I decided to write a simple script, which I completed in about an hour or two that did exactly what I need, and still had no dependencies to anything that's not part of the standard library. The script has support for invocation from other modules and the command line as well, core functionality was implemented in the process_mbox method of the OutboxSyncer class. The method gets the Mailbox object and a reference for a database as parameters, latter is used to ensure that all messages are uploaded exactly once, even in case of exceptions or parallel invocations.

for key, msg in mbox.iteritems():
    account, date_time = msg.get_from().split(' ', 1)
    contents = mbox.get_string(key)
    msg_hash = HASH_ALGO(contents).hexdigest()
    params = (msg_hash, account)

The built-in iterator of the mailbox is used to iterate through messages in a memory-efficient way. Both key and msg are needed as former is needed to obtain the raw message as a byte string (contents), while latter makes parsed data, such as the sender (account) and the timestamp (date_time) accessible. The contents of the message is hashed (currently using SHA-256) to get a unique identifier for database storage. In the last line, params is instantiated for later usage in parameterized database queries.

with db:
    cur.execute(
        'SELECT COUNT(*) FROM messages WHERE hash = ? AND account = ?',
        params)
    ((count,),) = cur.fetchall()
    if count == 0:
        cur.execute('INSERT INTO messages (hash, account) VALUES (?, ?)',
            params)
    else:
        continue

By using the context manager of the database object, checking whether the message free for processing and locking it is done in a single transaction, resulting in a ROLLBACK in case an exception gets thrown and in a COMMIT otherwise. Assigning the variable count was done this way to assert that the result has a single row with a single column. If the message is locked or has already been uploaded, the mailbox iterator is advanced without further processing using continue.

try:
    acc_cfg = accounts[account]
    imap = self.get_imap_connection(account, acc_cfg)
    response, _ = imap.append(acc_cfg['folder'], r'\Seen',
            parsedate(date_time), contents)
    assert response == 'OK'

After the message is locked for processing, it gets uploaded to the IMAP account into the folder specified in the configuration. The class has a get_imap_connection method that calls the appropriate imaplib constructors and takes care of connection pooling to avoid connection and disconnection for every message processed. The return value of the IMAP server is checked to avoid silent fail.

except:
    with db:
        cur.execute('DELETE FROM messages WHERE hash = ? AND account = ?',
            params)
    raise
else:
    print('Appended', msg_hash, 'to', account)
    with db:
        cur.execute(
            'UPDATE messages SET success = 1 WHERE hash = ? AND account = ?',
            params)

In case of errors, the message lock gets released and the exception is re-raised to stop the process. Otherwise, the success flag is set to 1, and processing continues with the next message. Source code is available in my GitHub repository under MIT license, feel free to fork and send pull requests or comment on the code there.

permalink


next posts >
< prev post

CC BY-SA RSS Export
Proudly powered by Utterson