eulcommon.binfile.eudora – Eudora email index files

Map binary email table of contents files for the Eudora mail client to Python objects.

The Eudora email client has a long history through the early years of email, with versions for early Mac systems as well as early Windows OSes. Unfortunately, most of those versions use binary file formats that are entirely incompatible with one another. This module aims eventually to read all of them, but for now it focuses on the files saved by one particular version on mid-90s Mac System 7.

That Eudora version stores email in flat (non-hierarchical) folders. It stores each folder’s email data in a single file akin to a Unix mbox file, but with some key differences, described below. In addition to this folder data file, each folder also stores a binary “table of contents” index. In this version, a folder called In stores its index in a file called In.toc. This file consists of a fixed-size binary header with folder metadata, followed by fixed-size binary email records containing cached email header metadata as well as the location of the full email in the mbox-like data file. As the contents of the folder are updated, these fixed-size binary email records are added, removed, and reordered, apparently compacting the file as necessary so that it matches the folder contents displayed to the application end user.
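The header-plus-records layout described above can be sketched with the standard struct module. This is a minimal illustration only: the field names, field sizes, and big-endian byte order below are assumptions invented for the example, not the actual In.toc layout, which is defined by the Toc and Message classes.

```python
import io
import struct

# Hypothetical layout, for illustration only. The real header and record
# sizes and fields differ from these.
HEADER = struct.Struct(">32s")    # folder name, NUL-padded
RECORD = struct.Struct(">II32s")  # data-file offset, size, subject (NUL-padded)


def read_index(fobj):
    """Return (folder_name, records) from a toy fixed-size index file."""
    header = fobj.read(HEADER.size)
    if len(header) < HEADER.size:
        raise ValueError("truncated header")
    (raw_name,) = HEADER.unpack(header)
    name = raw_name.rstrip(b"\x00").decode("ascii")

    records = []
    while True:
        chunk = fobj.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of file (or a trailing partial record)
        offset, size, raw_subject = RECORD.unpack(chunk)
        subject = raw_subject.rstrip(b"\x00").decode("ascii")
        records.append((offset, size, subject))
    return name, records


def build_sample():
    """Build an in-memory toy index with one header and two records."""
    data = HEADER.pack(b"In")
    data += RECORD.pack(120, 40, b"Second message")
    data += RECORD.pack(0, 50, b"First message")
    return io.BytesIO(data)


name, records = read_index(build_sample())
```

Because every record has the same size, the position of the n-th record can be computed directly as HEADER.size + n * RECORD.size, which is what makes adding, removing, and reordering records in such a file cheap.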

Because the index dictates the order of the emails and points to their contents, the physical order and layout of emails inside the data storage file matter less. When emails are deleted from a folder, the index is updated, but they are not removed immediately from the data file. Instead that data space is marked as inactive and might be reused later when a new email is added to the folder. As a result, the folder data file may contain stale and out-of-order data and thus cannot be read directly as a standard mbox file.
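To make the consequence concrete, here is a small sketch of reading a data file by following index entries rather than scanning it sequentially. The iter_messages helper and the (offset, size) pairs are hypothetical stand-ins for whatever the real index records provide.

```python
import io


def iter_messages(data_fobj, records):
    """Yield raw message bytes in index order.

    `records` is an iterable of (offset, size) pairs taken from the index.
    """
    for offset, size in records:
        data_fobj.seek(offset)
        yield data_fobj.read(size)


# Toy data file: a stale (deleted) message sits between two live ones,
# and the live ones are stored out of display order.
data = b"BBBB" + b"stale" + b"AAA"
index = [(9, 3), (0, 4)]  # display order: A first, then B

messages = list(iter_messages(io.BytesIO(data), index))
```

A sequential mbox-style scan of the same bytes would return the messages in storage order and include the stale region, whereas following the index skips it.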

This module, then, provides classes for parsing the binary structures of the index file and mapping them to Python objects. The index file format has changed many times across Eudora versions. Only one format is represented in this module, though it could certainly be expanded to support more. Parsers and information about other versions of the index file are available at http://eudora2unix.sourceforge.net/ and http://users.starpower.net/ksimler/eudora/toc.html; these were immensely helpful in reverse-engineering the version represented by this module.

This module exports the following names:
class eulcommon.binfile.eudora.Message(fobj=None, mm=None, offset=0)

A BinaryStructure for a single email’s metadata cached in the index file.

Only a few fields are currently represented; other fields contain interesting data but have not yet been reverse-engineered.

class eulcommon.binfile.eudora.Toc(fobj=None, mm=None, offset=0)

A BinaryStructure for an email folder index header.

Only a few fields are currently represented; other fields contain interesting data but have not yet been reverse-engineered.

messages

A generator yielding the Message structures in the index, in the order they are stored there.