eulcore.binfile – Map binary data to Python objects¶
Map binary data on-disk to read-only Python objects.
This module facilitates exposing stored binary data using common Pythonic
idioms. Fields in relocatable binary objects map to Python attributes using
a priori knowledge about how the binary structure is organized. This is akin
to the standard struct module, but with some slightly different use
cases. struct, for instance, offers a more terse syntax, which is
handy for certain simple structures. struct is also a bit faster
since it’s implemented in C. This module’s more verbose
BinaryStructure definitions give it a few
advantages over struct, though:
- This module allows users to define their own field types, where
structfield types are basically inextensible.- The object-based nature of
BinaryStructuremakes it easy to add non-structural properties and methods to subclasses, which would require a bit of reimplementing and wrapping from astructtuple.BinaryStructureinstances access fields through named properties instead of indexed tuples.structtuples are fine for structures a few fields long, but when a packed binary structure grows to dozens of fields, navigating itsstructtuple grows perilous.BinaryStructureunpacks fields only when they’re accessed, allowing us to define libraries of structures scores of fields long, understanding that any particular application might access only one or two of them.- Fields in a
BinaryStructurecan overlap eachother, greatly simplifying both C unions and fields with multiple interpretations (integer/string, signed/unsigned).- This module makes sparse structures easy. If you’re reverse-engineering a large binary structure and discover a 4-byte integer in the middle of 68 bytes of unidentified mess, this module makes it easy to add an
IntegerFieldat a known structure offset.structrequires you to split your'68x'into a'32xI32x'(or was that a'30xi34x'? Better recount.)
- This package exports the following names:
BinaryStructure– a base class for binary data structuresByteField– a field that maps fixed-length binary data to Python stringsLengthPrependedStringField– a field that maps variable-length binary strings to Python stringsIntegerField– a field that maps fixed-length binary data to Python numbers
General Usage¶
Suppose we have an 8-byte file whose binary data consists of the bytes 0, 1, 2, 3, etc.:
>>> with open('numbers.bin') as f:
... f.read()
...
'\x00\x01\x02\x03\x04\x05\x06\x07'
Suppose further that these contents represent sensible binary data, laid out such that the first two bytes are a literal string value. Except that sometimes, in the binary format we’re parsing, it might sometimes be necessary to interpret those first two bytes not as a literal string, but instead as a number, encoded as a big-endian unsigned integer. Following that is a variable-length string, encoded with the total string length in the third byte.
This structure might be represented as:
from eulcommon.binfile import *
class MyObject(BinaryStructure):
mybytes = ByteField(0, 2)
myint = IntegerField(0, 2)
mystring = LengthPrepededStringField(2)
Client code might then read data from that file:
>>> f = open('numbers.bin')
>>> obj = MyObject(f)
>>> obj.mybytes
'\x00\x01'
>>> obj.myint
1
>>> obj.mystring
'\x03\x04'
It’s not uncommon for such binary structures to be repeated at different points within a file. Consider if we overlay the same structure on the same file, but starting at byte 1 instead of byte 0:
>>> f = open('numbers.bin')
>>> obj = MyObject(f, offset=1)
>>> obj.mybytes
'\x01\x02'
>>> obj.myint
258
>>> obj.mystring
'\x04\x05\x06'
BinaryStructure¶
-
class
eulcommon.binfile.BinaryStructure(fobj=None, mm=None, offset=0)¶ A superclass for binary data structures superimposed over files.
Typical users will create a subclass containing field objects (e.g.,
ByteField,IntegerField). Each subclass instance is created with a file and with an optional offset into that file. When code accesses fields on the instance, they are calculated from the underlying binary file data.Instead of a file, it is occasionally appropriate to overlay an
mmapstructure (from themmapstandard library). This happens most often when oneBinaryStructureinstance creates another, passingself.mmapto the secondary object’s constructor. In this case, the caller may specify the mm argument instead of an fobj.Parameters: - fobj – a file object or filename to overlay
- mm – a
mmapobject to overlay - offset – the offset into the file where the structured data begins
Field classes¶
-
class
eulcommon.binfile.ByteField(start, end)¶ A field mapping fixed-length binary data to Python strings.
Parameters: - start – The offset into the structure of the beginning of the byte data.
- end – The offset into the structure of the end of the byte data.
This is actually one past the last byte of data, so a four-byte
ByteFieldstarting at index 4 would be defined asByteField(4, 8)and would include bytes 4, 5, 6, and 7 of the binary structure.
Typical users will create a ByteField inside a
BinaryStructuresubclass definition:class MyObject(BinaryStructure): myfield = ByteField(0, 4) # the first 4 bytes of the file
When you instantiate the subclass and access the field, its value will be the literal bytes at that location in the structure:
>>> o = MyObject('file.bin') >>> o.myfield 'ABCD'
-
class
eulcommon.binfile.LengthPrependedStringField(offset)¶ A field mapping variable-length binary strings to Python strings.
This field accesses strings encoded with their length in their first byte and string data following that byte.
Parameters: offset – The offset of the single-byte string length. Typical users will create a
LengthPrependedStringFieldinside aBinaryStructuresubclass definition:class MyObject(BinaryStructure): myfield = LengthPrependedStringField(0)
When you instantiate the subclass and access the field, its length will be read from that location in the structure, and its data will be the bytes immediately following it. So with a file whose first bytes are
'\x04ABCD':>>> o = MyObject('file.bin') >>> o.myfield 'ABCD'
-
class
eulcommon.binfile.IntegerField(start, end)¶ A field mapping fixed-length binary data to Python numbers.
This field accessses arbitrary-length integers encoded as binary data. Currently only big-endian, unsigned integers are supported.
Parameters: - start – The offset into the structure of the beginning of the byte data.
- end – The offset into the structure of the end of the byte data.
This is actually one past the last byte of data, so a four-byte
IntegerFieldstarting at index 4 would be defined asIntegerField(4, 8)and would include bytes 4, 5, 6, and 7 of the binary structure.
Typical users will create an IntegerField inside a
BinaryStructuresubclass definition:class MyObject(BinaryStructure): myfield = IntegerField(3, 6) # integer encoded in bytes 3, 4, 5
When you instantiate the subclass and access the field, its value will be big-endian unsigned integer encoded at that location in the structure. So with a file whose bytes 3, 4, and 5 are
'\x00\x01\x04':>>> o = MyObject('file.bin') >>> o.myfield 260