eulcore.binfile
– Map binary data to Python objects¶
Map binary data on-disk to read-only Python objects.
This module facilitates exposing stored binary data using common Pythonic
idioms. Fields in relocatable binary objects map to Python attributes using
a priori knowledge about how the binary structure is organized. This is akin
to the standard struct
module, but with some slightly different use
cases. struct
, for instance, offers a more terse syntax, which is
handy for certain simple structures. struct
is also a bit faster
since it’s implemented in C. This module’s more verbose
BinaryStructure
definitions give it a few
advantages over struct
, though:
- This module allows users to define their own field types, where
struct
field types are basically inextensible.- The object-based nature of
BinaryStructure
makes it easy to add non-structural properties and methods to subclasses, which would require a bit of reimplementing and wrapping from astruct
tuple.BinaryStructure
instances access fields through named properties instead of indexed tuples.struct
tuples are fine for structures a few fields long, but when a packed binary structure grows to dozens of fields, navigating itsstruct
tuple grows perilous.BinaryStructure
unpacks fields only when they’re accessed, allowing us to define libraries of structures scores of fields long, understanding that any particular application might access only one or two of them.- Fields in a
BinaryStructure
can overlap eachother, greatly simplifying both C unions and fields with multiple interpretations (integer/string, signed/unsigned).- This module makes sparse structures easy. If you’re reverse-engineering a large binary structure and discover a 4-byte integer in the middle of 68 bytes of unidentified mess, this module makes it easy to add an
IntegerField
at a known structure offset.struct
requires you to split your'68x'
into a'32xI32x'
(or was that a'30xi34x'
? Better recount.)
- This package exports the following names:
BinaryStructure
– a base class for binary data structuresByteField
– a field that maps fixed-length binary data to Python stringsLengthPrependedStringField
– a field that maps variable-length binary strings to Python stringsIntegerField
– a field that maps fixed-length binary data to Python numbers
General Usage¶
Suppose we have an 8-byte file whose binary data consists of the bytes 0, 1, 2, 3, etc.:
>>> with open('numbers.bin') as f:
... f.read()
...
'\x00\x01\x02\x03\x04\x05\x06\x07'
Suppose further that these contents represent sensible binary data, laid out such that the first two bytes are a literal string value. Except that sometimes, in the binary format we’re parsing, it might sometimes be necessary to interpret those first two bytes not as a literal string, but instead as a number, encoded as a big-endian unsigned integer. Following that is a variable-length string, encoded with the total string length in the third byte.
This structure might be represented as:
from eulcommon.binfile import *
class MyObject(BinaryStructure):
mybytes = ByteField(0, 2)
myint = IntegerField(0, 2)
mystring = LengthPrepededStringField(2)
Client code might then read data from that file:
>>> f = open('numbers.bin')
>>> obj = MyObject(f)
>>> obj.mybytes
'\x00\x01'
>>> obj.myint
1
>>> obj.mystring
'\x03\x04'
It’s not uncommon for such binary structures to be repeated at different points within a file. Consider if we overlay the same structure on the same file, but starting at byte 1 instead of byte 0:
>>> f = open('numbers.bin')
>>> obj = MyObject(f, offset=1)
>>> obj.mybytes
'\x01\x02'
>>> obj.myint
258
>>> obj.mystring
'\x04\x05\x06'
BinaryStructure
¶
-
class
eulcommon.binfile.
BinaryStructure
(fobj=None, mm=None, offset=0)¶ A superclass for binary data structures superimposed over files.
Typical users will create a subclass containing field objects (e.g.,
ByteField
,IntegerField
). Each subclass instance is created with a file and with an optional offset into that file. When code accesses fields on the instance, they are calculated from the underlying binary file data.Instead of a file, it is occasionally appropriate to overlay an
mmap
structure (from themmap
standard library). This happens most often when oneBinaryStructure
instance creates another, passingself.mmap
to the secondary object’s constructor. In this case, the caller may specify the mm argument instead of an fobj.Parameters: - fobj – a file object or filename to overlay
- mm – a
mmap
object to overlay - offset – the offset into the file where the structured data begins
Field classes¶
-
class
eulcommon.binfile.
ByteField
(start, end)¶ A field mapping fixed-length binary data to Python strings.
Parameters: - start – The offset into the structure of the beginning of the byte data.
- end – The offset into the structure of the end of the byte data.
This is actually one past the last byte of data, so a four-byte
ByteField
starting at index 4 would be defined asByteField(4, 8)
and would include bytes 4, 5, 6, and 7 of the binary structure.
Typical users will create a ByteField inside a
BinaryStructure
subclass definition:class MyObject(BinaryStructure): myfield = ByteField(0, 4) # the first 4 bytes of the file
When you instantiate the subclass and access the field, its value will be the literal bytes at that location in the structure:
>>> o = MyObject('file.bin') >>> o.myfield 'ABCD'
-
class
eulcommon.binfile.
LengthPrependedStringField
(offset)¶ A field mapping variable-length binary strings to Python strings.
This field accesses strings encoded with their length in their first byte and string data following that byte.
Parameters: offset – The offset of the single-byte string length. Typical users will create a
LengthPrependedStringField
inside aBinaryStructure
subclass definition:class MyObject(BinaryStructure): myfield = LengthPrependedStringField(0)
When you instantiate the subclass and access the field, its length will be read from that location in the structure, and its data will be the bytes immediately following it. So with a file whose first bytes are
'\x04ABCD'
:>>> o = MyObject('file.bin') >>> o.myfield 'ABCD'
-
class
eulcommon.binfile.
IntegerField
(start, end)¶ A field mapping fixed-length binary data to Python numbers.
This field accessses arbitrary-length integers encoded as binary data. Currently only big-endian, unsigned integers are supported.
Parameters: - start – The offset into the structure of the beginning of the byte data.
- end – The offset into the structure of the end of the byte data.
This is actually one past the last byte of data, so a four-byte
IntegerField
starting at index 4 would be defined asIntegerField(4, 8)
and would include bytes 4, 5, 6, and 7 of the binary structure.
Typical users will create an IntegerField inside a
BinaryStructure
subclass definition:class MyObject(BinaryStructure): myfield = IntegerField(3, 6) # integer encoded in bytes 3, 4, 5
When you instantiate the subclass and access the field, its value will be big-endian unsigned integer encoded at that location in the structure. So with a file whose bytes 3, 4, and 5 are
'\x00\x01\x04'
:>>> o = MyObject('file.bin') >>> o.myfield 260