- Memory efficiency:
  - Low memory overhead for the objects
  - Generator-based serialization & deserialization
- Serialization schema:
  - Define data schema with classes and property-like decorators
  - Use (multi-) inheritance to extend, modify and combine schemas
- User interface:
  - Detailed error report for data that don't match the defined schema
To minimize possible confusion in the description below, we define the following vocabulary:
- serializable class: a class inheriting from the `Serializable` class
- serializable object: an instance of a serializable class
- serializable property: a `SerializableAttribute`, `SerializableChildObject` or `SerializableTextContent` property of a serializable class
- key: the key in the serialized format, that is, the tag in XML or the key in JSON/YAML
The key class is the `Serializable` class. Inherit from this class to create a
schema for serializable objects. Use `SerializableAttribute`,
`SerializableChildObject` and `SerializableTextContent` to describe the
attributes, child objects and text content of the object.
A simple example:
from serializer import *

class Animal(Serializable):
    type = SerializableAttribute(required=True)
    description = SerializableTextContent()

class Zoo(Serializable):
    animal = SerializableChildObject(Animal, required=True, multiple=True)
Use `deserialize_xml` to deserialize an XML file with a serializable class
(`deserialize_json` and `deserialize_yaml` will be added later). This function
takes three arguments: the first is a file-like object to deserialize from,
the second is the root key, and the third is a factory function that creates
serializable objects (this can be the serializable class itself).
from StringIO import StringIO
zoo = deserialize_xml(StringIO('''
<zoo>
<animal type="cat">The cat, often referred to as the domestic cat to
distinguish from other felids and felines, is a small, typically
furry, carnivorous mammal.</animal>
<animal type="dog">The domestic dog is a member of the genus Canis,
which forms part of the wolf-like canids, and is the most widely
abundant terrestrial carnivore.</animal>
</zoo>
'''), 'zoo', Zoo)
After deserialization, the serializable object `zoo` has everything stored in
it. All serializable properties can be accessed just like normal properties.
for animal in zoo.animal:
    print animal.type
# output
# > cat
# > dog
In addition to deserializing from a file, serializable objects can also be created and modified in the Python program.
cow = Animal(type='cow')
cow.description = ("Cattle-colloquially cows-are the most common type of large "
                   "domesticated ungulates.")
zoo.animal.append(cow)
Note the pitfall concerning uninitialized properties, described below.
Finally, use `serialize_xml` to serialize to an XML file (`serialize_json` and
`serialize_yaml` will be added later). This function also takes three
arguments: the first is a file-like object to serialize to, the second is the
root key, and the third is a serializable object. An optional argument,
`pretty`, enables pretty printing and is set to `False` by default.
outstream = StringIO()
serialize_xml(outstream, 'zoo', zoo, pretty=True)
print outstream.getvalue()
# output
# > <zoo>
# > <animal type="cat">The cat, often referred to as the domestic cat to
# > distinguish from other felids and felines, is a small, typically
# > furry, carnivorous mammal.</animal>
# > <animal type="dog">The domestic dog is a member of the genus Canis,
# > which forms part of the wolf-like canids, and is the most widely
# > abundant terrestrial carnivore.</animal>
# > <animal type="cow">Cattle-colloquially cows-are the most common type
# > of large domesticated ungulates.</animal>
# > </zoo>
Serializable classes can be subclassed to extend or combine schemas. A serializable property can also be overridden with a regular property or method in a subclass; lookup simply follows Python's MRO.
class AnimalCategory(Animal):
    @property
    def description(self):  # remove the text content from the schema
        raise NotImplementedError
    subtype = SerializableChildObject(Animal, required=True, multiple=True)

class BetterZoo(Zoo):
    category = SerializableChildObject(AnimalCategory, required=True,
                                       multiple=True)

zoo2 = deserialize_xml(StringIO('<zoo><category type="mammal">'
    '<subtype type="human">Humans (taxonomically, Homo sapiens) are the only '
    'extant members of the subtribe Hominina.</subtype>'
    '<subtype type="whale">Whales are a widely distributed and diverse group '
    'of fully aquatic placental marine mammals.</subtype>'
    '</category><animal type="fish">Fish are gill-bearing aquatic craniate '
    'animals that lack limbs with digits.</animal></zoo>'),
    'zoo', BetterZoo)

for category in zoo2.category:
    for subtype in category.subtype:
        print subtype.type
for animal in zoo2.animal:
    print animal.type
# output
# > human
# > whale
# > fish
Sometimes we want the key and the property name of a serializable property to
be different. For example, the key might be a reserved keyword in Python. Use
the `key` argument of `SerializableAttribute`, `SerializableChildObject` and
`SerializableTextContent` to specify a different key.
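The motivation can be sketched without the library. The snippet below is a minimal, self-contained illustration rather than the library's implementation: `KEY_TO_PROPERTY`, `PlainAnimal` and `load_attributes` are hypothetical names, and the `class` attribute in the sample XML is made up for the example. It shows why a key such as `class` needs a differently named property.

```python
import keyword
import xml.etree.ElementTree as ET

# Hypothetical mapping from serialized key to Python property name.
KEY_TO_PROPERTY = {'class': 'klass', 'type': 'type'}


class PlainAnimal(object):
    """Plain stand-in object; not a serializable class from the library."""


def load_attributes(element, obj):
    """Copy XML attributes onto obj under their mapped property names."""
    for key, value in element.attrib.items():
        setattr(obj, KEY_TO_PROPERTY[key], value)
    return obj


element = ET.fromstring('<animal class="mammal" type="cat"/>')
animal = load_attributes(element, PlainAnimal())

# `class` is a reserved keyword, so `animal.class` would be a syntax error.
print(keyword.iskeyword('class'))
print(animal.klass)
```

With the library itself, the equivalent declaration would presumably be along the lines of `klass = SerializableAttribute(key='class')`, mirroring the `key='feature'` usage in the `DetailedAnimal` example.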
`SerializableAttribute`, `SerializableChildObject` and
`SerializableTextContent` can also be used just like Python's built-in
`property` decorator. In addition, XML, JSON and YAML only support a limited
number of basic data types, but we often want more specific types of data.
For this purpose, `SerializableAttribute` and `SerializableTextContent` also
have `deserializer` and `serializer` methods, which can be used to convert
between custom data types and basic data types.
class DetailedAnimal(Animal):
    features = SerializableAttribute(required=False, key='feature')

    @features.deserializer
    def features(s):
        return s.split()

    @features.serializer
    def features(v):
        return ' '.join(v)

animal = deserialize_xml(StringIO('<animal type="bird" feature="fly sing">'
    'Birds, also known as Aves, are a group of endothermic vertebrates, '
    'characterised by feathers, toothless beaked jaws, the laying of '
    'hard-shelled eggs, a high metabolic rate, a four-chambered heart, and a '
    'strong yet lightweight skeleton.</animal>'), 'animal', DetailedAnimal)

for feature in animal.features:
    print feature
# output
# > fly
# > sing
It's also possible to ask the serializer to ignore a serializable property.
Return `IGNORE` in the `serializer` method to do so.
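The idea can be illustrated with a small stand-alone sketch. It does not use the library: the `IGNORE` object here is a hypothetical stand-in for the library's sentinel, and `serialize_features` and `build_animal` are names invented for the example. The point is only that a serializer returning the sentinel causes the attribute to be left out of the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the library's IGNORE sentinel.
IGNORE = object()


def serialize_features(features):
    """Join the feature list, or ask for the attribute to be skipped if empty."""
    if not features:
        return IGNORE
    return ' '.join(features)


def build_animal(type_, features):
    """Build an <animal> element, omitting attributes whose value is IGNORE."""
    element = ET.Element('animal')
    pairs = (('type', type_), ('feature', serialize_features(features)))
    for key, value in pairs:
        if value is not IGNORE:
            element.set(key, value)
    return element


bird_xml = ET.tostring(build_animal('bird', ['fly', 'sing']), encoding='unicode')
fish_xml = ET.tostring(build_animal('fish', []), encoding='unicode')
print(bird_xml)  # carries both the type and the feature attribute
print(fish_xml)  # carries only the type attribute
```

A dedicated sentinel, rather than `None` or an empty string, keeps "skip this property" distinct from any value that might legitimately be serialized.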
`SerializableAttribute`, `SerializableChildObject` and
`SerializableTextContent` properties are uninitialized when a serializable
object is created, unless:
- a `default` value is set when defining the property
- the property is set with an argument passed to `__init__`
- the property is deserialized from a file

If a property is uninitialized, accessing it raises a
`SerializableAttributeError`. To avoid complex logic for checking whether a
serializable property is initialized, always define a `default` value, or set
the property in the inherited `__init__`.
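The behaviour described above resembles a Python descriptor that refuses to return a value until one has been supplied. The following is a minimal sketch under that assumption, not the library's code: `SketchProperty`, `UninitializedError` and `SketchAnimal` are hypothetical names, and `UninitializedError` merely stands in for `SerializableAttributeError`.

```python
class UninitializedError(Exception):
    """Hypothetical stand-in for the library's SerializableAttributeError."""


_MISSING = object()  # marks "no default was given"


class SketchProperty(object):
    """Data descriptor that raises until a value or a default is available."""

    def __init__(self, name, default=_MISSING):
        self.name = name
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = obj.__dict__.get(self.name, self.default)
        if value is _MISSING:
            raise UninitializedError('%s is uninitialized' % self.name)
        return value

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class SketchAnimal(object):
    type = SketchProperty('type')                          # no default
    description = SketchProperty('description', default='')

    def __init__(self, **kwargs):
        # Properties passed to __init__ become initialized.
        for key, value in kwargs.items():
            setattr(self, key, value)


cow = SketchAnimal(type='cow')
print(cow.type)               # set through __init__
print(repr(cow.description))  # falls back to the default

anonymous = SketchAnimal()
try:
    anonymous.type
    raised = False
except UninitializedError:
    raised = True
print(raised)
```

Giving every property a default, as `description` has here, removes the need for `try`/`except` blocks around ordinary property access.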