One of Elektra's core goals is low memory usage. Currently, there are many places within Elektra where keys and keysets are duplicated and copied around. Most of those copied keys are never modified, but are required to be detached from the lifetime of the original instance. We want to introduce an in-memory copy-on-write mechanism to lower our memory usage.
In the near future, Elektra will also gain facilities for change tracking and session recording, both of which will potentially again duplicate keys. There are also aspirations to create a new, simple internal cache that would also benefit from a copy-on-write mechanism.
- The lifetime of a
Key
and aKeySet
must be unaffected by copy-on-write.
There is already some kind of copy-on-write semantics within libelektra-core
to support the mmapstorage
plugin.
We can build on this and add a more generic copy-on-write to it.
Key * key = keyNew ("dir:/something", KEY_VALUE, "my value", KEY_END);
keyCopy (key_dup, key, ELEKTRA_CP_COW);
assert (keyString(key) == keyString(key_dup));
keySetString (key_dup, "other value"); // COW done here
assert (keyString(key) != keyString(key_dup));
assert (keySetName (key_dup, "dir:/valid") == -1); // must fail, as we have a COW key
assert (keyName(key) == keyName(key_dup)); // stays always valid
This is already implemented for the MMAP cache, so the implementation should be straightforward: Do the same COW duplications as done for MMAP but with a different flag.
For the metadata, however, also COW KeySets might be needed (at least with the current API). Example:
keyCopy (cow, key, ELEKTRA_CP_COW);
KeySet * cowMeta = keyMeta (cow);
ksAppendKey (cowMeta, keyNew ("meta:/whatever", KEY_VALUE, "abc", KEY_END));
ksRemoveByName (cowMeta, "meta:/type");
Pros:
- Elektra doesn't require MMAP
Cons:
-
Lifetime of a copied COW key MUST be less than the key it was copied from. We can not track how many keys point to the same data and name this way, so we can only free data and name if the key does not have the COW flag. If the original key gets deleted, using a COW key that points to the same data and name will lead to corrupt data. The same is true for updating values of the original key.
This is only problematic if we want to use COW for keys outside of
KDB
. If it is only for use withinKDB
, especially for usage as internal cache and in change tracking, we could always guarantee that the original keys are going to last as long as theKDB
instance. However, we need to document for the users of Elektra that keys returned fromkdbGet
are only valid untilkdbClose
. If they want to continue using them afterwards, they'd have to deep copy them.Triggering the delete problem:
Key * originalKey; Key * copiedKey; keyCopy (copiedKey, originalKey, ELEKTRA_CP_COW); assert (keyString (copiedKey) == keyString (originalKey)); assert (keyName (copiedKey) == keyName (originalKey)); keyDel (originalKey); keyString (copiedKey); // Error! Original value has been deleted. Pointer to data in copiedKey points to freed memory keyName (copiedKey); // Error! Original name has been deleted.
Triggering the update problem:
Key * originalKey; Key * copiedKey; keyCopy (copiedKey, originalKey, ELEKTRA_CP_COW); assert (keyString(copiedKey) == keyString(originalKey)); keySetString (originalKey, "new value!"); keyString(copiedKey); // Error! Original value has been deleted. Pointer to data in copiedKey points to potentially freed memory
The same problems in principle exist for
mmapstorage
wherekdbSet
frees (munmap
) the keyset. We can still let users access the flagELEKTRA_CP_COW
, we just need to clearly document what is forbidden. Maybe set theKEY_FLAG_RO_VALUE
on the original key, so that the API itself detects the error. There is, however, no flag forkeyDel
that we could set.
The struct _Key
will be extended with two more pointers, if we want to eliminate the lifetime problem:
struct _Key * origData
: points to the key containing the referenced datastruct _Key * origName
: points to the key containing the referenced name
Those two pointers are needed for memory management.
Each referenced key will also have its reference counter increased.
This way, an original key can be keyDel()
d without impacting the copied keys.
Three new key flags will be added:
KEY_FLAG_COW_VALUE
: the value points to a value of another keyKEY_FLAG_COW_NAME
: the name points to the name of another keyKEY_FLAG_COW_META
: metakeys are copy-on-write
A new copy flag will be added:
KEY_CP_COW
: instructskeyCopy
to copy whatever it should copy as copy-on-write. This will NOT be part ofKEY_CP_ALL
. We don't want developers outside of Elektra to accidentally use this.
If keyCopy()
is instructed to do a copy-on-write copy:
-
dest->data.v
anddest->data.c
point to the exact same location as in the source.dest->dataSize
is set to the same value assource->dataSize
.KEY_FLAG_COW_VALUE
is set ondest->flags
.KEY_FLAG_RO_VALUE
is set onsource->flags
.dest->originalData
is set tosource
.source->refs
is incremented. -
dest->key
points tosource->key
.dest->keySize
is set to the same value assource->keySize
.dest->ukey
points tosource->ukey
.dest->keyUSize
is set to the same value assource->keyUSize
.KEY_FLAG_COW_NAME
is set ondest->flags
.KEY_FLAG_RO_NAME
is set onsource->flags
.dest->originalName
is set tosource
.source->refs
is incremented. -
dest->meta
points to a new keyset. The keys indest->meta
are also copied withKEY_CP_COW
, i.e. they are also copy-on-write keys.KEY_FLAG_COW_META
is set ondest->flags
.KEY_FLAG_RO_META
is set onsource->flags
.
The source key will remain as a read-only key. This constraint is needed, because the source key is the only key we can free the resources on. If the data or the name would change in the source key, all COW-copied keys would suddenly have another value. For the same reason, the source key will need to live longer than all COW-copied keys from it.
A keyCopy()
without KEY_CP_COW
from an COW key will create a deep copy of the key.
These keys are "normal" non-COW keys and can live on their own.
Every key*()
method that modifies data on a COW-copied key will need to allocate new memory for this data and remove the KEY_FLAG_COW_DATA
flag.
Every key*()
method that modifies the name of a COW-copied key will need to allocate new memory for this name and remove the KEY_FLAG_COW_NAME
flag.
Every key*()
method that modifies metadata needs to make sure that the same happens for metakeys.
Keysets are not copy-on-write.
A ksDeepDup()
of a keyset with COW keys will create a keyset with deep-copied non-COW keys.
Internally we may need a ksCowDup()
function to create a keyset with copy-on-write keys from another keyset.
Whether this function will be part of the public API is a point for discussion.
Make Elektra's Key
and KeySet
data structures copy-on-write.
This requires some major refactoring of code within libelektra-core
.
Code that does only interact with the data structures via the public libelektra-core
API should not notice any differences.
The mmapstorage
plugin needs to be updated.
Unlike "mmapstorage-like COW implementation" keyDup, keyCopy, ksCopy and ksDup will always use COW.
ksCopy
and ksDup
is needed for (de)duplication of metadata.
Furthermore, the API has better usability if Key and KeySet behave the same, especially for bindings where duplication might happen implicitly.
For the Key
, we need to extract everything for the data and name into their own structs.
This is done for memory-management reasons, as we need to track how many keys point to the same data and/or name.
struct _KeyData {
union {
char * c;
void * v;
} data;
size_t dataSize;
uint16_t refs;
};
struct _KeyName {
char * key;
size_t keySize;
char * ukey;
size_t keyUSize;
uint16_t refs;
};
struct _Key {
struct _KeyData * keyData;
struct _KeyName * keyName;
KeySet * meta;
keyflag_t flags;
uint16_t refs;
};
@mpranj's thoughts regarding moving name and data to separate structures:
If they [key name and data] are a separate entity,
mmapstorage
will need a flag once again for each of those. This is used to mark whether the data is in an mmap region or not. (or we find some bit somewhere that we can steal for this purpose)Adding more indirections is probably not going to help performance. (I understand that we save memory here)
For KeySet
, we need to split out everything to do with the stored keys into a separate datastructure.
This includes the array itself, the sizes and the hashmap.
Why don't we just add the number of references to the original KeySet
?
- If we delete a copied KeySet, we don't know which KeySet is the original, so we couldn't decrement the counter. This could be dealt with storing a pointer to the original KeySet.
- If the original KeySet is deleted, we don't know which other KeySets point at the data, so updating their count would not work
- In similar fashion, if you update the original KeySet, the copied KeySets will also contain the new data (if the memory address does not change). This is unexpected behavior.
struct _KeySetData {
struct _Key ** array;
size_t size; /**< Number of keys contained in the KeySet */
size_t alloc; /**< Allocated size of array */
Opmphm * opmphm;
OpmphmPredictor * opmphmPredictor;
uint16_t refs; /**< Reference counter */
};
struct _KeySet {
struct _KeySetData * data;
ksflag_t flags;
uint16_t refs; /**< Reference counter */
};
We need reference counting for the internal COW datastructures.
We do it the same way reference counting currently works for Key
and KeySet
.
One tweak though is that the refcount should never be 0, as this does not make sense for internal datastructures.
This means we always increment the refcount after creation and always decrement before deletion, so that the refcount is never zero. An example implementation is shown below:
static void keySetValue(Key * key, void * value, size_t size) {
// [...] removal of current value from key
struct _KeyData data = keyDataNew (value, size);
keyDataIncRef (data);
key->data = data;
}
static void keyDel(Key * key) {
keyDataDecRef (key->data);
keyDataDel (key->data);
// [...] other cleanup
}
Instead of using different structs for _KeyData
, and _KeyName
use a more generic struct for reference counting.
This would avoid some duplication on the reference counting code for the key.
Keysets will still have their own data struct, as it contains more than just a pointer and a size.
typedef struct {
void * data;
size_t size;
uint16_t refs;
} RcBuffer;
struct _Key {
RcBuffer * uname;
RcBuffer * ename; // will be removed soon
RcBuffer * value;
KeySet * meta;
keyflag_t flags;
uint16_t refs;
};
In general, it should be possible to always do copy-on-write. From a users perspective, copy-on-write copies of a key (and a keyset) should behave the same. There is, however, one edge case: the user modifying the value of a key directly. This is shown in the following example:
Key * key;
struct foo myFoo = {
.x = 0
};
keySetBinary (key, &myFoo, sizeof(myFoo));
Key * dup = keyDup (key);
((struct foo *)keyValue (key))->x = 1;
// with COW
assert (((struct foo *)keyValue (dup))->x == 1);
// without COW
assert (((struct foo *)keyValue (dup))->x == 0);
This edge case can be accounted for by providing a private function keyDetach
, that forces that the key has its very own copy of the data.
((struct foo *)keyValue (keyDetach(key)))->x = 1;
// with COW
assert (((struct foo *)keyValue (dup))->x == 0);
// without COW
assert (((struct foo *)keyValue (dup))->x == 0);
If we do change the internal data structures it makes much more sense to fix the cache and mmapstorage afterwards (or in tandem). The most important constraint for mmap is that any structure (or bytes) that is an allocation unit (e.g. we malloc() the bytes needed for KeySet struct, so this is an unit) needs to have a flag to determine whether those bytes are actually malloc()ed or they are mmap()ed. Thus all the newly added structures as proposed will need some kind of an mmap flag.
mmapstorage
only calls munmap
in some error cases, so basically munmap
is almost never done and the keyset is never invalidated.
During kdbSet
the storage plugins always write to a temp file, due to how the resolver works.
We also don't need to mmap the temp file here: when doing kdbSet
we already have the KeySet
at hand, mmap-ing it is not needed at all, because we have the data.
We just want to update the cache file.
The mmap
/munmap
in kdbSet are just so we can write the KeySet to a file in our format.
(mmap()
is just simpler, but we could also malloc()
a region and then fwrite()
the stuff)
Therefore the only case where we return a mmap()
ed KeySet should be in kdbGet
.
When the mmapstorage
was designed/implemented, not all structures had refcounters, so there was no way to know when a munmap
is safe.
This was simply out of scope at that point in time.
If refcounting is now implemented for all structures, we might be able to properly munmap
in future.
Two ideas to deal with this in conjunction with our reference counting implementation:
If we have free
function-pointer along side the refcount, mmapstorage
(and also other plugins with different allocators) could set it to their own implementation.
To mimic the current behavior of mmapstorage
this would point to a no-op function.
However, we could also improve things and keep track of when all data has been freed and only then call munmap
.
Another simpler way to avoid the flag, which doesn't really allow for further improvements, would be using the refcount.
mmapstorage
could set the refcount to a value that is otherwise illegal.
This would allow us to detect the keys.
Depending on the refcount implementation good values would probably be 0 or UINT16_MAX.
The special value would have to ignored by all refcounting functions (inc, dec, del) and turn the functions into no-ops.
-
This approach requires more allocations than previously. We have not fully benchmarked whether this is a big issue. One optimization could be an expanding "pool" of
_KeySetData
,_KeyData
and_KeyName
. We could then allocate multiple of them at the same time, and borrow and give back instance from and to the pool. -
Embed the
KeySet * meta
directly instruct _Key
. This may help with performance in cases we need metadata. It will, however, increase memory usage. This should only be considered after some benchmarking shows this is a real issue.
The following calculations are based on the AMD64 platform. All results are in bytes unless stated otherwise.
Example key: user:/hosts/ipv6/example.com = 2606:2800:220:1:248:1893:25c8:1946
We want to measure the following properties for the key:
- Empty Key: size of a simple malloc of the key struct
- Empty Key (with name): size of simple malloc of all structs, so that the key has a name, but without including the size of the name
- Empty Key (with name + data): size of a simple malloc of all structs, so that the key has a name and data, but without including the size of the name or data
- Single Example Key: a single instance of the key defined above
- Example Key + 1 Duplicate: two instances of the key defined above, one of them is a duplication of the first
- Example Key + 2 Duplicates: three instances of the key defined above, two of them are duplications of the first
Approach | Empty Key | Empty Key (with name) | Empty Key (with name + data) | Single Example Key | Example Key + 1 Duplicate | Example Key + 2 Duplicates |
---|---|---|---|---|---|---|
Current Implementation | 64 | 64 | 64 | 153 | 306 | 459 |
mmapstorage-like COW implementation (without additional pointers) | 64 | 64 | 64 | 153 | 217 | 281 |
mmapstorage-like COW implementation (with additional pointers) | 80 | 80 | 80 | 169 | 249 | 329 |
Full-blown COW implementation | 32 | 72 | 96 | 185 | 217 | 249 |
Full-blown COW implementation - Variant 1 (RcBuffer) | 40 | 88 | 112 | 201 | 241 | 281 |
We want to measure the following properties for the keyset:
- Empty KeySet: size of a simple malloc of the keyset struct
- Empty KeySet (with data): size of a simple malloc of all structs
- Example KeySet: size of a keyset with 15 keys + NULL byte
- Example KeySet + 1 Duplicate: two instance of Example KeySet, one of them is a duplication
- Example KeySet + 2 Duplicates: three instances of Example KeySet, two of them are duplications
Approach | Empty KeySet | Empty KeySet (with data) | Example KeySet | Example KeySet + 1 Duplicate | Example KeySet + 2 Duplicates |
---|---|---|---|---|---|
Current Implementation | 64 | 64 | 192 | 384 | 576 |
mmapstorage-like COW implementation (without additional pointers) | 64 | 64 | 192 | 384 | 576 |
mmapstorage-like COW implementation (with additional pointers) | 64 | 64 | 192 | 384 | 576 |
Full-blown COW implementation | 16 | 64 | 192 | 208 | 224 |
Raw data size:
- keyname :
28 + 1
=29
- unescaped keyname (measured):
25
- data:
34 + 1
=35
Current Implementation:
-
Empty Key [measured via
sizeof
]:64
-
Empty Key (with name):
64
-
Empty Key (with name + data):
64
-
Single Example Key =
Empty Key + keyname + unescaped keyname + data
=64 + 29 + 25 + 35
=153
-
Single Example Key + 1 Duplicate =
Single Example Key * 2
=153 * 2
=306
-
Single Example Key + 2 Duplicates =
Single Example Key * 3
=153 * 3
=459
-
Empty KeySet [measured via
sizeof
]:64
-
Empty KeySet (with data):
64
-
Example KeySet:
Empty KeySet (with data) + 16 * pointer to keys
=64 + 16 * 8
=192
-
Example KeySet + 1 Duplicate:
Example KeySet * 2
=192 * 2
=384
-
Example KeySet + 2 Duplicates:
Example KeySet * 3
=192 * 3
=576
mmapstorage-like COW implementation (without additional pointers):
-
Empty Key [measured via
sizeof
]:64
-
Empty Key (with name):
64
-
Empty Key (with name + data):
64
-
Single Example Key =
Empty Key + keyname + unescaped keyname + data
=64 + 29 + 25 + 35
=153
-
Single Example Key + 1 Duplicate =
Single Example Key + Empty Key
=153 + 64
=217
-
Single Example Key + 2 Duplicates =
Single Example Key + Empty Key * 2
=153 + 64 * 2
=281
-
KeySets are not COW in this approach --> same as current implementation
mmapstorage-like COW implementation (with additional pointers):
-
Empty KeySet [measured via
sizeof
]:64
-
Empty Key [measured via
sizeof
]:80
-
Empty Key (with name):
80
-
Empty Key (with name + data):
80
-
Single Example Key =
Empty Key + keyname + unescaped keyname + data
=80 + 29 + 25 + 35
=169
-
Single Example Key + 1 Duplicate =
Single Example Key + Empty Key
=169 + 80
=249
-
Single Example Key + 2 Duplicates =
Single Example Key + Empty Key * 2
=169 + 80 * 2
=329
-
KeySets are not COW in this approach --> same as current implementation
Full-blown COW implementation:
-
Empty Key [measured via
sizeof
]:32
-
Empty Key (with name) [measured via
sizeof
]:Empty Key + sizeof(KeyName)
=32 + 40
=72
-
Empty Key (with name + data) [measured via
sizeof
]:Empty Key + sizeof(KeyName) + sizeof(KeyData)
=32 + 40 + 24
=96
-
Single Example Key =
Empty Key (with name + data) + keyname + unescaped keyname + data
=96 + 29 + 25 + 35
=185
-
Single Example Key + 1 Duplicate =
Single Example Key + Empty Key
=185 + 32
=217
-
Single Example Key + 2 Duplicates =
Single Example Key + Empty Key * 2
=185 + 32 * 2
=249
-
Empty KeySet [measured via
sizeof
]:16
-
Empty KeySet (with data):
Empty KeySet + sizeof(KeySetData)
=16 + 48
=64
-
Example KeySet:
Empty KeySet (with data) + 16 * pointer to keys
=64 + 16 * 8
=192
-
Example KeySet + 1 Duplicate:
Example KeySet + Empty KeySet
=192 + 16
=208
-
Example KeySet + 2 Duplicates:
Example KeySet + Empty KeySet * 2
=192 + 16 * 2
=224
Full-blown COW implementation - Variant 1 (RcBuffer):
-
Empty Key [measured via
sizeof
]:40
-
Empty Key (with name) [measured via
sizeof
]:Empty Key + sizeof(RcBuffer)*2
=40 + 24*2
=88
-
Empty Key (with name + data) [measured via
sizeof
]:Empty Key + sizeof(RcBuffer)*3
=40 + 24*3
=112
-
Single Example Key =
Empty Key (with name + data) + keyname + unescaped keyname + data
=112 + 29 + 25 + 35
=201
-
Single Example Key + 1 Duplicate =
Single Example Key + Empty Key
=201 + 40
=241
-
Single Example Key + 2 Duplicates =
Single Example Key + Empty Key * 2
=201 + 40 * 2
=281
-
Empty KeySet [measured via
sizeof
]:16
-
Empty KeySet (with data):
Empty KeySet + sizeof(KeySetData)
=16 + 48
=64
-
Example KeySet:
Empty KeySet (with data) + 16 * pointer to keys
=64 + 16 * 8
=192
-
Example KeySet + 1 Duplicate:
Example KeySet + Empty KeySet
=192 + 16
=208
-
Example KeySet + 2 Duplicates:
Example KeySet + Empty KeySet * 2
=192 + 16 * 2
=224
For allocations want to measure the following properties:
- Empty key: how many objects to allocate for an empty key
- Empty Key (with name): how many objects to allocate for an empty key + name
- Empty Key (with name + data): how many objects to allocate for an empty key + name + data
- Duplication: how many objects to allocate for a duplication
- Key + 1 Duplication: how many objects to allocate for a full key + 1 duplication
- Key + 2 Duplications: how many objects to allocate for a full key + 2 duplications
Approach | Empty Key | Empty Key (with name) | Empty Key (with name + data) | Duplication | Key + 1 Duplication | Key + 2 Duplications |
---|---|---|---|---|---|---|
Current Implementation | 1 | 1 | 1 | 1 | 2 | 3 |
mmapstorage-like COW implementation (without additional pointers) | 1 | 1 | 1 | 1 | 2 | 3 |
mmapstorage-like COW implementation (with additional pointers) | 1 | 1 | 1 | 1 | 2 | 3 |
Full-blown COW implementation | 1 | 2 | 3 | 1 | 4 | 5 |
Implement the full-blown COW approach.
- It is the most versatile option.
- No restrictions on the lifetime of
Key
andKeySet
objects. - Completely transparent to developers using Elektra's public API.
- The
mmapstorage
plugins needs to be updated