Data Center Format
This page describes the packed .dat data center format used by TERA, and the unpacked .dec format used by Alkahest and other third-party projects.
- C/C++-like primitive types, enums, structs, and unions will be used.
- bool is equivalent to uint8_t but only allows the values true (1) and false (0).
- Integers (uint8_t, int8_t, uint16_t, int16_t, etc.) are little endian.
- float and double are IEEE 754 binary32 and binary64, respectively.
- Characters (e.g. char16_t) are little endian.
- strings are written as NUL-terminated UTF-16LE.
- Fields are laid out in the declared order with no implied padding anywhere.
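As an illustration of these conventions, here is a minimal Python sketch for reading a little-endian integer and a NUL-terminated UTF-16LE string from a byte buffer. The helper names read_u32 and read_string are hypothetical and are not part of any existing tool.

```python
import struct


def read_u32(buf: bytes, offset: int):
    """Read a little-endian uint32_t; return (value, next_offset)."""
    (value,) = struct.unpack_from("<I", buf, offset)
    return value, offset + 4


def read_string(buf: bytes, offset: int):
    """Read a NUL-terminated UTF-16LE string; return (text, next_offset)."""
    end = offset
    # Scan 16-bit code units until the 0x0000 terminator is found.
    while buf[end:end + 2] != b"\x00\x00":
        if end + 2 > len(buf):
            raise ValueError("unterminated string")
        end += 2
    # The returned offset skips over the terminator.
    return buf[offset:end].decode("utf-16-le"), end + 2
```

The "<" prefix in the struct format string selects little-endian byte order with no alignment padding, which matches the layout rules listed above.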
Before reading, please understand that this page omits two key details:
- The string normalization function (string normalize(string value)).
- The string hash function (uint32_t hash(string value)).
These are not necessary for interpreting the format; they are only relevant if one wishes to construct or modify a data center file for consumption by the game. As doing so would (unfortunately) enable a staggering number of exploits, this page intentionally omits them. Of course, they can be found by reverse engineering TERA.exe, but this is beyond the average malicious actor's abilities, hence this decision.
The packed data center files shipped with the game (.dat) are encrypted. Encryption is done with the Rijndael algorithm in CFB mode, using zero padding, and with block/key sizes both set to 128 bits. Note that the final block is often smaller than the block size and not correctly padded with zeros as it should be.
The encryption key and initialization vector can be found in TERA.exe. They are usually freshly generated for each game build.
The overall structure can be described like this:
struct DataCenterFile
{
    DataCenterPackedHeader packed_header;
    DataCenterHeader header;
    DataCenterSimpleRegion<DataCenterExtension, false> extensions;
    DataCenterSegmentedRegion<DataCenterAttribute> attributes;
    DataCenterSegmentedRegion<DataCenterElement> elements;
    DataCenterStringTable<1024> values;
    DataCenterStringTable<512> names;
    DataCenterFooter footer;
};
After decryption, there is a small header of the form:
struct DataCenterPackedHeader
{
    uint32_t decompressed_size;
    uint16_t zlib_header;
};
All data immediately following this header is compressed with the Deflate algorithm.
decompressed_size is the size of the data center file once inflated.
zlib_header is a zlib (RFC 1950, §2.2) header. For packed data center files, it is always of the form 0x9c78 (the bytes 0x78 0x9c on disk), which indicates the Deflate method with a 32 KiB window, no preset dictionary, and the default compression level.
Unpacked data center files (.dec) used by Alkahest and other third-party projects simply contain the decrypted and decompressed data; packed_header is stripped.
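The unpacking step (after decryption) might be sketched in Python as follows. The function name unpack_data_center is hypothetical, and decryption is assumed to have already happened. Because zlib_header is consumed as part of the packed header, the sketch inflates the remaining bytes as a raw Deflate stream; whether a real file carries a trailing Adler-32 checksum is not stated on this page, so any trailing bytes are simply ignored.

```python
import struct
import zlib


def unpack_data_center(decrypted: bytes) -> bytes:
    """Turn decrypted .dat contents into the equivalent .dec contents."""
    # DataCenterPackedHeader: uint32_t decompressed_size, uint16_t zlib_header.
    decompressed_size, zlib_header = struct.unpack_from("<IH", decrypted, 0)
    if zlib_header != 0x9C78:
        raise ValueError(f"unexpected zlib header: {zlib_header:#06x}")

    # A negative wbits value asks zlib for a raw Deflate stream (no header).
    inflater = zlib.decompressobj(-15)
    data = inflater.decompress(decrypted[6:]) + inflater.flush()

    if len(data) != decompressed_size:
        raise ValueError("decompressed size does not match the packed header")
    return data
```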
A data center file starts with this header:
struct DataCenterHeader
{
    uint32_t file_version;
    int32_t unknown1;
    int16_t unknown2;
    int16_t unknown3;
    uint32_t client_version;
    int32_t unknown4;
    int32_t unknown5;
    int32_t unknown6;
    int32_t unknown7;
};
file_version is currently 6.
unknown1, unknown2, unknown4, unknown5, unknown6, and unknown7 are all currently 0. Some of them are actually part of more complex tree structures, but released data centers always have them zeroed.
unknown3 is always -16400.
client_version is usually (but not always) the value sent in C_CHECK_VERSION by the client.
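A minimal Python sketch of reading this header is shown below. The names DataCenterHeader (as a Python class), parse_header, and header_looks_standard are illustrative only; the check simply mirrors the values described above and is not a validation rule taken from the game.

```python
import struct
from typing import NamedTuple


class DataCenterHeader(NamedTuple):
    file_version: int
    unknown1: int
    unknown2: int
    unknown3: int
    client_version: int
    unknown4: int
    unknown5: int
    unknown6: int
    unknown7: int


# uint32, int32, int16, int16, uint32, then four int32 fields: 32 bytes total.
HEADER = struct.Struct("<IihhIiiii")


def parse_header(buf: bytes, offset: int = 0) -> DataCenterHeader:
    return DataCenterHeader(*HEADER.unpack_from(buf, offset))


def header_looks_standard(h: DataCenterHeader) -> bool:
    """Check the values this page reports for released data centers."""
    zeroed = (h.unknown1, h.unknown2, h.unknown4,
              h.unknown5, h.unknown6, h.unknown7)
    return (h.file_version == 6
            and h.unknown3 == -16400
            and all(v == 0 for v in zeroed))
```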
A data center file ends with this footer:
struct DataCenterFooter
{
    int32_t unknown1;
};
unknown1 is currently 0.
Most of the data in data center files is arranged into regions, which may be segmented. The region structures used throughout the format are described here:
struct DataCenterSimpleRegion<typename T, bool off_by_one>
{
    uint32_t count;
    T elements[off_by_one ? count - 1 : count];
};
struct DataCenterSegmentedSimpleRegion<typename T, uint32_t count>
{
    DataCenterSimpleRegion<T, false> segments[count];
};
struct DataCenterRegion<typename T>
{
    uint32_t full_count;
    uint32_t used_count;
    T elements[full_count];
};
struct DataCenterSegmentedRegion<typename T>
{
    uint32_t count;
    DataCenterRegion<T> segments[count];
};
A DataCenterSegmentedSimpleRegion is mostly the same as a DataCenterSegmentedRegion, with the main difference being that it has a fixed number of segments.
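The region layouts above could be read with a sketch along these lines. The function names and the elem_size parameter (the byte size of T) are hypothetical; elements are returned as raw byte slices so that any element parser can be applied afterwards. Clamping an off-by-one count of 0 to zero elements is a defensive assumption, since that case is not described here.

```python
import struct


def _slices(buf, offset, elem_size, n):
    """Cut n fixed-size elements out of buf, starting at offset."""
    return [buf[offset + i * elem_size: offset + (i + 1) * elem_size]
            for i in range(n)]


def read_simple_region(buf, offset, elem_size, off_by_one=False):
    """DataCenterSimpleRegion: count, then the elements."""
    (count,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    n = max(count - 1, 0) if off_by_one else count
    return _slices(buf, offset, elem_size, n), offset + n * elem_size


def read_region(buf, offset, elem_size):
    """DataCenterRegion: full_count, used_count, then full_count elements."""
    full_count, used_count = struct.unpack_from("<II", buf, offset)
    offset += 8
    elements = _slices(buf, offset, elem_size, full_count)
    return elements, used_count, offset + full_count * elem_size


def read_segmented_region(buf, offset, elem_size):
    """DataCenterSegmentedRegion: count, then that many regions."""
    (count,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    segments = []
    for _ in range(count):
        elements, _used, offset = read_region(buf, offset, elem_size)
        segments.append(elements)
    return segments, offset
```

A DataCenterSegmentedSimpleRegion can be read by calling read_simple_region in a loop for its fixed number of segments.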
Addresses are frequently used to refer to elements within both types of segmented regions:
struct DataCenterAddress
{
    uint16_t segment_index;
    uint16_t element_index;
};
Here, segment_index is a zero-based index into the segments array of the segmented region, while element_index is a zero-based index into the elements array of the segment. Addresses are typically written as segment_index:element_index in textual form. The special address 65535:65535 is used to indicate the absence of an actual address, and thus will never be used when a real lookup is intended.
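As a small illustration, an address can be parsed, printed in its textual form, and resolved against a list of segments like this. Address, NULL_ADDRESS, parse_address, and resolve are hypothetical names, and segments is assumed to be a list of per-segment element lists.

```python
import struct
from typing import NamedTuple


class Address(NamedTuple):
    segment_index: int
    element_index: int

    def __str__(self) -> str:
        # Textual form: segment_index:element_index
        return f"{self.segment_index}:{self.element_index}"


# 65535:65535 marks the absence of an address.
NULL_ADDRESS = Address(0xFFFF, 0xFFFF)


def parse_address(buf: bytes, offset: int = 0) -> Address:
    return Address(*struct.unpack_from("<HH", buf, offset))


def resolve(segments, address: Address):
    """Return the addressed element, or None for the null address."""
    if address == NULL_ADDRESS:
        return None
    return segments[address.segment_index][address.element_index]
```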
All strings, whether they are names or values, are arranged into string tables, which are effectively used as hash tables by the game. A string table has the form:
struct DataCenterStringTable<uint32_t count>
{
    DataCenterSegmentedRegion<char16_t> data;
    DataCenterSegmentedSimpleRegion<DataCenterString, count> table;
    DataCenterSimpleRegion<DataCenterAddress, true> addresses;
};
A string entry in the table region looks like this:
struct DataCenterString
{
    uint32_t hash;
    uint32_t length;
    uint32_t index;
    DataCenterAddress address;
};
hash is a hash code for the string, given by the expression hash(value) where value is the string value. In a typical data center file, there are very few hash collisions.
length is the length of the string in terms of UTF-16LE code units, including the NUL character.
index is a one-based index into the string table's addresses region. The address at this index must match the address field exactly.
address is an address into the string table's data region. This address points to the actual string data. All strings are NUL-terminated. The string read from this address must have the same length as the length field.
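To make the relationship between address, length, and the data region concrete, here is a hedged Python sketch. It assumes each data segment is held as one bytes object of UTF-16LE code units and that element_index counts code units within that segment; read_table_string is a hypothetical helper name.

```python
def read_table_string(data_segments, segment_index, element_index, length):
    """Read the string a DataCenterString entry points at.

    length counts UTF-16LE code units and includes the NUL terminator.
    """
    raw = data_segments[segment_index]
    start = element_index * 2          # two bytes per char16_t
    chunk = raw[start:start + length * 2]

    # The entry is only consistent if the NUL sits exactly at the end.
    if len(chunk) != length * 2 or chunk[-2:] != b"\x00\x00":
        raise ValueError("string is not NUL-terminated at the given length")
    text = chunk[:-2].decode("utf-16-le")
    if "\x00" in text:
        raise ValueError("string is shorter than the given length")
    return text
```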
It is important that a string entry is placed in the correct table segment based on its hash field. The segment index is given by the expression (hash ^ (hash >> 16)) % count where count is the static size of the table region. Further, entries in a segment must be sorted by their hash code in ascending order.
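The placement rule can be written out as a short Python sketch. The function names are hypothetical; the hash values themselves must come from an existing file, since the hash function is intentionally not documented here.

```python
def table_segment_index(hash_code: int, count: int) -> int:
    """Segment of the table region that an entry with this hash belongs to."""
    hash_code &= 0xFFFFFFFF  # treat the hash as an unsigned 32-bit value
    return (hash_code ^ (hash_code >> 16)) % count


def segment_is_sorted(hashes) -> bool:
    """Check that a segment's entries are in ascending hash order."""
    return all(a <= b for a, b in zip(hashes, hashes[1:]))
```

For the tables in DataCenterFile, count would be 1024 for values and 512 for names.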
The actual interesting data in a data center file is stored as a sort of tree structure, similar to XML. Each element is of the form:
struct DataCenterElement
{
    uint16_t name_index;
    uint8_t extension_flags : 4;
    uint16_t extension_index : 12;
    uint16_t attribute_count;
    uint16_t child_count;
    DataCenterAddress attribute_address;
    uint32_t padding_1;
    DataCenterAddress child_address;
    uint32_t padding_2;
};
name_index is a one-based index into the addresses region of the names table. If this value is 0, it indicates that this element has no name or associated data of any kind, and should be disregarded; in this case, all other integer fields of the element will be 0.
extension_index is a zero-based index into the extensions region.
extension_flags is currently 0.
attribute_count and child_count indicate how many attributes and child elements should be read for this element, respectively. If a count field is 0, then the corresponding address field is 65535:65535.
attribute_address is an address into the attributes region. attribute_count attributes should be read at this address.
child_address is an address into the elements region. child_count elements should be read at this address.
padding_1 and padding_2 are meaningless.
The root element of the data tree is always located at the address 0:0 and always has the name __root__.
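A possible Python sketch for decoding one 24-byte element follows. Element and parse_element are hypothetical names. The split of the 16-bit bitfield word is an assumption: extension_flags is taken from the low 4 bits and extension_index from the high 12 bits, following the usual little-endian bitfield allocation order; this page does not state the bit order explicitly.

```python
import struct
from typing import NamedTuple, Tuple


class Element(NamedTuple):
    name_index: int
    extension_flags: int
    extension_index: int
    attribute_count: int
    child_count: int
    attribute_address: Tuple[int, int]
    child_address: Tuple[int, int]


# 4 x uint16, address (2 x uint16), padding, address (2 x uint16), padding.
ELEMENT = struct.Struct("<4H2HI2HI")


def parse_element(buf: bytes, offset: int = 0) -> Element:
    (name_index, ext, attribute_count, child_count,
     attr_seg, attr_elem, _padding_1,
     child_seg, child_elem, _padding_2) = ELEMENT.unpack_from(buf, offset)
    return Element(
        name_index=name_index,
        extension_flags=ext & 0xF,   # assumed: low 4 bits
        extension_index=ext >> 4,    # assumed: high 12 bits
        attribute_count=attribute_count,
        child_count=child_count,
        attribute_address=(attr_seg, attr_elem),
        child_address=(child_seg, child_elem),
    )
```

An element whose name_index is 0 should be skipped when walking the tree, as described above.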
struct DataCenterExtension
{
    int32_t unknown1;
    int32_t unknown2;
};
The exact meaning of this region is currently unknown, but it appears to be used to impart some special meaning or functionality on well-known elements.
Each element in the data tree has zero or more attributes, which are name/value pairs. They are of the form:
struct DataCenterAttribute
{
    uint16_t name_index;
    DataCenterTypeCode type_code : 2;
    union
    {
        bool is_boolean;
        uint16_t value_hash : 14;
    } extended_code;
    union
    {
        int32_t i;
        bool b;
        float f;
        DataCenterAddress a;
    } value;
    uint32_t padding;
};
name_index is a one-based index into the addresses region of the names table.
type_code specifies the kind of value the attribute holds. Valid values are as follows:
enum DataCenterTypeCode : uint8_t
{
    DATA_CENTER_TYPE_CODE_INT = 1,
    DATA_CENTER_TYPE_CODE_FLOAT = 2,
    DATA_CENTER_TYPE_CODE_STRING = 3,
};
extended_code specifies extra information based on the value of type_code:
- If type_code is DATA_CENTER_TYPE_CODE_INT, then the is_boolean field indicates whether the attribute's value is constrained to 1 (true) and 0 (false).
- If type_code is DATA_CENTER_TYPE_CODE_FLOAT, then extended_code is zeroed.
- If type_code is DATA_CENTER_TYPE_CODE_STRING, then value_hash is given by the expression hash(normalize(value)) & 0xFFFF where value is the string value.
value holds the attribute value and is interpreted according to type_code and extended_code. In the case of DATA_CENTER_TYPE_CODE_STRING, the a field holds an address into the data region of the values table. For other type codes, the value is written directly and is accessed through the i, b, or f fields.
padding is meaningless.
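Decoding a 12-byte attribute might look like the following Python sketch. The name parse_attribute is hypothetical. Two details are assumptions rather than facts from this page: type_code is taken from the low 2 bits of the 16-bit word that follows name_index, with extended_code in the remaining 14 bits, and is_boolean is taken from the lowest bit of extended_code.

```python
import struct

TYPE_INT, TYPE_FLOAT, TYPE_STRING = 1, 2, 3

# name_index, type_code + extended_code word, 4 raw value bytes, padding.
ATTRIBUTE = struct.Struct("<HH4sI")


def parse_attribute(buf: bytes, offset: int = 0):
    """Return (name_index, value); string values come back as an address."""
    name_index, type_info, raw, _padding = ATTRIBUTE.unpack_from(buf, offset)
    type_code = type_info & 0b11   # assumed: low 2 bits
    extended_code = type_info >> 2  # assumed: high 14 bits

    if type_code == TYPE_INT:
        if extended_code & 1:       # assumed position of is_boolean
            return name_index, raw[0] != 0
        return name_index, struct.unpack("<i", raw)[0]
    if type_code == TYPE_FLOAT:
        return name_index, struct.unpack("<f", raw)[0]
    if type_code == TYPE_STRING:
        # (segment_index, element_index) into the values table's data region.
        return name_index, struct.unpack("<HH", raw)
    raise ValueError(f"invalid type code: {type_code}")
```

The returned name_index and any string address still need to be looked up in the names and values string tables, respectively.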