This repository has been archived by the owner on Apr 20, 2022. It is now read-only.

Data Center Format

Alex Rønne Petersen edited this page Jul 28, 2020 · 37 revisions

This page describes the packed .dat data center format used by TERA, and the unpacked .dec format used by Alkahest and other third-party projects.

  • C/C++-like primitive types, enums, structs, and unions will be used.
  • bool is equivalent to uint8_t but only allows the values true (1) and false (0).
  • Integers (uint8_t, int8_t, uint16_t, int16_t, etc.) are little endian.
  • float and double are IEEE 754 binary32 and binary64, respectively.
  • Characters (e.g. char16_t) are little endian.
  • Strings are written as NUL-terminated UTF-16LE.
  • Fields are laid out in the declared order with no implied padding anywhere.

Omissions

Before reading, please understand that this page omits two key details:

  • The string normalization function (string normalize(string value)).
  • The string hash function (uint32_t hash(string value)).

These are not necessary for interpreting the format; they are only relevant if one wishes to construct or modify a data center file for consumption by the game. As doing so would (unfortunately) enable a staggering number of exploits, this page intentionally omits them. They can of course be found by reverse engineering TERA.exe, but that is beyond the average malicious actor's abilities, hence this decision.

Encryption

The packed data center files shipped with the game (.dat) are encrypted. Encryption is done with the Rijndael algorithm in CFB mode, using zero padding, and with block/key sizes both set to 128 bits. Note that the final block is often shorter than the block size and is not padded with zeros as it should be, so decryption code has to tolerate a short final block.

The encryption key and initialization vector can be found in TERA.exe. They are usually freshly generated for each game build.

Overall Structure

The overall structure can be described like this:

struct DataCenterFile
{
    DataCenterPackedHeader packed_header;
    DataCenterHeader header;
    DataCenterSimpleRegion<DataCenterExtension, false> extensions;
    DataCenterSegmentedRegion<DataCenterAttribute> attributes;
    DataCenterSegmentedRegion<DataCenterElement> elements;
    DataCenterStringTable<1024> values;
    DataCenterStringTable<512> names;
    DataCenterFooter footer;
};

Packed Header

After decryption, there is a small header of the form:

struct DataCenterPackedHeader
{
    uint32_t decompressed_size;
    uint16_t zlib_header;
};

All data immediately following this header is compressed with the Deflate algorithm.

decompressed_size is the size of the data center file once inflated.

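zlib_header is a zlib header (RFC 1950, §2.2). For packed data center files, it is always 0x9c78 when read as a little-endian uint16_t, i.e. the bytes 0x78 0x9c in the file. This value denotes the Deflate compression method with a 32 KiB window, the default compression level, and no preset dictionary.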

Unpacked data center files (.dec) used by Alkahest and other third-party projects simply contain the decrypted and decompressed data; packed_header is stripped.
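
As a rough sketch of the unpacking step, assuming the file has already been decrypted, the following Python reads packed_header and inflates the remainder. The function name unpack_data_center is hypothetical. Treating the payload as a raw Deflate stream, and ignoring any bytes that follow the end of that stream, is an assumption drawn from the description above rather than a confirmed detail.

```python
import struct
import zlib

PACKED_HEADER = struct.Struct("<IH")  # decompressed_size, zlib_header


def unpack_data_center(decrypted: bytes) -> bytes:
    """Strip packed_header and inflate the rest (hypothetical helper)."""
    decompressed_size, zlib_header = PACKED_HEADER.unpack_from(decrypted, 0)
    if zlib_header != 0x9C78:
        raise ValueError(f"unexpected zlib header: {zlib_header:#06x}")
    # The zlib header was consumed as part of packed_header, so what remains
    # is treated as a raw Deflate stream (negative wbits means "no header").
    # Assumption: anything after the end of the stream can be ignored.
    inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
    data = inflater.decompress(decrypted[PACKED_HEADER.size:])
    if len(data) != decompressed_size:
        raise ValueError("decompressed_size does not match inflated length")
    return data
```

The returned bytes correspond to what a .dec file would contain, starting with the file header.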

File Header

An unpacked data center file starts with this header (in a packed file, it is the first thing in the inflated data):

struct DataCenterHeader
{
    uint32_t file_version;
    int32_t unknown1;
    int16_t unknown2;
    int16_t unknown3;
    uint32_t client_version;
    int32_t unknown4;
    int32_t unknown5;
    int32_t unknown6;
    int32_t unknown7;
};

file_version is currently 6.

unknown1, unknown2, unknown4, unknown5, unknown6, and unknown7 are all currently 0. Some of them are actually part of more complex tree structures, but released data centers always have them zeroed.

unknown3 is always -16400.

client_version is usually (but not always) the value sent in C_CHECK_VERSION by the client.
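
A minimal sketch of reading this header in Python, assuming the input is unpacked data. The names read_header and HEADER are hypothetical, and the checks on file_version and unknown3 simply mirror the values stated above.

```python
import struct
from typing import NamedTuple

# Little endian, no padding: 4 + 4 + 2 + 2 + 4 + 4 * 4 = 32 bytes.
HEADER = struct.Struct("<IihhIiiii")


class DataCenterHeader(NamedTuple):
    file_version: int
    unknown1: int
    unknown2: int
    unknown3: int
    client_version: int
    unknown4: int
    unknown5: int
    unknown6: int
    unknown7: int


def read_header(data: bytes, offset: int = 0) -> DataCenterHeader:
    """Parse DataCenterHeader at the given offset (hypothetical helper)."""
    header = DataCenterHeader(*HEADER.unpack_from(data, offset))
    if header.file_version != 6:
        raise ValueError(f"unsupported file_version: {header.file_version}")
    if header.unknown3 != -16400:
        raise ValueError(f"unexpected unknown3: {header.unknown3}")
    return header
```

The next structure in the file begins at offset + HEADER.size.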

File Footer

A data center file ends with this footer:

struct DataCenterFooter
{
    int32_t unknown1;
};

unknown1 is currently 0.

Regions

Most of the data in data center files is arranged into regions, which may be segmented. The region structures used throughout the format are described here:

struct DataCenterSimpleRegion<typename T, bool off_by_one>
{
    uint32_t count;
    T elements[off_by_one ? count - 1 : count];
};

struct DataCenterSegmentedSimpleRegion<typename T, uint32_t count>
{
    DataCenterSimpleRegion<T, false> segments[count];
};

struct DataCenterRegion<typename T>
{
    uint32_t full_count;
    uint32_t used_count;
    T elements[full_count];
};

struct DataCenterSegmentedRegion<typename T>
{
    uint32_t count;
    DataCenterRegion<T> segments[count];
};

A DataCenterSegmentedSimpleRegion is mostly the same as a DataCenterSegmentedRegion; the main difference is that it has a fixed number of segments, given by count, rather than a number read from the file.
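
The four region layouts can be sketched as generic readers in Python. All function names here are hypothetical. Each reader takes the buffer and an offset and returns the parsed value together with the offset just past it. Because the meaning of used_count is not spelled out here, the sketch keeps all full_count slots and returns used_count alongside them instead of interpreting it.

```python
import struct


def read_u32(data, offset):
    """Read one little-endian uint32_t; returns (value, new_offset)."""
    return struct.unpack_from("<I", data, offset)[0], offset + 4


def _read_items(data, offset, read_item, count):
    items = []
    for _ in range(count):
        item, offset = read_item(data, offset)
        items.append(item)
    return items, offset


def read_simple_region(data, offset, read_item, off_by_one=False):
    """DataCenterSimpleRegion: count, then count (or count - 1) elements."""
    count, offset = read_u32(data, offset)
    return _read_items(data, offset, read_item, count - 1 if off_by_one else count)


def read_region(data, offset, read_item):
    """DataCenterRegion: full_count, used_count, then full_count elements."""
    full_count, offset = read_u32(data, offset)
    used_count, offset = read_u32(data, offset)
    items, offset = _read_items(data, offset, read_item, full_count)
    return (used_count, items), offset


def read_segmented_region(data, offset, read_item):
    """DataCenterSegmentedRegion: count, then count DataCenterRegion segments."""
    count, offset = read_u32(data, offset)
    return _read_items(
        data, offset, lambda d, o: read_region(d, o, read_item), count
    )


def read_segmented_simple_region(data, offset, read_item, count):
    """DataCenterSegmentedSimpleRegion: a fixed count of simple regions."""
    return _read_items(
        data, offset, lambda d, o: read_simple_region(d, o, read_item), count
    )
```

Passing the element reader as a parameter lets the same functions serve every region in the file, whatever its element type.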

Addresses are frequently used to refer to elements within both types of segmented regions:

struct DataCenterAddress
{
    uint16_t segment_index;
    uint16_t element_index;
};

Here, segment_index is a zero-based index into the segments array of the segmented region, while element_index is a zero-based index into the elements array of the segment. Addresses are typically written as segment_index:element_index in textual form. The special address 65535:65535 is used to indicate the absence of an actual address, and thus will never be used when a real lookup is intended.
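
Address handling might look like the following Python sketch. The names read_address, resolve, and NULL_ADDRESS are hypothetical, and segments is assumed to be a list of already-parsed segments, each a list of elements. Reading several consecutive elements from a single segment is an assumption based on how counts and addresses are paired elsewhere in the format.

```python
import struct
from typing import List, NamedTuple, Sequence, TypeVar

T = TypeVar("T")


class DataCenterAddress(NamedTuple):
    segment_index: int
    element_index: int

    def __str__(self) -> str:
        # Textual form: segment_index:element_index
        return f"{self.segment_index}:{self.element_index}"


# 65535:65535 marks the absence of an address.
NULL_ADDRESS = DataCenterAddress(0xFFFF, 0xFFFF)


def read_address(data: bytes, offset: int) -> DataCenterAddress:
    """Read two little-endian uint16_t values as an address."""
    return DataCenterAddress(*struct.unpack_from("<HH", data, offset))


def resolve(
    segments: Sequence[Sequence[T]], address: DataCenterAddress, count: int = 1
) -> List[T]:
    """Return count consecutive elements starting at address."""
    if address == NULL_ADDRESS:
        return []
    segment = segments[address.segment_index]
    start = address.element_index
    if start + count > len(segment):
        raise IndexError(f"address {address} with count {count} is out of range")
    return list(segment[start:start + count])
```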

String Tables

All strings, whether they are names or values, are arranged into string tables, which are effectively used as hash tables by the game. A string table has the form:

struct DataCenterStringTable<uint32_t count>
{
    DataCenterSegmentedRegion<char16_t> data;
    DataCenterSegmentedSimpleRegion<DataCenterString, count> table;
    DataCenterSimpleRegion<DataCenterAddress, true> addresses;
};

A string entry in the table region looks like this:

struct DataCenterString
{
    uint32_t hash;
    uint32_t length;
    uint32_t index;
    DataCenterAddress address;
};

hash is a hash code for the string, given by the expression hash(value) where value is the string value. A typical data center file contains very few hash collisions.

length is the length of the string in terms of UTF-16LE code units, including the NUL character.

index is a one-based index into the string table's addresses region. The address at this index must match the address field exactly.

address is an address into the string table's data region. This address points to the actual string data. All strings are NUL-terminated. The string read from this address must have the same length as the length field.
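
To illustrate the relationship between address and length, here is a small Python sketch that extracts one string from a data segment. The function read_string is hypothetical, and it assumes the segment's char16_t elements are available as raw bytes, so that element_index counts UTF-16 code units.

```python
def read_string(segment: bytes, element_index: int, length: int) -> str:
    """Decode the string at element_index, checking it against length.

    length counts UTF-16LE code units and includes the NUL terminator.
    """
    start = element_index * 2  # each char16_t element is two bytes
    raw = segment[start:start + length * 2]
    if len(raw) != length * 2:
        raise ValueError("string runs past the end of the segment")
    text = raw.decode("utf-16-le")
    # The NUL must be the last code unit and must not appear earlier.
    if not text.endswith("\0") or "\0" in text[:-1]:
        raise ValueError("string does not match the length field")
    return text[:-1]
```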

A string entry must be placed in the correct table segment based on its hash field. The segment index is given by the expression (hash ^ (hash >> 16)) % count, where count is the fixed number of segments in the table region. Further, entries within a segment must be sorted by their hash code in ascending order.
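
The placement rule can be sketched in Python as follows. Both function names are hypothetical, and check_table works on plain lists of hash codes rather than full DataCenterString entries to keep the example short. Hash codes are treated as unsigned 32-bit values.

```python
from typing import Sequence


def segment_index_for(hash_code: int, count: int) -> int:
    """Segment that an entry with this hash belongs to."""
    hash_code &= 0xFFFFFFFF  # treat as uint32_t
    return (hash_code ^ (hash_code >> 16)) % count


def check_table(segments: Sequence[Sequence[int]], count: int) -> bool:
    """Check that every hash sits in the right segment, in ascending order."""
    if len(segments) != count:
        return False
    for index, hashes in enumerate(segments):
        if any(segment_index_for(h, count) != index for h in hashes):
            return False
        if list(hashes) != sorted(hashes):
            return False
    return True
```

For the values table count would be 1024, and for the names table 512, as given in the overall structure.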

Elements

The actual interesting data in a data center file is stored as a sort of tree structure, similar to XML. Each element is of the form:

struct DataCenterElement
{
    uint16_t name_index;
    uint8_t extension_flags : 4;
    uint16_t extension_index : 12;
    uint16_t attribute_count;
    uint16_t child_count;
    DataCenterAddress attribute_address;
    uint32_t padding_1;
    DataCenterAddress child_address;
    uint32_t padding_2;
};

name_index is a one-based index into the addresses region of the names table. If this value is 0, it indicates that this element has no name or associated data of any kind, and should be disregarded; in this case, all other integer fields of the element will be 0.

extension_index is a zero-based index into the extensions region.

extension_flags is currently 0.

attribute_count and child_count indicate how many attributes and child elements should be read for this element, respectively. If a count field is 0, then the corresponding address field is 65535:65535.

attribute_address is an address into the attributes region. attribute_count attributes should be read at this address.

child_address is an address into the elements region. child_count elements should be read at this address.

padding_1 and padding_2 are meaningless.

The root element of the data tree is always located at the address 0:0 and always has the name __root__.
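
A sketch of decoding a single 24-byte element in Python. The names are hypothetical. The bit order within the 16-bit field that holds extension_flags and extension_index is an assumption (flags in the low 4 bits, index in the high 12 bits) and should be verified against real data.

```python
import struct
from typing import NamedTuple, Tuple

# name_index, flags/index, attribute_count, child_count,
# attribute_address (2 x u16), padding_1, child_address (2 x u16), padding_2
ELEMENT = struct.Struct("<HHHHHHIHHI")  # 24 bytes


class DataCenterElement(NamedTuple):
    name_index: int
    extension_flags: int
    extension_index: int
    attribute_count: int
    child_count: int
    attribute_address: Tuple[int, int]
    child_address: Tuple[int, int]


def read_element(data: bytes, offset: int = 0) -> DataCenterElement:
    (name_index, ext, attribute_count, child_count,
     attr_seg, attr_elem, _padding_1,
     child_seg, child_elem, _padding_2) = ELEMENT.unpack_from(data, offset)
    return DataCenterElement(
        name_index=name_index,
        extension_flags=ext & 0xF,  # assumed: low 4 bits
        extension_index=ext >> 4,   # assumed: high 12 bits
        attribute_count=attribute_count,
        child_count=child_count,
        attribute_address=(attr_seg, attr_elem),
        child_address=(child_seg, child_elem),
    )
```

An element whose name_index is 0 should be skipped. Otherwise, child_count elements starting at child_address are its children, so the tree can be walked recursively from the root at 0:0.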

Extensions

struct DataCenterExtension
{
    int32_t unknown1;
    int32_t unknown2;
};

The exact meaning of this region is currently unknown, but it appears to be used to give some special meaning or functionality to well-known elements.

Attributes

Each element in the data tree has zero or more attributes, which are name/value pairs. They are of the form:

struct DataCenterAttribute
{
    uint16_t name_index;
    DataCenterTypeCode type_code : 2;
    union
    {
        bool is_boolean;
        uint16_t value_hash : 14;
    } extended_code;
    union
    {
        int32_t i;
        bool b;
        float f;
        DataCenterAddress a;
    } value;
    uint32_t padding;
};

name_index is a one-based index into the addresses region of the names table.

type_code specifies the kind of value the attribute holds. Valid values are as follows:

enum DataCenterTypeCode : uint8_t
{
    DATA_CENTER_TYPE_CODE_INT = 1,
    DATA_CENTER_TYPE_CODE_FLOAT = 2,
    DATA_CENTER_TYPE_CODE_STRING = 3,
};

extended_code specifies extra information based on the value of type_code:

  • If type_code is DATA_CENTER_TYPE_CODE_INT, then the is_boolean field indicates whether the attribute's value is constrained to 1 (true) and 0 (false).
  • If type_code is DATA_CENTER_TYPE_CODE_FLOAT, then extended_code is zeroed.
  • If type_code is DATA_CENTER_TYPE_CODE_STRING, then value_hash is given by the expression hash(normalize(value)) & 0xFFFF where value is the string value.

value holds the attribute value and is interpreted according to type_code and extended_code. In the case of DATA_CENTER_TYPE_CODE_STRING, the a field holds an address into the data region of the values table. For other type codes, the value is written directly and is accessed through the i, b, or f fields.

padding is meaningless.
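
A sketch of decoding a single 12-byte attribute in Python. The names are hypothetical. Two layout details are assumptions to verify against real data: that type_code occupies the low 2 bits of the 16-bit field following name_index with extended_code in the remaining 14 bits, and that is_boolean is the lowest bit of extended_code.

```python
import struct

# name_index, type_code/extended_code, raw value bytes, padding
ATTRIBUTE = struct.Struct("<HH4sI")  # 12 bytes

TYPE_INT, TYPE_FLOAT, TYPE_STRING = 1, 2, 3


def read_attribute(data: bytes, offset: int = 0):
    """Return (name_index, type_code, extended_code, value)."""
    name_index, type_info, raw, _padding = ATTRIBUTE.unpack_from(data, offset)
    type_code = type_info & 0x3     # assumed: low 2 bits
    extended_code = type_info >> 2  # assumed: high 14 bits
    if type_code == TYPE_INT:
        if extended_code & 1:       # assumed: is_boolean is bit 0
            value = raw[0] != 0     # value.b
        else:
            value = struct.unpack("<i", raw)[0]  # value.i
    elif type_code == TYPE_FLOAT:
        value = struct.unpack("<f", raw)[0]      # value.f
    elif type_code == TYPE_STRING:
        # value.a: (segment_index, element_index) into the values data region
        value = struct.unpack("<HH", raw)
    else:
        raise ValueError(f"invalid type_code: {type_code}")
    return name_index, type_code, extended_code, value
```

For string attributes, the returned address still has to be resolved against the data region of the values table to obtain the text.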