
Using CLP for semi-structured logs

For semi-structured logs (e.g., JSON), you can compress, decompress, and search them using the clp-s binary described below.

Compression

Usage:

./clp-s c [<options>] <archives-dir> <input-path> [<input-path> ...]
  • archives-dir is the directory that archives should be written to.
  • input-path is any newline-delimited JSON (ndjson) log file, or a directory containing such files.
  • options let you configure compression, such as which field to use as each log event's timestamp (--timestamp-key <field-path>).
    • For a complete list, run ./clp-s c --help.
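If you want a small ndjson file to experiment with, the sketch below writes one compact JSON object per line. The events and the helper names `to_ndjson` and `write_ndjson` are hypothetical; the field names only mirror the examples in this guide.

```python
import json

# Hypothetical sample events; the field names mirror the examples in this guide.
SAMPLE_EVENTS = [
    {"ts": 1649923037, "level": "INFO", "id": 22149, "message": "job started"},
    {"ts": 1649923038, "level": "ERROR", "id": 22150, "message": "job failed",
     "d": {"@timestamp": "2022-04-14T07:57:18"}},
]

def to_ndjson(events):
    """Serialize events as newline-delimited JSON: one compact object per line."""
    # With indent unset, json.dumps escapes control characters inside strings,
    # so each event stays on a single line.
    return "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events)

def write_ndjson(path, events):
    """Write the events to `path` as an ndjson file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_ndjson(events))
```

For example, `write_ndjson("/mnt/logs/log1.json", SAMPLE_EVENTS)` produces an input file you can pass to ./clp-s c.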

Examples

Compress /mnt/logs/log1.json and output archives to /mnt/data/archives1:

./clp-s c /mnt/data/archives1 /mnt/logs/log1.json

Treat the field {"d": {"@timestamp": "..."}} as each log event's timestamp:

./clp-s c --timestamp-key 'd.@timestamp' /mnt/data/archives1 /mnt/logs/log1.json
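The key path 'd.@timestamp' names the @timestamp field nested inside the d object. The hypothetical helper below illustrates that mapping on a parsed event; it assumes no key contains a literal dot and does not reproduce how clp-s itself parses key paths (e.g., escaping).

```python
import json

def resolve_key_path(event, key_path):
    """Follow a dot-separated key path (e.g. "d.@timestamp") into nested objects.

    Hypothetical helper for illustration only; assumes keys contain no dots.
    """
    value = event
    for key in key_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # the path does not exist in this event
        value = value[key]
    return value

line = '{"d": {"@timestamp": "2022-04-14T07:57:17"}, "message": "job started"}'
print(resolve_key_path(json.loads(line), "d.@timestamp"))  # 2022-04-14T07:57:17
```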

Tip

Specifying --timestamp-key creates a range index for the timestamp column, which can improve the compression ratio and search performance.

Set the target encoded size to 1 GiB and the compression level to 6 (the default is 3):

./clp-s c \
    --target-encoded-size 1073741824 \
    --compression-level 6 \
    /mnt/data/archives1 \
    /mnt/logs/log1.json
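The value 1073741824 is 1 GiB expressed in bytes (2^30). If you drive clp-s from a script, a sketch like the following computes the size and assembles the same command; `build_compress_cmd` is a hypothetical helper that only uses the flags shown above.

```python
import shlex

GIB = 1024 ** 3  # 1 GiB = 2**30 = 1073741824 bytes

def build_compress_cmd(archives_dir, input_paths, target_encoded_size, compression_level):
    """Assemble a clp-s compression command line (hypothetical helper)."""
    return [
        "./clp-s", "c",
        "--target-encoded-size", str(target_encoded_size),
        "--compression-level", str(compression_level),
        archives_dir,
        *input_paths,
    ]

cmd = build_compress_cmd("/mnt/data/archives1", ["/mnt/logs/log1.json"], GIB, 6)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.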

Decompression

Usage:

./clp-s x <archives-dir> <output-dir>
  • archives-dir is a directory containing archives.
  • output-dir is the directory that decompressed logs should be written to.

Examples

Decompress all logs from /mnt/data/archives1 into /mnt/data/archives1-decomp:

./clp-s x /mnt/data/archives1 /mnt/data/archives1-decomp
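Because clp-s does not preserve the order of log events, a line-by-line diff of the original and decompressed logs can report spurious differences. The sketch below instead compares the two as multisets of parsed JSON objects. It is an illustration, not a clp-s tool: the helper names are hypothetical, and since time zone information is not preserved either, events with zone-bearing timestamps may still compare as different.

```python
import json
from collections import Counter

def canonical(line):
    """Parse one ndjson line and re-serialize it with sorted keys."""
    return json.dumps(json.loads(line), sort_keys=True)

def same_events(original_lines, decompressed_lines):
    """True if both inputs contain the same events, ignoring order and blank lines."""
    def count(lines):
        return Counter(canonical(line) for line in lines if line.strip())
    return count(original_lines) == count(decompressed_lines)
```

To use it, read the original file and the file clp-s wrote into the output directory (e.g., with `open(path).readlines()`) and pass both line lists to `same_events`.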

Search

Usage:

./clp-s s <archives-dir> <kql-query>
  • archives-dir is a directory containing archives.
  • kql-query is a query written in KQL (Kibana Query Language).

Examples

Find all log events within a time range:

./clp-s s /mnt/data/archives1 'ts >= 1649923037 AND ts <= 1649923038'

or

./clp-s s /mnt/data/archives1 \
    'ts >= date("2022-04-14T07:57:17") AND ts <= date("2022-04-14T07:57:18")'
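The epoch values in the first query (in seconds) correspond to the date strings in the second query when those strings are read as UTC. The sketch below checks that arithmetic; the UTC assumption is ours, and `to_epoch_seconds` is a hypothetical helper, not part of clp-s.

```python
from datetime import datetime, timezone

def to_epoch_seconds(iso_string):
    """Convert a naive ISO 8601 timestamp to Unix epoch seconds, assuming UTC."""
    dt = datetime.fromisoformat(iso_string).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(to_epoch_seconds("2022-04-14T07:57:17"))  # 1649923037
print(to_epoch_seconds("2022-04-14T07:57:18"))  # 1649923038
```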

Find log events with a given key-value pair:

./clp-s s /mnt/data/archives1 'id: 22149'

Find ERROR log events containing a substring:

./clp-s s /mnt/data/archives1 'level: ERROR AND message: "job*"'

Find both FATAL and ERROR log events:

./clp-s s /mnt/data/archives1 'level: FATAL OR level: ERROR'
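As a rough mental model of which events the example queries select, here they are written as plain Python filters over a few hypothetical in-memory events. This is only an approximation for illustration; it does not model clp-s's KQL semantics (such as wildcard matching or how values of different types are compared).

```python
import json

# Hypothetical in-memory events; not read from clp-s archives.
EVENTS = [json.loads(line) for line in [
    '{"ts": 1649923037, "level": "INFO",  "id": 22149, "message": "job started"}',
    '{"ts": 1649923038, "level": "ERROR", "id": 22150, "message": "job failed"}',
    '{"ts": 1649923039, "level": "FATAL", "id": 22151, "message": "disk full"}',
]]

# Approximates 'id: 22149'
by_id = [e for e in EVENTS if e.get("id") == 22149]
# Approximates 'level: FATAL OR level: ERROR'
fatal_or_error = [e for e in EVENTS if e.get("level") in ("FATAL", "ERROR")]
# Approximates 'ts >= 1649923037 AND ts <= 1649923038'
in_range = [e for e in EVENTS if 1649923037 <= e.get("ts", -1) <= 1649923038]

print([e["id"] for e in by_id])           # [22149]
print([e["id"] for e in fatal_or_error])  # [22150, 22151]
print([e["id"] for e in in_range])        # [22149, 22150]
```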

Current limitations

  • clp-s currently only supports valid ndjson logs; it does not handle ndjson logs with trailing commas or other JSON syntax errors.
  • Time zone information is not preserved.
  • The order of log events is not preserved.
  • The input directory structure is not preserved; during decompression, the contents of all input files are written to a single file.
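Since clp-s only supports valid ndjson, it can help to check your logs before compressing them. The sketch below reports lines that fail to parse (for example, trailing commas) using Python's standard json module. The helper name is hypothetical, it assumes each log event should be a single JSON object, and it skips blank lines without assuming how clp-s treats them.

```python
import json

def find_invalid_lines(lines):
    """Return (line_number, error) pairs for lines that are not a single JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped here
        try:
            value = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, err.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems

sample = ['{"a": 1}', '{"a": 1,}', '', '[1, 2]', '{"b": 2},']
for number, error in find_invalid_lines(sample):
    print(number, error)
```

For a real file, pass the result of `open(path, encoding="utf-8").readlines()`.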