log-surgeon is a library for high-performance parsing of unstructured text logs. It allows users to parse and extract information from the vast amount of unstructured logs generated by today's open-source software.
Some of the library's features include:
- Parsing and extracting variable values like the log event's log-level and any other user-specified variables, no matter where they appear in each log event.
- Parsing by using regular expressions for each variable type rather than regular expressions for an entire log event.
- Improved latency and memory efficiency compared to popular regex engines.
- Parsing multi-line log events (delimited by timestamps).
Note that log-surgeon is not a generic regex engine and does impose some constraints on how log events can be parsed.
Let's say we want to parse and inspect multi-line log events like this:
2023-02-23T18:10:14-0500 DEBUG task_123 crashed. Dumping stacktrace:
#0 0x000000000040110e in bar () at example.cpp:6
#1 0x000000000040111d in bar () at example.cpp:10
#2 0x0000000000401129 in main () at example.cpp:15
Using the example schema file which includes these rules:
timestamp:\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\-\d{4}
...
loglevel:INFO|DEBUG|WARN|ERROR
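As a rough illustration of what these two rules express, the sketch below applies them with `std::regex` rather than log-surgeon's own engine, so it says nothing about the library's performance or internals. The helper names `split_events` and `first_loglevel` are hypothetical and not part of log-surgeon. A line that begins with a timestamp starts a new event, and any other line is treated as a continuation of the previous event.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Rules copied from the schema excerpt above. std::regex is only a stand-in
// for log-surgeon's engine in this illustration.
static const std::regex cTimestampRegex{
        R"(\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\-\d{4})"};
static const std::regex cLoglevelRegex{"INFO|DEBUG|WARN|ERROR"};

// Hypothetical helper: groups lines into events, starting a new event at
// every line that begins with a timestamp.
std::vector<std::string> split_events(std::vector<std::string> const& lines) {
    std::vector<std::string> events;
    for (auto const& line : lines) {
        // match_continuous only accepts a match anchored at the line's start
        bool const starts_with_timestamp = std::regex_search(
                line, cTimestampRegex, std::regex_constants::match_continuous);
        if (starts_with_timestamp || events.empty()) {
            events.push_back(line);
        } else {
            // The first line always opens an event, so one exists here
            assert(false == events.empty());
            events.back() += "\n" + line;
        }
    }
    return events;
}

// Hypothetical helper: returns the first log-level found anywhere in the
// event, or an empty string if there is none.
std::string first_loglevel(std::string const& event) {
    std::smatch match;
    if (std::regex_search(event, match, cLoglevelRegex)) {
        return match.str();
    }
    return "";
}
```

Passing the four lines of the example event to `split_events` yields a single event containing the stack trace, and `first_loglevel` on that event returns `DEBUG`.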
We can parse and inspect the events as follows:
// Define a reader to read from your data source
Reader reader{/* <Omitted> */};
// Instantiate the parser
ReaderParser parser{"examples/schema.txt"};
parser.reset_and_set_reader(reader);
// Get the loglevel variable's ID
optional<uint32_t> loglevel_id{parser.get_variable_id("loglevel")};
// <Omitted validation of loglevel_id>
// Create a LogEventView (similar to a string_view)
LogEventView event{&parser.get_log_parser()};
while (false == parser.done()) {
    // Parse the next event
    auto err = parser.get_next_event_view(event);
    if (ErrorCode::Success != err) {
        throw runtime_error("Parsing Failed");
    }
    // Get and print the timestamp
    Token* timestamp{event.get_timestamp()};
    if (nullptr != timestamp) {
        cout << "timestamp: " << timestamp->to_string_view() << endl;
    }
    // Get and print the log-level
    auto const& loglevels = event.get_variables(*loglevel_id);
    if (false == loglevels.empty()) {
        // In case there are multiple matches, just get the first one
        cout << "loglevel:" << loglevels[0]->to_string_view() << endl;
    }
    // Other analysis...
    // Print the entire event (event is an object, not a pointer)
    cout << event.to_string() << endl;
}
For advanced uses, log-surgeon also has a BufferParser that reads directly from a buffer.
To build and install the library, run the following from the repo's root:
# Generate the CMake project
cmake -S . -B build
# Build the project
cmake --build ./build -j
# Install the project to ~/.local
cmake --install ./build --prefix ~/.local
To build the debug version, replace the first command with:
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
- docs contains more detailed documentation including:
  - The schema specification, which describes the syntax for writing your own schema
  - log-surgeon's design objectives
- examples contains programs demonstrating usage of the library.
You can use GitHub issues to report a bug or request a feature.
Join us on Zulip to chat with developers and other community members.
The following are issues we're aware of and working on:
- Schema rules must use ASCII characters. UTF-8 support is planned for a future release.
- Timestamps must appear at the start of the message to be handled differently from other variable values and to support multi-line log events.
- A variable pattern has no way to match text around a variable without that text also becoming part of the variable.
- Support for submatch extraction will be coming in a future release.
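To make the last two limitations concrete, the sketch below uses `std::regex` (not log-surgeon) to contrast a whole match with a submatch. The pattern `task_(\d+)`, the `TaskMatch` struct, and the `find_task` helper are all hypothetical and chosen only for illustration. Per the limitations above, a log-surgeon variable currently corresponds to the whole match, so the surrounding `task_` text becomes part of the value. Submatch extraction would correspond to retrieving only the capture group.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical pattern: the literal prefix "task_" provides context, and the
// parenthesized group marks the part a user may actually want.
static const std::regex cTaskRegex{R"(task_(\d+))"};

struct TaskMatch {
    std::string whole;  // Entire match, including the "task_" prefix
    std::string id;     // Capture group 1: the digits only
};

// Hypothetical helper: finds the first task reference in the text.
std::optional<TaskMatch> find_task(std::string const& text) {
    std::smatch match;
    if (false == std::regex_search(text, match, cTaskRegex)) {
        return std::nullopt;
    }
    // One whole match plus one capture group
    assert(2 == match.size());
    return TaskMatch{match.str(0), match.str(1)};
}
```

For the example event shown earlier, `whole` would hold `task_123` while `id` would hold `123`.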