refactor: Unify epsilon, positive start, positive end, and negative transitions into spontaneous transitions. #76
Co-authored-by: Lin Zhihao <[email protected]>
…reflect functionality change.
Walkthrough
This pull request introduces a comprehensive refactoring of log-surgeon's finite automata implementation, focusing on replacing the concept of "tags" with "captures". The changes span multiple files across the project, including CMakeLists.txt, source files, and test cases. The primary goal appears to be enhancing the clarity and functionality of capture group handling in regex and NFA processing, with modifications to type definitions, method signatures, and internal data structures.
Sequence Diagram
sequenceDiagram
participant Lexer
participant Nfa
participant RegexAST
participant Capture
Lexer->>Nfa: Create NFA with rules
Nfa->>RegexAST: Extract captures
RegexAST->>Capture: Generate unique capture IDs
Nfa-->>Lexer: NFA with capture mappings
Actionable comments posted: 2
🧹 Nitpick comments (7)
src/log_surgeon/Lexer.tpp (4)
362-365: Pass `symbol_id_t` by value in `add_rule` method
Since `symbol_id_t` is likely a small type (e.g., an integer), passing it by value is more efficient than passing by const reference.
Apply this diff to update the parameter:
- void Lexer<TypedNfaState, TypedDfaState>::add_rule(
-         symbol_id_t const& var_id,
+ void Lexer<TypedNfaState, TypedDfaState>::add_rule(
+         symbol_id_t var_id,
          std::unique_ptr<finite_automata::RegexAST<TypedNfaState>> rule)
369-374: Pass `symbol_id_t` by value in `get_rule` method
For consistency and efficiency, consider passing `symbol_id_t` by value in the `get_rule` method.
Apply this diff to update the parameter:
- auto Lexer<TypedNfaState, TypedDfaState>::get_rule(symbol_id_t const var_id)
+ auto Lexer<TypedNfaState, TypedDfaState>::get_rule(symbol_id_t var_id)
381-395: Improve exception message for duplicate capture names
Including the duplicate capture name in the exception message enhances debugging by providing specific information about the error.
Apply this diff to enhance the exception message:
- throw std::invalid_argument("`m_rules` contains capture names that are not unique.");
+ throw std::invalid_argument("Duplicate capture name detected: " + capture_name);
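The suggestion above can be sketched in isolation. This is a minimal illustration only: `check_unique_capture_names` is a hypothetical helper, and it assumes capture names are available as plain strings, which may not match how `Lexer` actually stores them.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of log-surgeon): throws if any capture name repeats, and names the
// offending capture in the exception message.
inline auto check_unique_capture_names(std::vector<std::string> const& capture_names) -> void {
    std::unordered_set<std::string> seen;
    for (auto const& capture_name : capture_names) {
        // `insert` returns {iterator, bool}; the bool is false if the name was already present.
        if (false == seen.insert(capture_name).second) {
            throw std::invalid_argument("Duplicate capture name detected: " + capture_name);
        }
    }
}
```

A caller that catches `std::invalid_argument` can then report which name collided instead of only learning that some collision exists.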
404-404: Offer assistance for DFA capture handling
The TODO comment notes that the DFA currently ignores captures, which might lead to incorrect lexing of patterns with capture groups.
Would you like assistance in updating the DFA implementation to properly handle captures? I can help develop a solution and open a new GitHub issue to track this task.
src/log_surgeon/Lexer.hpp (1)
131-169: Add documentation for the new getter methods.
While the methods are well-structured, they would benefit from documentation explaining:
- The purpose of each method
- The meaning of nullopt returns
- Any preconditions or postconditions
tests/test-nfa.cpp (1)
47-47: Consider using std::move for rules.
The removal of std::move could lead to unnecessary copying. Consider restoring it:
- ByteNfa const nfa{rules};
+ ByteNfa const nfa{std::move(rules)};
tests/test-lexer.cpp (1)
296-350: Comprehensive test coverage with room for expansion.
The test cases effectively cover both basic lexer functionality and capture groups. However, there's a TODO comment about adding tests for register-related functionality.
Would you like me to help implement the register-related tests once the determinization is implemented?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- CMakeLists.txt (1 hunk)
- src/log_surgeon/Lexer.hpp (5 hunks)
- src/log_surgeon/Lexer.tpp (3 hunks)
- src/log_surgeon/LexicalRule.hpp (1 hunk)
- src/log_surgeon/SchemaParser.cpp (4 hunks)
- src/log_surgeon/finite_automata/Capture.hpp (2 hunks)
- src/log_surgeon/finite_automata/Dfa.hpp (3 hunks)
- src/log_surgeon/finite_automata/Nfa.hpp (3 hunks)
- src/log_surgeon/finite_automata/NfaState.hpp (3 hunks)
- src/log_surgeon/finite_automata/PrefixTree.hpp (1 hunk)
- src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
- src/log_surgeon/finite_automata/RegisterHandler.hpp (2 hunks)
- src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
- tests/CMakeLists.txt (2 hunks)
- tests/test-capture.cpp (1 hunk)
- tests/test-lexer.cpp (5 hunks)
- tests/test-nfa.cpp (4 hunks)
- tests/test-tag.cpp (0 hunks)
💤 Files with no reviewable changes (1)
- tests/test-tag.cpp
✅ Files skipped from review due to trivial changes (1)
- src/log_surgeon/finite_automata/PrefixTree.hpp
🧰 Additional context used
📓 Path-based instructions (13)
The same instruction applies to each file listed below.
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
- src/log_surgeon/finite_automata/RegisterHandler.hpp
- src/log_surgeon/LexicalRule.hpp
- tests/test-nfa.cpp
- src/log_surgeon/SchemaParser.cpp
- tests/test-lexer.cpp
- src/log_surgeon/finite_automata/Capture.hpp
- src/log_surgeon/finite_automata/Dfa.hpp
- src/log_surgeon/finite_automata/NfaState.hpp
- src/log_surgeon/Lexer.hpp
- src/log_surgeon/finite_automata/Nfa.hpp
- src/log_surgeon/finite_automata/RegexAST.hpp
- tests/test-capture.cpp
- src/log_surgeon/finite_automata/TaggedTransition.hpp
📓 Learnings (7)
src/log_surgeon/finite_automata/RegisterHandler.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
src/log_surgeon/SchemaParser.cpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
src/log_surgeon/finite_automata/Capture.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/Lexer.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build (ubuntu-latest, Debug)
🔇 Additional comments (27)
src/log_surgeon/finite_automata/Nfa.hpp (1)
43-43: Good practice: Pass `rules` by const reference
Modifying the constructor to accept `rules` as a `const&` parameter avoids unnecessary copies and improves performance.
src/log_surgeon/finite_automata/RegexAST.hpp (4)
86-100: Efficient management of subtree positive captures
The addition of methods to get, set, and add `m_subtree_positive_captures` enhances the clarity and maintainability of capture tracking within the AST nodes.
650-660: Validation of non-null parameters in `RegexASTCapture` constructor
Ensuring that `group_regex_ast` and `capture` are not null by throwing `std::invalid_argument` improves robustness and aligns with previously learned best practices regarding non-null requirements for `m_capture` in `RegexASTCapture`.
This approach reflects the learning that `m_tag` (now `m_capture`) must always be non-null, as noted in past experiences.
788-790: Correct accumulation of positive captures in `RegexASTOr`
The constructor correctly accumulates subtree positive captures from both left and right operands, ensuring accurate tracking of captures in alternation expressions.
817-819: Accurate merging of captures in `RegexASTCat`
Combining subtree positive captures from both operands in concatenation expressions maintains the integrity of capture groups throughout the AST.
src/log_surgeon/finite_automata/Capture.hpp (2)
1-2: LGTM! Class and header guards renamed consistently.
The renaming from `Tag` to `Capture` has been consistently applied across the class name and header guards, which improves code clarity by better reflecting its purpose.
Also applies to: 9-9, 20-20
11-11: LGTM! Constructor implementation is correct.
The constructor correctly moves the name parameter into the member variable, which is an efficient approach for string handling.
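The pass-by-value-then-move idiom referred to above can be sketched as follows. This is a simplified stand-in rather than the actual contents of `Capture.hpp`: the `explicit` specifier, the `get_name` accessor, and the `m_name` member name are assumptions made for illustration.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for the `Capture` class discussed above; the real header may differ.
class Capture {
public:
    // Take the name by value, then move it into the member, so an rvalue argument is never copied.
    explicit Capture(std::string name) : m_name{std::move(name)} {}

    [[nodiscard]] auto get_name() const -> std::string const& { return m_name; }

private:
    std::string m_name;
};
```

With this shape, an lvalue argument costs one copy plus one move, while a temporary or an explicitly moved string costs only moves.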
tests/test-capture.cpp (1)
7-34: LGTM! Comprehensive test coverage with well-organized test cases.
The test suite effectively covers:
- Basic functionality
- Edge cases (empty names)
- Special character handling
- Copy and move semantics
src/log_surgeon/LexicalRule.hpp (1)
26-28: LGTM! Well-implemented getter with appropriate const correctness.
The method correctly:
- Uses `[[nodiscard]]` to prevent accidental value discarding
- Returns const pointers to prevent modification of captures
- Delegates to the appropriate regex method
src/log_surgeon/finite_automata/RegisterHandler.hpp (2)
5-5: LGTM! Appropriate include for fixed-width integer types.
The addition of `<cstdint>` is necessary for using `uint32_t` in the type alias.
21-21: LGTM! Type alias improves code clarity and maintainability.
Using `register_id_t` instead of raw `uint32_t` enhances code readability and makes future type changes easier.
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
14-15: LGTM! Type alias improves code clarity.
The introduction of `tag_id_t` as `std::uint32_t` enhances code readability and ensures consistent tag ID representation across the codebase.
Line range hint 23-46: LGTM! Improved type safety with value types.
The refactoring from Tag pointers to tag_id_t values:
- Reduces memory management complexity
- Improves type safety
- Aligns with the PR objective of simplifying transition handling
Line range hint 55-78: LGTM! Consistent refactoring approach.
The changes to NegativeTaggedTransition mirror those in PositiveTaggedTransition, maintaining consistency in the transition handling approach.
src/log_surgeon/finite_automata/Dfa.hpp (2)
Line range hint 5-45: LGTM! Enhanced register handling capabilities.
The addition of RegisterHandler and supporting includes improves the DFA's register management capabilities.
82-82: LGTM! Explicit pointer initialization improves safety.
Initializing the state pointer to nullptr prevents potential undefined behavior.
src/log_surgeon/Lexer.hpp (3)
27-29: LGTM! Well-defined type system.
The introduction of clear type aliases enhances code readability and maintains consistency with the codebase's type system.
59-61: LGTM! Clear error documentation.
The updated documentation clearly specifies the exception condition for duplicate capture names.
197-199: LGTM! Efficient ID mapping implementation.
The use of unordered_map provides efficient lookups and aligns with the transition to an ID-based system.
tests/test-nfa.cpp (1)
Line range hint 50-138: LGTM! Test expectations updated correctly.
The test expectations have been properly updated to reflect the new tag ID system while maintaining comprehensive coverage of different transition types.
src/log_surgeon/finite_automata/NfaState.hpp (3)
35-40: LGTM! Constructor changes improve efficiency.
The switch from `Tag const*` to `tag_id_t` for constructor parameters is a good change that:
- Reduces pointer indirection
- Aligns with the PR's goal of unifying transitions
- Simplifies memory management
54-56: LGTM! Method signature change maintains consistency.
The update to use `tag_id_t` in `add_positive_tagged_start_transition` maintains consistency with the constructor changes.
Line range hint 205-251: Consider documenting the epsilon transition behaviour.
The TODO comment "currently treat tagged transitions as epsilon transitions" suggests this might be temporary. Consider:
- Documenting this behaviour in the class documentation
- Evaluating if this should remain the long-term approach
tests/test-lexer.cpp (1)
58-72: Well-structured helper functions with clear documentation.
The new helper functions `initialize_lexer` and `test_scanning_input` improve test maintainability and readability.
src/log_surgeon/SchemaParser.cpp (1)
12-12: LGTM! Consistent transition from Tag to Capture.
The changes properly implement the transition from Tag to Capture, maintaining consistency across the codebase.
Also applies to: 170-170
tests/CMakeLists.txt (2)
5-7: LGTM! Source files properly updated.
The addition of Capture.hpp and related headers maintains proper build dependencies.
27-34: LGTM! Test sources well-organized.
The addition of test-capture.cpp and the improved formatting enhance maintainability.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Line range hint 89-850: Format the code using clang-format.
Multiple code formatting violations were detected by the linter. Please run clang-format on the file to ensure it adheres to the project's style guide.
♻️ Duplicate comments (1)
src/log_surgeon/finite_automata/Nfa.hpp (1)
20-28: ⚠️ Potential issue
Ensure thread safety of `UniqueIdGenerator`
The `current_id` counter is not thread-safe. If instances of `Nfa` are accessed by multiple threads, consider using `std::atomic<uint32_t>` for `current_id` to prevent data races.
  private:
-     uint32_t current_id;
+     std::atomic<uint32_t> current_id;
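For reference, an atomic variant of such a generator might look like the sketch below. It is an illustration under assumptions, not the project's code: the member name `m_current_id` and the relaxed memory ordering are choices made here, and whether the NFA is ever built from multiple threads is not established by this review.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of an atomic ID generator; not the project's actual implementation.
class UniqueIdGenerator {
public:
    // `fetch_add` returns the value held before the increment, so IDs start at 0.
    [[nodiscard]] auto generate_id() -> std::uint32_t {
        return m_current_id.fetch_add(1, std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint32_t> m_current_id{0};
};
```

One trade-off to weigh: a `std::atomic` member makes the class non-copyable and non-movable, which also affects any class that holds the generator by value.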
🧹 Nitpick comments (6)
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (4)
15-19: Add documentation for TransitionOperation enum values
Each enum value's purpose should be documented to improve code maintainability.
Add documentation like this:
  enum class TransitionOperation {
+     /// No operation to perform on tags
      None,
+     /// Set the specified tags as active
      SetTags,
+     /// Set the specified tags as inactive
      NegateTags
  };
62-64: Make member variables const
Since these members are never modified after construction, they should be marked as const.
Apply this diff:
-     TransitionOperation m_transition_op;
-     std::vector<tag_id_t> m_tag_ids;
-     TypedNfaState const* m_dest_state;
+     const TransitionOperation m_transition_op;
+     const std::vector<tag_id_t> m_tag_ids;
+     TypedNfaState const* const m_dest_state;
52-59: Enhance serialize() implementation
The current implementation has two potential improvements:
- Include the transition operation in the output for better debugging and visualization
- Handle empty tag_ids vector case specially
Consider this implementation:
  [[nodiscard]] auto serialize(std::unordered_map<TypedNfaState const*, uint32_t> const& state_ids
  ) const -> std::optional<std::string> {
      auto const state_id_it = state_ids.find(m_dest_state);
      if (state_id_it == state_ids.end()) {
          return std::nullopt;
      }
-     return fmt::format("{}[{}]", state_id_it->second, fmt::join(m_tag_ids, ","));
+     auto const op_str = m_transition_op == TransitionOperation::None ? ""
+                         : m_transition_op == TransitionOperation::SetTags ? "+"
+                         : "-";
+     auto const tags_str = m_tag_ids.empty() ? ""
+                         : fmt::format("[{}]", fmt::join(m_tag_ids, ","));
+     return fmt::format("{}{}{}", state_id_it->second, op_str, tags_str);
  }
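The proposed output format can be tried out in a standalone sketch that avoids the `fmt` dependency. Everything here is hypothetical scaffolding: `State` and `serialize_transition` are stand-ins invented for illustration, and the `<id><op>[<tags>]` layout simply mirrors the suggestion above rather than any format the project has adopted.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum class TransitionOperation { None, SetTags, NegateTags };

// Hypothetical stand-in for an NFA state; only its address is used as a map key.
struct State {};

// Returns "<dest id><op>[<tag ids>]", or std::nullopt if the destination state has no known ID.
inline auto serialize_transition(
        std::unordered_map<State const*, std::uint32_t> const& state_ids,
        State const* dest_state,
        TransitionOperation op,
        std::vector<std::uint32_t> const& tag_ids
) -> std::optional<std::string> {
    auto const it = state_ids.find(dest_state);
    if (state_ids.end() == it) {
        return std::nullopt;
    }
    std::string result{std::to_string(it->second)};
    if (TransitionOperation::SetTags == op) {
        result += "+";
    } else if (TransitionOperation::NegateTags == op) {
        result += "-";
    }
    // An empty tag list contributes nothing, rather than an empty "[]".
    if (false == tag_ids.empty()) {
        result += "[";
        for (std::size_t i{0}; i < tag_ids.size(); ++i) {
            if (i > 0) {
                result += ",";
            }
            result += std::to_string(tag_ids[i]);
        }
        result += "]";
    }
    return result;
}
```

Under these assumptions, a `SetTags` transition to state 3 with tags 1 and 2 serializes as `3+[1,2]`, and a plain transition to the same state as `3`.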
26-26: Add comparison operators for container operations
Consider adding operator== and operator<=> to enable using this class in ordered containers and algorithms.
Add these operators:
auto operator<=>(const SpontaneousTransition&) const = default;
bool operator==(const SpontaneousTransition&) const = default;
src/log_surgeon/finite_automata/NfaState.hpp (1)
48-49: Add parameter validation
The method should validate that `dest_state` is not null before adding it to the transitions.
  auto add_spontaneous_transition(NfaState* dest_state) -> void {
+     if (nullptr == dest_state) {
+         return;
+     }
      m_spontaneous_transitions.emplace_back(dest_state);
  }
src/log_surgeon/finite_automata/Nfa.hpp (1)
110-111: Consider implementing scoped capture groups
The TODO comment suggests that unique naming across capture groups limits use cases. Consider implementing a scoping mechanism to allow reuse of capture names within different scopes.
Would you like me to propose a design for implementing scoped capture groups?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- CMakeLists.txt (1 hunk)
- src/log_surgeon/LexicalRule.hpp (2 hunks)
- src/log_surgeon/finite_automata/Nfa.hpp (5 hunks)
- src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
- src/log_surgeon/finite_automata/RegexAST.hpp (19 hunks)
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1 hunk)
- src/log_surgeon/finite_automata/TaggedTransition.hpp (0 hunks)
- tests/CMakeLists.txt (2 hunks)
- tests/test-nfa.cpp (5 hunks)
💤 Files with no reviewable changes (1)
- src/log_surgeon/finite_automata/TaggedTransition.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
- src/log_surgeon/LexicalRule.hpp
- tests/test-nfa.cpp
- CMakeLists.txt
- tests/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (4)
The same instruction applies to each file listed below.
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp
- src/log_surgeon/finite_automata/Nfa.hpp
- src/log_surgeon/finite_automata/NfaState.hpp
- src/log_surgeon/finite_automata/RegexAST.hpp
📓 Learnings (3)
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🪛 GitHub Actions: lint
src/log_surgeon/finite_automata/Nfa.hpp
[error] 224-224: Code formatting violation detected. Line needs to be formatted using clang-format.
src/log_surgeon/finite_automata/NfaState.hpp
[error] 198-204: Code formatting violations detected. Multiple lines need to be formatted using clang-format.
src/log_surgeon/finite_automata/RegexAST.hpp
[error] 89-850: Multiple code formatting violations detected throughout the file. File needs to be formatted using clang-format.
🔇 Additional comments (9)
src/log_surgeon/finite_automata/RegexAST.hpp (7)
23-23: LGTM! Documentation accurately reflects the tag-to-capture transition.
The include directive and class documentation have been properly updated to reflect the architectural change from tags to captures.
Also applies to: 33-37
85-99: LGTM! Member functions properly handle capture-based operations.
The implementation correctly manages capture-related operations, and the condition on line 116 follows the coding guideline of using `false == <expression>`.
Also applies to: 103-104, 112-120
132-148: LGTM! Serialization methods effectively handle capture formatting.
The implementation makes good use of modern C++ features like ranges and format for serializing captures.
Line range hint 632-658: LGTM! RegexASTCapture implementation maintains non-null capture invariant.
The implementation correctly enforces the non-null requirement for captures, which aligns with the previously established invariant from PR #48.
Also applies to: 659-662, 670-671
784-788: LGTM! RegexASTOr correctly handles captures in alternation.
The implementation properly manages captures and their negations in alternation patterns.
Also applies to: 794-795
815-817: LGTM! RegexASTCat properly manages captures in concatenation.
The implementation correctly handles capture propagation in concatenated patterns.
Also applies to: 825-825, 827-827
850-851: LGTM! RegexASTMultiplication effectively manages captures in repetition patterns.
The implementation properly handles captures in various repetition scenarios (zero, finite, and infinite).
Also applies to: 861-885
src/log_surgeon/finite_automata/NfaState.hpp (2)
171-175: Address TODO comment regarding tag modifications
The TODO comment indicates that only transitions without tag modifications are handled. This limitation should be addressed to ensure proper functionality.
Would you like me to help implement the handling of transitions that modify tags?
198-204: ⚠️ Potential issue
Fix code formatting
The code formatting violates the project's style guidelines. Apply proper formatting to these lines.
- auto const optional_serialized_transition
-         = spontaneous_transition.serialize(state_ids);
- if(false == optional_serialized_transition.has_value()) {
+ auto const optional_serialized_transition = spontaneous_transition.serialize(state_ids);
+ if (false == optional_serialized_transition.has_value()) {
Likely invalid or redundant comment.
🧰 Tools
🪛 GitHub Actions: lint
[error] 198-204: Code formatting violations detected. Multiple lines need to be formatted using clang-format.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/log_surgeon/finite_automata/Nfa.hpp (5 hunks)
- src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
- src/log_surgeon/finite_automata/RegexAST.hpp (19 hunks)
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1 hunk)
- tests/test-nfa.cpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test-nfa.cpp
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp
🧰 Additional context used
📓 Path-based instructions (3)
The same instruction applies to each file listed below.
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
- src/log_surgeon/finite_automata/Nfa.hpp
- src/log_surgeon/finite_automata/NfaState.hpp
- src/log_surgeon/finite_automata/RegexAST.hpp
📓 Learnings (2)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
🔇 Additional comments (12)
src/log_surgeon/finite_automata/RegexAST.hpp (4)
23-23: LGTM! Documentation and includes updated to reflect the transition from tags to captures. The changes accurately reflect the architectural shift from a tag-based to a capture-based implementation.
Also applies to: 33-37
85-102: LGTM! Base class changes maintain functionality while transitioning to captures. The changes consistently replace tag-based operations with capture-based ones, maintaining the same functionality and following coding guidelines.
Also applies to: 110-118, 130-146, 151-153
630-632: LGTM! RegexASTCapture changes maintain the non-null invariant. The changes correctly transition from tags to captures while maintaining the critical invariant that the capture pointer must never be null, as enforced by the constructor's null checks.
Also applies to: 641-658, 666-667, 707-707, 716-716
780-783: LGTM! Derived classes updated consistently. The changes systematically replace tag operations with capture operations across all derived classes. The updated documentation in `RegexASTCapture::add_to_nfa` provides an excellent visualization of the NFA structure.
Also applies to: 789-790, 810-811, 819-819, 821-821, 854-878, 906-926, 932-932, 936-936, 942-949
src/log_surgeon/finite_automata/NfaState.hpp (4)
16-21: LGTM! Clean header organization and type aliases. The addition of `SpontaneousTransition.hpp` and the type aliases for `ByteNfaState` and `Utf8NfaState` improve code readability.
35-40: LGTM! Clear constructor implementation. The constructor properly initializes the spontaneous transitions with the provided parameters.
48-57: LGTM! Well-designed method overloads. The two overloads of `add_spontaneous_transition` provide good flexibility for adding transitions with or without operations and tags.
196-202: LGTM! Robust error handling. The serialization implementation properly handles failures using `std::optional`.
src/log_surgeon/finite_automata/Nfa.hpp (4)
126-136: LGTM! Efficient tag management. The method efficiently handles tag creation and caching using the unique ID generator.
146-161: LGTM! Clear negative capture handling. The method properly manages negative transitions and tag creation for each capture.
166-179: LGTM! Well-structured capture state creation. The method properly creates and links start and end states with appropriate tag operations.
20-28: 🛠️ Refactor suggestion
Consider making UniqueIdGenerator thread-safe.
The current implementation uses a non-atomic increment operation which could lead to race conditions if used in a multi-threaded context.
  class UniqueIdGenerator {
  public:
      UniqueIdGenerator() : current_id{0} {}
-     [[nodiscard]] auto generate_id() -> uint32_t { return current_id++; }
+     [[nodiscard]] auto generate_id() -> uint32_t {
+         return std::atomic_fetch_add(&current_id, 1u);
+     }
  private:
-     uint32_t current_id;
+     std::atomic<uint32_t> current_id;
  };
Likely invalid or redundant comment.
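A minimal sketch of what an atomic variant might look like, assuming the generator were ever shared between threads (the review itself marks this suggestion as likely invalid or redundant). The class name `AtomicIdGenerator` and the member name `m_current_id` are hypothetical and are not taken from the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical thread-safe counterpart of the generator discussed above.
class AtomicIdGenerator {
public:
    [[nodiscard]] auto generate_id() -> std::uint32_t {
        // fetch_add returns the value held before the increment, matching `current_id++`.
        // Relaxed ordering is enough here because only the counter itself is shared.
        return m_current_id.fetch_add(1, std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint32_t> m_current_id{0};
};

inline auto example_usage() -> void {
    AtomicIdGenerator generator;
    assert(0 == generator.generate_id());
    assert(1 == generator.generate_id());
}
```

If the NFA is only ever built on a single thread, the plain `uint32_t` counter is sufficient and the atomic adds nothing.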
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/log_surgeon/finite_automata/Nfa.hpp (1)
166-180: Consider more efficient vector initialization. The implementation is correct, but the vector initialization for end_tag could be more efficient.
- std::vector{end_tag},
+ std::vector<tag_id_t>{end_tag},
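As a hedged check of what the two spellings mean: under C++17 class template argument deduction, both forms should produce the same type, so the difference is probably one of readability rather than run-time cost. The sketch below assumes `tag_id_t` is an alias for an unsigned 32-bit integer; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Assumption: `tag_id_t` is an unsigned integer alias; the real definition lives in the PR.
using tag_id_t = std::uint32_t;

// Element type deduced from the initializer (CTAD).
inline auto deduced_tags(tag_id_t end_tag) { return std::vector{end_tag}; }

// Element type spelled out explicitly.
inline auto explicit_tags(tag_id_t end_tag) { return std::vector<tag_id_t>{end_tag}; }

// Both spellings yield std::vector<tag_id_t>.
static_assert(std::is_same_v<decltype(deduced_tags(0)), decltype(explicit_tags(0))>);
static_assert(std::is_same_v<decltype(deduced_tags(0)), std::vector<tag_id_t>>);

inline auto example_usage() -> void {
    assert(deduced_tags(7) == explicit_tags(7));
}
```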
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/log_surgeon/finite_automata/Nfa.hpp (5 hunks)
- src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1 hunks)
- tests/test-nfa.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp
- tests/test-nfa.cpp
🧰 Additional context used
📓 Path-based instructions (2)
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: Prefer `false == <expression>` rather than `!<expression>`.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: Prefer `false == <expression>` rather than `!<expression>`.
📓 Learnings (1)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (8)
src/log_surgeon/finite_automata/NfaState.hpp (4)
35-40: LGTM! Efficient constructor implementation. The constructor correctly uses std::move for the tag_ids parameter and initializes m_spontaneous_transitions with the provided parameters.
48-57: LGTM! Well-designed transition methods. Both overloads of add_spontaneous_transition are well-implemented, with proper parameter handling and efficient use of std::move.
171-174: Consider transition operations in epsilon closure. The epsilon_closure method currently adds all spontaneous transitions to the closure set without checking their TransitionOperation. This might lead to incorrect closure calculation if some transitions modify tags.
Consider filtering transitions based on their operation type or documenting the intended behaviour.
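To make the two behaviours concrete, here is a minimal sketch of a closure computation that can either follow every spontaneous transition or skip those carrying a tag operation. All types are simplified stand-ins: `State`, the `None` and `SetTags` enumerators, and the `only_untagged` flag are assumptions for illustration and do not reflect the PR's actual definitions.

```cpp
#include <cassert>
#include <set>
#include <stack>
#include <vector>

// Hypothetical, simplified stand-ins for the PR's types.
enum class TransitionOperation { None, SetTags, NegateTags };

struct State;

struct SpontaneousTransition {
    TransitionOperation op;
    State const* dest;
};

struct State {
    std::vector<SpontaneousTransition> spontaneous_transitions;
};

// Collects every state reachable from `start` via spontaneous transitions, including `start`.
// When `only_untagged` is true, transitions that carry a tag operation are not followed.
[[nodiscard]] inline auto epsilon_closure(State const* start, bool only_untagged)
        -> std::set<State const*> {
    std::set<State const*> closure;
    std::stack<State const*> pending;
    pending.push(start);
    while (false == pending.empty()) {
        auto const* current{pending.top()};
        pending.pop();
        if (false == closure.insert(current).second) {
            continue;  // Already visited.
        }
        for (auto const& transition : current->spontaneous_transitions) {
            if (only_untagged && TransitionOperation::None != transition.op) {
                continue;
            }
            pending.push(transition.dest);
        }
    }
    return closure;
}

inline auto example_usage() -> void {
    State a;
    State b;
    State c;
    a.spontaneous_transitions.push_back({TransitionOperation::None, &b});
    b.spontaneous_transitions.push_back({TransitionOperation::SetTags, &c});
    assert(3 == epsilon_closure(&a, false).size());  // a, b, c
    assert(2 == epsilon_closure(&a, true).size());   // a, b
}
```

Which of the two behaviours is intended depends on how the later DFA construction consumes tag operations, which this sketch does not model.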
193-207: LGTM! Robust serialization implementation. The serialize method properly handles serialization failures and maintains a clear format for the output.
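One way such failure handling could be structured, as a minimal sketch: serialization returns `std::optional<std::string>` and yields `std::nullopt` as soon as a referenced state has no ID. The `Node` type, the `serialize_node` function, and the output format are hypothetical and are not the PR's actual format.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified state: only a list of destination states.
struct Node {
    std::vector<Node const*> next;
};

// Serializes `node` as "<id>:next={<id>,...}".
// Returns std::nullopt if `node` or any destination is missing from `ids`.
[[nodiscard]] inline auto serialize_node(
        Node const& node,
        std::unordered_map<Node const*, std::uint32_t> const& ids
) -> std::optional<std::string> {
    auto const self{ids.find(&node)};
    if (ids.end() == self) {
        return std::nullopt;
    }
    auto out = std::to_string(self->second) + ":next={";
    bool first{true};
    for (auto const* dest : node.next) {
        auto const found{ids.find(dest)};
        if (ids.end() == found) {
            return std::nullopt;  // Propagate the failure instead of emitting partial output.
        }
        if (false == first) {
            out += ",";
        }
        out += std::to_string(found->second);
        first = false;
    }
    out += "}";
    return out;
}

inline auto example_usage() -> void {
    Node a;
    Node b;
    a.next.push_back(&b);
    std::unordered_map<Node const*, std::uint32_t> ids{{&a, 0}, {&b, 1}};
    assert("0:next={1}" == serialize_node(a, ids).value());
    ids.erase(&b);
    assert(false == serialize_node(a, ids).has_value());
}
```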
src/log_surgeon/finite_automata/Nfa.hpp (4)
126-136: LGTM! Clean implementation of tag management. The method efficiently handles both existing and new tags, with good use of structured bindings and clear control flow.
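The get-or-create pattern described here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `TagRegistry` class, the choice of a capture name as the key, and the plain counter standing in for the unique ID generator are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Assumption: tag IDs are plain unsigned integers.
using tag_id_t = std::uint32_t;

// Hypothetical registry keyed by capture name; the PR's container and key type may differ.
class TagRegistry {
public:
    // Returns the (start, end) tag pair for `capture_name`, creating it on first use.
    auto get_or_create_tags(std::string const& capture_name) -> std::pair<tag_id_t, tag_id_t> {
        if (auto const existing{m_tags.find(capture_name)}; m_tags.end() != existing) {
            return existing->second;
        }
        auto const start_tag{m_next_id++};
        auto const end_tag{m_next_id++};
        m_tags.emplace(capture_name, std::make_pair(start_tag, end_tag));
        return {start_tag, end_tag};
    }

private:
    std::unordered_map<std::string, std::pair<tag_id_t, tag_id_t>> m_tags;
    tag_id_t m_next_id{0};
};

inline auto example_usage() -> void {
    TagRegistry registry;
    auto const [start_tag, end_tag]{registry.get_or_create_tags("user_id")};
    assert(0 == start_tag && 1 == end_tag);
    // A second lookup returns the cached pair instead of allocating new IDs.
    assert(registry.get_or_create_tags("user_id") == std::make_pair(start_tag, end_tag));
}
```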
146-162: LGTM! Well-structured negative capture handling. The method properly collects both start and end tags for each capture and efficiently creates the new state with the NegateTags operation.
Line range hint 216-233: LGTM! Robust serialization with proper error handling. The method correctly handles serialization failures and maintains a clear format for the output.
20-28: ⚠️ Potential issue
Make UniqueIdGenerator thread-safe.
The current implementation uses a non-atomic increment operation which isn't thread-safe. Consider using std::atomic<uint32_t> for current_id to prevent data races in multi-threaded contexts.
- uint32_t current_id;
+ std::atomic<uint32_t> current_id;
Likely invalid or redundant comment.
References
Description
Validation performed
Summary by CodeRabbit
Release Notes
New Features
- `Capture` class to replace the existing `Tag` class.
- `SpontaneousTransition` class for NFA transitions.
- `Capture` class functionality.
Refactoring
Documentation
Testing
These changes improve the log surgeon library's type management and capture group handling while maintaining existing functionality.