-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Move epsilon_closure
from Lexer
to NfaState
; Refactor nfa_to_dfa
as Dfa
's constructor.
#71
refactor: Move epsilon_closure
from Lexer
to NfaState
; Refactor nfa_to_dfa
as Dfa
's constructor.
#71
Conversation
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
WalkthroughThe pull request introduces significant changes to the handling of finite automata, specifically Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NFA). The modifications include adding a new constructor to the Changes
Sequence DiagramsequenceDiagram
participant NFA
participant Dfa Constructor
participant NfaState
participant DFA
NFA ->> NfaState: epsilon_closure()
NfaState -->> Dfa Constructor: Return reachable states
Dfa Constructor ->> DFA: Create states and transitions
Dfa Constructor -->> DFA: Initialize complete DFA
Possibly related PRs
Suggested reviewers
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/log_surgeon/finite_automata/NfaState.hpp (1)
97-100
: Documentation could be more descriptive.While the documentation indicates the return value, it would be beneficial to include:
- The method's purpose
- Time complexity
- Whether the method modifies the state
Consider expanding the documentation:
/** + * Computes the epsilon closure of the current state using depth-first traversal. + * The epsilon closure includes all states reachable through epsilon transitions and + * tagged transitions (as per TODO comment in implementation). + * + * Time complexity: O(V + E) where V is the number of states and E is the number + * of transitions. + * + * @note This method does not modify the current state. * @return The set of all states reachable from the current state via epsilon transitions. */src/log_surgeon/Lexer.tpp (1)
382-382
: Consider documenting the DFA construction behaviour.The TODO comment indicates that the DFA ignores tags, which is an important implementation detail that should be documented more formally.
Consider adding a formal documentation comment:
+ /** + * Generates a DFA from the NFA, ignoring tags. + * @note Currently, tags are ignored in the DFA construction. For example, + * "capture:user=(?<user_id>\d+)" is treated as "capture:user=\d+" + */ m_dfa = std::make_unique<finite_automata::Dfa<TypedDfaState>>(std::move(nfa));src/log_surgeon/finite_automata/Dfa.hpp (2)
17-17
: Consider acceptingnfa
by rvalue reference to improve performanceCurrently, the constructor accepts
Nfa<NfaStateType> nfa
by value, which may introduce unnecessary copying. If the intention is to transfer ownership ofnfa
, consider accepting it by rvalue reference (Nfa<NfaStateType>&& nfa
) and moving it to avoid unnecessary copies.
46-46
: Useusing
instead oftypedef
for type aliasingTo adhere to modern C++ conventions, consider replacing
typedef
withusing
for type aliasing:- typedef std::set<TypedNfaState const*> StateSet; + using StateSet = std::set<TypedNfaState const*>;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
examples/intersect-test.cpp
(3 hunks)src/log_surgeon/Buffer.hpp
(1 hunks)src/log_surgeon/Lalr1Parser.tpp
(26 hunks)src/log_surgeon/Lexer.hpp
(0 hunks)src/log_surgeon/Lexer.tpp
(10 hunks)src/log_surgeon/LogEvent.cpp
(2 hunks)src/log_surgeon/LogParser.cpp
(1 hunks)src/log_surgeon/Parser.tpp
(2 hunks)src/log_surgeon/ParserInputBuffer.cpp
(2 hunks)src/log_surgeon/Schema.cpp
(1 hunks)src/log_surgeon/SchemaParser.cpp
(2 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(2 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(3 hunks)
💤 Files with no reviewable changes (1)
- src/log_surgeon/Lexer.hpp
✅ Files skipped from review due to trivial changes (6)
- src/log_surgeon/LogEvent.cpp
- src/log_surgeon/Schema.cpp
- src/log_surgeon/SchemaParser.cpp
- src/log_surgeon/LogParser.cpp
- src/log_surgeon/Buffer.hpp
- src/log_surgeon/ParserInputBuffer.cpp
🧰 Additional context used
📓 Path-based instructions (3)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
examples/intersect-test.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-latest, Release)
- GitHub Check: build (ubuntu-latest, Debug)
🔇 Additional comments (7)
src/log_surgeon/finite_automata/NfaState.hpp (2)
8-8
: LGTM!The addition of the
<stack>
header is necessary for the newepsilon_closure()
method implementation.
174-207
: Review the TODO comment and tagged transitions handling.The implementation treats tagged transitions as epsilon transitions (line 189), but this is marked as a TODO. This design decision should be reviewed as it might affect the correctness of the epsilon closure computation.
Let's verify the usage of tagged transitions:
Additionally:
- The implementation correctly handles cycles using a visited set (
closure_set
).- The use of
false == stack.empty()
andfalse == closure_set.insert(current_state).second
follows the coding guidelines for boolean comparisons.✅ Verification successful
The treatment of tagged transitions as epsilon transitions is intentional and correct.
The TODO comment appears to be a documentation note rather than a pending task. This behaviour is:
- Part of the regex capture group implementation design
- Well-documented in RegexAST.hpp
- Consistently implemented across the codebase
- Validated by tests
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for other implementations or tests that rely on this behaviour rg -A 5 "tagged.*transition.*epsilon" --type cppLength of output: 619
Script:
#!/bin/bash # Search for tests involving tagged transitions rg -l "TaggedTransition" test/ # Search for documentation about tagged transitions fd README.md | xargs rg "tagged.*transition" -A 3 # Search for related discussions in comments rg "tagged.*transition" --type cpp -g "!test/*" -A 3Length of output: 25622
src/log_surgeon/Lexer.tpp (2)
22-22
: LGTM! Consistent use of trailing return types.The change to use
auto -> void
for method return types is consistent with modern C++ practices and improves readability.Also applies to: 69-72, 78-81, 86-89, 153-153, 163-163, 213-214, 270-270, 302-302, 379-380, 416-416, 422-422, 428-428, 645-645
45-45
: LGTM! Consistent use ofauto
for variable declarations.The change to use
auto
for variable declarations improves maintainability and follows modern C++ type inference practices.Also applies to: 69-71, 85-85, 177-177, 201-201, 241-241, 256-256, 277-278, 283-286, 303-304, 321-322, 335-338, 348-351, 366-368, 382-383, 390-393, 407-408, 429-432, 477-477, 519-519, 559-559, 575-579, 582-583, 594-594, 612-614, 629-629, 638-638, 661-661, 666-666, 691-692, 714-714, 735-736
src/log_surgeon/Lalr1Parser.tpp (1)
69-72
: LGTM! Consistent modernization of C++ syntax.The changes to use trailing return types and
auto
for variable declarations are consistent with modern C++ practices and improve code maintainability.Also applies to: 78-81, 86-89, 153-153, 163-166, 173-177, 182-184, 213-214, 220-220, 241-243, 270-270, 277-278, 283-286, 302-304, 308-308, 321-322, 335-338, 348-351, 366-368, 379-380, 383-383, 390-393, 407-408, 416-416, 422-422, 428-430, 432-432, 440-440, 477-477, 519-519, 530-530, 559-559, 575-579, 582-583, 594-594, 612-614, 629-629, 638-638, 645-645, 661-661, 666-666, 691-692, 714-714, 735-736
src/log_surgeon/Parser.tpp (1)
33-36
: Use ofauto
with trailing return type-> void
is acceptableWhile using
auto
with a trailing return type for functions returningvoid
may be unconventional, it's acceptable if it aligns with the project's coding style. Consistency across the codebase can enhance readability.examples/intersect-test.cpp (1)
44-44
: Confirm safety of movingnfa
intodfa2
At line 44,
nfa
is moved intodfa2
usingstd::move(nfa)
. Please ensure thatnfa
is not used after this operation to prevent undefined behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/log_surgeon/finite_automata/Dfa.hpp (3)
50-56
: Consider optimizing lambda captures.The lambda
create_dfa_state
capturesthis
,dfa_states
, andunmarked_sets
by reference. Sincethis
is only used for callingnew_state
, consider capturing it by value for better clarity of ownership.- auto create_dfa_state - = [this, &dfa_states, &unmarked_sets](StateSet const& set) -> TypedDfaState* { + auto create_dfa_state + = [this = this, &dfa_states, &unmarked_sets](StateSet const& set) -> TypedDfaState* {
64-74
: Consider extracting transition mapping logic.The nested loops for building the transitions map could be extracted into a separate method for better readability and maintainability, especially when UTF-8 support is added.
Consider creating a helper method:
template <typename TypedNfaState> auto build_transitions_map(StateSet const& set) -> std::map<uint32_t, StateSet> { std::map<uint32_t, StateSet> ascii_transitions_map; for (auto const* s0 : set) { for (uint32_t i = 0; i < cSizeOfByte; i++) { for (auto* const s1 : s0->get_byte_transitions(i)) { StateSet closure = s1->epsilon_closure(); ascii_transitions_map[i].insert(closure.begin(), closure.end()); } } } return ascii_transitions_map; }
73-73
: Track UTF-8 support requirements.Multiple TODO comments indicate pending UTF-8 support implementation. These should be tracked in the issue tracker.
Would you like me to create GitHub issues to track the UTF-8 support implementation requirements?
Also applies to: 90-90
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/log_surgeon/finite_automata/Dfa.hpp
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
🔇 Additional comments (1)
src/log_surgeon/finite_automata/Dfa.hpp (1)
10-10
: LGTM! Well-structured constructor declaration.The constructor is properly templated and takes advantage of move semantics by accepting the NFA by value. The include statement is appropriately placed.
Also applies to: 16-17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
refactor: Move `epsilon_closure` from `Lexer` to `NfaState`; Refactor `nfa_to_dfa` as `Dfa`'s constructor.
epsilon_closure
and nfa_to_dfa
from Lexer
class to Dfa
class.epsilon_closure
from Lexer
to NfaState
; Refactor nfa_to_dfa
as Dfa
's constructor.
References
Description
epsilon_closure
fromLexer
class toNfaState
class.nfa_to_dfa
inLexer
class toDfa
class constructor.Validation performed
Previously existing tests succeed.
Summary by CodeRabbit
Release Notes
Refactor
auto
type inferenceNew Features
epsilon_closure()
method to compute reachable states via epsilon transitionsPerformance