feat: Split NFA positive tags into start and end transitions to encapsulate a capture group. #50

Merged on Nov 21, 2024 (257 commits)
Changes from 250 commits
Commits
e05acbb
Add tags to AST; Serialize AST for testing; Add unit-test for testing…
SharafMohamed Sep 13, 2024
5e61e83
Use using to condense code; Use a unique schema object for each test …
SharafMohamed Sep 13, 2024
082090d
Add has_capture_groups(); Add unit-test for has_capture_groups()
SharafMohamed Sep 13, 2024
2c6d94e
Create and use RegexASTEmpty to split RegexASTgroup with min=0 into R…
SharafMohamed Sep 13, 2024
4e02f24
Add unit-test for 0 repetition regex
SharafMohamed Sep 13, 2024
bb3c543
Add more tests for repetition regex
SharafMohamed Sep 13, 2024
54027ad
Return by value in literal getters; Use const instead of const& for l…
SharafMohamed Sep 16, 2024
e58274f
Refactor new_state()
SharafMohamed Sep 16, 2024
1321871
Rename get_first_matching_variable_ids() to get_matching_variable_ids…
SharafMohamed Sep 16, 2024
c904755
Remove redundant docstrings
SharafMohamed Sep 16, 2024
ffe9a0f
Remove has_capture_groups()
SharafMohamed Sep 16, 2024
913ed1a
Const and auto changes
SharafMohamed Sep 16, 2024
795add3
Add tagged-nfa
SharafMohamed Sep 16, 2024
6e45657
Clarify that the add functions are adding to the nfa; Make add to nfa…
SharafMohamed Sep 17, 2024
7aa8a92
Changed AST add functions to indicate the AST are being added to the …
SharafMohamed Sep 17, 2024
d1d87e7
Merged with previous PR
SharafMohamed Sep 17, 2024
f386a3b
Merge branch 'tagged-ast' into pre-tagged-nfa-cleanup
SharafMohamed Sep 17, 2024
0c600d7
Merge branch 'pre-tagged-nfa-cleanup' into regex-ast-empty
SharafMohamed Sep 17, 2024
bedad75
Change add in RegexASTEmpty to add_to_nfa
SharafMohamed Sep 17, 2024
c78f79c
Merge with previous PR
SharafMohamed Sep 17, 2024
cd54e64
Fix and refactor NFA unit-test
SharafMohamed Sep 17, 2024
06c7066
Merge with previous PRs and update some ints to uints.
SharafMohamed Oct 8, 2024
38ab6fe
Fix compiler error.
SharafMohamed Oct 8, 2024
4c6b9c6
Fix compiler error where macos considers a struct default constructor…
SharafMohamed Oct 8, 2024
2e71aaa
Add state_type explicitly.
SharafMohamed Oct 8, 2024
c062a2c
Add state_type explicitly.
SharafMohamed Oct 8, 2024
eaa5674
Remove commented out code.
SharafMohamed Oct 8, 2024
f150474
Remove errant +=.
SharafMohamed Oct 8, 2024
bdafe10
Replace constructors with aggregate initialization.
SharafMohamed Oct 8, 2024
335bb34
Replace static inline with static constexpr.
SharafMohamed Oct 8, 2024
8446390
Undo last commit.
SharafMohamed Oct 8, 2024
73d8e46
Fix comment.
SharafMohamed Oct 8, 2024
7871f80
Finish changes of int to uint32_t for SymbolID.
SharafMohamed Oct 8, 2024
56483c9
Added comment explaining use of uint32_t for SymbolID.
SharafMohamed Oct 8, 2024
cafa973
Finish removing ints that should be uint32_t.
SharafMohamed Oct 8, 2024
a2b1bfd
Fix formatting.
SharafMohamed Oct 8, 2024
2fb4831
Rename SymbolID to SymbolId; Remove redundant ID for SymbolIds enum v…
SharafMohamed Oct 10, 2024
79482b1
Use docstring instead of inline comment.
SharafMohamed Oct 10, 2024
0237854
Use `auto`.
SharafMohamed Oct 10, 2024
c935af5
Use `const` for error code.
SharafMohamed Oct 10, 2024
91b5e78
Use `auto` and `const` for `add_to_nfa_with_negative_tags`.
SharafMohamed Oct 10, 2024
f6c86ec
Use 'auto' for `intermediate_state`.
SharafMohamed Oct 10, 2024
8fd70d7
Replace `find` with `at`.
SharafMohamed Oct 10, 2024
dd03a35
Use `auto` for `intermediate_state`.
SharafMohamed Oct 10, 2024
fd6bb02
Added constructors for tagged transition classes.
SharafMohamed Oct 10, 2024
65861c3
Add getters to tagged transition classes.
SharafMohamed Oct 10, 2024
dfb7dcf
Use emplace_back instead of push_back for tagged transitions.
SharafMohamed Oct 10, 2024
473787e
Use `const` for `factor`.
SharafMohamed Oct 10, 2024
2a08121
Use `const` for `sub_factor`.
SharafMohamed Oct 10, 2024
0b7e38b
Use list initialization for `rule`.
SharafMohamed Oct 10, 2024
f3b0f6a
Use list initialization for `var_schema`.
SharafMohamed Oct 10, 2024
74793b3
Group `visited_states` modifications together.
SharafMohamed Oct 10, 2024
d2f38fa
Use unordered_map instead of map for state_ids.
SharafMohamed Oct 10, 2024
a7f7a14
Make add_to_queue lambda a helper called add_to_queue_and_visited.
SharafMohamed Oct 10, 2024
d244a80
Replace const& with std::move when dealing with negative_tags.
SharafMohamed Oct 10, 2024
158df37
Run auto-formatter.
SharafMohamed Oct 10, 2024
f82b46f
Remove incorrect comment.
SharafMohamed Oct 16, 2024
c87caf9
Move LexicalRule to its own class; Pass rules into NFA construction; …
SharafMohamed Oct 20, 2024
abe55e2
Add tagged transitions during RegexNFAState construction; Remove unus…
SharafMohamed Oct 20, 2024
6d1db10
Fix compiler errors in intersect-test.
SharafMohamed Oct 20, 2024
a5413d0
Run linter.
SharafMohamed Oct 20, 2024
a4a4ab7
Fix header guard comment in LexicalRule.hpp.
SharafMohamed Oct 20, 2024
aa93847
Run linter.
SharafMohamed Oct 20, 2024
abb2656
Improve naming of intermediate state for positive and negative tagged …
SharafMohamed Oct 20, 2024
2f1c588
Move serialize method from test into classes; Clean up serialize code…
SharafMohamed Oct 20, 2024
9835eb0
Fix compiler error.
SharafMohamed Oct 20, 2024
dcd79a6
Improve var naming; Improve docstring.
SharafMohamed Oct 20, 2024
73300e7
Improve docstrings for serialize() methods.
SharafMohamed Oct 20, 2024
8548bd9
Add get_traversal_order() to NFA; Fix docstrings.
SharafMohamed Oct 20, 2024
38720f7
Add missing include to test-intersect.
SharafMohamed Oct 20, 2024
b700d99
Update src/log_surgeon/LexicalRule.hpp
SharafMohamed Oct 23, 2024
12e930c
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
0a104ff
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
7c126eb
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
7e43f99
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
5957bfb
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
98b5242
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
06742ba
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
021ac00
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
29e9c43
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
bd6081b
Update docstring for get_travel_order().
SharafMohamed Oct 23, 2024
16edf6f
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
b4b0b63
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
d108697
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
eef79d2
Update tests/test-NFA.cpp
SharafMohamed Oct 23, 2024
0d599cb
Update tests/test-NFA.cpp
SharafMohamed Oct 23, 2024
2807141
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
8e225cd
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
ecb84fb
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
6fc6030
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
f83ac5f
Rename get_traversal_order() to get_bfs_traversal_order() and update …
SharafMohamed Oct 23, 2024
e3214f1
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
d7d6dbe
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
fc55354
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
fbc25c8
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
df070c3
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
84cd573
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
a35f61f
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
53ba56a
Remove unused using.
SharafMohamed Oct 23, 2024
8a677e3
Remove empty namespace.
SharafMohamed Oct 23, 2024
f17f752
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
cbe1d39
Make traversal_order const.
SharafMohamed Oct 23, 2024
77bf2e0
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
a9d0ef3
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
43ec3f0
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
45372df
Add missing using for std::move.
SharafMohamed Oct 23, 2024
d0ba724
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
f69aa86
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
723eabb
Use move semantic for NFA constructor.
SharafMohamed Oct 23, 2024
8d40656
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
d20e391
Move add_to_queue_and_visited() to lambda.
SharafMohamed Oct 23, 2024
6a312e9
Fix compiler error in intersect-test.
SharafMohamed Oct 23, 2024
f8e5f8f
Simplify new_state().
SharafMohamed Oct 24, 2024
fc25f00
Remove using for std::move, and explicitly add namespace.
SharafMohamed Oct 24, 2024
cdab650
Update serialize docstring.
SharafMohamed Oct 24, 2024
e8db277
Have internal serialize() functions for RegexNFA (states and tagged t…
SharafMohamed Oct 24, 2024
337cead
Reserve space during BFS; Run linter.
SharafMohamed Oct 24, 2024
4a30fdc
Add braced initialization to nfa.
SharafMohamed Oct 27, 2024
0203038
Update docstring for positive tag serialization.
SharafMohamed Oct 27, 2024
633acc4
Update docstring for negative tag serialization.
SharafMohamed Oct 27, 2024
4db7b82
Use return statement for full docstring of get_bfs_traversal_order.
SharafMohamed Oct 27, 2024
01f8b14
Update NFA serialize() docstring.
SharafMohamed Oct 27, 2024
d047624
Add long form of BFS for first use.
SharafMohamed Oct 27, 2024
f9c4f46
Use const for state_id_it.
SharafMohamed Oct 27, 2024
bd77c78
Update docstring for NFA state serialize.
SharafMohamed Oct 27, 2024
f2d8049
Combine the two failure cases in NFA state serialize's docstring to m…
SharafMohamed Oct 27, 2024
4cb560f
Use const for state_id_it.
SharafMohamed Oct 27, 2024
95b7497
For NFA state serialize flip order of failure checks to reduce indent…
SharafMohamed Oct 27, 2024
e187445
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 27, 2024
8b85511
Use const& for passing rules into the NFA as rules are never stored, …
SharafMohamed Oct 28, 2024
0756794
Use braced initialization for NFA.
SharafMohamed Oct 28, 2024
6ab439a
Remove warning for not check std::optional when we know the function …
SharafMohamed Oct 28, 2024
9244812
Remove redundant initialization of member variables in tagged transiti…
SharafMohamed Oct 28, 2024
0d151a4
Use member initialization lists for constructing NFA state from tagge…
SharafMohamed Oct 28, 2024
ac63713
Switch to using optional prefix for optional return types.
SharafMohamed Oct 28, 2024
b57b93f
Make negative tagged transition singular as you can never have more t…
SharafMohamed Oct 28, 2024
c3fb16d
Add missing param for new_state_with_negative_tagged_transitions.
SharafMohamed Oct 28, 2024
8a41367
Move RegexNFAStateType, RegexNFAState, and PositiveTaggedTransition/N…
SharafMohamed Oct 28, 2024
d1a57e4
Add tag class.
SharafMohamed Oct 28, 2024
bc78f59
Make tag an object with name, start, and end information, instead of …
SharafMohamed Oct 29, 2024
ac7260f
Run linter.
SharafMohamed Oct 29, 2024
40a8206
Merge branch 'main' into singular-negative-transition
SharafMohamed Oct 31, 2024
c2eea21
Change t to curr_state and u to dest_state.
SharafMohamed Oct 31, 2024
629fce9
Change curr_state to current_state; Remove extraneous *; Add newline …
SharafMohamed Oct 31, 2024
aed62b2
Add TODO for utf8 case in BFS.
SharafMohamed Oct 31, 2024
34522a7
Use auto and fix order of const wrt to *.
SharafMohamed Oct 31, 2024
332af35
Initialize m_dest_state to nullptr.
SharafMohamed Oct 31, 2024
748e794
Change negative_tagged_transition to negative_tagged_transition_string.
SharafMohamed Oct 31, 2024
38dc22b
Change negative tag transitions to singular.
SharafMohamed Oct 31, 2024
5a30ed8
Switch transitions to singular where applicable.
SharafMohamed Oct 31, 2024
c8bf9e6
Merge changes with previous PR manually. Still missing changes to pre…
SharafMohamed Oct 31, 2024
90edf77
Auto linter.
SharafMohamed Oct 31, 2024
fd765f7
Merge branch 'singular-negative-transition' into individual-files
SharafMohamed Oct 31, 2024
f7d3415
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Oct 31, 2024
b5f7cdf
Modify expected output where ordering of negative tags is ambiguous. …
SharafMohamed Oct 31, 2024
d90b731
Add a description for how to use the tag.
SharafMohamed Oct 31, 2024
3f1f8ff
Add start and end positive transitions.
SharafMohamed Oct 31, 2024
2bd5d2c
Add functionality to tags to use it for tracking capture positions; R…
SharafMohamed Oct 31, 2024
2d0157e
Reduce indentation of epsilon closure by using continue.
SharafMohamed Oct 31, 2024
1cabafd
Use optional for negative transitions in RegexNFAState.
SharafMohamed Oct 31, 2024
dc2c637
Add missing headers; Remove unused headers.
SharafMohamed Nov 1, 2024
7c5cfc0
Assign optional_negative_tagged_transition to a reference.
SharafMohamed Nov 1, 2024
4e8d290
Assign optional_negative_tagged_transition to a reference again.
SharafMohamed Nov 1, 2024
fde9037
Add <stack> to Lexer.tpp.
SharafMohamed Nov 1, 2024
e63637e
Fix comment grammar.
SharafMohamed Nov 1, 2024
08e7d5e
Update with previous PR.
SharafMohamed Nov 1, 2024
f7b5666
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Nov 1, 2024
93aebd5
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 1, 2024
b8c8f77
Store negative tags in a vector instead of set so that the order is d…
SharafMohamed Nov 1, 2024
ef95061
Sync with previous PR.
SharafMohamed Nov 2, 2024
b55e96c
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Nov 2, 2024
304f612
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 2, 2024
7cc8c52
Add start tags to NFA.
SharafMohamed Nov 2, 2024
b1a9300
Update unit-test to handle start transitions.
SharafMohamed Nov 2, 2024
9da470d
Merge branch 'main' into individual-files
SharafMohamed Nov 2, 2024
b451651
Move RegexNFAXState typedef into RegexNFAState.hpp
SharafMohamed Nov 6, 2024
f71348b
Switch void to auto -> void.
SharafMohamed Nov 6, 2024
21e80b9
Merge branch 'individual-files' of https://github.com/SharafMohamed/l…
SharafMohamed Nov 6, 2024
4576d7d
Move short functions into the class definition; Move RegexNFAXState t…
SharafMohamed Nov 6, 2024
6e24969
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Nov 7, 2024
ff91bcc
Merge branch 'main' into meaningful-tags
SharafMohamed Nov 7, 2024
e786ec6
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 7, 2024
5abe906
Auto format.
SharafMohamed Nov 7, 2024
bb0bd2e
Remove unused lambda; Auto format.
SharafMohamed Nov 7, 2024
a36bb90
Add test case for Tag class.
SharafMohamed Nov 7, 2024
59cc6cd
Add nullptr checks.
SharafMohamed Nov 7, 2024
8097a69
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 7, 2024
9fc41c0
Change Tag class functionality to reflect how registers will be used.
SharafMohamed Nov 7, 2024
d060bc6
Temp fix for unit-test until future PR where Tag ptrs are stored in v…
SharafMohamed Nov 11, 2024
f041a37
Swap from set to vector to tag pointers to ensure determinism.
SharafMohamed Nov 11, 2024
f72e120
Better test coverage for tag class.
SharafMohamed Nov 11, 2024
d5ac1ad
Use constant iterators for elements that should not change.
SharafMohamed Nov 12, 2024
30f03ed
Use braced initialization in test-tag.cpp.
SharafMohamed Nov 12, 2024
d386fc0
Use const& for insertion function that can't use move semantics.
SharafMohamed Nov 12, 2024
4024c3e
Have get_name() return string_view; Update headers.
SharafMohamed Nov 13, 2024
22c3b82
Remove const from member variable.
SharafMohamed Nov 13, 2024
ed55534
Remove const from member variable.
SharafMohamed Nov 13, 2024
534afce
Run linter.
SharafMohamed Nov 13, 2024
61fdb5d
Add move semantic test cases.
SharafMohamed Nov 13, 2024
78e5fe8
Add PositiveTaggedTransition docstring and make m_tag throw if ever n…
SharafMohamed Nov 13, 2024
630d882
Delete unused operators.
SharafMohamed Nov 13, 2024
543f8af
Move null check into initializer list for NegativeTaggedTransition co…
SharafMohamed Nov 13, 2024
ec342fc
Remove position vectors from Tag, as they aren't used in the AST.
SharafMohamed Nov 13, 2024
af86281
RegexASTCapture enforces non-null arguments; Add docstring to RegexAS…
SharafMohamed Nov 13, 2024
738becd
Capitalize exceptions.
SharafMohamed Nov 13, 2024
789263e
Use () to fix linting issue.
SharafMohamed Nov 13, 2024
1f15ca7
Keep default copy assignment.
SharafMohamed Nov 14, 2024
7688c24
Move @throw to constructor docstrings.
SharafMohamed Nov 14, 2024
27618b2
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 14, 2024
486190a
Do string_view comparison in lexer test.
SharafMohamed Nov 14, 2024
ac75909
Use string_view compares in tag tests.
SharafMohamed Nov 14, 2024
090f18c
Update headers in TaggedTransition.hpp.
SharafMohamed Nov 15, 2024
c7cfc10
Separate copy and move constructor unit-tests.
SharafMohamed Nov 15, 2024
91b8b51
Use NOTE for class requirements.
SharafMohamed Nov 15, 2024
fcb1a76
Use NOTE for class requirements.
SharafMohamed Nov 15, 2024
9b09e19
Use NOTE for class requirements.
SharafMohamed Nov 15, 2024
2f712e6
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 18, 2024
75aecc4
Update install-catch2.sh to compile catch2 with c++17.
SharafMohamed Nov 18, 2024
9302b94
Merge branch 'main' into fixed-tagged-nfa
SharafMohamed Nov 18, 2024
97caabb
Merge branch 'catch2-install-fix' into fixed-tagged-nfa
SharafMohamed Nov 18, 2024
507a7d3
Loop over end_transitions correctly.
SharafMohamed Nov 18, 2024
34c227b
Add TagPositions class.
SharafMohamed Nov 18, 2024
27c8560
Remove new class, going to add it later.
SharafMohamed Nov 18, 2024
86caa9b
Add const back in.
SharafMohamed Nov 18, 2024
338638e
Add more const back in.
SharafMohamed Nov 18, 2024
a742601
Add more const back in.
SharafMohamed Nov 18, 2024
d358713
Linter.
SharafMohamed Nov 18, 2024
43870ea
Add more const back in.
SharafMohamed Nov 18, 2024
f941607
Use `auto`.
SharafMohamed Nov 19, 2024
aad9eb3
Fix spacing.
SharafMohamed Nov 19, 2024
a801bf8
Add diagram for capture group NFA.
SharafMohamed Nov 19, 2024
08b7548
Add const for consistency with constructor.
SharafMohamed Nov 19, 2024
449133e
Update positive end transition to be optional instead of a vector.
SharafMohamed Nov 19, 2024
7b837bf
Rename new_state function correctly.
SharafMohamed Nov 19, 2024
f0eb56b
Update capture group AST state creation.
SharafMohamed Nov 19, 2024
a945915
Encapsulate new state for capture group.
SharafMohamed Nov 19, 2024
c757ded
Fix compiler error.
SharafMohamed Nov 19, 2024
2eb7477
Use singular for end transition getter function.
SharafMohamed Nov 20, 2024
08060ed
Void to auto -> void.
SharafMohamed Nov 20, 2024
0c2c1d1
Update new_capture_group_start_states to new_capture_group_states to …
SharafMohamed Nov 20, 2024
b0b951a
Linter.
SharafMohamed Nov 20, 2024
3c2a2ab
Update docstring for .
SharafMohamed Nov 20, 2024
98c5b95
Rename to new_start_and_end_states_with_positively_tagged_transitions.
SharafMohamed Nov 20, 2024
f59cf41
Rename to capture_X_state.
SharafMohamed Nov 20, 2024
85a2d69
Update docstring.
SharafMohamed Nov 20, 2024
4c602d4
Updated diagram to match vars used in code.
SharafMohamed Nov 20, 2024
2b01433
Rename vars to serialized_X.
SharafMohamed Nov 20, 2024
e37b29a
Run Linter.
SharafMohamed Nov 20, 2024
c5beca3
Fix typo.
SharafMohamed Nov 20, 2024
fe4a7b3
Update diagram for capture group NFA.
SharafMohamed Nov 20, 2024
12 changes: 9 additions & 3 deletions src/log_surgeon/Lexer.tpp
@@ -405,11 +405,17 @@ auto Lexer<NFAStateType, DFAStateType>::epsilon_closure(NFAStateType const* stat
         }

         // TODO: currently treat tagged transitions as epsilon transitions
-        for (auto const& positive_tagged_transition :
-             current_state->get_positive_tagged_transitions())
+        for (auto const& positive_tagged_start_transition :
+             current_state->get_positive_tagged_start_transitions())
         {
-            stack.push(positive_tagged_transition.get_dest_state());
+            stack.push(positive_tagged_start_transition.get_dest_state());
         }
+        auto const& optional_positive_tagged_end_transition
+                = current_state->get_positive_tagged_end_transition();
+        if (optional_positive_tagged_end_transition.has_value()) {
+            stack.push(optional_positive_tagged_end_transition.value().get_dest_state());
+        }
+
         auto const& optional_negative_tagged_transition
                 = current_state->get_negative_tagged_transition();
         if (optional_negative_tagged_transition.has_value()) {
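Note on the Lexer.tpp change: the closure still treats every tagged transition as an epsilon transition, so the only new behaviour is that it follows both the start transitions (a vector) and the single optional end transition. A minimal sketch of that traversal follows, assuming a simplified `SketchState` struct and a `sketch_epsilon_closure` helper. Both names are hypothetical stand-ins for illustration and are not part of log-surgeon.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <stack>
#include <vector>

// Hypothetical, simplified stand-in for an NFA state; only destinations are kept.
struct SketchState {
    std::vector<SketchState const*> epsilon_transitions;
    std::vector<SketchState const*> positive_tagged_start_dests;
    std::optional<SketchState const*> positive_tagged_end_dest;
    std::optional<SketchState const*> negative_tagged_dest;
};

// Collects every state reachable from `start` without consuming input, treating
// all tagged transitions as if they were epsilon transitions.
inline auto sketch_epsilon_closure(SketchState const* start) -> std::set<SketchState const*> {
    std::set<SketchState const*> closure;
    std::stack<SketchState const*> stack;
    stack.push(start);
    while (false == stack.empty()) {
        auto const* current_state = stack.top();
        stack.pop();
        if (false == closure.insert(current_state).second) {
            continue;  // Already visited.
        }
        for (auto const* dest_state : current_state->epsilon_transitions) {
            stack.push(dest_state);
        }
        for (auto const* dest_state : current_state->positive_tagged_start_dests) {
            stack.push(dest_state);
        }
        if (current_state->positive_tagged_end_dest.has_value()) {
            stack.push(current_state->positive_tagged_end_dest.value());
        }
        if (current_state->negative_tagged_dest.has_value()) {
            stack.push(current_state->negative_tagged_dest.value());
        }
    }
    return closure;
}
```

In this sketch a state that is reachable only through a start or end transition still lands in the closure, which mirrors what the TODO comment in the hunk describes.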
28 changes: 20 additions & 8 deletions src/log_surgeon/finite_automata/RegexAST.hpp
@@ -693,11 +693,11 @@ class RegexASTCapture : public RegexAST<NFAStateType> {

     /**
      * Adds the needed `RegexNFA::states` to the passed in nfa to handle a
-     * `RegexASTCapture` before transitioning to an accepting `end_state`.
+     * `RegexASTCapture` before transitioning to a `dest_state`.
      * @param nfa
-     * @param end_state
+     * @param dest_state
      */
-    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* dest_state) const -> void override;

     [[nodiscard]] auto serialize() const -> std::u32string override;

@@ -892,11 +892,23 @@ template <typename NFAStateType>
 }

 template <typename NFAStateType>
-void RegexASTCapture<NFAStateType>::add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state)
-        const {
-    auto* state_with_positive_tagged_transition
-            = nfa->new_state_with_positive_tagged_transition(m_tag.get(), end_state);
-    m_group_regex_ast->add_to_nfa_with_negative_tags(nfa, state_with_positive_tagged_transition);
+auto RegexASTCapture<NFAStateType>::add_to_nfa(
+        RegexNFA<NFAStateType>* nfa,
+        NFAStateType* dest_state
+) const -> void {
+    // root --(pos_tagged_start_transition)--> capture_group_start_state -->
+    // [inner capture group NFA] --(neg_tagged_transition)--> neg_state -->
+    // state_with_positive_tagged_end_transition --(pos_tagged_end_transition)--> end_state
+    auto [capture_start_state, capture_end_state]
+            = nfa->new_start_and_end_states_with_positively_tagged_transitions(
+                    m_tag.get(),
+                    dest_state
+            );
+
+    auto* initial_root = nfa->get_root();
+    nfa->set_root(capture_start_state);
+    m_group_regex_ast->add_to_nfa_with_negative_tags(nfa, capture_end_state);
+    nfa->set_root(initial_root);
 }

 template <typename NFAStateType>
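Note on the RegexAST.hpp change: the capture group is now bracketed by a start transition leaving the current root and an end transition entering `dest_state`, and the root is temporarily swapped while the inner group is built. The sketch below illustrates that wiring under simplifying assumptions. `SketchNfa`, `CaptureSketchState`, `SketchTag`, and `sketch_add_capture` are hypothetical names, and the inner group construction is reduced to a comment.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct SketchTag {
    std::string name;
};

struct CaptureSketchState;

struct SketchTaggedTransition {
    SketchTag const* tag;
    CaptureSketchState const* dest_state;
};

// Hypothetical state: many start transitions, at most one end transition.
struct CaptureSketchState {
    std::vector<SketchTaggedTransition> positive_tagged_start_transitions;
    std::optional<SketchTaggedTransition> positive_tagged_end_transition;
};

class SketchNfa {
public:
    SketchNfa() : m_root{new_state()} {}

    auto new_state() -> CaptureSketchState* {
        m_states.emplace_back(std::make_unique<CaptureSketchState>());
        return m_states.back().get();
    }

    // Creates the start state (reached from the current root) and the end state
    // (leading to `dest_state`) for one capture group.
    auto new_start_and_end_states(SketchTag const* tag, CaptureSketchState const* dest_state)
            -> std::pair<CaptureSketchState*, CaptureSketchState*> {
        auto* start_state = new_state();
        m_root->positive_tagged_start_transitions.push_back({tag, start_state});
        auto* end_state = new_state();
        end_state->positive_tagged_end_transition = SketchTaggedTransition{tag, dest_state};
        return {start_state, end_state};
    }

    auto get_root() -> CaptureSketchState* { return m_root; }

    auto set_root(CaptureSketchState* root) -> void { m_root = root; }

private:
    // Declared before m_root so it is constructed before new_state() runs.
    std::vector<std::unique_ptr<CaptureSketchState>> m_states;
    CaptureSketchState* m_root;
};

inline auto sketch_add_capture(
        SketchNfa& nfa,
        SketchTag const* tag,
        CaptureSketchState const* dest_state
) -> std::pair<CaptureSketchState*, CaptureSketchState*> {
    auto [capture_start_state, capture_end_state] = nfa.new_start_and_end_states(tag, dest_state);
    auto* initial_root = nfa.get_root();
    nfa.set_root(capture_start_state);
    // The inner group would be added here, rooted at capture_start_state and
    // ending at capture_end_state.
    nfa.set_root(initial_root);
    return {capture_start_state, capture_end_state};
}
```

Restoring `initial_root` at the end matters because sibling AST nodes added afterwards are expected to attach to the original root, not to the capture group's start state.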
45 changes: 39 additions & 6 deletions src/log_surgeon/finite_automata/RegexNFA.hpp
@@ -35,13 +35,13 @@ class RegexNFA {
     [[nodiscard]] auto new_state() -> NFAStateType*;

     /**
-     * Creates a unique_ptr for an NFA state with a positive tagged transition and adds it to
+     * Creates a unique_ptr for an NFA state with a positive tagged end transition and adds it to
      * `m_states`.
      * @param tag
      * @param dest_state
      * @return NFAStateType*
      */
-    [[nodiscard]] auto new_state_with_positive_tagged_transition(
+    [[nodiscard]] auto new_state_with_positive_tagged_end_transition(
             Tag const* tag,
             NFAStateType const* dest_state
     ) -> NFAStateType*;
@@ -58,6 +58,19 @@ class RegexNFA {
             NFAStateType const* dest_state
     ) -> NFAStateType*;

+    /**
+     * Creates the start and end states for a capture group.
+     * @param tag The tag associated with the capture group.
+     * @param dest_state
+     * @return A pair of states:
+     * - A new state with a positive tagged start transition from `m_root`.
+     * - A new state with a positive tagged end transition to `dest_state`.
+     */
+    [[nodiscard]] auto new_start_and_end_states_with_positively_tagged_transitions(
+            Tag const* tag,
+            NFAStateType const* dest_state
+    ) -> std::pair<NFAStateType*, NFAStateType*>;
+
     /**
      * @return A vector representing the traversal order of the NFA states using breadth-first
      * search (BFS).
@@ -101,7 +114,7 @@ auto RegexNFA<NFAStateType>::new_state() -> NFAStateType* {
 }

 template <typename NFAStateType>
-auto RegexNFA<NFAStateType>::new_state_with_positive_tagged_transition(
+auto RegexNFA<NFAStateType>::new_state_with_positive_tagged_end_transition(
         Tag const* tag,
         NFAStateType const* dest_state
 ) -> NFAStateType* {
@@ -118,6 +131,18 @@ auto RegexNFA<NFAStateType>::new_state_with_negative_tagged_transition(
     return m_states.back().get();
 }

+template <typename NFAStateType>
+auto RegexNFA<NFAStateType>::new_start_and_end_states_with_positively_tagged_transitions(
+        Tag const* tag,
+        NFAStateType const* dest_state
+) -> std::pair<NFAStateType*, NFAStateType*> {
+    auto* start_state = new_state();
+    m_root->add_positive_tagged_start_transition(tag, start_state);
+
+    auto* end_state = new_state_with_positive_tagged_end_transition(tag, dest_state);
+    return {start_state, end_state};
+}
+
 template <typename NFAStateType>
 auto RegexNFA<NFAStateType>::get_bfs_traversal_order() const -> std::vector<NFAStateType const*> {
     std::queue<NFAStateType const*> state_queue;
@@ -147,11 +172,19 @@ auto RegexNFA<NFAStateType>::get_bfs_traversal_order() const -> std::vector<NFAS
         for (auto const* dest_state : current_state->get_epsilon_transitions()) {
             add_to_queue_and_visited(dest_state);
         }
-        for (auto const& positive_tagged_transition :
-             current_state->get_positive_tagged_transitions())
+        for (auto const& positive_tagged_start_transition :
+             current_state->get_positive_tagged_start_transitions())
         {
-            add_to_queue_and_visited(positive_tagged_transition.get_dest_state());
+            add_to_queue_and_visited(positive_tagged_start_transition.get_dest_state());
         }
+
+        auto const& optional_positive_tagged_end_transition
+                = current_state->get_positive_tagged_end_transition();
+        if (optional_positive_tagged_end_transition.has_value()) {
+            add_to_queue_and_visited(optional_positive_tagged_end_transition.value().get_dest_state(
+            ));
+        }
+
         auto const& optional_negative_tagged_transition
                 = current_state->get_negative_tagged_transition();
         if (optional_negative_tagged_transition.has_value()) {
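Note on the BFS change: the order in which transitions are visited determines the state IDs that appear in the serialized NFA, so following start transitions before the end transition affects the expected test output. The sketch below shows one way such IDs could be assigned. `BfsSketchState` and `sketch_bfs_state_ids` are hypothetical names, and byte, epsilon, and negative transitions are left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified state that keeps only positive tagged destinations.
struct BfsSketchState {
    std::vector<BfsSketchState const*> positive_tagged_start_dests;
    std::optional<BfsSketchState const*> positive_tagged_end_dest;
};

// Assigns each reachable state an ID equal to its position in BFS order.
inline auto sketch_bfs_state_ids(BfsSketchState const* root)
        -> std::unordered_map<BfsSketchState const*, std::uint32_t> {
    std::unordered_map<BfsSketchState const*, std::uint32_t> state_ids;
    std::queue<BfsSketchState const*> state_queue;

    auto add_to_queue_and_visited = [&state_ids, &state_queue](BfsSketchState const* dest_state) {
        // emplace only succeeds the first time a state is seen.
        auto const next_id = static_cast<std::uint32_t>(state_ids.size());
        if (state_ids.emplace(dest_state, next_id).second) {
            state_queue.push(dest_state);
        }
    };

    add_to_queue_and_visited(root);
    while (false == state_queue.empty()) {
        auto const* current_state = state_queue.front();
        state_queue.pop();
        for (auto const* dest_state : current_state->positive_tagged_start_dests) {
            add_to_queue_and_visited(dest_state);
        }
        if (current_state->positive_tagged_end_dest.has_value()) {
            add_to_queue_and_visited(current_state->positive_tagged_end_dest.value());
        }
    }
    return state_ids;
}
```

Because start transitions live in a vector rather than a set, the visiting order in this sketch is deterministic, which is the same concern the "Store negative tags in a vector instead of set" commit addresses for negative tags.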
51 changes: 38 additions & 13 deletions src/log_surgeon/finite_automata/RegexNFAState.hpp
@@ -32,7 +32,7 @@ class RegexNFAState {
     RegexNFAState() = default;

     RegexNFAState(Tag const* tag, RegexNFAState const* dest_state)
-            : m_positive_tagged_transitions{{tag, dest_state}} {}
+            : m_positive_tagged_end_transition{PositiveTaggedTransition{tag, dest_state}} {}

     RegexNFAState(std::vector<Tag const*> tags, RegexNFAState const* dest_state)
             : m_negative_tagged_transition{NegativeTaggedTransition{std::move(tags), dest_state}} {}
@@ -49,9 +49,19 @@ class RegexNFAState {
         return m_matching_variable_id;
     }

-    [[nodiscard]] auto get_positive_tagged_transitions(
+    auto
+    add_positive_tagged_start_transition(Tag const* tag, RegexNFAState const* dest_state) -> void {
+        m_positive_tagged_start_transitions.emplace_back(tag, dest_state);
+    }
+
+    [[nodiscard]] auto get_positive_tagged_start_transitions(
     ) const -> std::vector<PositiveTaggedTransition<RegexNFAState>> const& {
-        return m_positive_tagged_transitions;
+        return m_positive_tagged_start_transitions;
+    }
+
+    [[nodiscard]] auto get_positive_tagged_end_transition(
+    ) const -> std::optional<PositiveTaggedTransition<RegexNFAState>> const& {
+        return m_positive_tagged_end_transition;
     }

     [[nodiscard]] auto get_negative_tagged_transition(
@@ -100,7 +110,8 @@ class RegexNFAState {
 private:
     bool m_accepting{false};
     uint32_t m_matching_variable_id{0};
-    std::vector<PositiveTaggedTransition<RegexNFAState>> m_positive_tagged_transitions;
+    std::vector<PositiveTaggedTransition<RegexNFAState>> m_positive_tagged_start_transitions;
+    std::optional<PositiveTaggedTransition<RegexNFAState>> m_positive_tagged_end_transition;
     std::optional<NegativeTaggedTransition<RegexNFAState>> m_negative_tagged_transition;
     std::vector<RegexNFAState*> m_epsilon_transitions;
     std::array<std::vector<RegexNFAState*>, cSizeOfByte> m_bytes_transitions;
@@ -176,14 +187,26 @@ auto RegexNFAState<state_type>::serialize(
         epsilon_transitions.emplace_back(std::to_string(state_ids.at(dest_state)));
     }

-    std::vector<std::string> positive_tagged_transitions;
-    for (auto const& positive_tagged_transition : m_positive_tagged_transitions) {
-        auto const optional_serialized_positive_transition
-                = positive_tagged_transition.serialize(state_ids);
-        if (false == optional_serialized_positive_transition.has_value()) {
+    std::vector<std::string> positive_tagged_start_transition_strings;
+    for (auto const& positive_tagged_start_transition : m_positive_tagged_start_transitions) {
+        auto const optional_serialized_positive_start_transition
+                = positive_tagged_start_transition.serialize(state_ids);
+        if (false == optional_serialized_positive_start_transition.has_value()) {
+            return std::nullopt;
+        }
+        positive_tagged_start_transition_strings.emplace_back(
+                optional_serialized_positive_start_transition.value()
+        );
+    }
+
+    std::string positive_tagged_end_transition_string;
+    if (m_positive_tagged_end_transition.has_value()) {
+        auto const optional_serialized_positive_end_transition
+                = m_positive_tagged_end_transition.value().serialize(state_ids);
+        if (false == optional_serialized_positive_end_transition.has_value()) {
             return std::nullopt;
         }
-        positive_tagged_transitions.emplace_back(optional_serialized_positive_transition.value());
+        positive_tagged_end_transition_string = optional_serialized_positive_end_transition.value();
     }

     std::string negative_tagged_transition_string;
@@ -200,13 +223,15 @@ auto RegexNFAState<state_type>::serialize(
             = m_accepting ? fmt::format("accepting_tag={},", m_matching_variable_id) : "";

     return fmt::format(
-            "{}:{}byte_transitions={{{}}},epsilon_transitions={{{}}},positive_tagged_transitions={{"
-            "{}}},negative_tagged_transition={{{}}}",
+            "{}:{}byte_transitions={{{}}},epsilon_transitions={{{}}},positive_tagged_start_"
+            "transitions={{{}}},positive_tagged_end_transitions={{{}}},negative_tagged_transition={"
+            "{{}}}",
             state_ids.at(this),
             accepting_tag_string,
             fmt::join(byte_transitions, ","),
             fmt::join(epsilon_transitions, ","),
-            fmt::join(positive_tagged_transitions, ","),
+            fmt::join(positive_tagged_start_transition_strings, ","),
+            positive_tagged_end_transition_string,
             negative_tagged_transition_string
     );
 }
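Note on the serialization change: start transitions are joined with commas inside `positive_tagged_start_transitions={...}`, while the single optional end transition is written directly into `positive_tagged_end_transitions={...}` and left empty when absent. The sketch below reproduces only those two fields with the standard library instead of `fmt`. The `dest[tag]` form for start transitions is taken from the expected strings in tests/test-NFA.cpp; using the same form for the end transition is an assumption, and `SerializedTransitionSketch`, `sketch_join`, and `sketch_serialize_positive_transitions` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical pre-resolved transition: destination state ID plus tag name.
struct SerializedTransitionSketch {
    std::uint32_t dest_id;
    std::string tag_name;
};

// Joins the parts with commas, similar in spirit to fmt::join(parts, ",").
inline auto sketch_join(std::vector<std::string> const& parts) -> std::string {
    std::string joined;
    for (auto const& part : parts) {
        if (false == joined.empty()) {
            joined += ",";
        }
        joined += part;
    }
    return joined;
}

inline auto sketch_serialize_positive_transitions(
        std::uint32_t state_id,
        std::vector<SerializedTransitionSketch> const& start_transitions,
        std::optional<SerializedTransitionSketch> const& end_transition
) -> std::string {
    auto const serialize_one = [](SerializedTransitionSketch const& transition) {
        return std::to_string(transition.dest_id) + "[" + transition.tag_name + "]";
    };

    std::vector<std::string> start_strings;
    for (auto const& transition : start_transitions) {
        start_strings.emplace_back(serialize_one(transition));
    }

    // Stays empty when the state has no end transition.
    std::string end_string;
    if (end_transition.has_value()) {
        end_string = serialize_one(end_transition.value());
    }

    return std::to_string(state_id) + ":positive_tagged_start_transitions={"
           + sketch_join(start_strings) + "},positive_tagged_end_transitions={" + end_string
           + "}";
}
```

For state 3 in the updated test, which has start transitions to states 5 and 6 and no end transition, this sketch yields the same two fields that the expected string contains.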
85 changes: 59 additions & 26 deletions tests/test-NFA.cpp
@@ -49,58 +49,91 @@ TEST_CASE("Test NFA", "[NFA]") {
     // Compare against expected output
     string expected_serialized_nfa = "0:byte_transitions={A-->1,Z-->2},"
                                      "epsilon_transitions={},"
-                                     "positive_tagged_transitions={},"
+                                     "positive_tagged_start_transitions={},"
+                                     "positive_tagged_end_transitions={},"
                                      "negative_tagged_transition={}\n";
-    expected_serialized_nfa += "1:byte_transitions={a-->3,b-->3,c-->4,d-->4},"
+    expected_serialized_nfa += "1:byte_transitions={},"
                                "epsilon_transitions={},"
-                               "positive_tagged_transitions={},"
+                               "positive_tagged_start_transitions={3[letter]},"
+                               "positive_tagged_end_transitions={},"
                                "negative_tagged_transition={}\n";
     expected_serialized_nfa
             += "2:byte_transitions={},"
                "epsilon_transitions={},"
-               "positive_tagged_transitions={},"
-               "negative_tagged_transition={5[letter1,letter2,letter,containerID]}\n";
+               "positive_tagged_start_transitions={},"
+               "positive_tagged_end_transitions={},"
+               "negative_tagged_transition={4[letter1,letter2,letter,containerID]}\n";
     expected_serialized_nfa += "3:byte_transitions={},"
                                "epsilon_transitions={},"
-                               "positive_tagged_transitions={6[letter1]},"
+                               "positive_tagged_start_transitions={5[letter1],6[letter2]},"
+                               "positive_tagged_end_transitions={},"
                                "negative_tagged_transition={}\n";
-    expected_serialized_nfa += "4:byte_transitions={},"
+    expected_serialized_nfa += "4:accepting_tag=0,byte_transitions={},"
                                "epsilon_transitions={},"
-                               "positive_tagged_transitions={7[letter2]},"
+                               "positive_tagged_start_transitions={},"
+                               "positive_tagged_end_transitions={},"
                                "negative_tagged_transition={}\n";
-    expected_serialized_nfa += "5:accepting_tag=0,byte_transitions={},"
+    expected_serialized_nfa += "5:byte_transitions={a-->7,b-->7},"
                                "epsilon_transitions={},"
-                               "positive_tagged_transitions={},"
+                               "positive_tagged_start_transitions={},"
+                               "positive_tagged_end_transitions={},"
                                "negative_tagged_transition={}\n";
-    expected_serialized_nfa += "6:byte_transitions={},"
+    expected_serialized_nfa += "6:byte_transitions={c-->8,d-->8},"
                                "epsilon_transitions={},"
-                               "positive_tagged_transitions={},"
-                               "negative_tagged_transition={8[letter2]}\n";
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "7:byte_transitions={},"
"epsilon_transitions={},"
"positive_tagged_transitions={},"
"negative_tagged_transition={8[letter1]}\n";
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={9[letter1]},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "8:byte_transitions={},"
"epsilon_transitions={},"
"positive_tagged_transitions={9[letter]},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={10[letter2]},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "9:byte_transitions={},"
"epsilon_transitions={},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={},"
"negative_tagged_transition={11[letter2]}\n";
expected_serialized_nfa += "10:byte_transitions={},"
"epsilon_transitions={},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={},"
"negative_tagged_transition={11[letter1]}\n";
expected_serialized_nfa += "11:byte_transitions={},"
"epsilon_transitions={},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={12[letter]},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "12:byte_transitions={B-->13},"
"epsilon_transitions={},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "9:byte_transitions={B-->10},"
expected_serialized_nfa += "13:byte_transitions={},"
"epsilon_transitions={},"
"positive_tagged_transitions={},"
"positive_tagged_start_transitions={14[containerID]},"
"positive_tagged_end_transitions={},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "10:byte_transitions={0-->11,1-->11,2-->11,3-->11,4-->11,5-->11,6-->"
"11,7-->11,8-->11,9-->11},"
expected_serialized_nfa += "14:byte_transitions={0-->15,1-->15,2-->15,3-->15,4-->15,5-->15,6-->"
"15,7-->15,8-->15,9-->15},"
"epsilon_transitions={},"
"positive_tagged_transitions={},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "11:byte_transitions={0-->11,1-->11,2-->11,3-->11,4-->11,5-->11,6-->"
"11,7-->11,8-->11,9-->11},"
expected_serialized_nfa += "15:byte_transitions={0-->15,1-->15,2-->15,3-->15,4-->15,5-->15,6-->"
"15,7-->15,8-->15,9-->15},"
"epsilon_transitions={},"
"positive_tagged_transitions={12[containerID]},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={16[containerID]},"
"negative_tagged_transition={}\n";
expected_serialized_nfa += "12:byte_transitions={C-->5},"
expected_serialized_nfa += "16:byte_transitions={C-->4},"
"epsilon_transitions={},"
"positive_tagged_transitions={},"
"positive_tagged_start_transitions={},"
"positive_tagged_end_transitions={},"
"negative_tagged_transition={}\n";

// Compare expected and actual line-by-line