Bug in Smack with wildcards #32

Draft
wants to merge 4 commits into
base: master

@Frky Frky commented Dec 31, 2021

When two patterns with wildcards are added to the smack, and the wildcards of one pattern overlap the literal characters of the other, the smack fails to match in some situations.

The minimal example, added as a test case (failing for now), is the following:

    #[test]
    fn test_wildcard_collision() {
        let mut smack = Smack::new("test".to_string(), SMACK_CASE_INSENSITIVE);
        // Pattern 0: four wildcards followed by "abcd".
        smack.add_pattern(
            b"****abcd",
            0,
            SmackFlags::ANCHOR_BEGIN | SmackFlags::WILDCARDS,
        );
        // Pattern 1: six wildcards followed by "abcd".
        smack.add_pattern(
            b"******abcd",
            1,
            SmackFlags::ANCHOR_BEGIN | SmackFlags::WILDCARDS,
        );
        smack.compile();

        // Only pattern 0 can match this input.
        let mut state = BASE_STATE;
        let mut offset = 0;
        let id = smack.search_next(&mut state, &b"xxxxabcd".to_vec(), &mut offset);
        assert!(id == 0);

        // Only pattern 1 can match this input.
        let mut state = BASE_STATE;
        let mut offset = 0;
        let id = smack.search_next(&mut state, &b"xxxxxxabcd".to_vec(), &mut offset);
        assert!(id == 1);

        // The 'a' at offset 4 is compatible with both patterns, but the 'x'
        // that follows rules out pattern 0. This assertion currently fails.
        let mut state = BASE_STATE;
        let mut offset = 0;
        let id = smack.search_next(&mut state, &b"xxxxaxabcd".to_vec(), &mut offset);
        assert!(id == 1);
    }

In this example, the last search (xxxxaxabcd) fails, while it should not:

  • after reading the first four characters, the smack cannot decide between the two patterns,
  • after reading the first a, it could still be either of the two patterns,
  • but when the fifth x is read, it should be decided that it cannot be the first pattern (id 0), since ****abcd requires a b at that position,
  • eventually, after reading the whole string, it should be decided that the second pattern (id 1) matches, as the test asserts (while it currently does not).

Note that this does not happen if the first a in the string to parse is replaced by a b, for instance.
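
The expected ids in the test can be cross-checked with a naive reference matcher that does not depend on the smack at all. This is only a minimal sketch under two assumptions drawn from the test case: `*` matches exactly one arbitrary byte, and `ANCHOR_BEGIN` pins a pattern to offset 0. `matches_at_start` and `expected_id` are hypothetical helpers, not part of smack.rs, and case folding is ignored.

```rust
/// Returns true if `pattern` matches a prefix of `input` starting at offset 0.
/// Assumption: `*` stands for exactly one arbitrary byte.
fn matches_at_start(pattern: &[u8], input: &[u8]) -> bool {
    input.len() >= pattern.len()
        && pattern
            .iter()
            .zip(input)
            .all(|(&p, &c)| p == b'*' || p == c)
}

/// Returns the id of the first pattern (in insertion order) that matches.
/// Hypothetical helper used as an oracle, not the smack API.
fn expected_id(patterns: &[(&[u8], usize)], input: &[u8]) -> Option<usize> {
    patterns
        .iter()
        .find(|(pattern, _)| matches_at_start(pattern, input))
        .map(|&(_, id)| id)
}

fn main() {
    let patterns: [(&[u8], usize); 2] =
        [(&b"****abcd"[..], 0), (&b"******abcd"[..], 1)];

    assert_eq!(expected_id(&patterns, b"xxxxabcd"), Some(0));
    assert_eq!(expected_id(&patterns, b"xxxxxxabcd"), Some(1));
    // The failing case: only pattern 1 is compatible with this input.
    assert_eq!(expected_id(&patterns, b"xxxxaxabcd"), Some(1));
}
```

Under these assumptions the oracle agrees with all three assertions of the test, so the third one describes the intended behaviour rather than a mistake in the test.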


Frky commented Dec 31, 2021

[Image: fsm]

This is a graph of the FSM generated by smack.rs for the following patterns:

        smack.add_pattern(
            b"ab",
            0,
            SmackFlags::ANCHOR_BEGIN | SmackFlags::WILDCARDS,
        );
        smack.add_pattern(
            b"*ab",
            1,
            SmackFlags::ANCHOR_BEGIN | SmackFlags::WILDCARDS,
        );

The graph shows that an edge is missing from state 2 to state 3 for the character a: after reading ^aa, we should end up in state 3.
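
One way to see why that edge has to exist is to track, byte by byte, which patterns are still compatible with the input, since each state of the compiled FSM should correspond to one such set. The sketch below is not how smack.rs builds its transition table; `step` and `live_after` are hypothetical helpers, and `*` is assumed to match exactly one byte with every pattern anchored at offset 0.

```rust
/// Advances the set of live pattern ids by one input byte.
/// `pos` is the number of bytes already consumed (patterns are anchored).
fn step(patterns: &[&[u8]], live: &[usize], pos: usize, byte: u8) -> Vec<usize> {
    live.iter()
        .copied()
        .filter(|&id| match patterns[id].get(pos) {
            // Assumption: `*` matches any single byte.
            Some(&p) => p == b'*' || p == byte,
            // The pattern has no byte left at this position, so it drops out.
            None => false,
        })
        .collect()
}

/// Returns the ids of the patterns still compatible after reading `input`.
fn live_after(patterns: &[&[u8]], input: &[u8]) -> Vec<usize> {
    let mut live: Vec<usize> = (0..patterns.len()).collect();
    for (pos, &byte) in input.iter().enumerate() {
        live = step(patterns, &live, pos, byte);
    }
    live
}

fn main() {
    let patterns: [&[u8]; 2] = [&b"ab"[..], &b"*ab"[..]];

    // After "a", both patterns are still possible.
    assert_eq!(live_after(&patterns, b"a"), vec![0, 1]);
    // After "aa", pattern 0 is ruled out but pattern 1 is still alive,
    // so the state reached after the first 'a' needs a transition on 'a'.
    assert_eq!(live_after(&patterns, b"aa"), vec![1]);
    // Pattern 1 then completes on 'b'.
    assert_eq!(live_after(&patterns, b"aab"), vec![1]);
}
```

The same simulation applied to the patterns of the failing test gives a live set of only pattern 1 after reading xxxxax, which is the situation the compiled FSM currently mishandles.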
