Updated the pull request from #469 so that it will cleanly apply against master. #1441
Can you explain more clearly why it's perfect for your use case (with an example and grammar)? To me, this parser is the exact opposite of the combinator concept: instead of eating input and giving the rest to the next combinator, it eats the end and leaves the head to another combinator. This leads to a very inefficient parser. Plus, it's unfortunate that you removed the original author.
@Stargateur The issue that caused me to switch to this particular combinator is the definition of SQL string literals in Postgres, found here: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS (Section 4.1.2.1). A literal requires an opening single quote ', any combination of Unicode characters, and a terminating single quote '. Multiple SQL strings are merged into one IF and ONLY IF they are separated by whitespace that includes a newline.
Valid example:
Invalid example:
This combinator is fairly greedy and seemed the best way to look ahead and match as much of the whole string as possible. My current code is here (other attempts have been commented out in the link): https://github.com/chotchki/feophant/blob/main/src/engine/sql_parser/constants.rs#L29 Further work is coming to support the variety of embedded escape sequences.
I only recreated the pull request and code because I couldn't get git rebase to replay the original author's commits (despite trying several times this week). From my googling, I can't directly edit the original pull request, so this was my attempted workaround. I tagged the original pull request to link back and give credit; if I've made a mistake with that, my apologies!
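The merging rule described in this comment can be sketched in plain Rust without nom. This is only an illustration under stated assumptions: the function names `quoted`, `newline_gap`, and `sql_string` are hypothetical, parsers return `Option` rather than nom's `IResult`, escape sequences (including a doubled `''`) are ignored, and "whitespace" is approximated by Rust's `trim_start`.

```rust
/// Parse one quoted segment: `'`, any characters except `'`, then `'`.
/// Returns (remaining input, segment contents). Escapes are ignored here.
fn quoted(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('\'')?;
    let end = body.find('\'')?;
    Some((&body[end + 1..], &body[..end]))
}

/// Consume leading whitespace, succeeding only if it contains a newline.
fn newline_gap(input: &str) -> Option<&str> {
    let rest = input.trim_start();
    let gap = &input[..input.len() - rest.len()];
    if gap.contains('\n') {
        Some(rest)
    } else {
        None
    }
}

/// Parse a string constant, merging adjacent segments that are separated
/// by whitespace containing at least one newline.
fn sql_string(input: &str) -> Option<(&str, String)> {
    let (mut rest, first) = quoted(input)?;
    let mut value = String::from(first);
    // Keep merging while a newline gap is followed by another segment.
    while let Some((next_rest, segment)) = newline_gap(rest).and_then(quoted) {
        value.push_str(segment);
        rest = next_rest;
    }
    Some((rest, value))
}
```

When two segments sit on the same line, this sketch stops after the first segment and leaves the second in the remaining input, so the caller can report it as an error.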
It seems such syntax can be parsed very well with
I didn't mean to blame you in any way; I also struggle with git, it's hard to use. I think you can do something like this:
git checkout take_until_parser_matches
git branch -m take_until_parser_matches_tmp
git remote add tomalexander https://github.com/tomalexander/nom.git
git fetch tomalexander
git checkout --track tomalexander/take_until_parser_matches
git cherry-pick 706c89355ae56d57b8e181b1770d0dc2fab7fc70
# resolve conflict
git push --force
That's the best I can propose. But don't worry too much about it, it's just unfortunate.
See #1444, which allows an empty separator, if you want.
I was parsing a language that relies on keywords and allows multiple words as names in between, like
I don't get why you don't simply parse
let input = "<Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]";
let result = parse(input);
assert_eq!(result, Ok({
    verb: "Rub",
    names: vec!["the lovely kitten's belly"],
    modifiers: vec!["for 30 minutes", "with vigor"],
}));
So, as you say, it's working for you, but I still don't see why "official" nom should have this. Tell me if I'm missing something; if you can be more precise, I will try to help solve the problem. In my opinion as a user of nom, a new parser should solve a problem that can't be solved using other combinators (or introduce a shortcut, like for example
The brackets were for clarity; the actual input is just
In that case, I would like to improve this:
- remove the no-consume variant (no need, the sub-parser can control this behavior using peek())
- change the signature to
pub fn take_until0<F, G, Input, Output, Error>(f: F, g: G) -> impl FnMut(Input) -> IResult<Input, (Input, Output), Error>
where this is almost like many_till, but instead of accumulating the results it drops them and returns the input consumed by the f parser.
- put it in the multi module (or the combinator module); take_* should not be in the bytes module anymore, since this combinator is simply a variant of many_till that doesn't accumulate. And it makes no sense to put it in complete when there is no streaming version
- add a take_until1 variant
- add a take_until_m_n variant
- optionally deprecate the other take_ functions (maybe in another PR)
Why the take_until name? Because I think with that we can deprecate all the original take functions in bytes.
Maybe the take_until variants are not needed.
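The behavior proposed in this review comment can be sketched without nom as follows. This is a rough illustration, not the PR's implementation: parsers are simplified to closures over `&str` that return `Option`, the `PResult` alias and the `any_char` and `keyword` helpers are hypothetical, and the zero-progress guard is an added assumption. It also assumes every parser returns a suffix of the input it was given.

```rust
/// Simplified parser result: (remaining input, output). Not nom's IResult.
type PResult<'a, O> = Option<(&'a str, O)>;

/// Apply `f` repeatedly, discarding its output, until `g` succeeds.
/// Returns the input consumed by `f` together with the output of `g`.
fn take_until0<'a, F, G, O1, O2>(
    mut f: F,
    mut g: G,
) -> impl FnMut(&'a str) -> PResult<'a, (&'a str, O2)>
where
    F: FnMut(&'a str) -> PResult<'a, O1>,
    G: FnMut(&'a str) -> PResult<'a, O2>,
{
    move |input: &'a str| {
        let mut rest = input;
        loop {
            // Try the terminating parser first, in the same order as many_till.
            if let Some((tail, out)) = g(rest) {
                let consumed = &input[..input.len() - rest.len()];
                return Some((tail, (consumed, out)));
            }
            // Otherwise advance with `f` and drop what it produced.
            let (next, _) = f(rest)?;
            // Assumed guard: fail if `f` succeeds without consuming input.
            if next.len() == rest.len() {
                return None;
            }
            rest = next;
        }
    }
}

/// Hypothetical helper: consume exactly one character.
fn any_char(input: &str) -> PResult<'_, char> {
    let c = input.chars().next()?;
    Some((&input[c.len_utf8()..], c))
}

/// Hypothetical helper: match the literal keyword " for".
fn keyword(input: &str) -> PResult<'_, &str> {
    input.strip_prefix(" for").map(|rest| (rest, " for"))
}
```

With these helpers, `take_until0(any_char, keyword)` applied to a sentence returns everything before the first `" for"` as the consumed slice, and fails if the keyword never appears.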
I think we can factorize take_until0 and many_till into a fold_until.
One could use it as:
let (tail, (len, modifier)) = fold_until0(recognize(preceded(opt(char(' ')), word)), modifier,
    || 0, |acc, i| acc + i.len())(input)?;
let name = &input[..len];
with this signature:
pub fn fold_until0<F, G, H, A, Acc, Input, Output, Error>(f: F, g: G, init: H, acc: A) ->
    impl FnMut(Input) -> IResult<Input, (Acc, Output), Error>
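To make the fold_until idea concrete, here is a sketch in plain Rust without nom. It rests on several assumptions: parsers are closures over `&str` returning `Option` rather than nom's `IResult`, the `PResult` alias and the `spaced_word` and `modifier` helpers are hypothetical stand-ins for the parsers in the usage snippet, and the zero-progress guard is an addition of this sketch.

```rust
/// Simplified parser result: (remaining input, output). Not nom's IResult.
type PResult<'a, O> = Option<(&'a str, O)>;

/// Apply `f` repeatedly, folding its outputs into an accumulator,
/// until `g` succeeds. Returns the accumulator and the output of `g`.
fn fold_until0<'a, F, G, H, A, Acc, O1, O2>(
    mut f: F,
    mut g: G,
    mut init: H,
    mut fold: A,
) -> impl FnMut(&'a str) -> PResult<'a, (Acc, O2)>
where
    F: FnMut(&'a str) -> PResult<'a, O1>,
    G: FnMut(&'a str) -> PResult<'a, O2>,
    H: FnMut() -> Acc,
    A: FnMut(Acc, O1) -> Acc,
{
    move |input: &'a str| {
        let mut rest = input;
        let mut acc = init();
        loop {
            // The terminating parser is tried before each application of `f`.
            if let Some((tail, out)) = g(rest) {
                return Some((tail, (acc, out)));
            }
            let (next, item) = f(rest)?;
            // Assumed guard: fail if `f` succeeds without consuming input.
            if next.len() == rest.len() {
                return None;
            }
            acc = fold(acc, item);
            rest = next;
        }
    }
}

/// Hypothetical helper: an optional leading space, then one or more letters.
/// Returns the whole matched slice, including the leading space.
fn spaced_word(input: &str) -> PResult<'_, &str> {
    let body = input.strip_prefix(' ').unwrap_or(input);
    let len = body.find(|c: char| !c.is_alphabetic()).unwrap_or(body.len());
    if len == 0 {
        return None;
    }
    let consumed = input.len() - body.len() + len;
    Some((&input[consumed..], &input[..consumed]))
}

/// Hypothetical helper: the terminating keyword " for".
fn modifier(input: &str) -> PResult<'_, &str> {
    input.strip_prefix(" for").map(|rest| (rest, " for"))
}
```

Summing the lengths of the matched words, as in the usage snippet, gives the length of the name, so `&input[..len]` recovers it without allocating a vector of words.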
@Stargateur I like the approach, working on it.
@Stargateur This is pushing my understanding of nom, but I'm trying to understand the fold_until0 signature completely. I have the following signature with the where clause:
Can you please help me understand whether BOTH "g" and "acc" make sense? I'm reading https://docs.rs/nom/7.0.0/nom/multi/fn.fold_many0.html and it seems to just use g to accumulate.
pub fn fold_until0<P, Until, Init, Acc, Fold, Input, Output, UntilOutput, Error>(
    parser: P,
    until: Until,
    init: Init,
    fold: Fold,
) -> impl FnMut(Input) -> IResult<Input, (Acc, UntilOutput), Error>
where
    Input: InputTake + InputIter + InputLength + Clone,
    P: Parser<Input, Output, Error>,
    Until: Parser<Input, UntilOutput, Error>,
    Fold: FnMut(Acc, Output) -> Acc,
    Init: FnMut() -> Acc,
    Error: ParseError<Input>;
the body of
Cleaned up the conflicts from #469 so that the parser can be merged cleanly.