Updated the pull request from #469 so that it will cleanly apply against master. #1441

chotchki · 2021-10-24T16:42:03Z

Cleaned up the conflicts from #469 so that the parser can be merged cleanly.

… apply against master.

…r entry.

Stargateur · 2021-10-24T21:06:06Z

Can you explain more clearly why it's perfect for your use case (with example & grammar) ? For me this parser is totally opposite to the concept of combinator. This instead of eat input and give the rest to other combinator, eat the end, and let the head to other combinator.

This lead to very ineffective parser.

Plus, it's unfortunate you remove the original author.

chotchki · 2021-10-25T01:21:10Z

@Stargateur The issue that caused me to switch to using this particular combinator is the definition of sql string literals in Postgres found here: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS (Section 4.1.2.1).

It requires a single ', any combination of unicode characters and a terminating single quote '. Multiple sql strings are merged into one IF and ONLY IF they are joined by whitespace with a newline.

Valid example:
'foo'
'bar'

Invalid example:
'foo' 'bar'

This combinator is fairly greedy and seemed the best way to look ahead and match as much of the whole string as possible. My current code is here (other attempts have been commented out in the link): https://github.com/chotchki/feophant/blob/main/src/engine/sql_parser/constants.rs#L29

Further work is coming to support the variety of embedded escape sequences.

I only recreated the pull request and code because I couldn't get git rebase to replay the original author's commits (despite trying several times this week). From my googling I can't directly edit the original pull request so this was my attempted work around. I tagged the original pull request to link back and give credit, if I've made a mistake with that, my apologies!

Stargateur · 2021-10-25T01:55:41Z

Seem such syntax is parser very well with separated_list1 as separated_list1(new_line, constant_sql_string)(s). constant_sql_string as delimited(tag('\''), accepted_char, tag('\''))

I only recreated the pull request and code because I couldn't get git rebase to replay the original author's commits (despite trying several times this week). From my googling I can't directly edit the original pull request so this was my attempted work around. I tagged the original pull request to link back and give credit, if I've made a mistake with that, my apologies!

I didn't want to blame in anyway, I also struggle with git it's hard to use.

I think you can do something like that:

git checkout take_until_parser_matches
git branch -m take_until_parser_matches_tmp
git remote add tomalexander https://github.com/tomalexander/nom.git
git fetch tomalexander
git checkout --track tomalexander/take_until_parser_matches
git cherry-pick 706c89355ae56d57b8e181b1770d0dc2fab7fc70
# resolve conflict
git push --force

Best I can propose. But don't worry that much, that just unfortunate.

Stargateur · 2021-10-25T04:52:59Z

See #1444 that allow empty sep if you want separated_list1(opt(new_line), constant_sql_string)(s)

NickNick · 2021-10-26T18:54:21Z

Can you explain more clearly why it's perfect for your use case (with example & grammar) ? For me this parser is totally opposite to the concept of combinator. This instead of eat input and give the rest to other combinator, eat the end, and let the head to other combinator.

I was parsing a language that relies on keywords, and allows multiple words as names in between, like <verb> <name> [modifier] e.g. <Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]. To figure out where 'name' ends, I look in this specific case for when a modifier (in that example for 30 minutes and with vigor) succeeds to parse. It's not efficient... the suggestion of starting the parse at the end sounds nice, but it would complicate the logic a bit (not all examples add with a list of modifiers like this). I feel this is a nice addition to nom because it is intuitive (for me anyway ;-)), albeit not the most efficient way to parse such grammars.

Stargateur · 2021-10-26T19:40:13Z

I was parsing a language that relies on keywords, and allows multiple words as names in between, like <verb> <name> [modifier] e.g. <Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]. To figure out where 'name' ends, I look in this specific case for when a modifier (in that example for 30 minutes and with vigor) succeeds to parse.

I don't get why you simply not parse <verb> then <name> then the modifiers. Since all of them are correctly delimited I don't see the problem here. I can produce an example if you wish with the following requirement:

let input = "<Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]";
let result = parse(input);
assert_eq!(result, Ok({
  verb: "Rub",
  names: vec!["the lively kitten's belly"],
  modifiers: vec!["for 30 minutes", "with vigor"],
}));

Thus like you say it's working for you, but I still don't see why "official" nom should have this. Tell me If I miss something, if needed be more precise I will try to help solve the problem.

In my opinion as user of nom, a new parser should solve a problem that can't be solve using other conbinator (or introduce a shortcut like for example many0). Add a parser that simply brute force every bytes of the input doesn't meet nom philosophy. Nom eat byte by byte. I agree there is already a number of parser like this in nom but I don't think there are a good addition to nom.

NickNick · 2021-10-27T18:17:47Z

The brackets were for clarity, the actual input is just Rub the lovely kitten's belly for 30 minutes with vigor. I am sure you can come up with a way to parse this specific example without this combinator though, but I don't get why you think this is such an unusual combinator while there is take_until. This is just a fancier version of it for when the thing you want to take_until is not a fixed value. That is not so weird right?

Stargateur

In that case, I would like improve this:

remove no consume variant (no need the sub parser can control this behavior using peek())

change signature to

pub fn take_until0<F, G, Input, Output, Error>(f: F, g: G) ->
  impl FnMut(Input) -> IResult<Input, (Input, Output), Error>

where this almost like many_till but instead of accumulate result drop them but return consumed input of f parser.

put it in multi module (or combinator module), take_* should not be in bytes module anymore, this combinator is simple a variant of many_till that doesn't accumulate. And no sense to put it in complete when they is no streaming version
~~add take_until1 variant~~
~~add take_until_m_n variant~~
Optionally depreciate other take_ function (maybe in another PR)

Why take_until name ? cause I think with that we can depreciate original all take function in bytes.

maybe take_until variant are not needed

I think we can factorize take_until0 and many_till into a fold_until one could use it as:

let (tail, (len, modifier)) = fold_until0(recognize(preceded(opt(char(' ')), word)), modifier,
  || 0, |acc, i| acc + i.len())(input)?;
let name = input[..len];

with this signature:

pub fn fold_until0<F, G, A, Input, Output, Error>(f: F, g: G, init: H, acc: A) ->
  impl FnMut(Input) -> IResult<Input, (Acc, Output), Error>

chotchki · 2021-10-31T13:30:51Z

@Stargateur I like the approach, working on it.

chotchki · 2021-11-03T03:19:47Z

@Stargateur This is pushing my understanding of nom but I'm trying to understand the signature fold_until0 completely.

I have the following signature with the where clause:

pub fn fold_until0<F, G, H, A, Input, Output, Error>(
  f: F,
  g: G,
  init: H,
  acc: A,
) -> impl FnMut(Input) -> IResult<Input, (A, Output), Error>
where
  Input: InputTake + InputIter + InputLength + Clone,
  F: Parser<Input, Output, Error>,
  G: FnMut(A, Output) -> (A, Output),
  H: FnMut() -> (A, Output),
  Error: ParseError<Input>,

"f" is the child parser that will recognize the end
"g": I don't understand in the context of the other arguments
"init": Function for the initial state for the accumulator
"acc": Function for building the accumulator up. I don't know if this really should be "g"
I'm returning the remaining input, the accumulation, the content until "f" matched (dropping "f"'s output).

Can you please help me understand if BOTH "g" and "acc" make sense? I'm reading https://docs.rs/nom/7.0.0/nom/multi/fn.fold_many0.html and it seems to just use g to accumulate.

Stargateur · 2021-11-03T08:21:56Z

pub fn fold_until0<P, Until, Init, Acc, Fold, Input, Output, UntilOutput, Error>(
    parser: P,
    until: Until,
    init: Init,
    fold: Fold,
) -> impl FnMut(Input) -> IResult<Input, (Acc, UntilOutput), Error>
where
    Input: InputTake + InputIter + InputLength + Clone,
    F: Parser<Input, Output, Error>,
    Until: Parser<Input, UntilOutput, Error>,
    Fold: FnMut(Acc, Output) -> Acc,
    Init: FnMut() -> Acc,
    Error: ParseError<Input>;

parser is used to parse what is before could be anychar for quick and dirty parsing
until is the parser used to stop the output of this parser is returned by fold_until
init is the init fct used to get the accumulator of the user
fold is the user fct that use the accumulator and return it

the body of fold_until0 is very simulator to #1341

chotchki added 2 commits October 24, 2021 12:40

Updated the pull request from rust-bakery#469 so that it will cleanly…

18d2964

… apply against master.

As requested in the pull request template, added choosing a combinato…

706c893

…r entry.

chotchki marked this pull request as ready for review October 24, 2021 16:46

Stargateur mentioned this pull request Oct 25, 2021

Implement take_until_parser_matches #469

Closed

Stargateur suggested changes Oct 28, 2021

View reviewed changes

Merge branch 'Geal:main' into take_until_parser_matches

5a45542

Geal added this to the 8.0 milestone Mar 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updated the pull request from #469 so that it will cleanly apply against master. #1441

Updated the pull request from #469 so that it will cleanly apply against master. #1441

chotchki commented Oct 24, 2021

Stargateur commented Oct 24, 2021 •

edited

Loading

chotchki commented Oct 25, 2021 •

edited

Loading

Stargateur commented Oct 25, 2021 •

edited

Loading

Stargateur commented Oct 25, 2021

NickNick commented Oct 26, 2021 •

edited

Loading

Stargateur commented Oct 26, 2021 •

edited

Loading

NickNick commented Oct 27, 2021

Stargateur left a comment •

edited

Loading

chotchki commented Oct 31, 2021 •

edited

Loading

chotchki commented Nov 3, 2021

Stargateur commented Nov 3, 2021

Updated the pull request from #469 so that it will cleanly apply against master. #1441

Are you sure you want to change the base?

Updated the pull request from #469 so that it will cleanly apply against master. #1441

Conversation

chotchki commented Oct 24, 2021

Stargateur commented Oct 24, 2021 • edited Loading

chotchki commented Oct 25, 2021 • edited Loading

Stargateur commented Oct 25, 2021 • edited Loading

Stargateur commented Oct 25, 2021

NickNick commented Oct 26, 2021 • edited Loading

Stargateur commented Oct 26, 2021 • edited Loading

NickNick commented Oct 27, 2021

Stargateur left a comment • edited Loading

Choose a reason for hiding this comment

chotchki commented Oct 31, 2021 • edited Loading

chotchki commented Nov 3, 2021

Stargateur commented Nov 3, 2021

Stargateur commented Oct 24, 2021 •

edited

Loading

chotchki commented Oct 25, 2021 •

edited

Loading

Stargateur commented Oct 25, 2021 •

edited

Loading

NickNick commented Oct 26, 2021 •

edited

Loading

Stargateur commented Oct 26, 2021 •

edited

Loading

Stargateur left a comment •

edited

Loading

chotchki commented Oct 31, 2021 •

edited

Loading