Allow comments in grammars #37

Open
shnewto opened this issue Jan 10, 2018 · 13 comments

Comments

@shnewto
Owner

shnewto commented Jan 10, 2018

We need a way to allow comments in a grammar. It seems somewhat standard to use ; to indicate the start of a comment and \n to close it. I think it'd be a good addition to make that work.

A couple initial questions:

  • Do we look for the ; all along the way?
  • Do we make a "first pass" that strips all comments before parsing?
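
For reference, the "first pass" option could look roughly like the Python sketch below. `strip_comments` is a hypothetical helper written only for illustration, not something in this project, and it assumes a ; never appears inside a quoted terminal.

```python
import re

# Hypothetical helper for illustration only; not this project's API.
# Assumption: ';' never occurs inside a quoted terminal.
COMMENT = re.compile(r";[^\n]*")


def strip_comments(grammar: str) -> str:
    """Remove everything from ';' up to (not including) the next newline or EOF."""
    return COMMENT.sub("", grammar)


grammar = '<a> ::= "b" <c> | <d> ; trailing note\n; a comment-only line\n<c> ::= "c"\n'
print(strip_comments(grammar))
```

The newlines are left in place, so line structure (and any line numbers in later error messages) stays the same as in the original text.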
@Carlyle-Foster
Contributor

i don't think we'd have to look for ; all the time. unless there's some de facto standard for closing comments within a line, comments can never appear as interjections, so we'd only have to check at the marked positions: here <a> ::= "b" here <c> here | <d> and here

@Carlyle-Foster
Contributor

now that i think of it, we should be able to confine comment parsing entirely within whitespace parsing, can you think of any exceptions?

@Carlyle-Foster
Contributor

> i don't think we'd have to look for ; all the time, unless there's some de facto standard for closing comments within a line comments can never appear as interjections so we'd only have to check here <a> ::= "b" here <c> here | <d> and here

wait a second, that's wrong. it's actually just the two positions in here <a> ::= "b" <c> | <d> and here that have to be checked

@Carlyle-Foster
Contributor

i'm not sure if this should be closed yet. i found this out in the wild. it's VERY old, but its usage of comments really makes sense: if newlines are allowable WS, then any whitespace could have comments inside

@Carlyle-Foster
Contributor

i might just have to bite the bullet on this one and replace all the simple WS parsing with a custom function that parses WS while transparently skipping comments
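
A rough Python sketch of that idea, for illustration only: `skip_ws_and_comments` is a hypothetical name, not the project's actual function, and it assumes a comment runs from ; to the next newline or EOF.

```python
def skip_ws_and_comments(text: str, pos: int) -> int:
    """Return the index of the first character at or after `pos` that is
    neither whitespace nor part of a ';' comment."""
    n = len(text)
    while pos < n:
        if text[pos].isspace():
            pos += 1
        elif text[pos] == ";":
            # Consume up to the newline or EOF; the newline itself counts as
            # whitespace and is consumed on the next loop iteration.
            while pos < n and text[pos] != "\n":
                pos += 1
        else:
            break
    return pos


src = '<a> ::= "b"  ; note\n   | <d>'
start = src.index('"b"') + len('"b"')
end = skip_ws_and_comments(src, start)
print(repr(src[end:]))
```

Every place that currently parses plain whitespace would call something like this instead, so comments become invisible to the rest of the parser.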

@shnewto
Owner Author

shnewto commented Jan 5, 2025

one note here is that I don't think comments are actually allowed anywhere except at the end of a rule so rather than
here <a> ::= "b" <c> | <d> and here

it's just
<a> ::= "b" <c> | <d> here

From section 2.8 of the spec linked above:

> A semi-colon, set off some distance to the right of rule text, starts a comment that continues to the end of line. This is a simple way of including useful notes in parallel with the specifications.

@shnewto
Owner Author

shnewto commented Jan 5, 2025

which effectively means that anything that follows a ; until a newline can be treated as a comment and we can expect that there can be lines that are only whitespace + comments

@shnewto
Owner Author

shnewto commented Jan 5, 2025

also 🤔 I think we already handle ; to terminate lines, so it might just be that we want to eat anything that follows a ; until the newline, rather than treating ; as a delimiter, which I think is how it behaves currently.

@Carlyle-Foster
Contributor

; isn't the main delimiter, we do normally just consume until we hit a newline. ; is only a delimiter because for some reason i thought having two comments on the same line should be rejected, i think because i interpreted that as trying to close the comment before the newline.
in retrospect, that doesn't make much sense

@shnewto
Owner Author

shnewto commented Jan 5, 2025

I think we're talking about the same thing but just in case, here's the current behavior for bnf grammars

<dna> ::= <base> | <base> <dna>
<base> ::= 'A' | 'C' | 'G' | 'T'

is equivalent to

<dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'

i.e. ; is (incorrectly) just acting as an alternative to \n as a delimiter and they both result in the same object after parsing.

but if we want to handle comments correctly, I believe we want

<dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'

to be equivalent to

<dna> ::= <base> | <base> <dna>
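
To make the difference concrete, here is a small Python model of the two behaviors. Both functions are hypothetical and only split text into rule strings; they are a sketch of the behavior described above, not the library's parser.

```python
line = "<dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'"


def rules_with_semicolon_delimiter(text: str) -> list[str]:
    """Behavior described as current: ';' and newline both end a rule."""
    parts = text.replace(";", "\n").split("\n")
    return [p.strip() for p in parts if p.strip()]


def rules_with_semicolon_comment(text: str) -> list[str]:
    """Behavior described as desired: ';' starts a comment that runs to end of line."""
    rules = []
    for raw in text.split("\n"):
        rule = raw.split(";", 1)[0].strip()
        if rule:  # drop lines that are only whitespace and/or a comment
            rules.append(rule)
    return rules


print(rules_with_semicolon_delimiter(line))  # two rules
print(rules_with_semicolon_comment(line))    # one rule
```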

@Carlyle-Foster
Contributor

> one note here is that I don't think comments are actually allowed anywhere except at the end of a rule so rather than here <a> ::= "b" <c> | <d> and here
>
> it's just <a> ::= "b" <c> | <d> here
>
> From section 2.8 of the spec linked above:
>
> A semi-colon, set off some distance to the right of rule text, starts a comment that continues to the end of line. This is a simple way of including useful notes in parallel with the specifications.

the problem is that currently all WS could include newlines, so all WS has to handle comments. take this example from the RFC you quoted, for instance: it idiomatically puts a NL in the whitespace before a /

destination =  "To"          ":" 1#address  ; Primary
                 /  "Resent-To"   ":" 1#address
                 /  "cc"          ":" 1#address  ; Secondary
                 /  "Resent-cc"   ":" 1#address
                 /  "bcc"         ":"  #address  ; Blind carbon
                 /  "Resent-bcc"  ":"  #address
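
A quick Python sketch of why this matters: once comments are dropped and newlines are treated as ordinary whitespace, the six lines above read as a single rule with six alternatives. The helper name is made up for illustration, and the sketch assumes the snippet contains exactly one rule and no ; or / inside quoted strings.

```python
abnf = '''destination =  "To"          ":" 1#address  ; Primary
                 /  "Resent-To"   ":" 1#address
                 /  "cc"          ":" 1#address  ; Secondary
                 /  "Resent-cc"   ":" 1#address
                 /  "bcc"         ":"  #address  ; Blind carbon
                 /  "Resent-bcc"  ":"  #address
'''


def strip_line_comments(text: str) -> str:
    """Drop everything from ';' to end of line, keeping the line breaks."""
    return "\n".join(line.split(";", 1)[0] for line in text.split("\n"))


# Collapse all whitespace, newlines included, into single spaces.
rule = " ".join(strip_line_comments(abnf).split())
name, body = rule.split(" = ", 1)
alternatives = body.split(" / ")

print(name, len(alternatives))
```

If comments were only recognized at the very end of a rule, the "; Primary" on the first line would swallow nothing useful and the parser would still have to cope with "; Secondary" in the middle of the alternation.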

@Carlyle-Foster
Contributor

> I think we're talking about the same thing but just in case, here's the current behavior for bnf grammars
>
> <dna> ::= <base> | <base> <dna>
> <base> ::= 'A' | 'C' | 'G' | 'T'
>
> is equivalent to
>
> <dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'
>
> i.e. ; is (incorrectly) just acting as an alternative to \n as a delimiter and they both result in the same object after parsing.
>
> but if we want to handle comments correctly, I believe we want
>
> <dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'
>
> to be equivalent to
>
> <dna> ::= <base> | <base> <dna>

oh, that's what you're talking about. yeah, i thought that was a little weird. i've never seen that used out in the wild, for that matter. not sure why anyone would want 2 production rules on 1 line

@shnewto
Owner Author

shnewto commented Jan 5, 2025

hah brevity I guess, it's my fault 😆

to your previous point, I think that's okay! since we know that all comments will start with ; and end with \n (or EOF) we can just throw away anything that comes after a ; until the \n or EOF regardless of where it shows up and keep on parsing
