Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 473 assertion #476

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

ratmice
Copy link
Collaborator

@ratmice ratmice commented Dec 3, 2024

This seems to fix the assert in issue 473, however i'm marking this as draft because
for some reason the spans seem off by one when parsing the examples posted in that issue.
Strangely the spans are fine for the fuzzer minimized reproducer. Should probably also add test cases etc.

I wonder if this could be due to comments?

The other thing is that after fixing the assertion, nimbleparse seems to return to taking forever to parse the file.
I believe that it might be caused by running error recovery, as I did see some error recovery output.

For some reason It isn't clear whether a mention of @mingodad will work, but he may want to try this patch.

Previously the spans of empty productions were lost from the AST.
When a conflict involving an empty production was found a panic would
ensue due to the missing span.
@ltratt
Copy link
Member

ltratt commented Dec 10, 2024

Oops, I missed this one because I wasn't assigned -- sorry!

@ltratt
Copy link
Member

ltratt commented Dec 10, 2024

Dumb question: does this fix the assert with the postgres grammar from the issue?

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 10, 2024

My bad, I seem to consistently mix up assignment vs request review. It does fix it, however I'm not entirely absolutely certain it is totally correct.

I had been thinking that it would be good to do some local testing where I made the prods field non-public and replace it with a prods() function, so I can tell where the field is being used from. From grepping it is too easy to mix up with the prods field of YaccGrammar.

I don't think it is worth including that kind of patch just because of compatibility, but just locally to get a better idea of where this prods field propagates throughout the lib...

@ltratt
Copy link
Member

ltratt commented Dec 11, 2024

I had been thinking that it would be good to do some local testing where I made the prods field non-public and replace it with a prods() function, so I can tell where the field is being used from.

Good idea!

@ltratt
Copy link
Member

ltratt commented Dec 18, 2024

@ratmice I was just wondering if you've had any luck with this?

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 18, 2024

No I'm sorry, I haven't managed to sit down and work on this, or anything yet really. I've taken the month for attempting to see if I can sustain walking 24km a day (hint I cannot, If I really push myself I can still only seem to sustain 22km), So most of my time has been just walking or recovering for the next day. But been keeping at it considering how close to that goal I seem to be... It is inevitable that I'll need a rest day soon, but have a pretty good streak going and when I do it is the first thing on my list. Sorry if that isn't a very satisfying answer.

@ltratt
Copy link
Member

ltratt commented Jan 13, 2025

@ratmice Happy new year! I hope all is going well in your neck of the woods. I just thought I'd see if you've had any luck with this one? I quite understand if not!

@ratmice
Copy link
Collaborator Author

ratmice commented Jan 13, 2025

I tried going though making that field private and seeing if it was enlightening at all this morning, making the GrammarAST prods field wasn't actually helpful or enlightening as far as the usage of Symbol values.

In my previous message it seems I was thinking the GrammarAST field was more like the YaccGrammar prods field.
(that branch is here for the record anyhow)

The difference between the two being that YaccGrammar has prods: Box<[Box<[Symbol<StorageT>]>]> however GrammarAst has prods: Vec<Production> where struct Production has a symbols: Vec<Symbols>. So what I actually should have been looking to make private was the Symbols value of Production.

Trying to experiment with making the Production.symbols field private seemed an exercise in futility given how many places in the testsuite Production is constructed (would need to change those all to use a new constructor, looking at the places that cause errors when making it private didn't show anywhere that we were using its len() alone without doing a match symbol in such a way that we've already had to have added cases for %empty.

My impression is that the place that this change can cause mappings to go awry is fairly well isolated to the function YaccGrammar::new_from_ast_with_validity_info but we correctly handle the ast::Symbol to Symbol mapping for that in this branch, with the %empty values never escaping into the Symbol.

Perhaps that is not totally reassuring?

@ltratt
Copy link
Member

ltratt commented Jan 13, 2025

new_from_ast_with_validity_info is definitely a slightly hairy function! I must admit that at first glance I'm not quite sure where best to look. It might be that we could run shrinkray on the input grammar and see if it shrinks it down to something small enough that the correlation between bug and code jumps out at us?

@ratmice
Copy link
Collaborator Author

ratmice commented Jan 13, 2025

Just for the record, this patch should fix the assertion and known bug, the question that is left is mostly whether our confidence that the changes do not have any unintended effects on what most likely would be Pidx generation.
So unfortunately it's the kind of non-existence of bug that automated tools are unlikely to prove the absence of.

I think it is likely that the patch as is probably behaves perfectly fine, but the use of numerical indices makes it difficult to attain the degree of confidence that I normally like to have in a patch :(

@ltratt
Copy link
Member

ltratt commented Jan 14, 2025

Dumb question: is the fix just to remove the assert?

diff --git nimbleparse/src/diagnostics.rs nimbleparse/src/diagnostics.rs
index babfdae..ae37483 100644
--- nimbleparse/src/diagnostics.rs
+++ nimbleparse/src/diagnostics.rs
@@ -304,7 +304,7 @@ impl<'a> SpannedDiagnosticFormatter<'a> {
                 .collect::<Vec<_>>();
             prod_lines.sort();
             prod_lines.dedup();
-            assert!(!r_prod_spans.is_empty());
+            // assert!(!r_prod_spans.is_empty());
 
             for lines in &prod_lines {
                 let mut spans_on_line = Vec::new();

When I look at the output that I get with this diff from nimbleparse, it's not obvious to me that it's wrong.

@ratmice
Copy link
Collaborator Author

ratmice commented Jan 14, 2025

Dumb question: is the fix just to remove the assert?

When I look at the output that I get with this diff from nimbleparse, it's not obvious to me that it's wrong.

By simply removing the span we can get the occurrence that the reduce in a shift/reduce conflict has no span associated with it, such that the loop in the trailing context for the diff that you gave removing the assert will iterate 0 times (because prod_lines is a map over r_prod_spans.

for lines in &prod_lines {
    let mut spans_on_line = Vec::new();

So the effect of removing the assert is that we print the rule in which the reduce is occurring, but unlike most cases where we also print the specific production which is reduced, we don't currently have a span for this specific case of empty or %empty productions, since currently we're only tracking spans for productions which refer to tokens or other rules.

So it's really just a case where error printing needs some info that the runtime representation doesn't need in order to print errors uniformly across all shift/reduce conflicts.

So, yes it will not lead to any further panics by removing the assert, but the assert currently is catching an exceptional case where we lack any span info for these productions.

@ltratt
Copy link
Member

ltratt commented Jan 14, 2025

Ah, right, yes, now I remember!

So is your aim here that we keep the user's %emptys but we don't add ones in unnecessarily? Or do we convert every empty rule to have %empty even if the user didn't write it?

@ratmice
Copy link
Collaborator Author

ratmice commented Jan 14, 2025

well this attempt at an implementation we have 2 cases, both of which are implemented as GrammarAST::Symbol::Empty

First there is the rule containing %empty which results in a span containing %empty

case1: %empty | 'a' ;

The second is productions absent any text, these result in a zero length span at the location to the left of the | or ;.

case2: | 'a' ;
case3: 'a' | ;

The only case where produce %empty even if it did not ocur in input text is in the Display implementation, in theory we could inspect the span and print %empty if the span is not zero length, but I'm uncertain whether printing the empty string would make for a useful Display implementation.

But besides that Display impl the actual spans refer to directly to the input text as given, so there is no attempt to convert or print %empty, the intent is that it produces an error something like:

case3: 'a' | ;
            ^ Reduced productions;

Edit: It's worth noting that the Display impl while it isn't used for any of the diagnostics, I am not actually aware of whether it is used anywhere, I can't think of where it would be used in the repo at least.

@ltratt
Copy link
Member

ltratt commented Jan 15, 2025

Ah, I see. Let me chew on this for a bit, mostly because the pub nature Symbol means this is a breaking API change (I think).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants