Issue 473 assertion #476

ratmice · 2024-12-03T08:10:37Z

This seems to fix the assert in issue 473, however i'm marking this as draft because
for some reason the spans seem off by one when parsing the examples posted in that issue.
Strangely the spans are fine for the fuzzer minimized reproducer. Should probably also add test cases etc.

I wonder if this could be due to comments?

The other thing is that after fixing the assertion, nimbleparse seems to return to taking forever to parse the file.
I believe that it might be caused by running error recovery, as I did see some error recovery output.

For some reason It isn't clear whether a mention of @mingodad will work, but he may want to try this patch.

Previously the spans of empty productions were lost from the AST. When a conflict involving an empty production was found a panic would ensue due to the missing span.

ltratt · 2024-12-10T23:05:57Z

Oops, I missed this one because I wasn't assigned -- sorry!

ltratt · 2024-12-10T23:07:34Z

Dumb question: does this fix the assert with the postgres grammar from the issue?

ratmice · 2024-12-10T23:51:48Z

My bad, I seem to consistently mix up assignment vs request review. It does fix it, however I'm not entirely absolutely certain it is totally correct.

I had been thinking that it would be good to do some local testing where I made the prods field non-public and replace it with a prods() function, so I can tell where the field is being used from. From grepping it is too easy to mix up with the prods field of YaccGrammar.

I don't think it is worth including that kind of patch just because of compatibility, but just locally to get a better idea of where this prods field propagates throughout the lib...

ltratt · 2024-12-11T07:50:00Z

I had been thinking that it would be good to do some local testing where I made the prods field non-public and replace it with a prods() function, so I can tell where the field is being used from.

Good idea!

ltratt · 2024-12-18T14:10:42Z

@ratmice I was just wondering if you've had any luck with this?

ratmice · 2024-12-18T19:28:57Z

No I'm sorry, I haven't managed to sit down and work on this, or anything yet really. I've taken the month for attempting to see if I can sustain walking 24km a day (hint I cannot, If I really push myself I can still only seem to sustain 22km), So most of my time has been just walking or recovering for the next day. But been keeping at it considering how close to that goal I seem to be... It is inevitable that I'll need a rest day soon, but have a pretty good streak going and when I do it is the first thing on my list. Sorry if that isn't a very satisfying answer.

ltratt · 2025-01-13T11:24:09Z

@ratmice Happy new year! I hope all is going well in your neck of the woods. I just thought I'd see if you've had any luck with this one? I quite understand if not!

ratmice · 2025-01-13T13:19:43Z

I tried going though making that field private and seeing if it was enlightening at all this morning, making the GrammarAST prods field wasn't actually helpful or enlightening as far as the usage of Symbol values.

In my previous message it seems I was thinking the GrammarAST field was more like the YaccGrammar prods field.
(that branch is here for the record anyhow)

The difference between the two being that YaccGrammar has prods: Box<[Box<[Symbol<StorageT>]>]> however GrammarAst has prods: Vec<Production> where struct Production has a symbols: Vec<Symbols>. So what I actually should have been looking to make private was the Symbols value of Production.

Trying to experiment with making the Production.symbols field private seemed an exercise in futility given how many places in the testsuite Production is constructed (would need to change those all to use a new constructor, looking at the places that cause errors when making it private didn't show anywhere that we were using its len() alone without doing a match symbol in such a way that we've already had to have added cases for %empty.

My impression is that the place that this change can cause mappings to go awry is fairly well isolated to the function YaccGrammar::new_from_ast_with_validity_info but we correctly handle the ast::Symbol to Symbol mapping for that in this branch, with the %empty values never escaping into the Symbol.

Perhaps that is not totally reassuring?

ltratt · 2025-01-13T22:20:44Z

new_from_ast_with_validity_info is definitely a slightly hairy function! I must admit that at first glance I'm not quite sure where best to look. It might be that we could run shrinkray on the input grammar and see if it shrinks it down to something small enough that the correlation between bug and code jumps out at us?

ratmice · 2025-01-13T22:56:14Z

Just for the record, this patch should fix the assertion and known bug, the question that is left is mostly whether our confidence that the changes do not have any unintended effects on what most likely would be Pidx generation.
So unfortunately it's the kind of non-existence of bug that automated tools are unlikely to prove the absence of.

I think it is likely that the patch as is probably behaves perfectly fine, but the use of numerical indices makes it difficult to attain the degree of confidence that I normally like to have in a patch :(

ltratt · 2025-01-14T19:14:08Z

Dumb question: is the fix just to remove the assert?

diff --git nimbleparse/src/diagnostics.rs nimbleparse/src/diagnostics.rs
index babfdae..ae37483 100644
--- nimbleparse/src/diagnostics.rs
+++ nimbleparse/src/diagnostics.rs
@@ -304,7 +304,7 @@ impl<'a> SpannedDiagnosticFormatter<'a> {
                 .collect::<Vec<_>>();
             prod_lines.sort();
             prod_lines.dedup();
-            assert!(!r_prod_spans.is_empty());
+            // assert!(!r_prod_spans.is_empty());
 
             for lines in &prod_lines {
                 let mut spans_on_line = Vec::new();

When I look at the output that I get with this diff from nimbleparse, it's not obvious to me that it's wrong.

ratmice · 2025-01-14T22:04:25Z

Dumb question: is the fix just to remove the assert?

When I look at the output that I get with this diff from nimbleparse, it's not obvious to me that it's wrong.

By simply removing the span we can get the occurrence that the reduce in a shift/reduce conflict has no span associated with it, such that the loop in the trailing context for the diff that you gave removing the assert will iterate 0 times (because prod_lines is a map over r_prod_spans.

for lines in &prod_lines {
    let mut spans_on_line = Vec::new();

So the effect of removing the assert is that we print the rule in which the reduce is occurring, but unlike most cases where we also print the specific production which is reduced, we don't currently have a span for this specific case of empty or %empty productions, since currently we're only tracking spans for productions which refer to tokens or other rules.

So it's really just a case where error printing needs some info that the runtime representation doesn't need in order to print errors uniformly across all shift/reduce conflicts.

So, yes it will not lead to any further panics by removing the assert, but the assert currently is catching an exceptional case where we lack any span info for these productions.

ltratt · 2025-01-14T22:19:53Z

Ah, right, yes, now I remember!

So is your aim here that we keep the user's %emptys but we don't add ones in unnecessarily? Or do we convert every empty rule to have %empty even if the user didn't write it?

ratmice · 2025-01-14T22:49:49Z

well this attempt at an implementation we have 2 cases, both of which are implemented as GrammarAST::Symbol::Empty

First there is the rule containing %empty which results in a span containing %empty

case1: %empty | 'a' ;

The second is productions absent any text, these result in a zero length span at the location to the left of the | or ;.

case2: | 'a' ;
case3: 'a' | ;

The only case where produce %empty even if it did not ocur in input text is in the Display implementation, in theory we could inspect the span and print %empty if the span is not zero length, but I'm uncertain whether printing the empty string would make for a useful Display implementation.

But besides that Display impl the actual spans refer to directly to the input text as given, so there is no attempt to convert or print %empty, the intent is that it produces an error something like:

case3: 'a' | ;
            ^ Reduced productions;

Edit: It's worth noting that the Display impl while it isn't used for any of the diagnostics, I am not actually aware of whether it is used anywhere, I can't think of where it would be used in the repo at least.

ltratt · 2025-01-15T07:59:05Z

Ah, I see. Let me chew on this for a bit, mostly because the pub nature Symbol means this is a breaking API change (I think).

ratmice added 2 commits December 2, 2024 23:05

Switch if/else condition to assert

4f9930a

Add ast::Symbol variant for %empty productions.

9769c4b

Previously the spans of empty productions were lost from the AST. When a conflict involving an empty production was found a panic would ensue due to the missing span.

ratmice requested a review from ltratt December 3, 2024 08:34

ratmice mentioned this pull request Dec 3, 2024

Can't finish or takes forever with postgres16 grammar #473

Open

ltratt self-assigned this Dec 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue 473 assertion #476

Issue 473 assertion #476

ratmice commented Dec 3, 2024

ltratt commented Dec 10, 2024

ltratt commented Dec 10, 2024

ratmice commented Dec 10, 2024 •

edited

Loading

ltratt commented Dec 11, 2024

ltratt commented Dec 18, 2024

ratmice commented Dec 18, 2024

ltratt commented Jan 13, 2025

ratmice commented Jan 13, 2025

ltratt commented Jan 13, 2025

ratmice commented Jan 13, 2025

ltratt commented Jan 14, 2025

ratmice commented Jan 14, 2025

ltratt commented Jan 14, 2025

ratmice commented Jan 14, 2025 •

edited

Loading

ltratt commented Jan 15, 2025

Issue 473 assertion #476

Are you sure you want to change the base?

Issue 473 assertion #476

Conversation

ratmice commented Dec 3, 2024

ltratt commented Dec 10, 2024

ltratt commented Dec 10, 2024

ratmice commented Dec 10, 2024 • edited Loading

ltratt commented Dec 11, 2024

ltratt commented Dec 18, 2024

ratmice commented Dec 18, 2024

ltratt commented Jan 13, 2025

ratmice commented Jan 13, 2025

ltratt commented Jan 13, 2025

ratmice commented Jan 13, 2025

ltratt commented Jan 14, 2025

ratmice commented Jan 14, 2025

ltratt commented Jan 14, 2025

ratmice commented Jan 14, 2025 • edited Loading

ltratt commented Jan 15, 2025

ratmice commented Dec 10, 2024 •

edited

Loading

ratmice commented Jan 14, 2025 •

edited

Loading