Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: RangeQuery goroutine leak #15665

Merged
merged 5 commits into from
Jan 9, 2025

Conversation

yincongcyincong
Copy link
Contributor

@yincongcyincong yincongcyincong commented Jan 9, 2025

every time https://github.com/grafana/loki/blob/main/pkg/querier/queryrange/downstreamer.go#L197 this function return. there is no receiver for this channel.

https://github.com/grafana/loki/blob/main/pkg/querier/queryrange/downstreamer.go#L187. but there still have some error send to this channel. which result in goroutine leakage.

we can get this leakage through pprof/goroutine
image

@yincongcyincong yincongcyincong requested a review from a team as a code owner January 9, 2025 03:44
@yincongcyincong yincongcyincong changed the title RangeQuery: fix goroutine leak fix: RangeQuery goroutine leak Jan 9, 2025
@cyriltovena
Copy link
Contributor

cyriltovena commented Jan 9, 2025

Yes that's correct what do you think of doing instead this

if err != nil {
                 select {
			case <-ctx.Done():
			case ch <- logql.Resp{
				I:   -1,
				Err: err,
			}:
                }
		}

Technically the context will be done because the main function returns.

@chaudum
Copy link
Contributor

chaudum commented Jan 9, 2025

IMO there are two problems:

  1. We send a response error twice:
  2. We do not drain the channel ch when accumulation returns an error but instead return https://github.com/grafana/loki/blob/f2fc0c26c2/pkg/querier/queryrange/downstreamer.go#L199-L201 This leaves remaining results in the channel.
diff --git a/pkg/querier/queryrange/downstreamer.go b/pkg/querier/queryrange/downstreamer.go
index c6fba0fbf4..490a63150d 100644
--- a/pkg/querier/queryrange/downstreamer.go
+++ b/pkg/querier/queryrange/downstreamer.go
@@ -170,18 +170,19 @@ func (in instance) For(
 	go func() {
 		err := concurrency.ForEachJob(ctx, len(queries), in.parallelism, func(ctx context.Context, i int) error {
 			res, err := fn(queries[i])
+			if err != nil {
+				return err
+			}
 			response := logql.Resp{
 				I:   i,
 				Res: res,
-				Err: err,
 			}
-
 			// Feed the result into the channel unless the work has completed.
 			select {
 			case <-ctx.Done():
 			case ch <- response:
 			}
-			return err
+			return nil
 		})
 		if err != nil {
 			ch <- logql.Resp{
@@ -192,15 +193,15 @@ func (in instance) For(
 		close(ch)
 	}()
 
+	var err error
 	for resp := range ch {
-		if resp.Err != nil {
-			return nil, resp.Err
-		}
-		if err := acc.Accumulate(ctx, resp.Res, resp.I); err != nil {
-			return nil, err
+		if err != nil || resp.Err != nil {
+			err = resp.Err
+			continue
 		}
+		err = acc.Accumulate(ctx, resp.Res, resp.I)
 	}
-	return acc.Result(), nil
+	return acc.Result(), err
 }
 
 // convert to matrix

@yincongcyincong
Copy link
Contributor Author

yincongcyincong commented Jan 9, 2025

@cyriltovena @chaudum thanks for your review. i modified these code.
but i think this part has something wrong. when acc.Accumulate have an error and resp.Err is nil, err will replace by nil
image

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chaudum
Copy link
Contributor

chaudum commented Jan 9, 2025

@cyriltovena @chaudum thanks for your review. i modified these code. but i think this part has something wrong. when acc.Accumulate have an error and resp.Err is nil, err will replace by nil image

You're right. Both errors need to be checked separately.

Copy link
Contributor

@chaudum chaudum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT. Thanks!

@chaudum chaudum merged commit 5f476a3 into grafana:main Jan 9, 2025
59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants