Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ improve logging: catalog http server, op-con resolver #1564

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joelanford
Copy link
Member

Description

This adds more logging to help administrators troubleshoot issues resolving bundles from catalogs. Specifically, it:

  • Adds logging to the catalogd http server, so that all requests are logged with useful metadata about the request and response
  • Adds logging in the resolution code path to:
    • report with catalogs are included in the resolution
    • report whether the desired package is found in each catalog
    • report the total count of bundles and the matched count of bundles in each catalog

This should help administrators narrow down which stage of resolution should be investigated when unexpected resolution errors occur.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@joelanford joelanford requested a review from a team as a code owner January 10, 2025 02:39
Copy link

netlify bot commented Jan 10, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 67e6a22
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/6786ac99fd224c000867af17
😎 Deploy Preview https://deploy-preview-1564--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 53.22581% with 29 lines in your changes missing coverage. Please review.

Project coverage is 66.52%. Comparing base (da28803) to head (67e6a22).

Files with missing lines Patch % Lines
catalogd/internal/serverutil/serverutil.go 0.00% 26 Missing ⚠️
internal/resolve/catalog.go 91.66% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1564      +/-   ##
==========================================
- Coverage   66.68%   66.52%   -0.17%     
==========================================
  Files          57       57              
  Lines        4584     4645      +61     
==========================================
+ Hits         3057     3090      +33     
- Misses       1302     1329      +27     
- Partials      225      226       +1     
Flag Coverage Δ
e2e 53.12% <91.66%> (+0.48%) ⬆️
unit 53.62% <53.22%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

camilamacedo86
camilamacedo86 previously approved these changes Jan 10, 2025
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2025
catalogd/internal/serverutil/serverutil.go Show resolved Hide resolved
config/samples/olm_v1_clusterextension.yaml Outdated Show resolved Hide resolved
@@ -65,6 +66,15 @@ func (r *CatalogResolver) Resolve(ctx context.Context, ext *ocv1.ClusterExtensio
}
}

type catStat struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about extracting this to package-level and making attributes private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started out with private attributes, but then when these are eventually passed into l.Info(), logr only outputs the exported attributes. We could write more code to specifically log each unexported field, but that would be brittle if we ever add new fields.

Not highly opposed to extracting to package level, but I didn't see any use case where this would be used outside the function, so I figured the more local, the better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, though it's actually the zap backend sink that controller-runtime/log uses which requires public fields - see go-logr/logr#142

Regarding the declaration, sure it can be extracted in the future if there's a need. Personally I just find it more readable with type declarations done on the outside and functions/methods kept smaller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought I had switched everything over to the klog backend a month or so ago.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also ensure that the log output is lowercased, which I am not sure happens right now.

With textlogger it can be done by making the catStat implement slog.LogValuer for example, which would also solve the problem of private attributes - ie. we would be free to make them private.

Alternatively, adding tags with lowercase keys to the struct would also work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the log line, the structured log keys are lowercase. However, the catStat value string contains field names that match the casing of the struct fields (hence they are currently capitalized).

I don't personally think we need to downcase the first letter of those field names, but I'm also not really against it. I'll generally agree that the logs would look more consistent if we did.

I'll take a look at what it would take to do that. It might just be json tags.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2025
@camilamacedo86
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2025
Comment on lines +92 to +94
if isFBCEmpty(packageFBC) {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early exit did not exist before. Are we worried about a change in behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, if len(packageFBC.Bundles) was 0, we would evaluate/run the predicates against a 0-length slice, which would return a 0-length slice, and then we'd return nil in line 103 (old) / line 124 (new)

So I don't think there's a behavior change here other than skipping some unnecessary predicate setup.

@@ -158,6 +179,7 @@ func (r *CatalogResolver) Resolve(ctx context.Context, ext *ocv1.ClusterExtensio

// Check for ambiguity
if len(resolvedBundles) != 1 {
l.Info("resolution failed", "stats", catStats)
Copy link
Contributor

@tmshort tmshort Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this before all error returns from this point forward?

Copy link
Member Author

@joelanford joelanford Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. I felt like those other returns include context in the error message that would make it possible to figure out what happened, so that's why I only included it here.

Copy link

openshift-ci bot commented Jan 14, 2025

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants