-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: Enable pprof server on demand (#16594) #17068
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17068 +/- ##
==========================================
+ Coverage 50.70% 52.68% +1.98%
==========================================
Files 316 316
Lines 43479 43656 +177
==========================================
+ Hits 22046 23001 +955
+ Misses 18920 18102 -818
- Partials 2513 2553 +40 ☔ View full report in Codecov by Sentry. |
I'm totally sure that this requires at least a note in documentation, but I'm not totally sure about where I should write down the info. EDIT: Is this commit enough? |
@JorTurFer IMO there should be a new page under |
@JorTurFer I converted this to a draft until the docs are ready. |
yeah! Sorry, I want to finish this during the week but I forgot to make it a draft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest putting this behind an env var flag, as we have in Workflows. That way it's not running profilers (that use CPU and memory themselves and reveal debug info) by default, as the vast majority of usage would not need it
Thanks for the feedback! The profiler isn't enabled as default, you have to set a command arg to enable it. As default this line is false: https://github.com/argoproj/argo-cd/pull/17068/files#diff-d9546c5501e320265b2650f92869618fd0c75f9b2caab740461b0b44223ad9c9R21 Using command arg or env isn't a problem, I can switch to (true/false) env to be aligned, but in that case the port has to be hardcoded or provided from other source. That's why to avoid 2 parameters I took the same approach as controller-runtime took: kubernetes-sigs/controller-runtime@482ed05#diff-d500fbd6a2aa620607ca5e2a7c3ac4f1a4c82309d1a549561e92abfcb18f2f0eR594-R604 Given that, I want to work on the pending doc this weekend, so it'd be nice if we decide the enabling approach xD |
Oh. That seems like an unintuitive way of handling it. It also does still create and run a
Surprised to see With an env var you could do something similar, i.e. |
mux.HandleFunc("/debug/pprof/", pprof.Index) | ||
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline) | ||
mux.HandleFunc("/debug/pprof/profile", pprof.Profile) | ||
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) | ||
mux.HandleFunc("/debug/pprof/trace", pprof.Trace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to remove this other code that added handlers previously: #7533
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, I didn't realize that the code was there 🤦
but, does ArgoCD already have support for enabling the profiler using vars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, maybe I've added a feature that was already supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I was just responding to someone on Slack about this and found the old PR and your PR and issue. Complete coincidence that I stumbled upon this, especially as I primarily contribute to Workflows.
but, does ArgoCD already have support for enabling the profiler using vars?
it seems to be unconditionally enabled on the metrics port(?) if I'm reading correctly. that would be a security issue if so (albeit a minor one, as it's just debug info of an OSS project), and a perf issue too.
also that means the port isn't customizable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okey, we will replace that with my current approach, right? I mean, I have to remove the old code and just keep my PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the title in particular is used in the changelog and the prefix is typically used for cherry-picking (although CD has a bit different process for that). This might be a fix!
since it's technically breaking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the !
required too in the title?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In conventional commits, the !
is an indication of a breaking change. This does remove the previously default enabled access to the pprof
endpoints and so is technically breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the previous support for pprof
actually only added it to the Application Controller and Server (some of the components may not have existed at the time; I think only support for the Repo Server was missing at the time of #7533), so this PR also adds it to the rest... which is technically a feature... 😖
So maybe this makes sense as a feat!:
that is still backported due to the security issue? I'll let the CD Maintainers decide how to handle that 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem , feat!:
! xD
That's true xD
Yeah, for sure. I'm happy changing to |
yea I could also change Workflows to support a port number as well. now that I'm looking at it in-detail, would be a good time |
Could probably put this just after the "Metrics" page in order so that they're right next to each other as they're related |
Hi! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All docs comments below
Wow, thanks for the in depth review! ❤️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a short-hand for log.Error
followed by os.Exit(1)
-- log.Fatal
Otherwise LGTM other than the PR title and description per in-line comments.
Will pass off to CD folks
cmd/argocd-application-controller/commands/argocd_application_controller.go
Outdated
Show resolved
Hide resolved
cmd/argocd-application-controller/commands/argocd_application_controller.go
Outdated
Show resolved
Hide resolved
cmd/argocd-applicationset-controller/commands/applicationset_controller.go
Outdated
Show resolved
Hide resolved
cmd/argocd-applicationset-controller/commands/applicationset_controller.go
Outdated
Show resolved
Hide resolved
cmd/argocd-application-controller/commands/argocd_application_controller.go
Outdated
Show resolved
Hide resolved
mux.HandleFunc("/debug/pprof/", pprof.Index) | ||
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline) | ||
mux.HandleFunc("/debug/pprof/profile", pprof.Profile) | ||
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) | ||
mux.HandleFunc("/debug/pprof/trace", pprof.Trace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the title in particular is used in the changelog and the prefix is typically used for cherry-picking (although CD has a bit different process for that). This might be a fix!
since it's technically breaking?
Thanks a lot for the guidance! |
docs/operator-manual/profiling.md
Outdated
`pprof` has two modes: interactive and non-interactive. Non-interactive mode generates profiling data for future analysis. Interactive mode launches a web server to visualize the profiling data in real-time. | ||
|
||
!!! Note | ||
In the below commands, replace http://localhost:6060 with the appropriate URL of your Argo component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized, we might want to recommend that users port-forward the pprof
port to their localhost
, since the port should generally not be accessible via Ingress (unlike the UI, for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used localhost to implicitly suggest the port-forward, but you are right and in this case, more explicit is better as exposing this server could be risky
So, if the vendor will explain how to use |
Because As I mentioned, both metrics and the pprof endpoint are not accessible outside of the cluster. If we agree that hosting pprof handlers on the metrics port constitutes a CVE, then does it mean that an unprotected metrics port is a CVE as well? If yes, how do we fix it?
I literally asked one question.
Thank you for advice. I would advice to keep conversation friendly and constructive. |
@JorTurFer I'll add you to the GHSA so we can discuss details without risking revealing anything sensitive. Others on this thread are already on the GHSA. |
I mean, you could say the same thing about the entirety of Argo CD's configuration (or any configuration at all, for that matter). Many of those toggles are used significantly more frequently than I'll repeat again, that we did literally create the docs in this PR and that this was undocumented before. Both the author and I literally arrived here because we had no idea about this insecure default. We are doing that very communication and explanation in this PR, and improving the maintenance with a security patch.
Once more, the metrics should be authenticated -- and therefore not "unprotected" -- and the metrics are not debug-level information.
You rejected the PR on behalf of all Approvers. You are also a Lead on CD, your comments and actions have more authority than any other person -- please recognize your own position. It's one thing to ask a question in a comment, it is a different thing to reject a security patch outright, suggest it is not a vulnerability at all, and superimpose your opinion, one in which you repeatedly suggested poor security practices ("secure enough"), made incorrect statements ("no one can access it"), and asked to publicly reveal what else might be possible. The amount of security red flags in your statements is quite high.
A constructive discussion also involves engaging with all the material. You rejected this outright in a few short sentences. My disclosure is substantially longer than that, and I provided some details here as well. You did not address the CWE, related CVEs, nor Workflows's own patch what-so-ever. You had several opportunities to do so, and you did not engage with the material, particularly not before rejecting the patch. |
@agilgur5 I've asked for clarification: I'm just trying to understand how exposing the
I disagree with these statements. I believe I've just joined a conversation and asked a question. I must do it as a maintainer. So many personal accusations make it difficult to focus on the point of this PR. I'm proposing to keep the discussion technical. It would really help to keep moving this change forward. |
I strongly suggest moving this conversation to the GHSA to prevent sharing sensitive information or projecting a wrong message |
Can we please keep the discussion technical, and not make any personal attacks here? |
@agilgur5 I think you are confusing a change request with rejection. There is no point in continuing this discussion but fyi a rejection is outright closing the PR which none of the maintainers did. This was a very civil discussion where maintainers of the project put out their opinions(which they are very much entitled to) keeping the best in mind for the project and the users. You might have a differing opinion but there's no justifying your rude behaviour.
That's because all maintainers are part of the approvers github group, my review comment also had the same tag. It does not represent the opinions of all the maintainers of the project neither does it mean a PR is rejected. |
I've never worked in any organization where a code review rejection was not addressed -- they have always been considered blocking denials. In this case two of them said to intentionally make something insecure, which is the most important part of this PR. To be clear, I was appalled, horrified to see that not once, but twice with increasing authority and even more insecure suggestions. The ramifications of that are deeply, deeply concerning to me, so much so that I have been reconsidering contributing to Argo entirely. The security culture is critical to any organization -- it is a pre-requisite for patches to land and land timely. Insecure statements from any position of authority reverberate through an organization and its followers/users. I have an ethical obligation to correct and limit that effect as soon as possible, which is why this has been at the top of my priorities for days (despite this not being a very high risk vuln IMO -- I explained the difference between severity and risk in the disclosure, effectively CVSS v3 vs v4). I am ethically obligated to defend the security of any system I am involved with, as much as humanly possible. In any organization I've ever worked in, one would make those changes or the PR would indeed eventually be closed.
I don't think I was rude. I do think my tone was harsh & forceful, no doubt, I do not disagree. That very much reflects my aghast, shocked reaction plus my ethical obligation. Notably, this does very much feel like tone policing, which is a form of ad hominem attack. Ad hominem attacks are, notably, exclusively non-technical, by definition. Certainly had I known that would be the case beforehand, I may have approached it differently, but that was so unexpected for me at the time that I began panicking ("sh*tting my pants" in American idioms) at the ramifications of such statements. Security statements are of grave importance, particularly when made in public in a position of authority, and should not be made lightly. Further, it very much seemed that the supporting material I provided was not engaged with (whereas it was by SIG Security), which felt disrespectful and incredibly dismissive to me and those authors. If you want to bring in "rudeness", I would say such dismissal is not only very rude but also the first instance of such "rudeness" in this thread, and could very well be considered a violation of the CoC. The material is all exclusively technical, yet was not even addressed here.
I recognized that you did not have access to the disclosure and may not be involved in security matters as such. I was still forceful in my reaction to your comment, per my ethical obligation, but not nearly as much as later since I recognized you did not have context and may not have in-depth experience with these matters. I was intensely more aghast when someone with higher authority who had seen the disclosure (commented there before here) doubled down on it and proceeded to make other insecure statements as well. That made me go from pretty concerned to downright horrified panicking. |
@agilgur5 This is the way GitHub works since Alex is part of the approvers team. When anyone from the approvers team requests changes on the PR, GitHub adds "requested changes on behalf of argocd-approvers" automatically.
I think you are misinterpreting this. "Requesting changes" is not the same thing as "rejecting". I think Alex's intention is simply to ask more questions and gather feedback to reach agreement before approving and merging by other approvers. Let's keep this discussion technical. |
@JorTurFer this looks very promising. Could you please add the capability to enable block and mutex profiling? By setting some envs maybe? I'd say this makes sense as there are lot of locking operations in argocd and it's services. See: https://github.com/DataDog/go-profiler-notes/blob/main/guide%2FREADME.md |
I'll check it. This week I'm quite disconnected due to a business trip but I'll tackle this asap and update the PR with them (or a reason for not adding them if it's not possible xD) |
- merge conflicts fixed by agilgur5 Signed-off-by: Jorge Turrado <[email protected]>
- merge conflicts fixed by agilgur5 Signed-off-by: Jorge Turrado <[email protected]>
- merge conflicts fixed by agilgur5 Signed-off-by: Jorge Turrado <[email protected]>
- merge conflicts fixed by agilgur5 Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Co-authored-by: Anton Gilgur <[email protected]> Signed-off-by: Jorge Turrado Ferrero <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Co-authored-by: Anton Gilgur <[email protected]> Signed-off-by: Jorge Turrado Ferrero <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Co-authored-by: Anton Gilgur <[email protected]> Signed-off-by: Jorge Turrado Ferrero <[email protected]>
@JorTurFer what happened to this issue? I understand there previously was some heated discussion.. |
Hello |
UPDATE: pprof server was enabled as default ALWAYS, this is a security risk so this PR has removed the default always on integration in favor of an on demand integration.
In order to profile current ArgoCD components, users have to modify the code and deploy their own image into their clusters. Although this works perfectly, it's not really user-friendly.
controller-runtime
added sometime ago a native support for pprof via registering a http server on demand to enable and expose pprof. There are some blockers to integrate that solution (as I shared in the issue), so I've created a pkg that can be used in all ArgoCD components to support this.Fixes #16594
Checklist: