-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EXPORTER] Support handling retry-able errors for OTLP/gRPC #3219
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for opentelemetry-cpp-api-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3219 +/- ##
==========================================
+ Coverage 88.16% 88.21% +0.06%
==========================================
Files 198 198
Lines 6224 6259 +35
==========================================
+ Hits 5487 5521 +34
- Misses 737 738 +1
|
dd0c9db
to
c4ee1dd
Compare
{ | ||
TestTraceService(std::vector<grpc::StatusCode> status_codes) : status_codes_(status_codes) {} | ||
|
||
inline grpc::Status Export( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not seem possible, at least not in a truthful manner, to test this by mocking Export()
on the client side.
Instead, the real exporter is used for this test and the client retry behavior is directly observed on the server side.
@@ -317,7 +317,7 @@ if(BUILD_TESTING) | |||
add_executable(otlp_grpc_exporter_test test/otlp_grpc_exporter_test.cc) | |||
target_link_libraries( | |||
otlp_grpc_exporter_test ${GTEST_BOTH_LIBRARIES} ${CMAKE_THREAD_LIBS_INIT} | |||
${GMOCK_LIB} opentelemetry_exporter_otlp_grpc) | |||
${GMOCK_LIB} opentelemetry_exporter_otlp_grpc gRPC::grpc++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required for retryable integration tests using real grpc server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can only link grpc directly when grpc is built as a dynamic library.Or there will be conflict in some environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was basing it on the fact that opentelemetry_exporter_otlp_grpc_client links against the lib higher in this cmake file.
Should this be made optional or guarded somehow? What would be the right approach?
Update: I tried linking against a static grpc++ lib and it still appears to work. Is there an easy way to reproduce this problem you describe?
@@ -357,6 +363,205 @@ TEST_F(OtlpGrpcExporterTestPeer, ConfigUnknownInsecureFromEnv) | |||
} | |||
# endif | |||
|
|||
# ifndef NO_GETENV | |||
TEST_F(OtlpGrpcExporterTestPeer, ConfigRetryDefaultValues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults matching what is currently implemented in other flavors of OTel (verified dotnet, java, and js). I don't believe rust has a retry policy in place, though.
c4ee1dd
to
651c1da
Compare
dd7ec38
to
7d422cd
Compare
std::uint32_t retry_policy_max_attempts{}; | ||
|
||
/** The initial backoff delay between retry attempts, random between (0, initial_backoff). */ | ||
float retry_policy_initial_backoff{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use std::chrono::duration<>
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that as a chrono duration initially, but it was not really of any use for otlp/grpc since it just gets passed down to the service config, so it was moved to otlp/http, where it is being required to perform some computations for the backoff.
FYI, implementation in previous commit was like this: cb14857
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding, the exporting of both otlp/http and otlp/grpc will cost much more CPU than type conversion here. I think it's more important to make it clear what this parameters means(We don't know the meaning and the unit of this variable by just the name and comments here), and also float number has EPS and is more imprecise.
What do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, precision is probably very subjective since examples of normal use cases are limited to a single decimal place (and this is how it is formatted here before passing the config settings to grpc library), which seems logical given that measuring backoff in tens of milliseconds or lower is probably a very niche requirement.
I think there is some truth in that chrono duration makes the type more descriptive. Part of the reasoning I went back to float was because I could not find a common place where I could alias this to a more descriptive name without having to repeat it in at least one more header file (for instance, otlp_environment.h and http_client.h).
For now, I will revert/update this in #3223 until it is approved/merged to avoid duplicating all these work in progress changes for common code bits...
@@ -80,6 +80,34 @@ static bool GetStringDualEnvVar(const char *signal_name, | |||
return exists; | |||
} | |||
|
|||
static std::uint32_t GetUintEnvVarOrDefault(opentelemetry::nostd::string_view signal_env, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes in this file are also present in #3223
Planning to review and merge the OTLP HTTP PR first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes from the above PR were backported here, as well.
The only minor annoyance with the current use of the WITH_OTLP_RETRY_PREVIEW flag on CI is that none of the maintainer builds have WITH_OTLP_GRPC enabled, so the new code would not be unit tested.
Would it be reasonable also enabling this new feature flag, for instance with "cmake.exporter.otprotocol.test", in order to have at least one build that exercises this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks.
More comments to follow.
# include "opentelemetry/exporters/otlp/protobuf_include_prefix.h" | ||
# include "opentelemetry/nostd/shared_ptr.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move nostd/shared_ptr.h outside of the protobuf include prefix + suffix block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never bothered to notice that these includes carried special meaning. Good to know!
64b875c
to
0a8bc82
Compare
6a92aa0
to
7f3d420
Compare
7f3d420
to
8677f89
Compare
Fixes #2049
Changes
This change introduces a retry policy in OTLP/gRPC exporter for select failures via the gRPC service config mechanism
The changes to support retries for OTLP/HTTP exporter are addressed in #3223
For significant contributions please make sure you have completed the following items:
CHANGELOG.md
updated for non-trivial changes