-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS Connection Implementation #23282
Conversation
0226f3f
to
2121c83
Compare
9b4f8fa
to
8734297
Compare
#[clap(long, env = "AWS_EXTERNAL_CONNECTION_ROLE")] | ||
aws_external_connection_role: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jubrad Using AWS_EXTERNAL_CONNECTION_ROLE
to pass the new role to assume. Lmk if you would prefer a different name.
MitigationsCompleting required mitigations increases Resilience Coverage.
Bug Hotspots:
|
4631ad9
to
c609e09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapter/SQL changes LGTM.
c609e09
to
a5de426
Compare
Gonna take a look at this by EOD! |
I'd like to standardize on calling these "AWS connections" everywhere possible, without bringing "external" into the picture. "External" came from "external ID", but that's a standalone term of art, and I think it's clearer to call the role here just the "AWS connection role". (If anything, the "external connection role" seems like the customer supplied role, not our internal intermediary role.) Also unplumb the argument from the coordinator, since as mentioned in the review for MaterializeInc#23282 I think we'll want to instead plumb this argument through the ConnectionContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed up a big batch of changes. This looks nearly mergeable. Outstanding blockers:
- Fixing the SQL option parsing enum
- Manual testing on staging
Other nice to haves:
- Localstack tests for IAM (mis)configurations
- Reference documentation for
mz_aws_connections
Datum::from(assume_role_session_name), | ||
Datum::from(principal), | ||
Datum::from(external_id.as_deref()), | ||
Datum::from(example_trust_policy.as_ref().map(|p| p.into_element())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guswynn I golfed this a little more to get rid of the holder entirely.
src/sql/src/plan/error.rs
Outdated
@@ -231,6 +232,23 @@ pub enum PlanError { | |||
// TODO(benesch): eventually all errors should be structured. | |||
Unstructured(String), | |||
} | |||
#[derive(Clone, Debug)] | |||
pub enum ConnectionParsingError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a separate type? We don't have any other example of splitting out PlanError
within this crate.
src/sql/src/plan/error.rs
Outdated
Self::ConflictingOptions => f.write_str("invalid CONNECTION: ASSUME ROLE ARN cannot be provided simultaneously with ACCESS KEY ID and SECRET ACCESS KEY"), | ||
Self::MissingRequiredOptions => f.write_str("invalid CONNECTION: must specify either ASSUME ROLE ARN or both ACCESS KEY ID and SECRET ACCESS KEY"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two options have AWS specific error messages but the variants don't indicate that they are AWS specific.
@@ -28,9 +28,11 @@ | |||
NAMESPACE = ENVIRONMENT_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this composition to "aws" to reflect its new purpose as a general test of AWS-related functionality.
} | ||
} | ||
|
||
Ok(()) | ||
} | ||
|
||
pub(crate) fn validate_by_default(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I substantially refactored the code in this file to have more comments, produce better error messages (e.g., there were several instances of "error!"
with no additional context in user-facing error messages), and to reduce duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this introduces a new DDL syntax, it needs a misc/python/materialize/checks/all_checks/* test, to guard against Mz restarts and upgrades . Consider taking an existing test on secrets from that directory and cloning it a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this introduces a new DDL syntax, it needs a misc/python/materialize/checks/all_checks/* test, to guard against Mz restarts and upgrades . Consider taking an existing test on secrets from that directory and cloning it a bit.
4452da2
to
e1f493a
Compare
pushing a not-quite-finished commit to test something |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guswynn I can't figure out why the AWS SDK isn't automatically figuring out the region? Do you have a reference for that?
It looks like maybe something got fixed in the latest version of the AWS SDK around regions and AssumeRole
? Eyeballing smithy-lang/smithy-rs#3014 and it seems like it could definitely help us.
@@ -231,6 +241,9 @@ impl AwsAssumeRole { | |||
if let Some(external_id) = external_id { | |||
credentials = credentials.external_id(external_id); | |||
} | |||
if let Some(region) = region { | |||
credentials = credentials.region(region); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And especially spooked that we need to override the region in the AssumeRole builder. This feels like something is wrong!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the first one on line 226 is required, but I agree that its cursed (perhaps fixed by the pr you linked above?)
I followed https://github.com/MaterializeInc/materialize/blob/main/src/persist/src/s3.rs#L134-L137, and based my understanding on seeing this: https://docs.rs/aws-config/0.55.1/src/aws_config/lib.rs.html#563-567
Its possible the second AssumeRoleProdider
doesn't need it, do you want me to try that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed a commit removing this second one to test this, after the build finishes ill test in staging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rip, its required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't bump the AWS SDK version did you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benesch I didn't, because the next bump is from 0.x to 1.x and I don't want to block this pr on that change
@@ -341,14 +358,18 @@ impl AwsConnection { | |||
// case of failure. | |||
let _ = sts_client.get_caller_identity().send().await?; | |||
|
|||
let region = match &self.region { | |||
Some(region_name) => Some(Region::new(region_name.clone())), | |||
None => region::default_provider().region().await, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit spooked that this was necessary
Changes to accept either credentials or assume role proto implement validate fix tests role chaining Fix validate connection Added table aws_connections cloud test Adding more columns to mz_aws_connections from updated design temp refactor Fix the access_key_id_secret_id Fixing the assume role validation Adding testdrive tests Added TODO Getting materialize external connection role from environmentd Added tests for failure scenarios Formatting docs python lint update design doc Fixing proto lint Adding #[serde(skip)] to aws_principal_context in catalog state Adding Deserialize trait update comment Fix tests
Co-authored-by: Nikhil Benesch <[email protected]> Removing mz_principal
revert Refactor aws connection Moving tests to secret-aws-secret-manager specifying column names Refactor parsing errors clippy fix Added TODOs to restructure errors revert not required changes fix test
f1605f7
to
efc1fd5
Compare
@philip-stoev pushed a commit with a basic check for these connections and their syntax |
I didn't get a chance to do the AWS SDK region upgrade as I hoped, so I'm good with proceeding with this as is for now. |
748e730
to
f21813e
Compare
f21813e
to
1467fae
Compare
1467fae
to
587223b
Compare
Thanks for all the efforts and getting this merged 🙏, this would have been deep in merge conflict land otherwise. |
Motivation
Implementation for https://github.com/MaterializeInc/database-issues/issues/6945
Depends upon https://github.com/MaterializeInc/cloud/pull/8224
Tips for reviewer
Comments inline.
Split out a PR to just accept the new environmentd cli arg #23626, in case the current PR is delayed. Will rebase once that's merged.
This PR is making backwards incompatible changes to the
CREATE CONNECTION
SQL for AWS connection. This has been put behind a feature flag and nobody's using it yet, so it should be fine.Currently changes are still required on the cloud's side to pass a new AWS role arn to environmentd https://github.com/MaterializeInc/cloud/pull/8224. I am trying to get the materialize changes in before I go on a vacation next week.
I will create a separate docs PR when this goes to private preview.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.cc @benesch @jubrad @alex-hunt-materialize @hlburak