RFD 0193 - Stable UNIX user UIDs #50414
Overall looks good, though could you add some of the missing sections (Security, Backwards Compatibility, etc.) from the RFD template and expand on them a bit?
rfd/0193-stable-unix-user.md
## Goal
After the appropriate setting is enabled in the control plane, all compliant (i.e. up to date) Teleport SSH nodes will query the control plane to know which UID to use when attempting to provision a host user if a Teleport user logs into the machine over SSH with a username that doesn't currently exist as a host user on the machine, with a roleset that allows for host user creation in "keep" mode for the specified machine and username, and with no `host_user_uid` trait. Using the returned UID will be a requirement for both the user and for its primary group: if another host user has the same UID, or a group has the same GID - and thus user creation will fail - the login will also fail. If the host user already exists on the machine, just like the current behavior, the login will just proceed with the existing user. The `host_user_uid` trait, if set, will take priority over the stable UID.
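The precedence described in the Goal paragraph can be sketched as a small decision function. This is only an illustration: `loginInfo`, `uidSource`, and `chooseUIDSource` are hypothetical names that do not come from the RFD or from the Teleport codebase, and the sketch assumes host user creation is already permitted for the login.

```go
package main

import "fmt"

// uidSource names where the UID of a host user comes from at login.
// All identifiers in this sketch are hypothetical.
type uidSource string

const (
	sourceExistingUser  uidSource = "existing host user"  // nothing is provisioned
	sourceTrait         uidSource = "host_user_uid trait" // takes priority over the stable UID
	sourceStableUID     uidSource = "stable UID"          // queried from the control plane
	sourceSystemDefault uidSource = "system default"      // host picks a UID itself
)

// loginInfo summarizes what the SSH node is assumed to know at login time.
type loginInfo struct {
	HostUserExists   bool   // username is already a host user on the machine
	KeepMode         bool   // roleset allows host user creation in "keep" mode
	HostUserUIDTrait string // value of the host_user_uid trait, empty if unset
	StableUIDEnabled bool   // feature is enabled in the control plane
}

// chooseUIDSource applies the precedence from the Goal paragraph:
// existing user first, then the trait, then the stable UID.
func chooseUIDSource(l loginInfo) uidSource {
	switch {
	case l.HostUserExists:
		return sourceExistingUser
	case l.HostUserUIDTrait != "":
		return sourceTrait
	case l.KeepMode && l.StableUIDEnabled:
		return sourceStableUID
	default:
		return sourceSystemDefault
	}
}

func main() {
	fmt.Println(chooseUIDSource(loginInfo{KeepMode: true, StableUIDEnabled: true}))
	fmt.Println(chooseUIDSource(loginInfo{KeepMode: true, StableUIDEnabled: true, HostUserUIDTrait: "1234"}))
	fmt.Println(chooseUIDSource(loginInfo{HostUserExists: true, StableUIDEnabled: true}))
}
```

Note that the sketch only chooses the source of the UID. Per the paragraph above, a UID or GID conflict during creation with the stable UID fails the login rather than falling back to another source.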
> if another host user has the same UID, or a group has the same GID - and thus user creation will fail - the login will also fail

What is the means of resolving this state? Should there be an event or notification emitted to alert admins that manual intervention is needed?
I'm honestly not sure of the extent of how detailed we can make our errors through SSH, but surely the user that can't log in will ask someone for help?
At minimum we should make sure that the ssh service logs are very clear about the problem to reduce support load
I think it's probably out of scope for this RFD, but if we had a reasonable way to surface non-fatal errors after the session is established it would probably be better to inform them that they're running an unexpected UID for their login without breaking the connection. Maybe that's something we could revisit down the line
Co-authored-by: rosstimothy <[email protected]>
Let's add a note here that this only applies to users provisioned in KEEP mode. What kind of interplay does this have with static_host_users? What would happen if there is not a name conflict, but there is a UID conflict? Should we prevent the static_host_user resource from being created?
It was already there, actually 😅
It's technically possible for static host users to apply to hosts for which there is no dynamic autoprovisioning, and it's already possible to have a UID conflict between autoprovisioned users and static users (via user traits), so I don't think we should add checks for that.
```yaml
last_uid: 7019999
```
Teleport SSH servers will check the `enabled` field to know if the feature is enabled, and - if so - they will query the auth server for the UID to use through a new rpc whenever they need to provision a new host user in "keep" mode with no otherwise defined UID. In the initial implementation, provisioned host groups other than the primary group will be generated according to the default system behavior.
Will this information be dictated to the agent by the control plane once the PDP work is implemented?
The stapled access decision info coming from the control plane to the SSH server is still very much to be determined, but we could pass along a stable UID given some conditions (the user having provisioning enabled on the server, a single username allowed to log in, the feature being enabled and the unix username already having an associated stable UID), and let the server use the RPC defined here to fetch a stable UID if it's missing from the stapled info. In general we shouldn't preemptively define a stable UID for every username, and the control plane can't know if the server already has the username registered in the local user directory.
@eriktate friendly ping
LGTM! This should integrate pretty seamlessly with host user creation since we already support UID overrides via traits 🤞
The cluster state storage will contain a bidirectional mapping of usernames and UIDs, consisting of two items per username: one keyed by username at `/stable_unix_user/by_username/<hex username>` containing the UID, and one keyed by UID at `/stable_unix_user/by_uid/<encoded uid>` containing the username. The UID encoding consists of subtracting the UID from the maximum signed 32-bit integer (`0x7fff_ffff`), then writing the result in big-endian hexadecimal. This allows for reading the occupied UIDs in large-to-small order through a backend range read in ascending order (since we currently can't scan the backend backwards).
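As a rough illustration of the two key shapes, the following sketch builds both keys for a username and a UID. The helper names `usernameKey` and `uidKey` are made up for this example, and the fixed width of eight lowercase hex digits is an assumption (a fixed width is what makes string order match numeric order); the RFD text only specifies "hexadecimal big-endian".

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// maxUID is the maximum signed 32-bit integer used by the encoding.
const maxUID = 0x7fff_ffff

// usernameKey returns the by_username key; the username is hex-encoded.
func usernameKey(username string) string {
	return "/stable_unix_user/by_username/" + hex.EncodeToString([]byte(username))
}

// uidKey returns the by_uid key. The UID is subtracted from maxUID so that
// larger UIDs produce lexicographically smaller keys. Zero-padding to eight
// digits is an assumption of this sketch. uid must be in [0, maxUID].
func uidKey(uid int32) string {
	return fmt.Sprintf("/stable_unix_user/by_uid/%08x", uint32(maxUID-uid))
}

func main() {
	fmt.Println(usernameKey("alice"))
	fmt.Println(uidKey(7019999))
	// A larger UID sorts before a smaller one in ascending key order.
	fmt.Println(uidKey(2000) < uidKey(1000))
}
```

With this encoding an ascending range read over `by_uid` visits UIDs from largest to smallest, which is the property the allocation step relies on.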
If reading `by_username/<hex username>` succeeds, the UID stored in it will be returned (even if outside of the currently defined UID range); otherwise, the next available free UID is searched by issuing a range read with size one from the end to the beginning of the configured range. Such a read will return the biggest UID in use in the range - or nothing, if no UIDs are in range - letting us pick a free UID. Note that this, at least in the initial implementation, will only allow for using the contiguous range of unused UIDs at the end of the configured range. Changes in the UID range configuration might not result in actually using the entire range. This limitation stems from ease of implementation and might be lifted in the future, if such a need arises (as a workaround, the available range can be shrunk to precede any allocated contiguous range of UIDs).
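The search for a free UID might look roughly like the sketch below, which replaces the real backend with an in-memory sorted slice. Everything here is hypothetical (`fakeBackend`, `firstInRange`, `nextFreeUID`, the example range starting at 7010000), and picking "largest UID in range plus one" is an inference from the paragraph above, not a quote from the RFD.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strconv"
)

const maxUID = 0x7fff_ffff

// encodeUID mirrors the by_uid encoding (fixed width is an assumption).
func encodeUID(uid int32) string { return fmt.Sprintf("%08x", uint32(maxUID-uid)) }

// fakeBackend stands in for the cluster state storage: it only holds the
// encoded by_uid keys, kept sorted in ascending order.
type fakeBackend struct{ keys []string }

func (b *fakeBackend) insert(uid int32) {
	k := encodeUID(uid)
	i := sort.SearchStrings(b.keys, k)
	b.keys = append(b.keys, "")
	copy(b.keys[i+1:], b.keys[i:])
	b.keys[i] = k
}

// firstInRange mimics an ascending range read with a limit of one item.
func (b *fakeBackend) firstInRange(start, end string) (string, bool) {
	i := sort.SearchStrings(b.keys, start)
	if i < len(b.keys) && b.keys[i] <= end {
		return b.keys[i], true
	}
	return "", false
}

var errRangeExhausted = errors.New("no free UID at the end of the configured range")

// nextFreeUID returns the UID right after the largest one in use within
// [firstUID, lastUID], or firstUID if nothing in the range is in use.
func nextFreeUID(b *fakeBackend, firstUID, lastUID int32) (int32, error) {
	// Because of the reversed encoding, lastUID has the smallest key.
	key, ok := b.firstInRange(encodeUID(lastUID), encodeUID(firstUID))
	if !ok {
		return firstUID, nil
	}
	enc, err := strconv.ParseUint(key, 16, 32)
	if err != nil {
		return 0, err
	}
	largest := maxUID - int32(enc)
	if largest >= lastUID {
		return 0, errRangeExhausted
	}
	return largest + 1, nil
}

func main() {
	b := &fakeBackend{}
	b.insert(7010000)
	b.insert(7010001)
	b.insert(8000000) // outside the range, ignored by the range read
	fmt.Println(nextFreeUID(b, 7010000, 7019999))
}
```

The sketch also shows the limitation mentioned above: only the unused tail of the range is ever considered, so gaps below the largest allocated UID are never reused.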
I don't fully understand how these values are stored in the backend yet, but wouldn't the range read operation described here generally just return the first item in `/stable_unix_user/by_uid`? Since it seems like we would be scanning in ascending order and we aren't allowing deletions?
The keys in by_uid are stored backwards precisely to support reading the largest uid in range with a backend range get (which always reads in ascending order).
* RFD 0193 - Stable UNIX user UIDs
* Compatibility, security and observability
* Define the required approvers

Co-authored-by: rosstimothy <[email protected]>

* List rpc and changes to the allocation strategy
* clarify keep mode and interplay with static users
* Advise against UID overlaps
* rfd spellchecker appeasement
* update subcommand name
* clarify the use of a time-based cache
* only emit an audit log event on create
* use plural for backend prefix like other resources

---------

Co-authored-by: rosstimothy <[email protected]>
RFD for #50292
This is the RFD for a new feature that will allow autoprovisioned unix users to be created with the same UID and primary GID across all SSH servers in a cluster.