RFD 0193 - Stable UNIX user UIDs #50414

espadolini · 2024-12-19T01:33:51Z

This is the RFD for a new feature that will allow autoprovisioned UNIX users to be created with the same UID and primary GID across all SSH servers in a cluster.

rosstimothy

Overall looks good, though could you add some of the missing sections (Security, Backwards Compatibility, etc.) from the RFD template and expand on them a bit?

rfd/0193-stable-unix-user.md

rosstimothy · 2025-01-08T16:59:34Z

rfd/0193-stable-unix-user.md

+
+## Goal
+
+Once the appropriate setting is enabled in the control plane, all compliant (i.e. up-to-date) Teleport SSH nodes will query the control plane for the UID to use when provisioning a host user. This applies when a Teleport user logs into the machine over SSH with a username that doesn't currently exist as a host user on the machine, with a roleset that allows for host user creation in "keep" mode for the specified machine and username, and with no `host_user_uid` trait. Using the returned UID will be a requirement for both the user and its primary group: if another host user has the same UID, or a group has the same GID - and thus user creation will fail - the login will also fail. If the host user already exists on the machine, the login will proceed with the existing user, just like the current behavior. The `host_user_uid` trait, if set, will take priority over the stable UID.
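The conditions in the Goal paragraph can be read as a small decision function. The following is a minimal sketch, not Teleport code: `LoginContext`, its field names, and the returned labels are hypothetical, and the `"system-default"` branch assumes that with the feature disabled the system picks the UID as it does today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoginContext:
    """Hypothetical inputs for one SSH login; the names are illustrative only."""
    feature_enabled: bool               # stable UIDs enabled in the control plane
    host_user_exists: bool              # username already exists on the machine
    keep_mode_allowed: bool             # roleset allows creation in "keep" mode
    host_user_uid_trait: Optional[int]  # value of the host_user_uid trait, if set


def uid_source(ctx: LoginContext) -> str:
    """Return a label describing where the UID for this login comes from."""
    if ctx.host_user_exists:
        return "existing"        # login proceeds with the existing host user
    if not ctx.keep_mode_allowed:
        return "out-of-scope"    # stable UIDs only concern "keep" mode
    if ctx.host_user_uid_trait is not None:
        return "trait"           # the trait takes priority over the stable UID
    if ctx.feature_enabled:
        return "stable"          # ask the control plane for the stable UID
    return "system-default"      # assumption: unchanged pre-feature behavior
```

If the source is `"stable"` and the returned UID or GID is already taken on the host, user creation fails and so does the login, as the paragraph states.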


> if another host user has the same UID, or a group has the same GID - and thus user creation will fail - the login will also fail

What is the means of resolving this state? Should there be an event or notification emitted to alert admins that manual intervention is needed?

I'm honestly not sure of the extent of how detailed we can make our errors through SSH, but surely the user that can't log in will ask someone for help?

At minimum we should make sure that the ssh service logs are very clear about the problem to reduce support load

I think it's probably out of scope for this RFD, but if we had a reasonable way to surface non-fatal errors after the session is established it would probably be better to inform them that they're running an unexpected UID for their login without breaking the connection. Maybe that's something we could revisit down the line

rosstimothy

Let's add a note here that this only applies to users provisioned in KEEP mode. What kind of interplay does this have with static_host_users? What would happen if there is not a name conflict, but there is a UID conflict? Should we prevent the static_host_user resource from being created?

espadolini · 2025-01-13T18:56:23Z

> Let's add a note here that this only applies to users provisioned in KEEP mode.

It was already there, actually 😅

> What would happen if there is not a name conflict, but there is a UID conflict? Should we prevent the static_host_user resource from being created?

It's technically possible for static host users to apply to hosts for which there is no dynamic autoprovisioning, and it's already possible to have a UID conflict between autoprovisioned users and static users (via user traits), so I don't think we should add checks for that.

rosstimothy · 2025-01-16T14:11:06Z

rfd/0193-stable-unix-user.md

+    last_uid: 7019999
+```
+
+Teleport SSH servers will check the `enabled` field to know if the feature is enabled, and - if so - they will query the auth server for the UID to use through a new RPC whenever they need to provision a new host user in "keep" mode with no otherwise defined UID. In the initial implementation, provisioned host groups other than the primary group will be generated according to the default system behavior.


Will this information be dictated to the agent by the control plane once the PDP work is implemented?

The stapled access decision info coming from the control plane to the SSH server is still very much to be determined, but we could pass along a stable UID given some conditions (the user having provisioning enabled on the server, a single username allowed to log in, the feature being enabled and the unix username already having an associated stable UID), and let the server use the RPC defined here to fetch a stable UID if it's missing from the stapled info. In general we shouldn't preemptively define a stable UID for every username, and the control plane can't know if the server already has the username registered in the local user directory.
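As a rough illustration of the idea in this reply, the sketch below separates the two sides: the control plane staples a UID only when all listed conditions hold and never allocates one, and the server falls back to the RPC when nothing was stapled. Every name here (`uid_to_staple`, `resolve_uid`, the parameters) is hypothetical, since the stapled decision format is not defined yet.

```python
from typing import Callable, Dict, List, Optional


def uid_to_staple(
    provisioning_enabled: bool,
    feature_enabled: bool,
    allowed_logins: List[str],
    known_uids: Dict[str, int],
) -> Optional[int]:
    """Control plane side: return a UID to staple, or None.

    Only looks up an existing mapping; it never allocates a new UID.
    """
    if not (provisioning_enabled and feature_enabled):
        return None
    if len(allowed_logins) != 1:
        return None  # ambiguous username, let the server ask via RPC
    return known_uids.get(allowed_logins[0])


def resolve_uid(stapled: Optional[int], fetch_via_rpc: Callable[[], int]) -> int:
    """Server side: prefer the stapled UID, otherwise call the RPC."""
    if stapled is not None:
        return stapled
    return fetch_via_rpc()
```

The server would call `resolve_uid` only once it knows the username is missing from the local user directory, which is the part the control plane cannot determine on its own.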

espadolini · 2025-01-16T17:05:28Z

@eriktate friendly ping

eriktate · 2025-01-16T21:51:24Z

LGTM!

This should integrate pretty seamlessly with host user creation since we already support UID overrides via traits 🤞

eriktate · 2025-01-16T21:37:29Z

rfd/0193-stable-unix-user.md

+
+The cluster state storage will contain a bidirectional mapping of usernames and UIDs, consisting of two items per username, one keyed by username at `/stable_unix_user/by_username/<hex username>` containing the UID and one keyed by UID at `/stable_unix_user/by_uid/<encoded uid>` containing the username. The UID encoding consists of transforming the UID by subtracting it from the maximum signed 32-bit integer (`0x7fff_ffff`), then writing it in hexadecimal big-endian. This allows for reading the occupied UIDs in large-to-small order through a backend range read in ascending order (since we currently can't scan the backend backwards).
+
+If reading `by_username/<hex username>` succeeds, the UID stored in it will be returned (even if outside of the currently defined UID range); otherwise, the next available free UID is searched for by issuing a range read with size one from the end to the beginning of the configured range. Such a read will return the biggest UID in use in the range - or nothing, if no UIDs are in range - letting us pick a free UID. Note that this, at least in the initial implementation, will only allow for using the contiguous range of unused UIDs at the end of the configured range. Changes in the UID range configuration might not result in actually using the entire range. This limitation stems from ease of implementation and might be lifted in the future, if such a need arises (as a workaround, the available range can be shrunk to precede any allocated contiguous range of UIDs).
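The lookup-then-allocate flow described in this hunk can be sketched against an in-memory stand-in for the backend. This is an illustration under stated assumptions, not the actual implementation: `FakeBackend`, `range_first`, `obtain_uid` and the `first_uid` value are hypothetical (only `last_uid: 7019999` appears in the quoted config), the `by_uid` values are omitted, and the atomicity a real backend write would need is left out.

```python
import bisect
from typing import Dict, List, Optional

MAX_UID = 0x7FFF_FFFF


def encode_uid(uid: int) -> str:
    """Reverse-order key: larger UIDs sort first among fixed-width hex keys."""
    return f"{MAX_UID - uid:08x}"


class FakeBackend:
    """In-memory stand-in for the cluster state storage (hypothetical)."""

    def __init__(self) -> None:
        self.by_username: Dict[str, int] = {}
        self.by_uid_keys: List[str] = []  # kept sorted in ascending order

    def range_first(self, start: str, end: str) -> Optional[str]:
        """Ascending range read of size one over the keys in [start, end]."""
        i = bisect.bisect_left(self.by_uid_keys, start)
        if i < len(self.by_uid_keys) and self.by_uid_keys[i] <= end:
            return self.by_uid_keys[i]
        return None

    def store(self, username: str, uid: int) -> None:
        self.by_username[username] = uid
        bisect.insort(self.by_uid_keys, encode_uid(uid))


def obtain_uid(backend: FakeBackend, username: str, first_uid: int, last_uid: int) -> int:
    existing = backend.by_username.get(username)
    if existing is not None:
        return existing  # returned even if outside the current range
    # Reading from encode(last_uid) upward yields the largest UID in use.
    key = backend.range_first(encode_uid(last_uid), encode_uid(first_uid))
    uid = first_uid if key is None else MAX_UID - int(key, 16) + 1
    if uid > last_uid:
        raise RuntimeError("no free UID left at the end of the configured range")
    backend.store(username, uid)
    return uid
```

Because the next UID is always one past the largest UID in use within the range, gaps below that point are never reused, which is the limitation the paragraph calls out.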


I don't fully understand how these values are stored in the backend yet, but wouldn't the range read operation described here generally just return the first item in /stable_unix_user/by_uid? Since it seems like we would be scanning in ascending order and we aren't allowing deletions?

The keys in by_uid are stored backwards precisely to support reading the largest uid in range with a backend range get (which always reads in ascending order).
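A small sketch of why the reversed keys make an ascending scan return the largest UID first. The helper names are hypothetical; the encoding follows the RFD text (subtract from `0x7fff_ffff`, then write fixed-width big-endian hex), and the sample UIDs are arbitrary.

```python
MAX_UID = 0x7FFF_FFFF  # maximum signed 32-bit integer, as in the RFD text


def encode_uid(uid: int) -> str:
    """Encode a UID so that larger UIDs produce lexicographically smaller keys."""
    if not 0 <= uid <= MAX_UID:
        raise ValueError(f"UID out of range: {uid}")
    return f"{MAX_UID - uid:08x}"  # 8 hex digits, most significant first


def decode_uid(key: str) -> int:
    """Invert encode_uid."""
    return MAX_UID - int(key, 16)


uids = [7010000, 7019999, 7010042]
# An ascending scan over the encoded keys, as a backend range read would do.
ascending_keys = sorted(encode_uid(u) for u in uids)
scanned = [decode_uid(k) for k in ascending_keys]
# scanned is the UIDs from largest to smallest
```

The fixed width matters: without zero padding, string order and numeric order of the keys would no longer agree.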

eriktate · 2025-01-16T21:45:30Z

rfd/0193-stable-unix-user.md

I think it's probably out of scope for this RFD, but if we had a reasonable way to surface non-fatal errors after the session is established it would probably be better to inform them that they're running an unexpected UID for their login without breaking the connection. Maybe that's something we could revisit down the line

* RFD 0193 - Stable UNIX user UIDs
* Compatibility, security and observability
* Define the required approvers

  Co-authored-by: rosstimothy <[email protected]>
* List rpc and changes to the allocation strategy
* clarify keep mode and interplay with static users
* Advise against UID overlaps
* rfd spellchecker appeasement
* update subcommand name
* clarify the use of a time-based cache
* only emit an audit log event on create
* use plural for backend prefix like other resources

---------

Co-authored-by: rosstimothy <[email protected]>

RFD 0193 - Stable UNIX user UIDs

7675077

espadolini added the no-changelog and c-mzz labels Dec 19, 2024

espadolini requested a review from rosstimothy December 19, 2024 01:33

github-actions bot added the rfd and size/sm labels Dec 19, 2024

github-actions bot requested review from capnspacehook and probakowski December 19, 2024 01:34

rosstimothy reviewed Jan 6, 2025

Compatibility, security and observability

088e602

espadolini requested a review from rosstimothy January 8, 2025 15:48

rosstimothy reviewed Jan 8, 2025

Define the required approvers

b93ea3c

Co-authored-by: rosstimothy <[email protected]>

espadolini requested review from eriktate and removed request for probakowski and capnspacehook January 8, 2025 17:12

List rpc and changes to the allocation strategy

24a31d6

espadolini requested a review from rosstimothy January 10, 2025 22:52

rosstimothy reviewed Jan 13, 2025

espadolini added 2 commits January 13, 2025 19:43

clarify keep mode and interplay with static users

0c2f056

Advise against UID overlaps

fb33af4

rfd spellchecker appeasement

0d745da

espadolini mentioned this pull request Jan 16, 2025

Stable UNIX users: storage and auth API #51102

Open

espadolini added do-not-merge and removed do-not-merge labels Jan 16, 2025

rosstimothy reviewed Jan 16, 2025

espadolini added 3 commits January 16, 2025 18:04

update subcommand name

eaf6724

clarify the use of a time-based cache

64c34e7

only emit an audit log event on create

63fe73a

rosstimothy approved these changes Jan 16, 2025

eriktate approved these changes Jan 16, 2025

use plural for backend prefix like other resources

7bca6da

espadolini added this pull request to the merge queue Jan 17, 2025

Merged via the queue into master with commit f867cdc Jan 17, 2025
41 checks passed

espadolini deleted the rfd/0193-stable-unix-user branch January 17, 2025 11:58

espadolini mentioned this pull request Jan 17, 2025

Stable UNIX users: functionality #51200

Open
