Teleport 17 Test Plan #48003
Desktop Access @probakowski
Binaries / OS compatibility
Verify that our software runs on the minimum supported OS versions as per the documentation:
- Windows @ravicious - Azure offers virtual machines with the Windows 10 2016 LTSB image. This image runs on Windows 10.
- macOS @camscale
- Linux @camscale
Machine ID @strideynet @timothyb89
With an SSH node registered to the Teleport cluster:
With a Postgres DB registered to the Teleport cluster:
With a Kubernetes cluster registered to the Teleport cluster:
With a HTTP application registered to the Teleport cluster:
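For the scenarios above, a minimal `tbot` configuration sketch can be handy; the proxy address, token name and output paths here are assumptions, and the field layout follows the tbot v2 config format, so adjust to the test cluster:

```yaml
# tbot.yaml - hypothetical values; replace the proxy address, join token and paths.
version: v2
proxy_server: teleport.example.com:443
onboarding:
  join_method: token
  token: bot-machine-id          # assumed pre-created bot join token
storage:
  type: directory
  path: /var/lib/teleport/bot    # internal bot state
outputs:
  - type: identity               # emits SSH/TLS credentials for the bot user
    destination:
      type: directory
      path: /opt/machine-id
```

With the identity output in place, the SSH, Postgres, Kubernetes and HTTP checks above can be driven from the generated credentials in the destination directory.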
Host users creation @eriktate
Host users creation docs
Host users are considered "managed" when they belong to one of the teleport system groups:
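A hedged sketch of a role that triggers host user creation; `create_host_user_mode` and `host_groups` come from the Host users creation docs linked above, while the login and group names are assumptions:

```yaml
kind: role
version: v7
metadata:
  name: auto-host-users
spec:
  options:
    # "keep" leaves the created host user in place after the session ends;
    # other modes exist, see the linked docs.
    create_host_user_mode: keep
  allow:
    logins: ["alice"]          # assumed login name
    host_groups: ["wheel"]     # extra groups for the created host user (assumed)
    node_labels:
      "*": "*"
```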
CA rotations @espadolini
Proxy Peering
SSH Connection Resumption @fspmarshall
Verify that SSH works, and that resumable SSH is not interrupted across a Teleport Cloud tenant upgrade.
Verify that SSH works, and that resumable SSH is not interrupted across a control plane restart (of either the root or the leaf cluster).
EC2 Discovery @marcoandredinis
Azure Discovery @marcoandredinis
GCP Discovery @atburke
IP Pinning @strideynet
Add a role with
Identity @smallinsky
Teleport SAML Identity Provider @flyinghermit
Verify SAML IdP service provider resource management. Docs:
Manage Service Provider (SP)
SAML service provider catalog
Database Access load test (PostgreSQL and MySQL)
Setup
Region: EKS with a single node group:
Teleport cluster (all deployed on the EKS cluster):
Databases:
Note: Databases were configured using discovery running inside the database agent.
Results
MySQL:
- 10 connections/second (90th percentile 70ms)
- 50 connections/second (90th percentile 70ms)
PostgreSQL:
- 10 connections/second (90th percentile 87ms)
- 50 connections/second (90th percentile 90ms)
(Doesn't seem a regression. Likely broken in the last few versions.)
Performance: ansible-like load on dynamodb on cloud
Setup
Three proxies on tenant4x nodes in each of usw2, use1, euc1, two auths on tenant8x nodes in euc1, cluster configured for a higher number of incoming connections, no SNS for audit logs. Three EKS clusters in usw2, use1, euc1, each with 64 m5.8xlarge nodes running 20k agents each, joining with a static token. Two m7a.48xlarge runners in usw2 and euc1 running tbot and ssh (the ansible-like load), connecting to all 60k nodes and running 4 commands every 360 seconds on average (see …). The agents were spun up and left alone for about 15 minutes, then the sessions were started and ran for about 30 minutes.
Problems
From a cold start (unused Cloud staging tenant) the dynamodb table in use took a while to internally scale, with throttling that lasted for 5 or 10 minutes; no problem after that. The ansible-like setup isn't capable of handling new or dropped agents, and in a first attempt (with clusters running 40 nodes instead of 64) some went missing because the kubernetes node they were on just died; a new set of agents was then spun up, with GOMAXPROCS set to 2 and memory request and limit set to 350Mi, which fit with a bit of extra headroom on 64 nodes and resulted in no more errors.
Results
SSO MFA ceremony breaks
Performance: ansible-like load on multi-region crdb cloud
Setup
Three proxies on tenant4x nodes in each of usw2, use1, euc1. Four auths on tenant8x nodes, two in usw2 and two in euc1. Three EKS clusters in usw2, use1, euc1, each with 50 m5.8xlarge nodes running 20k agents each, joining with a static token. Two m7a.48xlarge runners in usw2 and euc1 running tbot and ssh (the ansible-like load), connecting to all 60k nodes and running 4 commands every 360 seconds on average (see …). The agents were spun up and left alone for a few minutes. The first set of 60k sessions was started, the cluster was left to stabilize for a bit, then the second set was started.
Problems
The initial attempt appeared to overload some element of the cloud-staging networking stack, resulting in a large number of failed connections with no apparent teleport-originating cause. Subsequent attempts succeeded.
Results
non-blocking: IBM docs are out of date
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport.
These tests should be run on both a fresh installation of the version to be released
as well as an upgrade of the previous version of Teleport.
Adding nodes to a cluster @eriktate
Labels @eriktate
Trusted Clusters @bernardjkim
RBAC @eriktate
Make sure that invalid and valid attempts are reflected in audit log. Do this with both Teleport and Agentless nodes.
Verify that custom PAM environment variables are available as expected. @atburke
Users @codingllama
With every user combination, try to login and signup with an invalid second factor and an invalid password to see how the system reacts.

WebAuthn in the release `tsh` binary is implemented using libfido2 for Linux/macOS. Ask for a statically built pre-release binary for realistic tests. (`tsh fido2 diag` should work in our binary.) WebAuthn in the Windows build is implemented using `webauthn.dll`. (`tsh webauthn diag` with the security key selected in the dialog should work.)

Touch ID requires a signed `tsh`; ask for a signed pre-release binary so you may run the tests.

Windows WebAuthn requires Windows 10 19H1 and a device capable of Windows Hello.
Adding Users OTP
Adding Users WebAuthn
Adding Users via platform authenticator
Managing MFA devices
tsh mfa add
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
Login with MFA
tsh mfa add
Login OIDC
Login SAML
Login GitHub
Deleting Users
Backends @rosstimothy
Session Recording @capnspacehook
Enhanced Session Recording @Joerger
- Verify that `disk`, `command` and `network` events are being logged.
- Verify the behavior of the `enhanced_recording` role option (see the hedged sketch below).
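A hedged sketch of the `enhanced_recording` role option referenced above; the exact event list and login are assumptions, so check the enhanced session recording docs:

```yaml
kind: role
version: v7
metadata:
  name: bpf-recording
spec:
  options:
    # Limit which BPF events are captured for sessions started with this role.
    enhanced_recording: ["command", "network"]
  allow:
    logins: ["ubuntu"]     # assumed login
    node_labels:
      "*": "*"
```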
Auditd @Joerger
See teleport/lib/auditd/common.go, lines 25 to 34 (commit 7744f72).
Audit Log @rosstimothy
Audit log with dynamodb
Audit log with Firestore
Failed login attempts are recorded
Interactive sessions have the correct Server ID:
- `server_id` is the ID of the node in "session_recording: node" mode
- `server_id` is the ID of the node in "session_recording: proxy" mode
- `forwarded_by` is the ID of the proxy in "session_recording: proxy" mode
The Node/Proxy ID may be found at `/var/lib/teleport/host_uuid` on the corresponding machine. Node IDs may also be queried via `tctl nodes ls`.
- Exec commands are recorded
- `scp` commands are recorded
- Subsystem results are recorded

Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration. Assuming the proxy is `proxy.example.com:3023` and `node1` is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:

`sftp -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" root@node1`
External Audit Storage @nklaassen
External Audit Storage must be tested on an Enterprise Cloud tenant.
Instructions for deploying a custom release to a cloud staging tenant: https://github.com/gravitational/teleport.e/blob/master/dev-deploy.md
- `tsh play <session-id>` works

Interact with a cluster using `tsh` @capnspacehook
These commands should ideally be tested for recording and non-recording modes as they are implemented in different ways.

Interact with a cluster using `ssh` @Joerger
Make sure to test both recording and regular proxy modes.
Verify proxy jump functionality @atburke
Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.
Interact with a cluster using the Web UI @atburke
X11 Forwarding @Joerger
- Install `xeyes` and `xclip`: on Linux, `apt install x11-apps xclip`; on macOS, get `xeyes`, then `brew install xclip`.
- Enable X11 forwarding on the node: `ssh_service.x11.enabled = yes` (a minimal config sketch follows this list).
- `tsh ssh -X user@node xeyes`
- `tsh ssh -X root@node xeyes`
- `tsh ssh -Y server01 "echo Hello World | xclip -sel c && xclip -sel c -o"` should print "Hello World"
- `tsh ssh -X server01 "echo Hello World | xclip -sel c && xclip -sel c -o"` should fail with a "BadAccess" X error
User accounting @atburke
- Verify that active interactive sessions are tracked in `/var/run/utmp` on Linux.
- Verify that interactive sessions are recorded in `/var/log/wtmp` on Linux.
on Linux.Combinations @Joerger
For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.
Add an agentless Node in a local cluster.
Add a Teleport Node in a local cluster.
Add an agentless Node in a remote (leaf) cluster.
Add a Teleport Node in a remote (leaf) cluster.
Teleport with EKS/GKE @tigrato
Teleport with multiple Kubernetes clusters @tigrato
Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.
- `tsh login`, check that `tsh kube ls` has your cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has your cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has your cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has both clusters, `tsh kube login` to the new cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh` on the new cluster
- `tsh login`, check that `tsh kube ls` has all clusters
- Web UI: clusters are listed with `name` and `labels`, the `Step 2` login value matches the row's `name` column, and searching for `name` or `labels` in the search bar works against the `name` column.

Kubernetes exec via WebSockets/SPDY @tigrato
To control the use of WebSockets on the kubectl side, the environment variable `KUBECTL_REMOTE_COMMAND_WEBSOCKETS` can be used: `KUBECTL_REMOTE_COMMAND_WEBSOCKETS=true kubectl -v 8 exec -n namespace podName -- /bin/bash --version`. With the `-v 8` logging level you should be able to see `X-Stream-Protocol-Version: v5.channel.k8s.io` when kubectl is connected over WebSockets to Teleport.
To run the tests you'll need kubectl version at least 1.29, a Kubernetes cluster v1.29 or lower (doesn't support WebSockets stream protocol v5) and a cluster v1.30 (supports it by default), with access to each of them both through a kube agent and through kubeconfig.
- Verify exec with `KUBECTL_REMOTE_COMMAND_WEBSOCKETS=false`.
- Verify exec with `KUBECTL_REMOTE_COMMAND_WEBSOCKETS=true` (check for `X-Stream-Protocol-Version: v5.channel.k8s.io`).
- Repeat against the other cluster (check for `X-Stream-Protocol-Version: v5.channel.k8s.io`).
Kubernetes auto-discovery @tigrato
- Can be created with `tctl create`.
- Can be updated with `tctl create -f`.
- Can be deleted with `tctl rm`.

Kubernetes Secret Storage @hugoShaka
Statefulset
Kubernetes Pod RBAC @tigrato
Verify the following scenarios for `kubernetes_resources` (a hedged role sketch follows this list):
- `{"kind":"pod","name":"*","namespace":"*"}` - must allow access to every pod.
- `{"kind":"pod","name":"<somename>","namespace":"*"}` - must allow access to pod `<somename>` in every namespace.
- `{"kind":"pod","name":"*","namespace":"<somenamespace>"}` - must allow access to any pod in the `<somenamespace>` namespace.
- `*` wildcards - `<some-name>-*` - and regex for the `name` and `namespace` fields.
- Test the scenarios with `go-client`.
- Verify the combination of `kubernetes_resources` and `kubernetes_groups` that denies exec into a pod.
- Verify that `search_as_roles` is not allowed.
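A hedged role sketch for one of the `kubernetes_resources` cases above; the labels, group and namespace pattern are assumptions:

```yaml
kind: role
version: v7
metadata:
  name: kube-pod-access
spec:
  allow:
    kubernetes_labels:
      "*": "*"
    kubernetes_groups: ["viewers"]     # assumed group mapped inside the cluster
    kubernetes_resources:
      - kind: pod
        name: "*"
        namespace: "dev-*"             # wildcard/prefix case from the list above
        verbs: ["*"]
```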
Teleport with FIPS mode @eriktate
ACME @timothyb89
Migrations @timothyb89
SSH should work for both main and old clusters
SSH should work
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport with SSO Providers
GitHub External SSO @greedy52
`tctl sso` family of commands @Tener
For help with setting up SSO connectors, check out the [Quick GitHub/SAML/OIDC Setup Tips].
- `tctl sso configure` helps to construct a valid connector definition:
  - `tctl sso configure github ...` creates valid connector definitions
  - `tctl sso configure oidc ...` creates valid connector definitions
  - `tctl sso configure saml ...` creates valid connector definitions
- `tctl sso test` tests a provided connector definition, which can be loaded from file or piped in with `tctl sso configure` or `tctl get --with-secrets`. Valid connectors are accepted, invalid ones are rejected with sensible error messages.
- Verify the login flow started by `tctl sso test`.
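For example, an existing connector can be piped straight into the tester; the connector kind and name here are assumptions:

```sh
# Round-trip an existing GitHub connector through the SSO tester.
tctl get github/my-github-connector --with-secrets | tctl sso test
```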
SSO login on remote host @atburke
`tsh` should be running on a remote host (e.g. over an SSH session) and use the local browser to complete an SSO login. Run `tsh login --callback <remote.host>:<port> --bind-addr localhost:<port> --auth <auth>` on the remote host. Note that the `--callback` URL must be able to resolve to the `--bind-addr` over HTTPS.

Teleport Plugins @EdwardDowling @bernardjkim
Teleport Operator @hugoShaka
- Test with the `teleport-cluster` Helm chart and the operator enabled.

AWS Node Joining @hugoShaka
Docs
- With `ec2:DescribeInstances` permissions for the local account: `TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin`
- `TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin`
Kubernetes Node Joining @bernardjkim
Azure Node Joining @marcoandredinis
Docs
GCP Node Joining @marcoandredinis
Docs
Cloud Labels @marcoandredinis
- AWS: instance with tag `foo`: `bar`. Verify that a node running on the instance has label `aws/foo=bar`.
- Azure: VM with tag `foo`: `bar`. Verify that a node running on the instance has label `azure/foo=bar`.
- GCP: instance with label `foo`: `bar` and tag `baz`: `quux`. Verify that a node running on the instance has labels `gcp/label/foo=bar` and `gcp/tag/baz=quux`.
.Passwordless @codingllama
This feature has additional build requirements, so it should be tested with a pre-release build (e.g. https://cdn.teleport.dev/tsh-v16.0.0-alpha.2.pkg).
This section complements "Users -> Managing MFA devices". `tsh` binaries for each operating system (Linux, macOS and Windows) must be tested separately for FIDO2 items.
Diagnostics
Commands should pass all tests.
- `tsh fido2 diag` (macOS/Linux)
- `tsh touchid diag` (macOS only)
- `tsh webauthnwin diag` (Windows only)

Registration
tsh mfa add
, choose WEBAUTHN andpasswordless)
tsh mfa add
, choose TOUCHID)tsh mfa add
, choose WEBAUTHN andpasswordless)
Login
tsh login --auth=passwordless
)tsh login --auth=passwordless
)tsh login --auth=passwordless --mfa-mode=cross-platform
uses FIDO2tsh login --auth=passwordless --mfa-mode=platform
uses platform authenticatortsh login --auth=passwordless --mfa-mode=auto
prefers platform authenticatorthe same device)
(
auth_service.authentication.passwordless = false
)(
auth_service.authentication.connector_name = passwordless
)(
tsh login --auth=local
)Touch ID support commands
- `tsh touchid ls` works
- `tsh touchid rm` works (careful, may lock you out!)

Device Trust @codingllama
Device Trust requires Teleport Enterprise.
This feature has additional build requirements, so it should be tested with a pre-release build (e.g. https://cdn.teleport.dev/teleport-ent-v16.0.0-alpha.2-linux-amd64-bin.tar.gz).
Client-side enrollment requires a signed `tsh` for macOS; make sure to use the `tsh` binary from `tsh.app`.
Additionally, Device Trust Web requires Teleport Connect to be installed (device authentication for the Web is handled by Connect).
A simple formula for testing device authorization is:
Inventory management
- `tctl devices add`
- `tctl devices add --enroll`
- `tctl devices ls`
- `tctl devices rm`
- `tctl devices rm`
- `tctl devices enroll`
- `tctl devices enroll`

Device enrollment
- Enroll/authn device on macOS (`tsh device enroll`)
- Enroll/authn device on Windows (`tsh device enroll`)
- Enroll/authn device on Linux (`tsh device enroll`)
Linux users need read/write permissions to /dev/tpmrm0. The simplest way is to assign yourself to the `tss` group. See https://goteleport.com/docs/access-controls/device-trust/device-management/#troubleshooting.
Verify device extensions on TLS certificate
Note that different accesses have different certificates (Database, Kube,
etc).
Verify device extensions on SSH certificate
Device authentication
tsh or Connect
Web UI (requires Connect)
Confirm that it works by failing first. Most protocols can be tested using
device_trust.mode="required". App Access and Desktop Access require a custom
role (see enforcing device trust).
For SSO users confirm that device web authentication happens successfully.
Device authorization
extensions on login
(device_trust.mode="optional" and role.spec.options.device_trust_mode="required")
and require_session_mfa=true
(root with mode="optional" and leaf with mode="required")
Device audit (see lib/events/codes.go)
Web Authentication Confirmed" events
Corresponding "Device Authenticated" events have both
web_authentication=true and web_authentication_id set.
data (for certificates with device extensions)
Binary support
- `tsh` for macOS gives a sane error message for `tsh device enroll` attempts.

Device support commands
- `tsh device collect` (macOS)
- `tsh device asset-tag` (macOS)
- `tsh device collect` (Windows)
- `tsh device asset-tag` (Windows)
- `tsh device collect` (Linux)
- `tsh device asset-tag` (Linux)

Hardware Key Support @Joerger
Hardware Key Support is an Enterprise feature and is not available for OSS.
You will need a YubiKey 4.3+ to test this feature.
This feature has additional build requirements, so it should be tested with a pre-release build (e.g. https://cdn.teleport.dev/teleport-ent-v16.0.0-alpha.2-linux-amd64-bin.tar.gz).

Server Access
This test should be carried out on Linux, MacOS, and Windows.
Set `auth_service.authentication.require_session_mfa: hardware_key_touch` in your cluster auth settings and log in (a hedged config snippet follows). Verify:
- `tsh login`
- `tsh ssh`
- `tsh proxy db --tunnel`
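A hedged snippet of the auth setting named above as it would appear in the auth service config file; the surrounding fields are assumptions:

```yaml
# teleport.yaml (Auth Service) - only the last line is the setting under test.
auth_service:
  authentication:
    type: local
    second_factor: "on"              # assumed; any working MFA setup is fine here
    require_session_mfa: hardware_key_touch
```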
HSM Support @nklaassen
Docs
Run the full test suite with each HSM/KMS:
Moderated session @eriktate
Create two Teleport users, a moderator and a user. Configure Teleport roles to require that the moderator moderate the user's sessions (see the hedged role sketch after this list). Use `TELEPORT_HOME` to `tsh login` as the user in one terminal, and as the moderator in another. Ensure the default `terminationPolicy` of `terminate` has not been changed.
For each of the following cases, create a moderated session with the user using `tsh ssh` and join this session with the moderator using `tsh join --role moderator`:
- `Ctrl+C` in the user terminal disconnects the moderator, as the session has ended.
- `Ctrl+C` in the moderator terminal disconnects the moderator and terminates the user's session, as the session no longer has a moderator.
- `t` in the moderator terminal terminates the session for all participants.
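A hedged sketch of the two roles; the role names, login and filter expression are assumptions, so check the moderated sessions docs for the exact policy grammar:

```yaml
kind: role
version: v7
metadata:
  name: moderated-user
spec:
  allow:
    logins: ["ubuntu"]                          # assumed login
    node_labels:
      "*": "*"
    require_session_join:
      - name: Require a moderator
        filter: contains(user.roles, "moderator")   # filter grammar assumed; verify against docs
        kinds: ["ssh"]
        modes: ["moderator"]
        count: 1
---
kind: role
version: v7
metadata:
  name: moderator
spec:
  allow:
    join_sessions:
      - name: Moderate sessions
        roles: ["moderated-user"]
        kinds: ["ssh"]
        modes: ["moderator"]
```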
Performance @rosstimothy @fspmarshall @espadolini
Scaling Test
Scale up the number of nodes/clusters a few times for each configuration below.
Perform simulated load testing on non-cloud backends
Perform ansible-like load testing on cloud backends
Perform the following additional scaling tests on a single backend:
Soak Test
Run 30 minute soak test directly against direct and tunnel nodes
and via label based matching. Tests should be run against a Cloud
tenant.
Concurrent Session Test
Run a concurrent session test that will spawn 5 interactive sessions per node in the cluster:
Robustness
- Verify that a lack of connectivity to Auth does not prevent access to resources which do not require a moderated session and in async recording mode from an already issued certificate.
- Verify that a lack of connectivity to Auth prevents access to resources which require a moderated session and in async recording mode from an already issued certificate.
- Verify expected behavior when Auth and/or Proxy are restarted.
Teleport with Cloud Providers
AWS @hugoShaka
GCP @marcoandredinis
IBM @hugoShaka
(`endpoint` parameter #48760 and out of date docs)

Application Access @gabrielcorado
- Verify that `debug_app: true` works.
- Verify that an application is accessible at `name.rootProxyPublicAddr` as well as `publicAddr`.
- Verify access at `name.rootProxyPublicAddr`.
- Verify that `app.session.start` and `app.session.chunk` events are created in the Audit Log.
- Verify that `app.session.chunk` points to a 5 minute session archive with multiple `app.session.request` events inside.
- Verify that `tsh play <chunk-id>` can fetch and print a session chunk archive.
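A minimal `app_service` sketch for the `debug_app` and `publicAddr` checks above; the app name, URI and proxy address are assumptions:

```yaml
# teleport.yaml (app agent)
app_service:
  enabled: yes
  debug_app: true                  # serves the built-in debug application
  apps:
    - name: example
      uri: http://localhost:8080   # assumed internal app for the publicAddr checks
```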
- AWS CLI access: `tsh apps login`, `tsh` commands, `tsh aws`, `tsh aws --endpoint-url` (this is a hidden flag).
- Azure CLI access: `tsh apps login`, `tsh az` commands, `tsh proxy az` and `az` commands.
- GCP CLI access: `tsh apps login`, `tsh gcloud` commands, `tsh gsutil` commands, `tsh proxy gcloud` and `gcloud`/`gsutil` commands.
- Dynamic registration: apps can be created with `tctl create`, updated with `tctl create -f`, and deleted with `tctl rm`.
- "Add Application" links to documentation.
Database Access @greedy52
Some tests are marked with "covered by E2E test" and automatically completed by default. In case the E2E test is flaky or disabled, deselect the task for manual testing.
IMPORTANT: for this round of testing, please pick a signature algorithm suite other than the default `legacy`. See RFD 136. @greedy52 @Tener @GavinFrazar
- (`select pg_sleep(10)` followed by ctrl-c is a good query to test.)
- (test `valkey` if possible) @GavinFrazar
- With `assume_role_arn: ""` and `external_id: "<id>"`
- With `assume_role_arn: ""` and `external_id: "<id>"`
Verify all supported modes: `keep`, `best_effort_drop`.
- `db.session.start` is emitted when you connect.
- `db.session.end` is emitted when you disconnect.
- `db.session.query` is emitted when you execute a SQL query.
- `tsh db ls` shows only databases matching the role's `db_labels`.
- Access is limited to the role's `db_users`.
- Access is limited to the role's `db_names`.
  - `db.session.start` is emitted when a connection attempt is denied.
- Queries are limited to the role's `db_names`.
  - `db.session.query` is emitted when a command fails due to permissions.
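A hedged role sketch for the RBAC checks above; the label, database account, and database names are assumptions:

```yaml
kind: role
version: v7
metadata:
  name: db-restricted
spec:
  allow:
    db_labels:
      env: ["dev"]          # tsh db ls should only show matching databases
    db_users: ["alice"]     # connecting as other database accounts should be denied
    db_names: ["metrics"]   # connecting to / querying other databases should be denied
```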
- Can connect with `tsh db connect`.
- Can be created with `tctl create`.
- Can be updated with `tctl create -f`.
- Can be deleted with `tctl rm`.
- Please configure discovery in the Discovery Service instead of the Database Service.
- Can detect and register databases in an external AWS account when `assume_role_arn` and `external_id` is set.
- Can detect and register databases in an external AWS account when `assume_role_arn` and `external_id` is set.
- Registered databases are shown in the Web UI with `name`, `description`, `type`, and `labels`.
Step 2
login value matching the rowsname
columnlabels
tsh bench
load tests (instructions on Notion -> Database Access -> Load test) @Tenertsh play
) @TenerTLS Routing @greedy52
v2
configuration starts only a single listener for proxy service, in contrast withv1
configuration.Given configuration:
*:3080
for proxy service. Given the configuration above, 3022 and 3025 will be opened for other services.v1
, there should be additional ports 3023 and 3024.multiplex
modeauth_service.proxy_listener_mode: "multiplex"
web_proxy_addr == tunnel_addr
tsh db connect
works through proxy running inmultiplex
mode @GavinFrazartsh proxy db
with a GUI client.multiplex
modessh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" [email protected]
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" [email protected]
tsh ssh
access through proxy running in multiplex modemultiplex
mode, usingtsh
multiplex
mode behind L7 load balancer @greedy52tsh login
andtctl
tsh ssh
andtsh config
tsh proxy db
andtsh db connect
tsh proxy app
andtsh aws
tsh proxy kube
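A hedged sketch of the `v2`/`multiplex` proxy configuration referenced in the TLS Routing section above; the addresses and surrounding fields are assumptions to adapt to the test cluster:

```yaml
# teleport.yaml - single multiplexed proxy listener on 3080.
version: v2
auth_service:
  enabled: yes
  proxy_listener_mode: multiplex
proxy_service:
  enabled: yes
  web_listen_addr: "0.0.0.0:3080"
  public_addr: "teleport.example.com:3080"
  tunnel_public_addr: "teleport.example.com:3080"   # web_proxy_addr == tunnel_addr
```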