Teleport 17 Test Plan #48003
Desktop Access @probakowski
Binaries / OS compatibility
Verify that our software runs on the minimum supported OS versions as per the documentation:
- Windows @ravicious - Azure offers virtual machines with the Windows 10 2016 LTSB image. This image runs on Windows 10.
- macOS @camscale
- Linux @camscale
Machine ID @strideynet @timothyb89
With an SSH node registered to the Teleport cluster:
With a Postgres DB registered to the Teleport cluster:
With a Kubernetes cluster registered to the Teleport cluster:
With a HTTP application registered to the Teleport cluster:
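For the scenarios above, a minimal `tbot` configuration sketch can be handy; the proxy address, token name and output paths here are assumptions, and the field layout follows the tbot v2 config format, so adjust to the test cluster:

```yaml
# tbot.yaml - hypothetical values; replace the proxy address, join token and paths.
version: v2
proxy_server: teleport.example.com:443
onboarding:
  join_method: token
  token: bot-machine-id          # assumed pre-created bot join token
storage:
  type: directory
  path: /var/lib/teleport/bot    # internal bot state
outputs:
  - type: identity               # emits SSH/TLS credentials for the bot user
    destination:
      type: directory
      path: /opt/machine-id
```

With the identity output in place, the SSH, Postgres, Kubernetes and HTTP checks above can be driven from the generated credentials in the destination directory.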
Host users creation @eriktate
Host users creation docs
Host users are considered "managed" when they belong to one of the teleport system groups:
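A hedged sketch of a role that triggers host user creation; `create_host_user_mode` and `host_groups` come from the Host users creation docs linked above, while the login and group names are assumptions:

```yaml
kind: role
version: v7
metadata:
  name: auto-host-users
spec:
  options:
    # "keep" leaves the created host user in place after the session ends;
    # other modes exist, see the linked docs.
    create_host_user_mode: keep
  allow:
    logins: ["alice"]          # assumed login name
    host_groups: ["wheel"]     # extra groups for the created host user (assumed)
    node_labels:
      "*": "*"
```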
CA rotations @espadolini
Proxy Peering
SSH Connection Resumption @fspmarshall
Verify that SSH works, and that resumable SSH is not interrupted across a Teleport Cloud tenant upgrade.
Verify that SSH works, and that resumable SSH is not interrupted across a control plane restart (of either the root or the leaf cluster).
EC2 Discovery @marcoandredinis
Azure Discovery @marcoandredinis
GCP Discovery @atburke
IP Pinning @strideynet
Add a role with
Identity @smallinsky
Teleport SAML Identity Provider @flyinghermit
Verify SAML IdP service provider resource management. Docs:
Manage Service Provider (SP)
SAML service provider catalog
Database Access load test (PostgreSQL and MySQL)
Setup
Region: EKS with a single node group:
Teleport cluster (all deployed on the EKS cluster):
Databases:
Note: Databases were configured using discovery running inside the database agent.
Results
MySQL:
- 10 connections/second (90th percentile 70ms)
- 50 connections/second (90th percentile 70ms)
PostgreSQL:
- 10 connections/second (90th percentile 87ms)
- 50 connections/second (90th percentile 90ms)
(Doesn't seem a regression. Likely broken in the last few versions.)
Performance: ansible-like load on dynamodb on cloud
Setup
Three proxies on tenant4x nodes in each of usw2, use1, euc1, two auths on tenant8x nodes in euc1, cluster configured for a higher number of incoming connections, no SNS for audit logs. Three EKS clusters in usw2, use1, euc1, each with 64 m5.8xlarge nodes running 20k agents each, joining with a static token. Two m7a.48xlarge runners in usw2 and euc1 running tbot and ssh (the ansible-like load), connecting to all 60k nodes and running 4 commands every 360 seconds on average (see …). The agents were spun up and left alone for about 15 minutes, then the sessions were started and ran for about 30 minutes.
Problems
From a cold start (unused Cloud staging tenant) the dynamodb table in use took a while to internally scale, with throttling that lasted for 5 or 10 minutes; no problem after that. The ansible-like setup isn't capable of handling new or dropped agents, and in a first attempt (with clusters running 40 nodes instead of 64) some went missing because the kubernetes node they were on just died; a new set of agents was then spun up, with GOMAXPROCS set to 2 and memory request and limit set to 350Mi, which fit with a bit of extra headroom on 64 nodes and resulted in no more errors.
Results
SSO MFA ceremony breaks
Performance: ansible-like load on multi-region crdb cloud
Setup
Three proxies on tenant4x nodes in each of usw2, use1, euc1. Four auths on tenant8x nodes, two in usw2 and two in euc1. Three EKS clusters in usw2, use1, euc1, each with 50 m5.8xlarge nodes running 20k agents each, joining with a static token. Two m7a.48xlarge runners in usw2 and euc1 running tbot and ssh (the ansible-like load), connecting to all 60k nodes and running 4 commands every 360 seconds on average (see …). The agents were spun up and left alone for a few minutes. The first set of 60k sessions was started, the cluster was left to stabilize for a bit, then the second set was started.
Problems
The initial attempt appeared to overload some element of the cloud-staging networking stack, resulting in a large number of failed connections with no apparent teleport-originating cause. Subsequent attempts succeeded.
Results
non-blocking: IBM docs are out of date
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport.
These tests should be run on both a fresh installation of the version to be released
as well as an upgrade of the previous version of Teleport.
Adding nodes to a cluster @eriktate
Labels @eriktate
Trusted Clusters @bernardjkim
RBAC @eriktate
Make sure that invalid and valid attempts are reflected in audit log. Do this with both Teleport and Agentless nodes.
Verify that custom PAM environment variables are available as expected. @atburke
Users @codingllama
With every user combination, try to login and signup with an invalid second factor and an invalid password to see how the system reacts.

WebAuthn in the release `tsh` binary is implemented using libfido2 for Linux/macOS. Ask for a statically built pre-release binary for realistic tests. (`tsh fido2 diag` should work in our binary.) WebAuthn in the Windows build is implemented using `webauthn.dll`. (`tsh webauthn diag` with the security key selected in the dialog should work.)

Touch ID requires a signed `tsh`; ask for a signed pre-release binary so you may run the tests.

Windows WebAuthn requires Windows 10 19H1 and a device capable of Windows Hello.
Adding Users OTP
Adding Users WebAuthn
Adding Users via platform authenticator
Managing MFA devices
tsh mfa add
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
Login with MFA
tsh mfa add
Login OIDC
Login SAML
Login GitHub
Deleting Users
Backends @rosstimothy
Session Recording @capnspacehook
Enhanced Session Recording @Joerger
- Verify that `disk`, `command` and `network` events are being logged.
- Verify the behavior of the `enhanced_recording` role option (see the hedged sketch below).
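A hedged sketch of the `enhanced_recording` role option referenced above; the exact event list and login are assumptions, so check the enhanced session recording docs:

```yaml
kind: role
version: v7
metadata:
  name: bpf-recording
spec:
  options:
    # Limit which BPF events are captured for sessions started with this role.
    enhanced_recording: ["command", "network"]
  allow:
    logins: ["ubuntu"]     # assumed login
    node_labels:
      "*": "*"
```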
Auditd @Joerger
See teleport/lib/auditd/common.go, lines 25 to 34 (commit 7744f72).
Audit Log @rosstimothy
Audit log with dynamodb
Audit log with Firestore
Failed login attempts are recorded
Interactive sessions have the correct Server ID:
- `server_id` is the ID of the node in "session_recording: node" mode
- `server_id` is the ID of the node in "session_recording: proxy" mode
- `forwarded_by` is the ID of the proxy in "session_recording: proxy" mode
The Node/Proxy ID may be found at `/var/lib/teleport/host_uuid` on the corresponding machine. Node IDs may also be queried via `tctl nodes ls`.
- Exec commands are recorded
- `scp` commands are recorded
- Subsystem results are recorded

Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration. Assuming the proxy is `proxy.example.com:3023` and `node1` is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:

`sftp -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" root@node1`
External Audit Storage @nklaassen
External Audit Storage must be tested on an Enterprise Cloud tenant.
Instructions for deploying a custom release to a cloud staging tenant: https://github.com/gravitational/teleport.e/blob/master/dev-deploy.md
- `tsh play <session-id>` works

Interact with a cluster using `tsh` @capnspacehook
These commands should ideally be tested for recording and non-recording modes as they are implemented in different ways.

Interact with a cluster using `ssh` @Joerger
Make sure to test both recording and regular proxy modes.
Verify proxy jump functionality @atburke
Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.
Interact with a cluster using the Web UI @atburke
X11 Forwarding @Joerger
- Install `xeyes` and `xclip`: on Linux, `apt install x11-apps xclip`; on macOS, get `xeyes`, then `brew install xclip`.
- Enable X11 forwarding on the node: `ssh_service.x11.enabled = yes` (a minimal config sketch follows this list).
- `tsh ssh -X user@node xeyes`
- `tsh ssh -X root@node xeyes`
- `tsh ssh -Y server01 "echo Hello World | xclip -sel c && xclip -sel c -o"` should print "Hello World"
- `tsh ssh -X server01 "echo Hello World | xclip -sel c && xclip -sel c -o"` should fail with a "BadAccess" X error
User accounting @atburke
- Verify that active interactive sessions are tracked in `/var/run/utmp` on Linux.
- Verify that interactive sessions are recorded in `/var/log/wtmp` on Linux.
on Linux.Combinations @Joerger
For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.
Add an agentless Node in a local cluster.
Add a Teleport Node in a local cluster.
Add an agentless Node in a remote (leaf) cluster.
Add a Teleport Node in a remote (leaf) cluster.
Teleport with EKS/GKE @tigrato
Teleport with multiple Kubernetes clusters @tigrato
Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.
- `tsh login`, check that `tsh kube ls` has your cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has your cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has your cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has both clusters, `tsh kube login` to the new cluster, then `kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh` on the new cluster
- `tsh login`, check that `tsh kube ls` has all clusters
- Web UI: clusters are listed with `name` and `labels`, the `Step 2` login value matches the row's `name` column, and searching for `name` or `labels` in the search bar works against the `name` column.

Kubernetes exec via WebSockets/SPDY @tigrato
To control the use of WebSockets on the kubectl side, the environment variable `KUBECTL_REMOTE_COMMAND_WEBSOCKETS` can be used: `KUBECTL_REMOTE_COMMAND_WEBSOCKETS=true kubectl -v 8 exec -n namespace podName -- /bin/bash --version`. With the `-v 8` logging level you should be able to see `X-Stream-Protocol-Version: v5.channel.k8s.io` when kubectl is connected over WebSockets to Teleport.
To run the tests you'll need kubectl version at least 1.29, a Kubernetes cluster v1.29 or lower (doesn't support WebSockets stream protocol v5) and a cluster v1.30 (supports it by default), with access to each of them both through a kube agent and through kubeconfig.
- Verify exec with `KUBECTL_REMOTE_COMMAND_WEBSOCKETS=false`.
- Verify exec with `KUBECTL_REMOTE_COMMAND_WEBSOCKETS=true` (check for `X-Stream-Protocol-Version: v5.channel.k8s.io`).
- Repeat against the other cluster (check for `X-Stream-Protocol-Version: v5.channel.k8s.io`).
Kubernetes auto-discovery @tigrato
- Can be created with `tctl create`.
- Can be updated with `tctl create -f`.
- Can be deleted with `tctl rm`.

Kubernetes Secret Storage @hugoShaka
Statefulset
Kubernetes Pod RBAC @tigrato
Verify the following scenarios for `kubernetes_resources` (a hedged role sketch follows this list):
- `{"kind":"pod","name":"*","namespace":"*"}` - must allow access to every pod.
- `{"kind":"pod","name":"<somename>","namespace":"*"}` - must allow access to pod `<somename>` in every namespace.
- `{"kind":"pod","name":"*","namespace":"<somenamespace>"}` - must allow access to any pod in the `<somenamespace>` namespace.
- `*` wildcards - `<some-name>-*` - and regex for the `name` and `namespace` fields.
- Test the scenarios with `go-client`.
- Verify the combination of `kubernetes_resources` and `kubernetes_groups` that denies exec into a pod.
- Verify that `search_as_roles` is not allowed.
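A hedged role sketch for one of the `kubernetes_resources` cases above; the labels, group and namespace pattern are assumptions:

```yaml
kind: role
version: v7
metadata:
  name: kube-pod-access
spec:
  allow:
    kubernetes_labels:
      "*": "*"
    kubernetes_groups: ["viewers"]     # assumed group mapped inside the cluster
    kubernetes_resources:
      - kind: pod
        name: "*"
        namespace: "dev-*"             # wildcard/prefix case from the list above
        verbs: ["*"]
```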
Teleport with FIPS mode @eriktate
ACME @timothyb89
Migrations @timothyb89
SSH should work for both main and old clusters
SSH should work
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport with SSO Providers
GitHub External SSO @greedy52
`tctl sso` family of commands @Tener
For help with setting up SSO connectors, check out the [Quick GitHub/SAML/OIDC Setup Tips].
- `tctl sso configure` helps to construct a valid connector definition:
  - `tctl sso configure github ...` creates valid connector definitions
  - `tctl sso configure oidc ...` creates valid connector definitions
  - `tctl sso configure saml ...` creates valid connector definitions
- `tctl sso test` tests a provided connector definition, which can be loaded from file or piped in with `tctl sso configure` or `tctl get --with-secrets`. Valid connectors are accepted, invalid ones are rejected with sensible error messages.
- Verify the login flow started by `tctl sso test`.
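For example, an existing connector can be piped straight into the tester; the connector kind and name here are assumptions:

```sh
# Round-trip an existing GitHub connector through the SSO tester.
tctl get github/my-github-connector --with-secrets | tctl sso test
```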
SSO login on remote host @atburke
`tsh` should be running on a remote host (e.g. over an SSH session) and use the local browser to complete an SSO login. Run `tsh login --callback <remote.host>:<port> --bind-addr localhost:<port> --auth <auth>` on the remote host. Note that the `--callback` URL must be able to resolve to the `--bind-addr` over HTTPS.

Teleport Plugins @EdwardDowling @bernardjkim
Teleport Operator @hugoShaka
- Test with the `teleport-cluster` Helm chart and the operator enabled.

AWS Node Joining @hugoShaka
Docs
- With `ec2:DescribeInstances` permissions for the local account: `TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin`
- `TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin`
Kubernetes Node Joining @bernardjkim
Azure Node Joining @marcoandredinis
Docs
GCP Node Joining @marcoandredinis
Docs
Cloud Labels @marcoandredinis
- AWS: instance with tag `foo`: `bar`. Verify that a node running on the instance has label `aws/foo=bar`.
- Azure: VM with tag `foo`: `bar`. Verify that a node running on the instance has label `azure/foo=bar`.
- GCP: instance with label `foo`: `bar` and tag `baz`: `quux`. Verify that a node running on the instance has labels `gcp/label/foo=bar` and `gcp/tag/baz=quux`.
.Passwordless @codingllama
This feature has additional build requirements, so it should be tested with a pre-release build (e.g. https://cdn.teleport.dev/tsh-v16.0.0-alpha.2.pkg).
This section complements "Users -> Managing MFA devices". `tsh` binaries for each operating system (Linux, macOS and Windows) must be tested separately for FIDO2 items.
Diagnostics
Commands should pass all tests.
- `tsh fido2 diag` (macOS/Linux)
- `tsh touchid diag` (macOS only)
- `tsh webauthnwin diag` (Windows only)

Registration
tsh mfa add
, choose WEBAUTHN andpasswordless)
tsh mfa add
, choose TOUCHID)tsh mfa add
, choose WEBAUTHN andpasswordless)
Login
tsh login --auth=passwordless
)tsh login --auth=passwordless
)tsh login --auth=passwordless --mfa-mode=cross-platform
uses FIDO2tsh login --auth=passwordless --mfa-mode=platform
uses platform authenticatortsh login --auth=passwordless --mfa-mode=auto
prefers platform authenticatorthe same device)
(
auth_service.authentication.passwordless = false
)(
auth_service.authentication.connector_name = passwordless
)(
tsh login --auth=local
)Touch ID support commands
- `tsh touchid ls` works
- `tsh touchid rm` works (careful, may lock you out!)

Device Trust @codingllama
Device Trust requires Teleport Enterprise.
This feature has additional build requirements, so it should be tested with a pre-release build (e.g. https://cdn.teleport.dev/teleport-ent-v16.0.0-alpha.2-linux-amd64-bin.tar.gz).
Client-side enrollment requires a signed `tsh` for macOS; make sure to use the `tsh` binary from `tsh.app`.
Additionally, Device Trust Web requires Teleport Connect to be installed (device authentication for the Web is handled by Connect).
A simple formula for testing device authorization is:
Inventory management
- `tctl devices add`
- `tctl devices add --enroll`
- `tctl devices ls`
- `tctl devices rm`
- `tctl devices rm`
- `tctl devices enroll`
- `tctl devices enroll`

Device enrollment
- Enroll/authn device on macOS (`tsh device enroll`)
- Enroll/authn device on Windows (`tsh device enroll`)
- Enroll/authn device on Linux (`tsh device enroll`)
Linux users need read/write permissions to /dev/tpmrm0. The simplest way is to assign yourself to the `tss` group. See https://goteleport.com/docs/access-controls/device-trust/device-management/#troubleshooting.
Verify device extensions on TLS certificate
Note that different accesses have different certificates (Database, Kube,
etc).
Verify device extensions on SSH certificate
Device authentication
tsh or Connect
Web UI (requires Connect)
Confirm that it works by failing first. Most protocols can be tested using
device_trust.mode="required". App Access and Desktop Access require a custom
role (see enforcing device trust).
For SSO users confirm that device web authentication happens successfully.
Device authorization
extensions on login
(device_trust.mode="optional" and role.spec.options.device_trust_mode="required")
and require_session_mfa=true
(root with mode="optional" and leaf with mode="required")
Device audit (see lib/events/codes.go)
Web Authentication Confirmed" events
Corresponding "Device Authenticated" events have both
web_authentication=true and web_authentication_id set.
data (for certificates with device extensions)
Binary support
- `tsh` for macOS gives a sane error message for `tsh device enroll` attempts.

Device support commands
- `tsh device collect` (macOS)
- `tsh device asset-tag` (macOS)
- `tsh device collect` (Windows)
- `tsh device asset-tag` (Windows)
- `tsh device collect` (Linux)
- `tsh device asset-tag` (Linux)

Hardware Key Support @Joerger
Hardware Key Support is an Enterprise feature and is not available for OSS.
You will need a YubiKey 4.3+ to test this feature.
This feature has additional build requirements, so it should be tested with a pre-release build (e.g. https://cdn.teleport.dev/teleport-ent-v16.0.0-alpha.2-linux-amd64-bin.tar.gz).

Server Access
This test should be carried out on Linux, MacOS, and Windows.
Set `auth_service.authentication.require_session_mfa: hardware_key_touch` in your cluster auth settings and log in (a hedged config snippet follows). Verify:
- `tsh login`
- `tsh ssh`
- `tsh proxy db --tunnel`
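A hedged snippet of the auth setting named above as it would appear in the auth service config file; the surrounding fields are assumptions:

```yaml
# teleport.yaml (Auth Service) - only the last line is the setting under test.
auth_service:
  authentication:
    type: local
    second_factor: "on"              # assumed; any working MFA setup is fine here
    require_session_mfa: hardware_key_touch
```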
HSM Support @nklaassen
Docs
Run the full test suite with each HSM/KMS:
Moderated session @eriktate
Create two Teleport users, a moderator and a user. Configure Teleport roles to require that the moderator moderate the user's sessions (see the hedged role sketch after this list). Use `TELEPORT_HOME` to `tsh login` as the user in one terminal, and as the moderator in another. Ensure the default `terminationPolicy` of `terminate` has not been changed.
For each of the following cases, create a moderated session with the user using `tsh ssh` and join this session with the moderator using `tsh join --role moderator`:
- `Ctrl+C` in the user terminal disconnects the moderator, as the session has ended.
- `Ctrl+C` in the moderator terminal disconnects the moderator and terminates the user's session, as the session no longer has a moderator.
- `t` in the moderator terminal terminates the session for all participants.
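A hedged sketch of the two roles; the role names, login and filter expression are assumptions, so check the moderated sessions docs for the exact policy grammar:

```yaml
kind: role
version: v7
metadata:
  name: moderated-user
spec:
  allow:
    logins: ["ubuntu"]                          # assumed login
    node_labels:
      "*": "*"
    require_session_join:
      - name: Require a moderator
        filter: contains(user.roles, "moderator")   # filter grammar assumed; verify against docs
        kinds: ["ssh"]
        modes: ["moderator"]
        count: 1
---
kind: role
version: v7
metadata:
  name: moderator
spec:
  allow:
    join_sessions:
      - name: Moderate sessions
        roles: ["moderated-user"]
        kinds: ["ssh"]
        modes: ["moderator"]
```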
Performance @rosstimothy @fspmarshall @espadolini
Scaling Test
Scale up the number of nodes/clusters a few times for each configuration below.
Perform simulated load testing on non-cloud backends
Perform ansible-like load testing on cloud backends
Perform the following additional scaling tests on a single backend:
Soak Test
Run 30 minute soak test directly against direct and tunnel nodes
and via label based matching. Tests should be run against a Cloud
tenant.
Concurrent Session Test
Run a concurrent session test that will spawn 5 interactive sessions per node in the cluster:
Robustness
- Verify that a lack of connectivity to Auth does not prevent access to resources which do not require a moderated session and in async recording mode from an already issued certificate.
- Verify that a lack of connectivity to Auth prevents access to resources which require a moderated session and in async recording mode from an already issued certificate.
- Verify expected behavior when Auth and/or Proxy are restarted.
Teleport with Cloud Providers
AWS @hugoShaka
GCP @marcoandredinis
IBM @hugoShaka
(`endpoint` parameter #48760 and out of date docs)

Application Access @gabrielcorado
- Verify that `debug_app: true` works.
- Verify that an application is accessible at `name.rootProxyPublicAddr` as well as `publicAddr`.
- Verify access at `name.rootProxyPublicAddr`.
- Verify that `app.session.start` and `app.session.chunk` events are created in the Audit Log.
- Verify that `app.session.chunk` points to a 5 minute session archive with multiple `app.session.request` events inside.
- Verify that `tsh play <chunk-id>` can fetch and print a session chunk archive.
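A minimal `app_service` sketch for the `debug_app` and `publicAddr` checks above; the app name, URI and proxy address are assumptions:

```yaml
# teleport.yaml (app agent)
app_service:
  enabled: yes
  debug_app: true                  # serves the built-in debug application
  apps:
    - name: example
      uri: http://localhost:8080   # assumed internal app for the publicAddr checks
```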
- AWS CLI access: `tsh apps login`, `tsh` commands, `tsh aws`, `tsh aws --endpoint-url` (this is a hidden flag).
- Azure CLI access: `tsh apps login`, `tsh az` commands, `tsh proxy az` and `az` commands.
- GCP CLI access: `tsh apps login`, `tsh gcloud` commands, `tsh gsutil` commands, `tsh proxy gcloud` and `gcloud`/`gsutil` commands.
- Dynamic registration: apps can be created with `tctl create`, updated with `tctl create -f`, and deleted with `tctl rm`.
- "Add Application" links to documentation.
Database Access @greedy52
Some tests are marked with "covered by E2E test" and automatically completed by default. In case the E2E test is flaky or disabled, deselect the task for manual testing.
IMPORTANT: for this round of testing, please pick a signature algorithm suite other than the default `legacy`. See RFD 136. @greedy52 @Tener @GavinFrazar
- (`select pg_sleep(10)` followed by ctrl-c is a good query to test.)
- (test `valkey` if possible) @GavinFrazar
- With `assume_role_arn: ""` and `external_id: "<id>"`
- With `assume_role_arn: ""` and `external_id: "<id>"`
Verify all supported modes: `keep`, `best_effort_drop`.
- `db.session.start` is emitted when you connect.
- `db.session.end` is emitted when you disconnect.
- `db.session.query` is emitted when you execute a SQL query.
- `tsh db ls` shows only databases matching the role's `db_labels`.
- Access is limited to the role's `db_users`.
- Access is limited to the role's `db_names`.
  - `db.session.start` is emitted when a connection attempt is denied.
- Queries are limited to the role's `db_names`.
  - `db.session.query` is emitted when a command fails due to permissions.
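A hedged role sketch for the RBAC checks above; the label, database account, and database names are assumptions:

```yaml
kind: role
version: v7
metadata:
  name: db-restricted
spec:
  allow:
    db_labels:
      env: ["dev"]          # tsh db ls should only show matching databases
    db_users: ["alice"]     # connecting as other database accounts should be denied
    db_names: ["metrics"]   # connecting to / querying other databases should be denied
```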
- Can connect with `tsh db connect`.
- Can be created with `tctl create`.
- Can be updated with `tctl create -f`.
- Can be deleted with `tctl rm`.
- Please configure discovery in the Discovery Service instead of the Database Service.
- Can detect and register databases in an external AWS account when `assume_role_arn` and `external_id` is set.
- Can detect and register databases in an external AWS account when `assume_role_arn` and `external_id` is set.
- Registered databases are shown in the Web UI with `name`, `description`, `type`, and `labels`.
Step 2
login value matching the rowsname
columnlabels
tsh bench
load tests (instructions on Notion -> Database Access -> Load test) @Tenertsh play
) @TenerTLS Routing @greedy52
v2
configuration starts only a single listener for proxy service, in contrast withv1
configuration.Given configuration:
*:3080
for proxy service. Given the configuration above, 3022 and 3025 will be opened for other services.v1
, there should be additional ports 3023 and 3024.multiplex
modeauth_service.proxy_listener_mode: "multiplex"
web_proxy_addr == tunnel_addr
tsh db connect
works through proxy running inmultiplex
mode @GavinFrazartsh proxy db
with a GUI client.multiplex
modessh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" [email protected]
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" [email protected]
tsh ssh
access through proxy running in multiplex modemultiplex
mode, usingtsh
multiplex
mode behind L7 load balancer @greedy52tsh login
andtctl
tsh ssh
andtsh config
tsh proxy db
andtsh db connect
tsh proxy app
andtsh aws
tsh proxy kube
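A hedged sketch of the `v2`/`multiplex` proxy configuration referenced in the TLS Routing section above; the addresses and surrounding fields are assumptions to adapt to the test cluster:

```yaml
# teleport.yaml - single multiplexed proxy listener on 3080.
version: v2
auth_service:
  enabled: yes
  proxy_listener_mode: multiplex
proxy_service:
  enabled: yes
  web_listen_addr: "0.0.0.0:3080"
  public_addr: "teleport.example.com:3080"
  tunnel_public_addr: "teleport.example.com:3080"   # web_proxy_addr == tunnel_addr
```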