docs: High availability explanation page #940

Merged 3 commits on Jan 10, 2025
44 changes: 44 additions & 0 deletions docs/src/snap/explanation/high-availability.md
@@ -0,0 +1,44 @@
# High Availability

High availability (HA) is a core feature of {{ product }}, ensuring that
a Kubernetes cluster remains operational and resilient, even when nodes or
critical components encounter failures. This capability is crucial for
maintaining continuous service for applications and workloads running in
production environments.

HA is automatically enabled in {{ product }} for clusters with three or
more nodes, regardless of the deployment method. By distributing key components
across multiple nodes, HA reduces the risk of downtime and service
interruptions, offering built-in redundancy and fault tolerance.
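
As an illustration, the following is a minimal sketch of forming a three-node
cluster with the `k8s` snap and confirming that HA is active. The node names
are placeholders, and the exact commands and `k8s status` output may differ
between releases, so treat this as an outline rather than a canonical
walkthrough.

```bash
# On the first node: bootstrap the cluster.
sudo k8s bootstrap

# Still on the first node: create a join token for each additional node
# ("node-2" and "node-3" are placeholder hostnames).
sudo k8s get-join-token node-2
sudo k8s get-join-token node-3

# On each additional node: join the cluster with its token.
sudo k8s join-cluster <token>

# With three or more nodes joined, HA is enabled automatically;
# `k8s status` reports the cluster and datastore state.
sudo k8s status
```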

## Key Components of a Highly Available Cluster

A highly available Kubernetes cluster exhibits the following characteristics:

### 1. **Multiple Nodes for Redundancy**

Having multiple nodes in the cluster ensures workload distribution and
redundancy. If one node fails, workloads can be rescheduled on other available
nodes without disrupting services. This node-level redundancy minimizes the
impact of hardware or network failures.
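
As a sketch of what node-level redundancy provides, the commands below run a
workload with several replicas and show where the pods were scheduled. The
deployment name and image are placeholders, and any configured `kubectl` can
be used in place of the built-in `k8s kubectl`.

```bash
# Run a simple workload with three replicas ("web" and "nginx" are
# placeholders used only for illustration).
sudo k8s kubectl create deployment web --image=nginx --replicas=3

# See which node each replica landed on.
sudo k8s kubectl get pods -o wide
```

If one of the nodes becomes unavailable, the replicas it was running are
rescheduled onto the remaining nodes and the workload keeps serving traffic.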

### 2. **Control Plane Redundancy**

The control plane manages the cluster’s state and operations. For high
availability, the control plane components—such as the API server, scheduler,
and controller-manager—are distributed across multiple nodes. This prevents a
single point of failure from rendering the cluster inoperable.
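
One way to observe this redundancy, assuming a running cluster, is to list the
nodes and their roles; in a highly available cluster more than one node
reports the control-plane role (the exact role labels depend on the Kubernetes
version and on how each node was joined).

```bash
# Several nodes carrying the control-plane role means the API server,
# scheduler and controller-manager are not tied to a single machine.
sudo k8s kubectl get nodes
```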

### 3. **Highly Available Datastore**

By default, {{ product }} uses **dqlite** to manage the Kubernetes
cluster state. Dqlite leverages the Raft consensus algorithm for leader
election and voting, ensuring reliable data replication and failover
capabilities. When a leader node fails, a new leader is elected seamlessly
without administrative intervention. This mechanism allows the cluster to
remain operational even in the event of node failures. For example, a
three-node cluster retains a Raft majority (two of its three voters) even if
one node is lost, so the datastore stays available. More details on
replication and leader elections can be found in
the [dqlite replication documentation][dqlite-replication].

<!-- LINKS -->
[dqlite-replication]: https://dqlite.io/docs/explanation/replication
1 change: 1 addition & 0 deletions docs/src/snap/explanation/index.md
@@ -17,6 +17,7 @@ channels
clustering
ingress
epa
high-availability
security
cis
```