cluster
Deploys a Kubernetes cluster with Cluster API and KubeVirt.
Includes the following addons:
- Calico
- KubeVirt Cloud Controller Manager for exposing LoadBalancer type services.
- Traefik
- OpenEBS
- Cert Manager
- External DNS
- External Secrets
- OpenTelemetry Collector
- Velero
Prerequisites
The chart requires the following addons to be installed on the host cluster:
- Argo CD
- KubeVirt
- Cluster API with Cluster API Provider KubeVirt
- Rook
- External Secrets
- CoreWarden DNS
- Load balancer implementation
- Kyverno with policies for generating Argo CD cluster secrets and External Secrets SecretStores
This chart is built specifically for the Perseus Cloud project; unless you are running a cluster as part of that project, it will likely not work for you.
The chart requires a virtual machine image built with the Kubernetes Image Builder project, available for download over HTTP(S).
Make sure to set config.image to the URL of the virtual machine image.
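For example, a minimal values sketch (the image URL below is hypothetical; replace it with the location of your own image):

```yaml
# Inside values.yaml
config:
  # Hypothetical image URL; point this at your published Image Builder artifact.
  image: https://images.example.com/rocky-10.1-k8s-1.34.2.qcow2
```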
Secrets containing a Cloudflare token and CoreWarden API credentials are required on the host cluster.
Make sure to create the secrets, specify their namespace in
config.externalSecrets.remoteNamespace, and authorize access to them via config.rbac.
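A sketch of the relevant values, assuming the secrets live in the tenant-secrets namespace and use the default names from the config.rbac reference below:

```yaml
# Inside values.yaml
config:
  externalSecrets:
    # Namespace on the host cluster containing the tenant secrets.
    remoteNamespace: tenant-secrets
  rbac:
    namespace: tenant-secrets
    rules:
      - apiGroups: [""]
        resources: ["secrets"]
        verbs: ["get", "list", "watch"]
        # Names of the secrets the cluster is allowed to read.
        resourceNames: ["cert-manager", "external-dns", "backup"]
```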
Tutorial
This tutorial will take you through creating a cluster on the Perseus Cloud platform.
The cluster chart needs a domain name for Gateway, Cert Manager and External DNS
functionality. This tutorial will use example.com.
Create a Cloudflare API token with edit access to the DNS zone of the domain
you will be using for the cluster.
Insert it into a secret in the tenant-secrets namespace:
apiVersion: v1
kind: Secret
metadata:
  name: cert-manager
  namespace: tenant-secrets
type: Opaque
data:
  cloudflareToken: REDACTED
Create a CoreWarden DNS service account with edit access to the DNS zone of the
domain you will be using for the cluster.
Insert it into a secret in the tenant-secrets namespace:
apiVersion: v1
kind: Secret
metadata:
  name: external-dns
  namespace: tenant-secrets
type: Opaque
data:
  id: REDACTED
  secret: REDACTED
Create a values file with the following content. Replace example.com with your
domain name.
# Inside values.yaml
config:
  externalDNSWebhook:
    zones:
      - example.com.
  gateway:
    listeners:
      - hostname: "*.example.com"
Install the chart:
helm upgrade --install cluster oci://ghcr.io/sneakybugs/cluster --version 6.0.2 --values values.yaml
Wait for all Argo Applications to become ready.
Get the Kubeconfig and use it:
kubectl get secret cluster-kubeconfig -o json | jq -r .data.value | base64 -d > tenant.conf
export KUBECONFIG="$(pwd)/tenant.conf"
Now you can access the tenant cluster API server:
kubectl get pods -A
You now have a fully functional cluster.
How-to guides
How to upgrade a cluster’s Kubernetes version
This guide assumes you have a cluster of version v1.34.2 installed with the
my-cluster release name.
Edit the config.version and config.image fields of your values to contain a
new version and new VM image for that version:
# Inside values.yaml
# ...
config:
  version: v1.35.0
  image: https://objects.infra.f6n.io/images/rocky-10.1-k8s-1.35.0-5986a60-20260102.qcow2
Applying the new values will cause a rollout of both control plane and worker nodes. Upgrade the chart to apply the new values:
helm upgrade my-cluster oci://ghcr.io/sneakybugs/cluster --version 6.0.2 --values values.yaml
Use the following command on the tenant cluster to see how many nodes have been rolled to the new version:
kubectl get nodes
Once all nodes of the tenant cluster are of the new version, the cluster is upgraded.
How to restore a cluster from a backup
This guide assumes you want to restore from a cluster called old-cluster in the
example namespace.
Copy the values of the old-cluster Helm release to a file named values.yaml.
Override the following values:
# Inside values.yaml
# ...
config:
  cephCSIRBD:
    existingRadosNamespace: example-old-cluster
  velero:
    storage:
      prefix: example-old-cluster
      existingObjectBucketClaim: example-old-cluster-velero
      existingObjectBucketUser: example-old-cluster-velero-backup
Install the chart with a new name. We will use new-cluster for this guide.
helm upgrade --install new-cluster oci://ghcr.io/sneakybugs/cluster --version 6.0.2 --values values.yaml
Wait for all Argo applications to become synced and healthy. Get the new cluster’s Kubeconfig:
kubectl get secret new-cluster-kubeconfig -o json | jq -r .data.value | base64 -d > tenant.conf
export KUBECONFIG="$(pwd)/tenant.conf"
Create a restore.yaml file with the following content.
For the purpose of this guide we will restore from a backup called my-backup.
# Inside restore.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: old-cluster-restore
spec:
  backupName: my-backup
Create the restore resource:
kubectl apply -f restore.yaml
Wait until the restore is completed.
You now have a restored replica of old-cluster.
How to destroy a cluster
To destroy a cluster, uninstall the chart:
helm uninstall my-cluster
For clusters with features.backups enabled, the chart keeps a
CephBlockPoolRadosNamespace, an ObjectBucketClaim and a CephObjectStoreUser.
Use the following commands to identify resources that were kept:
kubectl get -n rook-ceph cephblockpoolradosnamespace
kubectl get -n rook-ceph objectbucketclaim
kubectl get -n rook-ceph cephobjectstoreuser
Only delete the CephBlockPoolRadosNamespace and ObjectBucketClaim IF YOU WANT TO LOSE BACKUP DATA.
Configuration reference
| Parameter | Description | Default |
|---|---|---|
| nameOverride | Override chart name. | "" |
| fullnameOverride | Override full release name. | "" |
| argocdNamespace | Namespace to deploy Argo CD resources to. | "argocd" |
| versions.calico | Calico version to deploy. | "v3.30.3" |
| versions.certManager | Cert Manager version to deploy. | "v1.19.1" |
| versions.components | cluster-components chart version to deploy. | "9.0.0" |
| versions.telemetryExporterComponents | telemetry-exporter-components chart version to deploy. | "4.0.0" |
| versions.externalDNS | ExternalDNS version to deploy. | "1.19.0" |
| versions.externalSecrets | External Secrets version to deploy. | "0.20.3" |
| versions.envoyGateway | Envoy Gateway version to deploy. | "1.5.4" |
| versions.kubeStateMetrics | Kube State Metrics chart version to deploy. | "6.3.0" |
| versions.kubePrometheusStack | kube-prometheus-stack chart version to deploy. | "78.2.1" |
| versions.prometheusNodeExporter | Node exporter chart version to deploy. | "4.48.0" |
| versions.prometheusOperatorCRDs | Prometheus Operator CRDs chart version to deploy. | "24.0.1" |
| versions.openTelemetryOperator | OpenTelemetry Operator chart version to deploy. | "0.97.1" |
| versions.velero | Velero chart version to deploy. | "11.1.1" |
| versions.veleroPluginForAWS | Velero AWS plugin version to deploy. | "1.12.2" |
| versions.kro | Kro version to deploy. | "0.7.1" |
| versions.cephCSIRBD | ceph-csi-rbd version to deploy. | "3.15.0" |
| versions.metricsServer | Metrics server version to deploy. | "3.13.0" |
| versions.externalSnapshotter | external-snapshotter version to deploy. | "v8.4.0" |
| features.backups | Enable Velero backups when true. When enabled, the CephBlockPoolRadosNamespace, ObjectBucketClaim and CephObjectStoreUser are kept when deleting the chart for use when restoring backups. | true |
| features.telemetryExporter | Enable OpenTelemetry exporter when true. | true |
| features.kro | Enable Kro when true. | true |
| features.components | Disable cluster-components chart when false. | true |
| features.exporterComponents | Disable telemetry-exporter-components chart when false. | true |
| config.podSubnet | Pod subnet to use. | "10.243.0.0/16" |
| config.serviceSubnet | Service subnet to use. | "10.95.0.0/16" |
| config.version | Kubernetes version of the cluster. Triggers control plane rollout when changed. | "v1.34.2" |
| config.image | Node image to use. Triggers node rollout when changed. | "https://objects.infra.f6n.io/images/rocky-10.1-k8s-1.34.2-20251221.qcow2" |
| config.controlPlane.replicas | Control plane node count. | 1 |
| config.controlPlane.resources.storage | Control plane node disk size. Triggers node rollout when changed. | "16Gi" |
| config.controlPlane.resources.cores | Control plane node core count. Triggers node rollout when changed. | 2 |
| config.controlPlane.resources.memory | Control plane node RAM size. Triggers node rollout when changed. | "4Gi" |
| config.workers.replicas | Worker node count. | 1 |
| config.workers.resources.storage | Worker node disk size. Triggers node rollout when changed. | "32Gi" |
| config.workers.resources.cores | Worker node core count. Triggers node rollout when changed. | 4 |
| config.workers.resources.memory | Worker node RAM size. Triggers node rollout when changed. | "8Gi" |
| config.cephCSIRBD.existingRadosNamespace | When specified, the cluster will use an existing Rados namespace instead of creating one. | null |
| config.cephCSIRBD.rookNamespace | Namespace Rook is installed in on the management cluster. | "rook-ceph" |
| config.cephCSIRBD.blockPoolName | CephBlockPool name to use for volumes. | "ceph-blockpool" |
| config.cephCSIRBD.cephMonitors | Ceph mon endpoints for connecting to the Ceph cluster. | ["10.1.0.10:6789"] |
| config.imageRegistries | Image registry overrides. Require node rollout to take effect. | [{"prefix": "docker.io", "location": "oci.infra.f6n.io/docker"}, {"prefix": "quay.io", "location": "oci.infra.f6n.io/quay"}, {"prefix": "ghcr.io", "location": "oci.infra.f6n.io/ghcr"}, {"prefix": "registry.k8s.io", "location": "oci.infra.f6n.io/k8s"}, {"prefix": "oci.external-secrets.io", "location": "oci.infra.f6n.io/external-secrets"}] |
| config.gateway.listeners | List of Gateway listeners. Must specify hostname, can optionally specify protocol, port and tls. | [{"hostname": "*.example.com"}] |
| config.otlpExporter.endpoint | Centralized OpenTelemetry Collector endpoint to export telemetry to. | "otel.ops.f6n.io:4317" |
| config.externalSecrets.remoteNamespace | Namespace on the host cluster containing the tenant secrets. | "tenant-secrets" |
| config.externalSecrets.url | Host cluster API server URL used by External Secrets SecretStores. | "https://10.1.0.10:6443" |
| config.externalDNSWebhook.repository | Image repository of the ExternalDNS webhook provider. | "ghcr.io/sneakybugs/corewarden-externaldns-provider" |
| config.externalDNSWebhook.apiEndpoint | DNS API server endpoint. | "https://dns.infra.f6n.io/v1" |
| config.externalDNSWebhook.tag | Image tag of the ExternalDNS webhook provider. | "4.1.3" |
| config.certManager | Values for cluster-components chart certManager field. | {} |
| config.velero.storage.rookNamespace | Rook namespace to use for the backup ObjectBucketClaim in the management cluster. | "rook-ceph" |
| config.velero.storage.cephObjectStoreName | Rook CephObjectStore to use for the bucket in the management cluster. | "ceph-objectstore" |
| config.velero.storage.cephObjectBucketStorageClassName | Rook StorageClass name to use for the bucket in the management cluster. | "ceph-bucket" |
| config.velero.storage.s3Url | S3 endpoint URL for backup storage. | "https://objects.infra.f6n.io" |
| config.velero.storage.prefix | Prefix to store backups under in the S3 bucket. When empty, defaults to the namespaced release name. | "" |
| config.velero.storage.accessMode | Backup location access mode, ReadWrite or ReadOnly. | "ReadWrite" |
| config.velero.storage.existingObjectBucketClaim | When specified, use an existing ObjectBucketClaim instead of creating one. | null |
| config.velero.storage.existingObjectBucketUser | When specified, use an existing CephObjectStoreUser instead of creating one. | null |
| config.velero.backup.schedule | Cron schedule to perform backups at. | "0 4 * * *" |
| config.velero.backup.ttl | Time to keep backups. | "720h" |
| config.rbac.namespace | Namespace for the role in the management cluster. | "tenant-secrets" |
| config.rbac.rules | Role rules for the namespace in the management cluster. | [{"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list", "watch"], "resourceNames": ["cert-manager", "external-dns", "backup"]}, {"apiGroups": ["authorization.k8s.io"], "resources": ["selfsubjectrulesreviews"], "verbs": ["create"]}] |
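As an illustration of the config.gateway.listeners fields, a values sketch; only hostname is required, and the second listener's hostname, protocol and port values are hypothetical examples:

```yaml
# Inside values.yaml
config:
  gateway:
    listeners:
      # Minimal listener; only hostname is required.
      - hostname: "*.example.com"
      # Hypothetical listener overriding the optional protocol and port fields.
      - hostname: "plain.example.com"
        protocol: HTTP
        port: 80
```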