
Automating Talos Kubernetes Deployments on Proxmox - Part 1

Zachary Thill
· 10 min read

Abstract

This article is part 1 of a three-part series. The premise of the series is to show you how to deploy Talos Kubernetes clusters onto Proxmox, install Helm charts to them, and automate the entire process with DevOps tools like Terraform, Helmfile, and GitLab CI/CD. The series assumes you're already familiar with these concepts, tools, and technologies.

Part 1 covers the Terraform portion, part 2 goes over Helmfile and standardizing application releases across environments, and part 3 covers the GitLab CI/CD automation that ties everything together into one cohesive code ecosystem.

All code referenced in this article can be found on my GitLab here.

Talos Linux

Talos Linux is an immutable operating system designed to run Kubernetes, and only Kubernetes, in a minimal and secure manner. What separates Talos from the crowd is that we can configure our own custom cloud image on their image factory website, download it onto our platform, and bootstrap virtual machines incredibly easily with their official Terraform provider. Talos is also very secure: all communication with the nodes happens through API calls authenticated with mTLS, so there is no need for extraneous packages and underlying services like openssh-server, which reduces the overall attack surface of the system. Now that you're familiar with Talos Linux and its benefits, we can move on to the actual meat and potatoes of this article.

Terraform Module

I want to start by mentioning that this isn't going to be your typical Terraform module, where we create a folder, write some Terraform code, and then reference it with a module block, passing in some common input variables. Doing it that way would require us to create another folder, write even more Terraform code, and possibly define tfvars files with values tied to our environments/workspaces (if your backend supports them). The method I'm about to show you instead lets us configure and deploy a cluster with as little as a single YAML file, and standardize configurations per environment regardless of whether your Terraform backend supports workspaces.
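For reference, the pattern being avoided would look something like the sketch below, with a root module per cluster calling a shared child module; the module path and extra inputs here are purely illustrative and not part of this repo:

module "talos_cluster" {
  source = "./modules/talos-cluster"

  env                = "dev"
  cluster_name       = "talos-internal"
  master_count       = 3
  worker_count       = 3
  talos_version      = "v1.8.0"
  kubernetes_version = "v1.30.5"
}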

Folder Structure

.
├── README.md
├── configs
│   ├── dev
│   │   ├── talos-internal
│   │   │   └── values.yaml      # Cluster specific values
│   │   └── values.yaml          # Dev default values
│   └── production
│       ├── talos-internal
│       │   └── values.yaml      # Cluster specific values
│       └── values.yaml          # Production default values
├── kubernetes.sh                # Bash script used to help manage the kubernetes cluster lifecycle
└── terraform
    ├── backend.tf               # Configures Providers & HTTP Backend to Gitlab
    ├── data.tf                  # Terraform data resources
    ├── image
    │   └── schematic.yaml       # Talos linux image configuration
    ├── inputs.tf                # Terraform variables
    ├── main.tf                  # Terraform resources
    ├── outputs.tf               # Terraform output resources
    └── templates
        └── machine-config.tftpl # Talos machine configurations (controlplane & workers)

Variables & Values

In this Terraform module, only two input variables need to be passed in order to deploy our Kubernetes cluster:

variable "env" {
  type        = string
  description = "The name of the environment in which the kubernetes cluster will be deployed"
}

variable "cluster_name" {
  type        = string
  description = "The name of the kubernetes cluster being deployed"
}

When we execute our Terraform code, we can pass these values either as environment variables or with the --var flag, like so:

terraform init
terraform plan --var=cluster_name=talos-internal --var=env=dev
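
Alternatively, since Terraform also reads variables from TF_VAR_-prefixed environment variables, the same plan can be run without any flags, which comes in handy for the GitLab CI/CD jobs covered in part 3:

export TF_VAR_env=dev
export TF_VAR_cluster_name=talos-internal
terraform plan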

From the folder structure defined above, you can see that the configs directory contains subfolders representing our environments, with nested folders representing our Kubernetes clusters and holding their respective configuration values. In addition, every environment folder has a values.yaml file of its own, defining the default configuration that the cluster-specific values can later override.

Here is an example of both:

Environment Defaults:

pve_cluster:
  name: cluster-1
  nodes:
    - pve-node-1

cluster:
  node_groups:
    master:
      count: 3
      cpu:
        cores: 4
        architecture: x86_64
      memory: 8196
      disk:
        interface: virtio0
        size: 250
      network_devices:
        nic0:
          bridge: vmbr0
          vlan_id: 0
          firewall: false
    worker:
      count: 3
      cpu:
        cores: 4
      memory: 8196
      disk:
        interface: virtio0
        size: 250
      network_devices:
        nic0:
          bridge: vmbr0
          vlan_id: 0
          firewall: false

Cluster Config:

# configs/dev/talos-internal/values.yaml
pve_cluster:
  name: cluster-1
  nodes:
    - pve-node-1

cluster:
  name: talos-internal-dev
  description: "K8s cluster meant for running private services"
  talos_version: v1.8.0
  kubernetes_version: v1.30.5
  node_groups:
    master:
      count: 3
      cpu:
        cores: 2
      memory: 8196
      disk:
        size: 250
      network_devices:
        nic0:
          bridge: vmbr0
          vlan_id: 0
    worker:
      count: 3
      cpu:
        cores: 4
      memory: 8196
      disk:
        size: 250
      network_devices:
        nic0:
          bridge: vmbr0
          vlan_id: 0

As you can see from the two YAML files above, they have some properties in common, some that match, and some that aren't defined at all in one or the other. This is all valid because we deep merge the two files to produce the finished cluster configuration. Natively, Terraform doesn't support deep merging of objects, but luckily a fellow engineer solved this problem for us with a custom Terraform provider called Utils, which includes a data source called utils_deep_merge_yaml.
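
To make the merge concrete, combining the dev defaults with the talos-internal values above yields a master node group roughly like this (worker group and other keys omitted), with the cluster file winning wherever both files define a key:

cluster:
  name: talos-internal-dev
  talos_version: v1.8.0
  kubernetes_version: v1.30.5
  node_groups:
    master:
      count: 3
      cpu:
        cores: 2             # overridden by the cluster config
        architecture: x86_64 # inherited from the environment defaults
      memory: 8196
      disk:
        interface: virtio0   # inherited from the environment defaults
        size: 250
      network_devices:
        nic0:
          bridge: vmbr0
          vlan_id: 0
          firewall: false    # inherited from the environment defaults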

Moving back to Terraform, it is important that we follow the given file structure, because our code contains the data source (data.utils_deep_merge_yaml.cluster_config) responsible for reading our configuration files and merging their values together, as stated above.

# terraform/data.tf
data "utils_deep_merge_yaml" "cluster_config" {
  input = [
    file("${path.root}/../configs/${var.env}/values.yaml"),
    file("${path.root}/../configs/${var.env}/${var.cluster_name}/values.yaml")
  ]
}

Once this data source has processed our configurations, we can feed the result into a local value, using the yamldecode function to convert the merged YAML into a native Terraform object.

# terraform/main.tf
locals {
  config              = yamldecode(data.utils_deep_merge_yaml.cluster_config.output)
  talos_cluster_nodes = concat(proxmox_virtual_environment_vm.master, proxmox_virtual_environment_vm.worker)
}

After these configurations are in place, we can move on to the next step.

Generate Talos Linux Image

Before we can create any virtual machines, we need the cloud image present on our Proxmox cluster so we can provision it to the VMs programmatically. To do this, we first go to the Talos image factory website, step through the options, and add any system extensions we'd like until it returns the schematic configuration:

# terraform/image/schematic.yaml
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915-ucode
      - siderolabs/intel-ucode
      - siderolabs/qemu-guest-agent

After this, using the Talos and Proxmox Terraform providers, we create the following resources:

# terraform/main.tf
resource "talos_image_factory_schematic" "main" {
  schematic = file("${path.root}/image/schematic.yaml")
}

resource "proxmox_virtual_environment_download_file" "main" {
  node_name               = element(local.config.pve_cluster.nodes, 0)
  content_type            = "iso"
  datastore_id            = "local"
  decompression_algorithm = "zst"
  overwrite               = false

  url       = data.talos_image_factory_urls.main.urls["disk_image"]
  file_name = "${local.config.cluster.name}-${local.config.cluster.talos_version}.img"
}
# terraform/data.tf
data "talos_image_factory_urls" "main" {
  talos_version = local.config.cluster.talos_version
  schematic_id  = talos_image_factory_schematic.main.id
  platform      = "nocloud"
  architecture  = "amd64"
}

The code blocks above grab the schematic ID of our customized image and download the corresponding disk image as a local file on our Proxmox cluster.

Provision Proxmox Virtual Machines

Now that our Proxmox cluster has the Talos image present on its filesystem, we can go ahead and write the code to provision the virtual machines:

# terraform/main.tf
resource "proxmox_virtual_environment_pool" "main" {
  pool_id = "${local.config.cluster.name}-pool"
}

resource "proxmox_virtual_environment_vm" "master" {
  count = local.config.cluster.node_groups.master.count

  name          = "${local.config.cluster.name}-master-${count.index + 1}"
  description   = local.config.cluster.description
  node_name     = element(local.config.pve_cluster.nodes, count.index)
  pool_id       = "${local.config.cluster.name}-pool"
  scsi_hardware = "virtio-scsi-single"

  cpu {
    cores        = local.config.cluster.node_groups.master.cpu.cores
    architecture = try(local.config.cluster.node_groups.master.cpu.architecture, "x86_64")
    type         = "host"
  }

  memory {
    dedicated = local.config.cluster.node_groups.master.memory
  }

  disk {
    interface = local.config.cluster.node_groups.master.disk.interface
    size      = local.config.cluster.node_groups.master.disk.size
    file_id   = proxmox_virtual_environment_download_file.main.id
  }

  dynamic "network_device" {
    for_each = local.config.cluster.node_groups.master.network_devices
    content {
      enabled  = true
      bridge   = network_device.value.bridge
      vlan_id  = network_device.value.vlan_id
      firewall = network_device.value.firewall
    }
  }

  agent {
    enabled = true
  }

  initialization {
    ip_config {
      ipv4 {
        address = try(element(local.config.cluster.node_groups.master.ipv4_addresses, count.index), "dhcp")
        gateway = try(local.config.cluster.default_gateway, null)
      }
    }
  }

  depends_on = [
    proxmox_virtual_environment_pool.main,
    proxmox_virtual_environment_download_file.main
  ]
}

resource "proxmox_virtual_environment_vm" "worker" {
  count = local.config.cluster.node_groups.worker.count

  name          = "${local.config.cluster.name}-worker-${count.index + 1}"
  description   = local.config.cluster.description
  node_name     = element(local.config.pve_cluster.nodes, count.index)
  pool_id       = "${local.config.cluster.name}-pool"
  scsi_hardware = "virtio-scsi-single"

  cpu {
    cores        = local.config.cluster.node_groups.worker.cpu.cores
    architecture = try(local.config.cluster.node_groups.worker.cpu.architecture, "x86_64")
    type         = "host"
  }

  memory {
    dedicated = local.config.cluster.node_groups.worker.memory
  }

  disk {
    interface = local.config.cluster.node_groups.worker.disk.interface
    size      = local.config.cluster.node_groups.worker.disk.size
    file_id   = proxmox_virtual_environment_download_file.main.id
  }

  dynamic "network_device" {
    for_each = local.config.cluster.node_groups.worker.network_devices
    content {
      enabled  = true
      bridge   = network_device.value.bridge
      vlan_id  = network_device.value.vlan_id
      firewall = network_device.value.firewall
    }
  }

  agent {
    enabled = true
  }

  initialization {
    ip_config {
      ipv4 {
        address = try(element(local.config.cluster.node_groups.worker.ipv4_addresses, count.index), "dhcp")
        gateway = try(local.config.cluster.default_gateway, null)
      }
    }
  }

  depends_on = [
    proxmox_virtual_environment_pool.main,
    proxmox_virtual_environment_download_file.main
  ]
}
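
As a side note, the master and worker resources above are nearly identical. If the duplication bothers you, one possible refactor (not used in this repo) is to flatten node_groups into a single map and drive one VM resource with for_each; a minimal sketch of the flattening, using a hypothetical local name, might look like this:

locals {
  # Hypothetical: one entry per VM, keyed by its name, carrying the group's
  # settings plus its role and index. A single proxmox_virtual_environment_vm
  # resource with for_each = local.cluster_nodes could then replace the
  # duplicated master/worker blocks.
  cluster_nodes = merge([
    for role, group in local.config.cluster.node_groups : {
      for i in range(group.count) :
      "${local.config.cluster.name}-${role}-${i + 1}" => merge(group, { role = role, index = i })
    }
  ]...)
}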

Bootstrap Talos Kubernetes Cluster

Now that we have the code for our virtual machines, we can create the resources the Talos provider needs to bootstrap the cluster. Below, you will see that we are creating machine configurations for both our master (controlplane) and worker nodes. In Talos Linux, the machine configuration defines how an individual machine/node should be set up and configured. The client configuration, on the other hand, also known as the talosconfig, is the file that talosctl uses to connect to and interact with the Talos nodes.

# terraform/data.tf
data "talos_machine_configuration" "master" {
  count            = local.config.cluster.node_groups.master.count
  cluster_name     = local.config.cluster.name
  machine_type     = "controlplane"
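  # ipv4_addresses is a per-interface list of addresses reported by the QEMU
  # guest agent; index 7 is the interface carrying the node's primary address
  # in this setup (the exact index depends on how the agent enumerates NICs).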
  cluster_endpoint = "https://${proxmox_virtual_environment_vm.master[0].ipv4_addresses[7][0]}:6443"
  machine_secrets  = talos_machine_secrets.main.machine_secrets
  config_patches = [templatefile("${path.root}/templates/machine-config.tftpl", {
    hostname     = "${local.config.cluster.name}-master-${count.index + 1}"
    cluster_name = local.config.pve_cluster.name
    node_name    = element(local.config.pve_cluster.nodes, count.index)
  })]
  talos_version      = local.config.cluster.talos_version
  kubernetes_version = local.config.cluster.kubernetes_version
}

data "talos_machine_configuration" "worker" {
  count            = local.config.cluster.node_groups.worker.count
  cluster_name     = local.config.cluster.name
  machine_type     = "worker"
  cluster_endpoint = "https://${proxmox_virtual_environment_vm.master[0].ipv4_addresses[7][0]}:6443"
  machine_secrets  = talos_machine_secrets.main.machine_secrets
  config_patches = [templatefile("${path.root}/templates/machine-config.tftpl", {
    hostname     = "${local.config.cluster.name}-worker-${count.index + 1}"
    cluster_name = local.config.pve_cluster.name
    node_name    = element(local.config.pve_cluster.nodes, count.index)
  })]
  talos_version      = local.config.cluster.talos_version
  kubernetes_version = local.config.cluster.kubernetes_version
}

data "talos_client_configuration" "main" {
  cluster_name         = local.config.cluster.name
  client_configuration = talos_machine_secrets.main.client_configuration
  nodes                = [for node in local.talos_cluster_nodes : node.ipv4_addresses[7][0]]
  endpoints            = [for node in proxmox_virtual_environment_vm.master : node.ipv4_addresses[7][0]]
}
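
The machine-config.tftpl template itself isn't reproduced here (it lives in the repository), but to give you an idea, a minimal patch using the three variables passed above could look something like the following; the nodeLabels keys are just an illustration, not the repo's actual template:

# terraform/templates/machine-config.tftpl (illustrative sketch only)
machine:
  network:
    hostname: ${hostname}
  nodeLabels:
    topology.kubernetes.io/region: ${cluster_name}
    topology.kubernetes.io/zone: ${node_name}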

Moving on, here are the remaining resources responsible for instantiating the cluster. They apply the machine configuration to their respective nodes, bootstrap the Kubernetes cluster, and grab the kubeconfig file so we can manage our cluster.

# terraform/main.tf
resource "talos_machine_secrets" "main" {
  talos_version = local.config.cluster.talos_version
}

resource "talos_machine_configuration_apply" "master" {
  count                       = local.config.cluster.node_groups.master.count
  client_configuration        = talos_machine_secrets.main.client_configuration
  machine_configuration_input = data.talos_machine_configuration.master[count.index].machine_configuration
  node                        = proxmox_virtual_environment_vm.master[count.index].ipv4_addresses[7][0]

  depends_on = [proxmox_virtual_environment_vm.master]
}

resource "talos_machine_configuration_apply" "worker" {
  count                       = local.config.cluster.node_groups.worker.count
  client_configuration        = talos_machine_secrets.main.client_configuration
  machine_configuration_input = data.talos_machine_configuration.worker[count.index].machine_configuration
  node                        = proxmox_virtual_environment_vm.worker[count.index].ipv4_addresses[7][0]

  depends_on = [proxmox_virtual_environment_vm.worker]
}

resource "talos_machine_bootstrap" "main" {
  client_configuration = talos_machine_secrets.main.client_configuration
  node                 = proxmox_virtual_environment_vm.master[0].ipv4_addresses[7][0]
  endpoint             = proxmox_virtual_environment_vm.master[0].ipv4_addresses[7][0]
}

resource "talos_cluster_kubeconfig" "main" {
  client_configuration = talos_machine_secrets.main.client_configuration
  node                 = proxmox_virtual_environment_vm.master[0].ipv4_addresses[7][0]
  endpoint             = proxmox_virtual_environment_vm.master[0].ipv4_addresses[7][0]
}

resource "local_file" "talosconfig" {
  filename        = "${path.root}/${local.config.cluster.name}-talosconfig"
  content         = data.talos_client_configuration.main.talos_config
  file_permission = "0600"

  depends_on = [data.talos_client_configuration.main]
}

resource "local_file" "kubeconfig" {
  filename        = "${path.root}/${local.config.cluster.name}-kubeconfig"
  content         = resource.talos_cluster_kubeconfig.main.kubeconfig_raw
  file_permission = "0600"

  depends_on = [resource.talos_cluster_kubeconfig.main]
}
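
Once the apply completes, a quick sanity check with the generated files (run from the terraform/ directory, using the dev cluster example from earlier) might look like this:

talosctl --talosconfig ./talos-internal-dev-talosconfig health
kubectl --kubeconfig ./talos-internal-dev-kubeconfig get nodes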

Configure Terraform Backend

Now that all the Terraform resources have been accounted for, we need to configure our Terraform backend so that we can actually run Terraform and store our state file. Looking at the configuration below, you can see that I'm declaring all the required providers, adding an (empty) Proxmox provider block, and using an http backend. More specifically, I am using GitLab-managed state, which requires the http backend, as described in the GitLab docs I've provided.

The reason you don't see any parameters defined under the backend or the Proxmox provider is that I'm passing them all as environment variables, per the Terraform and provider documentation. This way, no sensitive information is exposed to prying eyes. For more information on how to configure your backend like this, see the links I've provided above.
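
As a sketch, the exported variables (or GitLab CI/CD variables) would look roughly like the following; the project ID, state name, host, and credentials are all placeholders:

# GitLab-managed Terraform state via the http backend
export TF_HTTP_ADDRESS="https://gitlab.com/api/v4/projects/<project-id>/terraform/state/<state-name>"
export TF_HTTP_LOCK_ADDRESS="${TF_HTTP_ADDRESS}/lock"
export TF_HTTP_UNLOCK_ADDRESS="${TF_HTTP_ADDRESS}/lock"
export TF_HTTP_LOCK_METHOD="POST"
export TF_HTTP_UNLOCK_METHOD="DELETE"
export TF_HTTP_USERNAME="<gitlab-username>"
export TF_HTTP_PASSWORD="<access-token-with-api-scope>"

# bpg/proxmox provider credentials
export PROXMOX_VE_ENDPOINT="https://<proxmox-host>:8006/"
export PROXMOX_VE_INSECURE="false"
export PROXMOX_VE_USERNAME="<user>@pam"
export PROXMOX_VE_PASSWORD="<password>"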

terraform {
  required_version = "1.7.5"
  backend "http" {
  }
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "0.66.2"
    }
    talos = {
      source  = "siderolabs/talos"
      version = "0.6.1"
    }
    utils = {
      source  = "cloudposse/utils"
      version = "1.26.0"
    }
    local = {
      source = "hashicorp/local"
    }
  }
}

provider "proxmox" {}

Terraform Outputs - Optional

Now that all the required Terraform code has been written, we can add some optional outputs to our code. Terraform outputs are often useful for layering modules, consuming remote state, or simply documenting a configuration. Here are the outputs I added:

# terraform/outputs.tf
output "cluster_config" {
  value = local.config
}

output "master_machine_config" {
  value     = data.talos_machine_configuration.master
  sensitive = true
}

output "worker_machine_config" {
  value     = data.talos_machine_configuration.worker
  sensitive = true
}

output "client_configuration" {
  value     = data.talos_client_configuration.main
  sensitive = true
}

output "talosconfig" {
  value     = data.talos_client_configuration.main.talos_config
  sensitive = true
}

output "kubeconfig" {
  value     = talos_cluster_kubeconfig.main.kubeconfig_raw
  sensitive = true
}
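
Since the local_file resources already write the kubeconfig and talosconfig to disk, these outputs are mostly useful when consuming the state remotely, but you can also pull them locally if you need to, for example:

terraform output -raw kubeconfig > talos-internal-dev-kubeconfig
terraform output -raw talosconfig > talos-internal-dev-talosconfig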

Management Shell Script

As you may have noticed in the folder structure snippet towards the top of this page, there is a shell script named kubernetes.sh at the root level of our kubernetes directory. I created this script to make the lifecycle of our Kubernetes clusters easier to manage, since this isn't your typical deployment style. Below is the script, which contains a handful of functions (note that it relies on bash's PIPESTATUS to capture exit codes through the tee pipe, so it needs bash rather than a strictly POSIX shell):

#!/bin/bash

set +e

red='\033[0;31m'
green='\033[0;32m'
blue='\033[0;34m'
clear='\033[0m'

######################################
# Script Requirements
######################################
# Environment Variables - This script requires the following environment variables to be set at invocation:
# --- TF Backend ---
# TF_HTTP_ADDRESS
# TF_HTTP_LOCK_ADDRESS
# TF_HTTP_UNLOCK_ADDRESS
# TF_HTTP_LOCK_METHOD
# TF_HTTP_UNLOCK_METHOD
# TF_HTTP_USERNAME
# TF_HTTP_PASSWORD
# --- TF Provider ---
# PROXMOX_VE_ENDPOINT
# PROXMOX_VE_INSECURE
# PROXMOX_VE_USERNAME
# PROXMOX_VE_PASSWORD
# Arguments - This script requires the following arguments to be passed at invocation:
#  - OPERATION
#  - CLUSTER_NAME
#  - ENV

OPERATION=$1
CLUSTER_NAME=$2
ENV=$3

init() {
  printf "${green}####################################${clear}\n"
  printf "${blue}Initializing Terraform ( ${CLUSTER_NAME} | ${ENV} )...${clear}\n"
  cd ./terraform
  terraform init
  printf "${green}####################################${clear}\n"
}

plan() {
  init
  printf "${green}####################################${clear}\n"
  printf "${blue}Executing Terraform Plan ( ${CLUSTER_NAME} | ${ENV} )...${clear}\n"
  terraform plan --var=cluster_name=${CLUSTER_NAME} --var=env=${ENV} -detailed-exitcode 2>&1 | tee .results
  terraform_code="${PIPESTATUS[0]}"
  printf "Terraform plan exited with code: ${terraform_code}\n"
  exit ${terraform_code}
  printf "${green}####################################${clear}\n"
}

apply() {
  init
  printf "${green}####################################${clear}\n"
  printf "${blue}Executing Terraform Apply ( ${CLUSTER_NAME} | ${ENV} )...${clear}\n"
  terraform apply --var=cluster_name=${CLUSTER_NAME} --var=env=${ENV} --auto-approve 2>&1 | tee .results
  terraform_code="${PIPESTATUS[0]}"
  exit "${terraform_code}"
  printf "${green}####################################${clear}\n"
}

destroy() {
  init
  printf "${green}####################################${clear}\n"
  printf "${blue}Executing Terraform Destroy ( ${CLUSTER_NAME} | ${ENV} )...${clear}\n"
  terraform destroy --var=cluster_name=${CLUSTER_NAME} --var=env=${ENV}
  printf "${green}####################################${clear}\n"
}

if [ "${OPERATION}" = "plan" ] || [ "${OPERATION}" = "apply" ] || [ "${OPERATION}" = "destroy" ]; then
  eval ${OPERATION}
else
  printf "${red}ERROR: invalid operation${clear} - (${OPERATION})\n"
  printf "${green}Valid options: (plan,apply,destroy,helm-apply,helm-plan) ${clear}\n"
fi

To use this script, make sure all the required backend and provider environment variables are set, then run the following command:

. ./kubernetes.sh <operation> <cluster-name> <env>

Final Note

If you made it all the way through this article, I cannot thank you enough. I know we're only a third of the way through this series, but it has been quite the arduous journey so far, to say the least. I really hope this information was insightful and will be useful for your future endeavors. Don't hesitate to leave a comment on this post; I'd love to hear your feedback. If you liked this article and want to see more, subscribe to our newsletter and stay tuned for parts 2 and 3.