Talos Linux Flux boot and troubleshooting

After setting up the cluster I needed to boot it from the Flux repository using our GitOps setup. I ran into a few issues with Talos Linux not being healthy.

Flux config on new cluster

As we have our homelab configured for GitOps the setup should be fairly easy.

First check that flux prechecks pass and then bootstrap the cluster for flux. (making sure our GitHub creds are in environment variables)

flux check --pre

flux bootstrap github \
  --owner=$GITHUB_USER \
  --repository=homelab \
  --branch=master \
  --path=./clusters/staging \
  --personal

Setup flux secret

We have existing secrets that have been encrypted with sops, so we need to manually create the secret in order for those secrets to be decrypted successfully.

# Add the private key to cluster
cat age.agekey |
kubectl create secret generic sops-age \
--namespace=flux-system \
--from-file=age.agekey=/dev/stdin

When I attempted to reconcile the flux cluster there were issues, and the cluster was unhealthy.

health

It turned out that some Talos Linux components where not coming up on one of the machines. I used various commands to check logs, edit settings and finally drained the node and deleted from the cluster.

# status of control plane nodes
talosctl -n 192.168.1.53 containers --kubernetes  

talosctl -n 192.168.1.53 logs -k  kube-system/kube-scheduler-rosie:kube-scheduler:3bbfaf624942

# debugging startup
talosctl -n 192.168.1.53 logs controller-runtime


# edit time settings
talosctl edit machineconfig -n 192.168.1.53 --mode=reboot

# cordon and delete bad nodes
# cordon and drain
talosctl -n 192.168.1.53 reset 

# delete node
kubectl delete node rosie

The dashboard command lets you get access to the running nodes stats, pretty cool.

# dashboard for a node
talosctl -n 192.168.1.53 dashboard

dashboard

The cluster is still not quite right as I need to get a storage setup on my NAS, so will get that done soon!

https://github.com/m4ttbr1tt/homelab