Lab 4: Node Lifecycle and Upgrade Planning
Time: 60 minutes
Objective: Practice safe node maintenance (cordon, drain, uncordon) and build a kubeadm upgrade plan with version-skew checks.
The Story
Routine node maintenance causes outages when teams skip sequencing and treat drain errors like annoyances. This lab trains disciplined operations: respect disruption budgets, keep scheduling intent explicit, and plan upgrades around version-skew rules before touching production nodes.
CKA Objectives Mapped
- Manage node scheduling and maintenance operations
- Plan and execute cluster lifecycle/upgrade tasks safely
- Troubleshoot drain failures and disruption constraints
Background: Node Lifecycle Operations
Node maintenance in Kubernetes is three separate operations with different effects. cordon marks a node SchedulingDisabled, which stops new placements but leaves existing pods running. drain actively evicts pods and then leaves the node cordoned, which is what makes it safe for maintenance. uncordon removes the scheduling block so the node can receive new pods again.
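Cordon state is stored on the node object as spec.unschedulable, so you can check it directly instead of reading the STATUS column. A minimal sketch, assuming kubectl points at the lab cluster; node_is_cordoned is a hypothetical helper name, not a kubectl subcommand:

```shell
# Sketch: succeed (exit 0) when the node is cordoned.
# node_is_cordoned is a hypothetical helper for this lab.
node_is_cordoned() {
  local node="$1" flag
  # spec.unschedulable is absent on a schedulable node, so jsonpath prints nothing.
  flag="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}')"
  [ "$flag" = "true" ]
}
```

After kubectl cordon ops-worker, running node_is_cordoned ops-worker && echo cordoned should print cordoned. After uncordon the function returns non-zero.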
kubectl drain is not a blind delete; it uses the Eviction API, which checks PodDisruptionBudgets before allowing each pod to move. A PDB encodes availability policy such as minimum healthy replicas, so eviction is denied if moving a pod would violate that threshold. This is the protection that keeps voluntary disruptions like maintenance from becoming accidental outages.
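The PDB check comes down to simple arithmetic for an integer minAvailable budget: allowed disruptions are the healthy pods minus the required minimum, never below zero. A minimal sketch of that calculation; pdb_allowed_disruptions is a hypothetical helper, and percentage values and maxUnavailable budgets are not handled here:

```shell
# Sketch: disruptions a minAvailable PDB would allow.
# Usage: pdb_allowed_disruptions HEALTHY_PODS MIN_AVAILABLE
pdb_allowed_disruptions() {
  local healthy="$1" min_available="$2" allowed
  allowed=$(( healthy - min_available ))
  # A budget never reports a negative allowance.
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}
```

With two healthy replicas, pdb_allowed_disruptions 2 2 prints 0 (every eviction is denied), while pdb_allowed_disruptions 2 1 prints 1.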
For upgrades, draining first keeps workload continuity while node software changes happen. The standard pattern is cordon and drain a worker, upgrade node packages, run kubeadm upgrade node, then uncordon. That sequence keeps traffic on healthy nodes while one node is temporarily out of service.
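Version skew rules are strict and drive upgrade order. The kubelet must never be newer than the API server, so the control plane upgrades first and workers follow. The upstream skew policy lets a kubelet lag the API server by up to three minor versions (two for kubelets older than 1.25), but keeping every node within one minor version is the conservative target for this lab. Minor versions cannot be skipped in supported kubeadm workflows, so a jump like 1.29 to 1.31 must be two sequential upgrades through 1.30.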
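The skew and no-skip rules reduce to arithmetic on minor versions. A minimal sketch, assuming version strings shaped like v1.MINOR.PATCH and major version 1. The names minor_of, skew_ok, and upgrade_path are hypothetical helpers, and the allowed lag is a parameter you should set from the skew policy that applies to your release:

```shell
# minor_of v1.30.2 prints 30 (assumes "v1.MINOR.PATCH" or "1.MINOR.PATCH").
minor_of() {
  local v="${1#v}"   # drop a leading "v"
  v="${v#*.}"        # drop the major version
  echo "${v%%.*}"    # keep only the minor version
}

# skew_ok API_VERSION KUBELET_VERSION [MAX_LAG]
# Succeeds when the kubelet is not newer than the API server and lags it
# by at most MAX_LAG minor versions. The default of 1 is an assumption
# chosen for this sketch, not a value read from the cluster.
skew_ok() {
  local api kubelet max_lag lag
  api="$(minor_of "$1")"
  kubelet="$(minor_of "$2")"
  max_lag="${3:-1}"
  lag=$(( api - kubelet ))
  [ "$lag" -ge 0 ] && [ "$lag" -le "$max_lag" ]
}

# upgrade_path FROM TO prints one line per required minor-version hop.
upgrade_path() {
  local from to m
  from="$(minor_of "$1")"
  to="$(minor_of "$2")"
  m="$from"
  while [ "$m" -lt "$to" ]; do
    echo "1.$m -> 1.$(( m + 1 ))"
    m=$(( m + 1 ))
  done
}
```

For example, upgrade_path v1.29.0 v1.31.0 prints two hops (1.29 -> 1.30, then 1.30 -> 1.31), and skew_ok v1.30.0 v1.31.0 fails because the kubelet would be newer than the API server.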
At a high level, kubeadm upgrades follow this rhythm: upgrade the kubeadm binary on the control-plane node, run kubeadm upgrade apply on control plane, then handle workers one at a time with drain, package upgrades, kubeadm upgrade node, and uncordon. If one sentence sticks, make it this: no new scheduling during maintenance, no evictions that violate PDB, and no version-skew shortcuts. For deeper detail, use the official upgrade guide: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/.
Prerequisites
This lab uses a dedicated multi-node kind cluster:
cat <<'EOF' > /tmp/kind-ops.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: ops
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
kind create cluster --name ops --config /tmp/kind-ops.yaml
kubectl config use-context kind-ops
kubectl get nodes -o wide
Starter assets for this lab are in starter/:
- kind-ops-cluster.yaml
- workload-and-pdb.yaml
- maintenance.sh
- upgrade-checklist.md
- upgrade-runbook-template.md
- output-formatting-drills.md
Part 1: Deploy a Drain-Sensitive Workload
Create a deployment and strict PDB:
kubectl create namespace ops-lab
cat <<'EOF' | kubectl -n ops-lab apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
        ports:
        - containerPort: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF
Wait for readiness:
kubectl -n ops-lab get pods -o wide
kubectl -n ops-lab get pdb
Part 2: Cordon and Drain
Pick a worker node:
WORKER="$(kubectl get nodes -o name | grep worker | head -1 | cut -d/ -f2)"
echo "$WORKER"
Cordon it:
kubectl cordon "$WORKER"
kubectl get nodes
Attempt drain:
kubectl drain "$WORKER" --ignore-daemonsets --delete-emptydir-data --timeout=60s
Expected: drain fails because PDB minAvailable: 2 blocks eviction.
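The PDB's own status shows whether it is the blocker, which is quicker than scanning events. A minimal sketch, assuming the ops-lab namespace and web-pdb name from Part 1; pdb_status and pdb_blocks_eviction are hypothetical helper names:

```shell
# pdb_status NAMESPACE NAME
# Prints "currentHealthy desiredHealthy disruptionsAllowed" from the PDB status.
pdb_status() {
  kubectl -n "$1" get pdb "$2" \
    -o jsonpath='{.status.currentHealthy}{" "}{.status.desiredHealthy}{" "}{.status.disruptionsAllowed}{"\n"}'
}

# pdb_blocks_eviction NAMESPACE NAME
# Succeeds when the budget currently allows zero disruptions.
pdb_blocks_eviction() {
  local allowed
  allowed="$(kubectl -n "$1" get pdb "$2" -o jsonpath='{.status.disruptionsAllowed}')"
  # Treat an empty result (status not yet populated) as zero.
  [ "${allowed:-0}" -eq 0 ]
}
```

While the strict budget is in place, pdb_blocks_eviction ops-lab web-pdb && echo "PDB is blocking the drain" should print the message.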
Capture evidence:
kubectl -n ops-lab get events --sort-by=.metadata.creationTimestamp | tail -20
kubectl -n ops-lab describe pdb web-pdb
Part 3: Resolve the Block Safely
Allow one disruption:
kubectl -n ops-lab patch pdb web-pdb --type merge -p '{"spec":{"minAvailable":1}}'
kubectl drain "$WORKER" --ignore-daemonsets --delete-emptydir-data --timeout=120s
Validate pods were rescheduled:
kubectl -n ops-lab get pods -o wide
Bring node back:
kubectl uncordon "$WORKER"
kubectl get nodes
Part 4: Upgrade Planning and Version Skew Checks
Capture current versions:
kubectl version
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
Collect kubeadm upgrade signals from the control-plane container:
docker exec ops-control-plane kubeadm version
docker exec ops-control-plane kubeadm upgrade plan || true
Build your plan document with:
- Target Kubernetes version
- Allowed version skew (control plane vs kubelet)
- Upgrade order (control plane first, workers next)
- Rollback and drain strategy per node
Part 5: Incident Triage Pattern
When maintenance fails, always run:
kubectl describe node "$WORKER"
kubectl -n ops-lab get pdb
kubectl -n ops-lab get events --sort-by=.metadata.creationTimestamp | tail -30
Root causes to check first:
- PDB restrictions
- local storage blocking eviction
- DaemonSets ignored incorrectly
- insufficient capacity on remaining nodes
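Two of these checks are easy to script: what is still running on the node, and how many workers remain available to absorb evicted pods. A rough sketch, assuming the default kubectl get nodes column layout (NAME, STATUS, ROLES, ...); pods_on_node and schedulable_worker_count are hypothetical helper names, and the count is a heuristic that ignores custom taints and resource pressure:

```shell
# pods_on_node NODE
# Lists every pod bound to the node, across all namespaces.
pods_on_node() {
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}

# schedulable_worker_count
# Counts nodes whose STATUS is exactly "Ready" (a cordoned node shows
# "Ready,SchedulingDisabled") and whose ROLES column is not control-plane.
schedulable_worker_count() {
  kubectl get nodes --no-headers |
    awk '$2 == "Ready" && $3 !~ /control-plane/ { n++ } END { print n + 0 }'
}
```

If schedulable_worker_count prints 0 while a drain is pending, evicted pods have nowhere to land and the replacement pods will sit in Pending.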
Part 6: Kubeadm Upgrade Simulation (kind)
This is a command-and-decision simulation of the kubeadm upgrade workflow. kind does not support real host package upgrades (apt-get + systemctl restart kubelet), but it does let you practice the control-plane planning flow and worker maintenance sequence.
Check current versions across the cluster:
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
docker exec ops-control-plane kubeadm version
docker exec ops-control-plane kubectl version 2>/dev/null || true
Run the control-plane upgrade plan:
docker exec ops-control-plane kubeadm upgrade plan
Capture the output and read it closely. Record available target versions and component version changes.
Document the correct upgrade order:
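- Control plane node first: kubeadm upgrade apply v1.XX.Y
- Then each worker: drain -> upgrade kubelet/kubectl packages -> kubeadm upgrade node -> uncordon
- Version skew rule: the kubelet must never be newer than the API server; upstream allows it to lag by up to three minor versions (two for kubelets older than 1.25), and one minor version is the conservative target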
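The ordering above can be turned into a numbered plan for your runbook. A minimal sketch that only prints text and changes nothing on the cluster; print_upgrade_plan is a hypothetical helper, and the target version and node names are whatever you pass in:

```shell
# print_upgrade_plan TARGET_VERSION CONTROL_PLANE_NODE WORKER...
# Prints the control-plane steps first, then one step per worker.
print_upgrade_plan() {
  local target="$1" cp="$2" step=1 w
  shift 2
  echo "$step. [$cp] kubeadm upgrade plan"
  step=$(( step + 1 ))
  echo "$step. [$cp] kubeadm upgrade apply $target"
  step=$(( step + 1 ))
  for w in "$@"; do
    echo "$step. [$w] drain -> upgrade kubelet/kubectl packages -> kubeadm upgrade node -> uncordon"
    step=$(( step + 1 ))
  done
}
```

For this lab's cluster, print_upgrade_plan v1.XX.Y ops-control-plane ops-worker ops-worker2 prints four numbered steps with the control plane first. Substitute the real target version from the kubeadm upgrade plan output.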
Practice the worker maintenance sequence:
WORKER="$(kubectl get nodes -o name | grep worker | head -1 | cut -d/ -f2)"
kubectl drain "$WORKER" --ignore-daemonsets --delete-emptydir-data
# In a real upgrade: apt-get update && apt-get install kubelet=X.Y.Z kubectl=X.Y.Z
# Then: systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon "$WORKER"
Fill the runbook template with real data from your cluster:
cp starter/upgrade-runbook-template.md ./upgrade-runbook.md
For full kubeadm upgrade execution, use the KubeVirt VM lab (when available) or the VirtualBox/Vagrant kubeadm scenario. Those environments allow real apt-get install and systemctl restart kubelet operations.
Bonus: Output Formatting Speed Drills
CKA questions often require extracting specific fields with jsonpath or custom-columns.
Work through starter/output-formatting-drills.md after completing Part 1 — you only need a running cluster with workloads. Target: 2-3 minutes per drill.
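As a warm-up, here are two query shapes you can adapt. These are illustrative sketches, not the contents of the drill file; node_kubelet_versions and pods_by_age are hypothetical helper names:

```shell
# node_kubelet_versions
# Prints one "NAME VERSION" pair per line using a jsonpath range loop.
node_kubelet_versions() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
}

# pods_by_age NAMESPACE
# Lists pods oldest first, showing only name and node via custom-columns.
pods_by_age() {
  kubectl -n "$1" get pods --sort-by=.metadata.creationTimestamp \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
}
```

Try node_kubelet_versions and pods_by_age ops-lab against the lab cluster, then rewrite each query from memory.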
Part 7: Graded JsonPath Output Check
Complete the short graded check using only -o jsonpath and/or --sort-by queries.
cp starter/jsonpath-check.txt ./jsonpath-check.txt
# Fill each VALUE using kubectl output-formatting commands
bash starter/validate-jsonpath-check.sh ./jsonpath-check.txt
Passing target: 4/4 correct values.
Map this checkpoint to reinforcement scenario: 35-jerry-missing-jsonpath.
Verification Checklist
You are done when:
- You reproduced a drain failure and proved the PDB cause
- You adjusted policy and drained safely
- You uncordoned and restored normal scheduling
- You produced a version-skew-aware upgrade checklist
- You can explain the correct kubeadm upgrade order (control plane first, workers second)
- You executed drain/uncordon on at least one worker during simulated maintenance
- You filled in the upgrade runbook template with real version data from your cluster
- You passed the graded jsonpath/output-formatting check (4/4)
Cleanup
kubectl config use-context kind-lab || true
kind delete cluster --name ops
rm -f /tmp/kind-ops.yaml
Reinforcement Scenarios
- jerry-pod-unschedulable-taint
- jerry-node-drain-pdb-blocked
- 35-jerry-missing-jsonpath
- 36-jerry-upgrade-skipped

