Lab 7: Resource Observation with kubectl top
Time: 30–40 minutes
Objective: Enable metrics-server in kind, run a zoo of workloads with distinct resource profiles, and use kubectl top as a primary diagnostic tool.
The Story
When clusters feel slow, teams often guess and restart pods without evidence. This lab builds a faster habit: use kubectl top first, identify actual CPU or memory pressure, and separate scheduler reservations from live consumption before taking action.
CKA Objectives Mapped
- Use `kubectl top` to monitor node and pod resource consumption
- Distinguish CPU and memory units in metrics output
- Differentiate resource requests (scheduling reservations) from actual usage
- Identify resource hogs using sort flags and label selectors
Background
kubectl top is the built-in cluster thermometer. It reads live resource metrics from metrics-server — a lightweight aggregator that scrapes usage stats from each kubelet's resource metrics endpoint every 15 seconds by default (the --metric-resolution setting; older releases scraped every 60 seconds). Without it, kubectl top returns:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
Two things you must be clear on before reading top output:
CPU is expressed in millicores (m). 1000m = 1 core. A pod showing 450m is using nearly half a CPU core. The % column on nodes shows usage relative to allocatable CPU, not raw capacity.
Memory is expressed in mebibytes (Mi) or gibibytes (Gi). Unlike CPU, memory is not compressible — a pod that uses 300Mi holds that memory until it exits. A node that has 2Gi total with 1.8Gi in use has very little headroom.
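The unit rules above reduce to plain shell arithmetic. A minimal sketch — the helper names `millicores_to_cores` and `node_cpu_percent` are illustrative, not part of the lab's starter assets:

```shell
# Convert the units kubectl top prints. Millicores: 1000m = 1 core.
millicores_to_cores() {
  local m="${1%m}"                                   # strip the trailing "m"
  awk -v m="$m" 'BEGIN { printf "%.2f\n", m / 1000 }'
}

# Node CPU% as kubectl top computes it: usage relative to *allocatable* cores.
node_cpu_percent() {
  local usage_m="${1%m}" alloc_cores="$2"
  awk -v u="$usage_m" -v a="$alloc_cores" 'BEGIN { printf "%.0f\n", u / (a * 1000) * 100 }'
}

millicores_to_cores 450m      # a pod at 450m is using 0.45 of a core
node_cpu_percent 1250m 4      # 1250m on a 4-core node -> 31 (the CPU% column)
```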
Requests vs actual: kubectl top shows what a pod is actually consuming right now. kubectl describe pod shows its requests — what the scheduler reserved for it. A pod can be far under or over its request; limits throttle or OOM-kill it if it exceeds them. In a healthy cluster, actual usage sits somewhere below the limit.
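To make the requests/limits distinction concrete, here is a minimal hypothetical pod spec (not one of this lab's workloads) written as a heredoc, the same way the kind config below is; the `resources` field paths are standard:

```shell
# Hypothetical demo pod: the scheduler reserves the requests on a node;
# the limits are the ceiling (CPU is throttled, memory is OOM-killed).
cat <<'EOF' > /tmp/requests-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: requests-demo
spec:
  containers:
  - name: app
    image: nginx:alpine
    resources:
      requests:        # what the scheduler reserves
        cpu: 250m
        memory: 64Mi
      limits:          # what the container may actually consume
        cpu: 500m
        memory: 128Mi
EOF
```

kubectl top reports live usage, which can sit anywhere between ~0 and the limit regardless of the request.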
Prerequisites
This lab uses a dedicated two-worker kind cluster:
cat <<'EOF' > /tmp/kind-resource.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: resource-lab
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
kind create cluster --name resource-lab --config /tmp/kind-resource.yaml
kubectl config use-context kind-resource-lab
kubectl get nodes

You should see three nodes (resource-lab-control-plane, resource-lab-worker, resource-lab-worker2), all in Ready state.
Starter assets for this lab are in starter/:
- `kind-resource-cluster.yaml`
- `workload-zoo.yaml`
- `challenge-validate.sh`
Part 1: Install metrics-server
metrics-server ships with TLS verification enabled by default, which requires real certificates. kind uses self-signed certs, so we need to disable that check.
# Install upstream metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Patch for kind: add --kubelet-insecure-tls to skip cert validation
kubectl -n kube-system patch deployment metrics-server --type=json \
-p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
# Wait for rollout
kubectl -n kube-system rollout status deployment/metrics-server --timeout=90s

Now wait for metrics to populate (metrics-server collects its first scrape ~15–30 seconds after pods are running, but the API may take up to 60 seconds before it starts returning data):
# Poll until this works — it will fail briefly then succeed
kubectl top nodes

Expected output (values will differ):
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
resource-lab-control-plane 98m 2% 610Mi 32%
resource-lab-worker 35m 0% 290Mi 15%
resource-lab-worker2 34m 0% 288Mi 15%
If you see Error from server (ServiceUnavailable), wait 30 seconds and retry — metrics-server needs time to collect its first round of scrapes.
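Rather than retrying by hand, you can wrap the check in a small generic poll helper — `retry` is a hypothetical name, not something shipped in starter/:

```shell
# retry ATTEMPTS DELAY CMD... : run CMD until it succeeds, up to ATTEMPTS
# times, sleeping DELAY seconds between tries. Returns CMD's success/failure.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for i in $(seq 1 "$attempts"); do
    "$@" && return 0                         # command succeeded; stop polling
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1                                   # exhausted all attempts
}

# Usage against the cluster: up to 12 attempts, 10s apart.
# retry 12 10 kubectl top nodes
```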
Part 2: Read a Quiet Cluster
Before deploying any workloads, take a baseline reading.
# Node-level view
kubectl top nodes
# Pod-level view across the whole cluster
kubectl top pods --all-namespaces

Look at the kube-system pods. Note which system pods are using the most CPU and memory. You'll see etcd, kube-apiserver, kube-controller-manager, and now metrics-server itself.
# Sort system pods by memory
kubectl top pods -n kube-system --sort-by=memory

This is your idle baseline. Keep this window open as a reference.
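If you want to script against a baseline rather than eyeball it, the node output parses cleanly with awk. This sketch pastes in the sample values from Part 1; against a live cluster you could pipe `kubectl top nodes --no-headers` into the same awk program:

```shell
# Find the node with the highest MEMORY% from sample "kubectl top nodes" rows.
printf '%s\n' \
  'resource-lab-control-plane 98m 2% 610Mi 32%' \
  'resource-lab-worker        35m 0% 290Mi 15%' \
  'resource-lab-worker2       34m 0% 288Mi 15%' |
awk '{ v = $5 + 0                       # awk coerces "32%" to the number 32
       if (v > max) { max = v; node = $1 } }
     END { print node, max "%" }'
```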
Part 3: Deploy the Workload Zoo
Deploy four workloads with distinct resource profiles into the resource-lab namespace:
| Workload | Profile | Expected top output |
|---|---|---|
| `cpu-burner` (×2 replicas) | Maxes CPU limit | ~450–500m CPU per pod |
| `memory-hog` (×1) | Holds ~250Mi RAM, near-zero CPU | ~5m CPU, ~270Mi memory |
| `idle-worker` (×3 nginx) | Nearly idle | ~1–2m CPU, ~4–8Mi memory |
| `batch-processor` (Job, ×4 tasks, 2 parallel) | Moderate CPU, completes | ~150–250m CPU while running |
kubectl apply -f starter/workload-zoo.yaml

Watch everything come up:
kubectl -n resource-lab get pods -w

The batch-processor Job runs moderate CPU work and completes in ~60 seconds. The other three Deployments run indefinitely. Wait until all Deployment pods are Running before moving to Part 4.
Part 4: kubectl top pods — Core Commands
Give metrics-server a minute to collect from the new pods, then work through these commands:
Basic view:
kubectl top pods -n resource-lab

Sort by CPU (descending):
kubectl top pods -n resource-lab --sort-by=cpu

The two cpu-burner pods should be at the top, each near 500m. This is the most common triage command when a node shows high CPU.
Sort by memory:
kubectl top pods -n resource-lab --sort-by=memory

memory-hog should be at the top, far above the idle nginx pods despite having very low CPU.
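Under the hood, --sort-by=memory simply orders rows by that column. As an offline illustration (pod names and values invented), the same ordering can be reproduced with awk and sort:

```shell
# Sample "kubectl top pods" rows: NAME CPU(cores) MEMORY(bytes)
sample='idle-worker-abc    1m     5Mi
memory-hog-xyz     5m     270Mi
cpu-burner-def     480m   12Mi'

echo "$sample" |
awk '{ print $3 + 0, $0 }' |   # prefix each row with its numeric Mi value
sort -rn |                     # sort descending by that number
cut -d' ' -f2-                 # drop the sort key again
```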
Filter by label:
# Only the compute tier
kubectl top pods -n resource-lab -l tier=compute
# Only the web tier
kubectl top pods -n resource-lab -l tier=web

Label selectors let you scope top to a single service stack in a busy cluster.
Cross-namespace view:
kubectl top pods --all-namespaces --sort-by=cpu

Now you can see your cpu-burner pods competing with system pods for the top spots.
Node view with context:
kubectl top nodes

Compare this to the baseline from Part 2. You should see CPU% on the workers climbing due to the cpu-burner Deployment.
Part 5: Requests vs Actual
Resource requests are what the scheduler uses to decide placement. Actual usage is what kubectl top shows. They are often different — understanding the gap is how you right-size workloads.
Pick one of the cpu-burner pods:
CPU_POD=$(kubectl get pods -n resource-lab -l app=cpu-burner -o name | head -1 | cut -d/ -f2)
echo "$CPU_POD"

Check its requests:
kubectl -n resource-lab describe pod "$CPU_POD" | grep -A4 "Requests:"

Output:
Requests:
cpu: 250m
memory: 64Mi
Check its actual usage:
kubectl top pod -n resource-lab "$CPU_POD"

The actual CPU will be close to the limit (500m), which is double the request. This is how a workload can legitimately consume more than it requested — requests are a minimum reservation, not a ceiling (that's what limits are for).
Now do the same for memory-hog:
MEM_POD=$(kubectl get pods -n resource-lab -l app=memory-hog -o name | head -1 | cut -d/ -f2)
kubectl -n resource-lab describe pod "$MEM_POD" | grep -A4 "Requests:"
kubectl top pod -n resource-lab "$MEM_POD"

The memory request is 300Mi but actual usage is closer to 270Mi — the request over-provisioned slightly. CPU actual usage is nearly 0m despite a 50m request, because the workload genuinely does nothing but sleep.
Key insight: A cluster that appears "full" based on requests may have plenty of real headroom. A cluster that looks lightly loaded can be hitting CPU saturation. kubectl top tells you what is actually happening.
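One way to quantify that gap is to express actual usage as a percentage of the request. This is a sketch with the numbers observed above; `usage_vs_request` is a hypothetical helper, not part of the lab:

```shell
# usage_vs_request ACTUAL REQUEST : actual usage as a percentage of the
# request, both in millicores. >100% means the pod runs above its reservation
# (legal, up to its limit); <100% means the request over-provisions.
usage_vs_request() {
  local actual="${1%m}" request="${2%m}"
  awk -v a="$actual" -v r="$request" 'BEGIN { printf "%.0f%%\n", a / r * 100 }'
}

usage_vs_request 490m 250m   # cpu-burner-style: 196% — well above its request
usage_vs_request 1m 50m      # memory-hog-style CPU: 2% — far below it
```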
Part 6: Observe the Batch Job Lifecycle
If the batch-processor Job is still running, watch its pods in top:
kubectl top pods -n resource-lab --sort-by=cpu

Batch pods will show moderate CPU (~150–250m each). Once the Job completes, those pods move to Completed status and disappear from kubectl top output:
kubectl -n resource-lab get pods -l app=batch-processor
kubectl top pods -n resource-labCompleted pods hold no CPU or memory — resources are returned to the node immediately. This is the operational difference between a short Job and a long-running Deployment from a resource perspective.
Check Job status:
kubectl -n resource-lab describe job batch-processor | grep -E "Completions|Succeeded|Failed"

Part 7: Graded Challenge
Answer these four questions using only kubectl top and kubectl describe. Do not look at pod YAML.
Create a file called challenge-answers.txt with this structure:
Q1: <full pod name of the pod using the most CPU in resource-lab>
Q2: <full pod name of the pod using the most memory in resource-lab>
Q3: <yes or no — is any node above 60% CPU?>
Q4: <CPU request for cpu-burner pods, in millicores with the m suffix>
Example:
Q1: cpu-burner-6d7f8b9c5-xk2vp
Q2: memory-hog-7c9b4d8f6-rl3wm
Q3: yes
Q4: 250m
Validate your answers:
bash starter/challenge-validate.sh challenge-answers.txt

Passing target: 4/4.
Discovery Questions
- Units: A node shows `CPU(cores): 1250m`. How many full cores is that? If the node has 4 allocatable cores, what is the CPU%?
- Lag: You just deployed a new CPU-intensive pod. `kubectl top pods` still doesn't show it 10 seconds later. Is this a bug? What is the metrics-server scrape interval and why does it matter?
- Limits vs Requests: The `cpu-burner` pod has a request of 250m and a limit of 500m. If the node has no other workloads, will `kubectl top` show 250m or 500m (or something else)? Why?
- Memory growth: A pod starts showing steadily increasing memory in `kubectl top` over 30 minutes without any corresponding increase in traffic. What should you suspect, and what command would you run next?
- Missing from top: A pod is in `Running` state but never appears in `kubectl top pods`. What are two possible explanations?
Verification Checklist
You are done when:
- `metrics-server` is running and `kubectl top nodes` returns data
- You identified top CPU and memory pods using `--sort-by`
- You explained one request-vs-actual mismatch using `describe` + `top`
- You completed the graded challenge with 4/4
Cleanup
kubectl config use-context kind-lab 2>/dev/null || true
kind delete cluster --name resource-lab
rm -f /tmp/kind-resource.yaml

Key Takeaways
- `kubectl top` requires metrics-server; in kind, patch it with `--kubelet-insecure-tls`
- CPU is in millicores (`m`), memory in `Mi`/`Gi`; `%` on nodes is relative to allocatable, not raw capacity
- `--sort-by=cpu` and `--sort-by=memory` are the fastest path to finding resource hogs
- Requests are scheduling reservations; actual usage can be higher (up to the limit) or much lower
- Completed Job pods disappear from `kubectl top` — resources are released immediately
- Use label selectors (`-l tier=compute`) to scope `top` to a single service in a crowded namespace
Reinforcement Scenarios
- `27-jerry-resource-hog-hunt` — identify and evict a CPU-hungry pod under time pressure
- `28-jerry-hpa-not-scaling` — metrics-server missing or broken; HPA cannot scale
- `jerry-hpa-not-scaling` (gym) — HPA troubleshooting starting from a dead metrics-server

