Lab 8: Vertical Pod Autoscaler (VPA) — Right-Sizing Resource Requests
Time: 25-30 minutes
Objective: Use VPA to observe and apply right-sized CPU/memory requests based on actual workload behavior
The Story
In Lab 5 you set up HPA to scale out — more replicas when CPU climbs. But HPA assumes your resource requests are already accurate. If Jerry's app has requests: cpu: 1000m and it actually uses 80m at peak, the scheduler reserves 12.5x the CPU it needs. On a 10-node cluster that's real money and real placement inefficiency.
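The waste is easy to quantify. A quick back-of-the-envelope check using the numbers above (the helper function is illustrative, not part of any lab tooling):

```python
# Quantify overprovisioning: requested vs. actually used CPU.
def overprovision_ratio(requested_millicores: int, peak_used_millicores: int) -> float:
    """How many times more CPU is reserved than the workload ever uses."""
    return requested_millicores / peak_used_millicores

# Jerry's app: requests 1000m but peaks at 80m.
print(overprovision_ratio(1000, 80))  # 12.5

# Illustrative: if 10 such pods run across the cluster, the scheduler
# reserves 10 full cores while aggregate peak usage is under one core.
reserved_cores = 10 * 1000 / 1000
used_cores = 10 * 80 / 1000
print(reserved_cores, used_cores)  # 10.0 0.8
```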
VPA watches your pods over time and tells you — or automatically applies — better-fit requests and limits. On the CKA exam you're expected to know what VPA is, how it differs from HPA, and how to read its recommendations.
Background: Vertical Rightsizing and Disruption Tradeoffs
VPA continuously compares observed container usage against declared requests and limits, then produces bounded recommendations. Unlike HPA, VPA tuning often requires pod recreation to apply changes, so disruption policy and rollout timing matter. In practice, teams start in recommendation mode and promote changes deliberately.
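The "bounded recommendations" idea can be sketched in a few lines: the raw target is clamped into whatever range the operator allows, which is exactly what the minAllowed/maxAllowed fields later in this lab control. A simplified illustration, not VPA's actual estimator:

```python
def bounded_recommendation(uncapped_target: float, min_allowed: float, max_allowed: float) -> float:
    """Clamp a raw recommendation into the operator-approved range (millicores)."""
    return min(max(uncapped_target, min_allowed), max_allowed)

# Raw target 80m within an allowed range of 10m..2000m: passes through unchanged.
print(bounded_recommendation(80, 10, 2000))   # 80
# Raw target 5m would undercut the floor: clamped up to 10m.
print(bounded_recommendation(5, 10, 2000))    # 10
```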
VPA vs HPA — Quick Model
| | HPA | VPA |
|---|---|---|
| What it changes | spec.replicas | resources.requests and limits per container |
| Responds to | Aggregate utilization across replicas | Per-pod utilization vs requests ratio |
| Scaling axis | Horizontal (more pods) | Vertical (bigger/smaller pods) |
| Pod restart required | No | Yes — VPA applies changes by evicting and recreating pods |
| Use together? | Yes, but don't let both control CPU | Use VPA for memory, HPA for CPU |
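The "Responds to" row is worth making concrete. HPA's documented replica formula scales on the ratio of observed to target metric; VPA instead moves the request toward observed usage. A simplified side-by-side sketch (the VPA side is a caricature with an assumed headroom factor, not the real histogram-based estimator):

```python
import math

def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    # The documented HPA algorithm: ceil(currentReplicas * currentMetric / targetMetric).
    return math.ceil(current_replicas * (current_metric / target_metric))

def vpa_style_target(observed_usage_millicores: float, headroom: float = 1.25) -> float:
    # Caricature: recommend a request somewhat above observed usage.
    return observed_usage_millicores * headroom

# 2 replicas running at 160% of an 80% CPU target -> HPA wants 4 replicas.
print(hpa_desired_replicas(2, 160, 80))  # 4
# A pod observed using 80m -> a request target above usage, regardless of replica count.
print(vpa_style_target(80))  # 100.0
```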
Part 1: Install VPA
VPA is not included in kind by default. Install it from the official repo:
# Clone the autoscaler repo (VPA lives here)
git clone https://github.com/kubernetes/autoscaler.git --depth 1
cd autoscaler/vertical-pod-autoscaler
# Install VPA components
./hack/vpa-up.sh
# Verify the three VPA components are running
kubectl get pods -n kube-system | grep vpa

You should see three pods:
- vpa-recommender: watches actual usage and generates recommendations
- vpa-updater: evicts pods when recommendations differ significantly from current requests
- vpa-admission-controller: rewrites pod specs at admission time if updateMode is Auto
Part 2: Deploy an Under-Resourced App
Create a deployment with intentionally wrong resource requests — the kind of thing Jerry would commit:
# deploy-overprovisioned.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jerry-overprovisioned
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jerry-overprovisioned
  template:
    metadata:
      labels:
        app: jerry-overprovisioned
    spec:
      containers:
      - name: app
        image: nginx:1.25-alpine
        resources:
          requests:
            cpu: 500m      # Jerry's guess — probably too high
            memory: 256Mi  # Jerry's guess — probably too high
          limits:
            cpu: 1000m
            memory: 512Mi

kubectl apply -f deploy-overprovisioned.yaml
kubectl rollout status deployment/jerry-overprovisioned

Part 3: Create a VPA Object in Recommendation Mode
Create a VPA targeting the deployment. Start in Off mode — this means VPA only recommends, it never touches your pods:
# vpa-recommend.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: jerry-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jerry-overprovisioned
  updatePolicy:
    updateMode: "Off"  # Recommend only — do not evict or mutate pods
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 10m
        memory: 32Mi
      maxAllowed:
        cpu: 2
        memory: 1Gi

kubectl apply -f vpa-recommend.yaml

VPA needs time to gather metrics. Wait a few minutes, then check recommendations:

kubectl describe vpa jerry-vpa

Look for the Status.Recommendation section. You'll see Lower Bound, Target, Upper Bound, and Uncapped Target for each container.
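What `kubectl describe` renders comes from the VPA object's status. A sketch of pulling the same numbers out of `kubectl get vpa jerry-vpa -o json`-style output — the field names match the VPA v1 status schema, but the values below are made up for illustration:

```python
import json

# Illustrative payload shaped like .status.recommendation on a VPA object.
vpa_json = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "app",
          "lowerBound":     {"cpu": "25m",  "memory": "64Mi"},
          "target":         {"cpu": "80m",  "memory": "96Mi"},
          "upperBound":     {"cpu": "500m", "memory": "256Mi"},
          "uncappedTarget": {"cpu": "80m",  "memory": "96Mi"}
        }
      ]
    }
  }
}
"""

rec = json.loads(vpa_json)["status"]["recommendation"]["containerRecommendations"][0]
print(rec["containerName"], rec["target"])  # app {'cpu': '80m', 'memory': '96Mi'}
```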
Explore the VPA spec to understand all fields:
kubectl explain vpa.spec
kubectl explain vpa.spec.updatePolicy
kubectl explain vpa.spec.resourcePolicy.containerPolicies

Part 4: Generate Some Load to Shape the Recommendation
Run a brief load test so VPA has more signal to work with:
# Port-forward the app
kubectl port-forward deployment/jerry-overprovisioned 8080:80 &
# Generate requests for 60 seconds
for i in $(seq 1 300); do curl -s http://localhost:8080 > /dev/null; sleep 0.2; done
# Kill port-forward
kill %1

After another 2-3 minutes, recheck the VPA recommendation:

kubectl describe vpa jerry-vpa

Compare the Target recommendation to Jerry's original 500m / 256Mi requests.
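Comparing them properly means decoding Kubernetes quantity strings (500m CPU, 256Mi memory). A minimal parser covering just the suffixes used in this lab (the real quantity format supports more suffixes and exponents); the 80m target is a hypothetical recommendation:

```python
def parse_cpu(q: str) -> float:
    """CPU quantity -> cores. Handles plain cores and the 'm' (milli) suffix."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def parse_memory(q: str) -> int:
    """Memory quantity -> bytes. Handles Ki/Mi/Gi binary suffixes and plain bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, mult in units.items():
        if q.endswith(suffix):
            return int(q[:-2]) * mult
    return int(q)

# Jerry's request vs. a hypothetical VPA target of 80m.
requested, target = parse_cpu("500m"), parse_cpu("80m")
print(f"CPU cut: {(1 - target / requested):.0%}")  # CPU cut: 84%
print(parse_memory("256Mi"))  # 268435456
```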
Part 5: Understand Update Modes
VPA has four updateMode values — knowing these is exam-relevant:
| Mode | Behavior |
|---|---|
| Off | Recommendations only. Never modifies pods. |
| Initial | Applies recommendations only to new pods at creation. Never evicts running pods. |
| Recreate | Evicts pods and applies recommendations when the pod's current requests differ significantly from the recommendation. |
| Auto | Combination of Initial + Recreate. Current default "full auto" mode. |
For production, Off or Initial are safest starting points. Auto in a cluster with no PodDisruptionBudgets is aggressive.
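The table above can be encoded as a small self-quiz helper (the dictionaries restate this lab's table; the names are invented here, not a Kubernetes API):

```python
# Which modes will evict running pods, per the update-mode table above.
EVICTS_RUNNING_PODS = {"Off": False, "Initial": False, "Recreate": True, "Auto": True}
# Modes the lab calls safe starting points for production.
SAFE_PRODUCTION_START = {"Off", "Initial"}

def check(mode: str) -> str:
    evicts = "evicts running pods" if EVICTS_RUNNING_PODS[mode] else "never evicts running pods"
    safety = "safe starting point" if mode in SAFE_PRODUCTION_START else "review disruption budgets first"
    return f"{mode}: {evicts}; {safety}"

print(check("Off"))   # Off: never evicts running pods; safe starting point
print(check("Auto"))  # Auto: evicts running pods; review disruption budgets first
```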
Part 6: Apply Recommendations Manually (Off Mode Workflow)
Since you're in Off mode, you control when changes land. Read the recommendation and apply it yourself:
# Get the target recommendation
kubectl get vpa jerry-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'

Update the deployment with the recommended values:
kubectl patch deployment jerry-overprovisioned -p='{
"spec": {
"template": {
"spec": {
"containers": [{
"name": "app",
"resources": {
"requests": {
"cpu": "<VPA target cpu>",
"memory": "<VPA target memory>"
}
}
}]
}
}
}
}'

Verify the rollout applied cleanly:
kubectl rollout status deployment/jerry-overprovisioned
kubectl describe deployment jerry-overprovisioned | grep -A6 Requests

HPA + VPA Together — The Safe Pattern
If you're using both on the same deployment:
- Let HPA own CPU scaling (replica count based on CPU utilization)
- Let VPA own memory right-sizing (use resourcePolicy to exclude CPU from VPA control)
- Set VPA updateMode to Off or Initial and manually review before applying
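The reason not to let both own CPU falls out of HPA's utilization math: utilization is usage divided by request, so a VPA that raises requests.cpu silently lowers the utilization HPA sees. A toy illustration with assumed numbers:

```python
import math

def utilization_pct(usage_m: float, request_m: float) -> float:
    # HPA's CPU utilization metric: observed usage as a percent of the request.
    return 100 * usage_m / request_m

def hpa_desired(replicas: int, util: float, target_util: float = 80) -> int:
    return math.ceil(replicas * util / target_util)

usage = 400  # millicores per pod, unchanged throughout

# Before VPA touches anything: a 250m request reads as 160% -> HPA scales 2 -> 4.
print(hpa_desired(2, utilization_pct(usage, 250)))   # 4
# VPA bumps the request to 1000m: the same usage reads as 40% -> HPA scales 2 -> 1.
print(hpa_desired(2, utilization_pct(usage, 1000)))  # 1
```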
Exclude CPU from VPA recommendations:
resourcePolicy:
  containerPolicies:
  - containerName: app
    controlledResources:
    - memory  # VPA only recommends memory — leave CPU to HPA

Discovery Questions
1. You have a pod with requests.cpu: 500m and VPA recommends Target: 80m. In Off mode, does the running pod change? What about a new pod created after you switch to Initial mode?
2. VPA evicts a pod to apply new resource requests. What cluster feature prevents VPA from taking down every pod in a deployment simultaneously?
3. Your deployment has 1 replica and VPA is set to Auto. VPA wants to right-size resources. What happens to your running pod? Is there any downtime?
4. HPA is scaling your deployment based on CPU. VPA is also running in Auto mode and is changing requests.cpu. Why might this cause the HPA to make poor decisions, and what would you do about it?
5. kubectl describe vpa jerry-vpa shows Lower Bound, Target, and Upper Bound. What is the practical difference between Target and Upper Bound for a production sizing decision?
Verification Checklist
You are done when:
- VPA components are running in kube-system
- jerry-vpa shows a non-empty recommendation for the target deployment
- You can explain Off, Initial, and Auto update modes
- You applied a recommendation intentionally and verified rollout success
Cleanup
kubectl delete vpa jerry-vpa
kubectl delete deployment jerry-overprovisioned
cd ../../../..
rm -rf autoscaler

Key Takeaways
- VPA right-sizes requests and limits based on observed usage — it doesn't change replica count
- Off mode is recommendation-only and is the safest starting point in production
- VPA applies changes by evicting and recreating pods — understand the disruption model
- HPA and VPA can coexist safely if you divide responsibility: HPA for CPU replicas, VPA for memory sizing
- The CKA exam expects you to know the difference between HPA and VPA and how to read VPA recommendation output
Reinforcement Scenarios
- 27-jerry-resource-hog-hunt
- jerry-hpa-not-scaling

