Week 8: GitOps with ArgoCD + DevSecOps Pipeline
Overview
Duration: 3 hours
Format: Lecture + Hands-on Labs
You've spent seven weeks building containers by hand — Dockerfiles, multi-stage builds, Kustomize overlays, network policies, Helm charts. Every deployment was manual: you pushed code and either ran kubectl apply yourself or waited for someone else's automation to pick it up.
This week you own the full loop. You'll install ArgoCD on your own cluster, fork a portfolio template that you'll keep after the course, understand the 8-stage security pipeline protecting it, and deploy it via GitOps. When something breaks, you'll learn why the fix always belongs in Git.
The result: a working DevSecOps portfolio piece — your code, your pipeline, your cluster, your deployment. Something you can show an employer.
Learning Outcomes
By the end of this class, you will be able to:
- Install ArgoCD on a kind cluster using Helm and explain what each component does
  CKA: Cluster Architecture, Installation and Configuration
- Run five DevSecOps tools locally (ruff, bandit, hadolint, trivy, kubeconform) and explain what each catches
  CKA: Troubleshooting
- Read a CI/CD pipeline definition and explain its fail-fast stage ordering
  CKA: Troubleshooting
- Deploy an application to Kubernetes via ArgoCD GitOps (push -> detect -> sync)
  CKA: Cluster Architecture, Installation and Configuration
- Demonstrate that rollback in GitOps means reverting the source, not clicking a UI button
  CKA: Troubleshooting
- Run a timed cluster-component troubleshooting sprint across DNS, scheduler symptoms, resource pressure, and multi-container log triage
  CKA: Troubleshooting
- Design and validate an HA control-plane topology and failure workflow
  CKA: Cluster Architecture, Installation and Configuration
- (Bonus) Integrate HashiCorp Vault for secrets management in a Kubernetes deployment
  CKA: Cluster Architecture, Installation and Configuration
- Complete a timed CKA-style mixed-domain mock sprint with evidence capture and remediation mapping
  CKA: Troubleshooting + Cluster Architecture, Installation and Configuration + Services and Networking + Workloads and Scheduling + Storage
Pre-Class Setup
Verify tools:
kubectl version --client
kind version
helm version
docker version
gh auth status
pip --version
You will also need a GitHub account and a fork of the devsecops-portfolio-template repository (Lab 1 walks you through this).
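If you would rather check all six tools in one pass, a small helper like the one below can do it. This is a minimal sketch: `check_tools` is a hypothetical function name, and it only confirms that each binary is on your PATH, not that its version is recent enough.

```shell
# Report which of the given tools are on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "MISSING  $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools kubectl kind helm docker gh pip
```

A non-zero return makes the helper easy to chain, for example `check_tools kubectl kind helm docker gh pip && echo "ready for Lab 1"`.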
Key Concepts
GitOps: The Model
GitOps means Git is the single source of truth for your cluster state. An operator (ArgoCD) watches your repo and continuously reconciles the cluster to match what's in Git.
The core loop:
- Developer pushes a change to Git
- ArgoCD detects the change (polling or webhook)
- ArgoCD renders the manifests (kustomize build)
- ArgoCD compares rendered manifests to live cluster state
- ArgoCD applies the diff
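To make the render, compare, apply steps concrete, here is a rough shell approximation of a single reconciliation pass. It is a teaching sketch and not how ArgoCD is implemented: `reconcile_once` is a hypothetical function, and it leans on `kubectl kustomize` to render and on the documented `kubectl diff` exit codes (0 means no differences, 1 means differences, anything higher is an error).

```shell
# One reconciliation pass over a Kustomize overlay directory.
# Prints "in-sync", "synced", or an error label.
reconcile_once() {
  local overlay="$1"
  local rendered
  local rc=0

  # Render: the equivalent of `kustomize build`.
  rendered="$(kubectl kustomize "$overlay")" || { echo "render-error"; return 2; }

  # Compare: kubectl diff exits 1 when live state differs from the rendered manifests.
  printf '%s\n' "$rendered" | kubectl diff -f - >/dev/null || rc=$?

  case "$rc" in
    0) echo "in-sync" ;;
    1) printf '%s\n' "$rendered" | kubectl apply -f - >/dev/null && echo "synced" ;;
    *) echo "diff-error"; return 2 ;;
  esac
}

# Usage (the overlay path is an assumption; use a directory from your own repo):
#   reconcile_once ./k8s/overlays/dev
```

ArgoCD adds everything this sketch leaves out: fetching from Git, health checks, pruning, and retry logic.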
ArgoCD Architecture
ArgoCD runs inside your cluster as a set of controllers:
- API Server: serves the UI and API
- Repo Server: clones repos, renders manifests (runs kustomize build)
- Application Controller: watches Applications, compares desired vs live state, syncs
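The Application objects that the controller watches are ordinary Kubernetes resources. The snippet below writes out a minimal one as a sketch. The Application name, the `path`, and the destination namespace are hypothetical placeholders, and `YOUR_USER` stands in for your GitHub account; the field names themselves follow the `argoproj.io/v1alpha1` Application spec.

```shell
# Write a minimal ArgoCD Application manifest to the current directory.
cat > application.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: portfolio              # hypothetical name
  namespace: argocd            # Applications live in ArgoCD's namespace
spec:
  project: default
  source:
    repoURL: https://github.com/YOUR_USER/devsecops-portfolio-template.git
    targetRevision: main
    path: k8s/overlays/dev     # hypothetical path; use the overlay directory in your fork
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: portfolio       # hypothetical namespace
  syncPolicy:
    automated:
      prune: true              # delete resources that were removed from Git
      selfHeal: true           # undo manual changes made to the live cluster
    syncOptions:
      - CreateNamespace=true
EOF

# Apply it once ArgoCD and its CRDs are installed:
#   kubectl apply -f application.yaml
```

The `automated` block is what turns on auto-sync; without it, ArgoCD reports drift but waits for a manual sync.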
Polling vs Webhooks
ArgoCD polls Git by default every 3 minutes; we'll lower that to 30 seconds for the labs. Webhooks are faster but require your cluster to be reachable from GitHub, which a local kind cluster is not.
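One way to lower the polling interval is through the Helm values file. The sketch below assumes a recent argo-helm `argo-cd` chart, where entries under `configs.cm` are written into the `argocd-cm` ConfigMap; the file name `argocd-values.yaml` is hypothetical, and the values file shipped with Lab 1 may set this differently.

```shell
# Write a values fragment that shortens ArgoCD's Git polling interval.
cat > argocd-values.yaml <<'EOF'
configs:
  cm:
    # Polling interval for Git; lowered from the 3-minute default for the labs.
    timeout.reconciliation: 30s
EOF

# Install with the custom values (release name and namespace are assumptions):
#   helm repo add argo https://argoproj.github.io/argo-helm
#   helm install argocd argo/argo-cd -n argocd --create-namespace -f argocd-values.yaml
```

A 30-second interval is fine on a laptop cluster with one repo, but it would generate a lot of Git traffic on a shared installation.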
Rollback in GitOps
This is the most important concept this week:
In GitOps, rollback means reverting the commit in Git. ArgoCD re-syncs the reverted state. Clicking "Rollback" in the UI works temporarily, but if auto-sync is on, ArgoCD will re-sync the bad commit from Git within seconds. The fix is always in the source.
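The revert itself is plain Git. The following sketch walks through it in a throwaway repository so you can see the mechanics without touching a cluster; the file name and image tags are made up for illustration.

```shell
# Create a scratch repo; nothing here talks to a cluster or a remote.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.email "student@example.com"
git -C "$repo" config user.name "Student"

# Commit 1: the known-good state.
echo "image: portfolio:v1" > "$repo/deployment.txt"
git -C "$repo" add deployment.txt
git -C "$repo" commit -q -m "deploy v1 (known good)"

# Commit 2: the bad change.
echo "image: portfolio:broken" > "$repo/deployment.txt"
git -C "$repo" commit -q -am "deploy broken tag"

# Rollback: add a third commit that undoes the second one.
git -C "$repo" revert --no-edit HEAD >/dev/null

cat "$repo/deployment.txt"   # prints: image: portfolio:v1
```

Note that `git revert` adds a new commit rather than rewriting history, so the bad change and its fix both stay in the log. In a real repo you would follow it with `git push`, and ArgoCD would sync the reverted state on its next poll.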
DevSecOps Pipeline: Fail Fast
A good CI pipeline orders stages by cost: cheap, fast checks first — expensive checks later. The portfolio template's 8-stage pipeline follows this pattern:
- Code Quality (ruff + bandit) — seconds, no build needed
- Dockerfile Scan (hadolint + trivy config) — seconds, no build needed
- Build — minutes, produces the image
- Container Scan (trivy image) — needs the built image
- Push (GHCR) — only on main, only if scans pass
- K8s Validation (kubeconform) — validates manifests
- Integration Test — starts the container, hits endpoints
- Update Tag — commits new SHA back to Git (only on main)
If ruff finds a lint error in stage 1, you never waste time building a container. That's fail-fast.
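You can reproduce the fail-fast ordering locally with a short driver script. This is a sketch under assumptions, not the template's actual workflow: the `stage_*` functions and `run_fail_fast` are hypothetical names, `portfolio:local` is a made-up image tag, and only the first four stages are shown.

```shell
# Cheap checks first: no image needed.
stage_code_quality()    { ruff check . && bandit -r . ; }
stage_dockerfile_scan() { hadolint Dockerfile && trivy config . ; }

# Expensive steps later: build, then scan the built image.
stage_build()           { docker build -t portfolio:local . ; }
stage_container_scan()  { trivy image --severity HIGH,CRITICAL --exit-code 1 portfolio:local ; }

# Run stages in order and stop at the first failure.
run_fail_fast() {
  local stage
  for stage in "$@"; do
    echo "running $stage"
    if ! "$stage"; then
      echo "stopped at $stage"
      return 1
    fi
  done
  echo "all stages passed"
}

# Usage, from the root of your fork:
#   run_fail_fast stage_code_quality stage_dockerfile_scan stage_build stage_container_scan
```

Because the loop returns on the first failing stage, a lint error in `stage_code_quality` means `docker build` never runs.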
CRDs and Operators
The 2025 CKA curriculum added CRDs and operators explicitly. You don't need to write an operator, but you need to diagnose failures at this layer.
What a CRD does: A CustomResourceDefinition teaches the API server about a new object kind — Application, Certificate, HelmRelease, etc. Once a CRD is installed, you can kubectl apply a Custom Resource (CR) of that kind and the API server will accept and store it.
What an operator does: An operator is a controller (usually a Deployment in the cluster) that watches CRs and reconciles real infrastructure to match. ArgoCD's Application controller is an operator — it watches Application CRs and syncs clusters.
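To see the CRD and CR relationship in miniature, the sketch below writes a minimal definition and one resource of that kind. Everything named here (`Widget`, `example.com`, the `size` field) is hypothetical and exists only for illustration; no operator watches it, so the API server would store the object and nothing else would happen.

```shell
# A minimal CRD: teaches the API server about a new kind called Widget.
cat > widget-crd.yaml <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com      # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
EOF

# A Custom Resource of that kind; only valid after the CRD is applied.
cat > widget-cr.yaml <<'EOF'
apiVersion: example.com/v1
kind: Widget
metadata:
  name: demo
spec:
  size: 3
EOF

# Order matters:
#   kubectl apply -f widget-crd.yaml
#   kubectl apply -f widget-cr.yaml
```

Applying the two files in the opposite order is a quick way to reproduce the missing-CRD failure on a practice cluster.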
The failure pattern to know:
# If you apply a CR before its CRD is installed, kubectl rejects it with an error like:
# error: resource mapping not found ... no matches for kind "Application" in version "argoproj.io/v1alpha1"
# ensure CRDs are installed first
# Check what CRDs are installed
kubectl get crd
kubectl get crd applications.argoproj.io # check a specific one
# Check if the operator (controller) is running and healthy
# (the ArgoCD application controller normally runs as a StatefulSet, not a Deployment)
kubectl -n argocd get statefulset argocd-application-controller
kubectl -n argocd logs statefulset/argocd-application-controller --tail=50
# See the CR and its status conditions (operator writes status back here)
kubectl get application -n argocd my-app -o yaml | grep -A20 "status:"
Reference: Custom Resources | Operator Pattern
HA Control Plane + etcd Quorum
Lab 6 covers HA topology in detail. The exam tests whether you understand quorum and what happens when you lose a control-plane node — know this even if you only skim the lab.
etcd quorum: etcd requires a majority of members to agree before accepting writes. With n members, the cluster needs ⌊n/2⌋ + 1 healthy members (integer division, rounding down) to remain writable.
| Cluster size | Failure tolerance | Write availability |
|---|---|---|
| 1 member | 0 failures | Writes fail if the one node is lost |
| 3 members | 1 failure | Continues with 2 of 3 |
| 5 members | 2 failures | Continues with 3 of 5 |
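The numbers in the table follow directly from integer arithmetic, which shell does natively. The sketch below uses two hypothetical helper functions to print quorum size and failure tolerance for cluster sizes 1 through 5.

```shell
# Majority needed for writes: floor(n/2) + 1 (shell division truncates).
quorum() { echo $(( $1 / 2 + 1 )); }

# Members you can lose while keeping that majority: floor((n-1)/2).
failure_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerates=$(failure_tolerance "$n")"
done
```

Running it shows why odd sizes are recommended: going from 3 to 4 members raises the quorum from 2 to 3 while the tolerance stays at 1, so the extra member adds cost without adding resilience.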
What happens when you lose a control-plane node:
- If quorum is maintained (e.g. 2 of 3 etcd members still up): API server on remaining nodes continues serving reads and writes. Scheduling and reconciliation continue.
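- If quorum is lost: etcd stops accepting writes, so the API server can no longer create or update objects, and reads that require quorum fail or time out as well. Existing pods keep running and the kubelet keeps managing them, but nothing new is scheduled. Recovery requires restoring quorum, either by restarting the failed member or by restoring from a snapshot.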
Quick HA inspection commands:
# Check etcd member list (run inside an etcd pod)
ETCD=$(kubectl -n kube-system get pods -l component=etcd -o name | head -1)
kubectl -n kube-system exec "$ETCD" -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
# Confirm control-plane pods on each node
kubectl get pods -n kube-system -l tier=control-plane -o wide
Reference: HA Topology Options | etcd Clustering
CKA Cluster Troubleshooting Triage Order
Lab 5 is the most CKA-aligned lab in the course. Whether or not you run it end-to-end, internalize this triage sequence: it covers the Troubleshooting domain, which is 30% of the exam.
When something is broken on a cluster, check in this order:
1. Node health: kubectl get nodes
2. Control-plane pods: kubectl -n kube-system get pods -l tier=control-plane
3. DNS: kubectl run dns-check --rm -it --image=busybox:1.36 -- nslookup kubernetes
4. Scheduler logs: kubectl -n kube-system logs kube-scheduler-<node> --tail=40
5. API server logs: kubectl -n kube-system logs kube-apiserver-<node> --tail=40
6. kubelet on node (SSH to the node first): journalctl -u kubelet -n 60 --no-pager
7. Events: kubectl get events -A --sort-by=.metadata.creationTimestamp | tail -40
8. Pod logs: kubectl logs <pod> -c <container> --previous
Rule of thumb: Work from cluster-level down to pod-level. A failing node affects every pod scheduled on it; broken cluster DNS affects every pod that resolves a Service name; a misconfigured Deployment affects only one workload. Start broad, then narrow down.
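The first two steps of the sequence are easy to turn into quick filters, so that a healthy cluster prints nothing and you move straight on. This is a sketch with two hypothetical helper functions; it assumes the default `kubectl get` column layout, where STATUS is the second column for nodes and the third for pods.

```shell
# Step 1: list nodes whose STATUS is anything other than plain "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is listed too, which is usually worth knowing.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}

# Step 2: list control-plane pods that are not in the Running state.
unhealthy_control_plane_pods() {
  kubectl -n kube-system get pods -l tier=control-plane --no-headers \
    | awk '$3 != "Running" { print $1 }'
}

# Usage:
#   not_ready_nodes
#   unhealthy_control_plane_pods
```

If either function prints a name, stop there and investigate that node or pod before looking at DNS or individual workloads.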
Labs
Lab 1: Install ArgoCD on kind with Helm (~40 min)
See labs/lab-01-argocd-install/
You will:
- Fork the portfolio template repo (you'll need it in Lab 3, but start CI running now)
- Install ArgoCD via Helm with a custom values file
- Access the ArgoCD UI via port-forward
- Extract the admin password and tour the interface
Lab 2: CI/CD Pipeline Deep Dive (~50 min)
See labs/lab-02-ci-pipeline-tools/
You will:
- Read the 8-stage CI pipeline and understand its fail-fast ordering
- Install and run five DevSecOps tools locally: ruff, bandit, hadolint, trivy, kubeconform
- Intentionally introduce security issues and watch the tools catch them
- Push a commit to your fork and watch CI run in GitHub Actions
Lab 3: The GitOps Loop + Revert (~40 min)
You will:
- Point ArgoCD at your forked portfolio repo
- Push a change and watch ArgoCD auto-sync it to the cluster
- Break the deployment on purpose, attempt a UI rollback, and learn why it doesn't stick
- Fix it the right way with git revert
Lab 4: Vault Integration (Bonus, ~30 min)
See labs/lab-04-vault-integration/
You will:
- Install Vault in dev mode on your kind cluster
- Create a GitHub App for API access
- Store credentials in Vault and wire them to the portfolio app
- See the dashboard come alive with live GitHub data
Lab 5: Cluster Component Troubleshooting Sprint (~60 min)
CKA weight: Troubleshooting is 30% of the exam. This lab is structured as a timed sprint across the exact failure categories the exam tests. Treat it as a required lab, not an extension — the triage sequence it builds is the most exam-relevant hour in the course.
See labs/lab-05-cluster-component-troubleshooting/
You will:
- Run timed incidents for CoreDNS, scheduling failures, control-plane scheduler breakage, resource pressure, and multi-container pod triage
- Use kubectl get events, describe, logs, and control-plane checks to isolate root causes
- Document triage order and recovery commands like a real incident response runbook
Lab 6 (CKA Extension): HA Control Plane Design + Simulation (~40 min)
See labs/lab-06-ha-control-plane-design/
You will:
- Build a multi-control-plane kind topology to visualize HA behavior
- Inspect etcd quorum and control-plane endpoint flow
- Simulate control-plane node loss and verify cluster continuity
Lab 7 (CKA Capstone Draft): Timed Mock Sprint (~120 min)
See labs/lab-07-cka-mock-sprint/
You will:
- Run a timed 12-task mixed-domain sprint under exam-style constraints
- Capture command and verification evidence in a scoring worksheet
- Identify weak domains and map them to targeted gymctl remediation drills
Discovery Questions
- You click "Rollback" in the ArgoCD UI but auto-sync is enabled. What happens 30 seconds later, and why?
- The CI pipeline runs code-quality first and container-scan fourth. Why not scan the container first?
- Bandit found a medium-severity issue but the pipeline still passed. Trivy found a HIGH CVE and the pipeline failed. Why the different thresholds?
- Your portfolio works locally with docker run but fails in Kubernetes. What's the first thing you check?
- The CI pipeline's last job commits a new image tag back to Git. What prevents this from triggering an infinite pipeline loop?
Homework
Homework exercises run in your DevContainer via gymctl (the container-gym CLI).
| Exercise | Time | Focus |
|---|---|---|
| jerry-argo-out-of-sync | 25 min | Debug an Application that won't sync (wrong path, bad creds, namespace issues) |
| jerry-ci-pipeline-fix | 25 min | Fix a broken GitHub Actions workflow |
| jerry-coredns-loop | 25 min | DNS outage triage and CoreDNS recovery |
| jerry-node-notready-kubelet | 25 min | Node NotReady incident diagnosis and recovery evidence |
| 34-jerry-init-container-stuck | 25 min | Init:CrashLoopBackOff diagnosis in multi-container workloads |
| 37-jerry-scheduler-missing | 25 min | Scheduler/API symptom triage from control-plane manifest issues |
| jerry-crd-operator-broken (optional extension) | 25 min | CRD and operator reconciliation triage |
Resources
- ArgoCD Docs: https://argo-cd.readthedocs.io/en/stable/
- ArgoCD Helm Chart: https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd
- kubeconform: https://github.com/yannh/kubeconform
- Trivy: https://aquasecurity.github.io/trivy/
- Ruff: https://docs.astral.sh/ruff/
- Bandit: https://bandit.readthedocs.io/
- Hadolint: https://github.com/hadolint/hadolint
- GitHub Actions: https://docs.github.com/en/actions
- Custom Resources: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
- Operator Pattern: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
- HA Topology Options: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/
- Troubleshooting Clusters: https://kubernetes.io/docs/tasks/debug/debug-cluster/
Course Wrap-Up
You started this course running docker run nginx. You're ending it with:
- A DevSecOps portfolio you own — code, pipeline, deployment
- An 8-stage CI/CD pipeline that catches bugs, vulnerabilities, and bad YAML before anything reaches your cluster
- ArgoCD watching your repo and syncing your cluster automatically
- The knowledge that in GitOps, the fix is always in the source
That's the full stack. Ship it.