DevOps Automation
Master CI/CD pipelines, GitOps workflows, and infrastructure automation with Kubernetes and Terraform. Learn to build automated deployment systems that enable rapid, reliable releases.
Prerequisites
- Git version control basics
- Linux command-line proficiency
- Basic understanding of containers (Docker)
- Familiarity with YAML syntax
- A cloud account (AWS/GCP/Azure) or local Kubernetes cluster
Learning Objectives
- Design and implement CI/CD pipelines
- Configure automated testing and validation
- Implement GitOps workflows with ArgoCD
- Automate infrastructure with Terraform
- Set up automated security scanning
- Implement canary and blue-green deployments
Step-by-Step Guide
1. Set Up CI/CD Pipeline Foundation
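Start with a GitHub Actions pipeline that runs lint, test, build, and integration-test jobs in sequence, then deploys to staging from develop and to production from main.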
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  # Linting and dependency audit
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      # npm audit checks dependencies for known vulnerabilities (dependency scanning, not SAST)
      - name: Run dependency audit
        run: npm audit --audit-level=moderate
  # Unit Tests
  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests with coverage
        run: npm run test:coverage
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          files: coverage/lcov.info
          fail_ci_if_error: false
  # Build and Security Scan
  build:
    runs-on: ubuntu-latest
    needs: test
    # GITHUB_TOKEN needs these scopes to push to GHCR and upload SARIF results
    permissions:
      contents: read
      packages: write
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # Also tag with the commit SHA so the scan and deploy steps can reference it
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=raw,value=${{ github.sha }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      # Prefer pinning this action to a release tag or commit SHA rather than master
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          # Assumes the repository path is lowercase, as GHCR image names must be
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'
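  # Integration Tests
  integration-test:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Start test services
        run: docker compose -f docker-compose.test.yml up -d
      - name: Run integration tests
        run: npm run test:integration
      # Tear down even when the tests fail
      - name: Stop test services
        if: always()
        run: docker compose -f docker-compose.test.yml down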
  # Deploy to Staging
  deploy-staging:
    runs-on: ubuntu-latest
    needs: integration-test
    if: github.ref == 'refs/heads/develop'
    environment: staging
    steps:
      - uses: actions/checkout@v4
      # Assumes the runner already has kubectl credentials for the staging cluster
      - name: Deploy to staging
        run: |
          kubectl set image deployment/api api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          kubectl rollout status deployment/api --timeout=300s
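  # Deploy to Production
  deploy-production:
    runs-on: ubuntu-latest
    # Depends on integration-test rather than deploy-staging: deploy-staging is
    # skipped on main, and a job that needs a skipped job is skipped as well
    needs: integration-test
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      # Assumes the runner already has kubectl credentials for the production cluster
      - name: Deploy to production (canary)
        run: |
          # Deploy canary with 10% traffic
          kubectl apply -f k8s/canary-deployment.yaml
          kubectl rollout status deployment/api-canary --timeout=300s
          # Monitor canary for 15 minutes
          sleep 900
          # Promote to full deployment. The sleep alone does not verify canary
          # health, so gate this step on your monitoring before relying on it
          kubectl apply -f k8s/production-deployment.yaml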
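The production job above waits a fixed 15 minutes before promoting. A minimal sketch of an explicit gate follows, assuming you can obtain the canary's error rate as a decimal fraction from your metrics system. The function name `canary_gate` and the 5% default threshold are illustrative and not part of the original pipeline.

```shell
# canary_gate ERROR_RATE [THRESHOLD]
# Prints "promote" when ERROR_RATE <= THRESHOLD, otherwise "rollback".
# Hypothetical helper: how ERROR_RATE is measured is left to your monitoring stack.
canary_gate() {
  rate=$1
  threshold=${2:-0.05}
  # awk performs the floating-point comparison that POSIX shell arithmetic lacks
  if awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r + 0 <= t + 0) }'; then
    echo "promote"
  else
    echo "rollback"
  fi
}
```

In the deploy step, a check such as `[ "$(canary_gate "$RATE")" = "promote" ]` could guard the final `kubectl apply`, where `RATE` is a value you supply.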
2. Implement GitOps with ArgoCD
GitOps treats your Git repository as the single source of truth for infrastructure and applications. ArgoCD watches the repository and reconciles the cluster to match it.
# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Access ArgoCD UI at https://localhost:8080 (this command blocks; run it in a separate terminal)
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get initial password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
# Create Application manifest
mkdir -p k8s-manifests/argocd-applications
cat > k8s-manifests/argocd-applications/myapp.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/myapp.git
    targetRevision: HEAD
    # Points at the production overlay in the directory structure shown below
    path: k8s-manifests/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
kubectl apply -f k8s-manifests/argocd-applications/myapp.yaml
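After applying the Application, it can be useful to confirm that ArgoCD has finished reconciling before moving on. A rough sketch follows, assuming the `myapp` Application in the `argocd` namespace from above and the `status.sync.status` and `status.health.status` fields of the Application resource. The helper names and polling interval are illustrative.

```shell
# app_is_ready SYNC_STATUS HEALTH_STATUS
# Succeeds only when ArgoCD reports the app as both Synced and Healthy.
app_is_ready() {
  [ "$1" = "Synced" ] && [ "$2" = "Healthy" ]
}

# wait_for_app APP [TRIES] [DELAY_SECONDS]
# Polls the Application resource until it is ready or TRIES is exhausted.
wait_for_app() {
  app=$1
  tries=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    sync=$(kubectl -n argocd get application "$app" -o jsonpath='{.status.sync.status}')
    health=$(kubectl -n argocd get application "$app" -o jsonpath='{.status.health.status}')
    if app_is_ready "$sync" "$health"; then
      echo "$app is Synced and Healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$app did not become ready in time" >&2
  return 1
}
```

Calling `wait_for_app myapp` would poll for up to five minutes with the defaults shown.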
Directory Structure for GitOps:
repository/
├── app/
│   ├── src/
│   ├── Dockerfile
│   └── package.json
├── k8s-manifests/
│   ├── base/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── configmap.yaml
│   ├── overlays/
│   │   ├── staging/
│   │   │   ├── kustomization.yaml
│   │   │   └── patch.yaml
│   │   └── production/
│   │       ├── kustomization.yaml
│   │       └── patch.yaml
│   └── argocd-applications/
│       └── myapp.yaml
└── terraform/
    ├── environments/
    │   ├── staging/
    │   └── production/
    └── modules/
3. Infrastructure as Code with Terraform
Automate infrastructure provisioning with Terraform.
# terraform/main.tf
terraform {
  required_version = ">= 1.6.0"
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/infrastructure.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
  }
}
provider "aws" {
  region = var.aws_region
}
# Look up the EKS cluster so the Kubernetes provider can authenticate against it
data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_name
}
data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_name
}
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}
# EKS Cluster
# Assumes a "vpc" module (for example terraform-aws-modules/vpc/aws) is defined elsewhere in this configuration
module "eks" {
  source                         = "terraform-aws-modules/eks/aws"
  version                        = "~> 19.0"
  cluster_name                   = var.cluster_name
  cluster_version                = "1.28"
  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = module.vpc.private_subnets
  cluster_endpoint_public_access = true
  # Version 18 and later of this module name the input eks_managed_node_groups
  eks_managed_node_groups = {
    general = {
      desired_size   = 3
      max_size       = 10
      min_size       = 2
      instance_types = ["m6i.large"]
    }
  }
}
# Kubernetes Namespace
resource "kubernetes_namespace" "app" {
  metadata {
    name = var.app_name
    labels = {
      name = var.app_name
    }
  }
}
# Application Deployment
resource "kubernetes_deployment" "app" {
  depends_on = [module.eks]
  metadata {
    name      = var.app_name
    namespace = kubernetes_namespace.app.metadata[0].name
    labels = {
      app = var.app_name
    }
  }
  spec {
    replicas = var.replicas
    selector {
      match_labels = {
        app = var.app_name
      }
    }
    template {
      metadata {
        labels = {
          app = var.app_name
        }
      }
      spec {
        container {
          name  = var.app_name
          image = var.container_image
          port {
            container_port = 8080
          }
          env {
            name  = "ENVIRONMENT"
            value = var.environment
          }
          resources {
            requests = {
              memory = "256Mi"
              cpu    = "250m"
            }
            limits = {
              memory = "512Mi"
              cpu    = "500m"
            }
          }
          liveness_probe {
            http_get {
              path = "/health"
              port = 8080
            }
            initial_delay_seconds = 30
            period_seconds        = 10
          }
          readiness_probe {
            http_get {
              path = "/ready"
              port = 8080
            }
            initial_delay_seconds = 5
            period_seconds        = 5
          }
        }
      }
    }
  }
}
# terraform/variables.tf
variable "aws_region" {
  type    = string
  default = "us-east-1"
}
variable "cluster_name" {
  type = string
}
variable "app_name" {
  type = string
}
variable "container_image" {
  type = string
}
variable "replicas" {
  type    = number
  default = 3
}
variable "environment" {
  type = string
}
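Once the configuration is written, a pipeline typically runs `terraform init`, `terraform plan`, and `terraform apply`. The sketch below interprets the exit code of `terraform plan -detailed-exitcode` (0 for no changes, 1 for an error, 2 for pending changes) so that a job can skip the apply step when nothing changed. The function name `plan_outcome` and the variable values in the usage comment are placeholders.

```shell
# plan_outcome EXIT_CODE
# Maps the exit code of `terraform plan -detailed-exitcode` to a label.
plan_outcome() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes" ;;
    *) echo "error" ;;
  esac
}

# Example use in a pipeline step (placeholder variable values):
#   terraform init
#   code=0
#   terraform plan -detailed-exitcode -out=tfplan \
#     -var cluster_name=demo -var app_name=api \
#     -var container_image=ghcr.io/yourorg/api:v2.1.0 \
#     -var environment=staging || code=$?
#   if [ "$(plan_outcome "$code")" = "changes" ]; then
#     terraform apply tfplan
#   fi
```

The four variables without defaults in variables.tf (`cluster_name`, `app_name`, `container_image`, `environment`) have to be supplied on every plan, which is why the usage comment passes them explicitly.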
4. Automated Security Scanning
Integrate security scanning into your CI/CD pipeline.
# .github/workflows/security.yml
name: Security Scanning
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 0 * * 0' # Weekly scan
jobs:
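  # SAST - Static Application Security Testing
  sast:
    runs-on: ubuntu-latest
    # CodeQL needs write access to security events to publish its results
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: javascript
      # analyze is an action, so it is referenced with uses rather than run
      - name: Run CodeQL analysis
        uses: github/codeql-action/analyze@v3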
  # Filesystem scanning with Trivy (scan-type 'fs' scans the repository contents;
  # the built image itself is scanned in the CI/CD pipeline's build job)
  container-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'table'
          exit-code: '1'
          severity: 'CRITICAL,HIGH'
  # Infrastructure Scanning
  infra-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: ./terraform
          framework: terraform
          soft_fail: true
  # Secret Detection
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Fetch full history so secrets in earlier commits are scanned too
          fetch-depth: 0
      - name: Run gitleaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITLEAKS_LOG_LEVEL: 5
  # Dependency Scanning
  dependency-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run npm audit
        run: npm audit --audit-level=high
      - name: Run Snyk test
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          command: test
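`npm audit --audit-level=high` fails the job only when a finding is at or above the given level. A sketch of that threshold logic follows, using npm's level names. The helper names are hypothetical, and this is an illustration of the idea rather than npm's own implementation.

```shell
# severity_rank LEVEL
# Prints a numeric rank (higher is worse); unknown levels rank 0.
severity_rank() {
  case "$1" in
    low) echo 1 ;;
    moderate) echo 2 ;;
    high) echo 3 ;;
    critical) echo 4 ;;
    *) echo 0 ;;
  esac
}

# should_fail FINDING_LEVEL THRESHOLD_LEVEL
# Succeeds (exit 0) when the finding is at or above the threshold.
should_fail() {
  [ "$(severity_rank "$1")" -ge "$(severity_rank "$2")" ]
}
```

With a `high` threshold, a `moderate` finding would be reported but would not fail the build, while `high` and `critical` findings would.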
5. Canary Deployments
Implement canary deployments for safe, gradual rollouts.
# k8s/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-canary
  labels:
    app: api
    version: canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
      version: canary
  template:
    metadata:
      labels:
        app: api
        version: canary
    spec:
      containers:
        - name: api
          image: ghcr.io/yourorg/api:v2.1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
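---
# Service selecting both stable and canary pods. Without a service mesh, traffic
# splits by pod ratio (1 canary pod alongside 9 stable pods is roughly 10%)
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP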
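---
# Istio VirtualService for traffic splitting (if using service mesh)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-vs
spec:
  hosts:
    - api
  http:
    - route:
        - destination:
            host: api
            subset: stable
          weight: 90
        - destination:
            host: api
            subset: canary
          weight: 10
---
# DestinationRule defining the stable and canary subsets used by the VirtualService.
# The name api-dr is illustrative; stable pods are assumed to carry version: stable
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-dr
spec:
  host: api
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary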
---
# Prometheus alert for canary monitoring
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: canary-alerts
spec:
  groups:
    - name: canary
      rules:
        - alert: CanaryErrorRateHigh
          expr: |
            sum(rate(http_requests_total{app="api",version="canary",status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{app="api",version="canary"}[5m])) > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Canary error rate is high"
            description: "Canary deployment has {{ $value | humanizePercentage }} error rate"
        - alert: CanaryLatencyHigh
          expr: |
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket{app="api",version="canary"}[5m])) by (le))
            >
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket{app="api",version="stable"}[5m])) by (le)) * 1.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Canary latency is higher than stable"
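Without a service mesh, the `api` Service spreads requests across every pod labelled `app: api`, so the canary's share of traffic roughly follows its share of pods. A small sketch for estimating how many canary replicas approximate a target percentage follows, assuming load is balanced evenly across pods. `canary_replicas` is a hypothetical helper.

```shell
# canary_replicas STABLE_REPLICAS TARGET_PERCENT
# Prints the smallest canary count c such that c / (c + stable) >= target%.
canary_replicas() {
  stable=$1
  percent=$2
  if [ "$percent" -le 0 ] || [ "$percent" -ge 100 ]; then
    echo "TARGET_PERCENT must be between 1 and 99" >&2
    return 1
  fi
  # Solve c >= percent * stable / (100 - percent), rounding up with integer math
  echo $(( (percent * stable + (100 - percent) - 1) / (100 - percent) ))
}
```

With small stable replica counts the achievable split is coarse, which is one reason the weighted VirtualService route is the more precise option when a mesh is available.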
6. Automated Rollback
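Roll back a failed release and verify the result with health checks. The workflow below is triggered manually through workflow_dispatch; the script that follows it can be called from a pipeline step when a health check fails.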
# .github/workflows/rollback.yml
name: Automated Rollback
on:
  workflow_dispatch:
    inputs:
      deployment_name:
        description: 'Deployment to roll back'
        required: true
        default: 'api'
      environment:
        description: 'Environment'
        required: true
        type: choice
        options:
          - staging
          - production
jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}
    # Pass inputs through environment variables instead of interpolating them into
    # the script, which avoids shell injection. The environment name doubles as the namespace
    env:
      DEPLOYMENT: ${{ github.event.inputs.deployment_name }}
      NAMESPACE: ${{ github.event.inputs.environment }}
    steps:
      # Assumes the runner has kubectl credentials and network access to the cluster
      - name: Rollback deployment
        run: |
          kubectl rollout undo "deployment/$DEPLOYMENT" -n "$NAMESPACE"
          kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout=300s
      - name: Verify rollback
        run: |
          # Check health endpoint
          for i in {1..10}; do
            if curl -sf "http://api.$NAMESPACE.internal/health"; then
              echo "Health check passed"
              exit 0
            fi
            sleep 10
          done
          echo "Health check failed"
          exit 1
# Automated rollback script for use in pipelines
mkdir -p scripts
cat > scripts/rollback.sh << 'EOF'
#!/bin/bash
set -euo pipefail
DEPLOYMENT=${1:-api}
NAMESPACE=${2:-production}
TIMEOUT=${3:-300}
echo "Rolling back $DEPLOYMENT in $NAMESPACE..."
# Roll back to the previous revision
kubectl rollout undo "deployment/$DEPLOYMENT" -n "$NAMESPACE"
# Wait for rollout
echo "Waiting for rollout to complete..."
kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout="${TIMEOUT}s"
# Verify health (the cluster-local URL resolves only from inside the cluster)
echo "Verifying health..."
for i in {1..30}; do
  if curl -sf "http://$DEPLOYMENT.$NAMESPACE.svc.cluster.local/health"; then
    echo "✓ Health check passed"
    exit 0
  fi
  sleep 10
done
echo "✗ Health check failed after 5 minutes"
exit 1
EOF
chmod +x scripts/rollback.sh
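Both the workflow and the script poll a health endpoint in a fixed loop. The same pattern can be factored into a reusable helper. The sketch below is one way to do it, with `retry` as a hypothetical function name.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}
```

For example, `retry 30 10 curl -sf http://api.production.svc.cluster.local/health` would mirror the 30 attempts at 10-second intervals used in the script.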
Best Practices
DevOps Automation Principles:
- Everything as Code: Infrastructure, configurations, and policies all live in version control
- Immutable Infrastructure: Replace, don't update. Never patch production servers
- Automate Everything: If you do it twice, automate it
- Fail Fast: Catch issues early in the pipeline
- Single Source of Truth: Git should be the authoritative source
- Security First: Shift security left into the development process
CI/CD Pipeline Stages:
- Commit: Pre-commit hooks, linting
- Build: Compile, package, containerize
- Test: Unit, integration, E2E tests
- Scan: Security, vulnerability, compliance
- Stage: Deploy to staging environment
- Release: Canary deployment to production
- Monitor: Observe, alert, feedback
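Taken together with the fail-fast principle, the stage list above amounts to running stages in order and stopping at the first failure. A toy sketch of that control flow follows, where each stage is a shell function or command name supplied by the caller. `run_stages` is hypothetical; real pipelines express the same ordering through job dependencies such as `needs:`.

```shell
# run_stages STAGE...
# Runs each stage in order and stops at the first one that fails.
run_stages() {
  for stage in "$@"; do
    echo "stage: $stage"
    if ! "$stage"; then
      echo "failed: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed"
}
```

Later stages never run after a failure, so a broken unit test cannot be followed by a deployment.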
Assessment
1. What is GitOps?
2. Which tool is commonly used for Infrastructure as Code?
3. What is the main benefit of canary deployments?
4. What does SAST stand for?
Answer Key: 1-B, 2-C, 3-B, 4-A