DevOps Automation

Master CI/CD pipelines, GitOps workflows, and infrastructure automation with Kubernetes and Terraform. Learn to build automated deployment systems that enable rapid, reliable releases.

⏱️ 60 minutes 📊 Advanced 📝 6 steps 🏷️ IT Operations

Prerequisites

  • Git version control basics
  • Linux command-line proficiency
  • Basic understanding of containers (Docker)
  • Familiarity with YAML syntax
  • A cloud account (AWS/GCP/Azure) or local Kubernetes cluster

Learning Objectives

  • Design and implement CI/CD pipelines
  • Configure automated testing and validation
  • Implement GitOps workflows with ArgoCD
  • Automate infrastructure with Terraform
  • Set up automated security scanning
  • Implement canary and blue-green deployments

Step-by-Step Guide

1. Set Up CI/CD Pipeline Foundation

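Start with a GitHub Actions pipeline that lints the code, runs unit tests, builds and scans a container image, runs integration tests, and then deploys to staging and production.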

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
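  # Linting and dependency audit
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          
      - name: Install dependencies
        run: npm ci
        
      - name: Run linter
        run: npm run lint
        
      # npm audit checks dependencies for known vulnerabilities; it is not SAST.
      # Static analysis is covered by CodeQL in the security workflow (step 4).
      - name: Run dependency audit
        run: npm audit --audit-level=moderate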

  # Unit Tests
  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          
      - name: Install dependencies
        run: npm ci
        
      - name: Run tests with coverage
        run: npm run test:coverage
        
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          files: coverage/lcov.info
          fail_ci_if_error: false

  # Build and Security Scan
  build:
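    runs-on: ubuntu-latest
    needs: test
    # GITHUB_TOKEN needs these scopes to push to GHCR and upload SARIF results
    permissions:
      contents: read
      packages: write
      security-events: write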
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        
      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
          
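      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # Add a commit-SHA tag so the scan and deploy steps can reference this exact image
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=raw,value=${{ github.sha }}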
          
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
          
      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

  # Integration Tests
  integration-test:
    runs-on: ubuntu-latest
    needs: build
    steps:
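      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run integration tests
        run: |
          docker compose -f docker-compose.test.yml up -d
          npm run test:integration

      # Runs even when the tests fail so containers are not left behind
      - name: Tear down test services
        if: always()
        run: docker compose -f docker-compose.test.yml down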

  # Deploy to Staging
  deploy-staging:
    runs-on: ubuntu-latest
    needs: integration-test
    if: github.ref == 'refs/heads/develop'
    environment: staging
    steps:
      - uses: actions/checkout@v4
      
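      # Assumes the runner already has cluster credentials, for example a
      # kubeconfig supplied through a secret on the staging environment.
      - name: Deploy to staging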
        run: |
          kubectl set image deployment/api api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          kubectl rollout status deployment/api --timeout=300s

  # Deploy to Production
  deploy-production:
    runs-on: ubuntu-latest
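    # Depends on integration-test, not deploy-staging: deploy-staging is skipped
    # on main, and a job whose dependency was skipped is skipped as well.
    needs: integration-test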
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy to production (canary)
        run: |
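          # Deploy canary with 10% traffic
          kubectl apply -f k8s/canary-deployment.yaml
          kubectl rollout status deployment/api-canary --timeout=300s
          # Observe the canary for 15 minutes. This step only waits; it does not
          # check health itself, so pair it with alerting or a metrics gate.
          sleep 900
          # Promote to full deployment
          kubectl apply -f k8s/production-deployment.yaml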
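
The canary step's comment says "if healthy, promote", which implies a decision between the wait and the promotion. One way to make that decision explicit is a small gate on the canary's error ratio. The following is a minimal sketch, not part of the original pipeline: the function name should_promote is hypothetical, the inputs are plain numbers that you would have to obtain from your own metrics system, and the 5% limit simply mirrors the error-rate threshold used in the Prometheus rule in step 5.

```shell
#!/bin/sh
# Sketch of a promotion gate. Inputs are plain numbers so the logic can be
# exercised without a cluster; fetching them from a metrics system is left out.

# should_promote ERRORS TOTAL MAX_RATIO
# Exit status 0 means "promote", 1 means "do not promote".
should_promote() {
  errors=$1
  total=$2
  max_ratio=$3
  # No traffic means no evidence of health, so refuse to promote.
  if [ "$total" -eq 0 ]; then
    return 1
  fi
  # awk does the floating-point comparison and exits 0 when the ratio is acceptable.
  awk -v e="$errors" -v t="$total" -v m="$max_ratio" \
    'BEGIN { exit !((e / t) <= m) }'
}

# Example: 2 errors in 100 requests against a 5% limit.
if should_promote 2 100 0.05; then
  echo "promote"
else
  echo "roll back"
fi
```

In a workflow, the promotion command would run only when the function succeeds, and a rollback would run otherwise.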

2. Implement GitOps with ArgoCD

GitOps treats your Git repository as the single source of truth for infrastructure and applications.

# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

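# Access the ArgoCD UI at https://localhost:8080
# (port-forward blocks the shell, so run it in a separate terminal)
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo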

# Create Application manifest
mkdir -p k8s-manifests/argocd-applications
cat > k8s-manifests/argocd-applications/myapp.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/myapp.git
    targetRevision: HEAD
    # Point at the production overlay; ArgoCD detects kustomization.yaml and renders it
    path: k8s-manifests/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF

kubectl apply -f k8s-manifests/argocd-applications/myapp.yaml

Directory Structure for GitOps:

repository/
├── app/
│   ├── src/
│   ├── Dockerfile
│   └── package.json
├── k8s-manifests/
│   ├── base/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── configmap.yaml
│   ├── overlays/
│   │   ├── staging/
│   │   │   ├── kustomization.yaml
│   │   │   └── patch.yaml
│   │   └── production/
│   │       ├── kustomization.yaml
│   │       └── patch.yaml
│   └── argocd-applications/
│       └── myapp.yaml
└── terraform/
    ├── environments/
    │   ├── staging/
    │   └── production/
    └── modules/
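
To start a repository with this layout, the directories can be created in one pass. This is a minimal sketch that creates only the directory skeleton from the tree above; the function name scaffold_gitops is hypothetical, and the files (manifests, Dockerfile, Terraform code) still have to be written by hand.

```shell
#!/bin/sh
# scaffold_gitops ROOT
# Creates the directory skeleton from the tree above under ROOT.
# Only directories are created; file contents are deliberately left out.
scaffold_gitops() {
  root=$1
  mkdir -p \
    "$root/app/src" \
    "$root/k8s-manifests/base" \
    "$root/k8s-manifests/overlays/staging" \
    "$root/k8s-manifests/overlays/production" \
    "$root/k8s-manifests/argocd-applications" \
    "$root/terraform/environments/staging" \
    "$root/terraform/environments/production" \
    "$root/terraform/modules"
}

# Example: build the skeleton in a throwaway directory, list it, and clean up.
demo_root=$(mktemp -d)
scaffold_gitops "$demo_root/repository"
find "$demo_root/repository" -type d | sort
rm -rf "$demo_root"
```

Keeping one overlay per environment means each ArgoCD Application can point at a single directory.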

3. Infrastructure as Code with Terraform

Automate infrastructure provisioning with Terraform.

# terraform/main.tf
terraform {
  required_version = ">= 1.6.0"
  
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/infrastructure.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

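# Short-lived token for the cluster created by the eks module below
data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_name
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}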

# EKS Cluster
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"
  
  cluster_name    = var.cluster_name
  cluster_version = "1.28"
  
  # module.vpc is assumed to be defined elsewhere in this configuration
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets
  
  cluster_endpoint_public_access = true
  
  # Version 19 of the module uses eks_managed_node_groups; sizes are set directly on the group
  eks_managed_node_groups = {
    general = {
      desired_size = 3
      max_size     = 10
      min_size     = 2
      
      instance_types = ["m6i.large"]
    }
  }
}

# Kubernetes Namespace
resource "kubernetes_namespace" "app" {
  metadata {
    name = var.app_name
    labels = {
      name = var.app_name
    }
  }
}

# Application Deployment
resource "kubernetes_deployment" "app" {
  depends_on = [module.eks]
  
  metadata {
    name = var.app_name
    namespace = kubernetes_namespace.app.metadata[0].name
    labels = {
      app = var.app_name
    }
  }
  
  spec {
    replicas = var.replicas
    
    selector {
      match_labels = {
        app = var.app_name
      }
    }
    
    template {
      metadata {
        labels = {
          app = var.app_name
        }
      }
      
      spec {
        container {
          name  = var.app_name
          image = var.container_image
          
          port {
            container_port = 8080
          }
          
          env {
            name  = "ENVIRONMENT"
            value = var.environment
          }
          
          resources {
            requests = {
              memory = "256Mi"
              cpu    = "250m"
            }
            limits = {
              memory = "512Mi"
              cpu    = "500m"
            }
          }
          
          liveness_probe {
            http_get {
              path = "/health"
              port = 8080
            }
            initial_delay_seconds = 30
            period_seconds = 10
          }
          
          readiness_probe {
            http_get {
              path = "/ready"
              port = 8080
            }
            initial_delay_seconds = 5
            period_seconds = 5
          }
        }
      }
    }
  }
}

# terraform/variables.tf
variable "aws_region" {
  type        = string
  default     = "us-east-1"
}

variable "cluster_name" {
  type = string
}

variable "app_name" {
  type = string
}

variable "container_image" {
  type = string
}

variable "replicas" {
  type    = number
  default = 3
}

variable "environment" {
  type = string
}
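
Four of these variables (cluster_name, app_name, container_image, and environment) have no default, so Terraform needs a value for each on every run. Terraform reads environment variables named TF_VAR_<name> as input values, which is convenient in CI. The sketch below shows one way to set them; the function name export_tf_vars and the naming scheme (myapp, myapp-<environment>) are placeholder assumptions, not values from this guide.

```shell
#!/bin/sh
# export_tf_vars ENVIRONMENT IMAGE
# Exports TF_VAR_* variables, which Terraform picks up as input variable values.
# The app and cluster names below are placeholders; substitute your own.
export_tf_vars() {
  env_name=$1
  image=$2
  case "$env_name" in
    staging|production) ;;
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac
  export TF_VAR_environment="$env_name"
  export TF_VAR_app_name="myapp"
  export TF_VAR_cluster_name="myapp-$env_name"
  export TF_VAR_container_image="$image"
}

# Example usage; a real pipeline would follow this with terraform plan and apply.
export_tf_vars staging "ghcr.io/yourorg/myapp:abc123"
echo "$TF_VAR_cluster_name"
```

Rejecting unknown environment names early keeps a typo from provisioning a cluster under an unexpected name.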

4. Automated Security Scanning

Integrate security scanning into your CI/CD pipeline.

# .github/workflows/security.yml
name: Security Scanning

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 0 * * 0'  # Weekly scan

jobs:
  # SAST - Static Application Security Testing
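  sast:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: javascript
      
      # analyze is an action, so it must be invoked with "uses", not "run"
      - name: Run CodeQL analysis
        uses: github/codeql-action/analyze@v3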

  # Filesystem Scanning (Trivy "fs" mode scans the repository contents, not a built image)
  container-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'table'
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

  # Infrastructure Scanning
  infra-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: ./terraform
          framework: terraform
          soft_fail: true

  # Secret Detection
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so gitleaks can scan every commit
      
      - name: Run gitleaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITLEAKS_LOG_LEVEL: 5

  # Dependency Scanning
  dependency-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run npm audit
        run: npm audit --audit-level=high
      
      - name: Run Snyk test
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          command: test

5. Canary Deployments

Implement canary deployments for safe, gradual rollouts.

# k8s/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-canary
  labels:
    app: api
    version: canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
      version: canary
  template:
    metadata:
      labels:
        app: api
        version: canary
    spec:
      containers:
      - name: api
        image: ghcr.io/yourorg/api:v2.1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

---
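# Service selecting both stable and canary pods (app: api). Without a mesh,
# the canary's traffic share roughly follows its share of ready pods.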
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP

---
# Istio VirtualService for traffic splitting (if using service mesh).
# The stable and canary subsets must be defined in a DestinationRule (not shown).
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-vs
spec:
  hosts:
  - api
  http:
  - route:
    - destination:
        host: api
        subset: stable
      weight: 90
    - destination:
        host: api
        subset: canary
      weight: 10

---
# Prometheus alert for canary monitoring
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: canary-alerts
spec:
  groups:
  - name: canary
    rules:
    - alert: CanaryErrorRateHigh
      expr: |
        sum(rate(http_requests_total{app="api",version="canary",status=~"5.."}[5m])) 
        / 
        sum(rate(http_requests_total{app="api",version="canary"}[5m])) > 0.05
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Canary error rate is high"
        description: "Canary deployment has {{ $value | humanizePercentage }} error rate"

    - alert: CanaryLatencyHigh
      expr: |
        histogram_quantile(0.95, 
          sum(rate(http_request_duration_seconds_bucket{app="api",version="canary"}[5m])) by (le)) 
        > 
        histogram_quantile(0.95,
          sum(rate(http_request_duration_seconds_bucket{app="api",version="stable"}[5m])) by (le)) * 1.5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Canary latency is higher than stable"

6. Automated Rollback

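Roll back a failed release and confirm recovery with health checks. The workflow below is triggered manually; the script that follows can be called from a pipeline to run the same steps automatically.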

# .github/workflows/rollback.yml
name: Automated Rollback

on:
  workflow_dispatch:
    inputs:
      deployment_name:
        description: 'Deployment to rollback'
        required: true
        default: 'api'
      environment:
        description: 'Environment'
        required: true
        type: choice
        options:
        - staging
        - production

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}
    # Pass inputs through env instead of interpolating them into the script,
    # so a crafted input value cannot inject shell commands.
    env:
      DEPLOYMENT: ${{ github.event.inputs.deployment_name }}
      NAMESPACE: ${{ github.event.inputs.environment }}
    steps:
      - name: Rollback deployment
        run: |
          kubectl rollout undo "deployment/$DEPLOYMENT" -n "$NAMESPACE"
          # Wait for the rollout to complete
          kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout=300s

      - name: Verify rollback
        run: |
          # Check health endpoint
          for i in {1..10}; do
            if curl -sf "http://api.$NAMESPACE.internal/health"; then
              echo "Health check passed"
              exit 0
            fi
            sleep 10
          done
          echo "Health check failed"
          exit 1

# Automated rollback script for use in pipelines
cat > scripts/rollback.sh << 'EOF'
#!/bin/bash
set -euo pipefail

DEPLOYMENT=${1:-api}
NAMESPACE=${2:-production}
TIMEOUT=${3:-300}

echo "Rolling back $DEPLOYMENT in $NAMESPACE..."

# Roll back to the previous revision
kubectl rollout undo "deployment/$DEPLOYMENT" -n "$NAMESPACE"

# Wait for rollout
echo "Waiting for rollout to complete..."
kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout="${TIMEOUT}s"

# Verify health (this cluster-internal DNS name resolves only from inside the cluster)
echo "Verifying health..."
for i in {1..30}; do
  if curl -sf "http://$DEPLOYMENT.$NAMESPACE.svc.cluster.local/health"; then
    echo "✓ Health check passed"
    exit 0
  fi
  sleep 10
done

echo "✗ Health check failed after 5 minutes"
exit 1
EOF

chmod +x scripts/rollback.sh
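
Both the workflow and the script use the same pattern for the health check: try a command a fixed number of times with a pause in between, and fail if it never succeeds. That loop can be factored into a reusable function. This is a minimal sketch rather than part of the original script; the name retry_until is hypothetical, and the example uses a stand-in command so it runs without a cluster.

```shell
#!/bin/sh
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the pause after the final attempt.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Example with a stand-in check. In the rollback script the command would be
# the curl health probe, with 30 attempts and a 10 second delay.
retry_until 3 0 true && echo "Health check passed"
```

Keeping the attempt count and delay as parameters makes the total wait explicit: 30 attempts with a 10 second delay gives the five-minute limit the script reports.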

Best Practices

DevOps Automation Principles:
  • Everything as Code: Infrastructure, configurations, policies - all in version control
  • Immutable Infrastructure: Replace, don't update. Never patch production servers
  • Automate Everything: If you do it twice, automate it
  • Fail Fast: Catch issues early in the pipeline
  • Single Source of Truth: Git should be the authoritative source
  • Security First: Shift security left into the development process

CI/CD Pipeline Stages:

  1. Commit: Pre-commit hooks, linting
  2. Build: Compile, package, containerize
  3. Test: Unit, integration, E2E tests
  4. Scan: Security, vulnerability, compliance
  5. Stage: Deploy to staging environment
  6. Release: Canary deployment to production
  7. Monitor: Observe, alert, feedback
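
The ordering of these stages is what makes "fail fast" work: each stage runs only if every earlier stage passed. The sketch below illustrates that control flow with placeholder commands; the function name run_stages is hypothetical, and a CI system such as GitHub Actions expresses the same idea through job dependencies rather than a script.

```shell
#!/bin/sh
# run_stages STAGE_COMMAND...
# Runs each argument as a shell command, in order, and stops at the first
# failure so later stages never run against a broken build.
run_stages() {
  for stage in "$@"; do
    echo "running: $stage"
    if ! sh -c "$stage"; then
      echo "failed: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Example with placeholder commands standing in for lint, test, and build.
run_stages "echo lint ok" "echo tests ok" "echo build ok"
```

Placing the cheapest checks first (linting, then unit tests) means most failures are reported before slower stages start.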

Assessment

1. What is GitOps?

2. Which tool is commonly used for Infrastructure as Code?

3. What is the main benefit of canary deployments?

4. What does SAST stand for?

Answer Key: 1-B, 2-C, 3-B, 4-A

Resources