Deploy Checkly Private Locations on Kubernetes for production-scale monitoring, taking advantage of Kubernetes' built-in orchestration, scaling, and reliability features. This guide covers Helm chart deployment, manual manifests, and production best practices.

Prerequisites

Before deploying on Kubernetes, ensure you have:
  • A running Kubernetes cluster (v1.19+)
  • kubectl configured and connected to the cluster (see the quick check below)
  • Sufficient cluster resources: each agent replica requests roughly 0.5-1 CPU core and 1-2 GB of memory, depending on configuration
  • A Checkly Private Location API key (starts with pl_)
  • Network policies configured (if required)
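
A quick way to confirm the cluster and kubectl are ready before installing anything (standard kubectl commands):

# Confirm kubectl can reach the cluster and report client/server versions
kubectl version

# List nodes and review allocatable capacity
kubectl get nodes
kubectl describe nodes | grep -A 5 "Allocatable"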

Quick Start with Helm

The fastest way to deploy Checkly Agents on Kubernetes is using our official Helm chart.

Get the Helm Chart

# Clone the official repository
git clone https://github.com/checkly/checkly-k8s.git
cd checkly-k8s

# Verify chart structure
ls -la helm-chart/

Deploy with Helm

# Install with your API key
helm install checkly-agent \
  --set apiKeySecret.apiKey="pl_your_api_key_here" \
  ./helm-chart

# Install with custom values
helm install checkly-agent \
  --set apiKeySecret.apiKey="pl_your_api_key_here" \
  --set replicaCount=3 \
  --set resources.requests.memory="2Gi" \
  --set resources.requests.cpu="1" \
  ./helm-chart

# Install in custom namespace
kubectl create namespace checkly
helm install checkly-agent \
  --namespace checkly \
  --set apiKeySecret.apiKey="pl_your_api_key_here" \
  ./helm-chart

Verify Deployment

# Check pod status
kubectl get pods -l app.kubernetes.io/name=checkly-agent

# View agent logs
kubectl logs -l app.kubernetes.io/name=checkly-agent --tail=50

# Check agent connectivity
kubectl exec -it deployment/checkly-agent -- curl -I https://agent.checklyhq.com
The Helm chart automatically creates the API key secret, configures resource limits, and applies hardened pod security contexts for production use.
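
You can also ask Helm for the release status and the values it was installed with (standard Helm commands):

# Show release status and deployed resources
helm status checkly-agent

# Show the user-supplied values for the release
helm get values checkly-agent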

Manual Kubernetes Manifests

For more control or custom environments, deploy using individual Kubernetes manifests.

1. Create Namespace (Optional)

# checkly-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: checkly
  labels:
    name: checkly
    purpose: monitoring
kubectl apply -f checkly-namespace.yaml

2. Create API Key Secret

# api-key-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: checkly-api-key
  namespace: checkly
type: Opaque
data:
  # Base64 encoded API key: echo -n "pl_your_api_key_here" | base64
  api-key: cGxfeW91cl9hcGlfa2V5X2hlcmU=
# Create secret directly (easier method)
kubectl create secret generic checkly-api-key \
  --namespace=checkly \
  --from-literal=api-key="pl_your_api_key_here"
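
To confirm the stored value decodes back to the key you expect:

# Decode the stored API key and compare it with the original value
kubectl get secret checkly-api-key -n checkly \
  -o jsonpath='{.data.api-key}' | base64 -d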

3. Create Deployment

# checkly-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkly-agent
  namespace: checkly
  labels:
    app: checkly-agent
    version: "6.0.3"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkly-agent
  template:
    metadata:
      labels:
        app: checkly-agent
      annotations:
        prometheus.io/scrape: "false"
    spec:
      containers:
      - name: checkly-agent
        image: checkly/agent:6.0.3
        imagePullPolicy: IfNotPresent
        env:
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: checkly-api-key
              key: api-key
        - name: JOB_CONCURRENCY
          value: "3"
        - name: LOG_LEVEL
          value: "INFO"
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2"
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - "ps aux | grep -v grep | grep node"
          initialDelaySeconds: 30
          periodSeconds: 30
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - "ps aux | grep -v grep | grep node"
          initialDelaySeconds: 10
          periodSeconds: 10
        securityContext:
          allowPrivilegeEscalation: false
          runAsNonRoot: true
          runAsUser: 1000
          runAsGroup: 1000
          readOnlyRootFilesystem: false
          capabilities:
            drop:
            - ALL
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      securityContext:
        fsGroup: 1000
kubectl apply -f checkly-deployment.yaml
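
After applying the manifest, wait for the rollout to finish and confirm both replicas are ready:

# Wait for the rollout to complete
kubectl rollout status deployment/checkly-agent -n checkly

# Confirm the pods are running and ready
kubectl get pods -n checkly -l app=checkly-agent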

Production Configuration

Resource Planning

Configure appropriate resource requests and limits based on your workload:
# High-capacity agent configuration
resources:
  requests:
    memory: "2Gi"
    cpu: "1"
  limits:
    memory: "8Gi"
    cpu: "4"
env:
- name: JOB_CONCURRENCY
  value: "8"  # 8GB / 1GB per browser check
# API-only agent configuration  
resources:
  requests:
    memory: "500Mi"
    cpu: "250m"
  limits:
    memory: "2Gi"
    cpu: "1"
env:
- name: JOB_CONCURRENCY
  value: "10"  # 2GB / 0.15GB per API check

Node Affinity and Scheduling

# Pin agents to specific nodes
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: checkly.io/agent-node
                operator: In
                values:
                - "true"
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - checkly-agent
              topologyKey: kubernetes.io/hostname
      tolerations:
      - key: "checkly.io/dedicated"
        operator: "Equal"
        value: "agent"
        effect: "NoSchedule"
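
The affinity rule and toleration above assume the target nodes carry a matching label and taint. You can set both with standard kubectl commands (replace <node-name> with your node):

# Label the node so the nodeAffinity rule matches
kubectl label nodes <node-name> checkly.io/agent-node=true

# Taint the node so only pods that tolerate it (the agents) are scheduled there
kubectl taint nodes <node-name> checkly.io/dedicated=agent:NoSchedule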

Pod Disruption Budget

# pod-disruption-budget.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkly-agent-pdb
  namespace: checkly
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: checkly-agent
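
Apply the budget and confirm it is tracking the agent pods:

kubectl apply -f pod-disruption-budget.yaml
kubectl get pdb checkly-agent-pdb -n checkly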

Horizontal Pod Autoscaler

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkly-agent-hpa
  namespace: checkly
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkly-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
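
The HPA relies on the Kubernetes metrics API (typically metrics-server) for CPU and memory readings. Apply it and watch utilization against the targets:

kubectl apply -f hpa.yaml

# TARGETS shows current vs. target utilization once metrics are available
kubectl get hpa checkly-agent-hpa -n checkly --watch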

Advanced Configuration

ConfigMap for Agent Settings

# agent-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: checkly-agent-config
  namespace: checkly
data:
  JOB_CONCURRENCY: "5"
  LOG_LEVEL: "INFO"
  USE_OS_DNS_RESOLVER: "true"
---
# Deployment excerpt: reference the ConfigMap from the agent container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkly-agent
  namespace: checkly
spec:
  template:
    spec:
      containers:
      - name: checkly-agent
        envFrom:
        - configMapRef:
            name: checkly-agent-config
        env:
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: checkly-api-key
              key: api-key
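
Note that changing a ConfigMap does not restart running pods. Trigger a rolling restart so the agents pick up new values:

# Apply the updated ConfigMap, then restart the deployment
kubectl apply -f agent-config.yaml
kubectl rollout restart deployment/checkly-agent -n checkly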

Network Policies

# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: checkly-agent-netpol
  namespace: checkly
spec:
  podSelector:
    matchLabels:
      app: checkly-agent
  policyTypes:
  - Egress
  egress:
  # Allow DNS resolution
  - to: []
    ports:
    - protocol: UDP
      port: 53
  # Allow HTTPS to Checkly API
  - to: []
    ports:
    - protocol: TCP
      port: 443
  # Allow HTTP/HTTPS for checks
  - to: []
    ports:
    - protocol: TCP
      port: 80
    - protocol: TCP
      port: 443

Monitoring and Observability

# service-monitor.yaml (for Prometheus)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: checkly-agent
  namespace: checkly
spec:
  selector:
    matchLabels:
      app: checkly-agent
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
apiVersion: v1
kind: Service
metadata:
  name: checkly-agent-metrics
  namespace: checkly
  labels:
    app: checkly-agent
spec:
  ports:
  - name: metrics
    port: 9090
    targetPort: 9090
  selector:
    app: checkly-agent
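
To spot-check the endpoint locally, port-forward the metrics Service; this assumes the agent exposes Prometheus metrics on port 9090 as configured above:

kubectl port-forward -n checkly svc/checkly-agent-metrics 9090:9090

# In another terminal
curl -s http://localhost:9090/metrics | head -n 20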

Helm Chart Customization

Custom Values File

# values-production.yaml
replicaCount: 3

image:
  repository: checkly/agent
  tag: "6.0.3"
  pullPolicy: IfNotPresent

apiKeySecret:
  apiKey: "pl_your_production_key"
  
env:
  JOB_CONCURRENCY: "5"
  LOG_LEVEL: "INFO"
  USE_OS_DNS_RESOLVER: "true"

resources:
  requests:
    memory: "2Gi"
    cpu: "1"
  limits:
    memory: "6Gi"
    cpu: "3"

nodeSelector:
  checkly.io/agent-node: "true"

tolerations:
- key: "checkly.io/dedicated"
  operator: "Equal"
  value: "agent"
  effect: "NoSchedule"

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - checkly-agent
        topologyKey: kubernetes.io/hostname

podDisruptionBudget:
  enabled: true
  minAvailable: 1

monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
Deploy with custom values:
helm install checkly-agent -f values-production.yaml ./helm-chart
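
For repeatable deploys, helm upgrade --install is idempotent and can create the namespace in the same step:

helm upgrade --install checkly-agent \
  --namespace checkly \
  --create-namespace \
  -f values-production.yaml \
  ./helm-chart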

Upgrade Strategy

# Upgrade with rolling deployment
helm upgrade checkly-agent \
  --set image.tag="6.1.0" \
  --reuse-values \
  ./helm-chart

# Rollback if needed
helm rollback checkly-agent 1
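
Check the release history first to identify the revision to roll back to:

# List release revisions and their status
helm history checkly-agent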

Security Best Practices

Pod Security Standards

# pod-security-context.yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkly-agent
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: checkly-agent
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: false
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      capabilities:
        drop:
        - ALL
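
If the cluster uses Pod Security Admission, you can enforce the restricted standard at the namespace level; the pod spec above satisfies it:

# Enforce the "restricted" Pod Security Standard for the checkly namespace
kubectl label namespace checkly \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest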

Secret Management with External Secrets

# external-secret.yaml (using External Secrets Operator)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: checkly-api-key-external
  namespace: checkly
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-secret-store
    kind: SecretStore
  target:
    name: checkly-api-key
    creationPolicy: Owner
  data:
  - secretKey: api-key
    remoteRef:
      key: secret/checkly
      property: api-key
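
After the operator syncs, verify the ExternalSecret status and the generated Secret (assumes the External Secrets Operator is installed in the cluster):

kubectl get externalsecret checkly-api-key-external -n checkly
kubectl get secret checkly-api-key -n checkly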

Troubleshooting

Start with these diagnostics when pods fail to start or stay pending:
# Check pod status and events
kubectl describe pod -l app=checkly-agent -n checkly

# Check resource constraints
kubectl top pods -n checkly

# Verify secret exists
kubectl get secret checkly-api-key -n checkly -o yaml

# Check image pull issues
kubectl get events -n checkly --sort-by=.metadata.creationTimestamp
Common fixes for startup issues:
  • Verify API key is correctly base64 encoded in secret
  • Check resource requests vs available cluster capacity
  • Ensure image tag exists and is accessible
  • Verify namespace exists and has proper RBAC
Connectivity diagnostics:
# Test connectivity from pod
kubectl exec -it deployment/checkly-agent -n checkly -- \
  curl -I https://agent.checklyhq.com

# Check DNS resolution
kubectl exec -it deployment/checkly-agent -n checkly -- \
  nslookup agent.checklyhq.com

# View agent logs
kubectl logs deployment/checkly-agent -n checkly --tail=100 -f

# Check network policies
kubectl describe networkpolicy -n checkly
Common fixes for connectivity issues:
  • Verify egress network policies allow HTTPS to agent.checklyhq.com
  • Check if corporate firewall blocks outbound connections
  • Ensure DNS resolution works for external domains
  • Verify proxy configuration if required
Monitoring and optimization:
# Monitor resource usage
kubectl top pods -n checkly

# Check HPA status
kubectl describe hpa checkly-agent-hpa -n checkly

# View detailed metrics
kubectl exec -it deployment/checkly-agent -n checkly -- \
  cat /proc/meminfo

# Check for OOMKilled containers (reported in the pod's Last State)
kubectl describe pods -l app=checkly-agent -n checkly | grep -B 2 OOMKilled
Optimization strategies:
  • Adjust JOB_CONCURRENCY based on available memory
  • Increase resource limits if pods are being throttled
  • Scale horizontally with more replicas
  • Use node affinity to place agents on appropriate nodes

Monitoring and Alerting

Prometheus Metrics

Monitor Checkly Agents with Prometheus alerting rules:
# prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: checkly-agent-rules
  namespace: checkly
spec:
  groups:
  - name: checkly-agent
    rules:
    - alert: ChecklyAgentDown
      expr: up{job="checkly-agent"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Checkly Agent is down"
        description: "Checkly Agent {{ $labels.instance }} has been down for more than 5 minutes"
    
    - alert: ChecklyAgentHighMemory
      expr: container_memory_usage_bytes{pod=~"checkly-agent-.*"} / container_spec_memory_limit_bytes{pod=~"checkly-agent-.*"} > 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Checkly Agent high memory usage"
        description: "Checkly Agent {{ $labels.pod }} memory usage is above 90%"

Logging Configuration

# fluent-bit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: checkly
data:
  fluent-bit.conf: |
    [INPUT]
        Name tail
        Path /var/log/containers/checkly-agent-*.log
        Parser docker
        Tag checkly.agent.*
        Refresh_Interval 5
        
    [OUTPUT]
        Name forward
        Match checkly.agent.*
        Host log-aggregator.company.com
        Port 24224

Next Steps

Repository: Find complete examples and the latest Helm chart at github.com/checkly/checkly-k8s