Autoscaling - Checkly Docs

Scale Checkly Agent pods automatically in response to live load. This page covers the KEDA-based recipe; for static capacity planning, see Scaling and Redundancy.

Prerequisites

Prometheus V2 metrics are being ingested for your account. The Prometheus V2 exporter is the only source for the gauge this recipe scales on. See Exporting Metrics & Data via Prometheus V2.
Checkly Agents are deployed via the Checkly agent Helm chart (or an equivalent Deployment). See Kubernetes Deployment.
KEDA is installed in the cluster.

The signal

Checkly exposes the checkly_private_location_check_runs gauge through the Prometheus V2 exporter. Filtered by state and a private_location_slug_name, it provides the count of pending and currently-executing check runs in a single Private Location — the signal you drive replica count from. The relevant state values are:

queued — the check run has been scheduled but not yet picked up by an agent.
inflight — the check run is currently being executed by an agent.

The gauge is aggregated at roughly 1-minute intervals, so checks that start and finish within a single interval may not be counted. Their impact on Private Location capacity is negligible.

KEDA `ScaledObject`

The ScaledObject below provides sensible defaults — adjust the bounds and scaling behavior to match your check workload.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: checkly-agent-autoscaler
  namespace: <namespace_for_agent_deployment>   # Must be the namespace of the agent Deployment.
spec:
  scaleTargetRef:
    name: <agent_deployment_name>
  minReplicaCount: 2
  maxReplicaCount: 10
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          policies:
            - type: Pods
              value: 1
              periodSeconds: 60
        scaleDown:
          selectPolicy: Min
          policies:
            - type: Pods
              value: 1
              periodSeconds: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-k8s.monitoring.svc.cluster.local:9090
        metricName: checkly_private_location_check_runs
        threshold: "1"                # Match the agent's JOB_CONCURRENCY.
        query: sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name="<slug>"})

The query is scoped to a single Private Location by private_location_slug_name, so create one ScaledObject per Private Location.

If you deploy agents with the Checkly agent Helm chart, template the ScaledObject alongside your chart values so the autoscaler ships with the deployment.

For a Prometheus instance outside the cluster, add an authenticationRef pointing at a TriggerAuthentication resource with the appropriate credentials.
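As a minimal sketch of that setup, assuming the external Prometheus accepts a bearer token: the names checkly-prometheus-auth, prometheus-credentials, and bearer-token are hypothetical placeholders, and <external_prometheus_url> stands in for your own endpoint. Check the KEDA Prometheus scaler documentation for the auth modes your KEDA version supports.

```yaml
# TriggerAuthentication that maps a Secret key to the scaler's bearerToken parameter.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: checkly-prometheus-auth                 # hypothetical name
  namespace: <namespace_for_agent_deployment>   # same namespace as the ScaledObject
spec:
  secretTargetRef:
    - parameter: bearerToken                    # parameter read by the prometheus scaler
      name: prometheus-credentials              # hypothetical Secret holding the token
      key: bearer-token                         # hypothetical key inside that Secret
---
# Fragment of the ScaledObject: only the trigger changes.
spec:
  triggers:
    - type: prometheus
      metadata:
        serverAddress: <external_prometheus_url>
        authModes: "bearer"
        threshold: "1"
        query: sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name="<slug>"})
      authenticationRef:
        name: checkly-prometheus-auth           # must match the TriggerAuthentication above
```

Keeping the token in a Secret rather than in the trigger metadata means the ScaledObject can be templated and committed without exposing credentials.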

How many pods you’ll get

KEDA queries Prometheus on its polling interval and turns the result into a target pod count. With threshold: "1", that target is roughly the number of queued plus in-flight check runs — one pod per check. The pod count is then kept within minReplicaCount and maxReplicaCount. For example, with threshold: "1", minReplicaCount: 2, maxReplicaCount: 10:

Queued + in-flight check runs	Resulting pods
0	2 (idle floor)
1	2
3	3
7	7
20	10 (capped)
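The rows above are consistent with the usual HPA calculation for an average-value target, sketched here as an approximation. The symbols q (the query result, queued plus in-flight check runs) and t (the threshold) are introduced only for illustration, and the sketch ignores the HPA tolerance and the scaleUp and scaleDown policies, which limit how quickly the target is reached.

```latex
\text{pods} = \min\Bigl(\texttt{maxReplicaCount},\ \max\bigl(\texttt{minReplicaCount},\ \lceil q / t \rceil\bigr)\Bigr)
```

For q = 7 and t = 1 this gives min(10, max(2, 7)) = 7, and for q = 20 it gives min(10, max(2, 20)) = 10, matching the table. With the ScaledObject shown earlier, the replica count moves toward that target by at most one pod per 60 seconds.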

Tuning the bounds

threshold — set it to match the agent’s JOB_CONCURRENCY. The default JOB_CONCURRENCY is 1, so leave threshold: "1". A higher value packs more checks per pod and can cause scheduling delays for long-running checks.
minReplicaCount — keep at 2 or higher so a single agent failure doesn’t take the Private Location offline. See Scaling and Redundancy.
maxReplicaCount — must exceed your expected peak queued + in-flight check runs. If the cap is too low, queued check runs accumulate above it and are dropped after the 6-minute queue TTL.

If you set minReplicaCount: 0 to scale to zero when idle, cooldownPeriod becomes important — it controls how long KEDA waits after the trigger goes inactive before scaling the deployment down to zero.
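A scale-to-zero variant might look like the fragment below. This is a sketch rather than a recommendation: the values shown for pollingInterval and cooldownPeriod are KEDA's defaults, written out only to make them visible, and the right numbers depend on how quickly your agent pods start relative to the 6-minute queue TTL.

```yaml
# Fragment of the ScaledObject spec; other fields stay as shown earlier.
spec:
  minReplicaCount: 0      # scale to zero when no check runs are queued or in flight
  maxReplicaCount: 10
  pollingInterval: 30     # seconds between trigger evaluations (KEDA default)
  cooldownPeriod: 300     # seconds to wait after the trigger goes inactive before scaling to zero (KEDA default)
```

With zero idle replicas, the first queued check run has to wait for a pod to be scheduled and started, and the redundancy provided by keeping minReplicaCount at 2 or higher is lost while the deployment is idle.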

Graceful termination

In-flight checks on a terminating pod are rerun on another agent after a 300-second timeout. Set terminationGracePeriodSeconds above this on the agent pod spec so an evicted pod has room to drain before SIGKILL:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 330    # Set to your longest-running check type; up to 1800 for Playwright Check Suites.

Maximum runtime by check type:

Check type	Maximum runtime
API, TCP, DNS, ICMP	30 seconds
Browser	4 minutes
Multistep	4 minutes
Playwright Check Suite	60 minutes

Verify

Confirm KEDA created the HPA and is reading the metric:

kubectl get scaledobject,hpa -n <namespace_for_agent_deployment>

Probe the signal directly:

sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name="<slug>"})

Schedule a burst of checks against the Private Location and watch the replica count climb toward maxReplicaCount, then settle back to minReplicaCount once the burst clears.
