Deep Health Checks Without the Risk: HMAC Signing in Guardian

Jared Brook

Health checks are one of those things that seem simple until they aren't. A basic /health endpoint that returns 200 OK tells you the process is running, but not much else. Is the database connection pool healthy? Are downstream dependencies responding? Is the cache warm? To answer those questions, your health endpoint needs to do real work and return real data - and that's where things get uncomfortable.

The more useful a health check response is, the more it reveals about your system's internals. Database connection strings, dependency latency figures, queue depths, cache hit rates - all valuable for diagnosing issues, all dangerous in the wrong hands. If your health endpoint is publicly reachable (and for external monitoring, it usually has to be), you've effectively published a reconnaissance guide for anyone who knows where to look.

This is the tension we kept running into with customers using CfnGuardian, our open source AWS monitoring tool, for HTTP health checks. They wanted deep, informative checks but couldn't justify the exposure. So we built HMAC-signed requests into Guardian's HTTP check Lambda.

The Problem with Deep Health Checks

Guardian has supported HTTP health checks for a long time - hit an endpoint, verify the status code, optionally match a regex against the response body. For most services, that's enough. But for production systems where you need to understand why something is degraded, not just that it's degraded, a shallow check leaves you flying blind.

The obvious answer is to add authentication to the health endpoint. But traditional approaches come with their own headaches. API keys in headers work, but a static key that never changes and travels in every request isn't much better than no key at all. OAuth tokens add complexity that's hard to justify for a monitoring endpoint. IP whitelisting breaks the moment your monitoring runs from a new Lambda execution environment or a different availability zone.

What we needed was a signing scheme that proves the request came from Guardian, resists replay attacks, and requires zero interactive authentication. HMAC fits that model precisely.

How It Works

When you enable HMAC signing on a Guardian HTTP check, the Lambda computes a signature for every request it sends. The signature covers the HTTP method, the URL path, a timestamp, a random nonce, the query string, and a SHA-256 hash of the request body. The shared secret used for signing lives in AWS Systems Manager Parameter Store as a SecureString, and Guardian's generated IAM role is automatically granted ssm:GetParameter access to that path.

Each request carries four additional headers (using a configurable prefix, defaulting to X-Health):

  • X-Health-Signature - the HMAC-SHA256 hex digest of the canonical string
  • X-Health-Key-Id - an identifier for the signing key
  • X-Health-Timestamp - the Unix epoch time when the request was signed
  • X-Health-Nonce - a random UUID to prevent replay

On the application side, verification is straightforward. Reconstruct the same canonical string from the incoming request, compute the HMAC with your copy of the shared secret, and compare. If you want replay protection (and you should), reject requests where the timestamp is more than a few minutes old and track nonces you've already seen.

The canonical string format is deliberately simple:

METHOD\nPATH\nTIMESTAMP\nNONCE\nQUERY\nBODY_HASH

No complex canonicalisation rules, no header sorting, no edge cases around URL encoding. It's designed to be easy to implement in any language your application happens to use.
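To make the format concrete, here is a minimal sketch of the signing side in Python. It is not Guardian's actual Lambda code. The function name sign_request and its parameters are hypothetical, and it assumes an uppercase method, a query string without the leading "?", and UTF-8 encoding of the secret and canonical string.

```python
import hashlib
import hmac
import time
import uuid


def sign_request(secret, method, path, query="", body=b"",
                 key_id="default", prefix="X-Health",
                 timestamp=None, nonce=None):
    """Return the four signing headers for one request (illustrative helper)."""
    # Timestamp and nonce can be injected for testing; otherwise generate them.
    timestamp = str(int(time.time())) if timestamp is None else str(timestamp)
    nonce = str(uuid.uuid4()) if nonce is None else nonce

    # METHOD\nPATH\nTIMESTAMP\nNONCE\nQUERY\nBODY_HASH
    body_hash = hashlib.sha256(body).hexdigest()
    canonical = "\n".join([method, path, timestamp, nonce, query, body_hash])

    signature = hmac.new(secret.encode(), canonical.encode(),
                         hashlib.sha256).hexdigest()
    return {
        f"{prefix}-Signature": signature,
        f"{prefix}-Key-Id": key_id,
        f"{prefix}-Timestamp": timestamp,
        f"{prefix}-Nonce": nonce,
    }
```

Calling sign_request("my-secret", "GET", "/health") returns a dictionary of headers you can attach to a test request, which is handy for exercising your verification code before pointing Guardian at it.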

One Endpoint, Two Behaviours

One of our customers took an approach we particularly liked. Rather than creating separate endpoints for shallow and deep checks, they built a single /health path that behaves differently based on whether the HMAC headers are present and valid.

An unsigned request gets the standard shallow response - a 200 OK with a minimal body. The same endpoint, when it receives valid HMAC headers from Guardian, returns a detailed JSON payload with database pool status, dependency latencies, recent error rates, and queue depths. Guardian can then match against both the status code and specific patterns in that rich response body using the existing BodyRegexMatch feature.

This pattern is clean because it doesn't require any routing changes to your application. Load balancers, CDNs, and uptime monitors that already hit /health keep working exactly as before. Only Guardian, with the shared secret, sees the full picture.
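A framework-agnostic sketch of that branching might look like the following. The function name health_response, the collect_details callback, and the "healthy" field are all hypothetical; the customer's real payload and status logic will differ.

```python
import json


def health_response(request_is_verified, collect_details):
    """Build (status_code, body) for /health.

    request_is_verified: result of your HMAC verification for this request.
    collect_details: callable returning a dict of diagnostics (hypothetical).
    """
    if not request_is_verified:
        # Missing or invalid signature: fall back to the shallow response
        # so existing unsigned monitors keep working and nothing leaks.
        return 200, json.dumps({"status": "ok"})

    # Only verified callers trigger the expensive, revealing checks.
    details = collect_details()
    status = 200 if details.get("healthy", False) else 503
    return status, json.dumps(details)
```

Note that an invalid signature is treated the same as no signature at all, so a caller without the secret cannot tell the deep mode exists from the response alone.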

Configuring It

On the Guardian side, the configuration is at most three extra lines per check in your YAML:

Resources:
  Http:
  - Id: https://api.example.com/health
    StatusCode: 200
    HmacSecretSsm: /guardian/myapp/hmac-secret
    HmacKeyId: default
    HmacHeaderPrefix: X-Health

HmacSecretSsm is the only required field - it points to the SSM parameter holding the shared secret.

HmacKeyId defaults to the value default. It is sent in the Key-Id header so your application can support key rotation by accepting two keys simultaneously during a transition window.

HmacHeaderPrefix defaults to X-Health but can be changed if those header names conflict with something in your stack.
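On the application side, key rotation with HmacKeyId can be as simple as a lookup table from key ID to secret. The sketch below assumes you have already rebuilt the canonical string; the function name verify_with_rotation and the key IDs in the usage note are invented for illustration.

```python
import hashlib
import hmac


def verify_with_rotation(key_id, canonical, signature, active_secrets):
    """Check a signature against the secret registered for key_id.

    active_secrets maps key IDs to secrets, e.g. the old and new key
    during a rotation window (hypothetical structure).
    """
    secret = active_secrets.get(key_id)
    if secret is None:
        # Unknown or retired key ID: reject without computing anything.
        return False
    expected = hmac.new(secret.encode(), canonical.encode(),
                        hashlib.sha256).hexdigest()
    # Compare as bytes in constant time.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

During a rotation you would hold both entries, for example {"default": old_secret, "next": new_secret}, switch HmacKeyId in the Guardian YAML, and then drop the old entry once no requests arrive with it.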

The same configuration works for internal VPC-based checks:

Resources:
  InternalHttp:
  - Environment: Prod
    VpcId: vpc-1234
    Subnets: [subnet-abcd]
    Hosts:
    - Id: http://api.internal/health
      StatusCode: 200
      HmacSecretSsm: /guardian/myapp/hmac-secret

The Lambda caches the SSM secret in memory for ten minutes across warm invocations, so you're not paying for an SSM API call on every health check cycle.
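The caching idea is easy to reproduce if your own service also reads the secret from SSM. This is a sketch of the general technique rather than Guardian's implementation; the class name CachedSecret and the injectable fetch and clock arguments are assumptions made so the example runs without AWS access.

```python
import time


class CachedSecret:
    """Cache one secret in memory and refetch it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=600, clock=time.monotonic):
        # fetch is any zero-argument callable returning the secret. In a real
        # service it would typically wrap an SSM GetParameter call with
        # decryption enabled (assumption, not shown here).
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First use or cache expired: fetch and restart the TTL.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

Keep the TTL in mind when rotating secrets: a cached copy can outlive a change in SSM by up to the TTL, which is one more reason to accept two keys during a transition window.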

Verifying Signatures in Your Application

Your application needs to verify the signature using the same shared secret stored in SSM. The verification logic is intentionally minimal - here's the core of it in Python:

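import hashlib
import hmac
import time

# `request` is assumed to expose method, path, query_string (str),
# body (bytes) and a headers mapping; adapt to your framework.
def verify_guardian_request(request, secret, prefix="X-Health", max_age=300):
    signature = request.headers.get(f"{prefix}-Signature")
    key_id    = request.headers.get(f"{prefix}-Key-Id")
    timestamp = request.headers.get(f"{prefix}-Timestamp")
    nonce     = request.headers.get(f"{prefix}-Nonce")

    # All four signing headers must be present.
    if not all([signature, key_id, timestamp, nonce]):
        return False

    # Reject malformed timestamps and anything outside the allowed window.
    try:
        skew = abs(int(timestamp) - time.time())
    except ValueError:
        return False
    if skew > max_age:
        return False

    # Rebuild the canonical string exactly as the signer did.
    body_hash = hashlib.sha256(request.body or b"").hexdigest()
    canonical = "\n".join([
        request.method,
        request.path,
        timestamp,
        nonce,
        request.query_string or "",
        body_hash,
    ])
    expected = hmac.new(secret.encode(), canonical.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; comparing bytes avoids a TypeError on non-ASCII input.
    return hmac.compare_digest(expected.encode(), signature.encode())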

Optionally, you can also track nonces (a short-lived cache or database table) to reject duplicates within the timestamp window. The timestamp check alone limits how long a captured request stays usable, but it does not stop a replay inside that window; nonce tracking closes that gap if your threat model calls for it. The compare_digest call is important - it performs a constant-time comparison that prevents timing attacks against the signature.
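For a single-process service, nonce tracking can be a small in-memory structure. The sketch below is illustrative only: the class name NonceCache is invented, and a multi-instance deployment would need a shared store instead. Because the timestamp check accepts skew in both directions, a request can stay valid for up to twice max_age, so the default retention here is 600 seconds to match max_age=300.

```python
import time


class NonceCache:
    """Remember nonces for `retention` seconds and flag repeats."""

    def __init__(self, retention=600, clock=time.time):
        self._retention = retention
        self._clock = clock
        self._seen = {}  # nonce -> expiry time

    def check_and_store(self, nonce):
        """Return True the first time a nonce is seen, False on a replay."""
        now = self._clock()
        # Drop expired entries so the cache does not grow without bound.
        self._seen = {n: exp for n, exp in self._seen.items() if exp > now}
        if nonce in self._seen:
            return False
        self._seen[nonce] = now + self._retention
        return True
```

Call check_and_store only after verify_guardian_request has returned True, so that unauthenticated traffic cannot fill the cache with junk nonces.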

Why Not Just Use a WAF or IP Restrictions?

It's a fair question. AWS WAF can restrict access to health endpoints, and security groups can lock down internal checks. But both approaches have practical limitations for monitoring.

WAF rules operate at the edge, which means they work well for blocking external traffic but add complexity when your monitoring Lambda runs inside the same AWS account. You end up maintaining allow-lists of IP ranges that change as Lambda execution environments rotate. Security groups help for VPC-internal checks but don't apply to public endpoints at all.

HMAC signing works regardless of network topology. The Lambda could be running in any subnet, any availability zone, any region - the signature is what proves identity, not the source IP. It's also self-contained: no external dependencies beyond SSM, no additional AWS services to configure, no firewall rules to keep in sync.

Getting Started

If you're already running Guardian through our managed service, just reach out to your account team. We'll handle the Guardian configuration update and guide you through the endpoint changes needed on your side.

If you're managing Guardian yourself, the feature is available in CfnGuardian v0.12.1 and the corresponding aws-lambda-http-check update. Enabling it is a configuration change - no infrastructure migration required. Add your shared secret as a SecureString in SSM, point HmacSecretSsm at that parameter in your Guardian YAML, and implement the verification logic on your health endpoint.
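If you need a value for the shared secret, Python's standard library can generate one. This is a small sketch, not a Guardian requirement; the function name generate_shared_secret and the 32-byte length are assumptions, and you would still store the result in SSM as a SecureString yourself.

```python
import secrets


def generate_shared_secret(num_bytes=32):
    """Return a random hex string suitable for use as an HMAC shared secret."""
    # token_hex draws from the OS's cryptographically secure random source
    # and returns two hex characters per byte.
    return secrets.token_hex(num_bytes)
```

Generate the value once, store it in the parameter that HmacSecretSsm points at, and give your application the same value through whatever secret mechanism it already uses.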

New to Guardian? It's open source and free to use. Head to our GitHub repository to get started, and follow the setup instructions to deploy it into your AWS environment.

Deep health checks give you the diagnostic detail you actually need when something goes wrong - HMAC signing means you no longer have to choose between visibility and security.

Contact us if you'd like help designing deep health check strategies for your applications, or if you're interested in what Guardian can do for your monitoring stack.

Further Reading

Start with our introduction to Guardian's capabilities, "Streamline AWS Monitoring with Guardian", then see how to "Enhance AWS notifications with Autonomous Guardian Stacks in Slack".


