Webhooks
Intro
A webhook is an HTTP callback: when an event occurs in a producer system, it sends an HTTP POST with event data to a URL the consumer registered in advance. This inverts the communication direction compared to polling — instead of the consumer repeatedly asking "anything new?", the producer pushes notifications in near real-time. You reach for webhooks when you need low-latency, push-based integration between systems that communicate over HTTP, especially across organizational boundaries where shared message brokers are impractical (payment providers, source control platforms, SaaS integrations).
The mechanism is straightforward: the consumer registers a callback URL with the producer, optionally negotiating a shared secret for signature verification. When the producer detects a relevant event, it serializes a payload (usually JSON), signs it with HMAC-SHA256 using the shared secret (when one was negotiated), and sends an HTTP POST to the registered URL. The consumer validates the signature, accepts the payload for processing, and returns a 2xx status (typically 200 OK) to acknowledge receipt. If the producer receives no 2xx response within its timeout, it retries with exponential backoff.
```mermaid
sequenceDiagram
    participant Consumer as Consumer Service
    participant Producer as Producer Service
    Consumer->>Producer: Register callback URL + shared secret
    Note over Producer: Event occurs internally
    Producer->>Consumer: POST /webhook with JSON payload + HMAC signature header
    Consumer->>Consumer: Verify HMAC signature
    Consumer->>Consumer: Check idempotency key and process event
    Consumer-->>Producer: 200 OK
    Note over Producer: If no 2xx within timeout retry with exponential backoff
```

Webhooks complement Event-Driven Architecture: they are the HTTP-native way to deliver events between systems that do not share a message broker. For internal service-to-service communication within the same platform, Message Queues are usually a better fit because they provide built-in durability, fan-out, and back-pressure.
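The producer side of the flow can be sketched as follows. This is a minimal illustration, not any particular provider's implementation: the helper names `Sign` and `BackoffDelay`, the `sha256=` header format, the one-second base delay, and the 60-second cap are all assumptions chosen for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: builds the signature header value a producer would attach.
static string Sign(string secret, byte[] body)
{
    var hash = HMACSHA256.HashData(Encoding.UTF8.GetBytes(secret), body);
    return "sha256=" + Convert.ToHexString(hash).ToLowerInvariant();
}

// Hypothetical retry schedule: baseDelay * 2^attempt, capped at maxDelay.
static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
{
    var seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt);
    return TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds));
}

// The signature is computed over the exact bytes that go on the wire.
var body = Encoding.UTF8.GetBytes("{\"event\":\"order.paid\",\"id\":\"evt_123\"}");
Console.WriteLine($"signature header value: {Sign("demo-secret", body)}");

// Delays before retries 1..5 if no 2xx response arrives.
for (var attempt = 0; attempt < 5; attempt++)
{
    var delay = BackoffDelay(attempt, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(60));
    Console.WriteLine($"retry {attempt + 1} after {delay.TotalSeconds}s");
}
```

With these parameters the delays double from 1 to 16 seconds over the first five retries. Real producers often add random jitter on top of the schedule so that many failed deliveries do not retry at the same instant.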
Receiver Example (ASP.NET Core)
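A production-quality webhook receiver needs three things: signature verification, idempotency, and fast acknowledgment. This example receives GitHub-style webhooks signed with HMAC-SHA256. It verifies the signature, extracts the delivery ID as the idempotency key, and hands the event to a processor that is expected to deduplicate and do the actual work.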
```csharp
using System.Security.Cryptography;
using System.Text;
using Microsoft.AspNetCore.Mvc;

var builder = WebApplication.CreateBuilder(args);
// IWebhookProcessor is application-defined (declared at the bottom of this file).
// Register your implementation here, for example:
// builder.Services.AddSingleton<IWebhookProcessor, MyWebhookProcessor>();
var app = builder.Build();

app.MapPost("/webhooks/github", async (
    HttpContext ctx,
    [FromServices] IWebhookProcessor processor,
    CancellationToken ct) =>
{
    // 1. Read the raw body (HMAC must be computed over the exact bytes received)
    ctx.Request.EnableBuffering();
    using var buffer = new MemoryStream();
    await ctx.Request.Body.CopyToAsync(buffer, ct);
    ctx.Request.Body.Position = 0; // rewind so later readers can consume the body
    var bodyBytes = buffer.ToArray();

    // 2. Verify HMAC-SHA256 signature
    var signature = ctx.Request.Headers["X-Hub-Signature-256"].FirstOrDefault();
    var secret = app.Configuration["Webhooks:GitHubSecret"]!;
    if (!VerifySignature(bodyBytes, signature, secret))
        return Results.Unauthorized();

    // 3. Extract idempotency key and event type
    var deliveryId = ctx.Request.Headers["X-GitHub-Delivery"].FirstOrDefault();
    var eventType = ctx.Request.Headers["X-GitHub-Event"].FirstOrDefault();
    if (string.IsNullOrEmpty(deliveryId) || string.IsNullOrEmpty(eventType))
        return Results.BadRequest();

    // 4. Acknowledge fast, process async
    // In production: enqueue to durable storage, return 200 immediately
    await processor.EnqueueAsync(eventType, deliveryId, Encoding.UTF8.GetString(bodyBytes), ct);
    return Results.Ok();
});

app.Run();

// Expects a header of the form "sha256=<lowercase hex digest>".
static bool VerifySignature(byte[] payload, string? signatureHeader, string secret)
{
    if (string.IsNullOrEmpty(signatureHeader) ||
        !signatureHeader.StartsWith("sha256=", StringComparison.Ordinal))
        return false;

    var expected = signatureHeader["sha256=".Length..];
    var hash = HMACSHA256.HashData(Encoding.UTF8.GetBytes(secret), payload);
    // Convert.ToHexStringLower needs .NET 9+; earlier: Convert.ToHexString(hash).ToLowerInvariant()
    var computed = Convert.ToHexStringLower(hash);

    // Constant-time comparison; returns false when the lengths differ.
    return CryptographicOperations.FixedTimeEquals(
        Encoding.UTF8.GetBytes(computed),
        Encoding.UTF8.GetBytes(expected));
}

// Application-defined abstraction over the durable queue (not part of ASP.NET Core).
interface IWebhookProcessor
{
    Task EnqueueAsync(string eventType, string deliveryId, string body, CancellationToken ct);
}
```
Key implementation decisions:
- `EnableBuffering`: allows reading the raw request body for HMAC computation before model binding.
- `CryptographicOperations.FixedTimeEquals`: constant-time comparison prevents timing attacks on signature verification.
- Enqueue, then acknowledge: return `200` quickly and process the event asynchronously from a durable queue. This prevents producer timeouts and decouples processing latency from delivery acknowledgment.
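The enqueue-then-acknowledge decision can be sketched in isolation as follows. This is an illustration under stated assumptions: a bounded in-memory `Channel` stands in for the durable queue (it does not survive a restart, so it is not a production substitute), and `HandleRequestAsync` is a hypothetical stand-in for the HTTP handler.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// In-memory stand-in for a durable queue (assumption: production would use a
// database table or a message broker so queued events survive a restart).
var queue = Channel.CreateBounded<(string EventType, string DeliveryId, string Body)>(100);
var handled = new List<string>();

// Background worker: drains the queue and runs the slow business logic.
var worker = Task.Run(async () =>
{
    await foreach (var item in queue.Reader.ReadAllAsync())
    {
        await Task.Delay(50); // simulated slow processing
        handled.Add($"{item.EventType}:{item.DeliveryId}");
    }
});

// What the HTTP handler does: enqueue only, then return 200 right away.
async Task<int> HandleRequestAsync(string eventType, string deliveryId, string body)
{
    await queue.Writer.WriteAsync((eventType, deliveryId, body));
    return 200;
}

var status = await HandleRequestAsync("push", "d-1", "{}");
Console.WriteLine($"acknowledged with {status}");

queue.Writer.Complete(); // shutdown: let the worker drain the queue and exit
await worker;
Console.WriteLine($"handled: {string.Join(", ", handled)}");
```

The bounded capacity gives a simple form of back-pressure: when the queue is full, `WriteAsync` waits instead of letting memory grow without limit.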
Tradeoffs: Webhooks vs Polling vs SSE vs WebSockets
| Approach | Direction | Latency | Complexity | Connection | Best fit |
|---|---|---|---|---|---|
| Webhooks | Push (server to server) | Near real-time | Moderate (retry, signatures, idempotency) | Per-event HTTP request | Cross-org integrations, SaaS event delivery |
| Polling | Pull (consumer to producer) | Interval-bound delay | Low | Stateless per-request | Simple integrations, systems without webhook support |
| SSE | Push (server to browser) | Real-time | Low | Long-lived HTTP stream | One-directional browser notifications, dashboards |
| WebSockets | Bidirectional | Real-time | High (connection management, scaling) | Persistent TCP | Chat, collaborative editing, real-time bidirectional flows |
Decision heuristic:
- Pick webhooks for server-to-server push across trust boundaries (Stripe payment events, GitHub repository events, CRM notifications).
- Pick polling when the producer has no webhook support, or when near real-time latency is not required and simplicity matters most.
- Pick SSE when you need server-to-browser push without the complexity of WebSockets.
- Pick WebSockets when you need bidirectional real-time communication (client sends and receives).
Pitfalls
1) At-Least-Once Delivery Without Idempotency
- What goes wrong: the producer retries after a timeout and the consumer processes the same event twice — double-charging a payment, sending duplicate notifications, or creating duplicate records.
- Why it happens: network timeouts, producer retries, and load balancer replays mean the same webhook may arrive more than once. Exactly-once delivery over HTTP is not achievable.
- Mitigation: use the delivery ID (e.g., `X-GitHub-Delivery`, Stripe `event.id`) as an idempotency key. Store processed IDs in durable storage and check before processing. Make state transitions conditional (`UPDATE ... WHERE status = 'pending'`).
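A minimal sketch of the two mitigations, deduplication by delivery ID and a conditional state transition, might look like this. The dictionaries are in-memory stand-ins for durable storage, and `HandlePaymentSucceeded` and the order and delivery IDs are hypothetical names for the example.

```csharp
using System;
using System.Collections.Concurrent;

// In-memory stand-ins (assumption: production would use a table with a unique
// constraint on the delivery ID, written in the same transaction as the update).
var processedIds = new ConcurrentDictionary<string, bool>();
var orderStatus = new ConcurrentDictionary<string, string> { ["order-1"] = "pending" };

// Returns true when the event changed state; false for a duplicate delivery
// or when the order was not in the expected "pending" state.
bool HandlePaymentSucceeded(string deliveryId, string orderId)
{
    // TryAdd is atomic: only the first delivery with this ID gets past this line.
    if (!processedIds.TryAdd(deliveryId, true))
        return false;

    // Conditional transition, the in-memory analogue of
    // UPDATE orders SET status = 'paid' WHERE id = @id AND status = 'pending'.
    return orderStatus.TryUpdate(orderId, "paid", "pending");
}

Console.WriteLine(HandlePaymentSucceeded("d-1", "order-1")); // first delivery
Console.WriteLine(HandlePaymentSucceeded("d-1", "order-1")); // retry of the same delivery
Console.WriteLine(HandlePaymentSucceeded("d-2", "order-1")); // new delivery, order already paid
```

Only the first call changes the order. The retry is stopped by the idempotency key, and the third call is stopped by the conditional transition, which is why the two checks are worth having together.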
2) Slow Processing Causes Timeout and Retry Storm
- What goes wrong: the consumer does heavy work synchronously in the webhook handler, exceeds the producer's timeout (typically 5-30 seconds), and triggers retries that compound the load.
- Why it happens: inline database writes, external API calls, or computation in the request path.
- Mitigation: acknowledge the webhook immediately (return `200`) and enqueue the payload for async processing. Use a background worker (hosted service, queue consumer) for the actual business logic.
3) Missing or Weak Signature Verification
- What goes wrong: an attacker sends forged webhook payloads to the endpoint, triggering unauthorized actions (fraudulent refunds, fake deployment triggers, data manipulation).
- Why it happens: the endpoint accepts any POST without verifying the HMAC signature, or uses a non-constant-time comparison vulnerable to timing attacks.
- Mitigation: always verify HMAC signatures using constant-time comparison. Reject requests with missing or invalid signatures. Add timestamp validation to prevent replay attacks (reject payloads older than 5 minutes).
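Timestamp validation can be sketched as follows. The scheme here is an assumption for illustration: the producer signs the string `{timestamp}.{body}` and sends the Unix timestamp alongside the signature. Providers differ in header names and exact format, so check the provider's documentation before relying on any of these details. `ComputeSignature` and `Verify` are hypothetical helpers.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Assumed scheme: signature = hex(HMAC-SHA256(secret, "{timestamp}.{body}")).
static string ComputeSignature(string secret, long timestamp, string body)
{
    var signedPayload = Encoding.UTF8.GetBytes($"{timestamp}.{body}");
    var hash = HMACSHA256.HashData(Encoding.UTF8.GetBytes(secret), signedPayload);
    return Convert.ToHexString(hash).ToLowerInvariant();
}

static bool Verify(string secret, long timestamp, string body, string signature,
                   DateTimeOffset now, TimeSpan tolerance)
{
    // Reject timestamps outside the tolerance window (too old or too far ahead).
    var age = now - DateTimeOffset.FromUnixTimeSeconds(timestamp);
    if (age.Duration() > tolerance)
        return false;

    // The timestamp is part of the signed data, so it cannot be changed
    // without invalidating the signature.
    var expected = ComputeSignature(secret, timestamp, body);
    return CryptographicOperations.FixedTimeEquals(
        Encoding.UTF8.GetBytes(expected),
        Encoding.UTF8.GetBytes(signature));
}

var now = DateTimeOffset.UtcNow;
var tolerance = TimeSpan.FromMinutes(5);
var payload = "{\"id\":\"evt_1\"}";
var sentAt = now.AddMinutes(-1).ToUnixTimeSeconds();
var sig = ComputeSignature("demo-secret", sentAt, payload);

var fresh = Verify("demo-secret", sentAt, payload, sig, now, tolerance);
var replayed = Verify("demo-secret", sentAt, payload, sig, now.AddMinutes(10), tolerance);
var tampered = Verify("demo-secret", sentAt + 60, payload, sig, now, tolerance);
Console.WriteLine($"fresh={fresh} replayed={replayed} tampered={tampered}");
```

The fresh delivery verifies, the same delivery replayed ten minutes later is rejected by the window check, and a delivery with an altered timestamp is rejected by the signature check.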
4) Endpoint Availability and Missed Events
- What goes wrong: if the consumer is down during delivery and the producer exhausts its retry budget, events are permanently lost.
- Why it happens: webhook producers typically retry for a limited window (hours to days) and then give up.
- Mitigation: implement a reconciliation mechanism — periodically poll the producer's event API to detect gaps. Monitor webhook delivery metrics. Some producers offer event replay or dead-letter inspection (use them).
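The gap-detection step of reconciliation can be sketched as a set difference. The inputs are hypothetical: `fromProducer` stands for the event IDs returned by the producer's event-list API for some time window, and `processed` for the IDs the consumer has recorded. How those lists are fetched depends on the provider and is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the producer-side event IDs the consumer has no record of processing.
static IReadOnlyList<string> FindMissedEvents(
    IEnumerable<string> producerEventIds, IReadOnlySet<string> processedIds) =>
    producerEventIds.Where(id => !processedIds.Contains(id)).ToList();

// Hypothetical data for one reconciliation window.
var fromProducer = new[] { "evt_1", "evt_2", "evt_3", "evt_4" };
var processed = new HashSet<string> { "evt_1", "evt_2", "evt_4" };

var missed = FindMissedEvents(fromProducer, processed);
foreach (var id in missed)
    Console.WriteLine($"missed {id}: fetch it and run it through the normal handler");
```

Feeding missed events through the same idempotent handler used for live deliveries keeps reconciliation safe even when an event turns out to have been processed in the meantime.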
Questions
Expected answer:
- Verify HMAC signature on every incoming webhook to reject forged payloads.
- Use the provider's event ID as an idempotency key; store processed IDs durably.
- Return `200` immediately and process asynchronously from a durable queue.
- Implement reconciliation: periodically call the provider's event list API to detect any missed deliveries.
- Make state transitions conditional (e.g., only transition an order to "paid" if it is currently "pending").
Why this matters: Tests understanding of at-least-once semantics, idempotency, and defense-in-depth for webhook reliability.
Expected answer:
- Webhooks when systems cross organizational boundaries and cannot share infrastructure (SaaS integrations, third-party providers).
- Webhooks when the producer is an external system you do not control.
- Message broker when both services are internal, you need durable fan-out, back-pressure, and replay.
- Message broker when you need ordering guarantees per partition/key.
- Hybrid is common: receive external webhooks at the edge and republish to an internal broker for further processing.
Why this matters: Tests whether the candidate understands the operational boundary between push-over-HTTP and broker-based messaging.
Expected answer:
- Include a timestamp in the signed payload or as a signed header.
- Verify the timestamp is within an acceptable window (e.g., reject events older than 5 minutes).
- Use the delivery ID as an idempotency key to reject reprocessing even within the time window.
- Verify the HMAC signature covers both the payload and the timestamp so neither can be tampered with independently.
Why this matters: Tests depth of security thinking beyond basic signature verification.
References
- Validating webhook deliveries (GitHub Docs) — Official guide to HMAC-SHA256 signature verification for GitHub webhooks, with language-specific examples.
- Webhook best practices (GitHub Docs) — Production checklist: respond fast, use async processing, handle redeliveries, monitor failures.
- Check webhook signatures (Stripe Docs) — Stripe's approach to webhook security including timestamp-based replay protection.
- Building reliable webhooks (Stripe Engineering Blog) — Practitioner deep-dive on designing webhook infrastructure at scale: retries, idempotency, and failure modes.
- CloudEvents HTTP Webhook Specification (CNCF) — Vendor-neutral CNCF standard for webhook delivery, including the OPTIONS-based subscription handshake and abuse protection.
- Standard Webhooks — Open specification for webhook signing, verification, and delivery semantics backed by Svix, aiming to standardize webhook behavior across providers.
- Asynchronous messaging options (Microsoft Learn) — Azure architecture guide comparing webhooks, queues, event hubs, and event grid for event delivery.
- Retry pattern (Azure Architecture Center) — Canonical guidance on retry strategies with exponential backoff, directly applicable to webhook sender retry loops.