How We Built ClickzProtect: A Real-Time Ad Fraud Detection Engine With Postgres and Edge Functions

A client showed me his Q1 Google Ads invoice last March. A meaningful chunk of his quarter's spend had gone to clicks from the same three IP ranges, hitting the landing page at 3am, never converting, never even scrolling. He wasn't angry. Just tired. That's when we started building ClickzProtect.

This post is the actual architecture. Not a sanitized retro. Dead ends, the latency war, the schema, the edge function — all of it. Competitors (ClickCease, Lunio, the rest) never publish architecture detail, so I figured we'd be the ones who did.

Fair warning: I'm telling you about the three weeks we burned on bloom filters too.

The Problem We Were Solving

Click fraud is boring on paper and brutal in practice. Bots, click farms, competitor sabotage, and the long tail of accidentally-malicious browser extensions all generate clicks that look real to Google's billing system. The advertiser pays. Nothing converts. Repeat.

The detection problem isn't "are bots clicking?" — that part's easy. The hard problem is doing it fast enough that you can block the click before the advertiser pays for it, accurate enough that you don't false-positive a real customer on a coffee shop wifi, and cheap enough that the protection isn't more expensive than the fraud itself.

Our target from day one: under 200ms end-to-end, false-positive rate under 0.5%, and infra cost low enough that the protection isn't more expensive than the fraud it stops. Hard to hit all three at once.

What We Tried First (And Why It Was Wrong)

The first instinct was bloom filters. Every fraud-detection blog post on the internet mentions bloom filters. They're memory-efficient, they're fast, they sound smart at parties. We burned three weeks on bloom filters before realizing they weren't the right tool. Honestly a bit embarrassing in hindsight.

Here's the problem. Bloom filters are great when you have a known bad set and you want to check membership cheaply. Click fraud isn't that problem. The "bad set" is constantly mutating — IPs rotate, fingerprints shift, attack patterns change hourly. We weren't trying to ask "is this IP on the badlist?" We were asking "given this IP plus this fingerprint plus this click velocity plus this referrer pattern, what's the fraud probability?"

That's a scoring problem. Not a membership problem. Bloom filters don't score. They answer yes or no.

We also briefly tried a managed fraud API from a vendor I won't name. Latency averaged 380ms. We were over budget before we'd added a single line of our own logic. Killed it after a week.

The real architecture came together once we stopped trying to be clever and just wrote down the actual signals on a whiteboard. IP reputation. Device fingerprint repetition. Click-to-conversion velocity. Referrer authenticity. Time-of-day distribution. Five signals. Each one cheap. Combined into a single 0-100 score.

The Architecture

Three layers. That's it.

Ingestion layer — Cloudflare Workers. Every click hits a Worker first. The Worker parses the request, extracts headers, generates a fingerprint hash, and writes a raw click event to a queue. No scoring at this layer. The job is to swallow the click in under 30ms and get out of the way.

Storage layer — Postgres on Neon. One Postgres database holds everything: raw click events, fingerprint history, IP reputation cache, scoring rules, and the per-advertiser config. We considered splitting this — a hot store for fingerprints, a warm store for analytics. We didn't. One database is one place to debug, one place to back up, one place to scale. We'll split it when we have to. We don't have to yet.

Strong opinion: most early-stage teams would save six months by picking Postgres and not looking around. Skip Redis. Skip DynamoDB. Skip the multi-database split from a Big Tech engineering blog. You're not Big Tech.

Scoring layer — edge function on the same Worker. After ingestion writes the event, the same Worker invocation calls a scoring function. That function reads the fingerprint history from Postgres (via Neon's serverless driver, which keeps the connection alive across Worker invocations), runs the additive scoring, and returns a 0-100 score plus a verdict: allow, shadow-flag, or block.

The schema is small enough to fit in this post:

-- The core fingerprints table. One row per (ip, fp_hash) seen.
create table fingerprints (
  id            bigserial primary key,
  advertiser_id uuid not null,
  ip            inet not null,
  fp_hash       text not null,
  user_agent    text,
  first_seen    timestamptz not null default now(),
  last_seen     timestamptz not null default now(),
  click_count   int  not null default 1,
  convert_count int  not null default 0,
  reputation    smallint not null default 50, -- 0..100
  unique (advertiser_id, ip, fp_hash)
);

create index fingerprints_lookup
  on fingerprints (advertiser_id, ip, fp_hash)
  include (reputation, click_count, convert_count, last_seen);

-- Raw click events. Append-only. Partitioned monthly.
create table click_events (
  id            bigserial,
  advertiser_id uuid not null,
  ip            inet not null,
  fp_hash       text not null,
  ts            timestamptz not null default now(),
  score         smallint,
  verdict       text check (verdict in ('allow','flag','block')),
  raw_headers   jsonb
) partition by range (ts);

The covering index on fingerprints_lookup is doing most of the heavy lifting. Postgres reads the row out of the index without touching the heap, which gets the lookup to 4-7ms inside the Worker. We didn't design this on the first try — we shipped a naive index, watched p95 explode at 1M clicks/day, and rebuilt it on a Sunday.

The actual fingerprint hash is the boring part everyone overthinks. We hash a normalized concatenation of user-agent, accept-language, accept-encoding, screen dimensions (when a JS callback fires on the landing page), timezone, and the canvas fingerprint we collect on first visit. SHA-256, truncated to 16 bytes. Done. The novelty isn't in the fingerprint — it's in what you do with it.

The Edge Function

The scoring logic lives in TypeScript on the Worker. Slightly simplified:

export async function scoreClick(req: ClickRequest, db: Sql): Promise<Verdict> {
  const fp = await db<FpRow[]>`
    select reputation, click_count, convert_count, last_seen
    from fingerprints
    where advertiser_id = ${req.advertiserId}
      and ip = ${req.ip}
      and fp_hash = ${req.fpHash}
    limit 1
  `;

  const row = fp[0];
  let score = 50; // start neutral

  if (row) {
    // Repeat clicks from same fp with no conversions = bad
    const cvr = row.convert_count / Math.max(row.click_count, 1);
    if (row.click_count > 5 && cvr < 0.01) score += 25;
    if (row.click_count > 20 && cvr === 0) score += 15;

    // Reputation pulled from prior scoring runs
    score += (50 - row.reputation) * 0.4;
  }

  // Velocity check: more than N clicks from this IP in last 60s = bad
  const burst = await db<{ c: number }[]>`
    select count(*)::int as c from click_events
    where ip = ${req.ip} and ts > now() - interval '60 seconds'
  `;
  if (burst[0].c > 8) score += 20;

  score = Math.max(0, Math.min(100, score));
  const verdict = score >= 85 ? 'block' : score >= 60 ? 'flag' : 'allow';
  return { score, verdict };
}

That's the whole thing. No ML model. No ensemble. No transformer. Just additive rules with weights we tune from real data. We tried an XGBoost classifier in month two and it scored worse than the rules — turns out when you only have five signals and they're already well-chosen, a tree model is just an expensive way to get to the same answer.

Take it or leave it: most fraud detection doesn't need ML. It needs honest signals and a human willing to tune weights weekly.

The Latency Fight: 320ms to 180ms

First version came in at 320ms p95. Above budget. Here's how we got it down.

Cut 1 (320ms → 260ms): Connection pooling. We were creating a new Postgres connection per Worker invocation. Switching to Neon's serverless driver with a pooled connection saved 60ms instantly. Embarrassing miss. The kind of thing a senior engineer would've caught in review, except we don't have senior engineers — we have us.

Cut 2 (260ms → 220ms): Covering index. The original fingerprints table had separate indexes on each lookup column. Changed to a single covering index with INCLUDE on the columns the scoring function reads. Postgres now answers the lookup from the index alone. 40ms gone.

Cut 3 (220ms → 195ms): Killed an unnecessary roundtrip. We were writing the raw click event before scoring it, then updating the row with the score. Two writes per click. Now we score first and write once. 25ms recovered.

Cut 4 (195ms → 180ms): Worker warmup. Cloudflare Workers cold-start in 5-15ms when they're cold. We added a Cron Trigger that pings every Worker every 90 seconds. Hacky? Sure. Effective? Also yes. 15ms off the p95 because the cold-start tail flattens out.

We're stuck at 180ms now. The remaining budget is mostly TLS handshake and network round-trip — physics, basically. Could probably squeeze to 150ms with a custom protocol, but 180ms is good enough that I don't want to spend the time. Pareto principle. Move on.

Results

After several months in production:

180ms p95 latency end-to-end on the scoring path, down from 320ms at first launch
Sub-0.5% false-positive rate measured against advertiser-confirmed legitimate conversions
Postgres instance peaks well under capacity during traffic spikes — nowhere near needing to split the database

When we hit a meaningfully larger click volume, we may revisit the architecture. Not before.

We also wrote about how this fits into the broader ops setup in our post on running a multi-product studio with a 1 founder + 1 manager team — short version: the same Cloudflare/Postgres/Railway stack runs every product, including JustAnalytics and VeloCalls.

What We'd Change

Three things, if I were starting over.

Skip the bloom filter detour. Three weeks of engineering time we never got back. The lesson: when the problem is scoring, don't reach for a membership data structure just because it sounds smart in a Hacker News thread.

Make the rules engine data, not code. Right now the scoring weights are hardcoded in TypeScript. Every tweak is a deploy. We're migrating to a scoring_rules table that the Worker reads on each invocation (cached for 60 seconds). Once that ships, my non-engineer co-founder can tune thresholds at 11pm on a Sunday without paging me.

Push scoring into Postgres. The TypeScript scoring function works fine, but a Postgres function (PL/pgSQL) doing the same job would save the 30-40ms we currently spend round-tripping the result back to the Worker for the verdict. The Worker would just call select score_click(...) and forward the verdict. We'll get there.

The architecture is right. The path we took to get here was longer than it needed to be. That's usually how it goes when you're learning the problem at the same time you're building the solution.

If you want to build click fraud detection system logic on top of ours — or you're curious about the rest of how we run things — there's more in the Velocity Digital Labs blog, and ClickzProtect has a 14-day free tier you can point at your real ad spend.

Frequently Asked Questions

How do you build a click fraud detection system without false-positive churn?

Pick three signals you actually trust — IP reputation, device fingerprint repetition, and click-to-conversion velocity — and weight them additively, not multiplicatively. We score on a 0-100 scale and only auto-block above 85. The 60-85 band gets shadow-flagged for review, never auto-blocked. Most teams trying to build click fraud detection systems faceplant because they over-trust a single signal. Layered scoring with a human-review band is what makes the thing survivable in production.

Why did you pick Postgres over Redis for fingerprint storage?

Because Postgres handled the read pattern fine and we didn't want a second stateful system to babysit. We benchmarked Redis at sub-millisecond reads versus Postgres at 4-7ms with a covering index, and 4-7ms was well inside our 180ms budget. Skip Redis for this — Postgres handles it fine, especially with Neon's connection pooler. One database, one backup story, one place to debug. Redis would've added ops cost without buying us anything users could feel.

What's the latency budget for an edge fraud-scoring function?

Under 200ms end-to-end. The click happens, the redirect fires, the user lands on the advertiser's page — add more than a couple hundred ms anywhere in that chain and conversion rates drop visibly. We sit at 180ms p95 today: roughly 30ms for Cloudflare Workers ingestion, 90ms for the Postgres fingerprint lookup and scoring, and 60ms of network and TLS overhead. Cold starts were the hardest part to tame.

What would you change if you rebuilt ClickzProtect from scratch?

Three things. Skip the bloom filter detour — three weeks of work we ripped out and threw away. Ship the rules engine as data, not code, from day one so non-engineers can tune thresholds. And put the scoring logic in a Postgres function instead of TypeScript, which would shave another 30-40ms off p95. The architecture is right. The path we took to get here was longer than it needed to be.

Follow the Studio

Velocity Digital Labs is a multi-product studio building 8 active SaaS products with a 1-founder + 1-manager + N-AI-agents structure. Receipts, dollar-signs, cap-table-honest. No VC platform-play — just shipping.

See the products → · Browse all VDL blog posts