Machine Learning

Real-time fraud detection for transaction systems

Bringing anomaly-detection models into a high-volume payment flow without breaking latency budgets or analyst trust.

Typical duration
6-week discovery, then 12-16 weeks to production with a shadow pilot
Team shape
1 ML lead + 1 data engineer + a tech lead from your side

What good looks like

Latency target
Sub-50ms p99 on the auth path
False-positive direction
Materially lower than rule-only baseline at parity recall
Retraining cadence
Weekly with drift checks and shadow-mode promotion
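The drift checks behind that weekly cadence can be as simple as a population stability index over key features. A minimal sketch, assuming nothing about the eventual tooling; the function name, bucketing, and the ~0.2 rule of thumb are illustrative, not a committed design:

```python
import math

def psi(expected, actual, buckets=10):
    """Population stability index between a training-time feature sample
    (`expected`) and this week's traffic (`actual`). Values above ~0.2
    are a common rule of thumb for 'distribution has shifted, investigate'."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / buckets for i in range(buckets + 1)]
    edges[-1] += 1e-9  # make the top bucket inclusive of the max value

    def frac(sample, a, b):
        n = sum(1 for x in sample if a <= x < b) or 1  # floor at 1 to avoid log(0)
        return n / len(sample)

    total = 0.0
    for a, b in zip(edges, edges[1:]):
        e, o = frac(expected, a, b), frac(actual, a, b)
        total += (o - e) * math.log(o / e)
    return total
```

A check like this would run before each retraining job; a PSI spike on a core feature blocks automatic promotion and routes the new model through shadow mode instead.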

The problem this addresses

Rule-based fraud systems hit a wall once attackers learn the rules. Adding ML to the loop is the obvious next step; getting it to run inside the request path with a sub-50ms budget, calibrated false-positive rates, and a defensible retraining story is where most teams get stuck. This is the kind of engagement we take on: an existing rules engine that has been patched for years, an analyst team drowning in review queues, and a risk committee that needs to understand every score the model produces.

How we'd approach it

We start by reproducing the existing rule outcomes on a recent transaction sample so there's an honest baseline to measure against. The problem is usually framed as ranking, not classification: analysts want to control the operating point rather than have a model decide for them. A typical build pairs a gradient-boosted model on aggregated behavioural features (velocity, geographic dispersion, device, merchant category drift) with a lightweight autoencoder for unseen-pattern detection. We deliberately avoid deep sequence models in v1 when the explainability story matters more than a marginal lift. The scoring service sits behind the existing decisioning engine as a feature, not a replacement, so the team can fall back to rules for any segment they choose.
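The ranking framing can be sketched in a few lines: blend the two model outputs, then let the review-queue capacity, not the model, set the cutoff. Everything here is illustrative (`blend_scores`, the 0.8 weight, the toy scores), not a committed design:

```python
def blend_scores(supervised, anomaly, weight=0.8):
    """Blend a supervised fraud score with an anomaly score.

    `weight` favours the gradient-boosted model; the autoencoder term
    only surfaces patterns the supervised model has never seen.
    """
    return [weight * s + (1 - weight) * a for s, a in zip(supervised, anomaly)]

def threshold_for_review_rate(scores, review_rate):
    """Pick the score cutoff so that roughly `review_rate` of traffic is
    flagged -- the analysts choose the operating point, not the model."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * review_rate))
    return ranked[k - 1]

supervised = [0.91, 0.12, 0.40, 0.05, 0.77]  # gradient-boosted model output
anomaly    = [0.20, 0.95, 0.10, 0.02, 0.30]  # autoencoder anomaly score, normalised
combined = blend_scores(supervised, anomaly)
cutoff = threshold_for_review_rate(combined, review_rate=0.2)  # review top 20%
flagged = [i for i, s in enumerate(combined) if s >= cutoff]
```

Moving the operating point becomes a one-number change the analyst team owns, which is what keeps the false-positive rate tied to their actual review capacity.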

What we'd build

A scoring service deployed in the bank's own cloud account, called synchronously during card authorisation with a sub-50ms latency budget. Reason codes are returned alongside the score so analysts and the customer-facing team can explain holds. We also build a shadow-mode harness so the risk team can A/B new model versions against production before promoting them; that harness is the artefact we treat as the durable deliverable. Account-takeover detection, AML monitoring, and merchant-side fraud queues are natural follow-ons rather than v1 scope.
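The shadow-mode idea reduces to a small loop: the candidate scores the same traffic as production, only production decisions take effect, and the risk team reviews the disagreement rate before any promotion. `shadow_compare`, the stand-in scorers, and the 0.5 threshold below are all hypothetical:

```python
def shadow_compare(transactions, prod_model, candidate_model, threshold=0.5):
    """Score each transaction with both models; act on production only.

    Returns per-transaction records plus the fraction of transactions
    where the two models would have decided differently.
    """
    records, disagreements = [], 0
    for txn in transactions:
        prod = prod_model(txn)
        cand = candidate_model(txn)  # logged, never acted on
        if (prod >= threshold) != (cand >= threshold):
            disagreements += 1
        records.append({"txn": txn["id"], "prod": prod, "shadow": cand})
    return records, disagreements / len(transactions)

txns = [{"id": i, "amount": a} for i, a in enumerate([40, 900, 25, 4000])]
prod = lambda t: min(t["amount"] / 1000, 1.0)  # stand-in production scorer
cand = lambda t: min(t["amount"] / 2000, 1.0)  # stand-in candidate scorer
records, disagreement_rate = shadow_compare(txns, prod, cand)
```

In the real harness the candidate runs against live traffic and its scores go to a log, so the disagreement review happens on production data without any customer impact.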

Honest considerations

If you don't have transaction-level event logs going back at least six months, retraining isn't going to work; start with rule tuning and instrumentation instead. If your decisioning engine can't call out to an external service in the auth path, this becomes an integration project before it becomes an ML project. And if the risk committee won't sign off on a model in the loop without a fully interpretable per-decision explanation, a tree-ensemble with reason codes is the realistic upper bound; frontier sequence models aren't the right fit yet.