Backtesting, in our context, is the age-old practice of validating any new or modified fincrime prevention method on past, known, and labeled data in order to judge the impact it may have upon deployment. Backtesting is and will remain indispensable, but its usefulness will be far more limited in the future.
Backtesting rests on the critical assumption that criminal and legitimate behaviors observed in the past are representative of future behaviors. However, that assumption is breaking down in an increasing number of fraud and fincrime domains.
Criminal behavior is no longer stationary, nor is it defined by a broad population. The professional evolution of criminals is accelerating, and the statistical properties of fraud are shifting far faster than our historical datasets can capture, so backtesting increasingly produces biased and misleading results.
The fincrime game is getting faster, though not uniformly and not everywhere. Criminals operate globally. They outsource elements of their business, like document forgery, account opening, or money muling.
On top of that, they have started using AI to innovate rapidly. This completely changes how we design fraud detection systems and measure their effectiveness.
Backtesting was a sound approach to system evaluation when criminal behavior was carried out by individuals, stayed repetitive, and was therefore stable. Traditional financial criminals were and remain individuals, operating and learning independently and locally.
If they were really good and went global, they might find themselves on the silver screen, like Frank Abagnale in Catch Me If You Can. But, by and large, individual criminals operated similarly to each other and repeated the mistakes others had made before them. Because they learned independently of one another, their mistakes were repetitive and predictable.
Machine learning and risk management techniques designed for approximately i.i.d. data worked reasonably well because fraud exhibited time stationarity, relatively stable population characteristics, and slow-learning dynamics. If we drew a very abstract graph of traditional criminal behavior likelihood projected onto the sophistication axis, we would get a continuous distribution.

On a large scale, we can mentally approximate it with a Gaussian-like shape for intuition, although in reality it is an empirical distribution specific to each company and each time period.
Each criminal contributes marginally to the overall fraud distribution, and only the best ones progressively climb the distribution to the right. Most criminals are “full stack”, independent operators with a limited number of frauds committed.
Backtesting makes perfect sense in this context. It is a great way to evaluate the defenses against a relatively stable distribution of criminal behavior. It lets you measure the overlap between the distribution of criminal behavior and legitimate behavior.
Once you have the overlap, you can place the decision boundary with a specific ratio of false positives and false negatives, thus creating the simplest supervised detector¹.
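As a toy sketch of this trade-off (the Gaussian score distributions and their parameters are purely illustrative assumptions, in line with the abstraction above), moving a single threshold along the score axis trades false positives for false negatives:

```python
import math

# Illustrative assumption: legitimate behavior scores ~ N(0, 1),
# criminal behavior scores ~ N(3, 1). A single threshold on this
# score axis is the simplest supervised detector.

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def error_rates(threshold):
    # False positives: legitimate users scoring above the threshold.
    fpr = 1.0 - normal_cdf(threshold, mu=0.0, sigma=1.0)
    # False negatives: criminals scoring below the threshold.
    fnr = normal_cdf(threshold, mu=3.0, sigma=1.0)
    return fpr, fnr

# Moving the boundary right reduces false positives but lets more
# criminals through; moving it left does the opposite.
for t in (1.0, 1.5, 2.0):
    fpr, fnr = error_rates(t)
    print(f"threshold={t:.1f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
```

Backtesting against a stable distribution amounts to estimating exactly these two error rates from historical labels and picking the threshold your business can tolerate.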
Sadly, today's fraud is very different and breaks the assumptions behind that distribution. While individual financial crime still exists, it is rapidly being outpaced by criminal professionals, whose output now dwarfs traditional crime in high-risk domains.
It is critical to understand that criminal innovation is now both rapid and done simultaneously on different levels of the criminal business. Criminals have a broad library of operational procedures, and can select an approach and then modify it rapidly to respond to any changes you make in your defensive posture.
The steps a criminal gang takes today to open a bank account, to name but one example, can be very different from the ones the same gang used yesterday. The process will likely be different again from the one the same organization will use tomorrow.
Backtesting fundamentally fails against professional criminals because its estimate of your ability to detect fraud is highly unstable and dependent on your past defenses. What it actually measures is the intersection of legitimate behavior with the very narrow, dynamic behaviors of highly scalable, automated criminals, behaviors shaped by your past defenses.
“Shaped by your past defenses” is the key phrase here.
Criminals respond to your past controls. They are economical. Their attacks are no more sophisticated than strictly necessary. They often sell their products (like business identities or onboarded accounts) on very efficient black markets, and the reputation and escrow mechanisms those markets impose create strong short-term incentives.
If you successfully off-board the newly created account within hours, they often deliver another one to protect their reputation even at cost to themselves. If you block the same account after two months, they are off the hook.
These incentives drive a hoarding phenomenon: your past defenses (failed onboardings and off-boarded accounts), combined with the cost factors on the attacker side, create an environment where:
The effect is the distribution we show below. Instead of a “statistically reasonable” distribution of individual criminals, modern fraud is a set of narrow, Dirac-like distributions. We are hunting highly concentrated and constantly shifting behavioral clusters².

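A minimal simulation of why this breaks backtesting (all distributions and parameters here are invented for illustration): a threshold that looks excellent against yesterday's narrow cluster can miss today's equally narrow cluster entirely.

```python
import random

random.seed(7)

# Illustrative assumption: legitimate scores are broad, while each
# fraud campaign is a tight, "Dirac-like" spike that relocates over time.
legit      = [random.gauss(0.0, 1.0) for _ in range(5000)]
fraud_past = [random.gauss(4.0, 0.1) for _ in range(500)]  # last period's cluster
fraud_now  = [random.gauss(1.5, 0.1) for _ in range(500)]  # same gang, new procedure

def recall(threshold, scores):
    """Fraction of fraud scores above the detection threshold."""
    return sum(s > threshold for s in scores) / len(scores)

# Backtesting on the past cluster suggests threshold 3.0 is near-perfect,
# yet the current cluster sits entirely below it.
threshold = 3.0
print("recall on past fraud:   ", recall(threshold, fraud_past))
print("recall on current fraud:", recall(threshold, fraud_now))
```

The backtest is not wrong about the past; it is simply measuring a cluster that no longer exists.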
Backtesting fails on a technical level for four main reasons:
The decreasing predictive value of backtesting also helps explain why naive supervised learning approaches often miss new criminals and new behaviors of existing criminals. Such models can over-index on past crimes and lean on correlated features not directly tied to the fraud mechanism.
As a result, models may become very confident at detecting people similar to past fraudsters, while the up-to-date criminal mechanisms are already shifting elsewhere.
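A deliberately naive, hypothetical example of this over-indexing (the domains and fields are invented for illustration): a "model" that memorizes a feature merely correlated with past fraud, rather than the fraud mechanism itself.

```python
# Invented training data: all past fraud happened to share one
# disposable email domain, while the actual mechanism was identity theft.
past_fraud = [{"domain": "tempmail.example", "stolen_identity": True}] * 50
flagged_domains = {case["domain"] for case in past_fraud}

def naive_score(case):
    # Over-indexes the historical proxy feature, ignoring the mechanism.
    return 1.0 if case["domain"] in flagged_domains else 0.0

# Yesterday's fraudsters have moved on; an innocent user inherits the flag.
new_fraud = {"domain": "fresh-domain.example", "stolen_identity": True}
lookalike = {"domain": "tempmail.example", "stolen_identity": False}

print(naive_score(new_fraud))   # the shifted mechanism goes undetected
print(naive_score(lookalike))   # confident, and wrong
```

The model is extremely confident on people who resemble past fraudsters, and blind to the mechanism once it migrates to new surface features.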
Given the current pace of criminal innovation, deploying a fraud detection system is not a one-and-done proposition. It is a commitment to both an initial investment and to a constant daily investment afterward to keep it effective against ever-changing criminal activity.
The answer lies in assessing the vendor's (including your in-house vendor's) maturity. Measure where they are now, how deep their system goes, how well they detect new attacks, and how fast they react.
Fincrime, like fintech, is always changing. Criminals can use vibe coding to build new attacks much faster, and generative AI to increase the diversity of the attacks they produce. On the other side, the defenders can also accelerate their response using this technology.
In reality, the progress is uneven. Failed fraud attempts carry very few real-world consequences, so the criminal's loss function is much less punishing, and criminals can leverage generative AI much faster. We can therefore expect future frauds to be:
The game keeps growing on us. It demands we grow with it. Resistant AI is the way to keep pace.
¹ In this article, we detect criminals (positives); therefore, false positives are legitimate users classified as criminals, and false negatives are criminals classified as legitimate users, who avoid detection.
² All this is not unprecedented. Historically, computer security was a small field with few products: antivirus, firewall, maybe email security for the overachievers. Products were aimed at catching a small number of viruses and some esoteric threats. As the stakes increased, criminals started to try harder, the security industry responded, and a typical financial company now runs more than 80 security tools covering a far broader and far more sophisticated set of attacks.