A guide to machine learning and fraud detection

Fraud detection is an essential aspect of digital transactions. Fraud can have a significant impact on financial institutions, their customers and third parties such as online retailers or apps that accept payments.

20 Oct 2023

Before machine learning was introduced, fraud detection relied on rules-based systems that analysts reviewed and updated manually.

Fraud detection with machine learning uses models and algorithms that leverage data to identify the probability of fraud based on a range of factors.

Those factors are updated in real time as the system analyses more transactions. With machine learning fraud detection, the solution can adapt as quickly as fraudsters change their tactics.

Machine learning defined

Machine learning is a subfield of artificial intelligence. It allows computer systems to learn and adapt based on experience without the need to be reprogrammed.

A machine learning solution involves algorithms and models designed to analyse data and predict outcomes. Algorithms can be trained using different learning approaches, such as supervised, semi-supervised, unsupervised or reinforcement learning, depending on the type of problem being addressed.

The ability to learn and improve over time is what makes machine learning well suited to fraud detection. In addition, its ability to find patterns across a vast number of internal and external parameters makes it far more effective than traditional modes of detection.

The difference between artificial intelligence and machine learning

The difference between Artificial Intelligence (AI) and Machine Learning (ML) lies in their scope and objectives.

AI is a broad domain that aims to emulate human intelligence across various applications, ranging from intelligent assistants to autonomous vehicles. It encompasses a myriad of strategies and technologies designed to allow machines to perform tasks that traditionally require human intelligence.

Machine Learning, on the other hand, is a specialised subset of AI. It centres on the development of algorithms and statistical models that enable computers to undertake specific tasks without explicit instructions, relying instead on patterns and inferences drawn from data.

While every ML system can be classified as AI, not all AI systems employ ML. AI’s primary goal is human-like task completion across diverse applications, while ML specifically focuses on processing large volumes of data to identify patterns and produce data-driven outcomes.

AI can use a variety of methods, including but not limited to ML. ML methods are usually grouped into supervised and unsupervised learning and, to a lesser extent, semi-supervised and reinforcement learning.

The process of implementing an ML solution typically revolves around data selection and refining models, while AI solutions might entail more intricate processes.

Types of Online Fraud

With the increased use of the internet for day-to-day services, and the number of connected devices and users, fraud attempts have increased in recent years.

Santiago Cordero, Head of Design and Development of Cybersecurity and Cloud Services at VASS says: “The factor driving this demand for our services are more sophisticated fraudsters that are leveraging the same machine learning and neural network technology our clients use to improve their tactics. It’s a continuous game of cat and mouse and we help our clients remain ahead.”

Online fraud can take many forms. It can include phishing scams, business email compromise, shopping fraud, money laundering, stolen credentials – it’s any attempt to digitally defraud an organisation or individual.

How many different types of fraud were attempted at your organisation during the last 12 months? Were they similar, or different every time?

Machine learning fraud detection solutions are trained to identify potential fraud and can learn in real time, enabling them to keep up with the changing tactics of fraudsters and to detect fraud based not just on individual actions but also on behaviours. This moves attention away from defining specific types of fraud and towards defining the typical behaviours of fraudulent and genuine transactions.

Fraud detection before machine learning

Before machine learning was implemented, fraud detection relied on rules-based systems that referenced a set of rules and applied them to transactions.

Rules-based systems highlight anything that, according to their predefined rules, looks like fraud. When fraudsters developed new methods, analysts would review activity and create new rules or update existing ones.
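
As a rough illustration of the idea, a rules-based check can be sketched in a few lines of Python. The field names, rule names and thresholds below are hypothetical and chosen only for the example; they are not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float            # transaction value in the account currency
    country: str             # country where the payment was made
    home_country: str        # customer's registered country
    attempts_last_hour: int  # payment attempts on this card in the last hour


# Each rule is a (name, predicate) pair. Thresholds are illustrative assumptions.
RULES = [
    ("large_amount", lambda t: t.amount > 5_000),
    ("foreign_country", lambda t: t.country != t.home_country),
    ("rapid_attempts", lambda t: t.attempts_last_hour >= 5),
]


def triggered_rules(transaction):
    """Return the names of every predefined rule the transaction breaks."""
    return [name for name, predicate in RULES if predicate(transaction)]


known_pattern = Transaction(amount=9_000, country="BR", home_country="ES", attempts_last_hour=1)
new_tactic = Transaction(amount=40, country="ES", home_country="ES", attempts_last_hour=1)

print(triggered_rules(known_pattern))  # ['large_amount', 'foreign_country']
print(triggered_rules(new_tactic))     # []
```

The second transaction passes every check simply because no analyst has written a rule for that pattern yet, which is the gap a learning system is meant to close.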

Rules-based fraud detection can catch the most obvious fraud attempts that use common methods, but it is prone to false negatives, especially when fraudsters develop new methods before rules that can identify them are in place. The major drawback is that a rules-based system can only detect a type of fraud after that type has already been identified and a rule written for it.

Discovering new fraud tactics and creating and updating rules to prevent them required a great deal of human resources. As a rules-based system grew larger, the likelihood of false positives increased, such as a customer’s card being declined, which negatively impacts customer experience.

How machine learning works in fraud detection

Fraud detection with machine learning relies on a model and algorithms. A model is provided with a set of historical data that includes both fraudulent and genuine transactions. This data includes financial transaction records, customer behaviour and personal data, among many other sources, and can be used to identify fraudulent behaviours and patterns.

Data is labelled and the model learns to distinguish between fraudulent and genuine transactions by extracting relevant features and using the features of the transaction to determine the probability of fraud.
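
To make this concrete, here is a minimal sketch of a supervised model that learns a fraud probability from labelled examples. It uses logistic regression (one of several possible models) written in plain Python so that it has no dependencies. The two features, the tiny dataset and the training settings are all assumptions made for illustration; a production model would use far more data and features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fraud_probability(features, weights, bias):
    """Probability of fraud for one transaction under the fitted model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = fraud_probability(features, weights, bias) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical features: [amount in thousands, 1 if the device is new else 0]
rows = [[0.02, 0], [0.05, 0], [0.10, 0], [0.03, 1],
        [1.50, 1], [2.00, 1], [1.80, 0], [2.50, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = labelled fraudulent, 0 = labelled genuine

weights, bias = train_logistic_regression(rows, labels)

print(fraud_probability([2.2, 1], weights, bias))   # large amount, new device
print(fraud_probability([0.04, 0], weights, bias))  # small amount, known device
```

The output is a probability rather than a yes/no answer, so the organisation still has to choose the threshold above which a transaction is blocked or sent for review.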

As well as using features, a machine learning solution can compare a transaction against similar customers, transactions and behaviour to determine whether it is genuine or fraudulent. In addition, machine learning can adapt to other indicators of fraud, such as an implausible travel time between the locations of two transactions, or a false IP address that places the user in an uninhabitable location.
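
The travel-time indicator can be sketched as a simple feature: compute the distance between two consecutive events on the same account and check whether anyone could plausibly have covered it in the time available. The speed limit, the one-kilometre tolerance and the `Event` structure are assumptions for this example; in practice such a signal would usually be one input to a model rather than a decision on its own.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # assumption: roughly a commercial flight


class Event(NamedTuple):
    latitude: float
    longitude: float
    hour: float  # hours since a shared reference time


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(first, second):
    """True if the implied speed between two events exceeds the plausible limit."""
    distance = haversine_km(first.latitude, first.longitude,
                            second.latitude, second.longitude)
    hours = abs(second.hour - first.hour)
    if hours == 0:
        return distance > 1.0  # simultaneous events more than 1 km apart
    return distance / hours > MAX_PLAUSIBLE_SPEED_KMH


madrid = Event(40.4168, -3.7038, hour=0.0)
new_york = Event(40.7128, -74.0060, hour=1.0)   # one hour later
barcelona = Event(41.3874, 2.1686, hour=3.0)    # three hours later

print(is_impossible_travel(madrid, new_york))   # True
print(is_impossible_travel(madrid, barcelona))  # False
```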

Machine learning algorithms include neural networks, support vector machines and deep learning methods. The way they learn can be supervised or unsupervised. With supervised learning, the algorithm is provided with labelled data, learns from that data and makes predictions on new, unlabelled data. With unsupervised learning, the algorithm is provided with unlabelled data so it can discover patterns and relationships independently.
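
As a small illustration of the unsupervised case, the sketch below flags unusual amounts in a customer's history without any labels. It uses a modified z-score based on the median absolute deviation, a simple statistical technique standing in for the more sophisticated unsupervised models used in practice. The sample amounts are invented and the 3.5 cut-off is a commonly used convention, not a tuned value.

```python
import statistics


def modified_z_scores(amounts):
    """Score each amount by its distance from the median, scaled by the MAD."""
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        # All typical values are identical, so no meaningful scale exists.
        return [0.0 for _ in amounts]
    return [0.6745 * abs(a - median) / mad for a in amounts]


def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    scores = modified_z_scores(amounts)
    return [a for a, s in zip(amounts, scores) if s > threshold]


# Hypothetical recent card payments for one customer
amounts = [12.5, 9.9, 14.2, 11.0, 13.1, 10.4, 950.0, 12.0]

print(flag_anomalies(amounts))  # [950.0]
```

No one told the code what fraud looks like; it only learned what is typical for this customer and reported the payment that did not fit.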

Benefits of machine learning fraud detection

Automating fraud detection and the development of rules significantly reduces the resources and time required. It leads to a more accurate fraud detection solution that is better at keeping up with the pace at which fraudsters change their methods and tactics.

The outcome of more accurate fraud detection is fewer false positives and false negatives, leading to reduced fraud-related losses and improved customer experience. In addition, machine learning fraud detection can identify anomalies that indicate potentially fraudulent transactions, even if the tactics being used have not been identified previously.
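
False positives and false negatives can be counted directly by comparing a model's decisions with the known outcomes. The short sketch below does this for ten invented transactions and derives three commonly used measures; the labels and predictions are made up purely to show the arithmetic.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives (1 = fraud, 0 = genuine)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),  # fraud caught
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),  # genuine blocked
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),  # fraud missed
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),  # genuine approved
    }


actual    = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

counts = confusion_counts(actual, predicted)
precision = counts["tp"] / (counts["tp"] + counts["fp"])  # flagged items that were fraud
recall = counts["tp"] / (counts["tp"] + counts["fn"])     # fraud that was caught
false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])

print(counts)  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 6}
print(round(precision, 3), round(recall, 3), round(false_positive_rate, 3))  # 0.667 0.667 0.143
```

A false positive here corresponds to a genuine customer being declined, and a false negative to a fraudulent transaction being approved, so both numbers matter when comparing solutions.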

Selecting data for a machine learning fraud detection model

The accuracy of a machine learning fraud detection solution depends on the quality and quantity of data used to train it. Most organisations collect and store sufficient data to implement a machine learning fraud detection solution, but most aren’t using that data to its full potential.

As part of implementing a machine learning solution, we help organisations utilise the data they collect and define which parts will be useful for the model.

When selecting data, it’s essential to have a variety of data sources that provide as complete a picture as possible of transactions, behaviour and account activity. Data should be divided into two sets: one for training and one for testing.
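
Because fraud is usually a small share of all transactions, a plain random split can leave the test set with almost no fraudulent examples. One common way around this, sketched below, is a stratified split that preserves the fraud ratio in both sets. The record layout, the `is_fraud` key and the 80/20 proportion are assumptions for the example.

```python
import random


def stratified_split(records, label_key="is_fraud", test_fraction=0.2, seed=42):
    """Split records into train and test sets, keeping each label's share similar."""
    rng = random.Random(seed)
    by_label = {}
    for record in records:
        by_label.setdefault(record[label_key], []).append(record)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        # At least one record per label goes to the test set.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical dataset: 95 genuine and 5 fraudulent transactions
records = ([{"id": i, "is_fraud": 0} for i in range(95)]
           + [{"id": 95 + i, "is_fraud": 1} for i in range(5)])

train, test = stratified_split(records)

print(len(train), len(test))                    # 80 20
print(sum(r["is_fraud"] for r in train),
      sum(r["is_fraud"] for r in test))         # 4 1
```

The test set must stay untouched during training so that it gives an honest estimate of how the model behaves on transactions it has never seen.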

Datasets should be as large as possible, supplemented with synthetic data if required. The data you train the model with will be used to make future decisions and set a course for learning. Inaccurate or insufficient data risks an ineffective fraud detection solution.

Another task when preparing data for a model is selecting and allocating relevant features (for a supervised model). The features, and combinations of them, need to represent fraudulent and genuine activity.

Implementing a machine learning fraud detection solution

The first step in implementing a machine learning fraud detection solution is to define its scope, such as the types of fraud the solution will detect, and to decide upon a suitable model.

Common models for fraud detection are logistic regression, decision trees, random forests, gradient boosting, and neural networks.

After scoping and defining the solution’s requirements, historic data from various sources needs to be collated into a training set and testing set. Once the data is ready (and labelled if you’re opting for a supervised model) the next step is training the model.

After training, models are tested and refined using new data to improve accuracy.

Scaling a machine learning fraud detection solution

As an organisation grows, it may need to scale its fraud detection solution to increase capacity or to detect new types of fraud.

Scaling a fraud detection solution usually leads to a more complex setup that can handle larger amounts of data while maintaining efficiency.

When scaling a machine learning fraud detection solution, a synthetic dataset can be created to ensure models are trained on diverse data or larger datasets. This can be an effective approach to preparing for new types of fraud that may not have happened at the organisation previously or preparing to process a larger number of transactions.
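
One simple way to generate synthetic fraud examples is to interpolate between pairs of real fraudulent records, which is similar in spirit to the SMOTE technique but without its nearest-neighbour step. The sketch below is only an illustration under that assumption: the feature layout and values are invented, and real synthetic data generation would need validation against genuine patterns.

```python
import random


def synthesize_fraud(fraud_rows, n_new, seed=0):
    """Create new rows on the line segment between random pairs of real fraud rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        first, second = rng.sample(fraud_rows, 2)
        t = rng.random()  # position between the two real rows
        synthetic.append([x + t * (y - x) for x, y in zip(first, second)])
    return synthetic


# Hypothetical fraud rows: [amount, transactions in the last hour]
fraud_rows = [[1500.0, 6.0], [2000.0, 9.0], [1800.0, 4.0], [2500.0, 7.0]]

new_rows = synthesize_fraud(fraud_rows, n_new=10)

print(len(new_rows))  # 10
print(new_rows[0])
```

Every generated row stays within the range of the real examples, so this approach adds volume and variety around known fraud but cannot by itself invent a genuinely new tactic.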

About VASS

VASS has over 20 years’ experience providing digital transformation services to banks, online retailers and other sectors with a significant fraud risk. Contact us to learn about fraud detection solutions we’ve implemented and how we can help with yours.

