Synthetic Data Generation for Enhancing Fraud Detection ML Model Training

Author(s): Ravi Kiran Alluri

Publication #: 2508011

Date of Publication: 10.12.2023

Country: United States

Pages: 1-8

Published In: Volume 9 Issue 6 December-2023

DOI: https://doi.org/10.5281/zenodo.16883354

Abstract

The proliferation of digital financial services and e-commerce has brought greater convenience to individuals and small businesses, but it has also given rise to more sophisticated fraud methods. Financial institutions, payment processors, and regulatory bodies face a growing threat from financial fraud, synthetic identity theft, and insider threats. To mitigate these risks, machine learning (ML) models are widely used to detect and prevent fraud. However, data is the deciding factor in building automated fraud models, and it is also the most significant obstacle to building trustworthy, resilient, and accurate ML models for fraud prevention. Because fraud occurs infrequently and takes many forms, fraud datasets are highly imbalanced and contain very few positive (fraudulent) samples, which makes them difficult to assemble and to learn from. In addition, privacy and regulatory constraints limit the sharing of financial data with other researchers and its use by them, which can hinder model development and collaborative studies.
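To make the imbalance problem concrete, here is a minimal sketch of one classic way to generate synthetic minority-class samples: SMOTE-style interpolation between a fraud example and one of its nearest fraud neighbours. The abstract does not specify the paper's own generation method (the keywords point to GANs and VAEs), so this is an illustrative stand-in and not the author's approach. The function name `smote_like_oversample`, its parameters, and the toy two-feature data are all hypothetical.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Hypothetical helper: generate n_new synthetic minority (fraud) points, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority points, sorted by Euclidean distance to base.
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic

# Toy, made-up data: 990 legitimate vs 10 fraudulent transactions,
# each described by two hypothetical numeric features.
data_rng = random.Random(42)
legit = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(990)]
fraud = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(10)]
print(f"fraud share before: {len(fraud) / (len(legit) + len(fraud)):.1%}")  # 1.0%

new_fraud = smote_like_oversample(fraud, n_new=len(legit) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))  # 990 990
```

Interpolation only produces points inside the region already spanned by the known fraud cases, so it rebalances the classes without revealing any real record verbatim. However, it cannot invent genuinely new fraud patterns. That limitation is one motivation for the learned generators (GANs, VAEs) named in the keywords.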

Keywords: Synthetic Data Generation, Fraud Detection, Machine Learning, Data Augmentation, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Privacy-Preserving AI, Anomaly Detection, Imbalanced Datasets, Financial Crime Analytics.
