Synthetic Data Generation for Enhancing Fraud Detection ML Model Training
Author(s): Ravi Kiran Alluri
Publication #: 2508011
Date of Publication: 10.12.2023
Country: United States
Pages: 1-8
Published In: Volume 9 Issue 6 December-2023
DOI: https://doi.org/10.5281/zenodo.16883354
Abstract
The proliferation of digital financial services and e-commerce has brought greater convenience to individuals and small businesses, but it has also given rise to increasingly sophisticated fraud methods. Financial fraud, synthetic identity theft, and insider threats aimed at financial institutions, payment processors, and regulatory bodies pose a growing risk. To mitigate these risks, machine learning (ML) models are widely employed to detect and prevent fraud. However, the central challenge in building automated fraud detection models is the data: obtaining suitable data is the most significant obstacle to building trustworthy, resilient, and accurate ML models for fraud prevention. Because fraud occurs infrequently and takes varied forms, the available datasets are highly imbalanced and contain few positive samples, which makes model training difficult. In addition, privacy and regulatory constraints limit the sharing and use of financial data among researchers, which can hinder model development and collaborative studies.
Keywords: Synthetic Data Generation, Fraud Detection, Machine Learning, Data Augmentation, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Privacy-Preserving AI, Anomaly Detection, Imbalanced Datasets, Financial Crime Analytics.