INNOVATION | 04.22.2024
Generative models and their role in generating synthetic data
Generative models are AI algorithms designed to learn patterns from data and generate new instances from them, and synthetic data, one of the main outputs of generative AI, has significant strategic importance for companies. There are several prominent generative models, each suited to different project needs. MAPFRE Insurance is using Conditional Tabular Generative Adversarial Networks (CTGAN) to combat fraud with AI and synthetic data.
Synthetic data has emerged as a key tool for technological development and business innovation, making it possible to simulate complex scenarios and improve Artificial Intelligence (AI) models without compromising the privacy of personal data.
By replicating the statistical profiles of real data, it offers an ethical and legal alternative to the restrictions imposed by regulations like GDPR, allowing for experimentation and information analysis. Its ability to generate diverse and controlled data sets fosters innovation and improves the accuracy and robustness of the systems that rely on it to operate.
Generative models, a critical component
Generative models are AI algorithms designed to learn patterns from large data sets and generate new instances that maintain statistical consistency with the original data. Unlike other AI models, which focus on classifying or predicting data based on specific inputs, generative models aim to capture and replicate the distribution of structured and unstructured data to create something new.
For example, after analyzing thousands of facial images, a generative model could produce images of non-existent people that resemble real photographs. This process relies on artificial neural network techniques such as Generative Adversarial Networks (GAN) and Autoencoder Models, the latter being used in machine learning to learn efficient data representations.
GANs, made up of two neural networks (the generator and the discriminator), illustrate how generative models work. The generator produces new data, while the discriminator compares it against a real data set and learns to tell the two apart. During training, the generator becomes better at creating convincing data in an attempt to deceive the discriminator, which in turn strives to recognize the imitations.
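The tug-of-war between the two networks can be made concrete with a small sketch of the losses each one tries to minimize. It assumes the discriminator outputs a probability that a sample is real, and it uses binary cross-entropy with the commonly used "non-saturating" generator loss. The function names are illustrative, and no networks are actually trained here.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def discriminator_loss(real_probs, fake_probs):
    """Low when real samples score near 1 and generated samples score near 0."""
    real_term = sum(bce(p, 1) for p in real_probs) / len(real_probs)
    fake_term = sum(bce(p, 0) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term

def generator_loss(fake_probs):
    """Low when the discriminator is fooled into scoring generated samples near 1."""
    return sum(bce(p, 1) for p in fake_probs) / len(fake_probs)

# Hypothetical discriminator outputs for a small batch.
confident = discriminator_loss(real_probs=[0.9, 0.8], fake_probs=[0.1, 0.2])
confused = discriminator_loss(real_probs=[0.5, 0.5], fake_probs=[0.5, 0.5])
print(confident < confused)                         # a sharper discriminator has lower loss
print(generator_loss([0.9]) < generator_loss([0.1]))  # fooling the discriminator lowers generator loss
```

In a real GAN, each loss would be minimized in alternation by gradient descent on the corresponding network's weights.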
What sets generative models apart is their ability to conceptualize and create, making them especially valuable in areas where creativity and idea generation are essential. They offer a more flexible and comprehensive approach to exploring solutions, overcoming the limitations of conventional AI methods.
Types of generative models and their applications
Among the most notable generative models are Generative Adversarial Networks (GAN), Conditional Tabular Generative Adversarial Networks (CTGAN), Autoregressive Models (AR), and Autoencoder Models. What does each one entail?
- Generative Adversarial Networks (GAN): they create images, video, and audio that are surprisingly realistic. One practical example is the creation of faces of non-existent people, which is used in the entertainment industry to create characters for video games or movies.
- Conditional Tabular Generative Adversarial Networks (CTGAN): they generate synthetic tabular data, so that real records need not be shared. For example, in the finance sector, they can simulate bank transaction data to test algorithms without exposing sensitive customer information.
- Autoregressive Models (AR): these models treat data as a sequence and predict the next element from the preceding ones. They are fundamental in text prediction tools or in the automatic generation of music, where each note is based on the previous ones.
- Autoencoder Models: these models are designed to compress or encode input data to reduce them to their simplest form, then reconstruct or decode the original information from the compressed representation. They are trained using unsupervised machine learning. One example is the Variational Autoencoder (VAE), applied to image creation.
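The autoregressive idea from the list above, where each note is based on the previous ones, can be illustrated with a deliberately tiny sketch. It assumes the simplest possible case, in which the next element depends only on the one immediately before it (a bigram model); practical autoregressive models condition on much longer contexts. The melody and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def fit_bigram(sequence):
    """Count how often each element follows each other element."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev][nxt] += 1
    return transitions

def predict_next(transitions, prev):
    """Return the most frequent follower of prev, or None if prev was never seen."""
    followers = transitions.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(transitions, start, length):
    """Extend a sequence one element at a time, feeding each prediction back in."""
    out = [start]
    while len(out) < length:
        nxt = predict_next(transitions, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out

# Hypothetical toy melody used as training data.
melody = ["C", "D", "E", "C", "D", "G", "C", "D", "E"]
model = fit_bigram(melody)
print(predict_next(model, "D"))
print(generate(model, "C", 5))
```

The generation loop is the defining autoregressive step: every output becomes the input for the next prediction.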
MAPFRE Insurance: innovating in the application and use of synthetic data
MAPFRE is innovating in its global operations, particularly in the USA through MAPFRE Insurance, where it’s using these types of generative models.
With the help of AI systems that use machine learning and graph analysis, the Advanced Analytics and Technical Claims teams have developed a project to identify patterns of fraud in claims, initially in Motor and later in Home insurance. This approach achieves more efficient claims processing and fraud detection, a significant advancement in the fight against the annual economic losses that fraud causes in the industry.
The generative model used was CTGAN. When a Home insurance claim is filed, the fraud detection model assesses the likelihood of fraud and flags suspicious cases for further investigation.
For this task, MAPFRE Insurance uses synthetic data to train the AI model. This strategy helps overcome the imbalance and scarcity of historical fraudulent claims, improving the algorithm’s ability to identify fraud patterns more accurately. By generating a more balanced data set, the company makes its Home insurance fraud detection models much more precise.
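To make the balancing step concrete, here is a minimal sketch of topping up a scarce fraud class with synthetic rows. It does not implement CTGAN and is not MAPFRE's pipeline: as a loudly labeled stand-in, it fits an independent Gaussian to each numeric column of the fraud rows and samples from it, which ignores the correlations between columns that a model like CTGAN is designed to capture. The columns (claim amount, days since policy start) and all values are invented.

```python
import random
import statistics

def synthesize_minority(rows, n_new, seed=0):
    """Sample n_new synthetic rows from per-column Gaussians fitted to rows.

    Stand-in for a real generative model: columns are treated as independent.
    Requires at least two rows so a standard deviation can be estimated.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        tuple(rng.gauss(m, s) for m, s in zip(means, stdevs))
        for _ in range(n_new)
    ]

def balance(majority, minority, seed=0):
    """Return the minority rows plus enough synthetic rows to match the majority size."""
    n_missing = max(0, len(majority) - len(minority))
    if n_missing == 0:
        return list(minority)
    return list(minority) + synthesize_minority(minority, n_missing, seed)

# Hypothetical toy data: (claim_amount, days_since_policy_start).
legit_claims = [(1200.0, 400.0), (800.0, 650.0), (1500.0, 300.0),
                (950.0, 720.0), (1100.0, 510.0), (700.0, 880.0)]
fraud_claims = [(5200.0, 12.0), (6100.0, 25.0)]

balanced_fraud = balance(legit_claims, fraud_claims)
print(len(legit_claims), len(balanced_fraud))  # both classes now have the same size
```

A classifier trained on the legitimate rows plus the balanced fraud rows sees both classes equally often, which is the effect the synthetic data is meant to achieve.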
The adoption of generative models and synthetic data is transforming the business sector, especially when it comes to analyzing complex data and protecting sensitive information. This transformation is manifesting in improvements in efficiency, productivity, and the optimal use of available resources.
These innovative technologies make it possible to simulate realistic scenarios without compromising real data, allowing for more accurate decision-making and facilitating the development of products and services tailored to the specific needs of the market. By facilitating a more in-depth and bias-free data analysis, companies can anticipate market trends, optimize operations, and explore new growth opportunities.