
A Synthetic Data Generation System based on the Variational-Autoencoder Technique and the Linked Data Paradigm

Files
Article (504.1Kb)
Identifiers
URI: https://hdl.handle.net/20.500.12761/1828
ISSN: 2192-6352
DOI: 10.1007/s13748-024-00328-x
Author(s)
Dos Santos, Ricardo; Aguilar, Jose
Date
2024-06-30
Abstract
Synthetic data generation has become popular, whether to create data for specific contexts or to study unknown scenarios, among other reasons. Synthetic data is also a critical component in training machine learning models when little data is available. This work proposes a Synthetic Data Generation System (SDGS) architecture that allows synthetic data generation to be fully automated. SDGS is based on the Variational AutoEncoder (VAE) learning technique and has three main capabilities. The first is the ability to extract data samples from multiple sources using the Linked Data (LD) paradigm. The second is the ability to merge datasets to increase the amount of information that can be provided to the VAE-based synthetic data generator. The third is a Feature Engineering layer that creates new features by generating or extracting information from the dataset and then selects the features that provide the best information for the VAE model. A case study is described in detail to show the new functionalities of the SDGS, such as dataset extraction from different sources using LD, dataset merging using pivots, and the application of different feature engineering methods. Finally, two metrics are used to evaluate the quality of the generated datasets in different case studies. The first is accuracy, used to analyze the performance of the models generated with the new SDGS functionalities, with results above 90%. The second is the two-sample Hotelling's T-squared test, used to determine the quality of the synthetic data generated by the system; by this test, the synthetic datasets are very similar to the original datasets.
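
To illustrate the second metric, the following is a minimal sketch of a two-sample Hotelling's T-squared statistic in Python with NumPy. It is not the authors' implementation: the function name `hotelling_t2`, the pooled-covariance assumption, and the toy `real` and `synthetic` arrays are all illustrative assumptions. The statistic is converted to an F statistic with degrees of freedom (p, n1 + n2 - p - 1), where p is the number of features.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling's T-squared statistic (hypothetical helper).

    x, y: arrays of shape (n_samples, n_features).
    Assumes both groups share a common covariance matrix (pooled estimate).
    Returns (t2, f_stat, df1, df2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, p = x.shape
    n2 = y.shape[0]

    # Difference between the two sample mean vectors.
    diff = x.mean(axis=0) - y.mean(axis=0)

    # Pooled sample covariance, weighted by each group's degrees of freedom.
    s_pooled = (
        (n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False)
    ) / (n1 + n2 - 2)
    s_pooled = np.atleast_2d(s_pooled)

    # T^2 = (n1 * n2 / (n1 + n2)) * diff' S^-1 diff, using solve() instead of inv().
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)

    # Under the null hypothesis of equal means, f_stat follows F(df1, df2).
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = t2 * df2 / (p * (n1 + n2 - 2))
    return float(t2), float(f_stat), df1, df2


# Toy usage: "real" and "synthetic" are made-up samples, not data from the paper.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 3))
synthetic = rng.normal(size=(200, 3))
t2, f_stat, df1, df2 = hotelling_t2(real, synthetic)
```

A p-value can then be read from the upper tail of the F distribution with (df1, df2) degrees of freedom, for example with `scipy.stats.f.sf(f_stat, df1, df2)`. A large p-value means the test finds no evidence that the mean vectors of the two datasets differ.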