GAN on Tabular Data (Example with Code)

AI Maverick
Feb 3, 2023


Generative Adversarial Network

A Generative Adversarial Network (GAN) is a deep learning architecture composed of two neural networks: a generator and a discriminator. The generator is trained to produce new data samples that resemble a given set of real data samples, while the discriminator is trained to distinguish the generated samples from the real ones. The two networks are trained in competition with each other: the generator tries to produce samples that are indistinguishable from the real samples, and the discriminator tries to correctly identify whether a sample is real or generated.
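For intuition, this competition is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014), where G is the generator, D is the discriminator, p_data is the distribution of real data, and p_z is the noise distribution:

\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

The discriminator tries to maximize this value by labeling real and generated samples correctly, while the generator tries to minimize it by fooling the discriminator.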

Run the Code online here

Here is how to implement a GAN using TensorFlow and generate fake tabular data from real data:

  • Define the generator model: The generator model is a neural network that takes a random noise vector as input and generates a sample of fake data. In this code, the generator model is defined using the Sequential class from TensorFlow and consists of two dense layers with ReLU activation.
  • Define the discriminator model: The discriminator model is another neural network that takes a sample of either real or generated data as input and outputs a scalar indicating the probability that the sample is real. In this code, the discriminator model is also defined using the Sequential class from TensorFlow and consists of two dense layers with ReLU activation, followed by a single dense layer with a sigmoid activation.
  • Define the combined model: The combined model is used to train the generator, and it consists of the generator and the discriminator connected together. The generator model is connected to the discriminator model by setting the discriminator to be non-trainable, which means that the gradients from the discriminator are not used to update its parameters during training.
  • Load the real data: The code loads the real data from a .csv file using Pandas and converts it to a Numpy array.
  • Train the GAN: The GAN is trained with a for-loop that updates the discriminator and the generator alternately. In each iteration, the discriminator is trained on a batch of real and generated data using discriminator.train_on_batch(X, labels), where X is the concatenation of real and generated samples and labels marks each sample as real (1) or generated (0). The generator is then updated through the combined model with gan.train_on_batch(noise, np.ones(self.data.shape[0])), where noise is a random noise matrix and the target vector of ones asks the discriminator to classify the generated samples as real. In this example, the batch is simply the full dataset.
  • Generate fake samples: After the GAN has been trained, fake samples are generated by feeding random noise into the generator model using generator.predict(noise); the usage sketch after the listing below shows this end to end.
import logging
import numpy as np
import pandas as pd
import tensorflow as tf

tf.get_logger().setLevel(logging.ERROR)


class Gan():

    def __init__(self, data):
        self.data = data
        self.n_epochs = 200

    # Generate random noise in a latent space
    def _noise(self):
        noise = np.random.normal(0, 1, self.data.shape)
        return noise

    def _generator(self):
        model = tf.keras.Sequential(name="Generator_model")
        model.add(tf.keras.layers.Dense(15, activation='relu',
                                        kernel_initializer='he_uniform',
                                        input_dim=self.data.shape[1]))
        model.add(tf.keras.layers.Dense(30, activation='relu'))
        model.add(tf.keras.layers.Dense(
            self.data.shape[1], activation='linear'))
        return model

    def _discriminator(self):
        model = tf.keras.Sequential(name="Discriminator_model")
        model.add(tf.keras.layers.Dense(25, activation='relu',
                                        kernel_initializer='he_uniform',
                                        input_dim=self.data.shape[1]))
        model.add(tf.keras.layers.Dense(50, activation='relu'))
        # sigmoid => probability that the sample is real rather than fake
        model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
        return model

    # Define the combined generator and discriminator model,
    # used only for updating the generator
    def _GAN(self, generator, discriminator):
        discriminator.trainable = False
        generator.trainable = True
        model = tf.keras.Sequential(name="GAN")
        model.add(generator)
        model.add(discriminator)
        model.compile(loss='binary_crossentropy', optimizer='adam')
        return model

    # Train the generator and discriminator, alternating between the two
    def train(self, generator, discriminator, gan):
        for epoch in range(self.n_epochs):
            # Train the discriminator on real rows (label 1) and
            # generated rows (label 0); here one "batch" is the full dataset
            generated_data = generator.predict(self._noise(), verbose=0)
            labels = np.concatenate([np.ones(self.data.shape[0]),
                                     np.zeros(self.data.shape[0])])
            X = np.concatenate([self.data, generated_data])
            discriminator.trainable = True
            d_loss, _ = discriminator.train_on_batch(X, labels)

            # Train the generator through the combined model, asking the
            # (frozen) discriminator to classify its output as real
            noise = self._noise()
            g_loss = gan.train_on_batch(noise, np.ones(self.data.shape[0]))

            print('>%d, d=%.3f, g=%.3f' % (epoch + 1, d_loss, g_loss))

        return generator
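The class above only defines the pieces; the steps described earlier (loading the CSV, building the models, training, and sampling) need a small driver. The sketch below, continuing from the listing above, is one way to wire everything together; the file name data.csv and the assumption that every column is numeric are placeholders for illustration, not part of the original code.

# Minimal usage sketch (assumes a numeric-only file named "data.csv")
data = pd.read_csv('data.csv').values.astype('float32')

gan_model = Gan(data)
generator = gan_model._generator()
discriminator = gan_model._discriminator()
gan = gan_model._GAN(generator, discriminator)

# Train the networks and keep the trained generator
trained_generator = gan_model.train(generator, discriminator, gan)

# Generate fake samples by feeding random noise into the generator
noise = np.random.normal(0, 1, data.shape)
fake_data = trained_generator.predict(noise, verbose=0)
print(pd.DataFrame(fake_data).head())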

Quality assessment

Quality assessment means evaluating how well the GAN is able to generate new data that is similar to the data it was trained on. Quality can be judged with various metrics, such as the similarity of the generated distributions to the real ones, the diversity of the generated rows, and the robustness of the generated data. The goal is to determine whether the GAN is able to generate high-quality synthetic data.
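One simple, assumed way to do this for tabular data is to compare each column of the real data with the corresponding column of the generated data, for example with a two-sample Kolmogorov–Smirnov test from SciPy. This only checks the marginal distributions, not the relationships between columns, but it gives a quick first signal:

# Sketch: per-column comparison of real vs. generated data with a KS test
from scipy.stats import ks_2samp

def column_similarity(real, fake, column_names=None):
    # A lower KS statistic (and a higher p-value) means the real and
    # generated values of that column are harder to tell apart
    for j in range(real.shape[1]):
        stat, p_value = ks_2samp(real[:, j], fake[:, j])
        name = column_names[j] if column_names is not None else 'col %d' % j
        print('%s: KS statistic=%.3f, p-value=%.3f' % (name, stat, p_value))

# Using the arrays from the usage sketch above
column_similarity(data, fake_data)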

Conclusion

Generative Adversarial Networks (GANs) are a powerful deep learning architecture for generating new data samples that resemble a given set of real data samples. A GAN consists of two neural networks, a generator and a discriminator, that are trained in an adversarial manner to produce realistic data. The code I provided above is a simple example of how to implement a GAN in TensorFlow and generate fake tabular data from real data.

https://samanemami.github.io/
