Reviewing a multi-output regression task with Python code

AI Maverick
3 min read · Apr 25, 2022


In this article, I study a multi-output regression dataset and train a suitable machine-learning model on it. I also review different metrics and visualization methods for assessing the model's performance.

Multi-output regression problems are tabular regression tasks with more than one continuous target. There are many example datasets in this field you can work on. One well-known multi-output regression dataset is the Airline Ticket Price dataset, ATP1d [1], whose data cover a sequence of several days. I think it is a good dataset to start practicing on.

You can download the ATP1d dataset here.

The focus of this study

So why did I start writing this study and the related notebook on Kaggle? I was looking for different metrics and evaluation approaches to measure the performance of multi-output models.

Of course, we have RMSE, R², and related metrics. But I was wondering whether there are other ways to visualize model performance.

So here I am, writing this up. Let's see which options we have for visualizing model performance.

To review the related Python notebook, see here.

Model

For this exercise, I used the scikit-learn (sklearn) package for model training. The model is a neural network with two hidden layers of 100 nodes each.
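A minimal sketch of such a model follows. It assumes scikit-learn's `MLPRegressor`, since the article does not name the class. As a stand-in for ATP1d, it uses random data from `make_regression` with the same shape. The `max_iter` and `random_state` values are my own assumptions, not settings from the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in with the same shape as ATP1d (assumption: not the real data)
X, y = make_regression(n_samples=337, n_features=411, n_targets=6,
                       noise=10.0, random_state=0)

# Two hidden layers with 100 nodes each; max_iter/random_state are assumed values
model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=500, random_state=0)
model.fit(X, y)  # y is 2-D (337, 6): one network predicts all targets together

y_pred = model.predict(X)
print(y_pred.shape)  # (337, 6)
```

`MLPRegressor` accepts a two-dimensional target array directly. A single network therefore predicts all six targets at once, rather than training one model per target.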

Dataset

The dataset I use is ATP1d, which concerns flight ticket prices (daily prices; the "1" refers to the one-day price). It has 337 instances, 411 features, and six regression targets, with no missing or null values.
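If the data is loaded into pandas, a few lines can confirm the dimensions and the absence of missing values. The sketch below uses a hypothetical stand-in generated with `make_regression`, with made-up column names (`x0`, ..., `target_0`, ...). With the real file, you would load the ATP1d features and targets instead.

```python
import pandas as pd
from sklearn.datasets import make_regression

# Hypothetical stand-in for ATP1d: random data with the same dimensions
X, y = make_regression(n_samples=337, n_features=411, n_targets=6, random_state=0)
features = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])
targets = pd.DataFrame(y, columns=[f"target_{j}" for j in range(y.shape[1])])

# Sanity checks matching the description: sizes and no missing/null values
n_instances, n_features = features.shape
n_targets = targets.shape[1]
n_missing = int(features.isnull().sum().sum() + targets.isnull().sum().sum())
print(n_instances, n_features, n_targets, n_missing)  # 337 411 6 0
```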

To split the data, I use KFold cross-validation with five splits to train the NN model.
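The cross-validation loop can be sketched as follows. The sketch uses the two-hidden-layer `MLPRegressor` configuration described above and random stand-in data. The article only states KFold with five splits, so three details are my assumptions: `shuffle=True`, the `random_state` values, and collecting out-of-fold predictions. The original notebook may instead score each fold separately.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

# Random stand-in data shaped like ATP1d (assumption)
X, y = make_regression(n_samples=337, n_features=411, n_targets=6,
                       noise=10.0, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # shuffle/seed are assumptions
y_pred = np.zeros_like(y)  # out-of-fold predictions, filled fold by fold
fold_sizes = []
for train_idx, test_idx in kf.split(X):
    # A fresh model per fold, so each test fold is unseen during training
    model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=500, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    y_pred[test_idx] = model.predict(X[test_idx])
    fold_sizes.append(len(test_idx))

print(fold_sizes)  # [68, 68, 67, 67, 67]
```

With 337 instances and five splits, each test fold holds 67 or 68 instances. Every instance is predicted exactly once, by a model that did not see it during training.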

Metrics

To evaluate the model's performance, I use the root mean square error (RMSE) and the R² score. The RMSE is calculated for each target separately. Since this is a multi-output task, we also need a single number that summarizes performance over all targets. For this, I use the R² score averaged uniformly over all targets. It returns one value for the whole model: at most 1, where 1 means perfect predictions.

RMSE for each of the six targets:

```
array([ 54.35497133, 108.18074775,  82.64076203,  67.5315071 ,
        59.47827072,  70.53822332])
```
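Both metrics are available in `sklearn.metrics`. The sketch below uses small hypothetical arrays (not ATP1d values) so the numbers can be checked by hand. Setting `multioutput="raw_values"` gives one error per target, while `multioutput="uniform_average"` averages R² with equal weight across targets.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values: 4 instances, 2 targets (hypothetical, not from ATP1d)
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 30.0], [5.0, 40.0]])

# RMSE for each target separately: square root of the per-column MSE
rmse_per_target = np.sqrt(mean_squared_error(y_true, y_pred, multioutput="raw_values"))
print(rmse_per_target)  # approx [0.5, 1.414]

# R² per target is 0.8 and 0.984; the uniform average gives one overall score
r2_avg = r2_score(y_true, y_pred, multioutput="uniform_average")
print(r2_avg)  # approx 0.892
```

Taking `np.sqrt` of the raw MSE values avoids relying on the `squared` argument of `mean_squared_error`. Newer scikit-learn versions deprecated that argument in favor of a separate `root_mean_squared_error` function.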

Performance visualization

One idea is a scatter plot of the instances using the first two targets as axes, showing both the real and the predicted values. This illustrates the multi-output model's performance and lets us compare how well it does on different targets.
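A sketch of this plot with matplotlib follows. It uses random stand-in values for `y_true` and `y_pred`. In practice, these would be the real targets and the model's predictions, such as out-of-fold predictions from cross-validation. The marker styles, labels, and file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in values; replace with the real targets and predictions
rng = np.random.default_rng(0)
y_true = rng.normal(size=(337, 6))
y_pred = y_true + rng.normal(scale=0.3, size=y_true.shape)

fig, ax = plt.subplots(figsize=(6, 6))
# Each point is one instance, positioned by its values for the first two targets
ax.scatter(y_true[:, 0], y_true[:, 1], s=12, alpha=0.6, label="Real")
ax.scatter(y_pred[:, 0], y_pred[:, 1], s=12, alpha=0.6, marker="x", label="Predicted")
ax.set_xlabel("Target 1")
ax.set_ylabel("Target 2")
ax.set_title("Real vs. predicted values, first two targets")
ax.legend()
fig.savefig("first_two_targets.png", dpi=120)
```

When the two point clouds overlap closely, the model is capturing the joint behavior of these two targets. Repeating the plot for other pairs of targets is one way to compare them.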

References

[1] Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W. et al. Multi-target regression via input space expansion: treating targets as inputs. Mach Learn 104, 55–98 (2016).
