My experiences with date and time in R

AI Maverick
Jan 30, 2022

In this post, I share some of my experiences in R with extracting date and time components and visualizing time series.

Household Electric Power Consumption dataset

While working with dates and times in R, including with related packages such as lubridate, I ran into problems extracting the year, month, and day: the functions kept returning NA or wrong values for the year or month. I managed to solve these problems one by one. Below, I describe the methods and code, and illustrate them with examples from the exploratory data analysis (EDA).

At this link, you can find the history of my attempts and the final results in an R Markdown script.

Date and Time

The dataset I worked with is the Household Electric Power Consumption dataset, available from the UCI Machine Learning Repository.
After loading the data frame, I noticed that Date and Time were stored in two separate columns, both of type character ("chr").
The first thing I did was concatenate the date and time into a new feature, datetime, using paste():

dataFrame$datetime <- paste(dataFrame$Date,dataFrame$Time)

Next, I converted datetime from character to POSIXct:

dataFrame$datetime <- as.POSIXct(dataFrame$datetime, format = "%d/%m/%Y %H:%M:%S")

You may also set a time zone for this new feature through the tz argument of as.POSIXct().
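A minimal, self-contained sketch of the two steps above, assuming a toy data frame whose two rows are illustrative but follow the same character layout as the Date and Time columns (day/month/year, hours:minutes:seconds). It also shows how a format string that does not match the data produces either NA or a silently wrong date, which is one plausible source of the problems described earlier.

```r
# Toy data in the same character layout as the Date and Time columns (illustrative rows)
dataFrame <- data.frame(
  Date = c("16/12/2006", "10/01/2008"),
  Time = c("17:24:00", "08:05:00"),
  stringsAsFactors = FALSE
)

# Step 1: combine date and time into one character column, e.g. "16/12/2006 17:24:00"
dataFrame$datetime <- paste(dataFrame$Date, dataFrame$Time)

# Step 2: parse with a format that matches day/month/year; tz pins the time zone
dataFrame$datetime <- as.POSIXct(dataFrame$datetime,
                                 format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
print(dataFrame$datetime)

# Mismatched format (month/day/year): 16 is not a valid month, so the result is NA
wrong <- as.POSIXct("16/12/2006 17:24:00", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
print(wrong)

# Worse: when the day is <= 12, the same mistake parses silently to the wrong month
swapped <- as.POSIXct("10/01/2008 08:05:00", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
print(swapped)  # October 1, not January 10
```

The NA case is easy to spot; the swapped case is not, so it is worth checking a few parsed values against the raw strings.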

Challenges with extracting year and day

I used lubridate to extract the year, week, and day with the following code:

library(lubridate)
dataFrame$year <- year(dataFrame$datetime)
dataFrame$week <- week(dataFrame$datetime)
dataFrame$day <- day(dataFrame$datetime)
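When lubridate's accessors return NA, the usual culprit is the column itself: if datetime is still character, or was parsed with the wrong format, every extracted component inherits the problem. One way to check, sketched below in base R only with illustrative timestamps, is to confirm the column class and compute the same components with format(). The week formula follows lubridate's documented definition of week() (complete seven-day periods since January 1, plus one); treat this as a cross-check, not a replacement for the package.

```r
# Illustrative timestamps, already parsed to POSIXct
x <- as.POSIXct(c("2006-12-16 17:24:00", "2008-01-10 08:05:00"), tz = "UTC")

# Components only make sense on a parsed column; a character column is a common NA source
stopifnot(inherits(x, "POSIXct"), !anyNA(x))

parts <- data.frame(
  datetime = x,
  year  = as.integer(format(x, "%Y")),
  month = as.integer(format(x, "%m")),
  day   = as.integer(format(x, "%d"))
)

# Day of year (%j), then the week() rule: complete 7-day periods since Jan 1, plus one
parts$week <- (as.integer(format(x, "%j")) - 1) %/% 7 + 1
print(parts)
```

If lubridate's year(), week(), or day() disagree with these columns on the same input, the parsing step is the first place to look.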

Because this dataset has 2,075,259 instances, plots of its attributes, such as scatter plots, are hard to read unless we first reduce the data by filtering or sampling. A smaller subset makes each plot much easier to interpret. For instance, the following figure shows the sub_metering1 data filtered to the year 2006.
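A rough sketch of both reduction options, filtering to one year and drawing a random sample of rows, assuming datetime is already POSIXct. The synthetic minute-level data stand in for the real file, the column name Sub_metering_1 is assumed to match the UCI header, and the sample size of 1,000 is arbitrary.

```r
set.seed(42)

# Hypothetical minute-level data: 3 days spanning the 2006/2007 year boundary
datetime <- seq(as.POSIXct("2006-12-30 00:00:00", tz = "UTC"),
                by = "min", length.out = 3 * 24 * 60)
dataFrame <- data.frame(datetime = datetime,
                        Sub_metering_1 = rpois(length(datetime), 1))

# Option 1: keep only the rows from one year
year2006 <- dataFrame[format(dataFrame$datetime, "%Y") == "2006", ]

# Option 2: random sample of rows, re-sorted by time so the plot reads left to right
idx <- sort(sample(nrow(dataFrame), 1000))
sampled <- dataFrame[idx, ]

plot(sampled$datetime, sampled$Sub_metering_1, pch = ".",
     xlab = "Time", ylab = "Sub_metering_1")
```

Filtering keeps every observation in a period, which suits close inspection; sampling keeps the overall shape of the whole series with far fewer points.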

sub_metering1 feature for year 2006

You can also subset the data and use plot_ly() from the plotly package to create interactive plots over different periods for important features such as the sub_metering variables. The following graph shows sub_metering1 for a single day: January 10, 2008 (year 2008, month 1, day 10).
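A sketch of the single-day subset behind such a graph, assuming datetime is already POSIXct and the column is named Sub_metering_1 (the data are synthetic). The filter uses base R's POSIXlt fields; the plot_ly() call is wrapped in requireNamespace() so the filtering part still runs where plotly is not installed.

```r
set.seed(7)

# Hypothetical data: two days of minute-level readings
datetime <- seq(as.POSIXct("2008-01-10 00:00:00", tz = "UTC"),
                by = "min", length.out = 2 * 24 * 60)
dataFrame <- data.frame(datetime = datetime,
                        Sub_metering_1 = rpois(length(datetime), 1))

# POSIXlt fields: year counts from 1900, mon is 0-based, mday is the day of month
lt <- as.POSIXlt(dataFrame$datetime)
day_data <- dataFrame[lt$year + 1900 == 2008 & lt$mon + 1 == 1 & lt$mday == 10, ]

if (requireNamespace("plotly", quietly = TRUE)) {
  # Interactive line plot of one day; type `p` at the console to display it
  p <- plotly::plot_ly(day_data, x = ~datetime, y = ~Sub_metering_1,
                       type = "scatter", mode = "lines")
}
```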

Reducing the observations to minutes for a better look at the plot

The main lesson I took from the previous plot was that I should compare sub_metering1 with the other two sub_metering features and perhaps narrow the observations down to a range of minutes. This way, I can see the power consumption in different areas of the house. First, I extracted the minutes from the datetime feature:

dataFrame$minute <- minute(dataFrame$datetime)

Then I filtered the data for a specific year, month, day, and range of minutes.
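A sketch of the minute-level filter and a three-way comparison of the sub-meters. It assumes base R's format(x, "%M") in place of lubridate's minute() (both give the minute of the hour), the UCI column names Sub_metering_1 to Sub_metering_3, and synthetic data for one illustrative day. Because the minute value repeats every hour, the sketch also fixes the hour; the 18:00 to 18:29 window is an arbitrary choice.

```r
set.seed(1)

# Hypothetical minute-level readings for one day, three sub-meters
datetime <- seq(as.POSIXct("2009-02-01 00:00:00", tz = "UTC"),
                by = "min", length.out = 24 * 60)
n <- length(datetime)
dataFrame <- data.frame(datetime = datetime,
                        Sub_metering_1 = rpois(n, 1),
                        Sub_metering_2 = rpois(n, 2),
                        Sub_metering_3 = rpois(n, 6))

# Minute and hour as integers (base-R counterparts of lubridate's minute() and hour())
dataFrame$minute <- as.integer(format(dataFrame$datetime, "%M"))
dataFrame$hour   <- as.integer(format(dataFrame$datetime, "%H"))

# Minutes repeat every hour, so pin the hour too: 18:00 to 18:29
evening <- subset(dataFrame, hour == 18 & minute < 30)

# Overlay the three sub-meters on the same minute axis
meters <- c("Sub_metering_1", "Sub_metering_2", "Sub_metering_3")
matplot(evening$minute, as.matrix(evening[, meters]), type = "l", lty = 1,
        col = 1:3, xlab = "Minute past 18:00", ylab = "Sub-metering")
legend("topright", legend = meters, col = 1:3, lty = 1)
```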

Power consumption 2009

I have added all the related code to this notebook.

I will discuss other aspects of this experience in future posts.
