Home Data Visualization: Using R to compute monthly mean average for irregular time series

The irregular time series data comes in a formatted csv file with the following columns and data:

"Datum";"Zeit";"Stand HT";"Stand NT";"Gesamt kWh";"ø Woche kWh";"ø Tag kWh";"Level"
"11.10.14";"16:52";"15165,00";"15603,00";"57,0";"64,2";"9,2";"B"
"05.10.14";"11:40";"15140,00";"15571,00";"40,0";"60,5";"8,6";"B"
"30.09.14";"20:41";"15124,00";"15547,00";"93,0";"59,3";"8,5";"B"
"19.09.14";"21:22";"15085,00";"15493,00";"44,0";"76,3";"10,9";"C"

The information in the columns contains the following data:
Datum: Date
Zeit: Time
Stand HT: High tariff electricity meter reading
Stand NT: Low tariff electricity meter reading
Gesamt kWh: electricity consumption since last reading
ø Woche kWh: Average weekly electricity consumption
ø Tag kWh: Average daily electricity consumption
Level: Efficiency label

R's library zoo provides the infrastructure for my irregular time series. Lubridate makes it easier to work with date and time. Read.zoo gives me a couple of options to parse the csv file and to use the first (date) and the second column (time) as one single index:

 library(zoo)  
 library(lubridate)  
 f <- function(d, t) parse_date_time(paste(d, t), "%d.%m.%Y %H:%M")  
 energy <- read.zoo("https://dl.dropboxusercontent.com/u/23293644/export-12-10-2014-20-45.csv",   
           header = TRUE, sep=";", dec=",",   
           index = 1:2,  
                          FUN = function(d, t) parse_date_time(paste(d, t, sep = " "), "dmy hM")  
                          )

The following command gives the subsets of the data we are interested in for further processing (High and low tariff electricity meter reading):

 energy_subset <- subset(energy, select = c(Stand.HT, Stand.NT))

As can be seen here:

1:  > energy  
2:            Stand.HT Stand.NT Gesamt.kWh ø.Woche.kWh ø.Tag.kWh Level  
3:  2012-07-11 21:47:00 11084  11202  -     -      -     -    
4:  2013-01-27 16:17:00 12213  12343  2270,0   79,5    11,4   C    
5:  2013-02-02 14:58:00 12266  12380  90,0    106,0    15,1   D    
6:  2013-02-17 14:04:00 12355  12466  175,0   81,9    11,7   B    
7:  2013-02-24 12:11:00 12421  12504  104,0   105,2    15,0   D    
8:  2013-03-01 08:18:00 12460  12532  67,0    96,9    13,8   C    
9:    
10:    
11:  > energy_subset  
12:            Stand.HT Stand.NT  
13:  2012-07-11 21:47:00 11084  11202    
14:  2013-01-27 16:17:00 12213  12343    
15:  2013-02-02 14:58:00 12266  12380    
16:  2013-02-17 14:04:00 12355  12466    
17:  2013-02-24 12:11:00 12421  12504    
18:  2013-03-01 08:18:00 12460  12532

Now a monthly grid has to be created which starts at least a month before the first entry and ends at least 30 days after the last entry with the following commands:

monthly_grid_start <- as.Date(start(energy_subset) - days(16) - days(day(start(energy_subset) - 1))) + hours (1)  
monthly_grid_end  <- as.Date(end(energy_subset) + days(30)) + hours (1)  
monthly_grid    <- seq(monthly_grid_start, monthly_grid_end, by="1 month")

The monthly_grid is in the of the month (to have the average over the month) - at night when nobody ready a sample (I had some issue when the grid and a sample is at the same time):

 [1] "2012-06-14 01:00:00 UTC" "2012-07-14 01:00:00 UTC" "2012-08-14 01:00:00 UTC"  
  [4] "2012-09-14 01:00:00 UTC" "2012-10-14 01:00:00 UTC" "2012-11-14 01:00:00 UTC"  
  [7] "2012-12-14 01:00:00 UTC" "2013-01-14 01:00:00 UTC" "2013-02-14 01:00:00 UTC"  
 [10] "2013-03-14 01:00:00 UTC" "2013-04-14 01:00:00 UTC" "2013-05-14 01:00:00 UTC"  
 [13] "2013-06-14 01:00:00 UTC" "2013-07-14 01:00:00 UTC" "2013-08-14 01:00:00 UTC"  
 [16] "2013-09-14 01:00:00 UTC" "2013-10-14 01:00:00 UTC" "2013-11-14 01:00:00 UTC"  
 [19] "2013-12-14 01:00:00 UTC" "2014-01-14 01:00:00 UTC" "2014-02-14 01:00:00 UTC"  
 [22] "2014-03-14 01:00:00 UTC" "2014-04-14 01:00:00 UTC" "2014-05-14 01:00:00 UTC"  
 [25] "2014-06-14 01:00:00 UTC" "2014-07-14 01:00:00 UTC" "2014-08-14 01:00:00 UTC"  
 [28] "2014-09-14 01:00:00 UTC" "2014-10-14 01:00:00 UTC"

The grid is then used to approximate the energy_subset

 > monthly_energy <- na.approx(merge(xts(,monthly_grid),energy_subset))[monthly_grid]  
 > monthly_energy  
           Stand.HT Stand.NT  
 2012-07-14 01:00:00 11096.06 11214.19  
 2012-08-14 01:00:00 11271.26 11391.25  
 2012-09-14 01:00:00 11446.45 11568.30  
 2012-10-14 01:00:00 11616.00 11739.65  
 2012-11-14 01:00:00 11791.19 11916.71  
 2012-12-14 01:00:00 11960.74 12088.05  
 2013-01-14 01:00:00 12135.93 12265.11  
 2013-02-14 01:00:00 12333.92 12445.63  
 2013-03-14 01:00:00 12549.50 12602.58  
 2013-04-14 01:00:00 12757.15 12784.74  
 2013-05-14 01:00:00 12917.62 12958.23  
 2013-06-14 01:00:00 13075.25 13123.25

Then a new column "Total" is added which sums up the Stand.HT and Stand.NT

 > monthly_energy <- transform(monthly_energy, Total = Stand.HT + Stand.NT)   
 > monthly_energy  
           Stand.HT Stand.NT  Total  
 2012-07-14 01:00:00 11096.06 11214.19 22310.25  
 2012-08-14 01:00:00 11271.26 11391.25 22662.50  
 2012-09-14 01:00:00 11446.45 11568.30 23014.76  
 2012-10-14 01:00:00 11616.00 11739.65 23355.65  
 2012-11-14 01:00:00 11791.19 11916.71 23707.90

me computes the difference between two subsequent readings - so we get the usage. me_lag is the same as me - but lagged by 12 month (1 year) - so can compare against the previous year. me_save computes the relative savings against the previous year m is the merge of all this data

 me   <- diff(monthly_energy)  
 me_lag <- lag(me, -12)  
 me_save <- ((me - me_lag) / me) * 100  
 m <- zoo(merge(me, me_lag, me_save))

the following command writes it to a *.csv file (in case somebody needs to analyze the data with Excel


 write.zoo(m, file = "c:/temp/test.csv", index.name = "Index", col.names = TRUE, na="", quot = TRUE, sep=";", dec=",")

 # ?plot.zoo  
 my.panel <- function(x, y, ..., pf = parent.frame()) {  
   fmt <- "%b-%y" # format for axis labels  
   lines(x, y, ...)  
   # if bottom panel  
   if (with(pf, length(panel.number) == 0 ||   
     panel.number %% nr == 0 || panel.number == nser)) {   
    # create ticks at x values and then label every second tick  
    axis(side = 1, at = x, labels = FALSE)  
    ix <- seq(1, length(x), 2)  
    labs <- format(x, fmt)  
    axis(side = 1, at = x[ix], labels = labs[ix], tcl = -0.7, cex.axis = 0.7)  
   }  
 }  
 plot(m, panel = my.panel, xaxt = "n", screens=c(1,2,3,1,2,3), lty=c(1,1,1,3,3,3), lwd= c(2,2,2, 0.5, 0.5, 0.5), xlab= "Date", main = "[kWh/month]", ylab=colnames(m))

creates the following plot which compars the total and the two tarrifs against the values from the previous years:

The following code

 my.panel2 <- function(x, y, ..., pf = parent.frame()) {  
   fmt <- "%b-%y" # format for axis labels  
   lines(x, y, ...)  
   # if bottom panel  
    # create ticks at x values and then label every second tick  
    axis(side = 1, at = x, labels = FALSE)  
    ix <- seq(1, length(x), 2)  
    labs <- format(x, fmt)  
    axis(side = 1, at = x[ix], labels = labs[ix], tcl = -0.7, cex.axis = 0.7)  
 }  
 plot (me_save$Total, panel = my.panel2, xaxt = "n", xlab= "Date", main = "Energy saving [%] compared with previous year", ylab="Total")

shows the energy savings compared to the previous year in a simpler diagram:

Home Data Visualization

Monday, October 13, 2014

Using R to compute monthly mean average for irregular time series

No comments:

Post a Comment

Translate

Popular Posts