R: Fórmula para la media móvil de datos de panel

CorePress2024-01-25  10

Tengo un marco de datos con datos de panel.

Un ejemplo:

date    code_ID name_ID new_value
2021-03-10T17:00:00 13  Alpha   372
2021-03-11T17:00:00 13  Alpha   608
2021-03-12T17:00:00 13  Alpha   515
2021-03-13T17:00:00 13  Alpha   320
2021-03-14T17:00:00 13  Alpha   323
2021-03-15T17:00:00 13  Alpha   329
2021-03-16T17:00:00 13  Alpha   212
2021-03-17T17:00:00 13  Alpha   304
2021-03-18T17:00:00 13  Alpha   462
2021-03-10T17:00:00 17  Beta    115
2021-03-11T17:00:00 17  Beta    151
2021-03-12T17:00:00 17  Beta    141
2021-03-13T17:00:00 17  Beta    137
2021-03-14T17:00:00 17  Beta    106
2021-03-15T17:00:00 17  Beta    67
2021-03-16T17:00:00 17  Beta    166
2021-03-17T17:00:00 17  Beta    126
2021-03-18T17:00:00 17  Beta    179
2021-03-10T17:00:00 8   eta-firm    2155
2021-03-11T17:00:00 8   eta-firm    2845
2021-03-12T17:00:00 8   eta-firm    3477
2021-03-13T17:00:00 8   eta-firm    2950
2021-03-14T17:00:00 8   eta-firm    3023
2021-03-15T17:00:00 8   eta-firm    2822
2021-03-16T17:00:00 8   eta-firm    2184
2021-03-17T17:00:00 8   eta-firm    2026
2021-03-18T17:00:00 8   eta-firm    2531
2021-03-10T17:00:00 6   phi hotel   866
2021-03-11T17:00:00 6   phi hotel   991
2021-03-12T17:00:00 6   phi hotel   971
2021-03-13T17:00:00 6   phi hotel   953
2021-03-14T17:00:00 6   phi hotel   604
2021-03-15T17:00:00 6   phi hotel   398
2021-03-16T17:00:00 6   phi hotel   672
2021-03-17T17:00:00 6   phi hotel   986
2021-03-18T17:00:00 6   phi hotel   1058

¿Cómo hago una fórmula que calcule la media móvil, por code_ID y fecha, posiblemente usando el paquete base de R?

La fórmula es (formato látex):

 {\hat{y}_{t}} = \frac{y_{t-6} + y_{t-5} + y_{t-4} + y_{t-3} + y_{t-2} + y_{t-1} + y_{t}}{7}


------------------------------------

¿Qué tal esto?

dat <- tibble::tribble(~date,    ~code_ID, ~name_ID, ~new_value,
"2021-03-10 17:00:00",  13,   "Alpha",   372, 
"2021-03-11 17:00:00",  13,   "Alpha",   608, 
"2021-03-12 17:00:00",  13,   "Alpha",   515, 
"2021-03-13 17:00:00",  13,   "Alpha",   320, 
"2021-03-14 17:00:00",  13,   "Alpha",   323, 
"2021-03-15 17:00:00",  13,   "Alpha",   329, 
"2021-03-16 17:00:00",  13,   "Alpha",   212, 
"2021-03-17 17:00:00",  13,   "Alpha",   304, 
"2021-03-18 17:00:00",  13,   "Alpha",   462, 
"2021-03-10 17:00:00",  17,   "Beta",    115, 
"2021-03-11 17:00:00",  17,   "Beta",    151, 
"2021-03-12 17:00:00",  17,   "Beta",    141, 
"2021-03-13 17:00:00",  17,   "Beta",    137, 
"2021-03-14 17:00:00",  17,   "Beta",    106, 
"2021-03-15 17:00:00",  17,   "Beta",    67, 
"2021-03-16 17:00:00",  17,   "Beta",    166, 
"2021-03-17 17:00:00",  17,   "Beta",    126, 
"2021-03-18 17:00:00",  17,   "Beta",    179, 
"2021-03-10 17:00:00",  8 ,   "eta-firm",    2155, 
"2021-03-11 17:00:00",  8 ,   "eta-firm",    2845, 
"2021-03-12 17:00:00",  8 ,   "eta-firm",    3477, 
"2021-03-13 17:00:00",  8 ,   "eta-firm",    2950, 
"2021-03-14 17:00:00",  8 ,   "eta-firm",    3023, 
"2021-03-15 17:00:00",  8 ,   "eta-firm",    2822, 
"2021-03-16 17:00:00",  8 ,   "eta-firm",    2184, 
"2021-03-17 17:00:00",  8 ,   "eta-firm",    2026, 
"2021-03-18 17:00:00",  8 ,   "eta-firm",    2531, 
"2021-03-10 17:00:00",  6 ,   "phi hotel",   866, 
"2021-03-11 17:00:00",  6 ,   "phi hotel",   991, 
"2021-03-12 17:00:00",  6 ,   "phi hotel",   971, 
"2021-03-13 17:00:00",  6 ,   "phi hotel",   953, 
"2021-03-14 17:00:00",  6 ,   "phi hotel",   604, 
"2021-03-15 17:00:00",  6 ,   "phi hotel",   398, 
"2021-03-16 17:00:00",  6 ,   "phi hotel",   672, 
"2021-03-17 17:00:00",  6 ,   "phi hotel",   986, 
"2021-03-18 17:00:00",  6 ,   "phi hotel",   1058)

dat$date <- anytime::anytime(dat$date)
id <- dat$code_ID
s <- split(dat, id)
l <- lapply(s, function(x)cbind(x, ma=rowMeans(sapply(0:6, function(t)lag(x$new_value, t)))))
out <- do.call(bind_rows, l)
out
#                   date code_ID   name_ID new_value        ma
# 1  2021-03-10 17:00:00       6 phi hotel       866        NA
# 2  2021-03-11 17:00:00       6 phi hotel       991        NA
# 3  2021-03-12 17:00:00       6 phi hotel       971        NA
# 4  2021-03-13 17:00:00       6 phi hotel       953        NA
# 5  2021-03-14 17:00:00       6 phi hotel       604        NA
# 6  2021-03-15 17:00:00       6 phi hotel       398        NA
# 7  2021-03-16 17:00:00       6 phi hotel       672  779.2857
# 8  2021-03-17 17:00:00       6 phi hotel       986  796.4286
# 9  2021-03-18 17:00:00       6 phi hotel      1058  806.0000
# 10 2021-03-10 17:00:00       8  eta-firm      2155        NA
# 11 2021-03-11 17:00:00       8  eta-firm      2845        NA
# 12 2021-03-12 17:00:00       8  eta-firm      3477        NA
# 13 2021-03-13 17:00:00       8  eta-firm      2950        NA
# 14 2021-03-14 17:00:00       8  eta-firm      3023        NA
# 15 2021-03-15 17:00:00       8  eta-firm      2822        NA
# 16 2021-03-16 17:00:00       8  eta-firm      2184 2779.4286
# 17 2021-03-17 17:00:00       8  eta-firm      2026 2761.0000
# 18 2021-03-18 17:00:00       8  eta-firm      2531 2716.1429
# 19 2021-03-10 17:00:00      13     Alpha       372        NA
# 20 2021-03-11 17:00:00      13     Alpha       608        NA
# 21 2021-03-12 17:00:00      13     Alpha       515        NA
# 22 2021-03-13 17:00:00      13     Alpha       320        NA
# 23 2021-03-14 17:00:00      13     Alpha       323        NA
# 24 2021-03-15 17:00:00      13     Alpha       329        NA
# 25 2021-03-16 17:00:00      13     Alpha       212  382.7143
# 26 2021-03-17 17:00:00      13     Alpha       304  373.0000
# 27 2021-03-18 17:00:00      13     Alpha       462  352.1429
# 28 2021-03-10 17:00:00      17      Beta       115        NA
# 29 2021-03-11 17:00:00      17      Beta       151        NA
# 30 2021-03-12 17:00:00      17      Beta       141        NA
# 31 2021-03-13 17:00:00      17      Beta       137        NA
# 32 2021-03-14 17:00:00      17      Beta       106        NA
# 33 2021-03-15 17:00:00      17      Beta        67        NA
# 34 2021-03-16 17:00:00      17      Beta       166  126.1429
# 35 2021-03-17 17:00:00      17      Beta       126  127.7143
# 36 2021-03-18 17:00:00      17      Beta       179  131.7143

Lo anterior es la solución base R. Si estuviera dispuesto a utilizar dplyr y zoo, podría hacerlo de la siguiente manera:

dat %>% 
  group_by(code_ID) %>% 
  mutate(ma = zoo::rollmean(new_value, k=7, fill=NA, align="right"))



------------------------------------

Es un poco más fácil si está dispuesto a usar paquetes, pero como la pregunta solicita la base R, usando solo eso tenemos lo siguiente. ave aplica roll a new_value mediante code_ID y roll se implementa tomando los medios de fila de incrustar o usando un filtro o una combinación de cumsum y diff

roll <- function(x, n = 7) c(rep(NA, n-1), rowMeans(embed(x, n)))
dat2 <- transform(dat, mean7 = ave(new_value, code_ID, FUN = roll))

o utilice una de estas alternativas para rodar:

roll2 <- function(x, n = 7) stats::filter(x, rep(1, n) / n, sides = 1)

roll3 <- function(x, n = 7) c(rep(NA, n-1), diff(cumsum(c(0, x)), n)/n)

roll4 <- function(x, n = 7) c(rep(NA, n-1), apply(embed(x, n), 1, mean))

4

Gracias. Otra pregunta: ¿Cómo puedo calcular este tiempo (nuevamente por code_ID), ((media7 del día t / media7 del día t-7) -1)?

- Último nacido

28/03/2021 a las 18:41

Si dat2 es como en la respuesta, entonces div <- function(x, n = 7) x / c(rep(NA, n), head(x, -n)); transformar(dat2, div7 = ave(mean7, code_ID, FUN = div))

-G. Grothendieck

29/03/2021 a las 0:10

Gracias de nuevo. En general, cuando los datos son de panel, ¿es posible insertarlos?directamente una fórmula (como la que publiqué) usando code_ID (y tiempo), para poder hacer algún cálculo.

- Último nacido

29 de marzo de 2021 a las 8:08

Se ha agregado roll4. En ese sentido, un medio puede ser reemplazado por cualquier otra función similar.

-G. Grothendieck

29/03/2021 a las 12:40



------------------------------------

Aquí hay dos formas de utilizar la función rollmeanr del paquete zoo.

El primero no asigna la salida a una nueva columna,el segundo sí.

library(zoo)

by(df1$new_value, df1$code_ID, function(x)
  rollmeanr(x, k = 7, fill = NA)
)

df1$mean6 <- with(df1, ave(new_value, code_ID, FUN = function(x) rollmeanr(x, k = 7, fill = NA)))
head(df1, 10)
#                  date code_ID name_ID new_value    mean6
#1  2021-03-10 17:00:00      13   Alpha       372       NA
#2  2021-03-11 17:00:00      13   Alpha       608       NA
#3  2021-03-12 17:00:00      13   Alpha       515       NA
#4  2021-03-13 17:00:00      13   Alpha       320       NA
#5  2021-03-14 17:00:00      13   Alpha       323       NA
#6  2021-03-15 17:00:00      13   Alpha       329       NA
#7  2021-03-16 17:00:00      13   Alpha       212 382.7143
#8  2021-03-17 17:00:00      13   Alpha       304 373.0000
#9  2021-03-18 17:00:00      13   Alpha       462 352.1429
#10 2021-03-10 17:00:00      17    Beta       115       NA
Datos en formato dput.
df1 <-
structure(list(date = structure(c(1615395600, 1615482000, 1615568400, 
1615654800, 1615741200, 1615827600, 1615914000, 1616000400, 1616086800, 
1615395600, 1615482000, 1615568400, 1615654800, 1615741200, 1615827600, 
1615914000, 1616000400, 1616086800, 1615395600, 1615482000, 1615568400, 
1615654800, 1615741200, 1615827600, 1615914000, 1616000400, 1616086800, 
1615395600, 1615482000, 1615568400, 1615654800, 1615741200, 1615827600, 
1615914000, 1616000400, 1616086800), class = c("POSIXct", "POSIXt"
), tzone = ""), code_ID = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 13L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L
), name_ID = c("Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Alpha", 
"Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta", 
"Beta", "Beta", "Beta", "Beta", "eta-firm", "eta-firm", "eta-firm", 
"eta-firm", "eta-firm", "eta-firm", "eta-firm", "eta-firm", "eta-firm", 
"phi hotel", "phi hotel", "phi hotel", "phi hotel", "phi hotel", 
"phi hotel", "phi hotel", "phi hotel", "phi hotel"), new_value = c(372L, 
608L, 515L, 320L, 323L, 329L, 212L, 304L, 462L, 115L, 151L, 141L, 
137L, 106L, 67L, 166L, 126L, 179L, 2155L, 2845L, 3477L, 2950L, 
3023L, 2822L, 2184L, 2026L, 2531L, 866L, 991L, 971L, 953L, 604L, 
398L, 672L, 986L, 1058L), mean6 = c(NA, NA, NA, NA, NA, NA, 382.714285714286, 
373, 352.142857142857, NA, NA, NA, NA, NA, NA, 126.142857142857, 
127.714285714286, 131.714285714286, NA, NA, NA, NA, NA, NA, 2779.42857142857, 
2761, 2716.14285714286, NA, NA, NA, NA, NA, NA, 779.285714285714, 
796.428571428571, 806)), row.names = c(NA, -36L), class = "data.frame")

0

Su guía para un futuro mejor - libreflare
Su guía para un futuro mejor - libreflare