Tengo un marco de datos con datos de panel.
Un ejemplo:
date code_ID name_ID new_value
2021-03-10T17:00:00 13 Alpha 372
2021-03-11T17:00:00 13 Alpha 608
2021-03-12T17:00:00 13 Alpha 515
2021-03-13T17:00:00 13 Alpha 320
2021-03-14T17:00:00 13 Alpha 323
2021-03-15T17:00:00 13 Alpha 329
2021-03-16T17:00:00 13 Alpha 212
2021-03-17T17:00:00 13 Alpha 304
2021-03-18T17:00:00 13 Alpha 462
2021-03-10T17:00:00 17 Beta 115
2021-03-11T17:00:00 17 Beta 151
2021-03-12T17:00:00 17 Beta 141
2021-03-13T17:00:00 17 Beta 137
2021-03-14T17:00:00 17 Beta 106
2021-03-15T17:00:00 17 Beta 67
2021-03-16T17:00:00 17 Beta 166
2021-03-17T17:00:00 17 Beta 126
2021-03-18T17:00:00 17 Beta 179
2021-03-10T17:00:00 8 eta-firm 2155
2021-03-11T17:00:00 8 eta-firm 2845
2021-03-12T17:00:00 8 eta-firm 3477
2021-03-13T17:00:00 8 eta-firm 2950
2021-03-14T17:00:00 8 eta-firm 3023
2021-03-15T17:00:00 8 eta-firm 2822
2021-03-16T17:00:00 8 eta-firm 2184
2021-03-17T17:00:00 8 eta-firm 2026
2021-03-18T17:00:00 8 eta-firm 2531
2021-03-10T17:00:00 6 phi hotel 866
2021-03-11T17:00:00 6 phi hotel 991
2021-03-12T17:00:00 6 phi hotel 971
2021-03-13T17:00:00 6 phi hotel 953
2021-03-14T17:00:00 6 phi hotel 604
2021-03-15T17:00:00 6 phi hotel 398
2021-03-16T17:00:00 6 phi hotel 672
2021-03-17T17:00:00 6 phi hotel 986
2021-03-18T17:00:00 6 phi hotel 1058
¿Cómo hago una fórmula que calcule la media móvil, por code_ID y fecha, posiblemente usando el paquete base de R?
La fórmula es (formato látex):
{\hat{y}_{t}} = \frac{y_{t-6} + y_{t-5} + y_{t-4} + y_{t-3} + y_{t-2} + y_{t-1} + y_{t}}{7}
¿Qué tal esto?
dat <- tibble::tribble(~date, ~code_ID, ~name_ID, ~new_value,
"2021-03-10 17:00:00", 13, "Alpha", 372,
"2021-03-11 17:00:00", 13, "Alpha", 608,
"2021-03-12 17:00:00", 13, "Alpha", 515,
"2021-03-13 17:00:00", 13, "Alpha", 320,
"2021-03-14 17:00:00", 13, "Alpha", 323,
"2021-03-15 17:00:00", 13, "Alpha", 329,
"2021-03-16 17:00:00", 13, "Alpha", 212,
"2021-03-17 17:00:00", 13, "Alpha", 304,
"2021-03-18 17:00:00", 13, "Alpha", 462,
"2021-03-10 17:00:00", 17, "Beta", 115,
"2021-03-11 17:00:00", 17, "Beta", 151,
"2021-03-12 17:00:00", 17, "Beta", 141,
"2021-03-13 17:00:00", 17, "Beta", 137,
"2021-03-14 17:00:00", 17, "Beta", 106,
"2021-03-15 17:00:00", 17, "Beta", 67,
"2021-03-16 17:00:00", 17, "Beta", 166,
"2021-03-17 17:00:00", 17, "Beta", 126,
"2021-03-18 17:00:00", 17, "Beta", 179,
"2021-03-10 17:00:00", 8 , "eta-firm", 2155,
"2021-03-11 17:00:00", 8 , "eta-firm", 2845,
"2021-03-12 17:00:00", 8 , "eta-firm", 3477,
"2021-03-13 17:00:00", 8 , "eta-firm", 2950,
"2021-03-14 17:00:00", 8 , "eta-firm", 3023,
"2021-03-15 17:00:00", 8 , "eta-firm", 2822,
"2021-03-16 17:00:00", 8 , "eta-firm", 2184,
"2021-03-17 17:00:00", 8 , "eta-firm", 2026,
"2021-03-18 17:00:00", 8 , "eta-firm", 2531,
"2021-03-10 17:00:00", 6 , "phi hotel", 866,
"2021-03-11 17:00:00", 6 , "phi hotel", 991,
"2021-03-12 17:00:00", 6 , "phi hotel", 971,
"2021-03-13 17:00:00", 6 , "phi hotel", 953,
"2021-03-14 17:00:00", 6 , "phi hotel", 604,
"2021-03-15 17:00:00", 6 , "phi hotel", 398,
"2021-03-16 17:00:00", 6 , "phi hotel", 672,
"2021-03-17 17:00:00", 6 , "phi hotel", 986,
"2021-03-18 17:00:00", 6 , "phi hotel", 1058)
dat$date <- anytime::anytime(dat$date)
id <- dat$code_ID
s <- split(dat, id)
l <- lapply(s, function(x)cbind(x, ma=rowMeans(sapply(0:6, function(t)lag(x$new_value, t)))))
out <- do.call(bind_rows, l)
out
# date code_ID name_ID new_value ma
# 1 2021-03-10 17:00:00 6 phi hotel 866 NA
# 2 2021-03-11 17:00:00 6 phi hotel 991 NA
# 3 2021-03-12 17:00:00 6 phi hotel 971 NA
# 4 2021-03-13 17:00:00 6 phi hotel 953 NA
# 5 2021-03-14 17:00:00 6 phi hotel 604 NA
# 6 2021-03-15 17:00:00 6 phi hotel 398 NA
# 7 2021-03-16 17:00:00 6 phi hotel 672 779.2857
# 8 2021-03-17 17:00:00 6 phi hotel 986 796.4286
# 9 2021-03-18 17:00:00 6 phi hotel 1058 806.0000
# 10 2021-03-10 17:00:00 8 eta-firm 2155 NA
# 11 2021-03-11 17:00:00 8 eta-firm 2845 NA
# 12 2021-03-12 17:00:00 8 eta-firm 3477 NA
# 13 2021-03-13 17:00:00 8 eta-firm 2950 NA
# 14 2021-03-14 17:00:00 8 eta-firm 3023 NA
# 15 2021-03-15 17:00:00 8 eta-firm 2822 NA
# 16 2021-03-16 17:00:00 8 eta-firm 2184 2779.4286
# 17 2021-03-17 17:00:00 8 eta-firm 2026 2761.0000
# 18 2021-03-18 17:00:00 8 eta-firm 2531 2716.1429
# 19 2021-03-10 17:00:00 13 Alpha 372 NA
# 20 2021-03-11 17:00:00 13 Alpha 608 NA
# 21 2021-03-12 17:00:00 13 Alpha 515 NA
# 22 2021-03-13 17:00:00 13 Alpha 320 NA
# 23 2021-03-14 17:00:00 13 Alpha 323 NA
# 24 2021-03-15 17:00:00 13 Alpha 329 NA
# 25 2021-03-16 17:00:00 13 Alpha 212 382.7143
# 26 2021-03-17 17:00:00 13 Alpha 304 373.0000
# 27 2021-03-18 17:00:00 13 Alpha 462 352.1429
# 28 2021-03-10 17:00:00 17 Beta 115 NA
# 29 2021-03-11 17:00:00 17 Beta 151 NA
# 30 2021-03-12 17:00:00 17 Beta 141 NA
# 31 2021-03-13 17:00:00 17 Beta 137 NA
# 32 2021-03-14 17:00:00 17 Beta 106 NA
# 33 2021-03-15 17:00:00 17 Beta 67 NA
# 34 2021-03-16 17:00:00 17 Beta 166 126.1429
# 35 2021-03-17 17:00:00 17 Beta 126 127.7143
# 36 2021-03-18 17:00:00 17 Beta 179 131.7143
Lo anterior es la solución base R. Si estuviera dispuesto a utilizar dplyr y zoo, podría hacerlo de la siguiente manera:
dat %>%
group_by(code_ID) %>%
mutate(ma = zoo::rollmean(new_value, k=7, fill=NA, align="right"))
Es un poco más fácil si está dispuesto a usar paquetes, pero como la pregunta solicita la base R, usando solo eso tenemos lo siguiente. ave aplica roll a new_value mediante code_ID y roll se implementa tomando los medios de fila de incrustar o usando un filtro o una combinación de cumsum y diff
roll <- function(x, n = 7) c(rep(NA, n-1), rowMeans(embed(x, n)))
dat2 <- transform(dat, mean7 = ave(new_value, code_ID, FUN = roll))
o utilice una de estas alternativas para rodar:
roll2 <- function(x, n = 7) stats::filter(x, rep(1, n) / n, sides = 1)
roll3 <- function(x, n = 7) c(rep(NA, n-1), diff(cumsum(c(0, x)), n)/n)
roll4 <- function(x, n = 7) c(rep(NA, n-1), apply(embed(x, n), 1, mean))
4
Gracias. Otra pregunta: ¿Cómo puedo calcular este tiempo (nuevamente por code_ID), ((media7 del día t / media7 del día t-7) -1)?
- Último nacido28/03/2021 a las 18:41
Si dat2 es como en la respuesta, entonces div <- function(x, n = 7) x / c(rep(NA, n), head(x, -n)); transformar(dat2, div7 = ave(mean7, code_ID, FUN = div))
-G. Grothendieck29/03/2021 a las 0:10
Gracias de nuevo. En general, cuando los datos son de panel, ¿es posible insertarlos?directamente una fórmula (como la que publiqué) usando code_ID (y tiempo), para poder hacer algún cálculo.
- Último nacido29 de marzo de 2021 a las 8:08
Se ha agregado roll4. En ese sentido, un medio puede ser reemplazado por cualquier otra función similar.
-G. Grothendieck29/03/2021 a las 12:40
Aquí hay dos formas de utilizar la función rollmeanr del paquete zoo.
El primero no asigna la salida a una nueva columna,el segundo sí.
library(zoo)
by(df1$new_value, df1$code_ID, function(x)
rollmeanr(x, k = 7, fill = NA)
)
df1$mean6 <- with(df1, ave(new_value, code_ID, FUN = function(x) rollmeanr(x, k = 7, fill = NA)))
head(df1, 10)
# date code_ID name_ID new_value mean6
#1 2021-03-10 17:00:00 13 Alpha 372 NA
#2 2021-03-11 17:00:00 13 Alpha 608 NA
#3 2021-03-12 17:00:00 13 Alpha 515 NA
#4 2021-03-13 17:00:00 13 Alpha 320 NA
#5 2021-03-14 17:00:00 13 Alpha 323 NA
#6 2021-03-15 17:00:00 13 Alpha 329 NA
#7 2021-03-16 17:00:00 13 Alpha 212 382.7143
#8 2021-03-17 17:00:00 13 Alpha 304 373.0000
#9 2021-03-18 17:00:00 13 Alpha 462 352.1429
#10 2021-03-10 17:00:00 17 Beta 115 NA
Datos en formato dput.
df1 <-
structure(list(date = structure(c(1615395600, 1615482000, 1615568400,
1615654800, 1615741200, 1615827600, 1615914000, 1616000400, 1616086800,
1615395600, 1615482000, 1615568400, 1615654800, 1615741200, 1615827600,
1615914000, 1616000400, 1616086800, 1615395600, 1615482000, 1615568400,
1615654800, 1615741200, 1615827600, 1615914000, 1616000400, 1616086800,
1615395600, 1615482000, 1615568400, 1615654800, 1615741200, 1615827600,
1615914000, 1616000400, 1616086800), class = c("POSIXct", "POSIXt"
), tzone = ""), code_ID = c(13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L
), name_ID = c("Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Alpha",
"Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta",
"Beta", "Beta", "Beta", "Beta", "eta-firm", "eta-firm", "eta-firm",
"eta-firm", "eta-firm", "eta-firm", "eta-firm", "eta-firm", "eta-firm",
"phi hotel", "phi hotel", "phi hotel", "phi hotel", "phi hotel",
"phi hotel", "phi hotel", "phi hotel", "phi hotel"), new_value = c(372L,
608L, 515L, 320L, 323L, 329L, 212L, 304L, 462L, 115L, 151L, 141L,
137L, 106L, 67L, 166L, 126L, 179L, 2155L, 2845L, 3477L, 2950L,
3023L, 2822L, 2184L, 2026L, 2531L, 866L, 991L, 971L, 953L, 604L,
398L, 672L, 986L, 1058L), mean6 = c(NA, NA, NA, NA, NA, NA, 382.714285714286,
373, 352.142857142857, NA, NA, NA, NA, NA, NA, 126.142857142857,
127.714285714286, 131.714285714286, NA, NA, NA, NA, NA, NA, 2779.42857142857,
2761, 2716.14285714286, NA, NA, NA, NA, NA, NA, 779.285714285714,
796.428571428571, 806)), row.names = c(NA, -36L), class = "data.frame")
0