R: Summarize and return the number of unique occurrences in column B while grouping by column A

I have the following data, dfs_alltasks:

    by_hour task
1   0       Apple Receiving
2   0       Apple Receiving
3   0       Orange Receiving
4   0       Banana Receiving
5   0       Banana Receiving
6   0       Orange Receiving
7   1       Orange Receiving
8   1       Banana Receiving
9   1       Banana Receiving
10  1       Banana Receiving
11  1       Banana Receiving
12  1       Banana Receiving
13  1       Orange Receiving
14  2       Banana Receiving
15  3       Banana Receiving

I'd like to group by the "by_hour" column and, at the same time, summarize and return the number of tasks that occur within each group. I should get something like this:

    by_hour task              count
1   0       Apple Receiving   2
2   0       Orange Receiving  2
3   0       Banana Receiving  2
4   1       Orange Receiving  2
5   1       Banana Receiving  5
6   2       Banana Receiving  1
7   3       Banana Receiving  1

I tried: dfs_alltasks %>% group_by(by_hour) %>% summarise_all(no_rows = length(task))

but I get the error "Error in list2(...): object 'task' not found"

  • It looks like you just want dplyr::count(dfs_alltasks, by_hour, task). - Ritchie Sacramento, June 6, 2020 at 15:27
  • Since you want to group by 'by_hour' and 'task', you must include both in the group_by call. There is also no need for summarise_all; summarise will do the job, and instead of length(task), use n() to count the number of rows in each group. - Ali, June 6, 2020 at 15:38


------------------------

You don't need group_by:

library(tidyverse)

df_example <-
  structure(list(
    by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
                1, 2, 3),
    task = c(
      "Apple Remaining",
      "Apple Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Orange Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining"
    )
  ),
  class = "data.frame",
  row.names = c(NA, -15L))

df_example %>% 
  count(by_hour,task)
#>   by_hour             task n
#> 1       0  Apple Remaining 2
#> 2       0 Banana Remaining 2
#> 3       0 Orange Remaining 2
#> 4       1 Banana Remaining 5
#> 5       1 Orange Remaining 2
#> 6       2 Banana Remaining 1
#> 7       3 Banana Remaining 1

Created on 2020-06-06 by the reprex package (v0.3.0)
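If the result column should be named count, as in the question's expected output, recent dplyr versions let count() take a name argument. A minimal sketch, assuming such a dplyr version is installed; the data frame is rebuilt compactly here with the same values as df_example:

```r
library(dplyr)

# Same values as df_example above, rebuilt compactly for a self-contained run
fruit <- c("Apple", "Apple", "Orange", "Banana", "Banana", "Orange", "Orange",
           "Banana", "Banana", "Banana", "Banana", "Banana", "Orange",
           "Banana", "Banana")
df_example <- data.frame(
  by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 3),
  task = paste(fruit, "Remaining"),
  stringsAsFactors = FALSE
)

# name = "count" renames the default column n
res <- df_example %>%
  count(by_hour, task, name = "count")
res
```

Note that count() sorts the groups, so within each hour the tasks come out alphabetically rather than in order of first appearance.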



------------------------

Try this:

library(tibble)
library(dplyr)
data <- tibble::tribble(
   ~by_hour, ~task,
  0 ,      "Apple Receiving",  
  0 ,      "Apple Receiving", 
  0 ,      "Orange Receiving",
  0 ,      "Banana Receiving",
  0 ,      "Banana Receiving",
  0 ,      "Orange Receiving",
  1 ,      "Orange Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Banana Receiving",
  1 ,      "Orange Receiving",
  2 ,      "Banana Receiving",
  3 ,      "Banana Receiving")
data %>% group_by(by_hour,task) %>% summarize(count=n()) %>% ungroup()


------------------------

Consider providing a sample of your data using dput():

df <- structure(list(by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 
1, 2, 3), task = c("Apple Remaining", "Apple Remaining", "Orange Remaining", 
"Banana Remaining", "Banana Remaining", "Orange Remaining", "Orange Remaining", 
"Banana Remaining", "Banana Remaining", "Banana Remaining", "Banana Remaining", 
"Banana Remaining", "Orange Remaining", "Banana Remaining", "Banana Remaining"
)), class = "data.frame", row.names = c(NA, -15L))

You can use the dplyr package and group_by on both of your variables.

library(dplyr)
df %>% 
  group_by(by_hour, task) %>% 
  count %>% 
  ungroup

Result

  by_hour task                 n
    <dbl> <chr>            <int>
1       0 Apple Remaining      2
2       0 Banana Remaining     2
3       0 Orange Remaining     2
4       1 Banana Remaining     5
5       1 Orange Remaining     2
6       2 Banana Remaining     1
7       3 Banana Remaining     1


------------------------

We can also use data.table:

library(data.table)
setDT(df)[, .(n = .N), .(by_hour, task)]
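One thing to be aware of: setDT() converts df to a data.table by reference, so df itself changes class. A sketch of a variant that leaves df untouched, assuming a copy through as.data.table() is acceptable. The column name count is chosen here to match the question's expected output. With by, groups are kept in order of first appearance, which matches the ordering shown in the question; keyby would sort them instead.

```r
library(data.table)

# Same values as df above, rebuilt compactly for a self-contained run
fruit <- c("Apple", "Apple", "Orange", "Banana", "Banana", "Orange", "Orange",
           "Banana", "Banana", "Banana", "Banana", "Banana", "Orange",
           "Banana", "Banana")
df <- data.frame(
  by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 3),
  task = paste(fruit, "Remaining"),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)  # copy; df stays a plain data.frame

# .N is the number of rows in each (by_hour, task) group
res <- dt[, .(count = .N), by = .(by_hour, task)]
res
```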