Data manipulation with data.table

Data manip

data.table paradigm

Author

Chi Zhang

Published

April 20, 2024

data.table manipulation

# make some fake data
id <- c(
  rep(1, 3), 
  rep(2, 2), 
  rep(3, 6), 
  rep(4, 1)
)

ab_types <- c(
  'ampicillin', 'gentamicin', 'cefalotin', 'metronidazol'
)
# random draw: the table below will differ between runs unless set.seed() is called first
ab <- sample(ab_types, size = length(id), replace = TRUE)

dt <- data.table::data.table(
  id = id, ab = ab
)
dt
       id         ab
    <num>     <char>
 1:     1 gentamicin
 2:     1 ampicillin
 3:     1  cefalotin
 4:     2 gentamicin
 5:     2 ampicillin
 6:     3 ampicillin
 7:     3  cefalotin
 8:     3 ampicillin
 9:     3 ampicillin
10:     3 gentamicin
11:     3  cefalotin
12:     4 ampicillin

Subsetting

Drop a column from the result only (dt itself is not modified)

dt[, !c('id')]
            ab
        <char>
 1: gentamicin
 2: ampicillin
 3:  cefalotin
 4: gentamicin
 5: ampicillin
 6: ampicillin
 7:  cefalotin
 8: ampicillin
 9: ampicillin
10: gentamicin
11:  cefalotin
12: ampicillin
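To make the "temporary" part concrete, here is a minimal sketch that contrasts the negated selection with a deletion by reference. It assumes data.table is installed and rebuilds the table by hand with the values printed above (the original draw is random); the names dt_no_id and dt_copy are only for illustration.

```r
library(data.table)

# same ids and antibiotics as the table printed above, typed in by hand
dt <- data.table(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4),
  ab = c("gentamicin", "ampicillin", "cefalotin",
         "gentamicin", "ampicillin",
         "ampicillin", "cefalotin", "ampicillin",
         "ampicillin", "gentamicin", "cefalotin",
         "ampicillin")
)

# negated selection returns a new table; dt keeps both columns
dt_no_id <- dt[, !c("id")]
names(dt_no_id)
names(dt)

# := NULL deletes the column by reference, so work on a copy to keep dt intact
dt_copy <- copy(dt)
dt_copy[, id := NULL]
names(dt_copy)
```

The negated selection is handy for printing or passing a narrower table along, while `:= NULL` is the one to reach for when the column should really be gone.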

By, .SD

Get number of records per person

dt[, .N, by = id]
      id     N
   <num> <int>
1:     1     3
2:     2     2
3:     3     6
4:     4     1
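The same .N idiom extends to more than one grouping column and can be chained with a second `[` to sort the counts. The sketch below assumes data.table is installed and uses a hand-typed copy of the table printed above; counts_id_ab and counts_sorted are illustrative names.

```r
library(data.table)

# same ids and antibiotics as the table printed above, typed in by hand
dt <- data.table(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4),
  ab = c("gentamicin", "ampicillin", "cefalotin",
         "gentamicin", "ampicillin",
         "ampicillin", "cefalotin", "ampicillin",
         "ampicillin", "gentamicin", "cefalotin",
         "ampicillin")
)

# number of records per person and antibiotic
counts_id_ab <- dt[, .N, by = .(id, ab)]
counts_id_ab

# records per person, largest group first (chained call)
counts_sorted <- dt[, .N, by = id][order(-N)]
counts_sorted
```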

Number of unique antibiotics per person

# sapply() returns an atomic vector; with lapply() n_uab would become a list column
dt[, .(n_uab = sapply(.SD, function(x){length(unique(x))})), 
   .SDcols = 'ab', 
   by = id]
      id n_uab
   <num> <int>
1:     1     3
2:     2     2
3:     3     3
4:     4     1
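For a single column, data.table's own uniqueN() is a possible shortcut for the sapply() call above. This is a sketch rather than the author's approach, using a hand-typed copy of the printed table; it also shows the list column that lapply() would produce. The names n_unique and res_list are illustrative.

```r
library(data.table)

# same ids and antibiotics as the table printed above, typed in by hand
dt <- data.table(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4),
  ab = c("gentamicin", "ampicillin", "cefalotin",
         "gentamicin", "ampicillin",
         "ampicillin", "cefalotin", "ampicillin",
         "ampicillin", "gentamicin", "cefalotin",
         "ampicillin")
)

# uniqueN(x) counts the distinct values of x within each group
n_unique <- dt[, .(n_uab = uniqueN(ab)), by = id]
n_unique

# for comparison: lapply() wraps each count in a list, giving a list column
res_list <- dt[, .(n_uab = lapply(.SD, function(x) length(unique(x)))),
               .SDcols = "ab",
               by = id]
class(res_list$n_uab)
```

The .SD version still pays off when the same summary has to run over several columns at once; for one column, naming it directly is shorter.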