Data manipulation with data.table

Data manip

data.table paradigm

Author

Chi Zhang

Published

April 20, 2024

data.table manipulation

# make some fake data
id <- c(
  rep(1, 3), 
  rep(2, 2), 
  rep(3, 6), 
  rep(4, 1)
)

ab_types <- c(
  'ampicillin', 'gentamicin', 'cefalotin', 'metronidazol'
)
# no seed is set, so the sampled values differ between runs
ab <- sample(ab_types, size = length(id), replace = TRUE)

dt <- data.table::data.table(
  id = id, ab = ab
)
dt
       id           ab
    <num>       <char>
 1:     1   gentamicin
 2:     1   ampicillin
 3:     1    cefalotin
 4:     2 metronidazol
 5:     2    cefalotin
 6:     3   gentamicin
 7:     3 metronidazol
 8:     3   gentamicin
 9:     3   ampicillin
10:     3 metronidazol
11:     3   gentamicin
12:     4   gentamicin
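
Because `sample()` draws at random, the table above changes on every run unless a seed is fixed. A minimal sketch of a reproducible version follows, assuming an arbitrary seed value (`123` and the names `ab1`, `ab2`, `dt_seeded` are illustrative and not from the original); `rep()` with `times` is used as a shorthand for the same group sizes, which makes `id` an integer rather than a double.

```r
library(data.table)

# same group sizes as above: 3, 2, 6 and 1 records for persons 1 to 4
id <- rep(1:4, times = c(3, 2, 6, 1))
ab_types <- c("ampicillin", "gentamicin", "cefalotin", "metronidazol")

# resetting the same seed before each draw gives the same sample
set.seed(123)  # arbitrary seed, chosen only for illustration
ab1 <- sample(ab_types, size = length(id), replace = TRUE)
set.seed(123)
ab2 <- sample(ab_types, size = length(id), replace = TRUE)

identical(ab1, ab2)

dt_seeded <- data.table(id = id, ab = ab1)
dt_seeded
```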

Subsetting

Remove a column temporarily (dt itself is not modified)

dt[, !c('id')]
              ab
          <char>
 1:   gentamicin
 2:   ampicillin
 3:    cefalotin
 4: metronidazol
 5:    cefalotin
 6:   gentamicin
 7: metronidazol
 8:   gentamicin
 9:   ampicillin
10: metronidazol
11:   gentamicin
12:   gentamicin
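
The negated selection returns a new data.table and leaves the original untouched. The sketch below contrasts that with deleting a column by reference using `:=`. It uses a small hand-written table; the names `toy`, `toy2` and `no_id` are hypothetical and not part of the original data.

```r
library(data.table)

# small hand-written table; `toy` is a hypothetical stand-in for dt
toy <- data.table(
  id = c(1, 1, 2),
  ab = c("ampicillin", "cefalotin", "ampicillin")
)

# negated selection returns a new table; toy keeps both columns
no_id <- toy[, !c("id")]
names(no_id)
names(toy)

# `:= NULL` deletes the column by reference, so work on a copy to keep toy intact
toy2 <- copy(toy)
toy2[, id := NULL]
names(toy2)
```

Use the first form for a quick look at a subset of columns, and the second only when the column should really be gone.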

Grouping with by and .SD

Get the number of records per person

dt[, .N, by = id]
      id     N
   <num> <int>
1:     1     3
2:     2     2
3:     3     6
4:     4     1
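
`.N` holds the number of rows in the current group, so the same pattern extends to more than one grouping column. A small sketch on a hand-written table (`toy` and `counts` are hypothetical names) that counts how often each antibiotic appears for each person:

```r
library(data.table)

# hand-written example data, not the sampled table above
toy <- data.table(
  id = c(1, 1, 1, 2, 2),
  ab = c("ampicillin", "ampicillin", "cefalotin", "gentamicin", "gentamicin")
)

# one row per (id, ab) combination; N is the number of records in that combination
counts <- toy[, .N, by = .(id, ab)]
counts
```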

Number of unique antibiotics per person

# sapply() returns an atomic vector; with lapply() the new column would be a list column
dt[, .(n_uab = sapply(.SD, function(x){length(unique(x))})), 
   .SDcols = 'ab', 
   by = id]
      id n_uab
   <num> <int>
1:     1     3
2:     2     2
3:     3     3
4:     4     1
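
For a single column, data.table also provides `uniqueN()`, which counts distinct values directly and avoids the `sapply()` over `.SD`. The sketch below, on a hand-written table (`toy`, `via_sd` and `via_uniqueN` are hypothetical names), computes the count both ways and compares the resulting columns.

```r
library(data.table)

# hand-written example data, not the sampled table above
toy <- data.table(
  id = c(1, 1, 1, 2, 2),
  ab = c("ampicillin", "ampicillin", "cefalotin", "gentamicin", "gentamicin")
)

# .SD version, same pattern as in the text above
via_sd <- toy[, .(n_uab = sapply(.SD, function(x) length(unique(x)))),
              .SDcols = "ab",
              by = id]

# uniqueN() version: no .SD or .SDcols needed for a single column
via_uniqueN <- toy[, .(n_uab = uniqueN(ab)), by = id]

# compare the two result columns (names dropped to compare values only)
identical(unname(via_sd$n_uab), via_uniqueN$n_uab)
```

The `.SD` form remains useful when the same summary has to be applied to several columns at once.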