Course review: making DS work for clinical reporting

Reporting
Data science
RSE

A review of the Coursera course provivded by Genentech and Roche, on “Making data science work for clinical reporting”.

Author

Chi Zhang

Published

March 1, 2023

Overview

This is a course provided by Genentech (part of Roche) on Coursera (course link). It is not necessary to have a paid coursera membership to view the course, everyone could access it.

It is a 4 part course released one month ago (Jan/Feb 2023), and it seems that a follow-up will be released in the future.

Overall I think it strikes a good balance between high-level introduction of the good practices, and examples with how they are implemented. Even though the course focuses on clinical reporting in the pharmaceutical industry, the practices are highly relevant in other sectors as well (e.g. public health, academia, other industries that use open-source software).

Specific statistical methods, packages are introduced only at a high-level; which means the course is not for learning how to use this or that packages; but good practice guidelines.

In my opinion,

  • it would be useful if the learner has some experience with software development and/or statistics; otherwise learners might not know how to practice them.
  • most of the examples are related to R packages (understandable), so some experience with R package (use or develop) is useful.
  • it could be a very good study material for university students in related subjects.

Each module

  • Module 1 (notes): what the requirements are regarding clinical reporting, what should be done to meet the quality standards;
  • Module 2 (notes): DevOps and Agile
  • Module 3 (notes): version Control, git workflows, reproducible clinical reporting
  • Module 4 (notes): code quality, robust and reusable code, R packages
  • Module 5 (notes): risk management with open source software

Highlight

I have a few years of experience as an R developer and academic researcher in related fields, so not all concepts are new to me. Nevertheless, I still learned quite a bit. For example,

  • (Module 1) Data and results sharing needs to follow certain standards, such as CDISC; there are different industry standards to follow when it comes to data acquisition, tabulation and analysis (e.g. ADaM)
  • (Module 2) Data scientists not only need hard skills, but also soft skills - they need to be able to wear many hats, and be more flexible and resilient.
  • (Module 4, 5) Tests are extremely important. Think afar, develop your package so that they can be extended in the future. Design your package first, don’t start making your package immediately.

Ways forward as a practitioner

There are things I know I have started doing (like write function documentation and vignette, use small functions rather than large and long ones); yet there are lots of things I should be making an active effort on.

I have beenn using CI/CD “unconsciously”: my team had been using it, GitHub Action builds my pkgdown website automatically; I’ve learned the basics of CI/CD and GitHub Actions at one occasion; and that is it. This should be done in a better and more consistent way.

More careful package planning should be done; tests should definitely be prioritized in my future code. Use object oriented design so that I can extend the functionalities in the future.