Notes: The Book of OHDSI - Data Analytics

Data science
Observational data

Observational Health Data Sciences and Informatics


Chi Zhang


May 6, 2024

The Book of OHDSI is written by the OHDSI community.

What is required to go from origin (source data) to destination (evidence):

OMOP: the Observational Medical Outcomes Partnership, which aims to identify true drug safety associations.

OMOP CDM: the OMOP common data model, a mechanism to standardize the structure, content and semantics of observational data, so that statistical code can be written once and reused at every data site.
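To make the "reused at every data site" point concrete, here is a minimal sketch in Python. It assumes each site exposes a `person` table with the CDM columns `person_id`, `gender_concept_id` and `year_of_birth`. The two in-memory SQLite databases and the helper names (`make_toy_site`, `count_persons_by_gender`) are invented for illustration and are not part of any OHDSI tool.

```python
import sqlite3

def count_persons_by_gender(conn):
    """Count persons per gender_concept_id in a CDM-shaped `person` table."""
    rows = conn.execute(
        "SELECT gender_concept_id, COUNT(*) FROM person "
        "GROUP BY gender_concept_id ORDER BY gender_concept_id"
    ).fetchall()
    return dict(rows)

def make_toy_site(persons):
    """Build an in-memory stand-in for one data site (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, year_of_birth INTEGER)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", persons)
    return conn

# 8507 and 8532 are the standard concept IDs for male and female.
site_a = make_toy_site([(1, 8507, 1950), (2, 8532, 1962), (3, 8532, 1980)])
site_b = make_toy_site([(1, 8507, 1971), (2, 8507, 1990)])

# The same analysis code runs unchanged against both sites.
counts_a = count_persons_by_gender(site_a)
counts_b = count_persons_by_gender(site_b)
print(counts_a)
print(counts_b)
```

Because both sites share the same table structure and concept IDs, the query needs no site-specific mapping.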

The OHDSI community (2014) has created libraries of open-source analytics tools atop the OMOP CDM to support:

Chapter 7 Data analytics use cases

Three major categories: characterization, population-level estimation, patient-level prediction


Characterization

What happened to the patients?

Chapter 11 Characterization

Typical characterization questions:

  • How many patients…?
  • How often does…? What proportion of patients …?
  • What is the distribution of values for …?
  • What is the median length of exposure for patients on …?
  • Which other drugs is the patient using?

Desired output:

  • count, percentage
  • averages and other descriptive statistics
  • prevalence, incidence rate
  • rule-based phenotype
  • drug utilization, adherence, treatment pathways, line of therapy
  • disease natural history, co-morbidity profile
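A few of the outputs listed above (count, percentage, incidence rate, median length of exposure) can be sketched on a toy cohort. The records and variable names below are invented for illustration. A real characterization would be run against CDM tables, not a hand-written list.

```python
from statistics import median

# Toy records, invented for illustration:
# (person_id, years_at_risk, had_outcome, exposure_days)
cohort = [
    (1, 2.0, False, 30),
    (2, 1.5, True, 90),
    (3, 3.0, False, 60),
    (4, 0.5, True, 120),
    (5, 3.0, False, 45),
]

# Count and percentage of patients with the outcome.
n_patients = len(cohort)
n_events = sum(1 for _, _, had_outcome, _ in cohort if had_outcome)
percent_with_outcome = 100.0 * n_events / n_patients

# Incidence rate: new events divided by person-time at risk.
person_years = sum(years for _, years, _, _ in cohort)
incidence_rate_per_1000_py = 1000.0 * n_events / person_years

# Median length of exposure in days.
median_exposure_days = median(days for _, _, _, days in cohort)

print(n_patients, n_events, percent_with_outcome)
print(incidence_rate_per_1000_py, median_exposure_days)
```

The incidence rate uses person-time in the denominator, which is what distinguishes it from a simple percentage of patients.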

Population-level estimation

What are the causal effects?

Chapter 12 Population-level Estimation

Typical questions:

  • What is the effect of …?
  • Which treatment works better?
  • What is the risk of X on Y?
  • What is the time-to-event of …?

Desired output:

  • relative risk (RR), hazard ratio (HR), odds ratio (OR)
  • association, correlation
  • average treatment effect (ATE), causal effect
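As a reminder of what two of these measures mean, here is a minimal sketch that computes a crude RR and OR from a 2x2 table. The counts are invented. These crude estimates ignore confounding, so on their own they describe association and not a causal effect.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

# Toy 2x2 table, invented for illustration.
a, b, c, d = 30, 70, 15, 85

rr = relative_risk(a, b, c, d)  # (30/100) / (15/100)
odds = odds_ratio(a, b, c, d)   # (30*85) / (70*15)
print(round(rr, 3), round(odds, 3))
```

The OR is further from 1 than the RR here because the outcome is not rare in this toy table.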

Patient-level prediction

What will happen to A?

Chapter 13 Patient-level Prediction

Typical questions:

  • What is the chance that this patient will…?
  • Who are the candidates for …?

Desired output:

  • probability for an individual
  • prediction model
  • high/low risk groups
  • probabilistic phenotype