Pointers for analytics of healthcare data

A start

Recently I've been asked for pointers on analysing complex healthcare data. This is a difficult question. Healthcare analytics, health informatics, medical informatics and so on range over a wide area, driven by a wide variety of interests and outcomes, overlapping hugely in some areas and not at all in others. The field is evolving rapidly, but there is not much formal, standardised training. It's in roughly the same place bioinformatics was 20 years ago. So this is an attempt to sort the wheat from the chaff and list some useful pointers for those wanting to know more.

This is intended to be a living, evolving document.

What is this about?

Secondary use of healthcare data, for identifying patient populations, stratification, pharmaceutical development, scientific research etc.
Real World Evidence and observational studies

It is explicitly not about:

Primary use of healthcare data, for direct improvements of patient care
Healthcare IT
Hospital software and IT systems

Skills & people

Skill-lists for scientific & technical disciplines often end up as sprawling wish-lists (cf. the NIH list of critical skills for bioinformatics, which includes several subjects at degree-level mastery ...). So the following is offered lightly, with the full expectation that most of it is learnt on the job:

Analytics: scripting (e.g. R or Python), visualisation, data handling
Background & domain knowledge: comfort with biomedical and healthcare terms
Database access, some SQL
Some statistics
Awareness of standards and terminologies (e.g. CDISC, ontologies, etc.)

People who are usefully good at health informatics often have, or have had, job titles like:

health informaticians
bioinformaticians
biomedical data scientists
clinicians, pharmacologists etc. who have got into programming

Books & papers

There's some interesting reading out there, but you often have to pick out the relevant nuggets amidst material that is intended for hospital staff or administrators:

O'Reilly has a surprising number of relevant titles:
- Anonymising Health Data is focused but has good coverage of privacy and governance
I was an author on the paper "Opportunities and obstacles for deep learning in biology and medicine". Although some of it is unashamedly molecular, much of it is about patients and higher-level data.
There's a slightly older book from 2016, "Secondary Analysis of Electronic Health Records", that's still quite useful.
The Book of OHDSI (see below).
Analytics in Healthcare is once again a mixed bag, but very recent.

Short courses

ClassCentral has a list of online classes in bioinformatics and healthcare

Longer courses

There are a lot of "health informatics" courses out there, but some are more about familiarising people with the technology and landscape, or about the IT plumbing. Some possibly relevant ones in the UK include:

https://www.mastersportal.com/search/#q=ci-30|di-283|lv-master
https://www.prospects.ac.uk/jobs-and-work-experience/job-sectors/healthcare/how-to-get-started-in-health-informatics
University of Edinburgh Health Data Science: https://www.ed.ac.uk/bayes/about-us/our-work/education/workforce-development/courses/health-data-science
University of Leeds MSc Precision Medicine: Genomics & Analytics
London School of Hygiene and Tropical Medicine MSc Health Data Science
UCL AI Enabled Healthcare
University of Manchester: Health Data Science, Clinical Bioinformatics, Health Informatics
UCL courses: https://www.ucl.ac.uk/health-informatics/node/787/health-informatics-mscpgdippgcert
King's College London Applied Statistical Modelling & Health Informatics PgCert
Bournemouth Digital Health and Artificial Intelligence MSc
University of West London: Health Informatics
City & Guilds Health Informatics
Imperial College London: Cancer Informatics (MRes), Data Science (Biomedical Research MRes), Health Data Analytics and Machine Learning (MSc)

Miscellaneous

Much is made of Real World Data, Real World Evidence and Real World Analytics. I'll tell you a secret: these are effectively all the same thing, despite the protests of experts.

AMIA (the American Medical Informatics Association) covers all the various flavours of healthcare and medical informatics, including our use cases, and hosts some excellent courses.

OHDSI (the Observational Health Data Sciences and Informatics program) is leading the standardisation of healthcare data into a common data model called OMOP. This looks set to become the dominant model for federated analysis of healthcare data. Read the Book of OHDSI for more.
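To give a feel for why a shared data model matters for federated analysis, here is a minimal sketch in Python using only the standard-library sqlite3 module. It assumes a drastically simplified subset of two OMOP tables (person and condition_occurrence, keeping only a handful of their real columns), a made-up concept ID, and toy data for two hypothetical sites. Because both sites use the same table and column names, one cohort query can be sent to each site unchanged and only aggregate counts come back. This is an illustration of the idea, not OHDSI's own tooling, which is far richer.

```python
import sqlite3

# HYPOTHETICAL placeholder: real OMOP concept IDs come from the OHDSI
# standardised vocabularies (the CONCEPT table); 1001 is made up.
TARGET_CONDITION_CONCEPT_ID = 1001

# The cohort definition is written once against shared table/column names,
# so the identical query text can be sent to every participating site.
COHORT_COUNT_SQL = """
SELECT COUNT(DISTINCT p.person_id)
FROM person AS p
JOIN condition_occurrence AS co
  ON co.person_id = p.person_id
WHERE co.condition_concept_id = ?
  AND p.year_of_birth <= ?
"""


def build_toy_site(persons, conditions):
    """Create an in-memory database holding a tiny, simplified subset of
    the OMOP CDM person and condition_occurrence columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)"
    )
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "condition_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER, "
        "condition_concept_id INTEGER, condition_start_date TEXT)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?)", persons)
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", conditions
    )
    return conn


def count_cohort(conn, concept_id, born_in_or_before):
    """Run the shared query at one site; only an aggregate count leaves it."""
    (count,) = conn.execute(
        COHORT_COUNT_SQL, (concept_id, born_in_or_before)
    ).fetchone()
    return count


def federated_count(sites, concept_id, born_in_or_before):
    """Coordinator side: sum the per-site counts (no row-level data shared)."""
    return sum(count_cohort(c, concept_id, born_in_or_before) for c in sites)


# Toy data for two hypothetical sites (entirely made up).
site_a = build_toy_site(
    persons=[(1, 1950), (2, 1985), (3, 1960)],
    conditions=[
        (10, 1, 1001, "2019-03-01"),
        (11, 1, 1001, "2020-05-01"),  # same patient, second record
        (12, 2, 1001, "2021-01-15"),  # born after 1970, excluded below
        (13, 3, 2002, "2018-07-07"),  # different condition, excluded
    ],
)
site_b = build_toy_site(
    persons=[(1, 1940), (2, 1965)],
    conditions=[(20, 1, 1001, "2017-02-02"), (21, 2, 1001, "2022-09-09")],
)

total = federated_count([site_a, site_b], TARGET_CONDITION_CONCEPT_ID, 1970)
print(f"Federated cohort count: {total}")  # prints: Federated cohort count: 3
```

Returning only counts, rather than patient-level rows, is one reason this style of analysis appeals from a governance point of view: the record-level data stays at each site.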