Havas Health presentation at the Chief Analytics Officer, Fall 2016
- Changing Lives in Healthcare through Machine Learning. By Douglas Barr (@1Geek), VP, Innovation, RedBird, a Havas Health Company
- ML Checklist • Step 1. Pick an industry. Any industry. • Step 2. Find a problem that can be formulated as a function. • Step 3. Is that function non-trivial? If not, go back to Step 1. • Step 4. List all the input parameters for that function. • Step 5. Is any of them accurately observable? If not, focus on that parameter and go back to Step 2. • Step 6. Apply some ML to it. (Choice of the tech wouldn't matter that much.) • Step 7. Had some improvement?
- Why Am I On-stage? • Primary focus ~4 yrs has been on ML algos • Bachelor of Science, CS/EE M.I.T(IFHTP) • Developed some very good models for tackling issues in healthcare • Developed a super-cool conversational AI bot to help patients with diabetes (shameful plug)
- ABOUT US MEASUREMENT, ANALYSIS, & IMPROVEMENT. “Trust us, it worked” is something you will never hear at REDBIRD. Providing rich data analysis is invaluable in today's environment. At REDBIRD, we help brands break through the clutter to understand not only what to measure and why, but what to do with the information as well. We Are REDBIRD
- Yay! ML Library Buzzwords! • Python, C++ • Libraries • My own ConvoNet for NLP torch Keras
- Life is short. Do stuff that matters. - Paul Graham, Y Combinator
- What A $%*# Mess!
- Patient Life
- We Are More Connected Than Ever
- Machine Learning & Healthcare • Medtronic partnered with IBM for Sugar.IQ app • Adherence: AiCure • AI Coach: RedBird HealthBot (shameful plug…AGAIN!) • Healthy Behavior: Welltok partnered with IBM Watson
- Predictive Medicine
- Don’t Be Evil • ML can be used for good, hopefully not bad • ANNs in a regulated industry? Hmmm…..
- The most important commodity I know of is information. - Gordon Gekko, Wall Street (1987)
- CASE STUDY: PREDICT READMISSION CLASSES OF PATIENTS DISCHARGED TO HOME
- Problem The readmission rate is a key indicator of hospital quality. In 2014, Medicare fined a record 2,610 hospitals for having too many patients return within a month. Source: http://khn.org/news/medicare-readmissions-penalties-2015/
- Objective Predict the readmission class of patients discharged to home: 1. Readmitted within 30 days of discharge 2. Readmitted more than 30 days after discharge 3. No readmission (between 1999-2008) Predicting readmission within 30 days is critical not only for hospitals but for patients as well
- About The Data • 101,766 patient hospitalization records • The Health Facts data is an extract representing 10 years (1999-2008) of clinical care at 130 hospitals http://www.hindawi.com/journals/bmri/2014/781670/
- Feature Extraction • To obtain a high degree of predictive accuracy, our model learned and identified the following 24 features for training: 'race', 'gender', 'ages', 'admission', 'discharge', 'admsource', 'time in hospital', 'payer code', 'num lab procedures', 'num procedures', 'num medications', 'number outpatient', 'number emergency', 'number inpatient', 'diag1', 'diag2', 'number diagnoses', 'max glu serum', 'A1Cresult', 'insulin', 'change', 'diabetesMed'
- Model • Diagnoses were grouped into 40 categories based on ICD-9 codes. • Data was split 4:1:5 (training : validation : testing set) • Tested the following classifiers: • Random Forest • KNN • LR (Lasso and Ridge regularization) • Naïve Bayes • SVM
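A 4:1:5 split and a comparison across the classifier families named on the slide could be sketched as follows with scikit-learn. This is a minimal illustration, not the presenter's code: the synthetic data from `make_classification`, every hyperparameter, and the macro-F1 metric are assumptions; only the split ratio and the list of classifier families come from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the three readmission classes (assumption, not the real data).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# 4:1:5 split: hold out 50% for testing, then split the remainder 4:1.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.2, stratify=y_rest, random_state=0)

# One illustrative model per classifier family (hyperparameters are assumptions).
classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR (Lasso)": LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
    "LR (Ridge)": LogisticRegression(penalty="l2", max_iter=5000),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
}

# Fit on the training set and compare on the validation set.
val_f1 = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    val_f1[name] = f1_score(y_val, clf.predict(X_val), average="macro")
    print(f"{name}: validation macro-F1 = {val_f1[name]:.3f}")
```

The test half stays untouched until a model has been chosen on the validation scores, so the final reported figure is not tuned against the data it is measured on.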
- Classifiers
- Random Forest – F1
- Top 18 Important Features 1. num_lab_procedures: 0.0463 2. Num_medications: 0.0454 3. Number_inpatient: 0.0442 4. Time_in_hospital: 0.0400 5. Ages: 0.0391 6. Number_diagnoses: 0.0365 7. Num_procedures: 0.0325 8. Gender_male: 0.0239 9. Number_outpatient: 0.0203 10. Number_emergency: 0.0186 11. Insulin_steady: 0.0161 12. Payer_code_MC: 0.0150 13. Race_caucasian: 0.0150 14. Diag2_circulatory: 0.0142 15. Medication change: 0.0137 16. Admission_urgent: 0.0118 17. Diag3_neoplasms: 0.0104 18. Diag2_diabetes: 0.0103
- Conclusion With 98% accuracy, our model is a good indicator of what could be done with more data.
- CASE STUDY: DUCHENNE MUSCULAR DYSTROPHY Predicting carrier diagnosis using Machine Learning
- Objective Inform females of their chances of being a carrier of Duchenne Muscular Dystrophy (DMD) based on serum markers and family pedigree
- About The Data • The data was obtained from M. Percy, Vanderbilt University, 1985 • 209 observations correspond to blood samples from 192 patients (17 patients have two samples) • Collected as part of a screening program for female relatives of boys with DMD
- The Data (cont.) • Enzyme levels were measured in known carriers (75 samples) and in a group of non-carriers (134 samples) • Of note: The first two serum markers, creatine kinase and hemopexin (ck, h), are inexpensive to obtain, while pyruvate kinase and lactate dehydrogenase (pk, ld) are more expensive • It is of interest to measure how much pk and ld add toward predicting the carrier status
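One way to measure how much the two expensive markers add is to cross-validate the same model twice, once on ck and h alone and once on all four markers, and compare the scores. The sketch below assumes scikit-learn; the helper `marker_gain`, the Random Forest settings, and the column order are hypothetical, and the data is random placeholder noise shaped like the slide's sample counts (75 carriers, 134 non-carriers), not the real measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def marker_gain(X, y, cheap_idx, costly_idx, cv=5):
    """Return (cheap-only accuracy, all-marker accuracy), each a CV mean."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    acc_cheap = cross_val_score(model, X[:, cheap_idx], y, cv=cv).mean()
    acc_all = cross_val_score(model, X[:, cheap_idx + costly_idx], y, cv=cv).mean()
    return acc_cheap, acc_all

# Placeholder data only (assumption): random values, NOT the Vanderbilt samples.
rng = np.random.default_rng(0)
y = np.array([1] * 75 + [0] * 134)                 # 1 = carrier, 0 = non-carrier
X = rng.normal(size=(209, 4)) + 0.8 * y[:, None]   # assumed columns: ck, h, pk, ld

acc_cheap, acc_all = marker_gain(X, y, cheap_idx=[0, 1], costly_idx=[2, 3])
print(f"ck + h only:       {acc_cheap:.3f}")
print(f"ck + h + pk + ld:  {acc_all:.3f}")
print(f"gain from pk, ld:  {acc_all - acc_cheap:+.3f}")
```

With the real data, a gain close to zero would suggest the cheaper two-marker screen is sufficient. Because 17 patients contribute two samples, a grouped split by patient would be the safer choice there.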
- Result • Using a Two-Class Decision Forest algorithm, we obtained a 95% accuracy in our predictive model with 87% precision • Further stats: • True Positives: 14 • False Positives: 2 • False Negatives: 0 • True Negatives: 26
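The headline figures follow directly from the four confusion-matrix counts on the slide. The short calculation below reproduces them; recall, specificity, and F1 are not quoted on the slide and are derived here only for completeness.

```python
# Confusion-matrix counts as reported on the slide.
tp, fp, fn, tn = 14, 2, 0, 26
total = tp + fp + fn + tn                    # 42 evaluated samples

accuracy = (tp + tn) / total                 # 40/42 ≈ 0.952
precision = tp / (tp + fp)                   # 14/16 = 0.875
recall = tp / (tp + fn)                      # 14/14 = 1.0 (no carrier missed)
specificity = tn / (tn + fp)                 # 26/28 ≈ 0.929
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```

The zero false negatives matter most for a screening use case: every known carrier in the evaluated set was flagged, at the cost of two non-carriers flagged incorrectly.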
- Improving Future Care through Machine Learning • For healthcare, the problem really isn’t regulation, it’s data • Can we truly base health decisions on some black-box computations? • We need to really begin thinking about ramifications • With great power comes great responsibility
- Thank you.
Topics: Presentation, Machine Learning, CAO, CDAO, Data, Data Analytics