[Remote] Principal Data Scientist, Health Informatics
Note: This is a remote position open to candidates in the USA. Waymark is a team of healthcare providers, technologists, and builders whose mission is to bring the best healthcare to people with Medicaid benefits. They are seeking a Principal Data Scientist to own clinical data quality and bring senior ML/AI and health economics judgment to their core data science products.
Responsibilities
- Own clinical data quality across claims, EHR, and ADT: Define standards for how clinical data is structured, normalized, and validated as modeling inputs across payer claims (medical, pharmacy, eligibility), EHR data (Epic, Cerner, Athena), and real-time ADT feeds. Bring deep familiarity with EHR data formats (FHIR, HL7, C-CDA) and how data from systems like Epic, Cerner, and Athena maps to clinical reality. Hold the bar for clinical accuracy and completeness across all three sources
- Build and ship production ML/AI models: Develop, validate, and deploy risk stratification, care gap prediction, treatment effect estimation, and LLM/foundation model applications — with rigor around leakage, calibration, fairness, and clinical face validity
- Apply health economics and outcomes methods: Translate raw clinical and claims data into decision-grade evidence through risk adjustment, utilization measurement, cost attribution, quasi-experimental evaluation, and outcomes measurement aligned with CMS, NCQA, and MCO reporting standards
- Advance machine learning and AI products: Bring senior modeling judgment to the product roadmap, owning the clinical and methodological soundness of what ships
- Set standards and mentor: Make architectural trade-offs, drive alignment across data science, engineering, product, and clinical stakeholders, and mentor junior data scientists to raise the technical bar of the team
Skills
- Healthcare Data Expertise: Deep, hands-on fluency with claims, EHR, and ADT data, and strong command of clinical terminologies (ICD-10, SNOMED CT, LOINC, RxNorm, CPT/HCPCS) and value set curation
- Standards Fluency: Working experience with healthcare data standards and exchange formats — FHIR, HL7v2, and C-CDA
- Education: Master's degree in Data Science, Biostatistics, Health Informatics, Computer Science, or a related field
- Python Proficiency: 7-8+ years of hands-on experience in Python, including data science and ML libraries
- Applied ML/AI Experience: Demonstrated ability to build, validate, and deploy production ML models on healthcare data, with end-to-end ownership from development through deployment and maintenance in a live environment. Experience with ML pipelines, model versioning, and reproducible workflows at scale
- Project Ownership: Proven ability to manage complex technical projects independently, align multiple stakeholders, and deliver on timelines
- PhD in health informatics, statistics, data science, or computer science
- Experience integrating EHR/HIE data via TEFCA, CommonWell, or comparable networks
- Health Economics & Outcomes Methods: Experience with risk adjustment, utilization and cost measurement, and quasi-experimental evaluation
- Familiarity with MLOps best practices including experiment tracking and model registry (e.g. MLflow), CI/CD for ML pipelines, feature stores, and workflow orchestration tools such as SageMaker Pipelines
- Prior experience building on Medicaid or dual-eligible populations
- Peer-reviewed publications in healthcare ML, AI, biostatistics, or health economics
Benefits
- Stock Options: Opportunity to invest in the company’s growth.
- Work-from-Home Stipend: A dedicated stipend for your first year to help set up your home office.
- Medical, Vision, and Dental Coverage: Comprehensive plans to keep you and your family healthy.
- Life Insurance: