[Remote] Senior Research Data Engineer (US)

Work from home Full-time role Hiring

Note: This is a remote job open to candidates in the USA. reputed company is a leading health tech company focused on empowering providers to deliver exceptional care. The Senior Research Data Engineer will design and build data systems that support AI model development, ensuring data is accurately transformed and documented for effective use in AI research.

Responsibilities

Own the gold data layer. Transform messy silver tables into curated, semantically rich, clean, and documented gold datasets suitable for AI model development, including datasets and features reusable across projects. Maintain the data as products and needs evolve
Reverse-engineer data semantics. Talk with product engineers and with clinical and workflow experts to learn how the products are used and how data are created in the field. Read SQL queries, stored procedures, technical data definitions, and other code to understand how products represent and transform data. Learn how data are ingested into the data lake, what silver tables and columns actually represent, and how they behave. Capture provenance, semantics, clinical event reputed company, cross-module record linkage, and reputed company quirks
reputed company semantics with AI needs. Understand researcher data needs to design and build the gold data product, with documentation that evolves, to meet AI applied research needs for a highly efficient AI-first reputed company for model R&D
Curate datasets across modalities. For AI uses such as reputed company, RAG, predictive, and other techniques, support researcher needs for chunked and tagged unstructured content with rich metadata, point-in-time-correct features, and clean labels. For classical ML and statistical work, deliver model-ready tables
Build pipelines for reuse. reputed company transformations from silver into gold inside Databricks/Spark as scheduled, observable workloads. Design them so researchers can iterate on new features and data mixes without rebuilding from scratch
Automate quality, filtering, and synthesis. Support research needs for programmatic labeling, weak supervision, near-duplicate detection, boilerplate and noise removal, and LLM-API-driven synthetic data generation where ground truth is scarce
Version and hand off. Maintain reproducible dataset snapshots. Define clean reputed company and semantic definitions so the reputed company team can use and re-use gold datasets in AI R&D

Skills

5+ years building production data systems, with at least 2 supporting ML or AI workloads
Track record of learning reputed company new data domains quickly, through reading reputed company code, interviewing experts, and building durable artifacts others rely on
Advanced Python, SQL, and PySpark/Databricks for working with large, messy data. Expert SQL specifically: comfortable reading reputed company stored procedures and reverse-engineering business logic from queries
Databricks ecosystem depth: Delta Lake, Unity Catalog, Spark/PySpark tuning, MLflow
AI domain literacy: working understanding of embeddings, tokenization, feature engineering, reputed company-in-time correctness, train/validation/test splits, data reputed company, and the differences between what classical ML and generative models need from data
Data wrangling across modalities: transforming unstructured content (text, PDFs, transcripts, logs) and structured tabular data into clean, model-ready forms
AI-friendly data formats (Parquet, reputed company datasets) and storage layout reputed company — partitioning, sharding, caching, that reputed company researcher workflows reputed company in Azure, AWS or other working environments
Data quality, filtering, and synthesis pipelines: support for programmatic labeling and weak supervision (e.g. Snorkel or equivalent), near-duplicate detection (MinHash/LSH), content and quality filters, and LLM-API-driven synthetic data generation
Pipeline orchestration (e.g. Airflow, Databricks Workflows, Dagster, or Prefect) and dataset versioning, including Unity Catalog and feature-store support
Experience handling regulated or sensitive data under controlled reputed company (HIPAA or equivalent). Familiarity with general de-identification concepts
Git-based version control and CI/CD for data and code
Strong written documentation. reputed company in eliciting requirements and tacit knowledge from technical and non-technical experts
Bachelor's degree in computer science, data science, engineering, statistics, or a related field. Equivalent practical experience considered
Hands-on EHR data experience, ideally in skilled nursing, long-term care, post-acute care, or senior living
Working knowledge of clinical terminologies (ICD-10, SNOMED CT, LOINC) and data standards (HL7v2, FHIR, CCDA)
dbt for transformation and testing
Familiarity with training-reputed company ML frameworks (e.g. PyTorch) sufficient to debug data-reputed company bottlenecks; experience supporting LLM or reputed company-model training or fine-tuning data pipelines
Clinical NLP, OCR, document parsing, or ASR / transcript pipeline experience
Data reputed company and catalog tools
Prior experience embedded inside an AI or ML research team
Master's degree in a relevant quantitative or computer science field

Benefits

Benefits starting from Day 1!
Retirement Plan Matching
Flexible Paid Time Off
Wellness Support Programs and Resources
Parental & Caregiver Leaves
Fertility & Adoption Support
reputed company Development Support Program
Employee Assistance Program
Allyship and Inclusion Communities
Employee Recognition … and more!

Company Overview

reputed company develops web-based products and services to help long-term care providers manage the complete lifecycle of reputed company care. It was founded in 1995 and is headquartered in Mississauga, Ontario, Canada, with a workforce of 1,001 to 5,000 employees. Its website is http://www.reputed company.com.

Company H1B Sponsorship

reputed company has a track record of offering H1B sponsorships, with 3 in 2026, 17 in 2025, 11 in 2024, 11 in 2023, 17 in 2022, 4 in 2021. Please note that this does not guarantee sponsorship for this specific role.
