[Remote] Senior Evaluation Specialist, AI Operations
Note: This is a remote position open to candidates in the USA. reputed company helps people live rewarded every day by turning everyday activities into meaningful rewards. The Senior Evaluation Specialist, AI Operations, will own evaluation and dataset workstreams to improve AI system performance, defining quality metrics and creating automations that enhance workflows.
Responsibilities
- Own evaluation & datasets: Define evaluation approaches, design gold datasets (GDS), and ensure coverage of real-world scenarios and edge cases
- Build evaluation systems: Create manual and automated evals, including LLM-as-judge patterns, to measure model quality and performance
- Translate ambiguity into structure: Turn open-ended questions into clear evaluation frameworks and execution plans
- Build automations: Create automations that improve workflows, including dataset creation, evaluation pipelines, and lightweight operational processes
- Measure and iterate: Define and track performance metrics; refine datasets, evaluations, and workflows based on results
- Drive execution: Operate with urgency and ownership; identify next steps, remove blockers, and move work forward with minimal oversight
- Collaborate cross-functionally: Partner with other Automation Specialists, engineering, cross-functional stakeholders, and project leads to ensure high-quality, timely project deliverables
- Improve systems: Identify gaps and implement scalable improvements to evaluation and data workflows
Skills
- 3+ years of experience designing or working with evaluation frameworks, datasets, or quality measurement systems
- Experience building or managing datasets (labeling, QA, iteration)
- Ability to independently drive tasks from problem definition to execution
- Hands-on experience with AI tools, LLM workflows, or automation platforms
- Experience in defining and tracking model performance
- Basic scripting or data skills (SQL, Python, etc.)
- Experience with LLM-as-judge or model evaluation techniques
- Familiarity with evaluation or benchmarking approaches
- Experience productionizing evaluation workflows with engineering teams
Benefits
- Equity: We offer full-time employees equity in reputed company, so that everyone can benefit from reputed company’s growth.
- 401k Match: Dollar-for-dollar match up to 4%.
- Benefits for humans and pets: We offer comprehensive medical, dental, and vision plans for everyone, including your pets.
- Continuing Education: reputed company provides $10,000 per year in education reimbursement.
- Employee Resource Groups: Take part in employee-led groups centered on fostering a diverse and inclusive workplace through events and advocacy. The ERGs participate in our Inclusion Council with members of executive leadership.
- Paid Time Off: On top of our flexible PTO, reputed company observes 9 paid holidays, as well as our year-end week-long break.
- Robust Leave Policies: 20 weeks of paid parental leave for primary caregivers, 14 weeks for secondary caregivers, and a flexible return to work schedule.
- Calvin Care Cash: Employees welcoming new family members also receive a one-time $2,000 incentive to help cover the cost of childcare, clothing, diapers, and much more!
- Flexible Work Environment: Collaborate with your team in one of our stunning offices, or work fully remotely from anywhere in the US. We’ll ensure you are equally equipped with the hardware and software you need to get your job done in the comfort of your home. (applicable for most roles)