
AI/ML Developer (Speech AI)

Work from home | Full-time role | Hiring

Company Background

Our client is a technology startup building advanced voice automation solutions for the quick-service restaurant industry. The company develops a privacy-conscious, high-performance drive-thru voice assistant that automates real-time customer interactions, improves order accuracy, and helps restaurant chains increase revenue and operational efficiency. Its product is designed for fast deployment in noisy, high-volume drive-thru environments and is already gaining market traction through cooperation with a major restaurant chain.

Project Description

The project is a next-generation voice automation engine for drive-thru order-taking in quick-service restaurants. It enables fully automated, real-time conversations between customers and the restaurant ordering system using modern speech recognition, natural language processing, and text-to-speech technologies. The engineer will work on improving the core voice and audio capabilities of the platform, including Speech-to-Text, Text-to-Speech, noise cancellation, speech enhancement, and real-time audio pipelines. The main focus will be on reducing latency, improving recognition quality and speech clarity, and making the system robust in extremely noisy real-world environments such as drive-thrus.

Technologies

Speech-to-Text, Text-to-Speech
Audio Engineering / DSP
Noise Suppression, Voice Activity Detection, Signal Processing
PyTorch / TensorFlow
Real-Time Inference, Streaming Pipelines
GPU Optimization, Edge Inference
Production ML Systems

What You'll Do

Optimize low-latency, real-time Speech-to-Text pipelines for production drive-thru environments;
Improve Text-to-Speech naturalness, responsiveness, and overall conversational quality;
Design, tune, and improve noise suppression, echo cancellation, and speech enhancement systems;
Improve speech recognition accuracy and robustness under challenging acoustic conditions, including engine noise, weather, overlapping speech, poor microphone quality, and outdoor environments;
Build and scale audio processing infrastructure for production deployments;
Evaluate, benchmark, and compare speech models using real-world audio data and production scenarios;
Experiment with modern Speech AI technologies, models, and architectures to improve system performance;
Collaborate with LLM and conversational AI teams to improve end-to-end voice interaction quality.

Job Requirements

Advanced Python development skills;
Deep hands-on expertise with Speech-to-Text and Text-to-Speech systems;
Proven experience improving speech recognition quality in noisy or otherwise challenging acoustic environments;
Strong expertise in noise suppression, echo cancellation, voice activity detection, and speech enhancement;
Strong understanding of real-time and streaming audio architectures, including conversational voice pipelines and real-time inference;
Experience building low-latency, production-grade AI systems;
Experience with modern speech AI frameworks, models, and APIs;
Experience deploying and scaling AI services in cloud environments;
Ability to troubleshoot complex audio quality, latency, and reliability issues;
Product-oriented mindset with a focus on real-world performance, customer experience, and high ownership;
Ability to collaborate effectively with engineering, LLM, and conversational AI teams;
English level: B2 or higher.

What Do We Offer

The global benefits package includes:

Technical and non-technical training for professional and personal growth;
Internal conferences and meetups to learn from industry experts;
Support and mentorship from an experienced employee to help you grow and develop professionally;
Health insurance;
English courses;
Sports activities to promote a healthy lifestyle;
Flexible work options, including remote and hybrid opportunities;
Referral program for bringing in new talent;
Work anniversary program and additional vacation days.
