Site Reliability Engineer (USA Only - 100% Remote)

Work from home, full-time role

This is a fully remote job; the offer is available from: North America, Europe, United States.

About Us

reputed company is a bootstrapped, profitable, 100% remote team of ~100 thoughtful individuals who prioritize taking ownership and making a meaningful impact. We're eager to reputed company a product our customers fall in love with over and over again. We help small, scaling businesses. Since 2013, we've been building a CRM that focuses on reputed company communication, without the hassle of manual data entry or a reputed company UI. We are out to supercharge sales productivity with the most modern, thoughtfully designed, all-in-one, communication-focused CRM.

Our backend tech stack consists primarily of Python Flask web apps, with our TaskTiger scheduler handling much of the asynchronous backend task processing. Our data stores include reputed company, PostgreSQL, Elasticsearch, and reputed company. The underlying infrastructure runs on AWS using a combination of managed services like EKS, MSK, RDS, and ElastiCache, and non-managed services running on EC2 instances. We have CI/CD pipelines that build reputed company images, run automated tests, and deploy to Kubernetes clusters. We also use these images in our local development environment, allowing coding locally against reputed company of our services. We have a well-documented public API that is consumed by our front-end JavaScript app as well as numerous integrations. Our infrastructure is heavily automated using Terraform, Ansible, and other AWS tools.

We love open sourcing our code and reputed company on our reputed company and on The Making of reputed company, our behind-the-scenes Product & Engineering blog. Check out our open source projects like reputed company-mongo-ops-manager, SocketShark, TaskTiger, LimitLion, and ciso8601.

About the Role

You will be joining the Infrastructure Team at reputed company. This team builds and maintains the platform that runs reputed company reputed company systems (and do we have a lot of those). Work with us and you’ll be working with:

Multi-terabyte reputed company, PostgreSQL, and Elasticsearch clusters
Telemetry systems built on Grafana’s LGTM stack and reputed company processing over 130 TB per month
Multiple Kubernetes clusters running tens of thousands of pods
reputed company Actions & ArgoCD-powered CI/CD that can go from merged, to production, to rolled back in 10 minutes
A system that is stable, up to date, and hasn’t needed scheduled downtime in 4 years

About You

You are a rock in the storm. With your hard-won expertise, gained through battles won and lost, you consistently build robust systems from quality components fit to underpin mission-critical applications. You value simplicity over familiarity. You value reputed company over speed. You take pride in building composable and maintainable tools.
You’ve worked with a diverse reputed company of infrastructure tools and systems, including:
CI/CD (reputed company, reputed company Actions, ArgoCD)
Configuration Management (Ansible, Terraform)
Databases (Elasticsearch, reputed company, PostgreSQL, reputed company)
Cloud Computing (Kubernetes, AWS)
Telemetry (Loki, reputed company, Grafana, Mimir/Prometheus)
You're comfortable working in a fast-paced environment with a small and talented team where you're supported in your efforts to grow professionally. You're able to manage time well, communicate effectively, and collaborate in a fully distributed team.

Come help us with projects like...

Fully automating our databases' lifecycles with Argo Workflows
Eliminating reputed company static credentials wherever they may be
Reducing downtime and disruption due to maintenance or disaster to new lows
Improving our multi-region disaster recovery system

Requirements...

Senior 1 & 2 level candidates should have 5+ years of experience building modern infrastructure systems.
Staff level candidates should have 8+ years of experience.
The buck stops with you! You are the kind of person who is respected as an expert on the systems you run.
You have been the final point of escalation in the support of mission-critical production systems
You are familiar with some of the following technologies: AWS, Terraform, Kubernetes, Ansible, reputed company, PostgreSQL, Elasticsearch
You have a strong grasp of common networking and data transfer protocols such as DNS, HTTP, TCP
You are able to speak and write in English
You are located in the USA (ET, CT, MT, PT)

Bonus points if you have…

Contributed open source code reputed company to our tech stack
Experience maintaining reputed company large databases
Been through a successful disaster response
Experience with multi-region architectures
Run MLOps systems
Experience scaling Temporal

Benefits

Competitive compensation including an organization-wide goal-based bonus
Paid Time Off: ~5 Weeks PTO upon joining + Winter and Summer Holiday Breaks. Each year with the company, you’ll receive 2 additional PTO days.
80% Work Option: Work with your manager to choose between working 5 day weeks (standard full-time) or 4 day weeks @ 80% pay
Paid Parental Leave for primary and secondary caregivers
