[Remote] Staff Site Reliability Engineer, Security
Note: This is a remote position open to candidates in the USA. Stord is The Consumer Experience Company, focused on enhancing checkout experiences for leading brands. They are seeking a Staff Site Reliability Engineer with a focus on security to build and scale security programs, integrate automation, and establish continuous posture monitoring in their GCP environment.
Responsibilities
- Assess and harden Stord's GCP footprint (GKE, IAM, Cloud Armor), and codify the baseline in Terraform and policy-as-code where it makes sense
- Build continuous posture monitoring against that baseline, with a published gap list and remediation schedule
- Drive the evaluation, integration, and rollout of new security tooling as the program matures
- Establish and automate the vulnerability and dependency remediation workflow across engineering teams: triage cadence, ownership model, severity-based SLAs, and the tracking infrastructure that drives closure
- Own Dependabot configuration and triage workflows across our GitHub organization, plus secret scanning, push protection, and response workflows for any secrets that surface
- Build supply-chain controls into CI/CD: provenance, dependency review, lockfile policies, build attestation where it pays off
- Wire container image scanning and DAST/network scanning programs into the same workflow so vulnerabilities don't slip through the cracks between layers
- Build security capabilities that the broader SRE team can run as part of their normal operating model: Terraform modules, Cloud Armor rules, Istio authorization policies, Cloudflare configuration, scanner pipelines, and custom automation that fills gaps in off-the-shelf tooling
- Ship documentation, runbooks, and self-service tooling that make your designs portable to the rest of the team, so the program continues to function smoothly through handoffs and rotations
- Set the engineering bar for security work inside SRE: code review standards, IaC patterns, "secure by default" templates for new services
- Partner cross-functionally with engineering teams on app security questions, IT on identity and endpoint boundaries, and IT/compliance on occasional SOC 2 evidence pulls, without owning those domains
Skills
- Deep GCP and GKE security experience. You've hardened production Kubernetes on GCP: workload identity, RBAC, network policies, Pod Security Standards, image provenance. You know where the sharp edges are and which knobs actually matter
- Dependabot and secret scanning at scale. Hands-on with Dependabot configuration, triage workflows, and remediation tracking. Comfortable rolling out GitHub secret scanning organization-wide, including push protection and response workflows for found secrets
- CI/CD supply chain hardening. You've designed or operated controls against the threat model that produced Shai-Hulud, XZ, and SolarWinds. Familiar with SLSA, provenance, Sigstore, and the trade-offs between rigor and developer friction
- Cloud security posture management in practice. You've stood up CSPM (built-in, commercial, or open source), defined a baseline, and driven remediation, with an eye for separating real signal from dashboard noise
- Infrastructure-as-code and automation fluency. Comfortable with Terraform for cloud resources and writing code (Python, Go, shell, or similar) to automate security workflows, integrate tools, and build in-house capabilities when off-the-shelf options fall short
- Systems-level technical fluency. You can reason about how the platform pieces fit together (GKE workloads, networking, edge, CI/CD) and debug security-relevant infrastructure problems alongside the broader SRE team
- Track record of designing for operability. You've shipped tools and workflows that other engineers actually adopt and rely on day-to-day
- Ownership & Accountability. You own features end-to-end and take pride in what you ship. You follow through from design to production and don't drop things
- Strong Communication. You can explain technical decisions and trade-offs to engineers, PMs, and stakeholders. You ask good questions and listen well
- Collaborative Approach. You work well with others, give constructive code review feedback, and actively seek input from teammates
- Production Mindset. You prioritize reliability and user impact. You think about failure modes, monitoring, and operational concerns as part of your design process
- Learning Agility. You're comfortable with rapidly evolving AI/ML technologies and tools. You stay current without chasing hype
- Directed AI-Assisted Development. You know how to use AI coding tools as a productivity multiplier while maintaining quality and your own technical judgment
- Container and image scanning. Production experience integrating image scanners into CI/CD and registry workflows, with thoughtful handling of vulnerability data freshness and triage
- DAST and network scanning programs. OWASP ZAP, nmap, or commercial equivalents, built into a repeatable internal audit cadence rather than one-off exercises
- Cloudflare edge security. WAF rules, rate limiting, bot management, and how that fits with origin-side Cloud Armor
- Detection engineering on GCP. Logs Explorer, BigQuery-backed security analytics, and alert tuning that keeps the on-call experience humane