Lead Site Reliability Engineer

Bayone Solutions Inc | Beverly Hills, CA (Onsite) | Full-Time

Job Description:

  • As a Senior/Lead Site Reliability Engineer, you'll take ownership of the reliability, performance, and scalability of high-traffic retail platforms.
  • This role demands deep experience in cloud-native environments, a strong observability mindset (with New Relic as a must), and the ability to lead both incident response and system design discussions with client teams.
  • You'll serve as a technical leader and mentor, partnering with engineering, DevOps, and product teams to build resilient systems for real-time retail operations, including eCommerce platforms like Shopify (bonus).
Key Responsibilities:
  • Lead reliability and observability strategy for large-scale retail systems.
  • Architect and implement robust monitoring using New Relic: dashboards, SLOs, alerts, synthetic monitoring, etc.
  • Guide incident response processes and run blameless postmortems.
  • Own availability, performance, and scalability of customer-facing apps and services.
  • Design infrastructure for high availability using Kubernetes, Docker, and IaC tools (Terraform, CloudFormation).
  • Collaborate with client engineering teams to optimize system behavior during retail surges (e.g., Black Friday).
  • Mentor junior SREs and set operational best practices.
  • Partner with dev and QA to integrate performance testing and failure injection into CI/CD workflows.
  • Advocate for DevOps/SRE best practices (shift-left monitoring, chaos testing, performance budgets).
Required Qualifications:
  • 8+ years in Site Reliability Engineering, DevOps, or Platform Engineering.
  • Expertise with New Relic; must be able to architect observability end-to-end.
  • Proven experience supporting retail or eCommerce platforms at scale.
  • Strong coding/scripting (Python, Bash, or Go).
  • Production experience with AWS/GCP/Azure and Kubernetes.
  • Deep understanding of infrastructure automation (Terraform, Ansible, or Pulumi).
  • Strong communication skills, client-facing presence, and leadership ability.
Nice to Have:
  • Experience with Shopify or headless commerce stacks.
  • Experience leading distributed teams.
  • Familiarity with traffic-heavy retail events and strategies (caching, autoscaling, edge optimization).
  • Experience integrating monitoring into microservices, APIs, and frontend apps.

Job Snapshot:
  • Employee Type: Full-Time
  • Location: Beverly Hills, CA (Onsite)
  • Job Type: Other
  • Experience: Not Specified
  • Date Posted: 06/30/2025
  • Job ID: 25811228
