Site Reliability Engineer

Overview

Hybrid
$120,000 - $130,000
Full Time
Able to Provide Sponsorship

Skills

Documentation
Computer science
Continuous delivery
Continuous integration
DevOps
Amazon Web Services
Apache Velocity
AppDynamics
Cloud computing
Database
Docker
CHAOS
FMEA
High availability
Hosting
Jenkins
Kubernetes
Operational excellence
Java
Root cause analysis
Performance monitoring
Productivity
Programming languages
Python
Ruby
Scalability
Scripting
Splunk
Testing
Optimization

Job Details

Site Reliability Engineer (SRE)
Location: Mountain View, CA (Hybrid)
Job Type: Full-time

Job Description:

As a Site Reliability Engineer (SRE), you will design, implement, and maintain complex data systems that support millions of customers. You will apply cloud-native principles and best practices to ensure the high availability, security, performance, and scalability of database systems. This is a hands-on role that involves working with cutting-edge technologies and maintaining critical infrastructure.

Key Responsibilities:

  • Design, build, and maintain CI/CD pipelines in Jenkins.
  • Deploy services in Kubernetes clusters using Helm, Kustomize, and similar tools.
  • Implement infrastructure changes in AWS with a deep understanding of AWS services.
  • Participate in on-call duties for pre-production and production systems, supporting multi-million users.
  • Write and review RCA (Root Cause Analysis) documentation to prevent the recurrence of incidents and share learnings.
  • Contribute to system upgrades, deployment automation, monitoring enhancements, and production changes.
  • Create operational playbooks, write how-to articles, and gain domain knowledge to drive team improvements.
  • Participate in FMEA (Failure Mode and Effects Analysis) testing, chaos testing, and security remediation efforts.
  • Share best practices for operational excellence and cost optimization.
  • Automate processes to reduce manual efforts and increase efficiency.
  • Continuously look for opportunities to increase developer velocity and productivity.

Qualifications:

  • Bachelor's or master's degree in Computer Science or a related technical field, or equivalent experience.
  • 4+ years of hands-on experience with development and operations in AWS environments.
  • Expertise in performance monitoring, troubleshooting, and tuning.
  • Experience with AWS services and cloud hosting.
  • Proficiency in DevOps automation using scripting languages.
  • Experience with programming languages such as Java, Python, or Ruby.
  • Knowledge of Docker, Kubernetes, and ArgoCD.
  • Experience with monitoring and observability tools such as Splunk, Wavefront, AppDynamics, and Prometheus, as well as distributed tracing.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About The Wolf Works