Overview
On Site
Full Time
Skills
Value Engineering
Quality Assurance
Reliability Engineering
IT Operations
IT Infrastructure
System Monitoring
Apache Velocity
Incident Management
Software Development
FOCUS
Scalability
Root Cause Analysis
Capacity Management
Service Level
Management
Continuous Integration and Development
Continuous Integration
Continuous Delivery
Emerging Technologies
Mentorship
Computer Science
Information Technology
Python
Java
C#
Systems Design
Computer Networking
Amazon Web Services
Google Cloud Platform
Google Cloud
Orchestration
Docker
Kubernetes
Terraform
Ansible
Puppet
Progress Chef
Scripting
Windows PowerShell
Command-line Interface
Bash
Disaster Recovery
High Availability
Cloud Computing
DevOps
Performance Analysis
Analytics
Microsoft Azure
Splunk
Grafana
Problem Solving
Conflict Resolution
Communication
Organizational Skills
Collaboration
Recruiting
Privacy
DICE
Job Details
Since 2012, we've grown to become one of the leading single-family rental companies and homebuilders in the country, recently recognized as a top employer by Fortune and Great Place To Work. At AMH, our goal is to simplify the experience of leasing a home through professional management and maintenance support, so our residents can focus on what really matters to them, wherever they are in life.
The Site Reliability Engineer will work at the intersection of SecOps, DevOps, Quality Assurance, and IT operations teams by leveraging technical and interpersonal skills to design, build, and maintain scalable and resilient systems. Strikes a balance between development velocity and system reliability. Leverages engineering and IT operations expertise to identify and execute solutions that remediate blind spots, structural weaknesses, and performance, velocity, and cost issues in infrastructure and systems. Selects and utilizes software tools to automate IT infrastructure tasks such as system management and application monitoring. Responsible for the systems monitoring/observability platform, enabling rapid incident response, remediation, and service restoration. Owns the end-to-end postmortem process, including Root Cause Analysis and, most importantly, defining and implementing action plans to prevent incident recurrence. Continuously assesses the technology ecosystem to discover opportunities to improve and optimize performance, operational effectiveness, cross-team collaboration, security posture, and delivery velocity.
Responsibilities:
- Design, develop, streamline, and deploy automation tools and frameworks to enhance the velocity, reliability, and efficiency of Azure-hosted services.
- Implement and maintain monitoring, alerting, and incident response processes to ensure timely detection and resolution of issues, ideally before they impact users.
- Collaborate with software development teams to design and implement applications with a strong focus on reliability, scalability, security, and performance.
- Perform root cause analysis of incidents and implement preventive measures to avoid similar issues in the future.
- Work on capacity planning and scaling strategies to accommodate growing user bases and increasing workloads.
- Define service level indicators, objectives, and agreements to continuously measure and manage system performance, ensuring service quality meets business needs.
- Continuously improve deployment pipelines and implement best practices for continuous integration and continuous deployment (CI/CD).
- Stay current with industry trends and emerging technologies, integrating relevant ones into the organization's practices.
- Provide mentorship and guidance to junior engineers and actively share knowledge within the team.
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- At least five (5) years of experience in a Site Reliability Engineer, DevOps, or similar role is a plus.
- Proficiency in at least one programming language (e.g., Python, Go, Java, C#) for scripting and automation tasks.
- Strong understanding of system design, networking, and distributed systems principles.
- Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud).
- Hands-on experience administering Azure, along with a strong understanding of core Azure services, workloads, subscriptions, and security.
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Experience with Infrastructure as Code automation technologies (e.g., Terraform, Ansible, Puppet, Chef).
- Experience with scripting tools (e.g., PowerShell, CLI, Bash).
- Experience with developing and implementing disaster recovery and high-availability solutions and processes.
- Certifications related to cloud platforms and DevOps practices are advantageous.
- Azure DevOps Engineer, Solution Architect, and/or Support Engineer certification is highly desired.
- Knowledge of monitoring and logging tools for observability and performance analysis (e.g., Azure Monitor, Log Analytics, Azure Data Explorer, Splunk, Grafana, Opsgenie).
- Excellent problem-solving and troubleshooting skills, with a proactive and solution-oriented mindset.
- Strong collaboration and communication skills (both written and verbal), with the ability to work effectively in cross-functional teams and explain technical concepts to both technical and non-technical stakeholders.
- Excellent planning and organizational skills.
- Entrepreneurial spirit and willingness to take prudent risks.
- Ability to interact effectively at all levels.
- Strong customer, quality, and results orientation.
- Ability to be an effective member of project teams.
Build your career with us:
At AMH, we know what it takes to feel at home. That's not just our product; it's also our culture. We work to maintain a people-first culture of trust, belonging, and inclusion, where our employees are empowered to collaborate and take initiative. If you're ready to elevate your career, we hope you'll consider making your home with us. Apply today and a member of our Talent Acquisition team will reach out soon! To learn more about our workplace, please visit amh.com/careers.
CA Privacy Notice: To learn more about what information we collect when you apply for a job, and how we use that information, please see our CA Job Applicant Privacy Notice.