Site Reliability Engineer - National Remote

Overview

Remote
USD 70,200.00 - 137,800.00 per year
Full Time

Skills

DoD
Medicare
Medicaid
IT operations
Change management
Design review
Estimating
Planning
Testing
Information systems
Failure analysis
Design
Performance improvement
Program management
Incident management
Leadership
Performance tuning
Continuous improvement
Operations
Workflow
Capacity management
Forecasting
Resource allocation
Documentation
Knowledge sharing
Presentations
Training
Management
Dashboard
Real-time
Business requirements
Systems analysis
Optimization
Product management
Security controls
Performance management
Release management
Configuration Management
Reliability engineering
Emerging technologies
Electronic engineering
Service level
Root cause analysis
Pega
Appian
Microsoft
Scalability
Software engineering
System administration
Software development
Disaster recovery
ITIL
Computer science
Software architecture
Cloud computing
Customization
Software performance management
Dynatrace
Splunk
AppDynamics
Performance monitoring
Salesforce.com
Problem solving
Collaboration
Internet
Telecommuting
Communication
Policies

Job Details

The government services support team at Optum has earned the trust of organizations that our entire country relies on: from the Department of Defense and the Department of Veterans Affairs to the teams at Health & Human Services and the Centers for Medicare and Medicaid Services. We're repaying that trust with hard work, new ideas, and a commitment to finding better solutions every day. Join us and help create ways for our government services agencies to be more efficient and effective. This could be the next big step in your career: start Caring. Connecting. Growing together.

As a Site Reliability Engineer (SRE), you will use software engineering to automate critical IT operations tasks, including production system management, change management, and incident response. You will be responsible for design review and control; reliability prediction, estimation, and apportionment methodology; failure mode and effects analysis (FMEA); and the planning, operation, and analysis of reliability testing and field failures. You will also develop and administer reliability information systems that support failure analysis, design and performance improvement, and reliability program management across the entire product life cycle. You will help ensure swift incident response and scalable emergency handling, fostering greater reliability and resilience in complex systems. You will support our efforts to optimize system performance and ensure the reliability of our technology ecosystem.

You'll enjoy the flexibility to telecommute* from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities:

  • System Reliability and Incident Management: Ensure the reliability, availability, and performance of services. Respond to, troubleshoot, and resolve service outages or degradation. Lead post-incident reviews and drive root cause analysis and mitigation
  • Monitoring and Performance Tuning: Develop and maintain advanced monitoring and alerting systems to detect and mitigate issues proactively. Continuously measure and optimize system performance, identifying bottlenecks and points of failure
  • Continuous Improvement: Advocate for and implement changes to improve system reliability and scalability. Innovate new ways to manage and automate operations tasks
  • Collaboration and Advocacy: Work closely with development teams to incorporate best practices and influence architecture, code health, and operational processes. Promote a culture of shared responsibility for production stability and performance. Integrate SRE principles into the engineering workflow
  • Capacity Planning and Scalability: Forecast and plan for infrastructure needs. Implement scalable systems and resource allocation strategies to handle growth and peaks in demand
  • Documentation and Knowledge Sharing: Create and maintain detailed documentation of the systems, processes, and procedures. Facilitate knowledge sharing through regular technical presentations and training sessions
  • Configure, implement, manage, and optimize end-to-end application performance monitoring (APM) solutions, with a focus on Dynatrace, AppDynamics, Splunk, or other relevant tools
  • Work closely with IT teams to seamlessly integrate APM solutions into the existing infrastructure and applications
  • Develop and maintain customized dashboards, reports, and alerts to offer real-time insights into the health and performance of the system
  • Collaborate with diverse teams to understand business requirements and configure APM solutions to meet performance monitoring needs
  • Conduct system analysis, troubleshooting, and optimization across various applications and infrastructure components
  • Support internal stakeholders and support teams with configuration tuning, troubleshooting, and tool-specific nuances
  • Manage performance continuously: measure it and work with stakeholders to improve it
  • Build quality frameworks that give stakeholders a feedback loop for easier, improved APM product management, system patching, and implementation of security controls
  • Document automation procedures to improve the velocity and quality of the effort
  • Handle continuous performance management, software release management, configuration management, and transition to stakeholders
  • Request feedback from teams, perform tool implementation assessments, and offer recommendations to enhance system reliability and responsiveness
  • Stay abreast of industry best practices and emerging technologies in APM, ensuring our monitoring strategies align with the latest advancements

You'll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications:

  • Bachelor's degree in computer science, electronics engineering, or another engineering or technical discipline (6 years of additional relevant experience may be substituted for education)
  • 4+ years of experience as a Site Reliability Engineer or in a related role
  • 4+ years of experience monitoring software performance in terms of service-level agreements (SLAs), service-level indicators (SLIs), and service-level objectives (SLOs)
  • 4+ years of experience with APM features such as real user monitoring, synthetic monitoring, and effective root cause analysis
  • 4+ years of experience with one or more of the following platforms: Salesforce, Pega, Appian, Microsoft Power Platform
  • Experience working to ensure the scalability, performance, and reliability of large-scale, cloud-based applications and infrastructure
  • Knowledge of how SREs combine software engineering and systems administration, applying coding, automation, and engineering principles to build resilient, self-healing systems that scale seamlessly
  • Ability to detect issues, handle failures automatically, prepare disaster recovery plans, keep systems up and reliable, and remediate broken systems so they do not cause future disruptions

Preferred Qualifications:

  • ITIL Foundation Certification
  • Bachelor's in computer science or equivalent technical degree
  • Understanding of application architecture, infrastructure, and cloud environments
  • Proficiency in configuring and customizing multiple APM tools like Dynatrace, Splunk, AppDynamics for optimal performance monitoring
  • Additional certifications (e.g., Salesforce Developer, Quality Engineer Certification CQ) are highly desirable
  • Strong problem-solving skills, including the ability to analyze complex systems and identify performance bottlenecks
  • Excellent communication skills to collaborate effectively with cross-functional teams and convey technical concepts to non-technical stakeholders
  • Must have reliable internet service that allows for effective telecommuting.
  • Must be eligible to work in the United States.
  • Must be able to obtain and maintain a government security Public Trust 2 or 4 (level will depend on your role)
  • All work must be conducted in the United States.
  • Must be able to communicate both verbally and in written form.
  • Must be able to work and be available on VA communication channels during EST business hours

*All Telecommuters will be required to adhere to UnitedHealth Group's Telecommuter Policy.

California, Colorado, Nevada, Connecticut, New York, New Jersey, Rhode Island, Hawaii, Washington, or Washington, D.C. Residents Only: The salary range for California, Colorado, Nevada, Connecticut, New York, New Jersey, Rhode Island, Hawaii, Washington, or Washington, D.C. residents is $70,200 to $137,800 per year. Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. UnitedHealth Group complies with all minimum wage laws as applicable. In addition to your salary, UnitedHealth Group offers benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase, and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with UnitedHealth Group, you'll find a far-reaching choice of benefits and incentives.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Application Deadline: This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. Job posting may come down early due to volume of applicants.

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location, and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health, which are disproportionately experienced by people of color, historically marginalized groups, and those with lower incomes. We are committed to mitigating our impact on the environment and to enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.

Diversity creates a healthier atmosphere: UnitedHealth Group is an Equal Employment Opportunity / Affirmative Action employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, national origin, protected veteran status, disability status, sexual orientation, gender identity or expression, marital status, genetic information, or any other characteristic protected by law.

UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.
