Site Reliability Engineer II

Overview

On Site
USD 98,300.00 - 193,200.00 per year
Full Time

Skills

Operations
Innovation
Sensors
Computer hardware
Data centers
IMPACT
Collaboration
Accountability
Screening
PASS
Cloud computing
Mechanical engineering
Materials science
Electrical engineering
Data acquisition
Wireless networking
Microsoft Azure
Writing
C#
Windows PowerShell
Linux
Python
Reliability engineering
IC
Integrated circuit
Internal communications
Legal
Recruiting
Microsoft
Customer experience
Training
Evaluation
Leadership
Automated testing
Management
Incident management
Data
Test equipment
Reporting
QoS
Quality assurance
Network
Scripting
Testing

Job Details

Microsoft's Cloud Operations & Innovation (CO&I) group is looking for a Site Reliability Engineer II to support the Commissioning (Cx) Automation and Global Cx teams to deploy, monitor, and troubleshoot a distributed test platform. The platform is globally deployed and consists of client and cloud-based applications, custom hardware, wired / wireless networks, and sensor networks that automate the measurement and validation of hardware and electrical components and interconnected systems within large datacenters. Our infrastructure supports more than 1 billion customers and 20 million businesses in over 90 countries worldwide.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required/Minimum Qualifications
  • 7+ years relevant technical engineering experience
    • OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 3+ years technical engineering experience
    • OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 2+ years technical engineering experience.

Other Requirements:
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.


Additional or Preferred Qualifications
  • Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience
    • OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 3+ years technical engineering experience
    • OR Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field
  • Experience working on large-scale distributed test systems or high-speed data acquisition systems
  • Experience setting up and troubleshooting wired and wireless networks
  • Demonstrated proficiency in deploying and monitoring Azure-based services
  • Experience writing code to automate day-to-day tasks with proficiency in C#, PowerShell, Linux, or Python


Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area, where the base pay range for this role is USD $127,200 - $208,800 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until September 3, 2024.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Responsibilities

  • Configure, monitor, and support the test platform used by the Global Commissioning Team
  • Establish and maintain the Cx Automation lab as the environment for training and testing new applications
  • Perform technical evaluation of new devices and test instruments
  • Lead projects in the lab to add or update test automation or device simulation capabilities
  • Establish and oversee the Incident Management processes for the team
  • Develop an understanding of features and operation of all software products and test equipment
  • Participate in on-call rotations and alert product teams to major customer-impacting issues
  • Analyze telemetry data to identify opportunities to improve the reliability and performance of the platform
  • Leverage and contribute to troubleshooting tools for common problems
  • Evaluate and test new applications and test equipment prior to global deployments
  • Develop reporting on quality of service and on usage of the application and test instruments
  • Troubleshoot and repair test devices or network equipment returned from the field
  • Develop code or scripts that reduce the setup and overall testing time
  • Embody our Culture and Values