Dynatrace Admin / Operations Awareness Manager / SRE

Overview

On Site
$80,000 - $120,000
Full Time

Skills

Dynatrace

Job Details

Role: Dynatrace Admin

Location: Fort Worth, TX 76131

Employment Type: Full-time

An Operational Awareness or Monitoring and Alerting Engineer is a specialized IT professional responsible for the design, implementation, and management of monitoring and alerting systems for an organization's IT infrastructure. Their primary goal is to ensure the continuous availability, reliability, and performance of critical systems and applications. By leveraging various monitoring tools and technologies, they proactively identify and address potential issues before they impact business operations.

Key Responsibilities:

System Monitoring: Implement and maintain monitoring solutions to track the performance, health, and availability of IT systems, applications, and networks.

Alert Management: Configure and manage alerting mechanisms to ensure timely notifications of any anomalies, failures, or performance degradations.

Incident Response: Collaborate with support and operations teams to analyze and resolve events, and lead the resolution process during incidents and outages.

Root Cause Analysis: Conduct thorough investigations to determine the root cause of incidents and implement corrective actions to prevent recurrence.

Optimization: Identify opportunities for system optimization and performance improvements through data analysis and trend identification.

Tool Evaluation and Integration: Evaluate, recommend, and integrate new monitoring and alerting tools and technologies to enhance the organization's monitoring capabilities.

Documentation and Reporting: Develop and maintain comprehensive documentation, including monitoring configurations, incident reports, and performance metrics.

Collaboration and Communication: Work closely with various IT teams, including application, infrastructure, and DevOps teams, to ensure seamless operations and effective communication during incidents.

Skills and Qualifications:

Proficiency in monitoring and alerting tools (e.g., Dynatrace, Datadog, CloudWatch, Splunk).

Strong understanding of IT infrastructure, including servers, networks, databases, and cloud environments.

Some experience with incident, problem, and change management processes is a plus.

Ability to analyze complex systems and identify performance bottlenecks.

Excellent troubleshooting and problem-solving skills.

Effective communication and collaboration skills.

Familiarity with ITIL best practices and service management frameworks.

Performance of Duties:

Operate in a 7-day/24-hour environment with after-hours support flexibility.

Collaborate with internal teams and suppliers to lead event resolution across all mission-critical IT and Telecom service levels.

Protect business system availability through integrated incident, problem, and change management.

Monitor systems for faults and optimization opportunities.

Assist the major incident response team and escalate critical events.

Evaluate and improve monitoring/alerting tools and processes.

Conduct technical root cause analysis and engage with management teams for internal issues.

Identify potential business-impacting events and manage incident processes.

Provide expert guidance during reviews and debriefs.

Analyze problem trends and monitoring-tool data to identify chronic issues.

Communicate effectively with senior management.

Qualifications:

Experience with Dynatrace, AppMon, Zabbix, SCOM, Datadog, CloudWatch, X-Ray, and Splunk.

Self-motivated and able to work in a 7x24 environment.

Experience managing critical system outages and interacting at all organizational levels.

On-call support availability.

Preferred Qualifications:

B.S. degree in Computer Science, Information Systems, or Engineering.

Technical expertise in distributed systems/administration and general scripting/programming (Python, Node.js, Ruby, Perl, Bash/sh).

Excellent writing and communication skills.

ServiceNow experience.
