Reporting to the Senior Director of Change, Process, and Tooling in the Residential Reliability Engineering Organization, this individual will have primary responsibility for driving business results through data mining, research, and investigation into problem/incident management metrics. They will also generate insights by conducting extensive analyses of Comcast's rich data. In the process, they will develop a deep understanding of the Residential Reliability Engineering Organization and its trends, along with the ability to provide actionable recommendations based on findings. This person will be responsible for the development, implementation, and continued improvement of reactive and proactive ITIL-standard Problem Management practices. They will work to identify recurring operational issues, determine root cause, and develop and implement problem solutions (including the shaping of the release process) to prevent recurrence of issues. They will construct methodologies for measuring success and identifying improvement opportunities, using key metrics such as ticket state transitions, time in queue, and time to resolve.
Relying heavily on Incident, Change, and Problem ticket data, this individual will also be responsible for aggregating and analyzing telemetry data across the Residential Reliability Product group, building dashboards, and connecting related data points to create a cohesive picture of the health of our Residential Operational capabilities to triage incidents and deploy changes.
The right candidate must value collaboration and transparency in supporting the development of a culture, and associated processes, focused on blameless incident review. The aim is to identify opportunities for permanently solving problems and known errors, and to drive analysis that identifies and proactively solves underlying problems before they impact customers.
Mine and analyze information from company databases used to track incidents, changes, and problems to provide insights, surface themes, and drive optimization.
Develop reports, dashboards, and processes to monitor and analyze performance and data accuracy.
Assess the effectiveness and accuracy of new data sources and data gathering techniques
Identify, analyze, and interpret trends or patterns in complex data sets.
Locate and define new process improvement opportunities
Provide data-driven insight to improve the success of periodic and specific product changes
Execute proactive and reactive problem management analysis to minimize future problems.
Monitor trouble tickets to identify impacted system and application outages that require follow-up analysis.
Drive data collection for reviews of major outages and track problem tasks to completion.
Engage with appropriate technical and business teams to assist in the resolution of problem records
Review Problem Management policy and knowledge documentation on a regular basis to ensure relevance and accuracy
Facilitate task forces aimed at addressing a problematic issue with an unknown root cause.
Manage queues within operations platform and monitor all open and aging cases daily.
Develop and review compliance metrics and KPIs to identify areas to mature the problem process, policies, and training material.
Leverage operations platform performance analytics and reports to identify repeat incidents
Bachelor's Degree or equivalent
Engineering, Computer Science
Generally requires 6-9 years related experience
Problem or Incident Management experience
Root Cause Analysis techniques
Familiarity with the ServiceNow tool; configuration of reports and dashboards.
Metrics aggregation from multiple sources using SQL.
Familiarity with metrics graphing tools such as Tableau or similar BI tools.
Understanding and parsing large and disparate data sets.
Comcast is an EOE/Veterans/Disabled/LGBT employer