Site Reliability Engineering - Technical Duty Officer
Location: San Diego, CA
This position reports to the Manager of Technical Duty Officers in AMS. ServiceNow is changing the way people work. With a service orientation toward the activities, tasks, and processes that make up day-to-day work life, we help the modern enterprise operate faster and scale more than ever before. We're disruptive. We work hard but try not to take ourselves too seriously. We are highly adaptable and constantly evolving. We are passionate about our product, and we live for our customers. We have high expectations, and a career at ServiceNow means challenging yourself to always be better.
Who is the TDO?
The Technical Duty Officer (TDO) team provides leadership to a talented Site Reliability Engineering (SRE) group that keeps our worldwide cloud service available. We provide training and support to the operations teams on all issues affecting our infrastructure. The TDO communicates across the organization to drive necessary changes and executes initiatives with rigorous determination.
What you get to do in this role:
Leverage your extensive system, network, and database skills to provide technical leadership for a team of on-site engineers who are responsible for the availability and performance of ServiceNow's cloud platform.
Lead as the crisis manager during all major outages and provide technical input to the teams engaged in remediation.
Drive organization-wide change by participating in post-incident reviews, approving new architectural designs, and building strong relationships with many cross-functional teams.
Make operations more effective by continually training and mentoring the team on all aspects of the operational environment.
Build requirements for new procedures and automations, and verify that these new services meet our needs before they are released to the production environment.
Coordinate all recovery efforts to provide rapid relief and resolution to any issue that could be impacting the operational environment.
What should you know to be successful?
An in-depth understanding of the technology associated with operating a service or platform in the cloud, including datacenters, systems, networks, load balancers, applications, and relational databases.
Familiarity with networking technologies such as routing, switching, DNS, load balancing, and CDN.
Working knowledge of Bash, Python, Perl, or other scripting languages.
Experience with MariaDB/MySQL configuration, SQL query analysis, and database performance techniques.
Solid *nix systems administration, network administration and application layer experience.
Excellent collaboration skills across diverse cross-functional teams.
Proven ability to effectively promote your ideas and obtain buy-in from stakeholders.
Meticulous analytical skills to identify and understand the root cause of critical issues.
3-5 years of technical leadership experience.
Bachelor's degree in Computer Science or Information Systems or equivalent technical discipline, or similar work experience in an enterprise 24/7 production environment supporting critical, real-time applications.
Strong understanding of Internet protocols, web technologies, and operating systems.
More about this role:
We provide competitive compensation, generous benefits, and a professional atmosphere. This is a highly collaborative and inclusive work environment where individuals with strong aptitude will have the opportunity to grow their careers by working with some of the most advanced technology in the industry.