Senior Major Incident Manager

Oracle Seattle, WA 98113

Posted 3 months ago

Responsible for the operation of production environments, including systems and databases, supporting critical business operations. Will perform administration and analysis for multiple production environments and recommend new and novel solutions to improve availability, performance, and supportability. This is an opportunity to bring a combination of deep technical knowledge with administration/analysis knowledge of Oracle's Cloud Infrastructure to provide escalation support to a wide range of complex production environment problems related to immense growth, scaling, leveraging the cloud, extremely high performance, and high availability requirements.

Install, monitor, maintain, support, and optimize all production server hardware and software. Provide escalated technical support for complex technical issues which may include leading problem management cases and providing management status. Coordinate escalated support cases and lead appropriate internal technical resources and/or third party vendors to resolution and coordinate a storage infrastructure of Oracle system and database appliances. Responsible for Oracle production environments; assist with server operating system and application upgrades, bug fixes, and patching; and work on standardization projects for both hardware and software under the Oracle technology stack while providing consistent system uptime as expected in a Cloud environment. Provide on-call support, on a rotating basis.

BS degree in Computer Science, Information Systems, or equivalent work experience; 5 years or more of related technical experience may be substituted for degree requirements, including infrastructure support and server administration, and supporting and troubleshooting distributed applications, software, and operating systems. Experience identifying solutions in technical infrastructure support and server administration in a mid-sized environment; supporting and troubleshooting distributed applications, software, and operating systems; and administering and troubleshooting issues with messaging middleware and message brokers. Knowledge of enterprise platforms. Desire to keep up with modern technology and make recommendations. Advanced knowledge of network architecture and protocols. Strong experience with enterprise administrative scripting using a major scripting language, and experience with software deployment. Travel may be required in accordance with business needs.

Oracle is an Affirmative Action-Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veterans status, age, or any other characteristic protected by law.

The Oracle Cloud Infrastructure (OCI) Operations team is seeking accomplished and passionate individuals to lead and evolve our Incident Management practice to become a best-in-class service offering.

The primary function of a Cloud Incident Engineer is to direct Subject Matter Experts (SMEs) and Service(s) leaders to restore service as quickly as possible during Major Incidents while keeping accurate and timely data on the progress of such incidents and keeping senior leaders, stakeholders and end users updated.

Incident Commanders are also responsible for building and evolving the practice of Incident Management across OCI, using Post Incident Reviews and developing processes and systems that leverage the related metrics to identify and drive process and procedural improvements globally.

Who are you?

  • Passionate about Cloud, customer focused, experienced in incident management and problem management, and thrive in a dynamic team culture.

  • A technologist at heart, curious about how things work and how things break - likely to be someone who enjoys finding a better way to do things using automation

  • Able to build, maintain and leverage key relationships with internal stakeholders and service leaders to drive increased engagement and accountability for your work.

  • Love technology and how to apply it. Maybe you have set up your own environment in the cloud or have spent time developing apps or games that you share with others

  • Strong communicator who is passionate about the customer's experience

  • Motivated to be resourceful, innovative and entrepreneurial

  • Driven to learn about cloud infrastructure and its inter-dependencies

  • Humble and committed to always improving

Key Responsibilities:

  • Provides leadership in responding and resolving major incidents that impact business critical services, applications and infrastructure for OCI

  • Leverages broad technical expertise to convene appropriate SMEs (resolvers) and to direct Major Incident response, with focus on impact mitigation and service restoration

  • Works closely with SMEs to quickly identify customer impact (who, how, when)

  • Conducts escalation to service teams, senior management and leaders to ensure appropriate awareness, engagement and focus

  • Produces accurate and timely communications tailored to relevant audience (Senior Leaders and internal Stakeholders)

  • Leads and/or participates in Post Incident Review and Problem Management meetings with key stakeholders and service owners to review events and opportunities for ongoing improvement

  • Documents pertinent information relating to Incidents that aids process improvement, identifies deviations and enables the creation of an Incident Knowledge Base

  • Monitors and evaluates high-level service and infrastructure dashboards and takes action to address identified anomalies

  • Collates and analyses incident based data for team metrics and KPIs

  • Identifies opportunities and takes ownership for automation and/or continuous improvement of Incident Management process steps and best practices

  • Proactively engages with Service teams to identify and evaluate gaps in operational capabilities and improvements to support Cloud scalability and resiliency

  • Represents Incident Management at relevant software team Roadmap planning and backlog reviews, influencing the prioritization of automation and tooling enhancements

  • Works as part of the Major Incident Management team to ensure that the team achieves its defined performance targets and KPIs

Preferred Qualifications

  • Have a broad and deep knowledge of cloud infrastructure and related technologies

  • Experience in technical troubleshooting, with broad expertise in core infrastructure technologies (e.g. server, compute, storage, network, authentication, databases)

  • Able to review and edit automation code (e.g. Python, JavaScript, Linux shell) and data objects written in JSON or XML

  • Experience in managing and tuning systems and/or applications, with ability to review and validate system test output

  • Understand IP networking fundamentals and be familiar with Data Center network architectures and standard protocols (e.g. BGP, OSPF)

  • Experience in influencing internal/external teams within a diverse/large organization and skilled at building strong relationships, to deliver required & improved results

  • Strong leadership skills to direct service teams during Major Incidents that have the potential for significant business impact; remaining calm, professional and focused in high pressure situations

  • Excellent Incident and Problem Management knowledge and experience.

  • Exceptional written and verbal communication skills with meticulous attention to detail

  • Able to work unsupervised, independently and within a global team

  • Experienced user of a trouble ticketing system (Jira, Remedy or similar)

  • Flexibility to work within a "Follow the Sun" global shift rotation, covering local day-time hours, including holidays and weekends, on a rotational basis

  • Ability to be "on-call" as part of an on-call rotation shared across all team members

  • Ability to manage multiple tasks in a fast-paced, ever changing environment

  • Ability to think strategically and tactically and work in both a reactive (incident response) as well as proactive engagement model.

  • U.S. Citizenship or U.S. Lawful Permanent Resident Status/Protected Person status is required (Federal Government customer).
