Senior Site Reliability Engineer

Microsoft Corporation, Redmond, WA 98053

Posted 3 weeks ago

Are you an individual who loves to work on large-scale projects at one of the most exciting and diverse divisions within Microsoft? Are you looking for big, creative challenges that show immediate results since your customers are the product engineers for Office and M365? Do you want to be at the core of it all, acting as a force multiplier enabling groups of engineers to do amazing work? If so, we have the perfect job for you!

The Engineering Systems 365 (ES365) team owns the tools that make up the end-to-end developer experience in Office and M365 (Substrate) from source control and check-in experience to build, validation, and deployment automation, and we're making big, bold changes - for the better! We're making it easy to build and ship apps across platforms and endpoints, and we're moving away from proprietary, internal-only tools onto "one Microsoft" investments, open source, and industry standard tools. This is an exciting time as we seek to re-invent productivity leveraging the power of Artificial Intelligence (AI) universally.

The charters of these teams include the following (and more):

Azure management & governance
Business continuity
Infrastructure as Code
Network engineering
Provisioning & service deployment
Security & vulnerability management
Systems State Management

As a Senior Site Reliability Engineer, you'll deliver novel solutions using a modern DevOps approach and the full stack of technologies Microsoft has to offer. This work enables our organization to respond more effectively to evolving customer needs and market demands, all while reducing costs, eliminating duplicated work, and driving efficiencies through automation.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Required/Minimum Qualifications

6+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
4+ years' experience running large-scale cloud services.

Other Requirements

Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications

7+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Doctorate Degree in Computer Science, Information Technology, or related field.
Full-stack troubleshooting skills across network, application, hardware, management fabric, and distributed services layers.
Experience documenting complex systems accurately to convey technical ideas across teams.
Experience in implementing and managing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for production services.
Experience in automation tools and frameworks (e.g., Terraform, Azure Resource Manager (ARM), Chef, Bicep) and scripting languages (e.g., Python, Bash, PowerShell).

Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area, where the base pay range for this role is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications and process offers for these roles on an ongoing basis.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Responsibilities

Identify opportunities and drive the design and implementation of end-to-end telemetry, alerting, self-healing, and automation capabilities to improve service health, manageability, and reliability.
On-Call Incident response: Participate in on-call rotations and own, triage, investigate, and resolve service issues with an emphasis on broad communications, learning & teaching throughout the process.
Reduce operational burden: Develop code, scripts, systems, and/or tools that automate complex and repetitive tasks following the SDLC.
Failure Analysis: Analyze telemetry data to develop capacity planning models and identify patterns and trends that drive continuous improvement.
Privileged Access Management: Review and approve access requests and perform regular audits.
Develop others: Mentor and coach engineers to help them identify and propose relevant solutions.

Other

Embody our Culture & Values
