Site Reliability Engineer II

Microsoft Corporation, Redmond, WA 98053

Posted 7 days ago

The Azure Dedicated team plays a unique role in the Azure ecosystem. Through distinctive integrations of bare metal infrastructure, we are powering many of the latest AI services and innovations for the entire company. We're seeking a Site Reliability Engineer II to join us in this mission to power the biggest AI training workloads imaginable.

As a Site Reliability Engineer II on our team, you will work with some of the biggest AI infrastructure in the world and help us build the most reliable AI training services possible.

This opportunity will allow you to connect to the AI mission in a real and tangible way by building a service-oriented view of the infrastructure that allows common High Performance Computing (HPC) building blocks to execute flawlessly on it. You will work with the biggest names in the AI industry and get hands-on experience with the key Graphics Processing Unit (GPU) and InfiniBand infrastructure that powers everything. Flexible working arrangements are available for the successful candidate, as appropriate.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Required Qualifications:

  • 4+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field

Other Requirements:

  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Experience with InfiniBand networks and their management

  • Experience in High Performance Computing workload topologies and schedulers

  • 5+ years technical experience in software engineering, network engineering, or systems administration

  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration

  • OR Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration.

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area, where the base pay range for this role is USD $127,200 - $208,800 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until July 13, 2024.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks may vary depending on the nature of your employment with Microsoft and the country where you work.

#azurecorejobs

Responsibilities:

  • Help build mission control automation and insights to manage AI infrastructure such as Ethernet networks, server management, and InfiniBand management.

  • Use your skills to bring Service Level Agreements (SLAs) in line with Service Level Obligations and with customers' requirements for AI training reliability.

  • Participate in livesite incident response, troubleshoot anywhere in the stack, and help identify key gaps in telemetry that impede meeting Service Level Agreements.

  • Embody our Culture and Values
