Amazon strives to be the world's most customer-centric company. To succeed, our products and services must be available at all times to our customers.
Within Amazon, an entire organization named Consumer Reliability Engineering is dedicated to the availability of our shopping experiences worldwide, and we are hiring.
We are responsible for the global availability of the Amazon retail shopping experiences. Ensuring a highly available experience is a massive challenge across 26 marketplaces' websites and mobile apps, powered by tens of thousands of backend services. Multiply this breadth of scope by the depth of complexity introduced by the diversity of those services' implementation choices, their consumption of AWS services, frequent software updates, and new feature launches, and you begin to get the picture.
To support the growth of complexity while strengthening our culture of resilient software engineering across all of Amazon's consumer SDEs, we are creating an anomaly detection and remediation function in Seattle to ensure that Amazon's retail customer experience is indistinguishable from perfect. Using machine learning models, you will build software that detects anomalies in the retail customer experience within seconds, localizes those anomalies within Amazon's ecosystem of tens of thousands of services, and proactively repairs them before a single Amazon customer is impacted.
Additionally, we will create chaos experiments at all levels of granularity, from impacting hosting platforms on which services are running, to introducing latency in system-to-system interactions, to creating a complete loss of a significant portion of our architecture. The learnings from these experiments will drive the improvement of the software owned and operated by thousands of developers and provide guidance to our AWS partners. To accommodate the growing scale and complexity of Amazon, this work simply cannot be done by testers; only large-scale distributed solutions utilizing machine-learned insights of behavioral characteristics have a chance of coping with this challenge.
As a Software Development Engineer in this space, you will love the fast-paced, startup-like environment focused on building systems from the ground up that enable the execution and wide-scale coordination of chaos experiments; aggregation and learning from results; and the definition and execution of architectural improvements across all our software development groups. You will be responsible for scoping and delivering projects end-to-end, leveraging statistical evaluation, pattern recognition, and machine learning. You will deliver results personally and by leading your peers to deliver solutions that protect Amazon's services by proactively proving that the complex service graph powering Amazon retail websites globally is resilient against anomalous conditions created by failures, unexpected customer behavior, and even attackers.
The ideal candidate will have a proven track record of shipping complex software solutions through an agile methodology. You will have the ability to dive deep into a wide variety of problems and technologies to guide the right technical decisions for the products and the businesses you will support.
You will bring multiple years of DevOps experience in both owning and operating solutions at scale. You will be a strong communicator and will have proven abilities in both architecture and software solutions.
Amazon is an Equal Opportunity-Affirmative Action Employer - Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation