Sr. Software Development Engineer III, Chaos Engineering / Reliability Engineering

Amazon.com, Inc., Seattle, WA 98113

Posted 3 weeks ago

Amazon strives to be the world's most customer-centric company. To succeed, our products and services must be available to our customers at all times.

Within Amazon, an entire organization named Consumer Reliability Engineering is dedicated to the availability of our shopping experiences worldwide, and we are hiring.

We are responsible for the global availability of the Amazon retail shopping experiences. Ensuring a highly available experience is a massive challenge across the websites and mobile apps of 26 marketplaces, powered by tens of thousands of backend services. Multiply this breadth of scope by the depth of complexity introduced by those services' diverse implementation choices, their consumption of AWS services, frequent software updates, and new feature launches, and you begin to get the picture.

To keep pace with this growing complexity while strengthening our culture of resilient software engineering across all of Amazon's consumer SDEs, we are creating an anomaly detection and remediation function in Seattle to ensure that Amazon's retail customer experience is indistinguishable from perfect. Using machine learning models, you will build software that detects anomalies in the retail customer experience within seconds, localizes those anomalies within Amazon's ecosystem of tens of thousands of services, and proactively repairs them before a single Amazon customer is impacted.

Additionally, we will create chaos experiments at all levels of granularity: from impacting the hosting platforms on which services run, to introducing latency in system-to-system interactions, to causing a complete loss of a significant portion of our architecture. The learnings from these experiments will drive improvements to the software owned and operated by thousands of developers and provide guidance to our AWS partners. At Amazon's growing scale and complexity, this work simply cannot be done by testers; only large-scale distributed solutions that use machine-learned insights into behavioral characteristics have a chance of coping with this challenge.

As a Software Development Engineer in this space, you will love the fast-paced, startup-like environment focused on building systems from the ground up that enable the execution and wide-scale coordination of chaos experiments, the aggregation of and learning from results, and the definition and execution of architectural improvements across all our software development groups. You will be responsible for scoping and delivering projects end-to-end, leveraging statistical evaluation, pattern recognition, and machine learning. You will deliver results personally and by leading your peers to deliver solutions that protect Amazon's services by proactively proving that the complex service graph powering Amazon retail websites globally is resilient against anomalous conditions created by failures, unexpected customer behavior, and even attackers.

The ideal candidate will have a proven track record of shipping complex software solutions using an agile methodology. You will have the ability to dive deep into a wide variety of problems and technologies to guide the right technical decisions for the products and businesses you support.

You will bring multiple years of DevOps experience owning and operating solutions at scale. You will be a strong communicator with proven abilities in both architecture and software solutions.

Amazon is an Equal Opportunity-Affirmative Action Employer - Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation

Similar Jobs

Sr. Network Development Engineer III, Border Engineering, AWS

Amazon.com, Inc.

Posted 6 days ago

The AWS Networking organization is hiring, and we're looking for talented Network Development Engineers (NDEs) to join our team. Within Networking, we're confronting and solving complex, high-stakes challenges. Our teams support all aspects of connectivity between Amazon and the outside world, as well as the connectivity between Amazon's data centers and services. As a member of one of our network engineering teams, you will play a part in designing and architecting networks that simply cannot fail, must scale infinitely, and can never constrain growth or innovation.

This opportunity sits within our Internet Services organization, where the team is responsible for designing the networks connecting Amazon directly to the outside world (including the Internet, peers, and external customers), evolving the global backbone that interconnects our Regions, and partnering with our internal customers. Engineers in this organization define routing policy, design and implement Traffic Engineering solutions, and architect the hardware platforms and network designs that support internal and external connectivity. We move terabits of traffic in and out of our networks each day and are responsible for the ingress and egress points of traffic entering and leaving Amazon, as well as all internal and customer traffic that rides our global backbone. Engineers on this team make day-to-day and strategic decisions that carry a huge amount of responsibility and impact.

Responsibilities

As a Network Development Engineer at Amazon, your core job responsibility is to design network topologies, architectures, and services that solve for many requirements. We listen to our customers (both internal and external), but we also listen to our teams. Our engineers contribute materially to each team's roadmap and constantly help us determine what's most important.
Together, you and your leadership team will decide on the projects that best support your team's mission. You will have the resources and time necessary to understand, scope, and deliver these solutions. The primary responsibility of you and your team is to design ahead of customer and technology needs, always predicting and solving problems that have not yet occurred. This takes the form of creating and defending High Level Design (HLD) documents, working with vendors or internal stakeholders to influence technology roadmaps, constructing and testing your solutions, and supporting the teams that will deploy and operate these designs. Our teams also serve as technical escalation points in support of our very talented operations teams.

We're interested in engineers who have designed, operated, and implemented networks of very large scale, and who are well versed in the operation of the Internet routing hierarchy. Candidates should understand the theory behind routing protocols and have deep operational knowledge of them, with particular emphasis on their impact on hardware platforms. We're looking for those who have pushed platforms to their limits and who understand how to work within the limitations imposed by third-party software and how to avoid them through their designs.

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.

Amazon.com, Inc., Seattle, WA