VP Cloud Site Reliability Engineer
Req #: 190014033
Location: Jersey City, NJ, US
Job Category: Technology
Join J.P. Morgan's greenfield Cloud Engineering team building products that will accelerate our journey to Public Cloud. Our team is responsible for making Public Cloud services available within the bank, leveraging native services from multiple cloud service providers to deploy scalable, secure and resilient products.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE is a mindset and a set of engineering approaches to running better production systems: we build our own creative engineering solutions to operations problems.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting and dynamic day-to-day work.
SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We strive to create an environment that provides the support and mentorship needed to learn and grow.
Bring a release engineering mindset to engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
Engineer and automate our environments and SDLC, designing for security, reliability and scale.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Solve operational problems by using engineering approaches to build innovative technical solutions.
Focus on automation as a first-class operating model to ensure stability and velocity during development.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Practice sustainable incident response and blameless postmortems.
Work with customers to provide in-depth technology solutions to their business problems.
Provide operational support, including triaging incidents, communicating incidents to users, executives and stakeholders, monitoring key system components and fixing bugs.
Lead, coach and mentor the organization with practical examples, hands-on workshops and papers.
BS degree in Computer Science or related technical field or equivalent practical experience.
5+ years of programming experience in one or more of the following: Java, Python, Go
AWS - EC2, EKS, Fargate, Lambda, DynamoDB, RDS, SQS, SNS, Service Catalog, CloudFormation.
Storage technologies - object and block storage.
OS - Linux (RHEL).
Cyber security fundamentals and working knowledge.
CI / CD pipelines and tool sets.
Testing Frameworks - xUnit, FitNesse, Cucumber
Source Control Management - Git, Subversion
Experience designing, developing, or maintaining production-grade cloud solutions in Cloud ecosystems such as Amazon Web Services, Microsoft Azure or Google Cloud Platform.
A systems thinker with hands-on experience with Lean and Agile methodologies such as Scrum and Kanban.
JPMorgan Chase & Co.