As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment you'll take the lead on relevant projects, backed by an organization that provides the mentorship you need to learn and grow. As an SRE, you'll focus on running better production applications and systems.
Develop, test, and debug automated tasks (Apps, Systems, Infrastructure)
Troubleshoot minor incidents and contribute to resolution through post-mortems
Participate in the application or service development lifecycle through code contributions
Engage with tools and operations teams to address failure patterns and incidents
Develop automation tools that enable efficient, low-noise alerting and reduce toil and technical debt
Conduct performance tests, and identify and document application optimizations
Bachelor's degree or equivalent experience in a software engineering discipline
Proficiency in at least one programming language (e.g. Python, Java, or Go)
Understanding of the software delivery lifecycle
Expertise in application, data, and infrastructure architecture disciplines
Advanced knowledge of one or more infrastructure components (e.g. networking, cloud services, orchestration tools, containerization, compute, and storage systems)
Capable of managing service-level changes to a system or service
Hands-on experience with cloud deployment, monitoring, and ops analysis tools such as Kubernetes, Prometheus, Elasticsearch, Grafana, Kibana, Splunk, and Dynatrace
JPMorgan Chase & Co.