SRE Director

American Express, Phoenix, AZ 85002

Posted 1 week ago

We're looking for a Site Reliability Engineering Director to work within the Global Mobile Engineering organization and lead an Engineering team responsible for mobile app performance, availability and reliability.

You'll be expected to work with several Technology partners and Product Managers to actively identify areas of opportunity within the availability platform and to build a vision for the next-generation platform, technology and continuous innovation. In addition, you will engage in hands-on design and ensure alignment of strategy, architecture, and tools/methods with software engineers and architects. You will be responsible for pushing the boundaries in monitoring, tooling, and issue resolution to maximize the performance and availability of our mobile applications.

You should be familiar with modern Software Development methodologies, and be able to dive deep and rapidly iterate on ideas despite ambiguity. Make no mistake - this is an opportunity to work in one of the best Technology units which help lead risk for American Express and influence how millions of people interact with their cards, their merchants and their money.

Qualifications:

  • BS or MS degree in computer science, computer engineering, or other technical discipline, or 3-6 years of equivalent work experience

  • Aptitude for learning and applying programming concepts

  • Detailed understanding of application flows and the proactive monitoring needs of production systems

  • In-depth knowledge of ITIL concepts such as Incident, Change, Problem management and support procedures

  • Ability to effectively communicate with internal and external business partners and technology teams

  • Very strong technical troubleshooting and analytical skills, with the ability to resolve infrastructure (cloud) and application issues in a production environment

  • Direct application monitoring and work towards implementing automated monitoring scripts

  • Expertise with Splunk programming - writing queries, building dashboards, configuring alerts, and reports

  • Strong knowledge of and experience with Linux systems engineering and scripting languages (Python, Perl and Shell), using solid coding practices (code reuse, functions, comments)

  • Strong development/support experience with Java, Kotlin, or Swift

  • Experience in Development and maintenance of iOS and Android apps

  • Experience with the integration and usage of mobile APM tools such as Fabric, Sentry, Mixpanel, AppDynamics, etc. to analyze mobile app crashes preferred

  • Deployment and troubleshooting experience with JBoss and Node.js

  • Self-motivated with a strong sense of urgency and dedication to deadlines

Plus:

  • Experience in the reliability space and with its tools

  • Experience in building dashboards and tools

  • Experience with Red Hat OpenShift, Kubernetes and Docker

  • Experience working with Jenkins and other open source CI/CD tools, and with network load balancers such as F5 BIG-IP, including design/development of iRules

  • Experience with modern databases (Redis, Couchbase, etc.)

  • People Manager

Employment eligibility to work in American Express in the U.S. is required as the company will not pursue visa sponsorship for these positions.


Similar Jobs

SRE Team Lead

Hootsuite

Posted 1 week ago

As a team lead in production operations and delivery (POD), you are accountable for the execution and delivery of a team of SRE/DevOps practitioners. You understand the importance of a DevOps mindset, and have practiced partnering with product engineering stakeholders to design and deliver a reliable, scalable, secure and performant platform. This role is envisaged to be 50% technical and hands-on, 50% people, process and project wrangling, but we understand that will change based on your skillset and the evolution of the team. We are open to hiring someone remote for this position!

Who You Are…

  • You may or may not have managed a team before, but you have a strong background in cloud infrastructure design and operations, and can work within a team who will architect, build and automate the cloud production infrastructure on which Hootsuite runs.

  • You are a leader and an influencer. You oversee broad solutions and clarify complex problems. You propose and champion improvements in processes and technology choices. You make your team and your teammates better.

  • You enjoy working in fast-paced, highly scalable environments (billions of requests, millions of users, significant YoY growth).

  • You have been responsible for running critical services that multiple customers depend upon. You understand the importance and impact that operational optimization can have on a product and the positive ripple effects that it can have across an entire engineering organization.

  • You are empathetic: you take others' opinions into account and clearly communicate your thoughts to reach technical solutions quickly.

  • You've been around the block a few times: you have multiple years of operations experience in highly available cloud and datacenter environments, and you've seen the myriad ways technology can succeed or fail.

You're Great At…

  • The bread and butter: administering large-scale Linux systems and cloud platforms, using technologies such as Python, Go, Terraform, Puppet, Ansible and Chef.

  • Solving problems by writing code to automate your way out of them. You have replaced manual processes time and time again with your code.

  • Participating in a 24x7 on-call rotation, managing pager load to eliminate toil and angry named pipes.

  • Developing talent. You enjoy mentoring, coaching and pairing with coworkers.

  • Condensing process from practice.

  • Using tools like lean and agile to manage your team's projects and interruptions, adapting and refining over time.

Bonus Points…

  • You've got Kubernetes production experience.

  • Understanding of operational security practices and IT audit (SOC2, FedRAMP) standards.

  • You've led or been part of a software development team.

Hootsuite is an inclusive employer. Every effort will be made to provide accommodations requested by candidates taking part in all aspects of the selection process.

Hootsuite, Phoenix, AZ
