Site Reliability Engineer

The Gap, Pleasanton, CA 94588

Posted 2 months ago

As part of our technology organization, you will have the opportunity to build next-generation solutions that will transform the way our customers interact with our family of iconic brands. Our team employs a DevOps model, giving our product teams full ownership of designing, building, and operating systems at immense scale. From distributed computing to artificial intelligence, mobile, big data, and cloud computing, you will have the opportunity to build a career that lets you make an impact while learning new technical and leadership skills. We are inspired by new challenges and push ourselves to create what's next in this dynamic industry. Come join this diverse team and grow with us.

  • Maintain 99.999% uptime for the Gap Inc. family of eCommerce websites

  • Analyze failures, mitigate them on the spot, and work proactively to prevent them in the future

  • Quarterback high-severity issues as required; oversee the root cause analysis (RCA) process

  • Work with teams across the organization to build and maintain monitor-able, performant, reliable, and highly scalable software systems

  • Partner with Architecture team on best practices for availability and resilience

  • Analyze system- and application-level metrics for peak capacity planning and troubleshooting

  • You've been around the block with at least 8 years of IT experience and recent site reliability engineering scope

  • Senior level Linux system administrator with networking and storage experience

  • Supported end-to-end systems and software in high-scale web environments

  • Experienced scripting in Python, Bash, etc. and using monitoring/metrics tools such as Splunk, Nagios, New Relic, or similar

  • Experienced supporting Tomcat, JBoss, and other application servers

  • Experienced with cloud infrastructure

  • Comfortable working with DevOps and CI/CD principles

Similar Jobs

Senior Site Reliability Engineer

Workday, Inc.

Posted 3 weeks ago

Job Description

Join our team and experience Workday! It's fun to work in a company where people truly believe in what they're doing. At Workday, we're committed to bringing passion and customer focus to the business of enterprise applications. We work hard, and we're serious about what we do. But we like to have a good time, too. In fact, we run our company with that principle in mind every day: One of our core values is fun.

About the Team

Once a Workday application is built by our developers, it is handed over to the Environments team. This is the team that supports the operations of our applications and services. They manage all environments: production, sandbox, sales, and implementation. This team pushes new code to our existing customers, monitors the health, performance, and reliability of the Workday stack, and, in general, "keeps the lights on" with 24/7 coverage.

About the Role

You will be a key contributor on the Environments Tools Services Team. You will help the team build a user-friendly, scalable, and reliable tools framework. You will work closely with others in the Environments Team to help automate tenant management and other operational tasks. As a senior member of the team, you will help guide and mentor fellow team members. You will be part of on-call and patch rotations.

About You

  • High degree of comfort in object-oriented and functional programming

  • Excellent analytical skills for troubleshooting and problem determination

  • You love solving the complexities of orchestrating a deployment. You dislike doing things twice, so you automate each step along the way. You strive to make things self-healing, elastic, automatic, repeatable, and well tested.

What Excites You

  • Passionate about automation

  • Motivated by writing fast, scalable code with testability in mind

  • Excited by working in a fast-paced environment

Qualifications

  • MS in Computer Science or related field and 2 years of relevant experience, OR BS in Computer Science or related field and 5 years of relevant experience

  • Experience in designing, analyzing, and troubleshooting large-scale distributed systems

  • Strong background in Linux and shell scripting

  • Strong experience with SQL and MySQL (NoSQL experience is a plus, too)

  • Strong experience in one or more of: Java, Python, Ruby

  • Experience deploying and using one or more of: AWS, OpenStack, Azure, Google Cloud Platform

  • Experience designing, implementing, and deploying: Docker, Kubernetes, serverless (Lambda)

  • Experience with one or more orchestration and deployment tools: CloudFormation, Terraform, Ansible, Chef

  • Experience with one or more CI tools: Jenkins, TeamCity, Bamboo, Artifactory

  • Excellent documentation skills

Workday, Inc., Pleasanton, CA
