Site Reliability Engineer

O'Reilly Auto Parts, Springfield, MO 65802

Posted 1 week ago

The Site Reliability Engineer is responsible for the availability and performance of the platforms and services of O'Reilly Auto Parts. The role creates and defines monitoring and incident response tools and processes.

The Site Reliability Engineer will create a bridge between development and operations by applying a software engineering mindset to system administration. Time will be split between operations/on-call duties and developing systems and software that help increase site reliability and performance.

Essential Job Functions

  • Deploy methodologies for building and operating highly available and scalable services.

  • Work closely with the Network Operations Center to develop monitoring tools, analyze the root causes of incidents, and improve the Network Operations Center's ability to resolve issues independently.

  • Evaluate, build and modify automation for deploying and operating production services.

  • Provide leadership in reducing and resolving production incidents.

  • Proactively monitor and review application performance. Monitor specific metrics, set thresholds, and trigger alerts based on those thresholds.

  • Collect and analyze logging and diagnostic information.

  • Identify opportunities to improve all operations processes.

  • Facilitate the effective transition of services into production, ensuring that all requirements have been met in accordance with O'Reilly's Change Management standards.

  • Properly document all incident responses.

  • Provide updates and documentation to runbooks and operational manuals.

  • Document mean time to recover (MTTR) and mean time to failure (MTTF).

  • Participate in on-call rotations.

Skills and Qualifications

Required:

  • Bachelor's Degree or equivalent work experience.

  • 5+ years of professional experience in Site Reliability, Linux Systems Administration, DevOps, or Infrastructure Engineering.

  • Experience with programming languages including Java, JavaScript and SQL.

  • Experience with scripting languages such as Bash, Python, or Ruby.

  • Familiarity with automation and configuration management tools and frameworks.

  • Excellent analytical and problem-solving skills.

  • Strong written and verbal communication skills.

  • Must be well organized, detail oriented, and able to self-prioritize work.

  • Must exhibit a high degree of professionalism.

  • Ability to act with composed urgency in stressful situations.

Desired:

  • ITIL Foundations Certification.
  • CRE or CMRP Certifications.




Similar Jobs

Remote Site Reliability Engineer

O'Reilly Auto Parts

Posted 2 months ago


Site Reliability Engineer

O'Reilly Auto Parts