Site Reliability Engineer

Copart, Dallas, TX 75201

Posted 2 months ago

Copart is seeking a Site Reliability Engineer for our Dallas HQ office, specializing in systems and application monitoring and troubleshooting. This position will be part of a 24/7 Global Network Operations team that monitors and provides L1/L2 support to meet the SLA commitments of Copart's Global Data Center and Application infrastructure.

Ideal Candidate:

Team Player -- A candidate who works well in a collaborative team environment. Effective communication skills and a great personality are a must.

Talented -- Your skill set extends beyond core knowledge of Windows, UNIX, or Linux platforms. In addition to knowledge of core systems, experience with VM environments, networking, scripting, automation, and Kubernetes, along with awareness of other technologies, is a plus.

Innovative -- We are always looking for ways to improve our processes and procedures. The ideal candidate should have a natural desire to make things better and not be afraid to speak up when an opportunity for improvement arises.

Essential Duties and Responsibilities:

  • Perform application deployments using Jenkins and Spinnaker on Prod and Non-Prod Environments.

  • Coordinate and perform periodic failover testing of Copart's Network/Systems Infrastructure and application environments.

  • Build/Optimize tools with Python, Ansible and Grafana to monitor/collect key metrics and automate remediation of Infrastructure or application issues.

  • Perform monthly security patching of Systems OS and applications.

  • Maintain and optimize the following tools and repositories: Nagios, NetBox, Prometheus, Grafana, Sumo Logic, Selenium, Instana, GitHub, and more.

  • Interface with internal teams (Product Development, DevOps, Network, Systems, and DB).

  • Use internal monitoring tools to analyze and proactively monitor Copart's Global Data Center and Application infrastructure, catching and quickly resolving issues before they escalate.

  • Quickly and efficiently communicate issues across several of Copart's domains.

  • Develop analysis and reporting capabilities; monitor performance and quality control plans to identify improvements.

  • Document standard operating procedures, diagrams, and training materials for use by the teams.

Requirements:

  • Progressive knowledge of monitoring protocols such as SNMP, NetFlow, and Syslog.

  • Intermediate programming and scripting knowledge

  • Knowledge of different monitoring methodologies, e.g., agent-based and agentless checks.

  • Troubleshooting knowledge of Linux/Unix/Windows-based systems

  • Experience working with VM management software (vSphere)

  • Knowledge of monitoring tools such as Nagios, SolarWinds, and Site24x7

  • Be flexible and be able to handle competing/changing priorities.

  • Very strong oral and written communication skills

  • Must be a self-starter with the ability to work well in a team environment

  • Flexible schedule required

  • Knowledge of the following areas is a BIG plus:

  • Dashboard applications such as Grafana

  • Scripting/Programming/Automation -- Python, Bash, Ansible, StackStorm

  • Experience working with GitHub, Jenkins, Spinnaker, Docker, Kubernetes

  • Front-end scripting languages, libraries, and frameworks such as Java, JavaScript, AngularJS, Flask, etc.



