Senior Site Reliability Engineer

Planetart San Diego, CA 78384

Posted 1 week ago

PlanetArt's mission is to be the leading online destination for personalized invitations, announcements, home décor, and other personalized products. We provide consumers with unmatched tools and content and an unparalleled end-to-end customer experience that result in high-quality, meaningful finished products and memorable celebrations of life events. We are seeking a Site Reliability Engineer to manage website operations for our PersonalCreations.com, Gifts.com, and CafePress.com ecommerce divisions. In this position, you will own the monitoring infrastructure of our ecommerce platform and work closely with the DevOps and Development teams to build solutions that continuously improve our site reliability and performance.

What You'll Do:

  • Measure and monitor availability, performance, and overall health of our ecommerce platform.
  • Maintain and monitor our production cloud infrastructure.
  • Practice continuous improvement of monitoring processes, configurations, and thresholds.
  • Configure monitoring, logging, and alerting for production systems
  • Build dashboards, monitors, and tools to ensure reliability of the production platform.
  • Communicate technical issues to non-technical stakeholders.
  • Develop/maintain incident responses and postmortems
  • Create a monitoring roadmap to address monitoring deficiencies
  • Create and measure critical KPIs to achieve performance targets.
  • Reduce operational overhead by automating repeatable tasks
  • Document procedures, runbooks, and escalation processes
  • Work with DevOps and Development teams to communicate and prioritize production issues
  • Generate reports regarding key metrics and incidents
  • Participate in an on-call rotation.

Requirements

What You'll Need:

  • 4+ years in a Reliability Engineering, DevOps, or infrastructure-focused role.
  • Experience with monitoring tools: dashboarding, setting alarms, and incident response
  • Experience with AWS cloud computing services
  • Experience with Splunk, Dynatrace, New Relic, CloudWatch, or other enterprise monitoring tools
  • Experience with infrastructure performance monitoring and tuning
  • Experience with load/performance testing or capacity planning
  • Knowledge of shell scripting and a scripting language (PowerShell, Python, etc.)
  • Familiarity with popular CI/CD procedures and environments
  • Excellent troubleshooting and problem-solving skills
  • Ability to investigate and debug problems
  • Familiarity managing and deploying applications to Kubernetes
  • Familiarity with Jenkins or other deployment automation tools
  • Experience working with Atlassian tools (Jira, Confluence, etc.)
  • Certifications in a related field are a plus
  • Experience with infrastructure as code (Terraform, Puppet, CloudFormation)
  • Experience with version control platforms such as Git
  • Experience working with cache technologies (Memcached, Redis, etc.)
  • Previous involvement in on-premises-to-cloud migration
  • COVID-19 vaccination is required; reasonable accommodations will be considered

Benefits

PlanetArt offers a comprehensive benefits package including:

  • Health Insurance
  • Life Insurance
  • 401(k)
  • Paid Time Off
  • Hybrid In-Office Work Schedule
  • Employee Product Discounts

Similar Jobs

Senior Site Reliability Engineer

Alteryx Inc.

Posted 2 weeks ago

We're looking for problem solvers, innovators, and dreamers who are searching for anything but business as usual. Like us, you're a high performer who's an expert at your craft, constantly challenging the status quo. You value inclusivity and want to join a culture that empowers you to show up as your authentic self. You know that success hinges on commitment, that our differences make us stronger, and that the finish line is always sweeter when the whole team crosses together.

Alteryx is searching for a Senior Site Reliability Engineer in the United States (REMOTE is an option).

Position Overview:

You'll be working on a new team that will create the observability systems and practices for Alteryx Cloud. Help build the practice as well as design, architect, build, and manage systems for metrics, logs, and tracing. You'll also be an evangelist with stakeholders on SRE concepts and practices. Create a center of excellence for observability at Alteryx.

What you'll do:

  • Create the observability architecture for metrics, logs, and tracing systems for Alteryx Cloud.
  • Design, manage, and operate the observability system.
  • Evangelize tools and practices with engineering, architecture, and customer service teams as a subject matter expert.
  • Build dashboards and escalation systems for production engineering, customer service, and engineering teams
  • Ensure stakeholder teams are using the tools in the best way, and make iterative improvements through problem-solving, empathy, and communication

About you:

  • 3+ years of expertise with metrics, time-series, logging, and/or distributed tracing
  • The team will offer Observability as a service
  • 2+ years of experience building observability systems using products like Elasticsearch, Splunk, Datadog, Prometheus, TICK stack, Grafana, Kibana (your choice)
  • 3+ years of programming and software development in Python, Java, Go, .NET, or any other language
  • 1 or more years of experience working with cloud providers like AWS, GCP, or Azure
  • Working knowledge of SDLC concepts

Compensation:

Alteryx is committed to fair and equitable compensation practices. The salary range for this role in Redwood City, CA is $122,100 - $207,600. Compensation will ultimately be in line with the location in which the position is filled. Final compensation for this role will be determined by various factors such as a candidate's relevant work experience, skills, certifications, and geographic location. This role is eligible for variable compensation including bonus and stock grants.

Find yourself checking a lot of these boxes but doubting whether you should apply? At Alteryx, we support a growth mindset for our associates through all stages of their careers. If you meet some of the requirements and you share our values, we encourage you to apply. As part of our ongoing commitment to a diverse, equitable, and inclusive workplace, we're invested in building teams with a wide variety of backgrounds, identities, and experiences.

Benefits & Perks:

Alteryx has amazing benefits for all Associates which can be viewed here.

Alteryx Inc. San Diego, CA

Senior Site Reliability Engineer

Planetart