Site Reliability Engineer

Artech, San Francisco, CA 94103

Posted 2 weeks ago

Department: Infrastructure (SRE)
Job Category: Engineering / Infrastructure
Duties:
Seeking a Site Reliability Engineer to help bring uniformity to deployment services, automate wherever possible, increase monitoring capabilities, and manage capacity and performance to help scale Pinterest's infrastructure. Site Reliability Engineers (SREs) are responsible for the overall performance and reliability of Pinterest's infrastructure and products. SREs design and implement the tools that automate the building of reliable, performant systems.
Advocate and implement reliable design patterns (circuit breakers, graceful degradation, etc.)
Work with product engineering teams on design and implementation choices of large scale distributed systems
Automate as much as humanly possible and always configure as code
Bring ideas to life (i.e. production) to help make the lives of engineers better
Predict our future failures and work proactively to mitigate them
Skills Required:
Experience bringing software to production at high scale
A knack for writing clean, readable, maintainable code
An eye for automation and instrumentation
The ability to decompose complex systems and find failure scenarios
Great communication skills
Minimum Degree Required: Bachelor's Degree

Similar Jobs

Site Reliability Engineer

Domino Data Lab

Posted 3 days ago

Domino has an ambitious vision for data science and machine learning. Our platform helps data science teams accelerate research, increase collaboration, and rapidly deploy predictive models. Our customers are the most sophisticated analytical organizations in the world, including Salesforce, Dell, RedHat, Gap, Bristol-Myers Squibb, and Bayer. Backed by Sequoia Capital, Zetta Venture Partners, and Bloomberg Beta, we are at the epicenter of the data science revolution, helping companies build better cars, develop more effective medicine, or simply recommend the best song to play next.

You will be joining a team of high-performance engineers and have a significant impact on managing a growing infrastructure and service delivery. You'll be tasked to maintain the health of the Domino platform in a variety of environments, enhancing our observability systems, engineering reliability into our stack, and governing our infrastructure. We are especially interested in engineers with experience operating services on GCP or Azure or implementing security policies and controls in cloud service providers.

Responsibilities
* Engineer reliability and performance into our product and services
* Instrument and monitor service health
* Manage and secure our cloud-based infrastructure
* Diagnose and fix issues in a distributed, containerized application
* Incident response (on-call) and root cause analysis
* Implement and manage access control and security services
* Collaborate with developers and PMs to continuously improve Domino
* Develop tools and processes to improve efficiency and reduce toil

Qualifications
Tech we use is listed in parentheses; comparable experience is OK.
* Experience with managing cloud environments (AWS, GCP, Azure)
* Strong coding ability (Python, Bash)
* Systems fluency (Linux, storage, networking)
* Experience with container management (Kubernetes, Docker)
* Observability systems (New Relic, Prometheus)
* Operating stacks based on modern software components (Redis, ElasticSearch, RabbitMQ, MongoDB, PostgreSQL, Play)
* Programming experience (Python, Go, Bash)
* Infrastructure and configuration automation (Terraform, SaltStack)
* Exceptional problem solving acumen

Domino Data Lab, San Francisco, CA

Site Reliability Engineer

Artech