Senior Site Reliability Engineer

KeepTruckin, San Francisco, CA 94118

Posted 3 months ago

Who we are:

KeepTruckin is on a mission to modernize the trucking industry. With the leading fleet management platform, we are bringing trucks online and fundamentally changing the way freight is moved on our roads.

At KeepTruckin, we see our hard work rewarded in tangible ways every day and we believe that intelligence is most powerful when paired with humility. We're motivated by the opportunity to impact and improve every facet of a trillion-dollar industry that touches everyone's lives. KeepTruckin is proud to be a Forbes Cloud 100 company and recognized by Glassdoor as a "Best Place to Work" in 2019.

We are looking for people from all backgrounds who want to make an impact on the millions of drivers who keep our world moving. We laugh hard, snack harder, and work together to drive innovation at the intersection of tech and transportation.

About the Job:

As an early member of the Site Reliability Team, you will play a crucial role in helping us design, scale, and manage our growing AWS-backed infrastructure. You will apply your expertise to scaling our architecture and building a highly available system alongside an enthusiastic team. We are looking for candidates with production experience on AWS-based platforms, expertise in automating distributed systems, a track record of scaling a fast-growing platform and maintaining high availability, and a forward-thinking mindset ready to take on tomorrow's challenges.


Responsibilities:

  • Automate the provisioning, scaling, and management of our infrastructure using Configuration As Code and Configuration Management

  • Create deployment pipelines; take code from git to production

  • Continuously improve the monitoring and alerting capabilities of our platform, enabling us to be proactive instead of reactive

  • Identify and remove bottlenecks from systems in production

  • Ensure 99.9% customer-facing uptime


Requirements:

  • 4+ years of professional SRE/DevOps experience

  • Working knowledge of AWS services and technologies (Redshift, DynamoDB, Kinesis, RDS, ELB, Auto Scaling, Lambda, etc.)

  • Experience with infrastructure as code and configuration management (Terraform, Nix, Ansible, CloudFormation, Chef, etc.)

  • Demonstrated ability to work on high-volume production systems

  • Experience with build tools such as Bazel, Pants, or Buck

  • Knowledge of Python, Ruby, or Go

  • Experience with a container orchestration framework such as Kubernetes or Docker Swarm

  • Understanding of relational and NoSQL databases (PostgreSQL a plus)

As an equal opportunity employer, we are committed to diversity in the workforce. In accordance with applicable law, we prohibit discrimination against any applicant or employee based on any legally recognized basis, including, but not limited to, race, color, religion, sex (including pregnancy, lactation, childbirth, or related medical conditions), sexual orientation, gender identity, age (40 and over), national origin or ancestry, physical or mental disability, genetic information (including testing and characteristics), veteran status, uniformed service member status, or any other status protected by federal, state, or local law.
