Production Engineer

Covariant, Emeryville, CA 94608

Posted 3 weeks ago

THE COMPANY

Our mission is to build the Covariant Brain, a universal AI that gives robots the ability to see, reason, and act on the world around them. Bringing AI from research in the lab to the infinite variability and constant change of our customers' real-world operations requires new ideas, approaches, and techniques.

Success in the real world requires a team that represents that world: diversity of backgrounds, points of view, and experiences. Our common denominator: ambitious expectations, love of learning, empathy for those around us, and a team-first mindset.

THE ROLE

Production Engineers at Covariant play a mission-critical role in ensuring the seamless operation and future scalability of our services. Embedded within our production and research teams, you'll be at the forefront of every significant engineering effort. You will drive innovation and efficiency across our projects by applying your expertise in AWS, Docker, Kubernetes, Puppet, and Terraform to architect scalable, resilient infrastructure for our AI robotics systems.

AREAS OF FOCUS

  • Own and orchestrate large GPU clusters across multiple cloud providers, using infrastructure as code (IaC) and scripts to give researchers a single, cohesive interface

  • Help teammates architect and build scalable tooling for our edge robot fleet

  • Collaborate with brilliant researchers to evolve our training and inference tooling to be state-of-the-art

YOU WILL

  • Design, build, manage and monitor the infrastructure we use to deploy our AI software and robotics solutions

  • Develop and evolve software engineering and operational practices for the unique needs of distributed AI-powered cyber-physical systems

  • Identify and establish healthy engineering and operational culture and processes

  • Deliver previously impossible robotics capabilities that solve real needs for our partners and customers

  • Collaborate with, learn from, and support a diverse and cross-functional team, including mechanical, electrical, and robotics engineers, AI/ML researchers, and business development

YOU HAVE

  • Substantial experience operating and automating production systems in both cloud and bare-metal environments, deploying and administering Linux systems and/or wide-area networks, and building new tools or extending existing ones with new capabilities

  • A track record of accelerating developer productivity through improved tooling, automation, and education

  • A track record of partnering with stakeholders to deliver solutions throughout the development process

  • A solid foundation in Python, Linux, and networking

  • Commitment to continuous learning and a willingness to pick up new languages or technologies as needed to solve real problems and deliver business impact

NICE TO HAVES

  • Want to work on a small, collaborative team with a high degree of autonomy and responsibility

  • Are motivated to work on challenging real-world engineering problems without prior solutions

  • Are excited to join coworkers who strive to be inclusive, thoughtful, and down-to-earth

  • Are self-directed and enjoy figuring out which problem is most important to work on

  • Have previously done one or more of the following: deployed client-side software, including protecting source code, establishing secure licensing, and performing release engineering; set up and scaled developer tooling and CI/CD systems; built ML or IoT data pipelines that process images and metadata from live deployments; or managed high-bandwidth deep learning or supercomputing hardware

SAMPLE WEEK IN THE LIFE

  • Monday: Start the week with a team meeting to discuss ongoing projects and explore potential collaborations. Resume work on the rollout of BigProxy v2 in the development environment, refining probing tests to enhance its reliability. Also, schedule a discussion with our Tailscale account representative to renew our contract.

  • Tuesday: Address an urgent issue: the networking backplane of one of our GPU clusters is underperforming. Run a troubleshooting session with the cluster provider to adjust the NCCL topology file after unexpected changes on their end.

  • Wednesday: Develop a new alert in Datadog to monitor the performance of the GPU cluster backplane, ensuring it is adaptable for use with various providers.

  • Thursday: Collaborate with a colleague on deploying a PyPI server in our cloud infrastructure. Continue implementing and testing BigProxy v2, which was paused on Tuesday.

  • Friday: Lead a presentation at the weekly engineering deep dive to discuss the features and potential rollout of BigProxy v2, which consolidates all connections from remote deployments to the cloud through a single channel and simplifies SSH access to GPU clusters outside AWS/GCP. Gather and incorporate feedback from the team to finalize the deployment strategy.

$165,000 - $210,000 a year

SALARY RANGE

Base pay is one element of our total rewards package, which may also include comprehensive benefits and equity, depending on eligibility. The annual base salary range for this position is $165,000 to $210,000. The actual base pay offered will be determined by factors such as years of relevant experience, skills, and education. Decisions are made on a case-by-case basis.

COMPANY CORE VALUES

LEARNING CONSTANTLY

STRIVING FOR EMPATHY

TAKING ON THE IMPOSSIBLE, TOGETHER

BENEFITS (US)

Health, dental, and vision insurance for you and your family

Unlimited PTO and flexible work hours

401(k) plan and company match

Lunch and dinner each day (for on-site employees)

Monthly Health & Wellness budget

Quarterly Learning budget

At Covariant.ai we don't just accept difference; we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products, and our community. Covariant.ai is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.
