Staff Data Engineer - Research/Machine Learning

Character AI, Menlo Park, CA 94026

Posted 2 months ago

About the role

You would be a great fit for this role if you are an experienced engineer who wants to be instrumental in building the world's best LLMs by collecting and refining the essential training data that powers them. In pursuit of the best language models, your responsibility is twofold:

  • First, you will identify and collect data at the scale required to feed our largest models. This involves managing a diverse set of sources, including structured and unstructured content in text and multimedia formats. Your engineering expertise is crucial in crafting the infrastructure and tools necessary to efficiently collect and manage petabytes of data.

  • Second, you will experiment with various methods of extracting a balanced and comprehensive training dataset from the raw data. You will leverage your expertise in data to build datasets reflecting a hypothesis, train models, and evaluate experimental results. Through this experimentation, you will create the training datasets for our largest models.

These are critical steps in the construction of AI. With petabytes of data and numerous design decisions, each step requires careful attention. Expertise in AI is not necessary, but enthusiasm for the space and a track record of adapting to new domains are important.

Who we're looking for

Required Experience:

  • 5+ years of production software engineering experience

  • Experience building large-scale data processing pipelines, with tools like PySpark, Beam, or Flink

  • Familiarity with Machine Learning and NLP and willingness to learn more on the job

  • Track record of adapting to new domains and a desire to use data to improve products

Additional Desired Experience:

  • ML experience as an ML engineer, Data Scientist, or another similar role

  • Experience with cloud platforms like AWS or Azure, or tools such as Kubernetes and Terraform

  • Passion for conversational AI or large language models

You will be a good fit if you are proactive and have a "get things done" mindset. Given our current pace of growth and load on our systems, most people have had a significant impact during their first week at the company.

Ready to empower the world with AGI?

Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI powerhouse and ranks among the most utilized AI research platforms globally. Our innovative approach allows users to customize their experience with personalized AI 'Characters.'

In just two years, we achieved unicorn status and were named Google Play's AI App of the Year, a testament to our groundbreaking technology and vision.

Noam co-invented core LLM tech and was recently honored as one of TIME's 100 Most Influential in AI. Daniel created LaMDA, the breakthrough conversational AI now powering Google's Bard.

We encourage you to apply even if you don't meet all qualifications. Underrepresented individuals often experience imposter syndrome. Don't underestimate yourself.

Our commitment to diversity:

Character values diversity and welcomes applicants of all backgrounds. We are an equal opportunity employer and firmly uphold a policy of non-discrimination on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to us.

