Member of Technical Staff: Data Acquisition (Crawler) Engineer

Essential AI Labs - San Francisco, CA 94118

Posted 2 weeks ago

Essential AI's mission is to deepen the partnership between humans and computers, unlocking collaborative capabilities that far exceed what could be achieved today. We believe that building delightful end-user experiences requires innovating across the stack - from the UX all the way down to models that achieve the best user value per FLOP.

We believe that a small, focused team of motivated individuals can create outsized breakthroughs. We are building a world-class, multi-disciplinary team that is excited to solve hard real-world AI problems. We are well-capitalized and supported by March Capital and Thrive Capital, with participation from AMD, Franklin Venture Partners, Google, KB Investment, and NVIDIA.

If you're interested in our mission and in being the best in the world at your craft, please apply to one of the following roles. If you don't see a role that fits exactly but still want to contribute, please reach out to hiring@essential.ai.

","jobBoardBottomDescriptionHtml":"

Essential AI is committed to providing a work environment free of discrimination and harassment, and to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status. We also consider qualified applicants regardless of criminal history, consistent with legal requirements. You may view all of Essential AI's recruiting notices here, including our EEO policy, recruitment scam notice, and recruitment agency policy.

"},"appConfirmationTrackingPixelHtml":null,"recruitingPrivacyPolicyUrl":null,"timezone":"America/Los_Angeles"},"posting":{"id":"614bf71e-63ec-4cd6-a039-76e9aea17e10","title":"Member of Technical Staff: Data Acquisition (Crawler) Engineer","isListed":true,"isConfidential":false,"departmentName":"Engineering","teamNames":["Engineering"],"locationName":"San Francisco","employmentType":"FullTime","descriptionHtml":"

Essential AI's mission is to deepen the partnership between humans and computers, unlocking collaborative capabilities that far exceed what could be achieved today. We believe that building delightful end-user experiences requires innovating across the stack - from the UX all the way down to models that achieve the best user value per FLOP.

We believe that a small, focused team of motivated individuals can create outsized breakthroughs. We are building a world-class multi-disciplinary team who are excited to solve hard real-world AI problems. We are well-capitalized and supported by March Capital and Thrive Capital, with participation from AMD, Franklin Venture Partners, Google, KB Investment, NVIDIA.

The Role

The Data Acquisition (Crawler) Engineer will develop and maintain the systems that collect, store, and process data from a wide range of sources. Your primary responsibility will be to design, build, and operate web crawlers and data acquisition pipelines that run efficiently and reliably to support our model training (a minimal illustrative fetch-loop sketch follows the list below).

What you'll be working on

  • Architect and build a large-scale distributed web crawling system.

  • Design and implement web crawlers and scrapers to automatically extract data from websites, handling challenges like dynamic content and scaling to large data volumes.

  • Develop data acquisition pipelines to ingest, transform, and store large volumes of data.

  • Develop a highly scalable system and optimize crawler performance.

  • Monitor and troubleshoot crawler activities to detect and resolve issues promptly.

  • Work closely with the data infrastructure and data research teams to improve data quality.
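
Purely as an illustration of the kind of system described above (not Essential AI's actual code or stack), here is a minimal sketch in Go, one of the languages listed in the requirements below: a fixed worker pool pulling URLs from a queue with per-request timeouts. A production crawler would add robots.txt handling, per-host rate limiting, deduplication, retries, and durable storage.

```go
// Minimal illustrative fetch loop: a fixed worker pool draining a URL queue.
// All names here are hypothetical; this is a sketch, not production code.
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

func main() {
	urls := []string{
		"https://example.com/",
		"https://example.org/",
	}

	jobs := make(chan string)
	var wg sync.WaitGroup
	client := &http.Client{Timeout: 10 * time.Second} // per-request timeout

	// Fixed-size worker pool; a real crawler would scale this per host
	// and enforce politeness delays between requests to the same site.
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				resp, err := client.Get(u)
				if err != nil {
					fmt.Println("fetch error:", u, err)
					continue
				}
				body, _ := io.ReadAll(resp.Body)
				resp.Body.Close()
				fmt.Printf("fetched %s (%d bytes)\n", u, len(body))
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
}
```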

What we are looking for

  • Prior large-scale web crawling experience is a must for this role.

  • Minimum of 5 years of experience in data-intensive applications and distributed systems.

  • Proficiency in a high-performance programming language such as Go, Rust, or C++.

  • Strong understanding of containerization and orchestration frameworks such as Docker and Kubernetes.

  • Experience building on GCP or AWS services.

  • Bonus: You have deep expertise working with headless browsers and the Chrome DevTools Protocol.

  • Bonus: You are curious to learn how data sources and data quality affect LLM capabilities.

We encourage you to apply for this position even if you don't check every requirement above but want to spend time pushing on these techniques.

We work in person in San Francisco. We offer relocation assistance to new employees.

