Senior Data Scientist

605 Pasadena, CA 91101

Posted 2 weeks ago

The Senior Data Scientist is an upper-level position in 605's data science group, focused on the statistics-heavy, technical/backend aspects of data analytics. At 605, the Data Science and Client Analytics teams share a baseline knowledge of statistics and machine learning, R and Python, and familiarity with relational databases and the scale/variety of 605's data assets. In contrast to the Client Analytics team, Data Scientists bring to the table some deeper technical skills, including:

  • Hands-on experience with industry-standard predictive modeling solutions such as scikit-learn, xgboost, Spark ML, and H2O.ai in a production setting
  • Familiarity with diverse methods for supervised learning and unsupervised learning, and with ETL pipelines in general
  • A more nuanced understanding of cloud-based (Amazon Web Services) computing resources/infrastructure, especially the need for and methods of parallelization for analytics tasks
  • Experience in designing and implementing wrapper/tool/utility functions for automated tasks, to be used by less-technical Analytics team members, and organizing them into distributable, regularly maintained and tested R/Python packages
  • Proficiency with source code management (git) and related code development/review workflows, as well as continuous integration tools like Travis and/or Jenkins

Data Scientists at 605 are generally involved in at least two different projects at any given time and work alongside other data scientists or analysts. Projects produce both client-facing deliverables and internal tools/datasets consumed by various other teams at 605. This role could potentially be based out of one of 605's offices (in New York City or Pasadena, CA), or it could be full-time remote, depending on the circumstances.

Requirements

  • Master's degree in a quantitative, scientific, or engineering field and at least 2 years' experience in a data science industry position
  • Advanced-level proficiency in either R or Python (baseline proficiency in both)
  • Intermediate-level proficiency with Apache Spark (via Python, R, Scala, and/or Java), with application to machine learning and/or ETL pipelines
  • Knowledge of diverse modeling algorithms for supervised learning, as implemented in most of the following: scikit-learn, xgboost, Spark ML, H2O.ai
  • Experience with most of the following AWS services: Redshift, S3, EC2, EMR, and Glue
  • Advanced-level proficiency with Linux/Unix operating systems and command-line/shell environments, including accessing remote machines and using Docker containers
  • Experience with git and github.com

Preferred Skills

  • Doctoral degree in a quantitative, scientific, or engineering field and/or at least 4 years' experience in a data science industry position
  • Past work with household-level or person-level data sets including demographic, CRM, and/or self-reported (survey) data
  • Past work with time-stamped video consumption/viewing data or device usage data
  • Advanced-level proficiency in both R and Python (including class/function structure, package design and management)
  • Advanced-level proficiency with Apache Spark (via Python, R, Scala, and/or Java), including optimization for local and cluster scale applications
  • Experience with all of the following AWS services: Redshift, S3, EC2, EMR, and Glue
  • Knowledge of information security best practices

Benefits

  • Comprehensive health and dental insurance for employees and their families
  • Life insurance
  • 401(k) with match (eligible for match after one year)
  • Pre-tax flexible compensation plan for medical, transit, parking or dependent care expenses
  • PTO & sick days: if you're sick, you stay home
  • Work-from-home Fridays
  • A kitchen stocked with sodas, snacks, yogurt and other goodies
  • A tight-knit startup community that likes to eat! We celebrate everyone's birthdays, have frequent team lunches, and do events in and out of the office
  • 605 is an active participant in conferences
