Research Cyberinfrastructure Engineer II, HPC and GPU Cluster (RCIEII)

Dartmouth College, Hanover, NH 03755

Position Details

Position Information

Posting Date: 06/10/2024
Closing Date: Open Until Filled
Position Number: 1128918
Position Title: Research Cyberinfrastructure Engineer II, HPC and GPU Cluster (RCIEII)
Department this Position Reports to: Research Cyberinfrastructure
Hiring Range Minimum: $99,400
Hiring Range Maximum: $114,300
Union Type: Not a Union Position
SEIU Level: Not an SEIU Position
FLSA Status: Exempt
Employment Category: Regular Full Time with end date
Scheduled Months per Year: 12
Scheduled Hours per Week: 40
Schedule: M-F, 8a-5p
Location of Position: Hanover, NH
Remote Work Eligibility: Hybrid
Is this a term position? Yes
Length of term in months: 36
Is this a grant funded position? No

Position Purpose

The Research Cyberinfrastructure Engineer II (RCIEII) enhances research computing infrastructure, focusing on administration, High-Performance Computing (HPC), cloud, and advanced computing solutions. Responsibilities include building and maintaining a graphics processing unit (GPU) cluster used primarily for artificial intelligence (AI) and machine learning (ML) workloads. The role strengthens infrastructure security, availability, and scalability and leads automation and system optimization initiatives to advance research capabilities. The RCIEII provides advanced support, develops innovative solutions, and leads projects that advance research success.

Description

Join Our Team as a Research Cyberinfrastructure Engineer II, HPC and GPU Cluster at Dartmouth!

Are you ready to enhance the future of research computing? Dartmouth is looking for a dynamic Research Cyberinfrastructure Engineer II (RCIEII) to innovate and lead in HPC and GPU cluster administration.

About the Role:

As an RCIEII, you will enhance research computing infrastructure, focusing on building and maintaining a GPU cluster for AI and ML workloads. You will ensure infrastructure security, availability, and scalability while leading automation and system optimization initiatives.

What You'll Do:

Lead Projects: Manage and optimize HPC environments and cloud-based infrastructures, focusing on high availability and performance.

Innovate: Implement cutting-edge computing services and applications, integrating GPU technologies into HPC environments.

Collaborate: Build strategic partnerships with IT departments, technology providers, and research groups.

Mentor and Train: Create knowledge-sharing platforms, coordinate hackathons and workshops, and promote continuous development.

Your Skills and Expertise:

  • Bachelor's degree in Computer Science/IT or equivalent experience.

  • 3+ years in research computing, focusing on HPC system optimization and security.

  • Proficiency in scripting (Python, Bash) and automation tools (Ansible, Terraform).

  • Expertise in Linux and Windows server management and container technologies (Docker, Kubernetes).

  • Skilled in cloud platforms (AWS, Azure, Google Cloud) and HPC software deployment.

Why Dartmouth?

Impactful Work: Contribute to groundbreaking research and innovative projects.

Collaborative Environment: Work with a diverse and interdisciplinary team of experts.

Professional Growth: Continuous learning and professional development opportunities.

Join Us:

Be a part of a team driving innovation in research computing. Apply now to lead the future of research cyberinfrastructure at Dartmouth!

Required Qualifications

Education and Years of Experience: Bachelor's degree plus 3-5 years' experience, or an equivalent combination of education and experience

Required Qualifications: Skills, Knowledge and Abilities

  • Bachelor's degree or equivalent experience in Computer Science/IT.

  • 3+ years in research computing, focusing on HPC system optimization and security.

  • Proficient in scripting (Python, Bash) and automation tools.

  • Proven record of successful projects that enhanced research computing environments.

  • Expertise in Linux and Windows server management.

  • Experienced in Docker and Kubernetes.

  • Familiar with Ansible, Terraform, and Puppet for automation.

  • Strong analytical and problem-solving skills.

  • Skilled in cloud platforms (AWS, Azure, Google Cloud).

  • Effective communication and teamwork skills.

  • Leadership experience in mentoring and team development.

Preferred Qualifications

  • Advanced degree or certifications in relevant fields.

  • Expertise in AI/ML software and frameworks.

  • Experience with CUDA programming and/or C/C++.

  • Professional certifications (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect).

  • Experience in academic/research IT environments.

  • Hands-on data center operations experience.

  • Proficient in HPC software deployment and troubleshooting.

  • Skilled in cloud services for HPC workloads.

  • Experience in developing and maintaining infrastructure documentation.

  • Innovative in developing new services and applications.

  • Comprehensive understanding of security in computing environments.

  • Excellent troubleshooting skills using command-line tools and vendor support.

Department Contact for Recruitment Inquiries: Jonathan Kulp
Department Contact Phone Number: 603.646.6110
Department Contact for Cover Letter and Title: Elijah Gagne
Department Contact's Phone Number: 603.646.9650

Equal Opportunity Employer

Dartmouth College is an equal opportunity/affirmative action employer with a strong commitment to diversity and inclusion. We prohibit discrimination on the basis of race, color, religion, sex, age, national origin, sexual orientation, gender identity or expression, disability, veteran status, marital status, or any other legally protected status. Applications by members of all underrepresented groups are encouraged.

Background Check

Employment in this position is contingent upon consent to and successful completion of a pre-employment background check, which may include a criminal background check, reference checks, verification of work history, conduct review, and verification of any required academic credentials, licenses, and/or certifications, with results acceptable to Dartmouth College. A criminal conviction will not automatically disqualify an applicant from employment. Background check information will be used in a confidential, non-discriminatory manner consistent with state and federal law.

Is driving a vehicle (e.g., Dartmouth vehicle or off-road vehicle, rental car, personal car) an essential function of this job? Not an essential function.

Special Instructions to Applicants

This position is a 36-month term position.

Dartmouth College has a Tobacco-Free Policy. Smoking and the use of tobacco-based products (including smokeless tobacco) are prohibited in all facilities, grounds, vehicles, or other areas owned, operated, or occupied by Dartmouth College, with no exceptions. For details, please see our policy: https://policies.dartmouth.edu/policy/tobacco-free-policy

Quick Link: https://searchjobs.dartmouth.edu/postings/74282

Key Accountabilities

Description

Cyberinfrastructure Operations

  • Integrates GPU technologies into HPC environments, collaborating with researchers and HPC programmers.

  • Acts as a Subject Matter Expert (SME) in cloud services, HPC, automation, storage, and container technologies (e.g., Docker, Kubernetes), providing advanced support and consultancy.

  • Manages and optimizes HPC environments and cloud-based infrastructures, focusing on high availability, efficient load balancing, and performance across platforms such as AWS and GCP.

  • Designs and implements network configurations that maintain compliance with security and regulatory frameworks (e.g., FISMA, PCI, GDPR, HIPAA).

  • Develops and refines automation scripts and workflows using tools like Ansible, Terraform, Python, and PowerShell.

  • Coordinates disaster recovery plans and data integrity strategies, oversees hypervisor environments, and ensures the resilience of computing services.

  • Provides on-call support, applying strong problem-solving skills and promoting knowledge sharing within the team.

  • Implements security measures to protect HPC environments, applications, servers, and storage from cyber threats.

  • Applies scaling techniques so that HPC systems can accommodate growing research demand.

  • Monitors system availability, implementing redundancy and failover strategies.

Percentage of Time: 40%

Description

Computing and HPC Initiatives

  • Leads initiatives to design and implement computing services and applications addressing specific research challenges.

  • Collaborates with researchers to understand computational needs, translating these into practical, scalable solutions.

  • Oversees the integration of cloud-based solutions for HPC workloads.

  • Designs and manages data storage infrastructure that ensures data integrity, availability, and compliance with policies and regulations.

Percentage of Time: 20%

Description

Collaboration and Relationship Management

  • Builds and nurtures strategic partnerships with IT departments, technology providers, and research groups.

  • Manages joint ventures with academic partners to pilot new technologies in research computing.

  • Engages stakeholders through updates, presentations, and collaborative sessions, ensuring their needs are met.

Percentage of Time: 20%

Description

Training and Development

  • Creates a knowledge-sharing platform for team members to share best practices and solutions.

  • Coordinates hackathons, tech talks, and workshops to stimulate innovation and the adoption of new technologies.

  • Seeks continuous personal development and identifies opportunities for team advancement.

Percentage of Time: 10%

Description

Leadership

  • Serves as the technical lead in critical problem-solving efforts.

  • Cultivates a problem-solving mindset within the team.

  • Reviews team processes and workflows, identifying inefficiencies.

  • Implements process improvements to enhance team productivity and project management.

Percentage of Time: 5%

  • Demonstrates a commitment to diversity, inclusion, and cultural awareness through actions, interactions, and communications with others.

  • Performs other duties as assigned.

Supplemental Questions

Required fields are indicated with an asterisk (*).

  • How did you learn about this employment opportunity?

      • Current Dartmouth employee (Please specify full name below)

      • Word of mouth

      • Mentioned on social, digital, or print media (e.g., LinkedIn feed, VOX, Valley News, listserv)

      • jobs@dartmouth.edu email outreach (includes Job Alert notifications, marketing emails from Talent Acquisition)

      • Recruiter (Please specify full name or event below)

      • abilityJOBS

      • Chronicle of Higher Education

      • Glassdoor

      • Handshake

      • HigherEdJobs

      • Indeed

      • Inside Higher Ed

      • LinkedIn's Job Board

      • RecruitMilitary

      • Dartmouth's Job Board (searchjobs.dartmouth.edu)

      • Other (Please specify below)

  • If you would like to add more information to your answer, please specify here: (Open Ended Question)

Documents Needed to Apply

Required Documents

  • Cover Letter

  • Resume

Optional Documents

  • Curriculum Vitae

  • Additional Document #1
