The objective of this work is to manage a high performance computing (HPC) cluster (HPC administrator) and to support users (HPC analyst) with the installation, execution and debugging of research applications and code on HPC clusters. This requires troubleshooting and a focus on client satisfaction so that clients (scientists) can devote their time to NRC research priorities rather than to resolving IT-related issues.
Scope of Work
Category
Tasks for Contractor
HPC Administrator Tasks
Maintain an HPC cluster (hardware, image management, local networking, scheduler, backups).
Troubleshoot the environment when an incident occurs to ensure a quick return to normal operations.
HPC Analyst Tasks
Meet with scientists and evaluate their requirements for HPC support.
Develop a task plan to meet scientists' needs and consult the technical authority for approval.
Application builds and installs, runtime troubleshooting (GNU, Intel, Fortran, Nvidia).
Support for open-source and commercial off-the-shelf (COTS) software, including:
Python and Anaconda installs.
Bash scripts, build/make tools, EasyBuild, and Spack.
Assist with in-house developed applications (compilation and runtime).
Other General Tasks
Management of:
Operating system (patching schedule, reliability for Linux distributions).
Accounts (creation, deletion).
Configuration via Git, MS DevOps, Ansible Playbooks.
RPM/DEB Packages.
Environment modules.
ThinLinc troubleshooting.
Troubleshooting & Hardware
Troubleshooting jobs on schedulers (PBS Pro/Torque, SLURM, SGE).
Ensure reliable CUDA installs, troubleshoot GPU failures and other CUDA software/driver issues.
Hardware support (memory upgrades, storage arrays, power and network cabling, iLO).
Documentation
Document each process for every task to ensure enterprise knowledge continuity.
Mandatory Requirements
The proposed resource has five (5) years' experience within the last ten (10) years in administering HPC (High Performance Computing) systems and performing HPC analyst tasks, as per Annex A – Statement of Work.
The proposed resource has worked for more than twelve (12) months. Each reference provided must have supervised the proposed resource.