System Administrator

Renaud Consulting Ottawa , Ontario K2B 8A4

Posted 5 days ago

  1. Objective
The objective for this work is to manage high performance computing (HPC) cluster (HPC administrator) and support users (HPC analyst) with respect to the installation, execution and debugging of research applications and code on high performance computing (HPC) clusters. This requires troubleshooting and ensuring client satisfaction to help clients (scientists) devote their time to NRC research priorities, not resolving IT related issues.
  1. Scope of Work
CategoryTasks for Contractor
HPC administrator tasks
  • Maintain a HPC cluster (hardware, image management, local networking, scheduler, backups).
  • Troubleshoot the environment when an incident occurs to ensure a quick return to normal operations.
HPC Analyst Tasks
  • Meet with scientists and evaluate their requirements for HPC support.
  • Develop a task plan to meet scientists' needs and consult the technical authority for approval.
  • Application builds and installs, runtime troubleshooting (GNU, Intel, Fortran, Nvidia).
  • Support for open-source and commercial off-the-shelf (COTS) software, including:
    • Python and Anaconda installs.
    • Bash scripts, build/make tools, EasyBuild, and Spack.
    • MPI implementations (MPICH, OpenMPI, IntelMPI, HPMPI).
  • Assist with in-house developed applications (compilation and runtime).
Other General TasksManagement of:
  • Operating system (patching schedule, reliability for Linux distributions).
  • Accounts (creation, deletion).
  • Configuration via Git, MS DevOps, Ansible Playbooks.
  • RPM/DEB Packages.
  • Environment modules.
  • ThinLinc troubleshooting.
Troubleshoot & Hardware
  • Troubleshooting jobs on schedulers (PBS Pro/Torque, SLURM, SGE).
  • Ensure reliable CUDA installs, troubleshoot GPU failures and other CUDA software/driver issues.
  • Hardware support (memory upgrades, storage arrays, power and network cabling, ILO).
Documentation
  • Document each process for every task to ensure enterprise knowledge continuity.
Mandatory Requirements
  • The proposed resource has five (5) years’ experience within the last ten (10) years in administrating HPC (High Performance Computing) systems and performing HPC analyst tasks, as per Annex – A Statement of Work.
  • The proposed resource has worked for more than twelves (12) months. Each reference provided must have been in a role of supervision of the proposed resource
icon no score

See how you match
to the job

Find your dream job anywhere
with the LiveCareer app.
Mobile App Icon
Download the
LiveCareer app and find
your dream job anywhere
App Store Icon Google Play Icon
lc_ad

Boost your job search productivity with our
free Chrome Extension!

lc_apply_tool GET EXTENSION

Similar Jobs

Want to see jobs matched to your resume? Upload One Now! Remove
System Administrator
New!

Inproduction

Posted Today

VIEW JOBS 2/19/2025 12:00:00 AM 2025-05-20T00:00 InProduction is the leading provider of temporary seating, staging, structures, and scenic production for the U.S. live events industry. The Company is a valua Inproduction Warrenville IL

System Administrator

Renaud Consulting