Senior Network Software Engineer

Nvidia Santa Clara, CA 95051

Posted 1 week ago

At NVIDIA, we're driven by a profound commitment to transforming the future of computing, artificial intelligence, and visualization technologies. Joining NVIDIA's AI Efficiency Team means contributing to the infrastructure that powers our innovative AI research. This team focuses on optimizing efficiency and resiliency of ML workloads, as well as developing scalable AI infrastructure tools and services. Our objective is to deliver a stable, scalable environment for NVIDIA's AI researchers, providing them with the necessary resources and scale to foster innovation.

We are seeking a senior network software engineer to join our team. As a Senior Network Software Engineer, you will be instrumental in co-designing and implementing innovative solutions that power AI applications at an unprecedented scale. Your expertise in network software architecture and high-performance interconnects will drive innovation and enable us to deliver platforms that redefine what is possible. This is an exceptional opportunity to push the boundaries of technology, shape the future of AI, and work with a world-class team of like-minded engineers.

What you will be doing:

  • Collaborate with multi-functional teams to analyze, co-design, and develop networking software and hardware for innovative AI platforms.

  • Drive the development of new networking algorithms and protocols for point-to-point and collective operations at scale.

  • Identify bottlenecks and inefficiencies in application code, proposing optimizations to enhance performance and network utilization.

  • Design and implement performance benchmarks and testing methodologies to evaluate performance at scale.

  • Provide guidance and recommendations for optimizing AI applications for speed, scalability, and resource efficiency.

  • Share knowledge with domain expert teams as they develop applications for the next generation of AI platforms.

  • Contribute to the development of tools and frameworks to facilitate network optimization.

What We Need to See:

  • PhD in Computer Science, Computer Engineering, or related field, or equivalent experience.

  • 10+ years of experience with a focus on high-performance networking and AI applications.

  • Expertise in RDMA networking (InfiniBand, RoCE), Ethernet, and PCIe.

  • Experience with at least one high-performance networking library: NCCL, UCX, libfabric, MPI, UCC.

  • Deep understanding of various aspects of high-performance networking, including network technologies, debugging, and performance analysis.

  • Experience in developing and optimizing deep learning frameworks such as PyTorch and TensorFlow.

  • Proficiency in Python and C/C++.

  • Experience in CUDA programming.

  • Track record of delivering performance improvements for software used in large-scale deployments.

  • Knowledge of Kubernetes (k8s) and cloud-native application principles is a plus.

  • Familiarity with continuous integration and delivery practices for performance optimization.

Ways to Stand Out from the Crowd:

  • Hands-on experience in optimizing networking building blocks for DL frameworks like PyTorch and TensorFlow.

  • Experience in developing communication libraries such as NCCL, UCX, UCC, MPI.

  • In-depth knowledge of RDMA, GPUDirect, and network technologies.

  • Provide references to your code contributions.

This is an exceptional opportunity to push the limits of state-of-the-art technology and contribute to the development of platforms the world has never seen before. As part of NVIDIA, you'll work alongside top-tier talent in a collaborative environment that fosters innovation and creativity.

If you're passionate about shaping the future of AI and high-performance computing, apply now to embark on an exciting journey with us!

The base salary range is 220,000 USD - 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

