Engineer: Cluster - IV
The Judge Group Inc.

Atlanta, Georgia

Posted in IT

This job has expired.

Job Info

Location: REMOTE

Job Title: Cluster Administrator Engineer
Duration: 1+ Year Contract on W2

Location: 100% Remote

This role will last at least 1 year, with the engagement approved on a quarterly basis. This role is 100% remote.

Our client is looking for an experienced cluster administrator to manage HPC clusters. The right candidate will have experience with SLURM and related technologies and will be familiar with machine learning training and inference workloads (GPU and CPU).

Scope of work/responsibilities:

  • Serve as the primary contact for a GPU+CPU cluster
  • Collect team feedback and relay it to the support team (schedule downtimes/maintenance, propose changes to the cluster, etc.)
  • Perform capacity planning to help determine compute/storage needs for the team moving forward
  • Serve as the owner of the SLURM job scheduler, defining the configuration that best fits the team and developing/enabling advanced features
  • Serve as the team datasets owner (manage the datasets that live in the cluster and how people access them)
  • Help the team optimize/troubleshoot complex jobs/pipelines (AI-centric, simulation, 3D graphics, etc.)
  • Educate the team on how to use the cluster (SLURM, BeeGFS, datasets, etc.), enabling a fast ramp-up for new scientists and engineers (via tutorials, presentations, wiki docs, etc.)

Desired skills:
  • Good communication skills. You can effectively communicate with a variety of stakeholders, including presenting plans to higher management and having technical discussions with engineers/scientists.
  • Experience designing and managing large clusters with heterogeneous HW (CPUs, GPUs, etc.)
  • User-centric and results-oriented. You can learn from data what the needs of our scientists/engineers will be and can produce a cluster growth plan to fulfill these needs.
  • Power user. You are willing to extensively test the different workflows that run in the cluster and help optimize them.
  • Cluster tech stack. You are an expert on cluster orchestration and management, familiar with technologies such as SLURM, BeeGFS, Docker, etc. (or you are willing to learn them quickly)

  • Minimum Educational Requirement: BS degree or higher


This job and many more are available through The Judge Group. Find us on the web at

