High Performance Computing
High Performance Computing, or HPC, refers to clusters of computers running workloads that demand large amounts of CPU, memory, and/or GPU resources. Clusters are set up so that users get access to as many of those resources as possible without resource contention, which is accomplished through job scheduling software.
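For example, on a cluster managed by the Slurm scheduler (one of the schedulers covered below), a job declares its resource needs up front and only runs once the scheduler can grant them. The following is a minimal sketch; the job name and resource values are placeholders, and it assumes a Slurm cluster with GPUs configured:

```python
#!/usr/bin/env python3
#SBATCH --job-name=example-job
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00

# sbatch reads the #SBATCH comment lines above before the script runs, so the
# scheduler reserves 4 CPUs, 8 GB of RAM, and one GPU before the job starts.
import os

print("Job", os.environ.get("SLURM_JOB_ID"),
      "running on", os.environ.get("SLURMD_NODENAME"))
```

Because every job states its requirements this way, the scheduler can pack jobs onto nodes without two workloads fighting over the same cores or GPUs.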
Infrastructure Nerds has experience managing HPC clusters with several job schedulers across multiple operating systems. Our DevOps approach ensures consistency across the computing environment, and our storage experience ensures that your data is available where you need it.
Use Cases
Here are a few of the possible use cases for HPC clusters:
- Scientific Research
- Video Encoding
- Artificial Intelligence (AI) / Machine Learning (ML)
Job Schedulers
There are many job schedulers out there. Here are the ones that Infrastructure Nerds has managed:
- Slurm (see the submission sketch after this list)
- Sun Grid Engine (SGE)
- TORQUE (a fork of OpenPBS)
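As a small illustration of how little ceremony day-to-day use involves, here is a sketch of submitting the batch script shown earlier on a Slurm cluster. It assumes Slurm's command-line tools are on the PATH, and `example_job.py` is the hypothetical script from above:

```python
import subprocess

# Submit the batch script; --parsable makes sbatch print just the job ID
# instead of its usual "Submitted batch job N" message.
result = subprocess.run(
    ["sbatch", "--parsable", "example_job.py"],
    check=True, capture_output=True, text=True,
)
job_id = result.stdout.strip()
print("Submitted job", job_id)

# Check the queue for that specific job.
subprocess.run(["squeue", "-j", job_id])
```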
Our Approach
What sets us apart from most cluster managers is our commitment to DevOps practices within the HPC space. Many cluster operators manage their nodes from system images, which can grow stale and introduce security and compliance risks. We prefer to treat HPC nodes and their management servers as just another 'role' of server within our configuration management environment, subject to the same regular compliance checks and update rules.
HPC cluster nodes aren't special snowflakes that need their own methodologies. They're just servers with a specific use case, and should be treated as such.
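To make the 'role' idea concrete, here is a toy sketch of role-based node definitions. It is not the syntax of any particular configuration management tool, and the role names, packages, and checks are illustrative only:

```python
# Toy role definitions: every server, HPC or not, inherits the same baseline.
ROLES = {
    "base-server": {
        "packages": ["openssh-server", "auditd"],
        "compliance_checks": ["os_patched", "ssh_hardened"],
    },
    "hpc-compute": {
        "inherits": "base-server",          # HPC nodes build on the common baseline
        "packages": ["slurmd", "openmpi"],
        "compliance_checks": ["scheduler_daemon_running"],
    },
}

def resolve(role: str) -> dict:
    """Merge a role with everything it inherits, baseline first."""
    spec = ROLES[role]
    merged = {"packages": [], "compliance_checks": []}
    if "inherits" in spec:
        parent = resolve(spec["inherits"])
        merged["packages"] += parent["packages"]
        merged["compliance_checks"] += parent["compliance_checks"]
    merged["packages"] += spec["packages"]
    merged["compliance_checks"] += spec["compliance_checks"]
    return merged

# An HPC node's configuration is just the baseline plus its role-specific extras.
print(resolve("hpc-compute"))
```

Managed this way, an HPC node falls under the same patch cadence and compliance scans as every other server in the fleet.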