Our client is a consultancy specialising in infrastructure solutions for top-tier organisations. We are seeking an experienced HPC & AI Infrastructure Architect to lead the design and deployment of next-generation data‑centre infrastructure for massive AI/ML workloads. The role demands deep domain expertise in compute, thermal, power and data‑centre architecture at scales beyond 50 MW.
Key Responsibilities
- Model and simulate HPC workloads: estimate compute, storage and networking demand.
- Run thermal and energy modelling (CFD analysis, PUE targets) and evaluate cooling solutions (liquid, immersion, free-air).
- Lead AI/ML inference modelling: estimate GPU, TPU and other accelerator requirements (H100, GH200, MI300, custom chips).
- Architect multi‑GPU nodes, memory hierarchies and interconnects (PCIe/CCIX/NVLink).
- Profile and optimise FP16/BF16/INT8 performance to eliminate bottlenecks.
- Drive distributed training strategies (data, model and pipeline parallelism) using frameworks such as Horovod, DeepSpeed, Megatron‑LM and Ray.
- Design HPC/GPU clusters, parallel storage (Lustre, GPFS, NVMe tiers) and interconnect topologies (InfiniBand, NVLink), while optimising TCO and density.
Site & Infrastructure Analysis
- Evaluate grid capacity: 50–100 MW power availability, redundancy and competing demand.
- Assess cooling infrastructure: confirm the site can support advanced cooling options and scale with future expansion.
- Work with enterprise network architects to secure carrier‑grade fibre with multi‑carrier, low‑latency access.
- Ensure physical scalability: zoning, space planning, modular expansion and security.
For more information, contact brian@church-int.com or apply.