CSCE 654: Supercomputing
Graduate course, Texas A&M University, Fall 2025
Course Information
Term: Fall 2025
Credit Hours: 3
Time: Tuesday/Thursday (TR) 2:20-3:25pm
Location: HRBB 126
Instructor
Instructor: Ariful Azad
Office: PETR 215
Phone: 979-845-5351
E-Mail: ariful@tamu.edu
Office Hours: Tuesday 1pm - 2pm
Course Overview
This course covers supercomputing architectures, programming for supercomputers, parallel training and inference of AI models on supercomputers, and large-scale scientific applications. Python and C/C++ will be used in the programming assignments. A basic understanding of computer architecture, algorithms, and data structures is required for this class.
Grading Policy
| Component | Weight |
|---|---|
| Midterm Exam | 20% |
| Homework (3) | 30% |
| Mini Programming Exercises | 10% |
| Paper Reading and Presentation (1) | 10% |
| Projects (presentation and report) | 30% |
Class Schedule
Module 1: Foundations of Supercomputing (3 lectures, 1 mini exercise, 1 HW)
| Class | Topic |
|---|---|
| 1 | What is supercomputing? Supercomputer applications and examples of the world’s fastest supercomputers |
| 2 | What does a supercomputer look like? Architecture of modern supercomputers |
| 3 | How to use a supercomputer? Job submission and monitoring. In-class exercise (15 min, bring laptops) |
| 4 | Visit TAMU’s supercomputing center |
Module 2: Programming for Supercomputers (5 lectures, 4 in-class exercises, 1 HW)
| Class | Topic |
|---|---|
| 5 | Introduction to distributed programming using MPI: MPI basics and point-to-point communication (15 min hands-on; see the sketch after this table) |
| 6 | MPI collectives and performance optimization: broadcast, reduce, gather, all-to-all (15 min hands-on) |
| 7 | Multithreaded programming with OpenMP for multicore processors (15 min hands-on) |
| 8 | GPU programming with CUDA: GPU architecture and parallel programming (15 min hands-on) |
| 9 | Hybrid programming: MPI + OpenMP + CUDA; scalability plots and Amdahl’s law |
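To give a flavor of the Lecture 5 hands-on, here is a minimal sketch of MPI point-to-point communication using mpi4py (Python is one of the course languages). The script name, payload, and tag are illustrative, not part of the syllabus.

```python
# Minimal mpi4py point-to-point sketch (illustrative; not an assigned exercise).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # id of this process
size = comm.Get_size()   # total number of processes

if rank == 0 and size > 1:
    token = {"msg": "hello", "value": 42}   # hypothetical payload
    comm.send(token, dest=1, tag=0)         # blocking send to rank 1
    print(f"Rank 0 sent {token}")
elif rank == 1:
    token = comm.recv(source=0, tag=0)      # blocks until the message arrives
    print(f"Rank 1 received {token}")
```

Assuming the script is saved as p2p.py, it would typically be launched with something like `mpirun -np 2 python p2p.py`.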
Module 3: AI at Scale (5 lectures, 3 in-class exercises, 1 HW) – Project proposals due after fall break
| Class | Topic |
|---|---|
| 10 | Parallelism strategies: data/model parallelism, SGD batching, PyTorch example (see the sketch after this table) |
| 11 | More parallelism strategies: tensor and pipeline parallelism (30 min hands-on) |
| 12 | LLM training using supercomputers: transformer models, scaling laws; DeepSpeed, Megatron-LM, FSDP |
| 13 | Midterm exam (just before fall break) |
| 14 | LLM inference and efficiency: latency, cost, batching, quantization, sparsity, Mixture-of-Experts |
| 15 | Other AI models and software infrastructure for deep learning |
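As a preview of the Lecture 10 PyTorch example, below is a minimal sketch of data parallelism with torch.nn.parallel.DistributedDataParallel. The toy model, random batches, and hyperparameters are placeholders, not course material.

```python
# Minimal DistributedDataParallel (DDP) sketch; launch with, e.g.:
#   torchrun --nproc_per_node=2 ddp_sketch.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="gloo")  # "nccl" on GPU nodes
    rank = dist.get_rank()

    model = torch.nn.Linear(10, 1)           # toy model
    ddp_model = DDP(model)                   # DDP all-reduces gradients across ranks
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for step in range(5):
        # Each rank draws its own mini-batch (a stand-in for a sharded dataset).
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        opt.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                       # backward() triggers the all-reduce
        opt.step()
        if rank == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each rank processes its own mini-batch, and DDP averages gradients across ranks during backward(), keeping the model replicas synchronized.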
Module 4: Paper Reading and Presentation (3 classes)
| Class | Topic |
|---|---|
| 16 | Student paper reading and presentation (based on your project) |
| 17 | Student paper reading and presentation (based on your project) |
| 18 | Student paper reading and presentation (based on your project) |
Module 5: Scientific Computing on Supercomputers (3 lectures, 1 in-class exercise)
| Class | Topic |
|---|---|
| 19 | Matrix computations and solvers (see the sketch after this table) |
| 20 | Irregular and graph computations |
| 21 | Scientific simulations on supercomputers |
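As a taste of Lecture 19, here is a minimal single-node sketch of an iterative sparse solver in SciPy; the 1-D Poisson matrix and right-hand side are stand-in examples, and on a supercomputer the same computation would be distributed across nodes.

```python
# Minimal sparse iterative-solver sketch (illustrative 1-D Poisson system).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 1000
# Tridiagonal 1-D Poisson matrix: 2 on the main diagonal, -1 on the off-diagonals.
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)   # conjugate gradient; info == 0 means converged
print(f"converged: {info == 0}, residual norm: {np.linalg.norm(b - A @ x):.2e}")
```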
Final Project Presentations (2 classes)
| Class | Topic |
|---|---|
| 22 | Project final presentation |
| 23 | Project final presentation |