CSCE 654: Supercomputing

Graduate course, Texas A&M University, Fall 2025

Course Information

Semester: Fall 2025
Credit Hours: 3
Time: Tuesday/Thursday (TR) 2:20-3:25pm
Location: HRBB 126

Instructor

Instructor: Ariful Azad
Office: PETR 215
Phone: 979-845-5351
E-Mail: ariful@tamu.edu
Office Hours: Tuesday 1pm - 2pm

Course Overview

This course covers supercomputing architectures, program development for supercomputers, parallel training and inference of AI models on supercomputers, and a range of large-scale scientific applications. Python and C/C++ will be used in the programming assignments. A basic understanding of computer architecture, algorithms, and data structures is necessary for this class.

Grading Policy

Component                            Weight
Midterm Exam                         20%
Homework (3)                         30%
Mini Programming Exercises           10%
Paper Reading and Presentation (1)   10%
Projects (presentation and report)   30%

Class Schedule

Module 1: Foundations of Supercomputing (3 lectures, 1 mini exercise, 1 HW)

Class 1: What is supercomputing? Supercomputer applications and examples of the world’s fastest supercomputers
Class 2: What does a supercomputer look like? Architecture of modern supercomputers
Class 3: How to use a supercomputer? Job submission and monitoring. In-class exercise (15 min, bring laptops); see the sample job script after this module
Class 4: Visit TAMU’s supercomputing center
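
Class 3's in-class exercise covers job submission and monitoring. As a preview, a minimal batch script for a Slurm-managed cluster (the scheduler used on TAMU HPRC systems) might look like the sketch below; the module name and resource numbers are illustrative placeholders, not actual cluster settings.

    #!/bin/bash
    #SBATCH --job-name=demo          # name shown in the queue
    #SBATCH --nodes=1                # number of compute nodes
    #SBATCH --ntasks=4               # total tasks (e.g., MPI ranks)
    #SBATCH --time=00:10:00          # wall-clock limit (HH:MM:SS)
    #SBATCH --output=demo-%j.out     # output file; %j expands to the job ID

    module load openmpi              # load an MPI implementation (site-specific name)
    mpirun ./demo                    # run the program on all allocated tasks

Submit the script with sbatch demo.sh, check its status with squeue -u $USER, and cancel it with scancel <jobid>.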

Module 2: Programming for Supercomputers (5 lectures, 4 in-class exercises, 1 HW)

Class 5: Introduction to distributed programming using MPI: MPI basics and point-to-point communication (15 min hands-on); see the MPI sketch after this module
Class 6: MPI collectives and performance optimization: broadcast, reduce, gather, all-to-all (15 min hands-on)
Class 7: Multithreaded programming with OpenMP for multicore processors (15 min hands-on)
Class 8: GPU programming with CUDA: GPU architecture and parallel programming (15 min hands-on)
Class 9: Hybrid programming: MPI + OpenMP + CUDA; scalability plots and Amdahl’s law
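
To give a flavor of Classes 5 and 6, below is a minimal sketch of point-to-point and collective communication using mpi4py, the Python binding for MPI (assignments may equally use the C/C++ API; the message contents here are toy placeholders).

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()          # this process's ID, 0 .. size-1
    size = comm.Get_size()          # total number of processes

    # Point-to-point (Class 5): rank 0 sends a Python object to rank 1.
    if rank == 0:
        comm.send({"payload": 42}, dest=1, tag=0)
    elif rank == 1:
        msg = comm.recv(source=0, tag=0)
        print(f"rank 1 received {msg}")

    # Collective (Class 6): sum every rank's ID onto rank 0.
    total = comm.reduce(rank, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"sum of ranks 0..{size - 1} = {total}")

Run it with mpirun -np 4 python demo.py. The scalability discussion in Class 9 rests on Amdahl's law: if a fraction f of a program's work parallelizes over p processes, speedup is bounded by 1 / ((1 - f) + f/p), so even f = 0.95 caps the achievable speedup at 20x regardless of how large p grows.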

Module 3: AI at Scale (5 lectures, 3 in-class exercises, 1 HW) – project proposals due after fall break

Class 10: Parallelism strategies: data/model parallelism, SGD batching, PyTorch example; see the data-parallel sketch after this module
Class 11: More parallelism strategies: tensor and pipeline parallelism (30 min hands-on)
Class 12: LLM training using supercomputers: transformer models, scaling laws; DeepSpeed, Megatron-LM, FSDP
Class 13: Midterm exam (just before fall break)
Class 14: LLM inference and efficiency: latency, cost, batching, quantization, sparsity, Mixture-of-Experts
Class 15: Other AI models and software infrastructure for deep learning
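
The data parallelism introduced in Class 10 is what PyTorch's DistributedDataParallel implements: every rank holds a model replica and its own batch, and gradients are averaged with an all-reduce during the backward pass. A minimal single-file sketch follows (toy model and random data; real training would draw shards of an actual dataset, e.g., via a DistributedSampler).

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, WORLD_SIZE, and MASTER_ADDR in the environment.
        dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
        model = DDP(torch.nn.Linear(10, 1))      # replica; DDP syncs gradients
        opt = torch.optim.SGD(model.parameters(), lr=0.01)

        for step in range(100):
            x, y = torch.randn(32, 10), torch.randn(32, 1)  # per-rank batch
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()                      # gradient all-reduce happens here
            opt.step()

        if dist.get_rank() == 0:
            print("final loss:", loss.item())
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launch with torchrun --nproc_per_node=4 train.py; with 4 ranks and a per-rank batch of 32, the effective batch size is 128.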

Module 4: Paper Reading and Presentation (3 classes)

Class 16: Student paper reading and presentation (based on your project)
Class 17: Student paper reading and presentation (based on your project)
Class 18: Student paper reading and presentation (based on your project)

Module 5: Scientific Computing on Supercomputers (3 lectures, 1 in-class exercise)

Class 19: Matrix computations and solvers; see the sparse-solver sketch after this module
Class 20: Irregular and graph computations
Class 21: Scientific simulations on supercomputers
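
On a single node, the solvers of Class 19 look like the sketch below: a sparse symmetric positive-definite system (a tridiagonal 1-D Poisson matrix, used here as a toy stand-in) solved with conjugate gradient through SciPy. The lectures cover how such computations are distributed across many nodes.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg

    n = 1000
    # 1-D Poisson matrix: 2 on the diagonal, -1 on the off-diagonals (SPD).
    A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    x, info = cg(A, b)               # info == 0 means CG converged
    print("converged:", info == 0)
    print("residual norm:", np.linalg.norm(A @ x - b))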

Class 22: Project final presentation
Class 23: Project final presentation