CSCE 654: Supercomputing
Graduate course, Texas A&M University, Fall 2025
Course Information
Term: Fall 2025
Credit Hours: 3
Time: Tuesday/Thursday (TR) 2:20-3:25pm
Location: HRBB 126
Instructor
Instructor: Ariful Azad
Office: PETR 215
Phone: 979-845-5351
E-Mail: ariful@tamu.edu
Office Hours: Tuesday 1pm - 2pm
Course Overview
This course covers supercomputing architectures, programming for supercomputers, parallel training and inference of AI models on supercomputers, and large-scale scientific applications. Python and C/C++ will be used in the programming assignments. A basic understanding of computer architecture, algorithms, and data structures is required for this class.
Grading Policy
| Component | Weight |
|---|---|
| Midterm Exam | 20% |
| Homework (3) | 30% |
| Mini Programming Exercises | 10% |
| Paper Reading and Presentation (1) | 10% |
| Projects (presentation and report) | 30% |
Class Schedule
Module 1: Foundations of Supercomputing (3 lectures, 1 mini exercise, 1 HW)
| Class | Topic |
|---|---|
| 1 | What is supercomputing? Supercomputer applications and examples of the world’s fastest supercomputers |
| 2 | What does a supercomputer look like? Architecture of modern supercomputers |
| 3 | How to use a supercomputer? Job submission and monitoring. In-class exercise (15 min, bring laptops) |
| 4 | Visit TAMU’s supercomputing center |
Module 2: Programming for Supercomputers (5 lectures, 4 in-class exercises, 1 HW)
| Class | Topic |
|---|---|
| 5 | Introduction to distributed programming using MPI: MPI basics and point-to-point communication (15 min hands-on; see the sketch after this table) |
| 6 | MPI collectives and performance optimization: broadcast, reduce, gather, all-to-all (15 min hands-on) |
| 7 | Multithreaded programming with OpenMP for multicore processors (15 min hands-on) |
| 8 | GPU programming with CUDA: GPU architecture and parallel programming (15 min hands-on) |
| 9 | Hybrid programming: MPI + OpenMP + CUDA; scalability plots and Amdahl’s law |
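To give a flavor of the Lecture 5 hands-on, here is a minimal sketch of MPI point-to-point communication using mpi4py (Python is one of the course languages). The script name, payload, and tag are illustrative, not part of the syllabus.

```python
# Minimal mpi4py point-to-point sketch (illustrative; not an assigned exercise).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # id of this process
size = comm.Get_size()   # total number of processes

if rank == 0 and size > 1:
    token = {"msg": "hello", "value": 42}   # hypothetical payload
    comm.send(token, dest=1, tag=0)         # blocking send to rank 1
    print(f"Rank 0 sent {token}")
elif rank == 1:
    token = comm.recv(source=0, tag=0)      # blocks until the message arrives
    print(f"Rank 1 received {token}")
```

Assuming the script is saved as p2p.py, it would typically be launched with something like `mpirun -np 2 python p2p.py`.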
Module 3: AI at Scale (5 lectures, 3 in-class exercises, 1 HW) – Project proposals due after fall break
| Class | Topic |
|---|---|
| 10 | Parallelism strategies: data/model parallelism, SGD batching, PyTorch example (see the sketch after this table) |
| 11 | More parallelism strategies: tensor and pipeline parallelism (30 min hands-on) |
| 12 | LLM training using supercomputers: transformer models, scaling laws; DeepSpeed, Megatron-LM, FSDP |
| 13 | Midterm exam (just before fall break) |
| 14 | LLM inference and efficiency: latency, cost, batching, quantization, sparsity, Mixture-of-Experts |
| 15 | Other AI models and software infrastructure for deep learning |
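As a preview of the Lecture 10 PyTorch example, below is a minimal sketch of data parallelism with torch.nn.parallel.DistributedDataParallel. The toy model, random batches, and hyperparameters are placeholders, not course material.

```python
# Minimal DistributedDataParallel (DDP) sketch; launch with, e.g.:
#   torchrun --nproc_per_node=2 ddp_sketch.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="gloo")  # "nccl" on GPU nodes
    rank = dist.get_rank()

    model = torch.nn.Linear(10, 1)           # toy model
    ddp_model = DDP(model)                   # DDP all-reduces gradients across ranks
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for step in range(5):
        # Each rank draws its own mini-batch (a stand-in for a sharded dataset).
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        opt.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                       # backward() triggers the all-reduce
        opt.step()
        if rank == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each rank processes its own mini-batch, and DDP averages gradients across ranks during backward(), keeping the model replicas synchronized.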
Module 4: Paper Reading and Presentation (3 classes)
| Class | Topic |
|---|---|
| 16 | Student paper reading and presentation (based on your project) |
| 17 | Student paper reading and presentation (based on your project) |
| 18 | Student paper reading and presentation (based on your project) |
Module 5: Scientific Computing on Supercomputers (3 lectures, 1 in-class exercise)
| Class | Topic |
|---|---|
| 19 | Matrix computations and solvers (see the sketch after this table) |
| 20 | Irregular and graph computations |
| 21 | Scientific simulations on supercomputers |
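As a taste of Lecture 19, here is a minimal single-node sketch of an iterative sparse solver in SciPy; the 1-D Poisson matrix and right-hand side are stand-in examples, and on a supercomputer the same computation would be distributed across nodes.

```python
# Minimal sparse iterative-solver sketch (illustrative 1-D Poisson system).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 1000
# Tridiagonal 1-D Poisson matrix: 2 on the main diagonal, -1 on the off-diagonals.
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)   # conjugate gradient; info == 0 means converged
print(f"converged: {info == 0}, residual norm: {np.linalg.norm(b - A @ x):.2e}")
```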
Final Project Presentations (2 classes)
| Class | Topic |
|---|---|
| 22 | Project final presentation |
| 23 | Project final presentation |