Information Theory (Summer 2023)

Programs: BSc {CS, ITS, AI, Math, Physics}, MSc {ITS, AI, Math, Physics} (TBC)
Lecturer: Michael Walter
Teaching assistants: TBD
Time and Place: Mon 14-16 (Lecture, room TBD), exercises TBD
First meeting: Mon Apr 3
Credits: 5 CP
Contact time: 2+2 SWS
Language: English
Further Information and Registration: Moodle, VVZ

Interested in this Course?

  1. We invite you to explore the 2021 course homepage to get an impression of the course and teaching material.
  2. If you plan on participating in this course, please kindly sign up on Moodle so that we have an overview of how many people will show up.
  3. Please contact us if you have any questions about this course, scheduling conflicts, etc.

Course description and syllabus

This course will give an introduction to information theory – the mathematical theory of information. Ever since its inception, information theory has had a profound impact on society. It underpins important technological developments, from reliable memories to mobile phone standards, and its versatile mathematical toolbox has found use in computer science, machine learning, physics, electrical engineering, mathematics, and many other disciplines.

Starting from probability theory, we will discuss how to mathematically model information sources and communication channels, how to optimally compress information, and how to design error-correcting codes that allow us to reliably communicate over noisy communication channels. We will also see how techniques used in information theory can be applied more generally to make predictions from noisy data.

Tentative syllabus:

  • Welcome, Introduction to Information Theory
  • Probability Theory Refresher
  • Numerical Random Variables, Convexity and Concavity, Entropy
  • Symbol Codes: Lossless Compression, Huffman Algorithm
  • Block Codes: Shannon’s Source Coding Theorem, its Proof, and Variations
  • Stream Codes: Lempel-Ziv Algorithm
  • Stream Codes: Arithmetic Coding
  • Joint Entropies & Communication over Noisy Channels
  • Shannon’s Noisy Coding Theorem
  • Proof of the Noisy Coding Theorem
  • Proof of the Converse, Shannon’s Theory vs Practice
  • Reed-Solomon Codes
  • Message Passing for Decoding and Inference, Outlook
  • Student Presentations

The course should be of interest to students of computer science, mathematics, physics, electrical engineering, and related disciplines. Students interested in a BSc project in this area are particularly encouraged to participate.

Familiarity with discrete probability (we will briefly remind you of the most important facts). Some experience with precise mathematical statements and rigorous proofs (since we’ll see many of those in the course). In addition, part of the homework will require programming in Python.

Lecture notes, video recordings, literature

Lecture notes 📖 and video recordings 🎞️ of the lectures will be provided.

In addition, the following optional references can be useful for supplementary reading:

  • David J. C. MacKay’s beautiful textbook “Information Theory, Inference, and Learning Algorithms”, Cambridge (2003). It is available online 📖 and contains much more material than what we will be able to discuss in class. This book served as the original inspiration for this course and we will point out which lectures correspond to which chapters.
  • Thomas M. Cover and Joy A. Thomas’s classical textbook “Elements of Information Theory”, Wiley (2006).

Homework and practice

There will be regular homework problem sets, which will consist of theory problems and a small programming project. See here for an example from two years ago. In addition, we will offer many practice problems so that you can test your understanding as the course progresses. Both homework and practice problems will be discussed in the exercises.

Finally, instead of having a midterm exam, we would like you to read about a topic in information theory and give a short presentation to your peers. We will schedule these in one of the last weeks of the term. We will give you many suggestions for topics (on Moodle) but you are free to pick your own topic (but please confirm it with us).

Learning objectives

At the end of the course, you will be able to:

  • mathematically model memoryless and Markov information sources using probability theory
  • mathematically model memoryless information channels using probability theory
  • compute basic properties of probability distributions, such as entropies and essential bit contents, using mathematical techniques such as Jensen’s inequality
  • define typical sets and estimate their size and probability
  • prove and apply Shannon’s source coding theorem
  • prove and apply the Kraft inequality for uniquely decodable (UD) codes, its converse for prefix codes, and use it to bound the average length for optimal UD or prefix codes
  • design optimal symbol codes using Huffman’s coding algorithm
  • analyze and apply the Lempel-Ziv algorithm for stream coding
  • analyze and apply arithmetic coding for different probabilistic models
  • compute basic properties of joint probability distributions, such as conditional entropies and mutual informations
  • understand basic properties of block codes, such as their rate and various probabilities of error
  • prove and apply Shannon’s noisy channel coding theorem
  • encode Reed-Solomon codes and decode them for erasure errors
  • understand maximum likelihood approach to codeword and bitwise decoding
  • design and apply message passing algorithms for encoding and inference problems
  • independently read about a topic in information theory and present it to your peers

We will check off these objectives as we move along in the course.


Lectures, exercises classes, self-study, and a short presentation.


The final grade will be determined as a function of your exam grade, homework grade, and presentation grade. More detail will follow soon.