Week 13|Cryptography

Cryptography: The Math of Secrets

Crack codes, build ciphers in Python, and learn how math keeps secrets safe.

Materials for this lesson

  • Laptop (charged)
  • Printed cipher reference sheet
  • Pencil and paper

Warm-Up: Crack the Code

🔥 Warm-Up

The following message has been encrypted with a Caesar cipher (each letter has been shifted by the same number of positions in the alphabet). Can you decode it by hand?

GUVF VF GUR JNEZ HC ZRFFNTR. LBH PENPXRQ VG!

Hint: The most common letter in English is E. What is the most common letter in the encrypted message?


Core Lesson: The Science of Secrecy

Why Cryptography Matters

Every time you send a text message, buy something online, or log into a website, your data is protected by cryptography -- the science of encoding information so that only the intended recipient can read it. Without it, anyone who intercepted your data could read your passwords, steal your credit card numbers, or impersonate you.

Cryptography is one of the oldest applications of mathematics. It has decided wars, toppled governments, and today protects trillions of dollars of digital commerce.

A Brief History of Codes

| Era | Cipher | How it works |
|-----|--------|-------------|
| ~50 BC | Caesar cipher | Shift each letter by a fixed number |
| 1400s | Substitution cipher | Replace each letter with a different letter |
| 1500s | Vigenere cipher | Multiple Caesar shifts using a keyword |
| 1900s | Enigma machine | Electromechanical device with billions of settings |
| 1970s-2000s | DES / AES | Computer-based symmetric encryption |
| 1970s | RSA (public key) | Based on the difficulty of factoring the product of two large primes |

The Caesar Cipher

The Caesar cipher is named after Julius Caesar, who used it to communicate with his generals. The idea is simple: shift every letter in the alphabet by a fixed number of positions.

With a shift of 3:

  • A becomes D
  • B becomes E
  • C becomes F
  • ...
  • X becomes A
  • Y becomes B
  • Z becomes C

So "HELLO" with shift 3 becomes "KHOOR".

💡 Key Concept

Modular arithmetic is the math behind the Caesar cipher. When you reach the end of the alphabet, you wrap around to the beginning. Mathematically: encrypted_position = (original_position + shift) mod 26. The "mod" operation gives the remainder after division. So position 27 mod 26 = 1 (wraps back to the start). You already use modular arithmetic every day -- clocks! 3 hours after 11 o'clock is 2 o'clock, because (11 + 3) mod 12 = 2.
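
The wrap-around rule above can be tried directly in Python, where `%` is the mod operator. This is a minimal sketch; the helper name `shift_letter` is made up for illustration, and it assumes letters are numbered A=0 through Z=25.

```python
# Wrap-around arithmetic with Python's % (mod) operator.
# Assumption: letters are numbered A=0, B=1, ..., Z=25.

def shift_letter(letter, shift):
    """Shift one uppercase letter, wrapping past Z back to A."""
    position = ord(letter) - ord("A")
    return chr((position + shift) % 26 + ord("A"))

print(27 % 26)               # 1
print((11 + 3) % 12)         # 2  (the clock example)
print(shift_letter("X", 3))  # A
print("".join(shift_letter(c, 3) for c in "HELLO"))  # KHOOR
```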

Why the Caesar Cipher Is Easy to Break

There are only 26 possible shifts (0 through 25), and a shift of 0 leaves the message unchanged. An attacker can simply try all of them and see which one produces readable English. This is called a brute force attack. Any cipher whose set of possible keys is small enough for an attacker to try every one is inherently weak.

How many possible keys does a Caesar cipher have?

The Substitution Cipher

A substitution cipher is more sophisticated. Instead of shifting all letters by the same amount, the whole alphabet is scrambled: each plaintext letter is always replaced by the same ciphertext letter, and no two plaintext letters share one. For example:

Plain:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher: Q W E R T Y U I O P A S D F G H J K L Z X C V B N M

Now "HELLO" becomes "ITSSG".
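
One way to check this by machine is Python's built-in `str.maketrans` and `str.translate`. The sketch below uses the key from the table above; the variable names are chosen only for illustration.

```python
import string

# The example key from the table above.
PLAIN = string.ascii_uppercase
CIPHER = "QWERTYUIOPASDFGHJKLZXCVBNM"

# Build lookup tables for both directions.
encrypt_table = str.maketrans(PLAIN, CIPHER)
decrypt_table = str.maketrans(CIPHER, PLAIN)

ciphertext = "HELLO".translate(encrypt_table)
print(ciphertext)                           # ITSSG
print(ciphertext.translate(decrypt_table))  # HELLO
```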

The number of possible keys is 26! (26 factorial) = 403,291,461,126,605,635,584,000,000 -- about 4 x 10^26. Brute force is hopeless. So how do you crack it?
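
To get a feel for that number, here is a rough sketch that computes 26! and estimates how long a brute-force search would take. The rate of one billion keys per second is an assumption picked for illustration, not a measured figure.

```python
import math

substitution_keys = math.factorial(26)
print(f"{substitution_keys:,}")    # 403,291,461,126,605,635,584,000,000
print(f"{substitution_keys:.1e}")  # 4.0e+26

# Assumption: a hypothetical attacker testing one billion keys per second.
GUESSES_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

years = substitution_keys / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)
print(f"About {years:.1e} years to try every key")  # on the order of 10^10 years
```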

Frequency Analysis: Breaking Substitution Ciphers

The key insight, discovered by Arab mathematician Al-Kindi around 850 AD, is that letters in any language appear with predictable frequencies. In English:

| Letter | Frequency |
|--------|-----------|
| E | 12.7% |
| T | 9.1% |
| A | 8.2% |
| O | 7.5% |
| I | 7.0% |
| N | 6.7% |
| S | 6.3% |
| H | 6.1% |

If the most common letter in the ciphertext is "X", then X probably represents E. The second most common probably represents T. You can use these frequency patterns, plus common words and letter combinations (TH, HE, IN, ER, AN), to crack the whole cipher.
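
The first step of that reasoning can be sketched in a few lines: rank the ciphertext letters by how often they appear and pair them with the English ranking from the table. The function name `initial_guesses` is hypothetical, and the pairing is only a starting guess, since short texts rarely match the averages exactly.

```python
from collections import Counter

# English letters from most to least common (top eight, from the table above).
ENGLISH_ORDER = "ETAOINSH"

def initial_guesses(ciphertext):
    """Pair the most common ciphertext letters with the most common English letters."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    ranked = [letter for letter, _ in Counter(letters).most_common(len(ENGLISH_ORDER))]
    return dict(zip(ranked, ENGLISH_ORDER))

# A made-up ciphertext where X is most common, then Y, then Z, then W.
print(initial_guesses("XXXX YYY ZZ W"))
# {'X': 'E', 'Y': 'T', 'Z': 'A', 'W': 'O'}
```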


Frequency analysis works against substitution ciphers because:

Modern Cryptography: Public and Private Keys

Modern encryption uses a brilliant idea: you do not need to keep the encryption method secret -- you only need to keep the key secret.

💡 Key Concept

Public-key cryptography (invented in the 1970s) uses two keys:

  • A public key that anyone can see -- used to encrypt messages to you.
  • A private key that only you know -- used to decrypt messages sent to you.

It is like a mailbox: anyone can drop a letter through the slot (public key), but only you have the key to open the box and read it (private key). The mathematical foundation relies on trapdoor functions -- operations that are easy to do in one direction but extremely hard to reverse. The most famous example: multiplying two large prime numbers is easy, but factoring the result back into those two primes is incredibly hard when the numbers have hundreds of digits.
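
To make the two-key idea concrete, here is a toy sketch in the style of textbook RSA, using tiny primes so the numbers fit on one line. The specific values (61, 53, 17, and the message 65) are illustrative choices, and real systems use primes hundreds of digits long plus extra safeguards, so treat this as a demonstration only. It needs Python 3.8 or newer for `pow(e, -1, phi)`.

```python
# Toy public-key example with tiny primes. NOT secure; for illustration only.
p, q = 61, 53              # the two secret primes (illustrative values)
n = p * q                  # 3233, shared publicly
phi = (p - 1) * (q - 1)    # 3120, kept secret

e = 17                     # public exponent: (n, e) is the public key
d = pow(e, -1, phi)        # private exponent: 2753 (modular inverse of e)

message = 65
ciphertext = pow(message, e, n)    # anyone can do this with the public key
recovered = pow(ciphertext, d, n)  # only the private key holder can undo it

print(ciphertext)  # 2790
print(recovered)   # 65

def factor(number):
    """Find two factors by trial division. Fast here, hopeless for huge numbers."""
    candidate = 2
    while candidate * candidate <= number:
        if number % candidate == 0:
            return candidate, number // candidate
        candidate += 1
    return number, 1

print(factor(n))   # (53, 61)
```

With a number as small as 3233, trial division recovers the primes instantly, which is exactly why real keys are so much larger.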

When you see the padlock icon in your browser's address bar, your connection is secured by this exact type of cryptography (TLS/SSL), which uses both public-key encryption and symmetric encryption working together.

In public-key cryptography, which key do you share with everyone?


Hands-On Lab: Building Ciphers in Python

Program 1: Caesar Cipher -- Encode and Decode

# Caesar Cipher -- Encoder and Decoder

def caesar_encrypt(plaintext, shift):
    """Encrypt a message using a Caesar cipher."""
    result = ""
    for char in plaintext.upper():
        if char.isalpha():
            # Convert letter to number (A=0, B=1, ..., Z=25)
            num = ord(char) - ord('A')
            # Shift and wrap around using modular arithmetic
            shifted = (num + shift) % 26
            # Convert back to letter
            result += chr(shifted + ord('A'))
        else:
            # Keep spaces, punctuation, etc. unchanged
            result += char
    return result

def caesar_decrypt(ciphertext, shift):
    """Decrypt by shifting in the opposite direction."""
    return caesar_encrypt(ciphertext, -shift)

# Try it out!
message = "MEET ME AT THE LIBRARY AT NOON"
shift = 7

encrypted = caesar_encrypt(message, shift)
decrypted = caesar_decrypt(encrypted, shift)

print(f"Original:  {message}")
print(f"Shift:     {shift}")
print(f"Encrypted: {encrypted}")
print(f"Decrypted: {decrypted}")
print(f"Match:     {message == decrypted}")

Tip

Test it with your own messages! Change the message and shift variables. Try encrypting a message and giving only the ciphertext to a friend -- can they crack it without knowing the shift?

Program 2: Brute-Force Caesar Cracker

This program tries all 26 possible shifts and displays the results. You just scan the output and pick the one that makes sense.

# Caesar Cipher -- Brute Force Cracker
# Try all 26 possible shifts

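def caesar_decrypt(ciphertext, shift):
    """Undo a Caesar shift by moving each letter back `shift` positions."""
    result = ""
    for char in ciphertext.upper():
        if char.isalpha():
            # Letter -> number (A=0 ... Z=25), shift back, wrap with % 26
            num = ord(char) - ord('A')
            shifted = (num - shift) % 26
            result += chr(shifted + ord('A'))
        else:
            # Leave spaces and punctuation unchanged
            result += char
    return result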

# The encrypted message to crack
ciphertext = "LIPPS ASVPH"

print("Brute-force cracking all 26 shifts:")
print("=" * 45)

for shift in range(26):
    decrypted = caesar_decrypt(ciphertext, shift)
    print(f"  Shift {shift:2d}: {decrypted}")

print("\nLook for the one that reads as English!")

'LIPPS ASVPH' decrypted with a shift of 4 gives you:

Program 3: Frequency Analysis Tool

# Frequency Analysis Tool
# Counts letter frequencies in a text and compares to English

from collections import Counter

# English letter frequencies (approximate percentages)
ENGLISH_FREQ = {
    'E': 12.7, 'T': 9.1, 'A': 8.2, 'O': 7.5, 'I': 7.0,
    'N': 6.7, 'S': 6.3, 'H': 6.1, 'R': 6.0, 'D': 4.3,
    'L': 4.0, 'C': 2.8, 'U': 2.8, 'M': 2.4, 'W': 2.4,
    'F': 2.2, 'G': 2.0, 'Y': 2.0, 'P': 1.9, 'B': 1.5,
    'V': 1.0, 'K': 0.8, 'J': 0.15, 'X': 0.15, 'Q': 0.10,
    'Z': 0.07
}

def analyze_frequency(text):
    """Count letter frequencies in a text."""
    # Count only letters
    letters = [c.upper() for c in text if c.isalpha()]
    total = len(letters)

    if total == 0:
        print("No letters found!")
        return

    counts = Counter(letters)

    print(f"Total letters: {total}")
    print(f"\n{'Letter':<8} {'Count':<8} {'Frequency':<12} {'English Avg':<12} {'Bar'}")
    print("-" * 65)

    for letter, count in counts.most_common():
        freq = count / total * 100
        eng_freq = ENGLISH_FREQ.get(letter, 0)
        bar = "#" * int(freq)
        print(f"  {letter:<6} {count:<8} {freq:>5.1f}%       {eng_freq:>5.1f}%       {bar}")

# Analyze a sample ciphertext (this is ROT13)
ciphertext = """
GUVF VF N YBAT GRFG ZRFFNTR GUNG JR JVYY HFR GB CENPGVPR
SERDHRAPL NANYLFVF. GUR ZBER GRKG LBH UNIR GUR ORGGRE GUR
NANYLFVF JBEXF. RNPU YRGGRE FUBJF HC JVGU VGF RKCRPGRQ
SERDHRAPL JUVPU URYCF HF SVTHER BHG GUR FHOFGVGHGVBA.
"""

print("Ciphertext frequency analysis:")
print("=" * 65)
analyze_frequency(ciphertext)

Program 4: Substitution Cipher Encoder

# Substitution Cipher -- Encoder/Decoder
import random
import string

def generate_random_key():
    """Create a random substitution key."""
    letters = list(string.ascii_uppercase)
    shuffled = letters.copy()
    random.shuffle(shuffled)
    key = dict(zip(letters, shuffled))
    return key

def invert_key(key):
    """Create the decryption key (reverse mapping)."""
    return {v: k for k, v in key.items()}

def substitute(text, key):
    """Apply a substitution cipher."""
    result = ""
    for char in text.upper():
        if char in key:
            result += key[char]
        else:
            result += char
    return result

# Generate a random key
key = generate_random_key()

# Display the key
print("Substitution Key:")
print("Plain:  ", " ".join(string.ascii_uppercase))
print("Cipher: ", " ".join(key[c] for c in string.ascii_uppercase))
print()

# Encrypt a message
message = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
encrypted = substitute(message, key)
decrypted = substitute(encrypted, invert_key(key))

print(f"Original:  {message}")
print(f"Encrypted: {encrypted}")
print(f"Decrypted: {decrypted}")
print(f"Match:     {message == decrypted}")

Tip

Notice that "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG" contains every letter of the alphabet. This type of sentence is called a pangram, and it is perfect for testing ciphers because it shows you the full substitution mapping.
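
If you want to test your own sentences, a pangram check is short to write. This is a small sketch; `is_pangram` is a name chosen here for illustration.

```python
import string

def is_pangram(text):
    """Return True if the text uses every letter A-Z at least once."""
    return set(string.ascii_uppercase) <= set(text.upper())

print(is_pangram("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"))  # True
print(is_pangram("HELLO WORLD"))                                  # False
```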


Challenge: Crack This Substitution Cipher

🏆 Challenge

The following message was encrypted with a simple Caesar shift (not a full substitution). But we will not tell you the shift! Use your frequency analysis tool and your brute-force cracker to decode it. Then, for the real challenge, try the harder substitution cipher below.

Part 1: Caesar Challenge

Crack this message:

YMJ GWFNS NX F YJWWNKNH YMNSL. NY HFS WJFXTS, NY HFS NRFLNSJ, NY HFS
HTRUTXJ UTJYWD, NY HFS IT RFYMJRFYNHX, NY HFS JAJS IJHWDUY XJHWJY
RJXXFLJX. YMJWJ NX ST QNRNY YT BMFY NY HFS FHMNJAJ.

Part 2: Substitution Challenge

Now try cracking a full substitution cipher. Use the frequency analysis tool, common letter patterns, and deductive reasoning.

RDS KWZR XSGWRCTWY RDCEO CE RDS BZKYX CQ EZR
RDS AZFX ZK RDSKS. CR CQ RDS AZFX ZK PZFHYRSK
QACSEACQRQ BDZ FWE HSKG XSSJ QSFKSRQ.

Strategy:

  1. Run frequency analysis. What is the most common letter?
  2. Look for single-letter words (probably "A" or "I").
  3. Look for common three-letter words (THE, AND, FOR).
  4. Use your deductions to fill in more letters, one at a time.
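
A small helper can make step 4 less tedious: it applies only the letters you have guessed so far and shows an underscore for everything still unknown. This is a sketch, and the function name `apply_partial_key` is hypothetical. The guess below (that "RDS" stands for "THE") is only a starting hypothesis from step 3, not a confirmed answer.

```python
def apply_partial_key(ciphertext, guesses):
    """Replace guessed letters and show '_' for letters not yet worked out."""
    result = ""
    for char in ciphertext.upper():
        if char.isalpha():
            result += guesses.get(char, "_")
        else:
            # Keep spaces and punctuation so word shapes stay visible
            result += char
    return result

# Hypothesis only: the repeated three-letter word RDS might be THE.
guesses = {"R": "T", "D": "H", "S": "E"}

print(apply_partial_key("RDS KWZR", guesses))  # THE ___T
```

Add one new letter to `guesses` at a time and rerun; if a guess produces impossible-looking words, remove it and try another.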

Bonus: Auto-Cracking with Chi-Squared

For students who want to go further, here is a program that automatically determines the most likely Caesar shift by comparing letter frequencies mathematically using the chi-squared statistic:

# Automatic Caesar Cipher Cracker using Chi-Squared Test
from collections import Counter

# Expected English letter frequencies
ENGLISH_FREQ = [
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020,  # A-G
    0.061, 0.070, 0.002, 0.008, 0.040, 0.024,          # H-M
    0.067, 0.075, 0.019, 0.001, 0.060, 0.063,          # N-S
    0.091, 0.028, 0.010, 0.024, 0.002, 0.020, 0.001   # T-Z
]

def chi_squared(observed_freq, expected_freq):
    """Calculate chi-squared statistic between two frequency distributions."""
    score = 0
    for obs, exp in zip(observed_freq, expected_freq):
        if exp > 0:
            score += (obs - exp) ** 2 / exp
    return score

def auto_crack_caesar(ciphertext):
    """Try all 26 shifts and return the most likely one."""
    letters = [c.upper() for c in ciphertext if c.isalpha()]
    total = len(letters)

    if total == 0:
        return 0, ""

    best_shift = 0
    best_score = float('inf')

    for shift in range(26):
        # Decrypt with this shift
        decrypted = []
        for c in letters:
            num = (ord(c) - ord('A') - shift) % 26
            decrypted.append(chr(num + ord('A')))

        # Count letter frequencies in decrypted text
        counts = Counter(decrypted)
        observed = [counts.get(chr(i + ord('A')), 0) / total for i in range(26)]

        # Compare to English frequencies
        score = chi_squared(observed, ENGLISH_FREQ)

        if score < best_score:
            best_score = score
            best_shift = shift

    # Decrypt with best shift
    result = ""
    for char in ciphertext.upper():
        if char.isalpha():
            num = (ord(char) - ord('A') - best_shift) % 26
            result += chr(num + ord('A'))
        else:
            result += char

    return best_shift, result

# Test it!
test_messages = [
    "LIPPS ASVPH",
    "GUVF VF N FRPERG ZRFFNTR",
    "YMJ GWFNS NX F YJWWNKNH YMNSL",
]

for cipher in test_messages:
    shift, plaintext = auto_crack_caesar(cipher)
    print(f"Ciphertext: {cipher}")
    print(f"Best shift: {shift}")
    print(f"Plaintext:  {plaintext}")
    print()

💡 Key Concept

The chi-squared test measures how well an observed frequency distribution matches an expected one. A lower score means a better match. By decrypting with each possible shift and comparing the resulting letter frequencies to standard English frequencies, the computer can automatically pick the shift that produces the most English-like text. This is the same idea behind Al-Kindi's frequency analysis, but automated and made precise with statistics.
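
A tiny worked example may help show why lower is better. The sketch below uses a made-up three-letter "language" with invented frequencies, so the numbers are for illustration only. The scoring formula is the same one used in the program above: for each letter, square the difference between observed and expected, divide by expected, and add everything up.

```python
def chi_squared(observed_freq, expected_freq):
    """Sum of (observed - expected)^2 / expected over all letters."""
    score = 0
    for obs, exp in zip(observed_freq, expected_freq):
        if exp > 0:
            score += (obs - exp) ** 2 / exp
    return score

# Invented frequencies for a pretend three-letter alphabet.
expected = [0.5, 0.3, 0.2]

good_guess = [0.5, 0.3, 0.2]   # matches the expected pattern exactly
bad_guess = [0.2, 0.3, 0.5]    # common and rare letters swapped

print(round(chi_squared(good_guess, expected), 2))  # 0.0
print(round(chi_squared(bad_guess, expected), 2))   # 0.63
```

The bad guess scores 0.09 / 0.5 + 0 + 0.09 / 0.2 = 0.18 + 0.45 = 0.63, so the cracker would prefer the first candidate.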


Resources