Strassen's Matrix Multiplication

CLO-1 · Asymptotic Analysis

Naive: T(n) = 8T(n/2) + Θ(n²) → O(n³) | Strassen: T(n) = 7T(n/2) + Θ(n²) → O(n^log₂7) ≈ O(n^2.807)

Crossover point: Strassen overtakes naive at n ≈ ~32–64 depending on constants.

n	Naive n³	Strassen n^2.807	Ratio (Naive/Strassen)	Ops Saved

Drag to explore n: 32

Worst-Case Demo

For Strassen, worst-case and average-case are both O(n^2.807) — unlike sorting algorithms, there is no "lucky" input. Every 2×2 multiplication always uses exactly 7 multiplications and 18 add/subtracts.

⚠ Common Mistake: Thinking Strassen is always faster. For small n (n<32), naive is faster due to constant overhead. Strassen is only asymptotically superior.

CLO-2 · Element Access Patterns

Watch how each method accesses matrix elements. Strassen re-uses intermediate sums, dramatically reducing the number of multiplications needed.

Naive (8 multiplications)

Strassen (7 multiplications)

Click Animate to watch element access patterns side by side.

Impact of Removing Optimization

If we naively use 8 multiplications instead of 7: each recursion level multiplies the gap. At depth d, naive does 8ᵈ multiplications vs Strassen's 7ᵈ. The ratio 8ᵈ/7ᵈ = (8/7)ᵈ grows exponentially with recursion depth.

CLO-3 · Divide & Conquer Paradigm

Paradigm: Divide & Conquer
1. Divide: Split each n×n matrix into four n/2 × n/2 submatrices.
2. Conquer: Compute 7 products P₁–P₇ (instead of 8) using clever linear combinations.
3. Combine: Add/subtract P₁–P₇ to form the four quadrants of result C.

Recursion Tree

Recursion depth to show: 2

Why 7 and Not 8? The Key Insight

Strassen found linear combinations of the submatrices that allow computing C using 7 multiplications + 18 additions, instead of 8 multiplications + 4 additions (naive 2×2).

Counterexample: Why Greedy Fails Here

A greedy approach might try to pick the "cheapest" element to multiply first. But matrix multiplication has strict order constraints (non-commutative in general). A greedy choice can miss the global structure that Strassen exploits — showing why D&C is the right paradigm here.

🔴 Greedy: AB ≠ BA for matrices. A greedy heuristic cannot discover the 7-product trick — it requires global algebraic insight, not local decisions.

📖 Algorithm Overview

What it does: Strassen's algorithm multiplies two n×n matrices faster than the naive O(n³) approach by reducing the number of recursive multiplications from 8 to 7.

Why it matters: In 1969, Volker Strassen proved that the obvious O(n³) bound is not tight. His algorithm runs in O(n^log₂7) ≈ O(n^2.807), which is a meaningful improvement for large matrices used in graphics, ML, and scientific computing.

When to use: Theoretically better for n > ~32, but in practice cache effects mean standard BLAS libraries often outperform Strassen for moderate n. It shines in distributed computing for very large matrices.

🧮 Fully Worked Example (Step-by-Step)

💡 Key Concepts

Strassen's 7 Products for 2×2:
P₁ = (A₁₁+A₂₂)(B₁₁+B₂₂) P₂ = (A₂₁+A₂₂)B₁₁
P₃ = A₁₁(B₁₂−B₂₂) P₄ = A₂₂(B₂₁−B₁₁)
P₅ = (A₁₁+A₁₂)B₂₂ P₆ = (A₂₁−A₁₁)(B₁₁+B₁₂)
P₇ = (A₁₂−A₂₂)(B₂₁+B₂₂)

Result Construction:
C₁₁ = P₁+P₄−P₅+P₇ C₁₂ = P₃+P₅
C₂₁ = P₂+P₄ C₂₂ = P₁−P₂+P₃+P₆

⚠ Common Mistakes:
• Confusing P₄ = A₂₂(B₂₁−B₁₁) with (A₂₁−A₁₁)B₂₂ (that's P₆)
• Forgetting the subtraction in C₁₁ = P₁+P₄−P₅+P₇
• Thinking the algorithm works for odd-size matrices without padding

📐 Space Complexity Analysis

Time: O(n^log₂7) ≈ O(n^2.807) — from the Master Theorem: a=7, b=2, f(n)=Θ(n²), log₂7 ≈ 2.807 > 2, so Case 1 applies.

Space: O(n²) for the 7 intermediate product matrices P₁–P₇ at each recursion level, plus O(log n) stack frames → O(n² log n) total auxiliary space.

Comparison: Naive uses O(n²) space (just the output matrix). Strassen uses more memory — a trade-off for fewer multiplications.

Matrix Input (2×2 only for step-by-step)

Controls

Operation Counters

Pseudocode

Step Explanation

Step History (click to jump)