
Matrix Multiplication Explained: From Theory to Python Implementation

Matrix multiplication is one of the most fundamental operations in linear algebra and appears everywhere in data science, machine learning, computer graphics, physics, and engineering. Whether you are building neural networks, solving systems of linear equations, or transforming coordinates in a graphics pipeline — matrix multiplication sits at the center.

In this long-form guide you'll get:

  • A clear mathematical definition of matrix multiplication.

  • Intuition about what multiplication means geometrically (as linear transformations).

  • Practical, ready-to-run Python implementations using NumPy, PyTorch, and TensorFlow.

  • Step-by-step, line-by-line explanations of each code snippet so beginners can follow exactly what every statement does.

  • Common pitfalls, performance notes, and helpful debugging tips.

Table of Contents

  1. Introduction — Why matrix multiplication matters

  2. Quick definition and math rule

  3. Geometric/intuitive view (transformations)

  4. Important properties & common mistakes

  5. Implementation overview: NumPy vs PyTorch vs TensorFlow

  6. Detailed code examples (line-by-line explanations)

    • NumPy: matrix × vector and matrix × matrix

    • PyTorch: matrix × vector and matrix × matrix

    • TensorFlow: matrix × vector and matrix × matrix

    • Shape / dtype gotchas and how to fix them

  7. Performance notes (complexity, BLAS, GPU)

  8. Debugging tips and best practices

  9. Conclusion


2. Quick definition and the mathematical rule

If A is an m × n matrix and B is an n × p matrix, then the matrix product C = AB is defined and will be an m × p matrix. The element in row i and column j of C is:

C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}

This is the dot product of the i-th row of A and the j-th column of B.

Two important special cases we will use often:

  • Matrix × Vector: If v is a vector of length n (shape (n,) or (n, 1)), and A is m × n, then A @ v yields a vector of length m.

  • Matrix × Matrix: General case above, yielding shape m × p.
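
To make the rule concrete, here is a minimal pure-Python sketch of the formula (a naive triple loop for illustration only; matmul_naive is a hypothetical helper name, and real code should use the optimized library routines shown later):

# Naive matrix multiplication: C[i][j] = sum over k of A[i][k] * B[k][j]
def matmul_naive(A, B):
    m, n = len(A), len(A[0])          # A is m × n
    n2, p = len(B), len(B[0])         # B is n × p
    assert n == n2, "inner dimensions must match"
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# (2×3) × (3×2) → (2×2)
print(matmul_naive([[1, 2, 3], [4, 5, 6]],
                   [[1, 2], [3, 4], [5, 6]]))   # [[22, 28], [49, 64]]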

3. Geometric / intuitive view

One helpful way to think about a matrix is as a linear transformation. Multiplying a vector x by a matrix A produces a new vector y = A x — the input x transformed by the linear rule A. When you multiply two matrices A and B, you are composing two linear transformations: first apply B, then apply A. Because composing functions is associative, matrix multiplication is associative too; it is not commutative, however, because the order in which you apply transformations matters (scaling then rotating is generally not the same as rotating then scaling).
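
A quick NumPy check of this composition idea (the rotation and scaling matrices below are arbitrary illustrative choices):

# Applying B then A is the same as applying the single matrix A @ B
import numpy as np

A = np.array([[0, -1], [1, 0]])    # rotate 90° counter-clockwise
B = np.array([[2, 0], [0, 2]])     # scale by 2
x = np.array([1, 0])

print(A @ (B @ x))                 # [0 2] — scale first, then rotate
print((A @ B) @ x)                 # [0 2] — same result from the composed matrix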

4. Important properties & common mistakes

Properties:

  • Associative: (AB)C = A(BC).

  • Distributive: A(B + C) = AB + AC.

  • Not commutative in general: AB ≠ BA usually.

  • Identity: There exists an identity matrix I so that AI = IA = A (when sizes match).

  • Transpose: (AB)^T = B^T A^T.

Common mistakes:

  • Trying to multiply matrices with incompatible shapes.

  • Confusing row vs column vectors (1D arrays vs 2D column arrays).

  • Mixing dtypes (integers vs floats) that lead to unexpected behavior in libraries like PyTorch/TensorFlow.

  • Expecting element-wise multiplication when you write * instead of @ or .dot().
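The last two pitfalls are easy to see in a short NumPy snippet (illustrative matrices only):

# * is element-wise; @ is the matrix product — and AB ≠ BA in general
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(A * B)    # element-wise: [[0 2] [3 0]]
print(A @ B)    # matrix product: [[2 1] [4 3]]
print(B @ A)    # a different matrix product: [[3 4] [1 2]]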

5. Implementation overview: NumPy vs PyTorch vs TensorFlow

  • NumPy: The go-to for CPU-based numerical work and quick prototyping. Use np.dot() or @ for matrix multiplication. Great for learning and small-to-medium data.

  • PyTorch: Designed for deep learning with GPU acceleration, auto-differentiation, and dynamic computation graphs. Use torch.matmul() or @. Mind dtype and .to(device).

  • TensorFlow: Also designed for deep learning, with eager and graph execution modes and GPU support. Use tf.matmul() for matrix × matrix and tf.linalg.matvec() for matrix × vector. In eager execution (the default in TF 2.x), call .numpy() to fetch results as NumPy arrays.

We’ll now walk through code examples for each library and explain every line.

6. Detailed code examples (line-by-line explanations)

All examples use the same mathematical matrices for clarity:

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad (2 \times 3), \qquad v = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} \quad (\text{length } 3), \qquad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \quad (3 \times 2)

These choices make the shapes compatible: A (2×3) × v (length 3) → vector of length 2; A (2×3) × B (3×2) → a 2×2 matrix.

We’ll show the outputs and explain the arithmetic step-by-step.

6.1 NumPy — matrix × vector and matrix × matrix

# NumPy example
import numpy as np

# Define A, v, B
A = np.array([[1, 2, 3],
              [4, 5, 6]])           # shape: (2, 3)
v = np.array([7, 8, 9])            # shape: (3,)
B = np.array([[1, 2],
              [3, 4],
              [5, 6]])              # shape: (3, 2)

# Matrix × Vector
result_vector = A @ v               # or np.dot(A, v)
print("NumPy - Matrix × Vector:\n", result_vector)

# Matrix × Matrix
result_matrix = A @ B               # or np.dot(A, B)
print("NumPy - Matrix × Matrix:\n", result_matrix)

Line-by-line explanation (NumPy)

  1. import numpy as np

    • Imports NumPy and aliases it as np. Standard convention.

  2. A = np.array([[1, 2, 3], [4, 5, 6]])

    • Creates a 2×3 array. A.shape is (2, 3).

  3. v = np.array([7, 8, 9])

    • Defines a 1D NumPy array (vector) with shape (3,). NumPy will treat this as a column vector when used with matrix multiplication from the left (A @ v).

  4. B = np.array([[1, 2], [3, 4], [5, 6]])

    • Defines a 3×2 array. B.shape is (3, 2).

  5. result_vector = A @ v

    • Uses the @ operator to compute matrix × vector. Equivalent to np.dot(A, v). Result shape is (2,).

    • Let's compute manually, term by term, to verify correctness:

      • Row 1 of A is [1, 2, 3]. Dot with v = [7, 8, 9]:

        • 1 × 7 = 7

        • 2 × 8 = 16

        • 3 × 9 = 27

        • Sum: 7 + 16 + 27 = 50

      • Row 2 of A is [4, 5, 6]. Dot with v:

        • 4 × 7 = 28

        • 5 × 8 = 40

        • 6 × 9 = 54

        • Sum: 28 + 40 + 54 = 122

      • So result_vector = [50, 122].

  6. print("NumPy - Matrix × Vector:\n", result_vector)

    • Displays the computed vector [50 122].

  7. result_matrix = A @ B

    • Matrix × matrix multiplication producing shape (2, 2).

    • Manual calculation (term by term):

      • C[0,0] = row1(A)·col1(B) = 1×1 + 2×3 + 3×5 = 1 + 6 + 15 = 22

      • C[0,1] = row1(A)·col2(B) = 1×2 + 2×4 + 3×6 = 2 + 8 + 18 = 28

      • C[1,0] = row2(A)·col1(B) = 4×1 + 5×3 + 6×5 = 4 + 15 + 30 = 49

      • C[1,1] = row2(A)·col2(B) = 4×2 + 5×4 + 6×6 = 8 + 20 + 36 = 64

      • So result_matrix = [[22, 28], [49, 64]].

  8. print("NumPy - Matrix × Matrix:\n", result_matrix)

    • Prints the 2×2 result matrix.

Notes: In NumPy, 1D arrays (shape (n,)) often behave conveniently in dot products, but sometimes you need explicit shapes. If you want a column vector with shape (3,1), use v = v.reshape(3,1).

6.2 PyTorch — matrix × vector and matrix × matrix

# PyTorch example
import torch

# Define A, v, B
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.float32)  # shape: (2,3)
v = torch.tensor([7, 8, 9], dtype=torch.float32)    # shape: (3,)
B = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)      # shape: (3,2)

# Matrix × Vector
result_vector = torch.matmul(A, v)                    # or A @ v
print("PyTorch - Matrix × Vector:\n", result_vector)

# Matrix × Matrix
result_matrix = torch.matmul(A, B)                    # or A @ B
print("PyTorch - Matrix × Matrix:\n", result_matrix)

Line-by-line explanation (PyTorch)

  1. import torch

    • Imports the PyTorch library.

  2. A = torch.tensor([...], dtype=torch.float32)

    • Creates a 2×3 tensor with float32 dtype. Using floating types is common for GPU computation and gradients.

  3. v = torch.tensor([7, 8, 9], dtype=torch.float32)

    • Creates a 1D tensor of length 3. v.shape is (3,).

  4. B = torch.tensor([...], dtype=torch.float32)

    • Creates a 3×2 tensor.

  5. result_vector = torch.matmul(A, v)

    • Computes A @ v. The result will be a 1D tensor of shape (2,). The arithmetic is identical to the NumPy manual computation:

      • Row1·v = 1×7 + 2×8 + 3×9 = 50

      • Row2·v = 4×7 + 5×8 + 6×9 = 122

  6. print("PyTorch - Matrix × Vector:\n", result_vector)

    • Prints the tensor (e.g., tensor([ 50., 122.])).

  7. result_matrix = torch.matmul(A, B)

    • Matrix × matrix, yields a (2,2) tensor. Same numeric results as NumPy: [[22,28],[49,64]].

  8. print("PyTorch - Matrix × Matrix:\n", result_matrix)

    • Displays the result.

PyTorch specifics & tips

  • If you want GPU acceleration, move tensors with .to('cuda') (if a CUDA GPU is available): A = A.to('cuda') and v = v.to('cuda'). After computation, use .cpu().numpy() to move results to host and convert to NumPy.

  • For autograd (gradients), set requires_grad=True on tensors you want to differentiate.
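
A minimal sketch combining both tips, assuming PyTorch is installed; it falls back to the CPU when no CUDA GPU is found:

# GPU placement (if available) plus autograd on the same computation
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

A = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]], device=device, requires_grad=True)
v = torch.tensor([7., 8., 9.], device=device)

y = A @ v                          # computed on the chosen device
y.sum().backward()                 # autograd: gradient of sum(y) w.r.t. A
print(y.detach().cpu().numpy())    # [ 50. 122.]
print(A.grad)                      # every row equals v, i.e. [7., 8., 9.]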

6.3 TensorFlow — matrix × vector and matrix × matrix

# TensorFlow example
import tensorflow as tf

# Define A, v, B
A = tf.constant([[1, 2, 3],
                 [4, 5, 6]], dtype=tf.float32)  # shape: (2,3)
v = tf.constant([7, 8, 9], dtype=tf.float32)    # shape: (3,)
B = tf.constant([[1, 2],
                 [3, 4],
                 [5, 6]], dtype=tf.float32)      # shape: (3,2)

# Matrix × Vector
result_vector = tf.linalg.matvec(A, v)            # efficient for matrix-vector
print("TensorFlow - Matrix × Vector:\n", result_vector.numpy())

# Matrix × Matrix
result_matrix = tf.matmul(A, B)
print("TensorFlow - Matrix × Matrix:\n", result_matrix.numpy())

Line-by-line explanation (TensorFlow)

  1. import tensorflow as tf

    • Imports TensorFlow.

  2. A = tf.constant([...], dtype=tf.float32)

    • Creates a constant tensor with shape (2,3).

  3. v = tf.constant([7, 8, 9], dtype=tf.float32)

    • Creates a 1D tensor.

  4. B = tf.constant([...], dtype=tf.float32)

    • Creates a (3,2) tensor.

  5. result_vector = tf.linalg.matvec(A, v)

    • Uses tf.linalg.matvec (optimized for matrix-vector multiplication). Returns a tensor with shape (2,). Manually:

      • Row1·v = 50

      • Row2·v = 122

  6. print("TensorFlow - Matrix × Vector:\n", result_vector.numpy())

    • result_vector.numpy() converts the tensor to a NumPy array and fetches it (works in eager mode which is the default).

  7. result_matrix = tf.matmul(A, B)

    • Standard matrix multiplication. Numeric result: [[22,28],[49,64]].

  8. print("TensorFlow - Matrix × Matrix:\n", result_matrix.numpy())

    • Prints the 2×2 matrix.

TensorFlow specifics & tips

  • In TF 1.x graph mode you had to run operations inside a session to get values; in TF 2.x eager execution is the default, so .numpy() works directly, and tf.function can compile code back into a graph for performance.

  • To run on GPU, TensorFlow will automatically use available GPUs (no .to('cuda') like PyTorch), but ensure GPU drivers and CUDA are set up.
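
A small sketch of both points — listing visible GPUs and compiling the multiplication into a graph with tf.function (the product wrapper is just an illustrative name):

# Check which GPUs TensorFlow can see, and trace the product as a graph
import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))   # [] means CPU only

@tf.function
def product(A, B):
    return tf.matmul(A, B)

A = tf.constant([[1., 2., 3.], [4., 5., 6.]])
B = tf.constant([[1., 2.], [3., 4.], [5., 6.]])
print(product(A, B).numpy())                    # [[22. 28.] [49. 64.]]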

6.4 Shapes, broadcasting, and column/row vectors — common gotchas with code

Often you’ll see shape errors. Here are examples and fixes.

Example: shape mismatch

# Wrong: trying to multiply (2,3) with (2,) accidentally
A = np.array([[1,2,3],[4,5,6]])   # (2,3)
w = np.array([1,2])               # (2,) -> wrong length

# A @ w  # raises ValueError: the inner dimensions (3 and 2) do not match

Fix: ensure the inner dimensions match. To multiply A (2×3) you need a vector of length 3: w = np.array([1,2,3]).

Example: explicit column vector vs 1D array

v1 = np.array([7,8,9])        # shape (3,)
v2 = v1.reshape(3,1)         # shape (3,1) a column vector

# A @ v1 -> shape (2,)       # returns 1D array
# A @ v2 -> shape (2,1)      # returns column vector

Use the form you need for downstream code, but be mindful that (2,) and (2,1) behave differently in broadcasting.
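
For example, adding a length-2 bias to each result shows how differently the two shapes broadcast (a small NumPy sketch):

# A (2,) result stays a vector; a (2,1) result broadcasts into a matrix
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
v = np.array([7, 8, 9])
y1 = A @ v                   # shape (2,)   -> [ 50 122]
y2 = A @ v.reshape(3, 1)     # shape (2, 1) -> [[ 50] [122]]

bias = np.array([1, 2])
print((y1 + bias).shape)     # (2,)   — element-wise, as expected
print((y2 + bias).shape)     # (2, 2) — broadcasting silently made a matrix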

7. Performance notes (complexity, BLAS, GPU)

  • Complexity: Naive matrix multiplication of two n × n matrices is O(n³). Optimized libraries don't change the asymptotics, but they are far faster in practice thanks to BLAS/LAPACK routines and blocked, cache-friendly algorithms.

  • BLAS: NumPy uses BLAS (like OpenBLAS, MKL) under the hood for dot/@. This gives highly optimized CPU performance.

  • GPU: PyTorch and TensorFlow can offload operations to GPUs — huge speedups for large matrices (provided data transfer overhead is small compared to computation).

  • Batching: Both PyTorch and TensorFlow support batched matrix multiplication (e.g., torch.bmm or tf.matmul with 3D tensors), useful for neural networks — see the sketch after this list.

  • Precision: Use float32 for speed on GPUs; float64 for higher precision on CPU if needed.
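
A minimal sketch of batched multiplication in PyTorch (the batch size and shapes are arbitrary illustrative choices):

# Multiply 8 pairs of matrices in one call
import torch

A = torch.randn(8, 2, 3)     # batch of 8 matrices, each 2×3
B = torch.randn(8, 3, 4)     # batch of 8 matrices, each 3×4

C = torch.bmm(A, B)          # or torch.matmul(A, B), which broadcasts batch dims
print(C.shape)               # torch.Size([8, 2, 4])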

8. Debugging tips and best practices

  • Check shapes early: Print .shape on arrays/tensors before multiplication.

  • Check dtype: In PyTorch/TensorFlow, operand dtypes must match (e.g., float32 vs float64). Convert with .float()/.double() in PyTorch or tf.cast(..., tf.float32) in TensorFlow (see the example after this list).

  • Avoid Python lists for heavy math: Convert lists to arrays/tensors for speed.

  • Use @ or matmul for linear algebra: * is element-wise multiplication; @ is matrix multiplication.

  • Profile large ops: Use timeit or profiler tools to find bottlenecks.

  • Minimize CPU↔GPU transfers: Move tensors to GPU once, compute many operations, then bring results back.
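
As an example of the dtype point, mixing float32 and float64 tensors in PyTorch fails until you cast one side (a minimal sketch; the exact error text varies across versions):

# Fixing a dtype mismatch before multiplying
import torch

A = torch.tensor([[1., 2., 3.], [4., 5., 6.]])      # float32 by default
v = torch.tensor([7, 8, 9], dtype=torch.float64)    # float64

# A @ v raises a RuntimeError about mismatched dtypes
result = A @ v.float()       # cast v to float32 first
print(result)                # tensor([ 50., 122.])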

9. Conclusion

Matrix multiplication is more than a formula — it is a powerful algebraic tool for composing linear transformations. In practical Python work you’ll usually rely on libraries like NumPy for quick CPU work and PyTorch/TensorFlow when you want GPU acceleration and autograd. This article walked through the mathematical rule, geometric intuition, and full, line-by-line implementations in NumPy, PyTorch, and TensorFlow for both matrix×vector and matrix×matrix cases.

If you’re learning linear algebra or building machine learning models, practice by:

  • Implementing matrix multiplication by hand for small matrices.

  • Inspecting shapes and dtypes in your code regularly.

  • Trying the PyTorch examples on GPU to see speed differences for large matrices.
