TL;DR

In scientific computing and machine learning, we often work with sparse matrices, where most elements are zero. Storing such a matrix as a regular two-dimensional array wastes memory and can slow down computation. CSR (Compressed Sparse Row) is a classic format for solving this problem.

I had known the concept of CSR for a long time, but I never fully understood how it stores and reads data. After playing with an interactive demo, it finally clicked. I hope this helps others understand CSR more intuitively too.

CSR Format in Detail

Meaning of the three arrays

Array Full name Purpose
Values (A) Value Array Stores all non-zero elements in row-major order
Column Indices (JA) Column Index Array Stores the column index of each non-zero element
Row Pointers (IA) Row Pointer Array Stores the position of the first non-zero element of each row in Values

How to read row i?

1
2
3
4
5
# Pseudocode
start = IA[i]
end = IA[i + 1]
row_values = A[start:end]
row_cols = JA[start:end]

Space complexity

Assume the matrix size is $n \times m$, and the number of non-zero elements is $nnz$:

  • Dense storage: $O(n \times m)$
  • CSR storage: $O(nnz + n + 1)$

When $nnz \ll n \times m$, CSR has a clear advantage.

Common operation time complexity

Let $nnz_i$ be the number of non-zero elements in row $i$:

Operation Time complexity
Traverse all non-zero elements $O(nnz)$
Access row $i$ $O(1)$ to locate it, and $O(nnz_i)$ to traverse it
Find element $A_{ij}$ $O(nnz_i)$ if unsorted; $O(\log nnz_i)$ if column indices are sorted
Sparse matrix-vector multiplication $O(nnz)$
Matrix addition $O(nnz_A + nnz_B)$
Transpose CSR $O(nnz + n + m)$
Insert or delete an element $O(nnz)$
Column access $O(nnz)$

Variant formats

Format Feature Use case
CSC (Compressed Sparse Column) Compressed by column Column-first access
COO (Coordinate List) Coordinate list Matrix construction stage
BSR (Block Sparse Row) Block-based compression Regular block structure