Sharding

Many small chunks, few storage objects. Random access to small chunks with the object count of large ones.

What it does

Sharding packs many chunks of a Zarr array into a single storage object, the shard. Each shard carries an index that records the byte offset and length of every chunk it holds, so a reader can fetch any single chunk with a byte-range request instead of downloading the whole shard. Random access stays close to the one-chunk-per-object case, while the total number of storage objects drops by orders of magnitude.
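To make the index idea concrete, here is a minimal, simplified sketch in plain Python. It is not the Zarr implementation: it assumes uncompressed payloads, an index of little-endian (offset, nbytes) uint64 pairs appended at the end of the shard, and no checksum. The function names pack_shard and read_chunk are hypothetical, and real readers typically fetch the whole index in one request rather than a single entry.

```python
from __future__ import annotations

import struct

# Marker for a chunk that was never written: offset and length both 2**64 - 1.
EMPTY = 2**64 - 1
ENTRY_SIZE = 16  # two uint64 values per chunk: (offset, nbytes)


def pack_shard(chunks: list[bytes | None]) -> bytes:
    """Concatenate chunk payloads and append an (offset, nbytes) index."""
    body = bytearray()
    index = []
    for payload in chunks:
        if payload is None:
            index.append((EMPTY, EMPTY))
        else:
            index.append((len(body), len(payload)))
            body += payload
    index_bytes = b"".join(struct.pack("<QQ", off, n) for off, n in index)
    return bytes(body) + index_bytes


def read_chunk(shard: bytes, n_chunks: int, i: int) -> bytes | None:
    """Read chunk i using two slices that stand in for byte-range requests."""
    index_start = len(shard) - ENTRY_SIZE * n_chunks
    # "Range read" 1: the index entry for chunk i.
    off, n = struct.unpack_from("<QQ", shard, index_start + ENTRY_SIZE * i)
    if off == EMPTY and n == EMPTY:
        return None  # chunk was never written
    # "Range read" 2: only the bytes of the requested chunk.
    return shard[off:off + n]


shard = pack_shard([b"aaaa", None, b"cc"])
print(read_chunk(shard, 3, 2))
```

The point of the design is that the cost of reading one chunk depends on the index and that chunk only, not on the size of the shard.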

Why it matters

  • HPC filesystems: escape per-file inode pressure when an array would otherwise produce hundreds of thousands of chunk files
  • Cloud object stores: escape per-object overhead (request cost, listing latency) on S3, GCS, Azure
  • Same access patterns: readers that implement the Zarr v3 sharding codec open sharded arrays through the usual array API and fetch individual chunks via byte-range reads

[Figure: Sharding diagram]

How to use it

Run it with uv run site/examples/sharding.py; the dependencies are declared in the script's PEP 723 header.

python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr>=3.1",
#   "numpy",
# ]
# ///
"""Sharding: many small chunks packed into few storage objects."""

import zarr
import numpy as np

# A 1024 x 1024 array with 16x16 inner chunks grouped into 256x256 shards
arr = zarr.create_array(
    store='array.zarr',
    shape=(1024, 1024),
    chunks=(16, 16),       # the unit of access
    shards=(256, 256),     # the unit of storage
    dtype='float32',
    overwrite=True,
)
arr[:] = np.random.random((1024, 1024)).astype('float32')
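As a quick sanity check on the numbers in this example, the sketch below counts how many chunks and shards the array above is divided into. It uses only the standard library and does not touch the store; the helper name grid_count is hypothetical.

```python
import math

shape = (1024, 1024)
chunks = (16, 16)     # unit of access
shards = (256, 256)   # unit of storage


def grid_count(extent, block):
    """Number of blocks needed to tile `extent`, rounding up per dimension."""
    return math.prod(math.ceil(e / b) for e, b in zip(extent, block))


n_chunks = grid_count(shape, chunks)           # 64 * 64 = 4096
n_shards = grid_count(shape, shards)           # 4 * 4 = 16
chunks_per_shard = grid_count(shards, chunks)  # 16 * 16 = 256

print(f"{n_chunks} chunks in {n_shards} shards, {chunks_per_shard} chunks per shard")
```

Without sharding, this array would need one storage object per chunk (4096); with 256x256 shards it needs 16, plus the array metadata.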

Learn more

EGU 2026 · ESSI2.2 · EGU26-15196