Sharding
Many small chunks, few storage objects. Random access to small chunks, with the storage footprint of large ones.
What it does
Sharding packs many chunks of a Zarr array into a single storage object. The shard's internal chunk index lets readers fetch any specific chunk via a byte-range request, so random access stays as fast as having one chunk per object, while the total number of storage objects drops by orders of magnitude.
Why it matters
- HPC filesystems: escape per-file inode pressure when an array would otherwise produce hundreds of thousands of chunk files
- Cloud object stores: escape per-object overhead (request cost, listing latency) on S3, GCS, Azure
- Same access patterns: existing Zarr readers transparently support sharded arrays via byte-range reads
How to use it
Run it with `uv run site/examples/sharding.py`; dependencies are declared in the script's PEP 723 header.
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "zarr>=3.1",
#     "numpy",
# ]
# ///
"""Sharding: many small chunks packed into few storage objects."""
import numpy as np
import zarr

# A 1024 x 1024 array with 16 x 16 inner chunks grouped into 256 x 256 shards
arr = zarr.create_array(
    store='array.zarr',
    shape=(1024, 1024),
    chunks=(16, 16),    # the unit of access
    shards=(256, 256),  # the unit of storage
    dtype='float32',
    overwrite=True,
)
arr[:] = np.random.random((1024, 1024)).astype('float32')
```