Sharding
Many small chunks, few storage objects. Random access to small chunks, with the storage footprint of large ones.
What it does
Sharding packs many chunks of a Zarr array into a single storage object. The shard's internal chunk index lets readers fetch any specific chunk via a byte-range request, so random access stays as fast as having one chunk per object, while the total number of storage objects drops by orders of magnitude.
Why it matters
- HPC filesystems: escape per-file inode pressure when an array would otherwise produce hundreds of thousands of chunk files
- Cloud object stores: escape per-object overhead (request cost, listing latency) on S3, GCS, Azure
- Same access patterns: existing Zarr readers transparently support sharded arrays via byte-range reads
How to use it
Run it with `uv run site/examples/sharding.py`; dependencies are declared in the script's PEP 723 header.
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "zarr>=3.1",
#     "numpy",
# ]
# ///
"""Sharding: many small chunks packed into few storage objects."""
import numpy as np
import zarr

# A 1024 x 1024 array with 16 x 16 inner chunks grouped into 256 x 256 shards
arr = zarr.create_array(
    store='array.zarr',
    shape=(1024, 1024),
    chunks=(16, 16),    # the unit of access
    shards=(256, 256),  # the unit of storage
    dtype='float32',
    overwrite=True,
)
arr[:] = np.random.random((1024, 1024)).astype('float32')
```