Zarr at Scale — Jones et al. EGU 2026

1 Sharding

Many small chunks, few storage objects

Random access of small chunks with the storage footprint of large ones.

Sharding can pack thousands of chunks into a single storage object. Readers fetch only the bytes they need via byte-range requests, so random access stays fast.

HPC filesystems: escape per-file inode pressure

Cloud object stores: skip per-object request overhead

Same access patterns: readers transparently support shards
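To make the byte-range idea concrete, here is a minimal toy sketch of a shard: encoded chunks are concatenated and followed by an index of (offset, nbytes) pairs. It is loosely modeled on the Zarr v3 sharding layout but leaves out compression and checksums, and the helper names (build_shard, read_range, read_chunk) are hypothetical, not part of zarr-python.

```python
import struct

def build_shard(chunks):
    """Concatenate encoded chunks and append an index of (offset, nbytes) pairs."""
    body = b"".join(chunks)
    index = b""
    offset = 0
    for chunk in chunks:
        # Two little-endian uint64 values per chunk: where it starts, how long it is
        index += struct.pack("<QQ", offset, len(chunk))
        offset += len(chunk)
    return body + index

def read_range(blob, start, length):
    """Stand-in for a byte-range request against one storage object."""
    return blob[start:start + length]

def read_chunk(shard, n_chunks, i):
    """Fetch chunk i with two small range reads instead of the whole shard."""
    index_size = 16 * n_chunks
    # First read: only the index at the end of the shard
    index = read_range(shard, len(shard) - index_size, index_size)
    offset, nbytes = struct.unpack_from("<QQ", index, 16 * i)
    # Second read: only the bytes of the requested chunk
    return read_range(shard, offset, nbytes)

# Four toy "encoded chunks" of different lengths
chunks = [bytes([i]) * (i + 1) for i in range(4)]
shard = build_shard(chunks)
assert read_chunk(shard, 4, 2) == chunks[2]
```

A real reader would typically cache the index, so repeated access to the same shard costs one range request per chunk.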

import zarr

zarr.create_array(
    store="array.zarr",
    shape=(1024, 1024),
    chunks=(16, 16),     # unit of access
    shards=(256, 256),   # unit of storage
    dtype="float32",
)
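As a quick check of what these parameters imply, the arithmetic below counts chunks and storage objects for the array above. It is plain Python with no Zarr calls, and the byte sizes assume uncompressed float32 data.

```python
import math

shape = (1024, 1024)
chunks = (16, 16)      # unit of access
shards = (256, 256)    # unit of storage
itemsize = 4           # bytes per float32 value

chunks_per_shard = math.prod(s // c for s, c in zip(shards, chunks))
n_shards = math.prod(math.ceil(n / s) for n, s in zip(shape, shards))
n_chunks = math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))

print(chunks_per_shard)               # 256 chunks packed into each shard
print(n_shards)                       # 16 storage objects with sharding
print(n_chunks)                       # 4096 storage objects without sharding
print(math.prod(chunks) * itemsize)   # 1024 bytes per uncompressed chunk
print(math.prod(shards) * itemsize)   # 262144 bytes per uncompressed shard
```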

zarr-at-scale.maxjones.dev/sharding

Traditional Zarr layout vs. sharded layout

2 Virtualization

Read archival files as one Zarr dataset

No conversion, no copies. Point at NetCDF4, HDF5, GRIB, GeoTIFF, and more.

VirtualiZarr scans existing files and builds a virtual manifest. Icechunk commits these manifests transactionally. Read across the whole archive as a single Zarr dataset.

No conversion: expose terabytes of NetCDF/HDF5 without rewriting

Atomic updates: safe under concurrent writes via Icechunk

Format-agnostic: NetCDF4, HDF5, GRIB, GeoTIFF, FITS

Publish once, read everywhere: producers ship a manifest alongside the archive — every downstream user gets cloud-native access

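import virtualizarr as vz
import xarray as xr

# Assumed to be defined earlier: urls (the archive file URLs),
# registry (object-store access for those URLs), and
# session (a writable Icechunk session).
vds = vz.open_virtual_mfdataset(
    urls, registry=registry,
    parser=vz.parsers.HDFParser(),
    combine="by_coords",
)
# Write chunk references (not the data itself) and commit them atomically
vds.virtualize.to_icechunk(session.store)
session.commit("CMIP6 archive")
# Read the whole archive back as one Zarr dataset
ds = xr.open_zarr(session.store)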
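Conceptually, a virtual manifest maps each Zarr chunk key to a byte range inside an existing file. The sketch below illustrates that idea with a plain dict and a local stand-in file. It is not VirtualiZarr's or Icechunk's actual data structure, and the file layout, chunk keys, and read_virtual_chunk helper are all hypothetical.

```python
import os
import tempfile

# Stand-in "archival file": a header followed by two chunk payloads
path = os.path.join(tempfile.mkdtemp(), "archive.bin")
header, chunk_a, chunk_b = b"HEADER--", b"aaaa", b"bbbbbb"
with open(path, "wb") as f:
    f.write(header + chunk_a + chunk_b)

# Manifest: chunk key -> (path, offset, length). No data is copied.
manifest = {
    "temperature/c/0": (path, len(header), len(chunk_a)),
    "temperature/c/1": (path, len(header) + len(chunk_a), len(chunk_b)),
}

def read_virtual_chunk(manifest, key):
    """Resolve a chunk key to a byte range and read only those bytes."""
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

assert read_virtual_chunk(manifest, "temperature/c/0") == chunk_a
assert read_virtual_chunk(manifest, "temperature/c/1") == chunk_b
```

Because the manifest only stores references, publishing it alongside the archive leaves the original files untouched.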

zarr-at-scale.maxjones.dev/virtualization

3 Variable chunk grids

One array, many chunk sizes

Match chunk size to data density. Fine where it matters, coarse where it doesn't.

Just landed in zarr-python. A Zarr array can now have non-uniform chunks along any axis. Faster reads where activity concentrates, less storage overall.

Match density: fine chunks where data is dense, coarse elsewhere

Time archives: fine for recent observations, coarse for history

Storage optimization: atmospheric levels, ocean depths, healpix grids, and other irregular axes

Less waste: no padding sparse regions to fit a uniform grid

import zarr

zarr.config.set({"array.rectilinear_chunks": True})

zarr.create_array(
    store="era5.zarr",
    shape=(745440, 37, 721, 1440),               # (time, level, lat, lon)
    chunks=[24, (8, 5, 4, 5, 15), 144, 360],      # ΔP = 200 hPa
    dtype="float32",
)
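The level-axis chunk sizes above must cover all 37 levels exactly. The short sketch below checks that and shows how an index along the axis maps to a variable-size chunk. It is plain Python, and chunk_for_index is a hypothetical helper, not a zarr-python function.

```python
from bisect import bisect_right
from itertools import accumulate

n_levels = 37
level_chunks = (8, 5, 4, 5, 15)

# The variable chunk sizes must sum to the axis length
assert sum(level_chunks) == n_levels

stops = list(accumulate(level_chunks))   # [8, 13, 17, 22, 37]
starts = [0] + stops[:-1]                # [0, 8, 13, 17, 22]

def chunk_for_index(i):
    """Return (chunk number, offset within that chunk) for level index i."""
    if not 0 <= i < n_levels:
        raise IndexError(i)
    k = bisect_right(stops, i)
    return k, i - starts[k]

print(chunk_for_index(20))  # (3, 3): level 20 is the 4th entry of chunk 3
```

With a uniform grid the lookup is a single integer division; with variable chunks it becomes a search over cumulative boundaries, which stays cheap because the boundary list is short.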

zarr-at-scale.maxjones.dev/variable-chunk-grids

Variable chunks along the ERA5 pressure-level axis

4 In-browser rendering

Render multi-terabyte datasets in the browser

Stream Zarr chunks straight to the GPU. No server, no pre-rendered tiles.

Zarrita.js reads Zarr chunks directly with the browser's fetch API. deck.gl-zarr pushes them onto the GPU. Pan, zoom, and recolor a multi-terabyte dataset interactively, on a laptop.

No tile pipeline: skip pre-render, tile cache, tile server

Cloud-native: data lives in Zarr on object storage, read directly

GPU-accelerated: pan, zoom, recolor at interactive rates

Client-side compute: recolor, band math, value readout — no refetch

import * as zarr from "zarrita"
import { ZarrLayer } from "@developmentseed/deck.gl-zarr"

const store = new zarr.FetchStore(ZARR_URL)
const root = await zarr.open.v3(store, { kind: "group" })
const arr  = await zarr.open.v3(
    root.resolve("/embeddings"), { kind: "array" })

new ZarrLayer({ node: arr, metadata: root.attrs,
                selection, getTileData, renderTile })

zarr-at-scale.maxjones.dev/in-browser

Earth-science profile

GeoZarr: open conventions for geospatial Zarr

Composable conventions for CRS, spatial transforms, pyramids, and climate metadata. OGC standardization in progress, target summer 2026.

Browse real GeoZarr datasets: inspect.geozarr.org

proj: CRS (EPSG, WKT2, PROJJSON)

spatial: Transforms (affine, grid registration)

multiscales: Pyramids (resolution levels for tiling)

Data model + metadata: NetCDF structure + Climate & Forecast vocabulary

geozarr.org