Scale & datasets

The detector and scene layers are accurate; phase 1.6 makes them fast and suitable for bulk generation. This guide covers the four scale features: the float32 fast path, vectorised multi-source rendering, the raw+truth dataset generator, and the getframes command-line interface. Everything here is additive: the exact float64 path is unchanged and remains the default.

The float32 fast path

Pass precision="float32" when you build a Camera to run the whole signal chain — and each frame's ground truth — in single precision. That halves the memory of the per-pixel arrays, which matters for large detectors and when you are generating thousands of frames:

import getframes as gf

cam = gf.Camera.from_preset("zwo_asi2600mm", precision="float32")
frame = cam.expose(photon_rate=200.0, exposure=30.0, seed=0)

frame.dtype                       # uint32 — the digitised ADU stay exact integers
frame.truth.mean_electrons.dtype  # float32 — the floating-point truth is light

Only the floating-point arrays change; the digitised ADU are integer counts either way. The result matches the float64 path to single-precision tolerance. If you call the scene or noise layers directly, the same control is a dtype / float_dtype argument:

# `scene` is a gf.Scene, such as the one built in the next section
rate_map = scene.photon_rate_map(dtype="float32")     # f32 photons/s/pixel map
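
To make the memory and tolerance claims concrete, here is a minimal NumPy-only sketch. It does not call getframes; the detector size and the rate values are illustrative assumptions, and the comparison only shows what "single-precision tolerance" means for a simple rate × exposure product.

```python
import numpy as np

# Illustrative full-frame size (an assumption, not read from any preset).
height, width = 4176, 6248
for name in ("float64", "float32"):
    mib = height * width * np.dtype(name).itemsize / 2**20
    print(f"{name}: {mib:.0f} MiB per full-frame floating-point array")

# Compare a mean-electron map computed in both precisions on a small tile.
rng = np.random.default_rng(0)
rate = rng.uniform(50.0, 500.0, size=(256, 256))   # photons/s/pixel (made up)
exposure = 30.0                                    # seconds

mean64 = rate * exposure
mean32 = rate.astype(np.float32) * np.float32(exposure)

# Relative difference between the two paths, evaluated in float64.
rel_err = np.abs(mean32.astype(np.float64) - mean64) / mean64
eps32 = np.finfo(np.float32).eps
print(f"max relative error: {rel_err.max():.2e} (float32 eps = {eps32:.2e})")
```

The float32 array uses exactly half the bytes of the float64 one, and the relative difference stays within a small multiple of the float32 machine epsilon. A longer chain of operations accumulates somewhat more rounding error than this two-step product.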

Vectorised catalog rendering

A Catalog of many stars no longer loops in Python. GaussianPSF deposits the whole catalog in one batched, memory-chunked NumPy expression — pixel-for-pixel identical to the per-source path, just far faster for crowded fields:

import numpy as np
import getframes as gf

rng = np.random.default_rng(0)
n = 100_000
table = {"x": rng.uniform(0, 2048, n), "y": rng.uniform(0, 2048, n),
         "mag": rng.uniform(16, 23, n)}

scene = gf.Scene(
    shape=(2048, 2048),
    optics=gf.Telescope(4.0, 0.2, throughput=0.4, band=gf.Bandpass.ab("g")),
    psf=gf.GaussianPSF(fwhm_arcsec=0.7),
    sources=[gf.Catalog.from_table(table, magnitude="mag", x="x", y="y")],
)
rate = scene.photon_rate_map()    # 10^5 stars, no Python per-star loop

Other PSFs fall back to a per-source loop automatically (via PSF.add_sources), so this is purely a speed-up where it applies.
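
The idea behind batched, memory-chunked deposition can be sketched in plain NumPy. This is not the GaussianPSF implementation: the function names, the fixed stamp half-width, the pixel-centre sampling, and the chunk size are all assumptions made for illustration. It renders the same sources with a per-source loop and with one broadcast expression per chunk, then compares the two images.

```python
import numpy as np

def render_loop(shape, x, y, flux, sigma, half=6):
    """Reference path: one Python iteration per source."""
    img = np.zeros(shape)
    off = np.arange(-half, half + 1)
    norm = 1.0 / (2.0 * np.pi * sigma**2)
    for xi, yi, fi in zip(x, y, flux):
        px = int(np.rint(xi)) + off                 # stamp columns
        py = int(np.rint(yi)) + off                 # stamp rows
        px = px[(px >= 0) & (px < shape[1])]        # clip to the image
        py = py[(py >= 0) & (py < shape[0])]
        r2 = (py[:, None] - yi) ** 2 + (px[None, :] - xi) ** 2
        img[np.ix_(py, px)] += fi * norm * np.exp(-r2 / (2.0 * sigma**2))
    return img

def render_batched(shape, x, y, flux, sigma, half=6, chunk=4096):
    """Batched path: all stamps of a chunk in one broadcast expression."""
    img = np.zeros(shape)
    off = np.arange(-half, half + 1)
    norm = 1.0 / (2.0 * np.pi * sigma**2)
    for start in range(0, len(x), chunk):           # chunking bounds peak memory
        xs, ys, fs = (a[start:start + chunk] for a in (x, y, flux))
        px = np.rint(xs).astype(np.intp)[:, None] + off    # (m, k)
        py = np.rint(ys).astype(np.intp)[:, None] + off    # (m, k)
        r2 = ((py[:, :, None] - ys[:, None, None]) ** 2
              + (px[:, None, :] - xs[:, None, None]) ** 2)  # (m, k, k)
        vals = fs[:, None, None] * norm * np.exp(-r2 / (2.0 * sigma**2))
        rows = np.broadcast_to(py[:, :, None], vals.shape)
        cols = np.broadcast_to(px[:, None, :], vals.shape)
        ok = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
        # np.add.at accumulates correctly where stamps overlap.
        np.add.at(img, (rows[ok], cols[ok]), vals[ok])
    return img

rng = np.random.default_rng(0)
n, shape = 300, (48, 64)
x = rng.uniform(-2, shape[1] + 2, n)    # some sources fall off the edge
y = rng.uniform(-2, shape[0] + 2, n)
flux = rng.uniform(1.0, 100.0, n)

img_loop = render_loop(shape, x, y, flux, sigma=1.2)
img_batched = render_batched(shape, x, y, flux, sigma=1.2, chunk=128)
print(np.allclose(img_loop, img_batched, rtol=1e-12, atol=1e-12))
```

The chunk size trades peak memory for fewer Python-level iterations: each chunk materialises an (m, k, k) block of stamp values, so a smaller chunk keeps that block small at the cost of more passes through the outer loop.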

Generating raw + truth datasets

getframes.dataset streams paired data straight to disk: a realistic raw frame together with the noise-free electrons it was drawn from, which is the input an ML pipeline (denoising, deconvolution, calibration) wants. Feed it any iterable of scenes; random_star_fields is a re-iterable source of random fields:

import getframes as gf

cam = gf.Camera.from_preset("zwo_asi2600mm", precision="float32")
scenes = gf.dataset.random_star_fields(n=10_000, shape=cam.resolution, seed=0)

ds = gf.dataset.pairs(camera=cam, scenes=scenes, exposure=60.0,
                      dtype="float32", seed=1)

paths = ds.to_npz("train/")   # one {raw, truth} .npz per frame, streamed

Each frame draws a distinct derived seed, so the set is reproducible yet the frames are independent. Iterating over ds yields {"raw": ADU, "truth": electrons} dicts directly, and ds.to_arrays() stacks a small set into (N, H, W) arrays.
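
One common way to get "reproducible yet independent" per-frame seeds is NumPy's SeedSequence.spawn. The sketch below assumes that approach for illustration only; how getframes derives its seeds internally is not stated here. The helper names, the Poisson-only noise model, and the file name are hypothetical, and the "raw" and "truth" keys simply mirror the dict layout described above.

```python
import pathlib
import tempfile

import numpy as np

def frame_seeds(master_seed, n_frames):
    """Spawn one statistically independent child seed per frame."""
    return np.random.SeedSequence(master_seed).spawn(n_frames)

def toy_pair(truth_electrons, seed_seq):
    """Toy stand-in for a detector: Poisson shot noise on the truth map."""
    rng = np.random.default_rng(seed_seq)
    raw = rng.poisson(truth_electrons)
    return {"raw": raw, "truth": truth_electrons}

truth = np.full((32, 32), 100.0, dtype=np.float32)

# Two runs with the same master seed give the same frames, in the same order.
run_a = [toy_pair(truth, s) for s in frame_seeds(1, 3)]
run_b = [toy_pair(truth, s) for s in frame_seeds(1, 3)]
print(all(np.array_equal(a["raw"], b["raw"]) for a, b in zip(run_a, run_b)))

# Within one run, different frames use different noise realisations.
print(np.array_equal(run_a[0]["raw"], run_a[1]["raw"]))

# Round-trip one pair through an .npz file (file name is hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "pair_000000.npz"
    np.savez(path, **run_a[0])
    with np.load(path) as data:
        loaded = {key: data[key] for key in ("raw", "truth")}
```

Because each frame's generator depends only on the master seed and the frame index, any single frame can be regenerated without replaying the ones before it.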

The command line

An experiment can be a shareable TOML file. The getframes command (installed with the package) has three subcommands:

getframes presets                              # list the built-in cameras
getframes generate frame.toml -o dark.fits     # one frame (or a short series)
getframes dataset data.toml -o train/          # stream raw+truth pairs

A generate config names a preset (or an inline camera) and a frame spec:

[camera]
preset = "andor_ikon_m934"
default_temperature_c = -60.0
precision = "float32"

[frame]
type = "dark"        # dark | bias | flat | light
exposure_s = 30.0
seed = 0
n_frames = 1

A dataset config drives bulk pair generation; the detector is sized to the requested shape:

[camera]
preset = "zwo_asi2600mm"
precision = "float32"

[dataset]
n = 1000
shape = [512, 512]
exposure_s = 60.0
mag_range = [16, 22]
seed = 0
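
Because the config is plain TOML, it can be inspected with the standard library before launching a long run. The sketch below is not how getframes parses its configs; it only reads the dataset example above and estimates the size of the float32 truth maps. The 4-bytes-per-pixel figure and the absence of compression are assumptions.

```python
try:
    import tomllib                      # Python 3.11+
except ModuleNotFoundError:             # older interpreters: pip install tomli
    import tomli as tomllib

CONFIG = """
[camera]
preset = "zwo_asi2600mm"
precision = "float32"

[dataset]
n = 1000
shape = [512, 512]
exposure_s = 60.0
mag_range = [16, 22]
seed = 0
"""

cfg = tomllib.loads(CONFIG)
ds_cfg = cfg["dataset"]

height, width = ds_cfg["shape"]
mag_lo, mag_hi = ds_cfg["mag_range"]
if not mag_lo < mag_hi:
    raise ValueError("mag_range must be [bright, faint] with bright < faint")

# Rough footprint of the truth maps alone, assuming 4-byte floats.
truth_gib = ds_cfg["n"] * height * width * 4 / 2**30
print(f"{ds_cfg['n']} frames of {height}x{width}: ~{truth_gib:.2f} GiB of truth")
```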

Benchmarks

benchmarks/run.py is a small, dependency-light harness that times the signal chain, catalog rendering, and dataset generation so throughput regressions show up. It is not part of the test gate (timings are machine-dependent); run it by hand:

python benchmarks/run.py            # default sizes
python benchmarks/run.py --quick    # smaller and faster
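
For readers who want a quick timing of their own code, a best-of-N wall-clock measurement is a common pattern. The sketch below is not taken from benchmarks/run.py; the helper name and the Poisson workload are illustrative assumptions.

```python
import time

import numpy as np

def best_of(fn, repeats=5):
    """Return the fastest wall-clock time of several runs, in seconds."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

rng = np.random.default_rng(0)
mean = np.full((512, 512), 6000.0)      # mean electrons per pixel (made up)

elapsed = best_of(lambda: rng.poisson(mean))
print(f"Poisson draw: {mean.size / elapsed / 1e6:.1f} Mpix/s")
```

Taking the minimum rather than the mean reduces the influence of background load, which is one reason such numbers are still compared only on the same machine.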