Data

Loading, caching, and managing market data with DataLoader and DataInfo.

Overview

The data layer in ADRS handles fetching, caching, and normalising market data from upstream providers. It is built around two primary abstractions:

  • DataLoader — fetches raw data from a provider (or your own source) and caches results to disk.
  • Datamap — an in-memory store that holds aligned, processed data ready for use in an alpha.

DataInfo

DataInfo describes a single data source that an alpha requires. It specifies which data topic to load, which columns to keep, and how many additional historical bars are needed before the start of the backtest (the lookback).

from adrs.data import DataInfo, DataColumn

DataInfo(
    topic="binance-spot|candle?symbol=BTCUSDT&interval=1h",
    columns=[DataColumn(src="close", dst="close_binance")],
    lookback_size=100,  # extra historical bars needed for warm-up
)
| Field | Type | Description |
| --- | --- | --- |
| topic | str | Data topic string understood by the DataLoader (see topic format) |
| columns | list[DataColumn] | Column mappings from source → destination name |
| lookback_size | int | Number of extra historical bars to prepend for indicator warm-up |

DataColumn

DataColumn renames a source column to a destination name in the output DataFrame.

DataColumn(src="close", dst="close_binance")
| Field | Type | Description |
| --- | --- | --- |
| src | str | Column name in the raw source data (e.g. "close") |
| dst | str | Column name in the merged DataFrame (must be unique across all DataInfo objects) |

Topic format

Topics follow the pattern <provider>|<endpoint>?<params>:

binance-spot|candle?symbol=BTCUSDT&interval=1h
bybit-linear|candle?symbol=BTCUSDT&interval=1m
coinbase|candle?symbol=BTCUSD&interval=1h
yfinance|candle?ticker=SPY&interval=1d
custom|my-endpoint
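
Splitting a topic into its parts is mechanical. The sketch below uses a hypothetical `parse_topic` helper for illustration only; ADRS performs this parsing internally and the function name is not part of its API:

```python
from urllib.parse import parse_qs

def parse_topic(topic: str) -> tuple[str, str, dict[str, str]]:
    # Hypothetical helper for illustration; ADRS parses topics internally.
    provider, _, rest = topic.partition("|")   # e.g. "binance-spot"
    endpoint, _, query = rest.partition("?")   # e.g. "candle"
    params = {k: v[0] for k, v in parse_qs(query).items()}
    return provider, endpoint, params

provider, endpoint, params = parse_topic(
    "binance-spot|candle?symbol=BTCUSDT&interval=1h"
)
# provider == "binance-spot", endpoint == "candle",
# params == {"symbol": "BTCUSDT", "interval": "1h"}
```

Topics without a query string, such as `custom|my-endpoint`, simply yield an empty params dict.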

Supported providers are listed in the datasource docs. Custom providers are resolved by the handlers registered on your DataLoader.


DataLoader

DataLoader is the entry-point for fetching data. By default it talks to the Datasource API, but it is designed to be extended with your own handlers for any data source.

import json
from adrs import DataLoader

dataloader = DataLoader(
    data_dir="outdir",                              # local cache directory
    credentials=json.load(open("credentials.json")), # {"cybotrade_api_key": "..."}
)

Constructor

DataLoader(
    data_dir: str,
    credentials: dict[str, str] | None = None,
    format: str | None = None,
    use_cybotrade_datasource: bool | None = None,
    cybotrade_api_url: str | None = None,
    handlers: list[Handler] = [],
)
| Parameter | Type | Description |
| --- | --- | --- |
| data_dir | str | Directory where downloaded data is cached |
| credentials | dict | API credentials, e.g. {"cybotrade_api_key": "..."} |
| handlers | list[Handler] | Custom data handlers (see below) |

Loading data

from datetime import datetime

df = await dataloader.load(
    topic="binance-spot|candle?symbol=BTCUSDT&interval=1h",
    start_time=datetime.fromisoformat("2024-01-01T00:00:00Z"),
    end_time=datetime.fromisoformat("2025-01-01T00:00:00Z"),
)

Results are automatically cached to data_dir so subsequent calls with the same topic and time range are instant.
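
One way caching like this can work is to derive a deterministic key from the topic and time range. The sketch below is purely illustrative; DataLoader's actual on-disk layout in `data_dir` may differ:

```python
import hashlib
from datetime import datetime, timezone

def cache_key(topic: str, start_time: datetime, end_time: datetime) -> str:
    # Hypothetical sketch of a deterministic cache key; the real
    # DataLoader cache format is an implementation detail.
    raw = f"{topic}|{start_time.isoformat()}|{end_time.isoformat()}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, tzinfo=timezone.utc)
key = cache_key("binance-spot|candle?symbol=BTCUSDT&interval=1h", start, end)
# The same (topic, start, end) always maps to the same key, so a repeat
# load() can be served from disk without hitting the provider.
```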


Custom Handlers

A handler is an async function that intercepts a topic and returns a pl.DataFrame (or None to pass through to the next handler). This makes it straightforward to pull data from any source — local files, third-party APIs, databases, etc.

Handler signature

from datetime import datetime
import polars as pl

async def my_handler(
    topic: str,
    start_time: datetime,
    end_time: datetime,
) -> pl.DataFrame | None:
    ...

The returned DataFrame must contain a start_time column with dtype pl.Datetime("ms", time_zone="UTC"). Return None if the handler does not recognise the topic so ADRS falls through to the next handler.

Example — loading from a local file

async def local_handler(topic: str, start_time: datetime, end_time: datetime):
    if topic != "custom|btc-1h":
        return None  # not our topic — pass through

    df = pl.read_parquet("data/btc_1h.parquet")
    # Trim to the requested window so only the relevant rows are returned.
    return df.filter(
        pl.col("start_time").is_between(start_time, end_time, closed="left")
    )

dataloader = DataLoader(
    data_dir="outdir",
    credentials=json.load(open("credentials.json")),
    handlers=[local_handler],
)

df = await dataloader.load(
    topic="custom|btc-1h",
    start_time=start_time,
    end_time=end_time,
)

Built-in handler — yfinance

ADRS ships with a ready-made handler for Yahoo Finance data:

from adrs.data.handler import yfinance_handler

dataloader = DataLoader(
    data_dir="outdir",
    credentials=json.load(open("credentials.json")),
    handlers=[yfinance_handler],
)

df = await dataloader.load(
    topic="yfinance|candle?ticker=SPY&interval=1d",
    start_time=datetime.fromisoformat("2020-01-01T00:00:00Z"),
    end_time=datetime.fromisoformat("2025-01-01T00:00:00Z"),
)

Supported interval values match those accepted by yfinance: 1m, 5m, 15m, 30m, 1h, 1d, 1wk, 1mo.
