Downloading Data
A step-by-step guide to fetching, caching, and storing market data for your own use.
This guide walks through every step of getting data out of ADRS and into a format you can work with independently — whether that is a local Parquet file, a database, or a custom in-house format.
Set up a DataLoader
DataLoader is the entry point for all data fetching. It needs a directory in which to cache downloads and, for Datasource data, your API credentials.
```python
import json
import asyncio

from adrs import DataLoader

dataloader = DataLoader(
    data_dir="data/raw",  # cache goes here
    credentials=json.load(open("credentials.json")),  # {"cybotrade_api_key": "..."}
)
```

The cache means every topic/range pair is only downloaded once. Subsequent calls return the cached result instantly, so it is safe to re-run scripts without hammering the API.
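The exact on-disk layout is internal to ADRS, but the idea behind the cache is simple: every (topic, time range) combination maps to one deterministic file, so identical requests resolve to the same path and skip the network. A hypothetical sketch (the function name and hashing scheme are illustrative, not ADRS source):

```python
import hashlib
from pathlib import Path

def cache_path(data_dir: str, topic: str, start: str, end: str) -> Path:
    """Illustrative only: derive one deterministic file per (topic, range)."""
    key = hashlib.sha256(f"{topic}|{start}|{end}".encode()).hexdigest()[:16]
    return Path(data_dir) / f"{key}.parquet"

p1 = cache_path("data/raw", "binance-spot|candle?symbol=BTCUSDT&interval=1h",
                "2023-01-01", "2025-01-01")
p2 = cache_path("data/raw", "binance-spot|candle?symbol=BTCUSDT&interval=1h",
                "2023-01-01", "2025-01-01")
assert p1 == p2  # same inputs -> same cache file, so the download is reused
```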
Choose your data topics
A topic is a string that identifies both the exchange/feed and the query parameters:
```
binance-spot|candle?symbol=BTCUSDT&interval=1h
bybit-linear|candle?symbol=BTCUSDT&interval=1m
coinbase|candle?symbol=BTCUSD&interval=15m
yfinance|candle?ticker=SPY&interval=1d
```

Pick the exchanges and intervals you need. In this guide we will download BTC 1-hour candles from Binance spot and Bybit linear futures.
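The format decomposes mechanically: everything before the `|` names the provider, the segment up to `?` names the feed, and the rest is an ordinary query string. This helper is not part of ADRS, just a sketch of how a topic string breaks apart:

```python
from urllib.parse import parse_qs

def parse_topic(topic: str) -> tuple[str, str, dict]:
    """Split 'provider|feed?query' into its three parts (illustrative only)."""
    provider, rest = topic.split("|", 1)
    feed, _, query = rest.partition("?")
    params = {k: v[0] for k, v in parse_qs(query).items()}
    return provider, feed, params

print(parse_topic("binance-spot|candle?symbol=BTCUSDT&interval=1h"))
# → ('binance-spot', 'candle', {'symbol': 'BTCUSDT', 'interval': '1h'})
```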
Download and inspect
```python
from datetime import datetime

async def main():
    start_time = datetime.fromisoformat("2023-01-01T00:00:00Z")
    end_time = datetime.fromisoformat("2025-01-01T00:00:00Z")

    df_binance = await dataloader.load(
        topic="binance-spot|candle?symbol=BTCUSDT&interval=1h",
        start_time=start_time,
        end_time=end_time,
    )
    print(df_binance.head())
    # ┌─────────────────────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
    # │ start_time              ┆ open     ┆ high     ┆ low      ┆ close    ┆ volume     │
    # │ ---                     ┆ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---        │
    # │ datetime[ms, UTC]       ┆ f64      ┆ f64      ┆ f64      ┆ f64      ┆ f64        │
    # ╞═════════════════════════╪══════════╪══════════╪══════════╪══════════╪════════════╡
    # │ 2023-01-01 00:00:00 UTC ┆ 16541.77 ┆ 16611.13 ┆ 16499.01 ┆ 16537.84 ┆ 1043.21423 │

asyncio.run(main())
```

All DataFrames returned by DataLoader use Polars and always include a start_time column typed as Datetime[ms, UTC] alongside standard OHLCV columns.
Save to disk
Once you have the DataFrame, use Polars' built-in writers to persist it in whatever format suits you.
```python
# Parquet — recommended for large datasets, fast to read back
df_binance.write_parquet("data/btc_binance_1h.parquet")

# CSV — useful for sharing or inspection in spreadsheets
df_binance.write_csv("data/btc_binance_1h.csv")
```

Read it back later without touching the network:
```python
import polars as pl

df = pl.read_parquet("data/btc_binance_1h.parquet")
```

Download multiple symbols at once
Use asyncio.gather to fetch several topics in parallel:

```python
import asyncio
from datetime import datetime

async def main():
    start_time = datetime.fromisoformat("2023-01-01T00:00:00Z")
    end_time = datetime.fromisoformat("2025-01-01T00:00:00Z")

    topics = [
        "binance-spot|candle?symbol=BTCUSDT&interval=1h",
        "binance-spot|candle?symbol=ETHUSDT&interval=1h",
        "bybit-linear|candle?symbol=BTCUSDT&interval=1m",
    ]
    results = await asyncio.gather(*[
        dataloader.load(topic=t, start_time=start_time, end_time=end_time)
        for t in topics
    ])

    for topic, df in zip(topics, results):
        # Prefix the exchange so BTCUSDT from Binance and Bybit
        # do not overwrite each other's file
        exchange = topic.split("|")[0]
        symbol = topic.split("symbol=")[1].split("&")[0]
        df.write_parquet(f"data/{exchange}_{symbol}.parquet")
        print(f"✓ {topic} → {df.shape[0]:,} rows")

asyncio.run(main())
```

Using a custom handler
If your data does not come from Datasource — say it lives in an internal database, a vendor CSV, or another REST API — register a custom handler and DataLoader will call it automatically when it sees your topic:

```python
import json
from datetime import datetime

import polars as pl

from adrs import DataLoader

async def my_db_handler(topic: str, start_time: datetime, end_time: datetime):
    # Only handle topics we own; return None to fall through to the next handler
    if not topic.startswith("mydb|"):
        return None
    feed = topic.split("|")[1]  # e.g. "funding-rates?symbol=BTC"
    # ... fetch from your database ...
    return pl.DataFrame({"start_time": [...], "value": [...]})

dataloader = DataLoader(
    data_dir="data/raw",
    credentials=json.load(open("credentials.json")),
    handlers=[my_db_handler],
)

# Inside an async context (e.g. your main() coroutine), with start_time
# and end_time defined as before:
df = await dataloader.load(
    topic="mydb|funding-rates?symbol=BTC",
    start_time=start_time,
    end_time=end_time,
)
df.write_parquet("data/btc_funding.parquet")
```

Using Yahoo Finance data
ADRS ships with a ready-made handler for Yahoo Finance, which is useful for equities, ETFs, and macro indicators:

```python
from adrs.data.handler import yfinance_handler

dataloader = DataLoader(
    data_dir="data/raw",
    credentials=json.load(open("credentials.json")),
    handlers=[yfinance_handler],
)

# Download daily S&P 500 ETF data
df_spy = await dataloader.load(
    topic="yfinance|candle?ticker=SPY&interval=1d",
    start_time=datetime.fromisoformat("2020-01-01T00:00:00Z"),
    end_time=datetime.fromisoformat("2025-01-01T00:00:00Z"),
)
df_spy.write_parquet("data/spy_1d.parquet")
```

Cache layout
After running your downloads, the data/raw directory will contain cached files managed by ADRS. Treat them as internal to the library; for your own pipelines, read the Parquet or CSV files you exported above instead.
Re-downloading
Pass override_existing=True to dataloader.load() to force a fresh download even if a cached file already
exists for that topic and time range.
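The effect of override_existing can be pictured with a small stand-alone sketch (a hypothetical helper, not ADRS source): the loader only performs a network fetch when the cache file is missing or a refresh is forced.

```python
import tempfile
from pathlib import Path

def load_with_cache(path: Path, fetch, override_existing: bool = False) -> bytes:
    """Illustrative cache-or-fetch decision, mirroring override_existing."""
    if path.exists() and not override_existing:
        return path.read_bytes()  # cache hit: no network call
    data = fetch()                # cache miss, or refresh forced by the caller
    path.write_bytes(data)
    return data

calls = []
fetch = lambda: calls.append(1) or b"candles"

p = Path(tempfile.mkdtemp()) / "topic.bin"
load_with_cache(p, fetch)                          # first call downloads
load_with_cache(p, fetch)                          # served from cache
load_with_cache(p, fetch, override_existing=True)  # forced re-download
assert len(calls) == 2  # only two real fetches happened
```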