Systems & projects

Projects

The systems we build to test our ideas — each links out to its code, paper, and data.

Model Lake

Amalur

Amalur explores the convergence of data integration and machine learning: it automates the integration of training data scattered across silos so that downstream models can use it. It is the foundation of the group’s Model Lake vision, where heterogeneous data and rich model zoos meet in one place.

IEEE TKDE 2024 · Data integration · Machine learning

Synthetic data

SiloFuse

SiloFuse generates cross-silo synthetic tabular data using latent diffusion models, so organisations can share realistic data without ever exposing raw, feature-partitioned records.

ICDE 2024 · Diffusion · Privacy

Time series

WaveStitch

WaveStitch performs flexible and fast conditional time-series generation with diffusion models, stitching together realistic signals under user-specified constraints.

SIGMOD 2025 · Diffusion · Generative

LLM serving

TranSQL / Database-as-Runtime

TranSQL serves large language models with relational queries — compiling model inference to SQL so that LLMs can run inside a database engine, even on low-resource hardware.

SIGMOD 2025 Best demo runner-up · LLM serving · SQL
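
To give a feel for what "compiling model inference to SQL" can mean, here is a minimal sketch of a single matrix-vector product, the core operation of a linear layer, written as a relational join plus aggregation. It is an illustration only, not TranSQL's actual compilation scheme: the table names `w` and `x`, the column layout, and the helper `matvec_via_sql` are all hypothetical, and SQLite stands in for whatever database engine is used in practice.

```python
import sqlite3


def matvec_via_sql(weights, x):
    """Compute y = W @ x with a join and GROUP BY (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Store the weight matrix as (row, column, value) triples.
    conn.execute("CREATE TABLE w (row_id INTEGER, col_id INTEGER, val REAL)")
    # Store the input vector as (index, value) pairs.
    conn.execute("CREATE TABLE x (col_id INTEGER, val REAL)")
    conn.executemany(
        "INSERT INTO w VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(weights) for j, v in enumerate(row)],
    )
    conn.executemany("INSERT INTO x VALUES (?, ?)", list(enumerate(x)))

    # Join on the shared dimension, multiply, and sum per output row.
    rows = conn.execute(
        """
        SELECT w.row_id, SUM(w.val * x.val)
        FROM w JOIN x ON w.col_id = x.col_id
        GROUP BY w.row_id
        ORDER BY w.row_id
        """
    ).fetchall()
    conn.close()
    return [y for _, y in rows]


result = matvec_via_sql([[1.0, 2.0], [3.0, 4.0]], [10.0, 20.0])
print(result)
```

The design point is that the database's own join and aggregation operators do the arithmetic, so no separate tensor runtime is involved in this step.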