What we work on

Research

Our work spans three connected frontiers, all asking how data should be managed when AI is the primary consumer.

AI in Data & Model Lakes

We bring data integration and machine learning together, so that heterogeneous data and rich model zoos meet in one lake and serving a model becomes a query.

Projects

Model Lake

Amalur

Amalur explores the convergence of data integration and machine learning, automating how training data scattered across silos is integrated for downstream models. It is the foundation of the group’s Model Lake vision, where heterogeneous data and rich model zoos meet in one place.

IEEE TKDE 2024 · Data integration · Machine learning

LLM serving

TranSQL / Database-as-Runtime

TranSQL serves large language models with relational queries: it compiles model inference to SQL so that LLMs can run inside a database engine, even on low-resource hardware.

SIGMOD 2025 Best demo runner-up · LLM serving · SQL
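
To give a feel for what "inference as SQL" can mean, here is a minimal sketch of one building block, a matrix-vector product, written as a join plus an aggregation. It is an illustration under our own assumptions, not TranSQL's actual schema or compiler output: the table names `w` and `x`, the coordinate-style layout, and the helper `matvec_sql` are all hypothetical, and SQLite stands in for the database engine.

```python
import sqlite3


def matvec_sql(weights, vector):
    """Compute weights @ vector using a relational join and GROUP BY.

    Hypothetical layout (an assumption for this sketch):
      w(r, c, val) holds one row per matrix entry,
      x(c, val)    holds one row per vector entry.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE w (r INTEGER, c INTEGER, val REAL)")
    conn.execute("CREATE TABLE x (c INTEGER, val REAL)")

    # Store the matrix and the vector in coordinate form.
    conn.executemany(
        "INSERT INTO w VALUES (?, ?, ?)",
        [(r, c, v) for r, row in enumerate(weights) for c, v in enumerate(row)],
    )
    conn.executemany("INSERT INTO x VALUES (?, ?)", list(enumerate(vector)))

    # Join on the shared column index, then sum the products per output row.
    rows = conn.execute(
        "SELECT w.r, SUM(w.val * x.val) "
        "FROM w JOIN x ON w.c = x.c "
        "GROUP BY w.r ORDER BY w.r"
    ).fetchall()
    conn.close()
    return [y for _, y in rows]


result = matvec_sql([[1.0, 2.0], [3.0, 4.0]], [10.0, 1.0])
print(result)
```

A full model would chain many such queries (projections, attention, normalisation); this sketch covers only the linear-algebra core and says nothing about how a real system lays out weights or plans the queries.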

Selected publications

Federated & Private Learning

We train models and generate synthetic data across organisational silos, extracting value from data without ever moving the raw records.

Projects

Synthetic data

SiloFuse

SiloFuse generates cross-silo synthetic tabular data using latent diffusion models, so organisations can share realistic data without ever exposing raw, feature-partitioned records.

ICDE 2024 · Diffusion · Privacy

Time series

WaveStitch

WaveStitch performs flexible and fast conditional time-series generation with diffusion models, stitching together realistic signals under user-specified constraints.

SIGMOD 2025 · Diffusion · Generative

Selected publications

Quantum Data Management

We rethink classic data-management problems, such as query optimisation, entity matching, and anomaly detection, for NISQ-era (noisy intermediate-scale quantum) processors.

Selected publications