title: Bytewise Online Autoregressive (BOA) Constrictor for HEP data compression
layout: gsoc_proposal
project: BOA
year: 2026
organization: University of Manchester
difficulty: medium
duration: 350
mentor_avail: June-October
project_mentors:

Description

As global demand for data storage and sharing continues to grow, managing the resulting data volumes has become increasingly challenging. This issue is particularly acute in high-energy physics (HEP), where vast and complex datasets routinely push the limits of existing compression and storage technologies: each year, experiments at the Large Hadron Collider (LHC) at CERN produce approximately thirty petabytes of data.

These challenges are currently addressed by the ROOT framework combined with general-purpose algorithms such as the Lempel–Ziv–Markov chain algorithm (LZMA) and ZLIB. The Bytewise Online Autoregressive (BOA) Constrictor is a streaming lossless compressor built on the Mamba architecture and coupled to a parallelised range coder. It aims to improve storage efficiency by achieving higher lossless compression ratios. At present, this improved compression comes at the cost of lower throughput on current hardware, which highlights the deployment trade-offs of neural compressors in HEP.
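To make the "online autoregressive" idea concrete, the sketch below shows how a bytewise predictor determines the compressed size: an entropy coder such as a range coder spends about $-\log_2 p(x_t \mid x_{<t})$ bits on each byte, so better predictions mean fewer bits. As an assumption for illustration, the Mamba backbone is replaced by a simple adaptive order-1 byte-count model. This is not BOA's model, and the function name `adaptive_order1_bits` is hypothetical. Only the ideal code length is computed; no actual range coder is implemented.

```python
import lzma
import math
import zlib


def adaptive_order1_bits(data: bytes) -> float:
    """Ideal code length in bits of `data` under an online order-1 byte model.

    Hypothetical stand-in for a neural predictor (NOT BOA's Mamba model):
    the next-byte distribution is conditioned on the previous byte only and
    estimated from running counts with a Laplace (add-one) prior. Counts are
    updated only after each byte is coded, so a decoder could rebuild the
    same predictions on the fly.
    """
    counts = [[1] * 256 for _ in range(256)]  # counts[prev][next]
    totals = [256] * 256                      # row sums of counts
    prev = 0                                  # assumed initial context
    bits = 0.0
    for b in data:
        p = counts[prev][b] / totals[prev]    # model's probability of the true byte
        bits -= math.log2(p)                  # ideal cost of coding that byte
        counts[prev][b] += 1                  # online update after coding
        totals[prev] += 1
        prev = b                              # advance the context
    return bits


if __name__ == "__main__":
    data = bytes(range(16)) * 1000            # toy, highly structured input
    model_bytes = adaptive_order1_bits(data) / 8
    print(f"raw:           {len(data)} bytes")
    print(f"order-1 model: {model_bytes:.0f} bytes (ideal, before coder overhead)")
    print(f"zlib -9:       {len(zlib.compress(data, 9))} bytes")
    print(f"lzma:          {len(lzma.compress(data))} bytes")
```

On a periodic toy input like this one, dictionary coders such as zlib or LZMA may well come out ahead. The example only illustrates the mechanics: the compressor's quality is set by the predictor, and a stronger sequence model lowers the bit count at the cost of more computation per byte.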

In this project, we aim to support the ongoing development of BOA in three ways: by expanding the existing end-to-end experiment scripts into a comprehensive benchmark suite, by investigating small models and physics-informed priors, and by benchmarking alternative backbones to quantify when and why Mamba is competitive for neural compression.
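Because BOA's value is framed as a trade-off between compression ratio and throughput, a benchmark suite would plausibly report both for every codec. The sketch below is a minimal, hypothetical harness built only on Python standard-library baselines (`zlib`, `lzma`, `bz2`). The codec registry, the function names, and the synthetic float32 data are all assumptions for illustration. It does not call ROOT or BOA, and it does not use real HEP data.

```python
import bz2
import lzma
import random
import struct
import time
import zlib
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical registry of baseline codecs. A neural compressor would be
# plugged in as one more (compress, decompress) pair.
CODECS: Dict[str, Codec] = {
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def benchmark(data: bytes, codec: Codec) -> Dict[str, float]:
    """Measure compression ratio and throughput for one codec; verify losslessness."""
    compress, decompress = codec
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise ValueError("round trip is not lossless")
    mb = len(data) / 1e6
    return {
        "ratio": len(data) / len(packed),
        "compress_MB_s": mb / max(t1 - t0, 1e-9),    # guard against zero timings
        "decompress_MB_s": mb / max(t2 - t1, 1e-9),
    }


def synthetic_floats(n: int, seed: int = 0) -> bytes:
    """Hypothetical toy stand-in for a detector branch: n Gaussian float32 values."""
    rng = random.Random(seed)
    return struct.pack(f"<{n}f", *(rng.gauss(50.0, 10.0) for _ in range(n)))


if __name__ == "__main__":
    data = synthetic_floats(250_000)  # 1,000,000 bytes
    for name, codec in CODECS.items():
        r = benchmark(data, codec)
        print(f"{name:7s} ratio={r['ratio']:.3f} "
              f"comp={r['compress_MB_s']:.1f} MB/s "
              f"decomp={r['decompress_MB_s']:.1f} MB/s")
```

Reporting ratio and throughput side by side keeps the deployment trade-off visible. A real suite would likely run on actual ROOT branches, repeat timings to reduce noise, and record hardware details alongside each result.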

Task ideas

Expected results and milestones

Requirements

AI Policy

AI assistance is allowed for this contribution, but it is not welcome in the candidate selection exercise or when writing the initial proposal. The applicant takes full responsibility for all code and results and must disclose AI use for non-routine tasks (algorithm design, architecture, complex problem-solving). Routine tasks (grammar, formatting, style) do not require disclosure.

How to apply

Please email the mentors with a brief description of your background and your interest in green computing and sustainable research. Include “gsoc26” in the subject line. Mentors will send an evaluation task after receiving your email.

Resources