Scaling PDLP to Distributed Multi-GPU Systems
Solve massive-scale linear programming problems with billions of variables across multiple GPUs. Achieve near-linear speedup with full FP64 numerical accuracy.
We present a distributed framework based on the Primal-Dual Hybrid Gradient (PDHG) algorithm for solving massive-scale linear programming (LP) problems. Although PDHG-based solvers demonstrate strong performance on single-node GPU architectures, their applicability to industrial-scale instances is often limited by single-GPU computational throughput.
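For context, the computational core that D-PDLP distributes is the PDHG update, which is dominated by matrix-vector products with the constraint matrix and its transpose. The sketch below is our own dense, single-device illustration of that update for an LP written as min cᵀx subject to Ax = b, x ≥ 0; the function name, row-major storage, and fixed step sizes `tau`/`sigma` are assumptions rather than the solver's actual code, and PDLP-style solvers additionally use sparse matrices, adaptive step sizes, and restarts.

```c
/* One PDHG iteration for  min c^T x  s.t.  A x = b, x >= 0  (a hedged,
 * dense illustration; the real solver uses sparse data, adaptive step
 * sizes, and restarts). Plain PDHG converges when tau*sigma*||A||^2 <= 1. */
#include <stddef.h>

static void pdhg_step(size_t m, size_t n, const double *A, /* m x n, row-major */
                      const double *b, const double *c,
                      double *x, double *y,   /* current iterates, updated in place */
                      double *x_new,          /* scratch buffer of length n */
                      double tau, double sigma) {
    /* Primal step: x+ = max(0, x - tau * (c - A^T y)). */
    for (size_t j = 0; j < n; ++j) {
        double aty = 0.0;
        for (size_t i = 0; i < m; ++i) aty += A[i * n + j] * y[i];
        double v = x[j] - tau * (c[j] - aty);
        x_new[j] = v > 0.0 ? v : 0.0;
    }
    /* Dual step with extrapolation: y+ = y + sigma * (b - A(2x+ - x)). */
    for (size_t i = 0; i < m; ++i) {
        double ax = 0.0;
        for (size_t j = 0; j < n; ++j) ax += A[i * n + j] * (2.0 * x_new[j] - x[j]);
        y[i] += sigma * (b[i] - ax);
    }
    for (size_t j = 0; j < n; ++j) x[j] = x_new[j];
}
```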
To overcome this limitation, we propose D-PDLP, the first distributed PDLP framework, which extends PDHG to a multi-GPU setting via a practical two-dimensional grid partitioning of the constraint matrix. We introduce a block-wise random permutation strategy combined with nonzero-aware matrix partitioning to improve load balance and computational efficiency.
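To make the nonzero-aware partitioning concrete, here is a minimal sketch (our illustration, with an assumed function name and a 1D row-only view) that chooses block boundaries by balancing nonzero counts rather than row counts; the same idea extends to both dimensions of a 2D grid.

```c
/* A hedged sketch of nonzero-aware partitioning along one dimension:
 * split m rows into P consecutive blocks so that each block holds
 * roughly total_nnz / P nonzeros. The function name and the 1D view are
 * illustrative; a 2D grid scheme can apply the same balancing to both
 * the row and column dimensions. */
#include <stddef.h>

/* row_nnz[i] = nonzeros in row i; on return, block p owns rows
 * [row_start[p], row_start[p+1]) with row_start of length P + 1. */
static void split_rows_by_nnz(size_t m, const size_t *row_nnz,
                              size_t P, size_t *row_start) {
    size_t total = 0;
    for (size_t i = 0; i < m; ++i) total += row_nnz[i];

    row_start[0] = 0;
    size_t seen = 0, p = 1;
    for (size_t i = 0; i < m && p < P; ++i) {
        seen += row_nnz[i];
        /* Place the p-th cut once the running count reaches p/P of the total. */
        if (seen * P >= total * p) row_start[p++] = i + 1;
    }
    while (p <= P) row_start[p++] = m;   /* remaining cuts and the end marker */
}
```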
Extensive experiments on standard LP benchmarks (MIPLIB and Mittelmann) as well as huge-scale real-world datasets show that our distributed implementation achieves strong scalability and high performance while preserving full FP64 numerical accuracy.
The first distributed PDHG-based solver that scales across multiple GPUs via efficient 2D grid partitioning.
Practical decomposition of the constraint matrix across GPU clusters with minimal communication overhead.
Novel random shuffling strategy that balances load while preserving local structure for efficient SpMV.
Both Julia (prototyping) and C/CUDA (production) implementations are available; the C/CUDA version is open source.
Achieve up to 6× speedup on 8 GPUs with strong scalability on computationally intensive problems.
Maintains full double-precision numerical accuracy while delivering massive performance gains.
The constraint matrix is partitioned into blocks distributed across a 2D device mesh. Each device stores local blocks and executes partial SpMV operations before global synchronization.
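A minimal sketch of that pattern, assuming one MPI process per device and one dense tile per rank; the tile layout, function name, and CPU/MPI setting are illustrative assumptions, while the actual solver keeps sparse tiles resident on the GPUs.

```c
/* A hedged MPI/CPU sketch of the 2D-mesh SpMV pattern: rank (r, c) on a
 * Pr x Pc mesh owns one tile A_rc (dense here for brevity), multiplies it
 * by its local slice of x, and the partial products are summed across the
 * mesh row. */
#include <mpi.h>
#include <stdlib.h>

/* After the call, every rank in the mesh row holds y_local = sum_c A_rc * x_c. */
static void mesh_spmv(const double *A_rc, const double *x_local,
                      double *y_local, int m_local, int n_local,
                      MPI_Comm row_comm) {
    double *partial = malloc((size_t)m_local * sizeof *partial);
    for (int i = 0; i < m_local; ++i) {
        double s = 0.0;
        for (int j = 0; j < n_local; ++j) s += A_rc[i * n_local + j] * x_local[j];
        partial[i] = s;
    }
    /* The "global synchronization" step: one allreduce per mesh row. */
    MPI_Allreduce(partial, y_local, m_local, MPI_DOUBLE, MPI_SUM, row_comm);
    free(partial);
}

/* Building the row communicator from a flat rank on a Pr x Pc mesh:
 *   int rank;  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 *   int r = rank / Pc, c = rank % Pc;
 *   MPI_Comm row_comm;
 *   MPI_Comm_split(MPI_COMM_WORLD, r, c, &row_comm);
 * The transposed product A^T y follows the same pattern with the roles of
 * the row and column communicators exchanged. */
```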
Block-wise random permutation balances workload while preserving dense micro-structures, unlike naive approaches that cause imbalance or destroy locality.
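A minimal sketch of the idea, assuming fixed-size row blocks and a simple Fisher-Yates shuffle over block indices; the block size, RNG, and row-only view are our assumptions, not the paper's exact scheme.

```c
/* A hedged sketch of block-wise row permutation: shuffle fixed-size row
 * blocks rather than individual rows, so rows inside a block stay
 * contiguous (preserving dense micro-structure) while the block order is
 * randomized for load balance. Seed with srand() for reproducibility. */
#include <stddef.h>
#include <stdlib.h>

/* On return, perm[new_row] = old_row for m rows grouped into blocks of bs. */
static void blockwise_permutation(size_t m, size_t bs, size_t *perm) {
    size_t nb = (m + bs - 1) / bs;        /* number of row blocks */
    size_t *blocks = malloc(nb * sizeof *blocks);
    for (size_t b = 0; b < nb; ++b) blocks[b] = b;

    /* Fisher-Yates shuffle over block indices only. */
    for (size_t b = nb; b > 1; --b) {
        size_t j = (size_t)rand() % b;
        size_t t = blocks[b - 1]; blocks[b - 1] = blocks[j]; blocks[j] = t;
    }

    /* Expand the shuffled block order into a full row permutation. */
    size_t pos = 0;
    for (size_t b = 0; b < nb; ++b) {
        size_t lo = blocks[b] * bs;
        size_t hi = lo + bs < m ? lo + bs : m;   /* last block may be short */
        for (size_t r = lo; r < hi; ++r) perm[pos++] = r;
    }
    free(blocks);
}
```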
Running time comparison on massive instances
| Instance | Problem class | 4 GPUs (Julia) | 4 GPUs (C) | 8 GPUs (Julia) | 8 GPUs (C) |
|---|---|---|---|---|---|
| zib03 | Koch | 336 s | 283 s | 245 s | 205 s |
| rand-10m | PageRank | 5 s | 3 s | 4 s | 2 s |
| mcf_2500 | MCF | 935 s | 836 s | 504 s | 435 s |
| wil50 | QAP | 31 s | 29 s | 25 s | 22 s |
| tai50b | QAP | 52 s | 43 s | 43 s | 35 s |
| ds1 | Unit Commitment | 143 s | 42 s | 122 s | 31 s |
| ds2 | Unit Commitment | 691 s | 397 s | 591 s | 288 s |
| SGM10 | Average | 101.2 s | 86.1 s (1.17×) | 77.7 s | 62.2 s (1.25×) |

SGM10 denotes the shifted geometric mean of running times; the 1.17× and 1.25× figures are the speedup of the C/CUDA implementation over the Julia implementation on 4 and 8 GPUs, respectively.
Get started with D-PDLP today and experience massive speedups on your large-scale optimization problems.
@misc{li2026dpdlpscalingpdlpdistributed,
title={D-PDLP: Scaling PDLP to Distributed Multi-GPU Systems},
author={Hongpei Li and Yicheng Huang and Huikang Liu and Dongdong Ge and Yinyu Ye},
year={2026},
eprint={2601.07628},
archivePrefix={arXiv},
primaryClass={math.OC},
url={https://arxiv.org/abs/2601.07628},
}