chart-linePerformance

Throughput and latency benchmarks across routing configurations.

This page documents Fynd solver performance benchmarks across different configurations. All benchmarks use the fynd-benchmark scale subcommand, which builds a solver in-process for each worker count, runs a sustained load test, and reports throughput and latency statistics. See Benchmarking for how to run these yourself.

All results below were produced using scripts/bench-remote.sh, which provisions an EC2 instance, builds the solver from source, and runs the full scaling sweep automatically. Pool configuration files used by each benchmark are in tools/benchmark/arrow-up-right. To reproduce the most_liquid results:

WORKER_COUNTS="1,2,3,4,6,8" \
NUM_REQUESTS=10000 \
POOL_CONFIG="tools/benchmark/most_liquid_2hop.toml" \
TYCHO_URL="$TYCHO_URL" \
TYCHO_API_KEY="$TYCHO_API_KEY" \
RPC_URL="$RPC_URL" \
  bash scripts/bench-remote.sh

CPU Scaling: 2-Hop Routing

Measures how throughput scales with worker thread count for 2-hop route finding using the most_liquid algorithm.

Setup

Parameter
Value

Instance

AWS c7a.8xlarge (32 vCPU, AMD EPYC)

Algorithm

most_liquid

Max hops

2

Protocols

uniswap_v2, uniswap_v3, uniswap_v4, sushiswap_v2, pancakeswap_v2, pancakeswap_v3, ekubo_v2, fluid_v1

Requests per iteration

10,000

Concurrency

fixed:48

Warmup

30s after health check

Results

Workers
Throughput (req/s)
Median RT (ms)
P99 RT (ms)
RPS/Worker

1

397.19

120

129

397.19

2

743.16

64

69

371.58

3

1035.84

46

50

345.28

4

1444.04

33

36

361.01

6

2109.26

22

25

351.54

8

2820.08

16

18

352.51

Analysis

Throughput scales nearly linearly across all tested worker counts (~350-397 req/s per worker). The solver crosses 1000 req/s at 3 workers (1036 req/s). Latency stays tight throughout — P99 is only 18ms at 8 workers.

Recommendation. For most_liquid 2-hop routing at 1000 req/s sustained throughput, provision at least 3 CPU cores. Use 4 cores for comfortable headroom.

CPU Scaling: 3-Hop Routing

Measures how throughput scales with worker thread count for 3-hop route finding using the most_liquid algorithm.

Setup

Parameter
Value

Instance

AWS c7a.8xlarge (32 vCPU, AMD EPYC)

Algorithm

most_liquid

Max hops

3

Protocols

uniswap_v2, uniswap_v3, uniswap_v4, sushiswap_v2, pancakeswap_v2, pancakeswap_v3, ekubo_v2, fluid_v1

Requests per iteration

10,000

Concurrency

fixed:48

Warmup

30s after health check

Results

Workers
Throughput (req/s)
Median RT (ms)
P99 RT (ms)
RPS/Worker

1

22.30

2138

2531

22.30

2

39.46

1208

1465

19.73

4

75.95

627

784

18.99

8

146.52

319

440

18.32

12

177.23

260

386

14.77

16

298.22

146

243

18.64

20

352.63

117

226

17.63

24

243.00

163

320

10.13

28

384.13

92

251

13.72

32

366.06

86

272

11.44

Analysis

Throughput is non-monotonic across worker counts. The solver peaks at 384 req/s at 28 workers and does not reach 1000 req/s on this instance. P99 stays bounded (≤440ms), a significant improvement over the pre-lock-PR results where P99 spiked to 794ms at 24 workers.

Recommendation. For most_liquid 3-hop routing with this full protocol set, provision at least 28 CPU cores. The increased computational complexity of 3-hop search means throughput variability is expected.

Comparison: 2-Hop vs 3-Hop

Target RPS
2-Hop Workers
3-Hop Workers
Notes

1,000

3

3-hop peaks at ~384 req/s at 28 workers; 1000 req/s not reached

most_liquid 3-hop does not reach 1000 req/s on a 32-vCPU instance with 8 protocols. The combinatorial growth in the 3-hop search space creates a hard throughput ceiling for this algorithm.

CPU Scaling: Bellman-Ford 2-Hop

Measures how throughput scales with worker thread count for 2-hop route finding using the bellman_ford algorithm.

Setup

Parameter
Value

Instance

AWS c7a.8xlarge (32 vCPU, AMD EPYC)

Algorithm

bellman_ford

Max hops

2

Protocols

uniswap_v2, uniswap_v3, uniswap_v4, sushiswap_v2, pancakeswap_v2, pancakeswap_v3, ekubo_v2, fluid_v1

Requests per iteration

10,000

Concurrency

fixed:48

Warmup

30s after health check

To reproduce:

Results

Workers
Throughput (req/s)
Median RT (ms)
P99 RT (ms)
RPS/Worker

1

85.31

562

586

85.31

2

154.58

310

328

77.29

3

220.12

217

228

73.37

4

290.93

164

173

72.73

6

406.87

117

124

67.81

8

518.54

92

99

64.82

Analysis

Throughput scales near-linearly across all tested worker counts. Per-worker efficiency declines gradually from ~85 req/s at 1 worker to ~65 req/s at 8 workers. The solver does not cross 1000 req/s within the tested 8-worker range; linear extrapolation places that threshold at approximately 16 workers.

Recommendation. For Bellman-Ford 2-hop routing at 1000 req/s sustained throughput, provision at least 16 CPU cores.

CPU Scaling: Bellman-Ford 3-Hop

Measures how throughput scales with worker thread count for 3-hop route finding using the bellman_ford algorithm.

Setup

Parameter
Value

Instance

AWS c7a.8xlarge (32 vCPU, AMD EPYC)

Algorithm

bellman_ford

Max hops

3

Protocols

uniswap_v2, uniswap_v3, uniswap_v4, sushiswap_v2, pancakeswap_v2, pancakeswap_v3, ekubo_v2, fluid_v1

Requests per iteration

10,000

Concurrency

fixed:48

Warmup

30s after health check

To reproduce:

Results

Workers
Throughput (req/s)
Median RT (ms)
P99 RT (ms)
RPS/Worker

1

65.36

735

780

65.36

2

121.68

394

416

60.84

4

233.09

205

221

58.27

8

429.44

111

121

53.68

12

584.86

81

90

48.74

16

760.92

62

71

47.56

20

874.51

54

65

43.73

24

974.94

48

57

40.62

28

1201.92

39

49

42.93

32

1219.96

38

49

38.12

Analysis

Throughput scales near-linearly up to 8 workers (~54 req/s per worker). Beyond that, per-worker efficiency gradually declines — from ~49 req/s at 12 workers to ~38 req/s at 32 workers — as the instance approaches its CPU ceiling. The solver crosses 1000 req/s at 28 workers (1202 req/s). Median latency falls from 735ms at 1 worker to 38ms at 32 workers; P99 stabilises at 49ms from 28 workers onward.

Recommendation. For Bellman-Ford 3-hop routing at 1000 req/s sustained throughput, provision at least 28 CPU cores. Use 32 cores for headroom under variable load.

Comparison: Bellman-Ford 2-Hop vs 3-Hop

Target RPS
2-Hop Workers
3-Hop Workers
Ratio

1,000

~16

28

~1.8x

Bellman-Ford 3-hop requires roughly 1.8× more CPU cores than 2-hop to reach the same throughput target. This is a much smaller penalty than seen with most_liquid (8×), reflecting Bellman-Ford's more uniform search cost growth across hop counts — it already explores the full path space at 2 hops, so adding a third hop grows the search space less dramatically relative to the base cost.

Algorithm Comparison: most_liquid vs bellman_ford

All results in this section use identical hardware, protocol set, and request load.

2-Hop

Workers
most_liquid (req/s)
bellman_ford (req/s)
Ratio

1

397.19

85.31

4.7x

2

743.16

154.58

4.8x

3

1035.84

220.12

4.7x

4

1444.04

290.93

5.0x

6

2109.26

406.87

5.2x

8

2820.08

518.54

5.4x

most_liquid is consistently ~5× faster for 2-hop routing. Its greedy liquidity-ranked search terminates early once the best path is found, while Bellman-Ford explores all paths exhaustively.

3-Hop

Workers
most_liquid (req/s)
bellman_ford (req/s)
Winner

8

146.52

429.44

bellman_ford (2.9x)

16

298.22

760.92

bellman_ford (2.6x)

20

352.63

874.51

bellman_ford (2.5x)

24

243.00

974.94

bellman_ford (4.0x)

28

384.13

1201.92

bellman_ford (3.1x)

32

366.06

1219.96

bellman_ford (3.3x)

Both algorithms use the same 8-protocol set. bellman_ford is consistently 2.5–4× faster at 3 hops. The advantage reflects the increased computational complexity of most_liquid's 3-hop search.

At 3 hops, the results reverse: bellman_ford is 2–2.5× faster than most_liquid and keeps scaling while most_liquid plateaus. The most_liquid greedy search requires pre-computed edge weights (spot price × depth) that are recomputed on every block, creating a periodic pause that limits scaling; Bellman-Ford carries no pre-computed edge state so its per-block update is a no-op.

Last updated