Improve parallelism by solving most difficult deals first by tameware · Pull Request #216 · dds-bridge/dds

tameware · 2026-06-29T02:40:21Z

$ time ./benchmark.sh --branch opus-two-percent --branch opus-improve-parallelism --repeats 5 --max-deals 10000
Building dtest from 'opus-two-percent'...
Building dtest from 'opus-improve-parallelism'...
Restoring 'benchmark'...
DDS dtest benchmark
===================
branch:      branch 'opus-two-percent' (/var/folders/12/xtx6dlwd0mdcxkspvmsszsrc0000gn/T//dds-dtest-branch.Cf0Cly)
compare:     branch 'opus-improve-parallelism' (/var/folders/12/xtx6dlwd0mdcxkspvmsszsrc0000gn/T//dds-dtest-compare.gwQNLJ)
details:     off (summary only)
run order:   interleaved branch, compare
epsilon:     0.5%
hands dir:   /Users/adamw/src/dds/hands
max_deals:   10000
files:       list10000.txt list1000.txt list100.txt list10.txt list1.txt
git branch:  benchmark
repeats:     5


Summary (avg user ms)
==============================================================================
solver file          opus-improve opus-two-per cmp/branch note
------ ------------- ------------ ------------ ---------- ---------------
solve  list10000.txt         2.67         2.63      1.01x opus-two-percent faster
solve  list1000.txt          2.12         2.09      1.01x opus-two-percent faster
solve  list100.txt           2.06         2.05      1.00x equal
solve  list10.txt            4.20         4.30      0.98x opus-improve-parallelism faster
solve  list1.txt            12.00        12.00      1.00x equal
calc   list10000.txt         9.68        10.25      0.94x opus-improve-parallelism faster
calc   list1000.txt          9.63        10.64      0.91x opus-improve-parallelism faster
calc   list100.txt           8.22         8.70      0.95x opus-improve-parallelism faster
calc   list10.txt           14.44        14.38      1.00x equal
calc   list1.txt            38.20        38.00      1.01x opus-two-percent faster
------ ------------- ------------ ------------ ---------- ---------------
TOTAL  elapsed (s)         684.60       716.16      0.96x opus-improve-parallelism faster

Completed 100 runs (100 expected).

The heuristic extraction refactor changed weight_alloc_trump_void1's first branch from `lead_suit == trump` to `suit == trump`. Since that is exhaustive with the following `else if (suit != trump)`, the three ruffing branches (using the `24 - rank + ...` formula) became dead code, and trump ruffs were scored with side-suit discard weights instead. This mis-ordered ruffs, costing alpha-beta cutoffs. The effect is small for solve but compounds heavily in calc's warm-TT iterative deepening: calc explored ~34% more nodes than v2.9. Restoring the original `lead_suit == trump` pitch branch makes the ruffing branches reachable again and cuts calc time ~25% (gap to v2.9: 1.37x -> 1.02x). Ordering-only change; double-dummy results are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

Per Copilot. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

The heuristic/quick-tricks refactor introduced static_cast<unsigned char> wrappers on values that v2.9 used as signed, changing search behavior: - make_3 / make_3_ctx: winner[]/second_best[] .hand and .rank were cast to unsigned char, turning the -1 "no card" sentinel into 255. This broke winner[trump].hand == -1 style checks in QuickTricks, losing cutoffs. - weight_alloc_trump_void2 / _void3: rel_rank[aggr[suit]][...] indexed through static_cast<unsigned char>(aggr[suit]), truncating the 13-bit aggregate holding to 8 bits and reading the wrong rel_rank row. - QuickTricksPartnerHand{Trump,NT}: bit_map_rank index cast the signed rank through unsigned char. With these reverted to v2.9's signed handling, the per-move-generation ordering trace now matches v2.9 exactly (0 divergences on list1), closing the residual calc gap to parity. Ordering/pruning-only change; double-dummy results are unchanged and all library tests pass. Co-authored-by: Cursor <cursoragent@cursor.com>

The parallel board loop handed boards out in index order via an atomic counter, so a hard board picked near the end left one worker running long while the others sat idle. Hand out the hardest boards first (longest- processing-time-first) so the tail consists of cheap boards. parallel_all_boards_n gains an optional dispatch-order permutation: workers still pull from the same atomic counter, but the slot is mapped through the order before becoming a board number, so only the dispatch sequence changes and result placement is unaffected. The solve path passes no order and is unchanged. calc estimates per-deal difficulty with a cheap, trump-independent structural proxy (deal_fanout, mirroring Scheduler::Fanout) and sorts board indices by descending difficulty before dispatch. calc list1000 -n18: ~11.0s -> ~9.6s wall (~13%), user CPU unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

CalcDDtableN builds one board per strain for a single deal. deal_fanout is trump-independent, so all boards share one fanout and the difficulty sort is a pure no-op there. Gate the sort behind a difficulty_sort flag (default on for batch CalcAllTablesN) and disable it for the single-deal path. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

This PR aims to improve overall throughput for batch calculations by reducing “tail latency” in parallel workloads: it estimates deal difficulty cheaply, sorts boards hardest-first, and adds an optional dispatch-order mechanism to the parallel board runner.

Changes:

Extend parallel_all_boards_n() to optionally dispatch boards in a caller-provided order.
Add a cheap per-deal “fanout” estimate and use it to stable-sort batch calc boards hardest-first before parallel execution.
Simplify/remove several legacy static_cast<unsigned char>(...) conversions in solver/heuristic code paths.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
library/src/system/parallel_boards.hpp	Adds optional `order` parameter to control dispatch order (hardest-first, etc.).
library/src/system/parallel_boards.cpp	Implements ordered dispatch via slot→board mapping.
library/src/calc_tables.cpp	Computes per-deal difficulty estimate and dispatches hardest boards first for batch calc.
library/src/heuristic_sorting/heuristic_sorting.cpp	Cleans up heuristic code (including rel-rank indexing casts) and adjusts some void/trump logic.
library/src/quick_tricks.cpp	Removes redundant casts when indexing with `abs_rank[..].rank`.
library/src/ab_search.cpp	Removes redundant casts when copying `abs_rank` winner/second-best into `Pos`.

+  // Map a dispatch slot to the board number to process. With an order, hand out
+  // boards in that sequence (e.g. hardest first); otherwise in index order.
+  const bool use_order =
+    (order != nullptr && static_cast<int>(order->size()) == count);
+  auto board_of = [&](const int slot) -> int {
+    return use_order ? (*order)[static_cast<unsigned>(slot)] : slot;
+  };


Only honor the optional dispatch order when it is a valid permutation of [0, count: each element in range and unique. A malformed order (duplicates or out-of-range values) now falls back to index order, preventing invalid board indices from reaching process_board. EOF ) Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

+  // Map a dispatch slot to the board number to process. With an order, hand out
+  // boards in that sequence (e.g. hardest first); otherwise in index order.
+  const bool use_order =
+    (order != nullptr && static_cast<int>(order->size()) == count);
+  auto board_of = [&](const int slot) -> int {
+    return use_order ? (*order)[static_cast<unsigned>(slot)] : slot;
+  };


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

zzcgumn · 2026-07-03T08:15:07Z

+static auto is_permutation_of_range(
+  const std::vector<int>& order,
+  const int count) -> bool
+{


We have std::set_intersection and std::set_difference available in . You can use std::ranges::views::iota to generate a range from [0..count].

Standard library functions are at least expected to be faster and reduce the amount of code we need to write.

Nice find! Fixed.

Or not…

Copilot reports that the change is less efficient:

is_permutation_of_range() currently re-sorts the order vector and builds an intermediate common vector via set_intersection. This is O(n log n) plus extra allocations and will run on every parallel_all_boards_n() invocation that supplies an order (e.g. batch calc), adding overhead in a performance-sensitive path. You can validate a permutation of [0, count) in O(n) without sorting by tracking seen indices and checking bounds/duplicates.

Thoughts, @zzcgumn?

I've had another look. I agree with Copilot - I prefer the original implementation:

static auto is_permutation_of_range( const std::vector<int>& order, const int count) -> bool { std::vector<char> seen(static_cast<unsigned>(count), 0); for (const int v : order) { if (v < 0 || v >= count || seen[static_cast<unsigned>(v)]) return false; seen[static_cast<unsigned>(v)] = 1; } return true; }

seen could equally well be std::vector<unsigned char>

Use std::ranges::set_intersection and std::views::iota instead of a hand-rolled seen bitmap in is_permutation_of_range. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

+static auto is_permutation_of_range(
+  const std::vector<int>& order,
+  const int count) -> bool
+{
+  std::vector<int> sorted(order);
+  std::ranges::sort(sorted);
+
+  const auto expected = std::views::iota(0, count);
+
+  std::vector<int> common;
+  common.reserve(static_cast<unsigned>(count));
+  std::ranges::set_intersection(
+    sorted, expected, std::back_inserter(common));
+
+  return static_cast<int>(common.size()) == count;
+}


Emscripten's libc++ lacks C++20 ranges; use iterator-based sort, iota, and set_intersection instead. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

+ * @param order Optional dispatch order: a permutation of [0, count) giving the
+ *        sequence in which board numbers are handed out (e.g. hardest first to
+ *        shorten the tail). When null/empty, boards are dispatched in index
+ *        order. Only the dispatch order changes; @p process_board still receives
+ *        the real board number, so result placement is unaffected.
 * @return First non-success code from @p process_board, or RETURN_NO_FAULT.


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Scheduler: :Fanout is private; move the fanout logic to a shared deal_fanout() helper so calc_tables can use it without accessing the scheduler instance. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

+static auto is_permutation_of_range(
+  const std::vector<int>& order,
+  const int count) -> bool
+{
+  std::vector<int> sorted(order);
+  std::sort(sorted.begin(), sorted.end());
+
+  std::vector<int> expected(static_cast<unsigned>(count));
+  std::iota(expected.begin(), expected.end(), 0);
+
+  std::vector<int> common;
+  common.reserve(static_cast<unsigned>(count));
+  std::set_intersection(
+    sorted.begin(), sorted.end(),
+    expected.begin(), expected.end(),
+    std::back_inserter(common));
+
+  return static_cast<int>(common.size()) == count;
+}


tameware and others added 5 commits June 27, 2026 18:51

Fix incorrect comment

8ecebc8

Per Copilot. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

tameware requested a review from Copilot June 29, 2026 02:40

Copilot started reviewing on behalf of tameware June 29, 2026 02:40 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

tameware self-assigned this Jun 29, 2026

tameware requested a review from Copilot June 29, 2026 03:30

Copilot started reviewing on behalf of tameware June 29, 2026 03:30 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

tameware requested a review from Copilot June 29, 2026 04:02

Copilot started reviewing on behalf of tameware June 29, 2026 04:02 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

tameware requested a review from zzcgumn June 29, 2026 12:56

tameware marked this pull request as ready for review June 29, 2026 12:56

tameware mentioned this pull request Jul 2, 2026

Merge DDSS into DDS V3 #139

Open

BSalita mentioned this pull request Jul 2, 2026

solution: add missing parallel_boards.cpp/.hpp to VS projects #222

Merged

zzcgumn reviewed Jul 3, 2026

View reviewed changes

Replace manual permutation check with standard algorithms.

f733867

Use std::ranges::set_intersection and std::views::iota instead of a hand-rolled seen bitmap in is_permutation_of_range. Co-authored-by: Cursor <cursoragent@cursor.com>

tameware requested a review from Copilot July 3, 2026 12:58

Copilot started reviewing on behalf of tameware July 3, 2026 12:58 View session

Copilot AI reviewed Jul 3, 2026

View reviewed changes

Fix WASM build: avoid std::ranges in permutation check.

fa2c84c

Emscripten's libc++ lacks C++20 ranges; use iterator-based sort, iota, and set_intersection instead. Co-authored-by: Cursor <cursoragent@cursor.com>

tameware marked this pull request as draft July 3, 2026 13:54

tameware requested a review from Copilot July 3, 2026 13:54

Copilot started reviewing on behalf of tameware July 3, 2026 13:55 View session

Copilot AI reviewed Jul 3, 2026

View reviewed changes

tameware and others added 2 commits July 3, 2026 15:04

Avoid reimplementing deal_fanout()

07a33cd

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Extract deal_fanout as a public free function.

5e34528

Scheduler: :Fanout is private; move the fanout logic to a shared deal_fanout() helper so calc_tables can use it without accessing the scheduler instance. Co-authored-by: Cursor <cursoragent@cursor.com>

tameware requested a review from Copilot July 3, 2026 14:13

Copilot started reviewing on behalf of tameware July 3, 2026 14:13 View session

Copilot AI reviewed Jul 3, 2026

View reviewed changes

Copilot AI mentioned this pull request Jul 3, 2026

Restore scheduler declaration in calc_tables #225

Closed

Fix --define=scheduler=true builds.

6ead0ed

tameware marked this pull request as ready for review July 3, 2026 14:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve parallelism by solving most difficult deals first#216

Improve parallelism by solving most difficult deals first#216
tameware wants to merge 11 commits into
dds-bridge:developfrom
tameware:opus-improve-parallelism

tameware commented Jun 29, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

zzcgumn Jul 3, 2026

Uh oh!

tameware Jul 3, 2026

Uh oh!

tameware Jul 3, 2026

Uh oh!

tameware Jul 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

tameware commented Jun 29, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

zzcgumn Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

tameware Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

tameware Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

tameware Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants