Commit cdf70ba

Refactor candidates interface (#840)
Preparation to fix #795, #796 and #798.

### Background

As part of the broader effort toward lazy, on-demand evaluation of candidate sets (see linked issues), this PR removes the dependency on the eagerly pre-computed, fully materialized `exp_rep` and `comp_rep` public attributes of `SubspaceDiscrete`. The old design forced the full candidate space to be computed and cached upfront, even when only a subset is needed, blocking future subsampling policies and backend-agnostic mechanisms required for handling large spaces.

The key step is giving `get_candidates` a clean, single-purpose method signature as the sole access point for the experimental representation, so that transformation into the computational representation happens only when explicitly requested and only on the relevant data (the subset selection will be enabled later).

### Out of scope

The return type of `get_candidates` is expected to be elevated to a higher-level object in a follow-up (e.g. `TableCandidates` or a similar abstraction). This PR deliberately keeps the return type as `pd.DataFrame` to stay focused on the interface decoupling.

### Changes

**Make `exp_rep` private** (`baybe/searchspace/discrete.py`)
Renames the field to `_exp_rep` (with `alias="exp_rep"` for serialization compatibility), updates the validator reference, and replaces all internal accesses.

**Simplify `get_candidates` signature** (`baybe/searchspace/discrete.py`)
Returns only the experimental representation (`pd.DataFrame`) instead of a `tuple[pd.DataFrame, pd.DataFrame]`, avoiding wasteful upfront computation of the computational representation.

**Update all internal call sites** (`baybe/campaign.py`, `baybe/recommenders/`, `baybe/simulation/`, `baybe/searchspace/core.py`, `baybe/acquisition/`)
Replaces tuple-unpacking calls to `get_candidates` and direct `exp_rep`/`comp_rep` accesses with the new API throughout. Computational representation is now computed on-demand, at the point of use.

**Update examples and tests** (`examples/`, `tests/`)
Adapts all affected examples and tests to the new `get_candidates` return type.

**Deprecate `exp_rep` and `comp_rep` properties** (`baybe/searchspace/discrete.py`)
Adds `DeprecationWarning` shims for `exp_rep` (pointing to `get_candidates()`) and `comp_rep` (pointing to `transform(get_candidates())`), with corresponding deprecation tests.
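The deprecation shims described in the commit message can be pictured with a minimal, self-contained sketch. It is not BayBE's implementation: `ToySubspace`, its list-of-dicts storage, and the integer label encoding in `transform` are hypothetical stand-ins chosen so the example runs with the standard library only. What it illustrates is the pattern of keeping the old attribute names alive as properties that warn and delegate to the new access path.

```python
import warnings


class ToySubspace:
    """Hypothetical stand-in for a discrete subspace (not BayBE's class)."""

    def __init__(self, rows):
        # Private storage of the experimental representation
        self._exp_rep = list(rows)

    def get_candidates(self):
        """Return the experimental representation only."""
        return list(self._exp_rep)

    def transform(self, rows):
        """Toy computational representation: encode labels as integers."""
        labels = sorted({r["solvent"] for r in self._exp_rep})
        return [
            {"solvent": labels.index(r["solvent"]), "temp": r["temp"]}
            for r in rows
        ]

    @property
    def exp_rep(self):
        # Deprecated access path: warn, then delegate to the new method
        warnings.warn(
            "Use get_candidates() instead.", DeprecationWarning, stacklevel=2
        )
        return self.get_candidates()

    @property
    def comp_rep(self):
        # Deprecated access path: warn, then transform on demand
        warnings.warn(
            "Use transform(get_candidates()) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.transform(self.get_candidates())


space = ToySubspace(
    [{"solvent": "water", "temp": 20}, {"solvent": "dmso", "temp": 40}]
)

# Legacy access still works but emits a DeprecationWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = space.comp_rep

# Preferred access: fetch candidates, transform only when needed
current = space.transform(space.get_candidates())
```

Both paths yield the same data here; the difference is that the preferred path makes the transformation an explicit step at the call site, which is what allows it to be applied to a subset later.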
2 parents 7ad1cec + fdbe7ba commit cdf70ba

28 files changed

Lines changed: 250 additions & 270 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `candidates_exp` argument removed from `SubspaceDiscrete.subset_masks`,
   `SubspaceDiscrete.sample_subset_masks`, `SearchSpace.subsets`, and
   `SearchSpace.sample_subsets`
+- `SubspaceDiscrete.get_candidates` now returns only the experimental representation
+  instead of a tuple of experimental and computational representations

 ### Added
 - `narwhals` as a hard dependency
@@ -50,6 +52,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   access batch-level constraints; filtering constraints are only needed during subspace
   construction and are thus no longer stored).
 - `SubspaceDiscrete.constraints_batch` property (use `batch_constraints` instead)
+- `SubspaceDiscrete.exp_rep` attribute (use `get_candidates()` instead)
+- `SubspaceDiscrete.comp_rep` attribute (use `transform(get_candidates())` instead)

 ## [0.15.0] - 2026-06-11
 ### Breaking Changes

baybe/acquisition/acqfs.py

Lines changed: 3 additions & 1 deletion
@@ -104,7 +104,9 @@ def get_integration_points(self, searchspace: SearchSpace) -> pd.DataFrame:

         # Discrete part
         if not searchspace.discrete.is_empty:
-            candidates_discrete = searchspace.discrete.comp_rep
+            candidates_discrete = searchspace.discrete.transform(
+                searchspace.discrete.get_candidates()
+            )
             n_candidates = self.sampling_n_points or math.ceil(
                 self.sampling_fraction * len(candidates_discrete)  # type: ignore[operator]
             )

baybe/campaign.py

Lines changed: 19 additions & 48 deletions
@@ -206,39 +206,7 @@ def _default_recommended_experiments(self) -> pd.DataFrame:

     @override
     def __str__(self) -> str:
-        recommended_count = len(self._recommended_experiments)
-        exp_rep = self.searchspace.discrete.exp_rep
-        if self._measurements.empty or exp_rep.empty:
-            measured_count = 0
-        else:
-            measured_count = len(
-                fuzzy_row_match(exp_rep, self._measurements, self.parameters)
-            )
-        excluded_count = len(self._excluded_experiments)
-        n_elements = len(exp_rep)
-        searchspace_fields = [
-            to_string(
-                "Recommended:",
-                f"{recommended_count}/{n_elements}",
-                single_line=True,
-            ),
-            to_string(
-                "Measured:",
-                f"{measured_count}/{n_elements}",
-                single_line=True,
-            ),
-            to_string(
-                "Excluded:",
-                f"{excluded_count}/{n_elements}",
-                single_line=True,
-            ),
-        ]
-        metadata_fields = [
-            to_string("Discrete Subspace Meta Data", *searchspace_fields),
-        ]
-        metadata = to_string("Meta Data", *metadata_fields)
-        fields = [metadata, self.searchspace, self.objective, self.recommender]
-
+        fields = [self.searchspace, self.objective, self.recommender]
         return to_string(self.__class__.__name__, *fields)

     @property
@@ -460,7 +428,7 @@ def toggle_discrete_candidates( # noqa: DOC501
         # * Additional shortcuts might be possible.
         self.clear_cache()

-        df = self.searchspace.discrete.exp_rep
+        df = self.searchspace.discrete.get_candidates()

         if isinstance(constraints, pd.DataFrame):
             # Determine the candidate subset to be toggled
@@ -561,12 +529,12 @@ def recommend(
         if self.searchspace.type is SearchSpaceType.DISCRETE:
             # TODO: This implementation should at some point be hidden behind an
             # appropriate public interface, like `SubspaceDiscrete.filter()`
-            exp_rep = self.searchspace.discrete.exp_rep
-            mask_todrop = pd.Series(False, index=exp_rep.index)
+            candidates = self.searchspace.discrete.get_candidates()
+            mask_todrop = pd.Series(False, index=candidates.index)
             if not self._excluded_experiments.empty:
                 mask_todrop |= (
                     pd.merge(
-                        exp_rep,
+                        candidates,
                         self._excluded_experiments,
                         indicator=True,
                         how="left",
@@ -580,7 +548,7 @@ def recommend(
             ):
                 mask_todrop |= (
                     pd.merge(
-                        exp_rep,
+                        candidates,
                         self._recommended_experiments,
                         indicator=True,
                         how="left",
@@ -593,7 +561,7 @@ def recommend(
                 and not self._measurements.empty
             ):
                 measured_idxs = fuzzy_row_match(
-                    exp_rep, self._measurements, self.parameters
+                    candidates, self._measurements, self.parameters
                 )
                 mask_todrop.loc[measured_idxs] = True
             if (
@@ -602,7 +570,7 @@ def recommend(
             ):
                 mask_todrop |= (
                     pd.merge(
-                        exp_rep,
+                        candidates,
                         pending_experiments,
                         indicator=True,
                         how="left",
@@ -613,7 +581,7 @@ def recommend(
             searchspace = evolve(
                 self.searchspace,
                 discrete=evolve(
-                    self.searchspace.discrete, exp_rep=exp_rep.loc[~mask_todrop]
+                    self.searchspace.discrete, exp_rep=candidates.loc[~mask_todrop]
                 ),
             )
         else:
@@ -1100,13 +1068,16 @@ def _structure_campaign(d: dict, cl: type) -> Campaign:

     # >>>>>>>>>> Deprecation
     # Post-structure reconstruction from legacy metadata indices
-    if legacy_recommended_idxs is not None:
-        rec_df = campaign.searchspace.discrete.exp_rep.loc[legacy_recommended_idxs]
-        campaign._recommended_experiments = rec_df.reset_index(drop=True)
-
-    if legacy_excluded_idxs is not None:
-        excl_df = campaign.searchspace.discrete.exp_rep.loc[legacy_excluded_idxs]
-        campaign._excluded_experiments = excl_df.reset_index(drop=True)
+    if legacy_recommended_idxs is not None or legacy_excluded_idxs is not None:
+        candidates = campaign.searchspace.discrete.get_candidates()
+        if legacy_recommended_idxs is not None:
+            campaign._recommended_experiments = candidates.loc[
+                legacy_recommended_idxs
+            ].reset_index(drop=True)
+        if legacy_excluded_idxs is not None:
+            campaign._excluded_experiments = candidates.loc[
+                legacy_excluded_idxs
+            ].reset_index(drop=True)

     # Fix schema of empty DataFrames from legacy serialization
     if campaign._measurements.columns.empty:

baybe/recommenders/naive.py

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ def recommend(

         # Get one random discrete point that will be attached when evaluating the
         # acquisition function in the discrete space.
-        disc_part = searchspace.discrete.comp_rep.loc[disc_rec.index].sample(1)
+        disc_part = searchspace.discrete.transform(disc_rec).sample(1)
         disc_part_tensor = to_tensor(disc_part).unsqueeze(-2)

         # Setup a fresh acquisition function for the continuous recommender

baybe/recommenders/pure/base.py

Lines changed: 3 additions & 1 deletion
@@ -271,7 +271,9 @@ def _recommend_with_discrete_parts(

         # Check if enough candidates are left
         # TODO [15917]: This check is not perfectly correct.
-        if (not is_hybrid_space) and (len(searchspace.discrete.exp_rep) < batch_size):
+        if (not is_hybrid_space) and (
+            len(searchspace.discrete.get_candidates()) < batch_size
+        ):
             raise NotEnoughPointsLeftError(
                 f"Using the current settings, there are fewer than {batch_size} "
                 f"possible data points left to recommend."

baybe/recommenders/pure/bayesian/botorch/discrete.py

Lines changed: 5 additions & 5 deletions
@@ -43,6 +43,7 @@ def recommend_discrete_with_subsets(
     """
     import torch

+    candidates = subspace_discrete.get_candidates()
     masks: Iterable[npt.NDArray[np.bool_]]
     if subspace_discrete.n_subsets <= recommender.max_n_subsets:
         masks = subspace_discrete.subset_masks(min_candidates=batch_size)
@@ -56,9 +57,7 @@ def make_callable(
         mask: np.ndarray,
     ) -> Callable[[], tuple[pd.DataFrame, Tensor]]:
         def optimize() -> tuple[pd.DataFrame, Tensor]:
-            subset_subspace = evolve(
-                subspace_discrete, exp_rep=subspace_discrete.exp_rep.loc[mask]
-            )
+            subset_subspace = evolve(subspace_discrete, exp_rep=candidates.loc[mask])

             rec = recommend_discrete_without_subsets(
                 recommender, subset_subspace, batch_size
@@ -118,7 +117,8 @@ def recommend_discrete_without_subsets(
     from botorch.optim import optimize_acqf_discrete

     # Determine the next set of points to be tested
-    candidates_comp = subspace_discrete.comp_rep
+    candidates = subspace_discrete.get_candidates()
+    candidates_comp = subspace_discrete.transform(candidates)
     points, _ = optimize_acqf_discrete(
         recommender._botorch_acqf, batch_size, to_tensor(candidates_comp)
     )
@@ -137,4 +137,4 @@ def recommend_discrete_without_subsets(
         )["index"]
     )

-    return subspace_discrete.exp_rep.loc[idxs]
+    return candidates.loc[idxs]

baybe/recommenders/pure/bayesian/botorch/hybrid.py

Lines changed: 5 additions & 4 deletions
@@ -81,8 +81,8 @@ def recommend_hybrid_without_subsets(
     from botorch.optim import optimize_acqf_mixed

     # Transform discrete candidates
-    # (Create a shallow copy to avoid in-place modifications of the original dataframe)
-    candidates_comp = searchspace.discrete.comp_rep.copy(deep=False)
+    candidates = searchspace.discrete.get_candidates()
+    candidates_comp = searchspace.discrete.transform(candidates)

     # Calculate the number of samples from the given percentage
     n_candidates = math.ceil(
@@ -144,7 +144,7 @@ def recommend_hybrid_without_subsets(
     ).set_index("index")

     # Get experimental representation of discrete part
-    rec_disc_exp = searchspace.discrete.exp_rep.loc[merged.index]
+    rec_disc_exp = candidates.loc[merged.index]

     # Combine discrete and continuous parts
     rec_exp = pd.concat(
@@ -186,6 +186,7 @@ def recommend_hybrid_with_subsets(
     # NOTE: No min_discrete_candidates filtering in hybrid spaces because
     # optimize_acqf_mixed can produce multiple recommendations from a single
     # discrete candidate by varying continuous parameters.
+    candidates = searchspace.discrete.get_candidates()
     combined_masks: Iterable[tuple[np.ndarray, frozenset[str]]]
     if searchspace.n_subsets <= recommender.max_n_subsets:
         combined_masks = searchspace.subsets()
@@ -201,7 +202,7 @@ def optimize() -> tuple[pd.DataFrame, Tensor]:

             mod_disc = evolve(
                 searchspace.discrete,
-                exp_rep=searchspace.discrete.exp_rep.loc[d_mask],
+                exp_rep=candidates.loc[d_mask],
             )
             mod_cont = (
                 subspace_c._enforce_cardinality_constraints(c_inactive_params)

baybe/recommenders/pure/nonpredictive/clustering.py

Lines changed: 5 additions & 5 deletions
@@ -107,13 +107,13 @@ def _recommend_discrete(
         from sklearn.preprocessing import StandardScaler

         # TODO [Scaling]: scaling should be handled by search space object
+        candidates = subspace_discrete.get_candidates()
+        candidates_comp = subspace_discrete.transform(candidates)
         scaler = StandardScaler()
-        scaler.fit(subspace_discrete.comp_rep)
+        scaler.fit(candidates_comp)

         # Scale candidates
-        candidates_scaled = np.ascontiguousarray(
-            scaler.transform(subspace_discrete.comp_rep)
-        )
+        candidates_scaled = np.ascontiguousarray(scaler.transform(candidates_comp))

         # Set model parameters and perform fit
         model = self._get_model_cls()(
@@ -129,7 +129,7 @@ def _recommend_discrete(
             selection = self._make_selection_default(model, candidates_scaled)

         # Select rows by positional indices and return the corresponding subset
-        return subspace_discrete.exp_rep.iloc[selection]
+        return candidates.iloc[selection]

     @override
     def __str__(self) -> str:
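
The clustering and sampling recommenders in this commit share one pattern: fetch the candidates once, derive the computational representation from them at the point of use, compute a positional selection on the scaled numbers, and map those positions back onto the same candidate rows. The sketch below mimics that flow with plain Python lists instead of DataFrames. Everything in it is an assumption made for illustration: the toy data, the `encode` and `standardize` helpers, and the "farthest from the origin" rule in `select_positions` (the actual recommenders use scikit-learn models or farthest point sampling).

```python
from statistics import mean, pstdev

# Experimental representation: human-readable parameter values (toy data)
candidates = [
    {"catalyst": "A", "temp": 20.0},
    {"catalyst": "A", "temp": 30.0},
    {"catalyst": "B", "temp": 40.0},
    {"catalyst": "B", "temp": 150.0},
]


def encode(rows):
    """Hypothetical transform: experimental rows -> numeric vectors."""
    codes = {"A": 0.0, "B": 1.0}
    return [[codes[r["catalyst"]], r["temp"]] for r in rows]


def standardize(matrix):
    """Column-wise zero-mean, unit-variance scaling."""
    cols = list(zip(*matrix))
    # Fall back to a scale of 1.0 for constant columns to avoid division by zero
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in matrix]


def select_positions(scaled, n):
    """Toy selection rule: positions of the n rows farthest from the origin."""
    norms = [sum(v * v for v in row) for row in scaled]
    order = sorted(range(len(scaled)), key=lambda i: norms[i], reverse=True)
    return order[:n]


# Computational representation is derived only where it is needed
candidates_comp = encode(candidates)
positions = select_positions(standardize(candidates_comp), n=1)

# Positional lookup on the same candidate list, analogous to `.iloc`
recommended = [candidates[i] for i in positions]
```

Holding on to the single `candidates` object matters because the positions computed on the transformed data are only meaningful against the exact rows they were derived from.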

baybe/recommenders/pure/nonpredictive/sampling.py

Lines changed: 5 additions & 4 deletions
@@ -41,7 +41,7 @@ def _recommend_hybrid(
         if searchspace.type is SearchSpaceType.CONTINUOUS:
             return cont_random

-        candidates_exp = searchspace.discrete.exp_rep
+        candidates_exp = searchspace.discrete.get_candidates()

         # Restrict to a random subset if subset-generating constraints are present
         if searchspace.discrete.n_subsets > 0:
@@ -152,11 +152,12 @@ def _recommend_discrete(
         from sklearn.preprocessing import StandardScaler

         # TODO [Scaling]: scaling should be handled by search space object
+        candidates = subspace_discrete.get_candidates()
+        candidates_comp = subspace_discrete.transform(candidates)
         scaler = StandardScaler()
-        scaler.fit(subspace_discrete.comp_rep)
+        scaler.fit(candidates_comp)

         # Scale and sample
-        candidates_comp = subspace_discrete.comp_rep
         candidates_scaled = np.ascontiguousarray(scaler.transform(candidates_comp))

         if active_settings.use_fpsample:
@@ -173,7 +174,7 @@ def _recommend_discrete(
                 initialization=self.initialization.value,
                 random_tie_break=self.random_tie_break,
             )
-        return subspace_discrete.exp_rep.iloc[ilocs]
+        return candidates.iloc[ilocs]

     @override
     def __str__(self) -> str:

baybe/searchspace/core.py

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 from collections.abc import Collection, Iterable, Iterator, Sequence
 from enum import Enum
 from itertools import product
-from typing import TYPE_CHECKING, ClassVar, cast
+from typing import TYPE_CHECKING, ClassVar

 import numpy as np
 import numpy.typing as npt
@@ -263,7 +263,7 @@ def task_idx(self) -> int | None:
         # appear first in the computational dataframe.
         # 3. It assumes there exists exactly one task parameter
         # --> Fix this when refactoring the data
-        return cast(int, self.discrete.comp_rep.columns.get_loc(task_param.name))
+        return self.discrete.comp_rep_columns.index(task_param.name)

     @property
     def n_tasks(self) -> int:
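
The `task_idx` change replaces a lookup on the materialized computational dataframe with a lookup on `comp_rep_columns`. The diff does not show the type of `comp_rep_columns`; the sketch below assumes it is a tuple of column names, in which case the built-in `tuple.index` returns the position without any dataframe being built. The column names and the `task_index` helper are hypothetical.

```python
# Hypothetical column names of a computational representation
comp_rep_columns = ("temperature", "pressure", "task")


def task_index(columns, task_name):
    """Return the position of the task column, or None if it is absent."""
    try:
        # `tuple.index` returns the first matching position as a plain int
        return columns.index(task_name)
    except ValueError:
        # Raised when the name is not among the columns
        return None


idx = task_index(comp_rep_columns, "task")
missing = task_index(comp_rep_columns, "not_a_column")
```

Because `tuple.index` already returns an `int`, a `cast` such as the one removed from the import list is not needed under this assumption.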
