Here's the output from a conversation with Opus 4.7:
Good question. Lines 129–130 are exactly the lever to pull if you want to keep multiblock's solid rectangles but let the NPZ noise library shape where the rectangles land, rather than (as multinoise does) shape what's inside them. Here are concrete, mutually compatible integration ideas, from minimal to deeper.
Background: what's available from the NPZ library
From multinoise.py, the relevant ingredients are reusable verbatim:
# -- Color Noise Initialization
self.color_mask_ratio = color_mask_ratio
for name, val in [("enc_drop_order", enc_drop_order), ("pred_drop_order", pred_drop_order)]:
    if val not in ("lowest", "highest"):
        raise ValueError(f"{name} must be 'lowest' or 'highest', got {val!r}")
self.enc_drop_order = enc_drop_order
self.pred_drop_order = pred_drop_order
self.trans_sequence = transforms.Compose([
    transforms.RandomCrop(self.height),  # Crop to [self.height, self.width] which is [14, 14] for ViT/14
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    NormalizeBySliceMax()
])
self._load_color_pattern(color_noise_path)

def _extract_noise_windows(self, B: int) -> torch.Tensor:
    """
    Produce B augmented noise grids from the stored noise pattern collection.
    ...
    Returns:
        Tensor of shape [B, height, width].
    """
So a per-batch [B, H, W] noise field is essentially free to obtain inside multiblock.MaskCollator.
Option A — Noise-weighted top-left sampling (smallest change to lines 129–130)
Instead of uniform sampling of the top-left corner, draw it from a categorical distribution whose logits depend on the noise grid. Two natural variants:
A.1 Score = noise at the top-left pixel
# replace the two uniform draws around lines 129–130
valid = noise_grid[: self.height - h, : self.width - w] # [H', W']
probs = valid.flatten().clamp_min(1e-8)
probs = probs / probs.sum()
flat_idx = torch.multinomial(probs, 1).item()
top = flat_idx // valid.shape[1]
left = flat_idx % valid.shape[1]
A.2 Score = sum of noise inside the would-be box (true "energy" of placement)
Pre-compute an integral image of the noise grid once per (B, image) and read box sums in O(1):
ii = noise_grid.cumsum(0).cumsum(1)  # integral image; do once per image
def box_sum(t, l):
    return ii[t+h-1, l+w-1] - (ii[t-1, l+w-1] if t else 0) \
                            - (ii[t+h-1, l-1] if l else 0) \
                            + (ii[t-1, l-1] if t and l else 0)
scores = torch.stack([
    box_sum(t, l)
    for t in range(self.height - h)
    for l in range(self.width - w)
])
Then torch.multinomial(softmax(scores / T), 1) (with a temperature T) gives you a smooth control between "uniform" (T → ∞) and "always pick the noisiest region" (T → 0). You can also flip the sign of scores to bias toward low-noise regions instead.
Effect: rectangles still solid, but they preferentially land on (or away from) noise-rich locations. This gives you a structured curriculum of where to predict without the irregularity that multinoise introduces.
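To make the temperature knob concrete, here is a minimal, self-contained sketch of Option A.2. It is an illustration under assumptions, not the collator's actual code: the function name sample_top_left and its arguments are hypothetical, it expects a single 2D float noise grid on CPU, and it enumerates every placement that fits (including the last row and column, which the range(self.height - h) loops above stop short of).

```python
from typing import Optional, Tuple

import torch


def sample_top_left(
    noise_grid: torch.Tensor,
    h: int,
    w: int,
    temperature: float = 1.0,
    generator: Optional[torch.Generator] = None,
) -> Tuple[int, int]:
    """Draw (top, left) with probability softmax(box_sum / temperature).

    Hypothetical helper for illustration; assumes 1 <= h <= H and 1 <= w <= W.
    """
    H, W = noise_grid.shape
    # Zero-padded integral image: ii[i, j] == noise_grid[:i, :j].sum()
    ii = torch.zeros(H + 1, W + 1, dtype=noise_grid.dtype)
    ii[1:, 1:] = noise_grid.cumsum(0).cumsum(1)
    # Box sums for every placement that fits: shape [H - h + 1, W - w + 1]
    sums = ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]
    # High temperature flattens the distribution; low temperature sharpens it.
    probs = torch.softmax(sums.flatten() / temperature, dim=0)
    flat_idx = int(torch.multinomial(probs, 1, generator=generator).item())
    return flat_idx // sums.shape[1], flat_idx % sums.shape[1]


# Tiny demo: a 6x6 grid with one bright 2x2 patch.
grid = torch.zeros(6, 6)
grid[2:4, 3:5] = 10.0
cold = sample_top_left(grid, 2, 2, temperature=0.01)  # concentrates on the bright patch
warm = sample_top_left(grid, 2, 2, temperature=1e6)   # close to uniform over placements
```

Negating noise_grid before the call would bias placements toward low-noise regions instead.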
Option B — Use noise to derive acceptable_regions
multiblock already has acceptable_regions-driven rejection sampling. You can plug a thresholded noise mask in as a permanent acceptable region, requiring blocks to fall on patches in (say) the top 50% of the noise field:
threshold = torch.quantile(noise_grid, 0.5)
noise_region = (noise_grid >= threshold).to(torch.int32) # [H, W]
# pass this as the FIRST entry of acceptable_regions, before mask complements
This integrates cleanly with multiblock's existing relaxation loop:
tries: Number of constraint relaxations (0 = enforce all, 1 = ignore one, etc.)
Process:
- acceptable_regions is a list of 2D binary masks (1 = acceptable, 0 = not acceptable)
- Element-wise multiplication: mask *= region zeros out pixels where region=0
- This "crops" the block to only overlap with acceptable regions
- As tries increases, fewer regions are enforced (gradual constraint relaxation)
"""
N = max(int(len(acceptable_regions) - tries), 0)
for k in range(N):
    mask *= acceptable_regions[k]  # Element-wise: 1*1=1 (keep), 1*0=0 (remove)
Caveat: this breaks the "solid rectangle" property the same way multinoise does — the constraint multiplication can carve holes. If you want to preserve solidity, do not add the noise mask to acceptable_regions; only use it to bias the top-left draw (Option A) and reject samples whose box sum is below some quantile (Option C).
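The caveat is easy to see on a toy grid. The sketch below uses a checkerboard as a stand-in for a real noise grid and a hypothetical helper, is_solid_rectangle, that checks whether a mask exactly fills its bounding box; neither comes from multiblock or multinoise.

```python
import torch


def is_solid_rectangle(mask: torch.Tensor) -> bool:
    """True if the nonzero entries of a 2D mask exactly fill their bounding box."""
    nz = mask.nonzero()
    if nz.numel() == 0:
        return False
    top, left = nz.min(dim=0).values.tolist()
    bottom, right = nz.max(dim=0).values.tolist()
    return int(mask.sum()) == (bottom - top + 1) * (right - left + 1)


H = W = 6
# Checkerboard stand-in for a noise grid (real grids would come from the NPZ library).
noise_grid = ((torch.arange(H)[:, None] + torch.arange(W)[None, :]) % 2).float()
threshold = torch.quantile(noise_grid, 0.5)
noise_region = (noise_grid >= threshold).to(torch.int32)  # [H, W], 1 = acceptable

block = torch.zeros(H, W, dtype=torch.int32)
block[1:4, 2:5] = 1            # a solid 3x3 block
carved = block * noise_region  # same element-wise product as in the relaxation loop

print(is_solid_rectangle(block), is_solid_rectangle(carved))
```

Smoother noise fields carve fewer holes than a checkerboard does, but nothing in the multiplication guarantees a rectangle.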
Option C — Rejection sampling against a noise score
Stay almost identical to current multiblock lines 129–130, but reject draws whose box sum is below a percentile of all possible box sums:
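for _ in range(max_tries):  # max_tries: hypothetical cap so the loop always terminates
    top = torch.randint(0, self.height - h, (1,)).item()
    left = torch.randint(0, self.width - w, (1,)).item()
    if box_sum(top, left) >= threshold:
        break  # accept this draw; otherwise resample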
Cheap, preserves rectangles, and uses the noise field as a soft spatial prior. With a slowly-rising threshold across training, you get a built-in curriculum.
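A self-contained version of Option C might look like the following. Treat it as a sketch: box_sums, sample_with_rejection, the quantile knob, and max_tries are all hypothetical names, and falling back to the last uniform draw when every try is rejected is just one possible policy.

```python
from typing import Optional, Tuple

import torch


def box_sums(grid: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Sum of every h x w window, computed from a zero-padded integral image."""
    H, W = grid.shape
    ii = torch.zeros(H + 1, W + 1, dtype=grid.dtype)
    ii[1:, 1:] = grid.cumsum(0).cumsum(1)
    return ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]


def sample_with_rejection(
    noise_grid: torch.Tensor,
    h: int,
    w: int,
    quantile: float = 0.5,
    max_tries: int = 20,
    generator: Optional[torch.Generator] = None,
) -> Tuple[int, int, bool]:
    """Uniform corner draws, rejecting boxes whose noise sum is below a quantile.

    Returns (top, left, accepted). If no draw is accepted within max_tries,
    the last uniform draw is returned with accepted=False.
    """
    sums = box_sums(noise_grid, h, w)
    threshold = torch.quantile(sums.flatten(), quantile)
    top = left = 0
    for _ in range(max_tries):
        top = int(torch.randint(0, sums.shape[0], (1,), generator=generator).item())
        left = int(torch.randint(0, sums.shape[1], (1,), generator=generator).item())
        if sums[top, left] >= threshold:
            return top, left, True
    return top, left, False


g = torch.Generator().manual_seed(0)
demo_grid = torch.rand(6, 6, generator=g)
top, left, accepted = sample_with_rejection(demo_grid, 2, 2, quantile=0.8, generator=g)
```

Raising quantile slowly over training (for example from 0.0 toward 0.8) is one way to realize the rising-threshold curriculum mentioned above.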
Option D — Noise-conditioned block size or count
Instead of touching lines 129–130, use the per-image noise statistics to:
- Modulate mask_scale (currently sampled by _sample_block_size): images with high noise variance → larger predictor blocks.
- Modulate npred: spawn an extra predictor block on images whose noise field has a strong second mode.
- Choose between enc / pred drop orders dynamically per image.
These don't change geometry, only schedule, and so are very safe to A/B against vanilla multiblock.
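As a rough illustration of the first bullet, the sketch below stretches a block-scale range according to the variance of the noise grid. Everything here is an assumption made for the example: the function name, the base_scale default, and the max_boost and var_ref knobs are hypothetical, and the linear ramp is only one of many possible mappings.

```python
from typing import Tuple

import torch


def noise_conditioned_scale(
    noise_grid: torch.Tensor,
    base_scale: Tuple[float, float] = (0.15, 0.2),  # placeholder range, not a confirmed default
    max_boost: float = 1.5,                         # hypothetical: largest multiplier applied
    var_ref: float = 0.05,                          # hypothetical: variance at which boost saturates
) -> Tuple[float, float]:
    """Scale the (min, max) block-area fraction up with the noise variance."""
    var = float(noise_grid.var())
    # Linear ramp from 1.0 (flat noise) to max_boost (variance >= var_ref).
    boost = 1.0 + (max_boost - 1.0) * min(var / var_ref, 1.0)
    lo, hi = base_scale
    return min(lo * boost, 1.0), min(hi * boost, 1.0)


flat = noise_conditioned_scale(torch.zeros(6, 6))  # zero variance: range unchanged
checker = ((torch.arange(6)[:, None] + torch.arange(6)[None, :]) % 2).float()
busy = noise_conditioned_scale(checker)            # high variance: range stretched by max_boost
```

The returned pair would be passed wherever the collator currently reads its fixed scale range; that wiring is left out because it depends on code not shown here.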
Option E — Noise to break ties between candidate top-lefts in acceptable_regions
When acceptable_regions produces many feasible (top, left) positions, currently the loop just keeps trying random corners. You can replace that random retry with: enumerate feasible corners (boolean grid of valid placements), score them by box_sum against the noise grid, sample one. This kills the timeout/relaxation loop for the common case and gives noise-aware enc/pred separation without dropping any patches.
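One way to sketch Option E is shown below. It assumes a particular definition of "feasible" that the text above does not pin down: a corner counts as feasible only if the whole h x w box lies inside a single combined acceptable region (for several regions, their element-wise product). The names box_sums and sample_feasible_corner are hypothetical.

```python
from typing import Optional, Tuple

import torch


def box_sums(grid: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Sum of every h x w window, computed from a zero-padded integral image."""
    H, W = grid.shape
    ii = torch.zeros(H + 1, W + 1, dtype=grid.dtype)
    ii[1:, 1:] = grid.cumsum(0).cumsum(1)
    return ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]


def sample_feasible_corner(
    noise_grid: torch.Tensor,
    acceptable: torch.Tensor,
    h: int,
    w: int,
    temperature: float = 1.0,
    generator: Optional[torch.Generator] = None,
) -> Optional[Tuple[int, int]]:
    """Sample a corner among feasible placements, weighted by noise box sums.

    Returns None when no placement fits entirely inside the acceptable region.
    """
    # A box is feasible when all h*w of its patches are acceptable (tolerance for float sums).
    feasible = box_sums(acceptable.to(noise_grid.dtype), h, w) >= h * w - 0.5
    if not bool(feasible.any()):
        return None
    scores = box_sums(noise_grid, h, w) / temperature
    scores = scores.masked_fill(~feasible, float("-inf"))  # infeasible corners get probability 0
    probs = torch.softmax(scores.flatten(), dim=0)
    idx = int(torch.multinomial(probs, 1, generator=generator).item())
    return idx // feasible.shape[1], idx % feasible.shape[1]


g = torch.Generator().manual_seed(0)
demo_noise = torch.rand(6, 6, generator=g)
region = torch.zeros(6, 6, dtype=torch.int32)
region[0:3, 0:3] = 1  # only the top-left 3x3 area is acceptable
corner = sample_feasible_corner(demo_noise, region, 2, 2, generator=g)
```

When the function returns None, the caller would still need the existing relaxation path, so this replaces the random retries only when at least one fully feasible placement exists.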
Recommendation
If your goal is "use the NPZ library, but keep mask blocks solid":
- Wire up the existing noise infra (NormalizeBySliceMax, _load_color_pattern, _extract_noise_windows) into multiblock.MaskCollator.__init__ and __call__, with no logic changes inside the sampling loop yet. This is purely plumbing and lets you log the noise grids alongside masks.
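- Replace lines 129–130 with Option A.2 (integral-image-weighted top-left sampling), with a temperature hyper-parameter noise_temp that defaults to +∞ (uniform, so the sampling distribution matches current behavior; for bit-identical results, fall back to the original torch.randint draws when noise_temp is +∞, since torch.multinomial consumes the RNG differently). That makes the change a strict superset of vanilla multiblock.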
- Optionally layer Option C on top for a stronger prior, controlled by a single quantile knob.
That gives you a meaningful, novel use of the noise library that is orthogonal to multinoise.py: multinoise modulates what is masked inside a box; this would modulate where the box lives, while keeping the rectangles solid.
If you want, I can sketch the exact diff for Option A.2 + the plumbing in step 1 (still in Ask mode — sketch only, no edits).