Problem
Currently, cell relocation processes every cell regardless of its deletion ratio or size, so it spends work on cells that gain little from being relocated.
Proposed Solutions
Option 1: Deletion Ratio Cut-off
Add a cut-off at the cell level. If a cell has very few entries to remove (e.g., < 10% deletion ratio), skip relocating the entire cell. We can count bloom filter misses as a proxy for deletions:
// Count entries the bloom filter does not contain; these are likely removals.
let mut likely_removals = 0;
let total_entries = index.iter().count();
if let Some(bloom) = bloom_filters.get(&cell_ref.keyspace_desc.id()) {
    for (key, position) in index.iter() {
        let reduced_key = cell_ref.keyspace_desc.reduce_key(key);
        if !bloom.contains(&LargeTable::bloom_key(&reduced_key, *position)) {
            likely_removals += 1;
        }
    }
    // Fraction of the cell's entries that would likely be dropped.
    let removal_ratio = likely_removals as f32 / total_entries as f32;
    if removal_ratio < 0.1 {
        return Ok(context); // Skip cell
    }
}
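The ratio check above can be isolated into a small pure function, which also makes the empty-cell case explicit: with `total_entries == 0` the division yields NaN, and `NaN < 0.1` is false, so the cell would not be skipped. The following is a minimal sketch, not the project's code. `should_skip_cell` and `MIN_REMOVAL_RATIO` are hypothetical names, and treating an empty cell as skippable is an assumption.

```rust
/// Hypothetical cut-off: cells below this removal ratio are not relocated.
const MIN_REMOVAL_RATIO: f32 = 0.1;

/// Returns true when relocating the cell is unlikely to pay off.
/// `likely_removals` is the number of bloom filter misses and
/// `total_entries` is the number of entries in the cell's index.
fn should_skip_cell(likely_removals: usize, total_entries: usize) -> bool {
    if total_entries == 0 {
        // Assumption: an empty cell has nothing to relocate.
        // This also avoids the 0/0 division, which would produce NaN.
        return true;
    }
    let removal_ratio = likely_removals as f32 / total_entries as f32;
    removal_ratio < MIN_REMOVAL_RATIO
}
```

Keeping the decision separate from the bloom filter scan lets the cut-off be unit-tested without constructing an index or a filter.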
Option 2: Cell Size Threshold
Skip cells smaller than a certain threshold size:
if index.iter().count() < CELL_SIZE_THRESHOLD {
    // Skip this cell entirely
    return Ok(context);
}
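The two options are not mutually exclusive. One possible arrangement, sketched below, runs the cheap size check first so that small cells never pay for the bloom filter scan. Everything here is illustrative: `RelocationDecision`, `decide`, and the constant values are hypothetical, and the removal count is passed as a closure that stands in for the bloom filter loop from Option 1.

```rust
// Illustrative values only; real thresholds would need tuning.
const CELL_SIZE_THRESHOLD: usize = 1024;
const MIN_REMOVAL_RATIO: f32 = 0.1;

#[derive(Debug, PartialEq)]
enum RelocationDecision {
    SkipTooSmall,
    SkipFewRemovals,
    Relocate,
}

/// Decides whether to relocate a cell.
/// `count_likely_removals` is only called when the cell passes the size check.
fn decide(
    total_entries: usize,
    count_likely_removals: impl FnOnce() -> usize,
) -> RelocationDecision {
    if total_entries < CELL_SIZE_THRESHOLD {
        return RelocationDecision::SkipTooSmall;
    }
    // total_entries >= CELL_SIZE_THRESHOLD > 0 here, so the division is safe.
    let likely_removals = count_likely_removals();
    let removal_ratio = likely_removals as f32 / total_entries as f32;
    if removal_ratio < MIN_REMOVAL_RATIO {
        RelocationDecision::SkipFewRemovals
    } else {
        RelocationDecision::Relocate
    }
}
```

Returning a distinct variant for each skip reason would also make it easy to record how often each cut-off fires.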