Skip to content

Limit the cache based on arbitrary measures #38

Description

@rphmeier

Motivation:
My use-case for lru-cache involves inspecting items' heap size. Rather than hack it in, I think it would be beneficial to make the cache adaptable to whatever limit needs to be imposed.

You might have a set of heterogeneous items, each using a different amount of allocation on the heap. Maybe a more general Meter trait might help:

trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

The default Meter could be a Count:

struct Count;

impl<T> Meter<T> for Count { 
    fn measure(&self, _: &T) -> usize { 1 }
}

Here's an implementation that would use the HeapSizeOf trait (I'd recommend this one for inclusion under a feature gate as it's generally useful.)

struct HeapSize;

impl<T: HeapSizeOf> Meter<T> for HeapSize {
    fn measure(&self, item: &T) -> usize { item.heap_size_of_children() + ::std::mem::size_of::<T>() }
}

When you add or remove an item, you add or subtract meter.measure(&item) from the maximum respectively. If the metered total exceeds the maximum while adding an item, other items must be removed in a loop until the total falls below the limit.

Caveats:

  • get_mut or interior mutability could allow items to be altered such that their measure at insertion time differs from their measure at removal. We could add a UniformMeter<T> which takes no reference to individual items when measuring -- indicating that the measurement is unrelated to item size. get_mut might only be implemented when the meter is uniform, while fn get(&mut self, key: &Q) -> Option<&T> would serve as an alternative implemented for all caches. This doesn't solve the interior mutability issue, but that can be addressed the same way as in HashMap: documentation specifying that it's a logic error to modify the value such that its measure would be altered.
  • insert can remove multiple items -- how do we return each of them to the caller efficiently? (maybe an iterator?)

Alternatively:

  • check the measure before and after get_mut and adjust the total correspondingly. The issue here is that it would need to be done through a CacheHandle<'a, T: 'a> which contains that logic in its destructor. This would break backwards compatibility, and would need to handle returning displaced items if the size grows.
  • re-measure every item in the cache whenever necessary. this seems prohibitively expensive

Looking forward to your comments and naming suggestions below.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions