Smarter watching #8

@shinzlet

Right now we rely solely on filesystem event monitoring to copy files over. This isn't sufficient, though: files get missed! For example, it's possible to start softcopy halfway into a source array's production and miss a bunch of files as a result. To deal with this, we perform an integrity check once the observers stop: we enumerate all the file names we expect and queue the ones that are missing. This is extremely slow and does a lot of unnecessary work.
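For reference, the end-of-run integrity check described above amounts to something like the following. This is only a sketch, not softcopy's actual code: `expected_chunk_names` and `find_missing` are hypothetical names, and the dot-separated chunk keys (`0.0`, `0.1`, ...) assume a Zarr v2 style layout.

```python
from itertools import product
from pathlib import Path
from typing import Iterator


def expected_chunk_names(chunk_counts: tuple[int, ...], sep: str = ".") -> Iterator[str]:
    """Yield every chunk file name we expect (assumed Zarr v2 style keys)."""
    for index in product(*(range(n) for n in chunk_counts)):
        yield sep.join(str(i) for i in index)


def find_missing(dest: Path, chunk_counts: tuple[int, ...]) -> list[str]:
    """Stat every expected file in the destination and return the absent ones."""
    return [
        name
        for name in expected_chunk_names(chunk_counts)
        if not (dest / name).exists()
    ]
```

The cost is one existence check per expected file, including every file that was already copied successfully, which is where the wasted work comes from at ~30M files.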

Ideally, we would have some way to concretely know "have we copied this file yet? What are we missing?", but this is difficult due to the number of files (~30M in some cases). There are a few possible fixes: we can use PackedName for in-memory compression, we can use an on-disk queue, or we could use Zarr v3 sharding in the future.

I think the most elegant option is probably a combination of in-memory compression (PackedName) and exploiting the "copy head". We know timepoints are written sequentially, so the files that aren't yet copied will always be in either the timepoint the copy is currently stalled on or a timepoint close behind it. This means we only need to store a timepoint index and a list of files that haven't been copied yet. Every time we start a timepoint, we create a list of expected files. When we get a file, we take it off that list. When the timepoint advances, we take the elements that remain in that list and add them to our tardy queue. That way, we end up with a list of files that we know we never got a filesystem event about, and when there's downtime we can check on them (or just save them until the integrity check at the end).
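A minimal sketch of the copy-head bookkeeping, to make the idea concrete. Everything here is hypothetical: `CopyHeadTracker` is not an existing class, files are identified by a plain `(timepoint, file_index)` pair standing in for a PackedName, and every timepoint is assumed to contain the same number of files.

```python
class CopyHeadTracker:
    """Tracks which files of the current timepoint have not been seen yet."""

    def __init__(self, files_per_timepoint: int) -> None:
        self.files_per_timepoint = files_per_timepoint
        self.head = 0  # timepoint currently being copied
        # Files of the head timepoint we have not received an event for.
        self.pending: set[int] = set(range(files_per_timepoint))
        # (timepoint, file_index) pairs we never got an event about.
        self.tardy: set[tuple[int, int]] = set()

    def _advance_to(self, timepoint: int) -> None:
        # Whatever is still pending at the old head becomes tardy.
        self.tardy.update((self.head, i) for i in self.pending)
        # Timepoints skipped entirely are tardy in full.
        for t in range(self.head + 1, timepoint):
            self.tardy.update((t, i) for i in range(self.files_per_timepoint))
        self.head = timepoint
        self.pending = set(range(self.files_per_timepoint))

    def on_file_event(self, timepoint: int, file_index: int) -> None:
        if timepoint > self.head:
            self._advance_to(timepoint)
        if timepoint == self.head:
            self.pending.discard(file_index)
        else:
            # Late event for an older timepoint: no longer tardy.
            self.tardy.discard((timepoint, file_index))
```

Memory use is bounded by one timepoint's worth of file indices plus however many files were actually missed, rather than by the full ~30M names. During downtime, the entries in `tardy` can be re-queued for copying directly instead of re-enumerating every expected file.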
