Skip to content

Polynomial runtime with DictStore #227

@publik-void

Description

@publik-void

Several functions on a DictStore are written in a way where they have O(n^2) time complexity, I believe. In any case, a DictStore-backed ZGroup with a large number of files can become quite slow – more than one would expect if the runtime would just scale linearly with the number of files. I've seen this happening so far with consolidate_metadata, zopen, and (repeated) zgroup/zcreate.

From what I can tell, zgroup and zcreate won't be super easy to fix without changing how DictStore itself works. I haven't looked more deeply into zopen yet. But consolidate_metadata's complexity can be improved to O(n) by defining this method:

function consolidate_metadata(s::DictStore, d, prefix)
  # Make the prefix have a trailing slash and pre-escape any `\Q` and `\E`
  _prefix = isempty(prefix) ? "" : replace(rstrip(prefix, '/'),
    "\\E" => "\\\\E", "\\Q" => "\\\\Q") * '/'

  # Matches any `.zattrs`, `.zarray`, or `.zgroup` below `prefix`
  regex = Regex("^\\Q$_prefix\\E(?:.*/)?(?:\\.zattrs|\\.zarray|\\.zgroup)\$")

  for (k, v) in s.a
    occursin(regex, k) || continue
    d[k] = JSON.parse(String(copy(v)); dicttype = Dict{String, Any})
  end

  return d
end

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions