space efficient storage for a million EDG binaries #293

@milahu

the "million EDG binaries" (30MB zipped, 140MB raw) would compress well with git

transfer size would stay the same, but storage size would be much smaller, so there would be no need for an Amazon S3 server

migrate tarballs to git:

#!/bin/sh

if [ -d gitrepo ]; then
  echo "error: folder exists: gitrepo. to run test again, run: rm -rf gitrepo"
  exit 1
fi

mkdir gitrepo

git -C gitrepo init

# https://github.com/rose-compiler/rose/blob/weekly/src/frontend/CxxFrontend/EDG_VERSION
release_list="$(cat <<EOF
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.77.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.78.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.79.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.80.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.81.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.2
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3
EOF
)"

for release in $release_list
do
  echo "adding $release"
  # download and unpack only when not already present
  [ -e "$release.tar.gz" ] || wget "http://edg-binaries.rosecompiler.org/$release.tar.gz"
  [ -d "$release" ] || tar -xf "$release.tar.gz"
  # the * glob skips dotfiles, so .libs is copied explicitly
  cp -r "$release"/* "$release/.libs" gitrepo/

  # TODO use release date for commit + tag
  git -C gitrepo add .
  git -C gitrepo commit -m "$release"
  git -C gitrepo tag "$release"

  rm -rf "$release"
done

echo raw size
du -sh gitrepo/.git
echo
echo compressing ...
time git -C gitrepo gc
echo
echo compressed size
du -sh gitrepo/.git
echo
echo total size of tarballs
du -shc roseBinaryEDG-*.tar.gz | tail -n1

output (8 releases: 34M in git after gc, 213M as tarballs):

raw size
247M	gitrepo/.git

compressing ...
Enumerating objects: 35, done.
Counting objects: 100% (35/35), done.
Delta compression using up to 4 threads
Compressing objects: 100% (34/34), done.
Writing objects: 100% (35/35), done.
Total 35 (delta 15), reused 0 (delta 0), pack-reused 0

real	0m57.203s
user	0m52.688s
sys	0m3.180s

compressed size
34M	gitrepo/.git

total size of tarballs
213M	total
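
the TODO in the script (use release date for commit + tag) could be handled roughly like this. this is only a sketch: `commit_with_date` and `tag_with_date` are hypothetical helper names, and it assumes the release date is available as a unix timestamp (for example the tarball's modification time, if that is a good proxy for the release date).

```shell
#!/bin/sh
# sketch only: these helpers are hypothetical, not part of the original script.

# commit the staged changes in <repo> with a fixed author + committer date
# usage: commit_with_date <repo> <message> <unix-timestamp>
commit_with_date() {
  repo=$1 msg=$2 ts=$3
  # "<seconds> +0000" is git's internal date format (unix timestamp + utc offset)
  GIT_AUTHOR_DATE="$ts +0000" GIT_COMMITTER_DATE="$ts +0000" \
    git -C "$repo" commit -q -m "$msg"
}

# create an annotated tag on HEAD with a fixed tagger date
# usage: tag_with_date <repo> <tag> <unix-timestamp>
tag_with_date() {
  repo=$1 tag=$2 ts=$3
  # an annotated tag stores its own date, taken from GIT_COMMITTER_DATE
  GIT_COMMITTER_DATE="$ts +0000" git -C "$repo" tag -a -m "$tag" "$tag"
}
```

in the loop, the plain `git commit` and `git tag` calls would then become `commit_with_date gitrepo "$release" "$ts"` and `tag_with_date gitrepo "$release" "$ts"`. with GNU date, `ts=$(date -r "$release.tar.gz" +%s)` reads the tarball's modification time.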

fetching a tarball would be as simple as

wget https://github.com/rose-compiler/edg-binaries/archive/roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3.tar.gz
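
from a local clone, a tarball for one release could also be rebuilt from its tag with `git archive`. a minimal sketch, assuming each release is tagged as in the migration script; `export_release` is a hypothetical helper name, and the result has the same file contents as the original tarball but is not byte-identical to it.

```shell
#!/bin/sh
# sketch only: export_release is a hypothetical helper, not an existing tool.

# write the tree of <tag> as a .tar.gz, with all paths under "<tag>/"
# usage: export_release <repo> <tag> <output.tar.gz>
export_release() {
  repo=$1 tag=$2 out=$3
  # the archive goes to stdout, so <output.tar.gz> is resolved
  # relative to the caller's directory, not relative to <repo>
  git -C "$repo" archive --format=tar.gz --prefix="$tag/" "$tag" > "$out"
}
```

for example: `export_release gitrepo roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3 edg.tar.gz`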

compression can be optimized by (quoting the elfshaker README, linked below)

compiling object code with the -ffunction-sections and -fdata-sections compiler flags. This has the effect that if you 'insert' a function into a translation unit, the insertion does not cause all of the addresses to change across the whole object file.

https://github.com/elfshaker/elfshaker#applicability
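
a minimal sketch of how the two flags could be added before building the binaries. how `CFLAGS` and `CXXFLAGS` actually reach the EDG build is an assumption here; the snippet only appends the flags to whatever is already set. `SECTION_FLAGS` is a name made up for this example.

```shell
#!/bin/sh
# sketch only: assumes the build picks up CFLAGS / CXXFLAGS from the environment.
SECTION_FLAGS="-ffunction-sections -fdata-sections"

# keep existing flags (if any) and append the section flags
CFLAGS="${CFLAGS:+$CFLAGS }$SECTION_FLAGS"
CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }$SECTION_FLAGS"
export CFLAGS CXXFLAGS
```

with these flags each function and data item gets its own section in the object file, which is what the quoted elfshaker note relies on.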
