Use DANE version with no-tar option (WIP) #48

Open
Veldhoen wants to merge 14 commits into main from 41-dont-tar

Veldhoen commented Jul 26, 2024

Instead of tarring the worker's output before uploading it to S3, the files are uploaded individually, with key prefixes that preserve the directory structure. This relies mostly on a proposed update to DANE core: https://github.com/CLARIAH/DANE/tree/23-update-s3_util-to-allow-for-untarred-uploading
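
To illustrate the idea of preserving the directory structure through key prefixes, here is a minimal sketch. It is not the code from the DANE branch: the function name `iter_upload_keys` and its parameters are hypothetical, and the actual `s3_util` implementation may differ.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_upload_keys(output_dir: Path, prefix: str) -> Iterator[Tuple[Path, str]]:
    """Yield (local_file, s3_key) pairs for every file under output_dir.

    Hypothetical helper: the key is the prefix plus the file's path relative
    to output_dir, so the directory layout can be recovered from the keys.
    """
    clean_prefix = prefix.strip("/")
    for path in sorted(output_dir.rglob("*")):
        if not path.is_file():
            continue  # directories are implied by the keys, not uploaded
        relative = path.relative_to(output_dir).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        yield path, key
```

With boto3, each pair could then be passed to an S3 client's `upload_file(str(local_file), bucket, key)`, where the bucket name comes from the worker's config.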

  • Make sure you have a .env file with AWS credentials that allow writing to the bucket specified in the config
  • Run either locally:
    • Install the environment with poetry install
    • Copy ./config/config.yml to the root of the repo (.) and adjust the necessary settings: BASE_MOUNT and PATHS, S3_endpoint_URL, TRANSFER_ON_COMPLETION
    • Run poetry run python worker.py --run-test-file
  • Or run containerized:
    • Build the image: docker build -t dane-visual-feature-extraction-worker .
    • Adjust the config in ./config/config.yml (set S3_endpoint_URL and TRANSFER_ON_COMPLETION)
    • Run docker-compose up
  • Run once with TAR_OUTPUT: false and once with TAR_OUTPUT: true, and confirm that the output is transferred in the corresponding format
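
For the final check, a small helper along these lines could classify a listing of object keys from the bucket. This is only a sketch: `describe_transfer` is a hypothetical name, and it assumes that the tarred output carries a `.tar`, `.tar.gz`, or `.tgz` extension, which the worker's actual naming may not match.

```python
from typing import Iterable

# Assumed archive extensions; adjust to whatever the worker actually produces.
ARCHIVE_SUFFIXES = (".tar", ".tar.gz", ".tgz")


def describe_transfer(keys: Iterable[str]) -> str:
    """Return 'tar', 'files', 'mixed', or 'empty' for a listing of object keys."""
    key_list = list(keys)
    if not key_list:
        return "empty"
    archives = [key for key in key_list if key.endswith(ARCHIVE_SUFFIXES)]
    if len(archives) == len(key_list):
        return "tar"    # only archives found
    if not archives:
        return "files"  # only individual files found
    return "mixed"      # both kinds present, worth a closer look
```

Under these assumptions, a run with TAR_OUTPUT: true should yield "tar" and a run with TAR_OUTPUT: false should yield "files".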

Bonus points: check out the DANE PR too (https://github.com/CLARIAH/DANE/tree/23-update-s3_util-to-allow-for-untarred-uploading)

Veldhoen linked an issue Jul 26, 2024 that may be closed by this pull request
Veldhoen marked this pull request as ready for review August 6, 2024 14:36
Veldhoen requested a review from KleinRana August 6, 2024 14:42

Development

Successfully merging this pull request may close these issues.

Dont tar
