
Data Management

Dan MacGuigan edited this page Nov 20, 2024 · 7 revisions

General data management workflow.

graph TD;

  GenoHub[**GenoHub**
Demultiplexed and compressed sequence reads in FASTQ format. Files should end in “.fastq.gz” or “.fq.gz”. Is there any consistent naming scheme?]
  Analyses(Run quality/adapter trimming, mitogenome assembly, etc.)
  Scratch[(**Hydra Scratch**
/scratch/???/USER_ID
40 TB. Not backed up. Might need automatic file purging to keep space clean.)]
  Store[(**Hydra Store**
/store/???/USER_ID
40 TB. Not backed up. For non-active projects or large raw data files. Drive system is slower and can't be used for active analysis.)]
  XDrive[(**P Drive**
P:\NMNH-OCEAN-DNA
Initially 5 TB. Incrementally backed up daily, fully backed up weekly. Only accessible from SI computers.)]
  GDrive[(**NOAA Google Drive**
Unlimited storage space on NSL Google Shared Drive. Only accessible by NOAA employees.)]
  
  Rename("`Dan? renames FASTQ files following Best Practices Guide`")

  Move1[/Dan? downloads raw FASTQ files/]
  Move2[/Dan? copies renamed raw FASTQs/]
  Move3[/copy clean reads and final results/]
  Move4[/Dan runs monthly backup/]

  GenoHub-->Move1
  Move1-->Scratch
  subgraph " "
    Scratch-->Rename
    Rename-->Analyses
    Rename-->Move2
    Move2-->Store
    Analyses-->Move3
    Move3-->Store
  end
  Store-->Move4
  Move4-->XDrive
  Move4-->GDrive

  classDef process stroke:black,color:white,fill:#159BD7,stroke-dasharray: 5 5
  classDef storage stroke:black,color:white,fill:#159BD7
  classDef ccr stroke:black,color:white,fill:#159BD7

  class Rename,Analyses,Move1,Move2,Move3,Move4 process
  class GenoHub,Scratch,Store,XDrive,GDrive storage

  click Rename "https://github.com/dmacguigan/SI-Ocean-DNA/wiki/Best-Practices#sequence-data-file"
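
The GenoHub step above expects file names ending in ".fastq.gz" or ".fq.gz". A minimal sketch of how that check could be automated after download is shown below. The function names `has_valid_fastq_suffix` and `find_misnamed` are hypothetical helpers, not part of any existing pipeline, and the check is assumed to be case-sensitive and limited to a single directory.

```python
from pathlib import Path

# Accepted suffixes, taken from the GenoHub note in the workflow above.
VALID_SUFFIXES = (".fastq.gz", ".fq.gz")


def has_valid_fastq_suffix(filename: str) -> bool:
    """Return True if the name ends in an accepted compressed-FASTQ suffix.

    The comparison is case-sensitive (an assumption), so "reads.FASTQ.GZ"
    is reported as invalid.
    """
    return filename.endswith(VALID_SUFFIXES)


def find_misnamed(directory: str) -> list[str]:
    """List regular files directly inside `directory` that lack a valid suffix.

    Subdirectories are not searched; results are sorted by file name.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not has_valid_fastq_suffix(p.name)
    )
```

Calling `find_misnamed` on the download directory returns the names that need attention before the renaming step. An empty list means every file already matches one of the two suffixes.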
