Data Management
Dan MacGuigan edited this page Nov 20, 2024 · 7 revisions
General data management workflow.
graph TD;
GenoHub[**GenoHub**
Demultiplexed and compressed sequence reads in FASTQ format. Files should end in “.fastq.gz” or “.fq.gz”. Is there any consistent naming scheme?]
Analyses(Run quality/adapter trimming, mitogenome assembly, etc.)
Scratch[(**Hydra Scratch**
/scratch/???/USER_ID
40 TB. Not backed up. Might need automatic file purging to keep space clean.)]
Store[(**Hydra Store**
/store/???/USER_ID
40 TB. Not backed up. For non-active projects or large raw data files. The drive system is slower and can't be used for active analysis.)]
XDrive[(**P Drive**
P:\NMNH-OCEAN-DNA
Initially 5 TB. Incrementally backed up daily, fully backed up weekly. Only accessible from SI computers.)]
GDrive[(**NOAA Google Drive**
Unlimited storage space on NSL Google Shared Drive. Only accessible by NOAA employees.)]
Rename("`Dan? renames FASTQ files following Best Practices Guide`")
Move1[/Dan? downloads raw FASTQ files/]
Move2[/Dan? copies renamed raw FASTQs/]
Move3[/copy clean reads and final results/]
Move4[/Dan runs monthly backup/]
GenoHub-->Move1
Move1-->Scratch
subgraph " "
Scratch-->Rename
Rename-->Analyses
Rename-->Move2
Move2-->Store
Analyses-->Move3
Move3-->Store
end
Store-->Move4
Move4-->XDrive
Move4-->GDrive
classDef process stroke:black,color:white,fill:#159BD7,stroke-dasharray: 5 5
classDef storage stroke:black,color:white,fill:#159BD7
classDef ccr stroke:black,color:white,fill:#159BD7
class Rename,Analyses,Move1,Move2,Move3,Move4 process
class GenoHub,Scratch,Store,XDrive,GDrive storage
click Rename "https://github.com/dmacguigan/SI-Ocean-DNA/wiki/Best-Practices#sequence-data-file"
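
The copy steps in the workflow above (for example, copying renamed raw FASTQs from scratch to store) could be scripted roughly as follows. This is only a minimal sketch: the page does not say how files are actually copied, so the function names `is_compressed_fastq`, `sha256sum`, and `copy_fastqs` are hypothetical, and the source and destination directories are passed in as arguments because the real paths are not fully specified here. The only rule taken from the page is that files should end in ".fastq.gz" or ".fq.gz"; the checksum comparison is an added assumption, not a documented part of the workflow.

```python
import hashlib
import shutil
from pathlib import Path

# Suffixes accepted for demultiplexed, compressed reads (from the workflow above).
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def is_compressed_fastq(name: str) -> bool:
    """Return True if the file name ends in one of the accepted suffixes."""
    return name.endswith(FASTQ_SUFFIXES)


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_fastqs(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy compressed FASTQ files from src_dir to dest_dir.

    Files without an accepted suffix are skipped. Each copy is verified by
    comparing checksums (an assumption of this sketch, not a documented step).
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file() or not is_compressed_fastq(src.name):
            continue
        dest = dest_dir / src.name
        shutil.copy2(src, dest)  # copy2 also preserves modification times
        if sha256sum(src) != sha256sum(dest):
            raise IOError(f"Checksum mismatch after copying {src.name}")
        copied.append(dest)
    return copied
```

Calling `copy_fastqs(Path(scratch_dir), Path(store_dir))` with the appropriate directories returns the list of files written. Because neither Hydra location is backed up, verifying each copy before deleting anything from scratch seems like a reasonable precaution.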