Add metadata specification #77

:orphan:
# Metadata

Metadata is additional data that describes the project data itself. Metadata can
be high-level (e.g. a general overview of the study and its purpose) or
low-level (e.g. acquisition parameters for an extracellular electrophysiology
setup or a microscope).

A number of detailed metadata standards exist, including
[BIDS](https://bids-specification.readthedocs.io/en/stable/introduction.html),
[openMINDS](https://github.com/openMetadataInitiative) and the
[Allen Institute for Neural Dynamics data schema](https://github.com/AllenNeuralDynamics/aind-data-schema),
each differing in its structure, level of detail and the datatypes it covers.

Here, we provide a simple metadata organisation scheme that you can use to
get started with adding metadata to your project. You are free to add your own
metadata fields; at the end of this guide we recommend fields for each section.

Please get in touch if you would like additional keys added to the metadata fields.


## Metadata Organisation Description

At each level of the project, a metadata file can be included that describes that level:

> **Review comment (Member):** Just seen that this comes later, but I think some intro would be better higher up in case people haven't come across yaml files.


```
└── my_project/
    ├── my_project_metadata.yml
    └── rawdata/
        ├── rawdata_metadata.yml
        ├── sub-001/
        │   ├── sub-001_metadata.yml
        │   └── ses-001/
        │       ├── ses-001_metadata.yml
        │       ├── behav/
        │       │   └── behav_metadata.yml
        │       └── ephys/
        │           └── ephys_metadata.yml
        ├── sub-002/
        │   ├── sub-002_metadata.yml
        │   └── ...
        └── ...
```


**``<project_name>_metadata.yml``**
- This file contains high-level information about the project, for example its overall purpose
and who is involved in the project. See [project metadata](project-metadata).

**``rawdata_metadata.yml``**
- This file contains information about the data collection as a whole, for example the species of
animal used in the project. It may also contain sections for specific datatypes whose settings apply
to all subjects in the project. For example, if `ephys` data was collected at a sampling rate of 30 kHz
for every subject, it may contain an `ephys` section with a `samplingRate` field. See [rawdata metadata](rawdata-metadata).

**``sub-<value>_metadata.yml``**
- This file contains information about an individual subject, for example its date of birth,
identifiers, genotype or other key information. See [subject metadata](sub-metadata).

**``ses-<value>_metadata.yml``**
- This file contains information about a particular experimental session, for example
the date and any additional notes on what happened during the session. See [session metadata](ses-metadata).

**``<datatype>_metadata.yml``**
- This file can contain metadata specific to the acquisition of that datatype. See the [datatype keys](datatype-keys)
section for details on the keys to include for particular datatypes.

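In the example tree, each file is named after the folder it sits in (`<folder name>_metadata.yml`), so the files that may describe a given folder can be listed by walking from the project root down to it. The sketch below assumes that naming pattern; `metadata_files_for` is a hypothetical helper, not part of any existing tool, and it does not check whether the files actually exist.

```python
from pathlib import PurePosixPath


def metadata_files_for(folder: str, project_root: str) -> list[PurePosixPath]:
    """List the metadata files that may describe `folder`, from the project root down.

    Assumes (hypothetically) that each file is named `<folder name>_metadata.yml`
    and sits in the folder it describes. Existence is not checked.
    """
    root = PurePosixPath(project_root)
    # Raises ValueError if `folder` is not inside `project_root`.
    parts = PurePosixPath(folder).relative_to(root).parts

    files = [root / f"{root.name}_metadata.yml"]
    current = root
    for part in parts:
        current = current / part
        files.append(current / f"{part}_metadata.yml")
    return files


for path in metadata_files_for("my_project/rawdata/sub-001/ses-001/ephys", "my_project"):
    print(path)
```

For the `ephys` folder of `sub-001/ses-001`, this yields five paths, from `my_project_metadata.yml` down to `ephys_metadata.yml`, in the order a reader would apply them.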

## YAML file format

Metadata files should use the [YAML](https://yaml.org/) file format (`.yml` or `.yaml`).
YAML is a human-readable text format designed to be easy to read and edit.

YAML stores information as **key-value pairs** and uses indentation (spaces, not tabs) to represent
nesting. A line starting with `- ` is an item in a list.

For example, a simple metadata file may look like:

```yaml
projectName: "Visual Decision Making Study"
species: "Mus musculus"

ephys:
  samplingRate: 30000
  probeType: "Neuropixels 2.0"

experimenters:
  - "Jane Smith"
  - "John Doe"
```
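When a program reads such a file, a YAML loader turns key-value pairs into a dictionary, an indented block into a nested dictionary and `- ` items into a list. The sketch below writes out the Python objects the example above corresponds to. The comparison against a real parse only runs if the third-party PyYAML package (`yaml.safe_load`) happens to be installed; nothing in this guide requires it.

```python
# The example file above, as the Python objects a YAML loader returns.
EXPECTED = {
    "projectName": "Visual Decision Making Study",
    "species": "Mus musculus",
    "ephys": {"samplingRate": 30000, "probeType": "Neuropixels 2.0"},
    "experimenters": ["Jane Smith", "John Doe"],
}

YAML_TEXT = """\
projectName: "Visual Decision Making Study"
species: "Mus musculus"

ephys:
  samplingRate: 30000
  probeType: "Neuropixels 2.0"

experimenters:
  - "Jane Smith"
  - "John Doe"
"""

try:
    import yaml  # PyYAML: optional third-party package, assumed only for this check
except ImportError:
    yaml = None

if yaml is not None:
    # Unquoted 30000 loads as an int; quoted values load as strings.
    assert yaml.safe_load(YAML_TEXT) == EXPECTED

# Nested keys are reached one level at a time.
print(EXPECTED["ephys"]["samplingRate"])  # 30000
```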

## Inheritance

A particular metadata entry may be the same for all sub-folders in a project.
For example, the sampling rate used for the `ephys` data may be the same across all sessions
in the project.

In this case, the metadata entries for lower levels can be placed as keys at a higher level.
For example, if the `ephys` sampling rate for all subjects was 30 kHz, you could structure
your `rawdata_metadata.yml` file as:

```yaml
SomeKey: someValue
ephys:
  samplingRate: 30000
```

This would then apply to all subjects in the `rawdata` folder.

However, an inherited entry can be overridden for particular cases, e.g. if a different
sampling rate was used during acquisition because of an error. To do this, include a metadata
file for the case of interest in the relevant folder. For example, if `sub-005` used a different
sampling rate for all of its sessions, a `sub-005_metadata.yml` file could be included to override
the inherited value for this particular subject, e.g.:

```yaml
ephys:
  samplingRate: 30500
notes: "A mistake was made during acquisition, leading to a sampling rate of 30500 Hz."
```

An override can also be placed lower down. For example, if only `ses-001` of `sub-001` was
affected, the overriding entry could go in that session's `ephys_metadata.yml`, where datatype
keys such as `samplingRate` sit at the top level. The folder structure may then look like:

```
.
└── my_project/
    └── rawdata/
        ├── rawdata_metadata.yml            # contains the `ephys` entry applying to all subjects
        └── sub-001/
            ├── ses-001/
            │   └── ephys/
            │       ├── ephys_metadata.yml  # contains the overriding entry
            │       └── ...
            └── ses-002/
                └── ...
```

This was inspired by the similar inheritance principle in [BIDS](https://bids-validator.readthedocs.io/en/stable/validation-model/inheritance-principle.html).

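The inheritance rule can be sketched as a merge that walks from the highest level to the lowest, letting lower-level entries win. This sketch assumes that nested sections such as `ephys` are merged field by field (so overriding `samplingRate` keeps an inherited `probeType`, added here for illustration) and that lists and plain values are replaced outright; the guide does not spell out either rule, and `merge_metadata` / `resolve_metadata` are hypothetical names.

```python
from copy import deepcopy
from typing import Any


def merge_metadata(higher: dict[str, Any], lower: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `higher` overlaid with `lower`; lower-level entries win.

    Nested sections (e.g. `ephys`) are merged key by key, so overriding one
    field keeps the other inherited fields of that section. Any other value,
    including a list, is replaced outright (an assumption of this sketch).
    """
    merged = deepcopy(higher)
    for key, value in lower.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_metadata(levels: list[dict[str, Any]]) -> dict[str, Any]:
    """Apply metadata level by level, from the highest (first) to the lowest (last)."""
    resolved: dict[str, Any] = {}
    for level in levels:
        resolved = merge_metadata(resolved, level)
    return resolved


# Contents of rawdata_metadata.yml and sub-005_metadata.yml, already loaded.
rawdata = {
    "SomeKey": "someValue",
    "ephys": {"samplingRate": 30000, "probeType": "Neuropixels 2.0"},
}
sub_005 = {
    "ephys": {"samplingRate": 30500},
    "notes": "A mistake was made during acquisition.",
}

resolved = resolve_metadata([rawdata, sub_005])
print(resolved["ephys"])  # {'samplingRate': 30500, 'probeType': 'Neuropixels 2.0'}
```

In this sketch, a datatype-level file such as `ephys_metadata.yml`, whose keys sit at the top level, would first be wrapped as `{"ephys": {...}}` so that it lines up with the `ephys` section inherited from `rawdata_metadata.yml`.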

## Recommended Metadata Keys

To ensure alignment across and within projects, we recommend using metadata keys from
a predefined set. Here we use BIDS as an existing source of metadata keys for each section.

Please get in touch if you would like us to add new metadata fields to this list.

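Because extra keys are allowed, a small check can still help catch misspelled or differently-cased versions of the recommended keys before they spread across a project. The sketch below compares a file's top-level keys with a recommended set; `unrecommended_keys` is a hypothetical helper, and the set shown is the subject-level list given in this guide.

```python
from typing import Any

# Subject-level keys recommended in this guide.
RECOMMENDED_SUBJECT_KEYS = frozenset({
    "subject_id", "age", "sex", "handedness", "species",
    "strain", "strain_rrid", "genotype", "dateOfBirth",
})


def unrecommended_keys(metadata: dict[str, Any], recommended: frozenset[str]) -> list[str]:
    """Return the top-level keys of `metadata` that are not in `recommended`, sorted.

    Extra keys are allowed; this only flags them for review (e.g. typos).
    """
    return sorted(set(metadata) - recommended)


subject = {"subject_id": "sub-001", "sex": "F", "dateofbirth": "2024-01-15"}
print(unrecommended_keys(subject, RECOMMENDED_SUBJECT_KEYS))  # ['dateofbirth']
```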

(project-metadata)=
### Project Metadata

We use the BIDS `dataset_description.json` fields as a starting point for project-level metadata.

See the full specification for detailed descriptions:
[BIDS Dataset Description Specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files/dataset-description.html#dataset_descriptionjson)

Recommended keys:

```yaml
Name:
BIDSVersion:
DatasetType:
License:
Authors:
Acknowledgements:
HowToAcknowledge:
Funding:
EthicsApprovals:
ReferencesAndLinks:
DatasetDOI:
```
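Since these keys come from BIDS `dataset_description.json`, a filled-in project metadata file could be exported to that JSON format if you later want to share a BIDS-style description. This is not a requirement of this guide. The sketch assumes the YAML file has already been loaded into a dictionary, drops keys left empty, and checks only the two fields BIDS marks as REQUIRED (`Name` and `BIDSVersion`); `to_dataset_description` is a hypothetical helper and the example values are placeholders.

```python
import json
from typing import Any

# Fields marked REQUIRED for dataset_description.json in the BIDS specification.
REQUIRED_BIDS_KEYS = ("Name", "BIDSVersion")


def to_dataset_description(project_metadata: dict[str, Any]) -> str:
    """Render loaded project metadata as BIDS-style dataset_description.json text.

    Keys left empty in the template (None, "" or an empty list) are dropped.
    Raises ValueError if a required field is missing or empty.
    """
    filled = {
        key: value
        for key, value in project_metadata.items()
        if value is not None and value != "" and value != []
    }
    missing = [key for key in REQUIRED_BIDS_KEYS if key not in filled]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return json.dumps(filled, indent=2)


# Placeholder values; an empty `License:` line in YAML loads as None.
project = {
    "Name": "Visual Decision Making Study",
    "BIDSVersion": "1.9.0",
    "DatasetType": "raw",
    "Authors": ["Jane Smith", "John Doe"],
    "License": None,
}
print(to_dataset_description(project))
```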

(rawdata-metadata)=
### Rawdata Metadata

This file is primarily intended for metadata that applies across the whole dataset,
including datatype-specific metadata that lower levels inherit (for example `ephys` acquisition settings).

Example structure:

```yaml
species:
strain:

ephys:
  samplingRate:
  probeType:

behav:
  taskName:
```

(sub-metadata)=
### Subject Metadata

We use the fields of the BIDS participants file (`participants.tsv`) as a starting point for subject-level metadata.

See the full specification for detailed descriptions:
[BIDS Participants Specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files/data-summary-files.html)

Recommended keys:

```yaml
subject_id:
age:
sex:
handedness:
species:
strain:
strain_rrid:
genotype:
dateOfBirth:
```

(ses-metadata)=
### Session Metadata

We use the fields of the BIDS sessions file (`sub-<label>_sessions.tsv`) as a starting point for session-level metadata.

See the full specification for detailed descriptions:
[BIDS Sessions Specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files/data-summary-files.html#sessions-file)

Recommended keys:

```yaml
session_id:
sessionDate:
age:
weight:
notes:
experimenter:
```
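The session `age` can be derived from the subject's `dateOfBirth` and the session's `sessionDate` rather than typed by hand. The sketch below assumes both are stored as ISO 8601 dates (`YYYY-MM-DD`) and reports age in whole days; the guide fixes neither the date format nor the unit of `age`, so record whichever you choose. `age_in_days` is a hypothetical helper.

```python
from datetime import date


def age_in_days(date_of_birth: str, session_date: str) -> int:
    """Age of the subject on the session date, in whole days.

    Assumes both values are ISO 8601 calendar dates (YYYY-MM-DD).
    """
    born = date.fromisoformat(date_of_birth)
    session = date.fromisoformat(session_date)
    if session < born:
        raise ValueError("session date is before date of birth")
    return (session - born).days


# 2024 is a leap year, so February has 29 days.
print(age_in_days("2024-01-15", "2024-03-01"))  # 46
```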

(datatype-keys)=
### Datatype keys

> **Review comment (Member):** I'm slightly worried about the suggested datatype keys listed here per datatype. There is a risk of them getting out of sync with the upstream linked BIDS / BEP specs. The dataset description, subject and session keys should not change much (if at all) so we should keep the suggested sets for those. But datatypes are more fragile (especially the BEPs ones not yet merged). How about, for each datatype we begin by linking to the relevant BIDS/BEP pages, as you already do, but then in a snippet we provide just a few example fields for each datatype, making it clear that this is not the full set?
>
> **Reply (Member, Author):** hmm this is true, my hesitation with linking to the spec is that it's quite confusing in terms of what are valid metadata keys, as they have the PascalCase keys for their .json (which we copy) but the snake case keys for the .json (which we don't). Rather than explaining this distinction and telling them to only use PascalCase it seems more natural to provide a list of keys and tell them to find them in the spec for more information on those specific keys. Maybe we just include keys from merged BEPs (with the exception of electrophysiology as it's nearly merged) and assume the key names themselves will not change? If they add keys but we miss them it's not the end of the world?


Datatype metadata files contain acquisition-specific metadata for each modality.

(ephys-metadata)=
#### `ephys`

We use the electrophysiology metadata fields from BIDS Extension Proposal 032 (BEP032) as a starting point.

See the full specification for detailed descriptions:
[BEP032 Electrophysiology Metadata Specification](https://bep032tools.readthedocs.io/en/latest/)

Recommended keys:

```yaml
samplingRate:
probeType:
manufacturer:
hardwareFilters:
softwareFilters:
electrodeCount:
referenceChannel:
groundChannel:
amplifier:
```

(behav-metadata)=
#### `behav`

We use the BIDS behavioural experiment metadata fields as a starting point.

See the full specification for detailed descriptions:
[BIDS Behavioural Experiments Specification](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/behavioral-experiments.html)

Recommended keys:

```yaml
taskName:
taskDescription:
instructions:
stimulusPresentation:
responseDevice:
samplingRate:
softwareName:
softwareVersion:
```

(anat-metadata)=
#### `anat`

We use the BIDS microscopy metadata fields as a starting point.

See the full specification for detailed descriptions:
[BIDS Microscopy Specification](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/microscopy.html)

Recommended keys:

```yaml
sampleFixation:
staining:
microscopeManufacturer:
microscopeModel:
objectiveLens:
magnification:
numericalAperture:
immersionMedium:
voxelSize:
imageFormat:
```

> **Review comment:** I assume this is a placeholder, but this page should be made far more accessible on the website eventually.