Changes from 9 commits
30 commits
564e120
Add placeholder page.
JoeZiminski Sep 15, 2025
661b32f
Flesh this out a bit.
JoeZiminski Sep 16, 2025
d516b6a
Small note.
JoeZiminski Sep 16, 2025
5ddce0a
Fill out text.
JoeZiminski Sep 17, 2025
fd96fc4
fix links.
JoeZiminski Sep 18, 2025
23635ed
Minor changes.
JoeZiminski May 20, 2026
8598ec3
Add first full draft.
JoeZiminski May 20, 2026
f2680ae
More tidy ups.
JoeZiminski May 20, 2026
870b062
Fix linting.
JoeZiminski May 20, 2026
b15f766
Update docs/source/metadata.md
JoeZiminski May 21, 2026
3c9bf2e
Update docs/source/metadata.md
JoeZiminski May 21, 2026
32be071
Update docs/source/metadata.md
JoeZiminski May 21, 2026
7116f77
Update docs/source/metadata.md
JoeZiminski May 21, 2026
da9c7a5
Update docs/source/metadata.md
JoeZiminski May 21, 2026
2ec75c2
Update docs/source/metadata.md
JoeZiminski May 21, 2026
17ffbf2
Responding to review.
JoeZiminski May 21, 2026
a8507b2
Merge branch 'add_metdata_spec' of https://github.com/neuroinformatic…
JoeZiminski May 21, 2026
f4d377a
fix linting.
JoeZiminski May 21, 2026
f91942d
Move YAML section.
JoeZiminski May 21, 2026
81097a2
Revising the doc.
JoeZiminski May 21, 2026
f2e3f5e
Update for new metadata org.
JoeZiminski May 22, 2026
a85c570
Fix linting.
JoeZiminski May 22, 2026
890ed9f
both neuroblueprint.
JoeZiminski Jun 4, 2026
70936f3
Update docs/source/metadata.md
JoeZiminski Jun 4, 2026
91cb6b2
Update docs/source/metadata.md
JoeZiminski Jun 4, 2026
790eae1
Update docs/source/metadata.md
JoeZiminski Jun 4, 2026
3535610
Update docs/source/metadata.md
JoeZiminski Jun 4, 2026
7fa0a1d
Update docs/source/metadata.md
JoeZiminski Jun 4, 2026
d6f2945
Update docs/source/metadata.md
JoeZiminski Jun 4, 2026
d970d64
Update docs/source/metadata.md
JoeZiminski Jun 4, 2026
297 changes: 297 additions & 0 deletions docs/source/metadata.md
@@ -0,0 +1,297 @@
:orphan:

> **Review comment (Member):** I assume this is a placeholder, but this page should be made far more accessible on the website eventually.

# Metadata

Metadata is additional data that describes the project data itself. Metadata can
be high-level (e.g. a general overview of the study and its purpose) or
low-level (e.g. acquisition parameters for an extracellular electrophysiology
setup or a microscope).

A number of detailed metadata standards exist, including
[BIDS](https://bids-specification.readthedocs.io/en/stable/introduction.html),
[openMINDS](https://github.com/openMetadataInitiative) and the
[Allen Institute for Neural Dynamics data schema](https://github.com/AllenNeuralDynamics/aind-data-schema),
each differing in structure, level of detail and the datatypes covered.

Here, we provide a simple metadata organisation scheme that you can use to
get started with adding metadata to your project. You are free to add your own
metadata fields, but at the end of this guide we list recommended fields
for each section.

Please get in touch if you would like additional keys added to the metadata fields.

## Metadata Organisation Description

At each level of the project, a metadata file can be included that describes that level:

> **Review comment (Member):** Just seen that this comes later, but I think some intro would be better higher up in case people haven't come across yaml files.


```
└── my_project/
    ├── my_project_metadata.yml
    └── rawdata/
        ├── rawdata_metadata.yml
        ├── sub-001/
        │   ├── sub-001_metadata.yml
        │   └── ses-001/
        │       ├── ses-001_metadata.yml
        │       ├── behav/
        │       │   └── behav_metadata.yml
        │       └── ephys/
        │           └── ephys_metadata.yml
        ├── sub-002/
        │   ├── sub-002_metadata.yml
        │   └── ...
        └── ...
```

**``<project_name>_metadata.yml``**
- This file contains high-level information about the project, for example its overall purpose
and who is involved. See [project metadata](project-metadata).

**``rawdata_metadata.yml``**
- This file contains information about the data collection, for example the species of animal used
in the project. It may also contain datatype-specific sections that apply to all subjects
in the project. For example, if `ephys` data was collected at a sampling rate of 30 kHz for each subject,
it may contain an `ephys` section with a `samplingRate` field. See [rawdata metadata](rawdata-metadata).

**``sub-<value>_metadata.yml``**
- This file contains information about an individual subject, for example their date of birth,
identifiers, genotype or other key information. See [subject metadata](sub-metadata).

**``ses-<value>_metadata.yml``**
- This file contains information about a particular experimental session, for example its
date and notes on what happened during the session. See [session metadata](ses-metadata).

**``<datatype>_metadata.yml``**
- This file contains metadata specific to the acquisition of that datatype. See the [datatype keys](datatype-keys)
section for recommended keys for particular datatypes.
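Because each metadata file sits inside the folder it describes, the files relevant to a particular datatype folder can be listed by following the path from the project folder down. The sketch below assumes every file is named `<folder_name>_metadata.yml`, as in the tree above; `metadata_file_chain` is a hypothetical helper for illustration, not part of any existing NeuroBlueprint tool.

```python
from pathlib import Path


def metadata_file_chain(project_root: Path, sub: str, ses: str, datatype: str) -> list[Path]:
    """Return candidate metadata files from the project level down to one datatype folder.

    Hypothetical helper: assumes each metadata file is named `<folder_name>_metadata.yml`
    and sits inside the folder it describes.
    """
    folders = [
        project_root,
        project_root / "rawdata",
        project_root / "rawdata" / sub,
        project_root / "rawdata" / sub / ses,
        project_root / "rawdata" / sub / ses / datatype,
    ]
    # Highest level first, so that later (lower-level) files can override earlier ones.
    return [folder / f"{folder.name}_metadata.yml" for folder in folders]


# Metadata files are optional at every level, so report which ones actually exist.
for path in metadata_file_chain(Path("my_project"), "sub-001", "ses-001", "ephys"):
    print(path, "(found)" if path.exists() else "(missing)")
```

Ordering the list from the highest level down is a deliberate choice: it is the order in which files would need to be read if lower-level entries are to take precedence.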

## YAML file format

Metadata files should use the [YAML](https://yaml.org/) file format (`.yml` or `.yaml`).
YAML is a human-readable text format designed to be easy to read and edit.

YAML stores information as **key-value pairs** and uses indentation (spaces) to represent structure.

For example, a simple metadata file may look like:

```yaml
projectName: "Visual Decision Making Study"
species: "Mus musculus"

ephys:
  samplingRate: 30000
  probeType: "Neuropixels 2.0"

experimenters:
  - "Jane Smith"
  - "John Doe"
```
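For readers more familiar with Python, it can help to see what this file becomes once parsed, for example with PyYAML's `yaml.safe_load` (a library choice assumed here, not required by this scheme): indented keys become nested dictionaries, `-` items become lists, and unquoted numbers become integers. The sketch writes that structure out directly so it runs without extra dependencies.

```python
# Python equivalent of the YAML example above, as a YAML parser would return it.
metadata = {
    "projectName": "Visual Decision Making Study",
    "species": "Mus musculus",
    # The indented keys under `ephys:` form a nested mapping.
    "ephys": {"samplingRate": 30000, "probeType": "Neuropixels 2.0"},
    # Lines starting with `-` form a list.
    "experimenters": ["Jane Smith", "John Doe"],
}

# Nested keys are reached through their parent key.
print(metadata["ephys"]["samplingRate"])  # 30000
```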

## Inheritance

It may be that a particular metadata entry is the same for all sub-folders in a project.
For example, the sampling rate used for the `ephys` data may be the same across all sessions
in the project.

In this case, the entry can be defined once in a higher-level metadata file, nested under the key of the lower level it applies to.
For example, if your `ephys` sampling rate was `30 kHz` for all subjects, you could structure
your `rawdata_metadata.yml` file as:

```yaml
SomeKey: someValue
ephys:
  samplingRate: 30000
```

This would then apply to all subjects in the `rawdata` folder.

However, this can be overwritten for particular cases, for example if a different sampling rate
was used by mistake during acquisition. To do this, include a metadata file for the case of interest
in the relevant folder. For example, if `ses-001` of `sub-001` was recorded at a different sampling rate,
an `ephys_metadata.yml` file in that session's `ephys` folder could overwrite the inherited value
for that session only, e.g.

```yaml
samplingRate: 30500
notes: "A mistake was made during acquisition, leading to a sampling rate of 30500 Hz."
```

The folder structure may look like:

```
.
└── my_project/
    └── rawdata/
        ├── rawdata_metadata.yml            # contains the `ephys` entry applying to all subjects
        └── sub-001/
            ├── ses-001/
            │   └── ephys/
            │       ├── ephys_metadata.yml  # contains the overwriting entry
            │       └── ...
            └── ses-002/
                └── ...
```

This was inspired by the similar inheritance principle in [BIDS](https://bids-validator.readthedocs.io/en/stable/validation-model/inheritance-principle.html).
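One way to resolve inherited metadata is a recursive merge: read the files from the highest level down and let each lower-level value replace the one above it. The sketch below assumes the files have already been parsed into dictionaries (e.g. with PyYAML's `yaml.safe_load`), and that keys in a `<datatype>_metadata.yml` file are treated as if nested under that datatype's key, matching the folder structure above where the override lives in `ephys_metadata.yml`. `deep_merge` and `resolve_metadata` are hypothetical names, not an existing NeuroBlueprint API.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new top-level dict where values in `override` replace those in `base`.

    Nested dictionaries are merged key by key rather than replaced wholesale,
    so overriding `ephys.samplingRate` keeps any other inherited `ephys` keys.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_metadata(
    levels: list[dict[str, Any]], datatype: str, datatype_meta: dict[str, Any]
) -> dict[str, Any]:
    """Merge metadata from highest to lowest level, then apply the datatype file.

    Assumption: keys in `<datatype>_metadata.yml` belong under the `<datatype>` key.
    """
    resolved: dict[str, Any] = {}
    for level in levels:  # e.g. [rawdata, subject, session], highest level first
        resolved = deep_merge(resolved, level)
    return deep_merge(resolved, {datatype: datatype_meta})


rawdata = {"SomeKey": "someValue", "ephys": {"samplingRate": 30000, "probeType": "Neuropixels 2.0"}}
ephys_file = {"samplingRate": 30500, "notes": "A mistake was made during acquisition."}

# The empty dicts stand in for subject and session files with no overriding entries.
resolved = resolve_metadata([rawdata, {}, {}], "ephys", ephys_file)
print(resolved["ephys"]["samplingRate"])  # 30500: the session's ephys file wins
print(resolved["ephys"]["probeType"])     # Neuropixels 2.0: inherited from rawdata
```

Merging nested dictionaries, rather than replacing them, is what lets a single overriding key coexist with the other inherited `ephys` settings.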

## Recommended Metadata Keys

To ensure alignment across and within projects, we recommend using metadata keys from
a predefined set. Here we use BIDS as an existing source of metadata keys for each section.

Please get in touch if you would like us to add new metadata fields to this list.

(project-metadata)=
## Project Metadata

We use the BIDS `dataset_description.json` fields as a starting point for project-level metadata.

See the full specification for detailed descriptions:
[BIDS Dataset Description Specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files/dataset-description.html#dataset_descriptionjson)

Recommended keys:

```yaml
Name:
BIDSVersion:
DatasetType:
License:
Authors:
Acknowledgements:
HowToAcknowledge:
Funding:
EthicsApprovals:
ReferencesAndLinks:
DatasetDOI:
```
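To catch typos or non-standard keys, a metadata file's top-level keys can be compared against the recommended set. This minimal sketch assumes the file has already been parsed into a dictionary; `check_keys` and `PROJECT_KEYS` are hypothetical names. Because extra fields are allowed, unrecognised keys are only reported, not rejected.

```python
# Recommended project-level keys, copied from the list above.
PROJECT_KEYS = {
    "Name", "BIDSVersion", "DatasetType", "License", "Authors", "Acknowledgements",
    "HowToAcknowledge", "Funding", "EthicsApprovals", "ReferencesAndLinks", "DatasetDOI",
}


def check_keys(metadata: dict, recommended: set[str]) -> tuple[set[str], set[str]]:
    """Return (recommended keys that are missing, present keys not in the recommended set)."""
    present = set(metadata)
    return recommended - present, present - recommended


project_metadata = {
    "Name": "Visual Decision Making Study",
    "Authors": ["Jane Smith", "John Doe"],
    "Licence": "CC-BY-4.0",  # British spelling, so it will not match `License`
}
missing, unrecognised = check_keys(project_metadata, PROJECT_KEYS)
print("Unrecognised:", sorted(unrecognised))  # ['Licence']
print("Missing:", sorted(missing))
```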

(rawdata-metadata)=
## Rawdata Metadata

This file is primarily intended for metadata that applies across the whole dataset,
including inherited datatype-specific metadata (for example `ephys` acquisition settings).

Example structure:

```yaml
species:
strain:

ephys:
  samplingRate:
  probeType:

behav:
  taskName:
```

(sub-metadata)=
## Sub Metadata

We use the BIDS participant fields as a starting point for subject-level metadata.

See the full specification for detailed descriptions:
[BIDS Participants Specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files/data-summary-files.html)

Recommended keys:

```yaml
subject_id:
age:
sex:
handedness:
species:
strain:
strain_rrid:
genotype:
dateOfBirth:
```

(ses-metadata)=
## Ses Metadata

We use the BIDS session fields as a starting point for session-level metadata.

See the full specification for detailed descriptions:
[BIDS Sessions Specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files/data-summary-files.html#sessions-file)

Recommended keys:

```yaml
session_id:
sessionDate:
age:
weight:
notes:
experimenter:
```

(datatype-keys)=
## Datatype keys

> **Review comment (Member):** I'm slightly worried about the suggested datatype keys listed here per datatype. There is a risk of them getting out of sync with the upstream linked BIDS / BEP specs. The dataset description, subject and session keys should not change much (if at all) so we should keep the suggested sets for those.
>
> But datatypes are more fragile (especially the BEP ones not yet merged).
>
> How about, for each datatype we begin by linking to the relevant BIDS/BEP pages, as you already do, but then in a snippet we provide just a few example fields for each datatype, making it clear that this is not the full set?

> **Reply (Member Author):** hmm this is true, my hesitation with linking to the spec is that it's quite confusing in terms of what are valid metadata keys, as they have the PascalCase keys for their .json (which we copy) but the snake case keys for the .json (which we don't). Rather than explaining this distinction and telling them to only use PascalCase it seems more natural to provide a list of keys and tell them to find them in the spec for more information on those specific keys.
>
> Maybe we just include keys from merged BEPs (with the exception of electrophysiology as it's nearly merged) and assume the key names themselves will not change? If they add keys but we miss them it's not the end of the world?


Datatype metadata files contain acquisition-specific metadata for each modality.

(ephys-metadata)=
### `ephys`

We use BIDS electrophysiology metadata fields as a starting point.

See the full specification for detailed descriptions:
[BEP032 Electrophysiology Metadata Specification](https://bep032tools.readthedocs.io/en/latest/)

Recommended keys:

```yaml
samplingRate:
probeType:
manufacturer:
hardwareFilters:
softwareFilters:
electrodeCount:
referenceChannel:
groundChannel:
amplifier:
```

(behav-metadata)=
### `behav`

We use the BIDS behavioural experiment metadata fields as a starting point.

See the full specification for detailed descriptions:
[BIDS Behavioural Experiments Specification](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/behavioral-experiments.html)

Recommended keys:

```yaml
taskName:
taskDescription:
instructions:
stimulusPresentation:
responseDevice:
samplingRate:
softwareName:
softwareVersion:
```

(anat-metadata)=
### `anat`

We use the BIDS microscopy metadata fields as a starting point.

See the full specification for detailed descriptions:
[BIDS Microscopy Specification](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/microscopy.html)

Recommended keys:

```yaml
sampleFixation:
staining:
microscopeManufacturer:
microscopeModel:
objectiveLens:
magnification:
numericalAperture:
immersionMedium:
voxelSize:
imageFormat:
```
23 changes: 2 additions & 21 deletions docs/source/specification.md
@@ -1,7 +1,7 @@
# The specification

The current version of **NeuroBlueprint** mainly aims to enforce a uniform and consistent [project folder structure](#project-folder-structure).
In addition, it also includes some non-mandatory conventions for [naming files](#file-naming-conventions) and storing [tabular metadata](#tabular-metadata).
In addition, it also includes some non-mandatory conventions for [naming files](#file-naming-conventions) and storing [metadata](metadata).

:::{note}
We mark requirements with italicised *keywords* that should be interpreted as described by the [Network Working Group](https://www.ietf.org/rfc/rfc2119.txt). In decreasing order of requirement, these are: *must* {octicon}`alert;1em;sd-text-danger`, *should* {octicon}`info;1em;sd-text-warning`, and *may* {octicon}`check-circle;1em;sd-text-success`.
@@ -258,23 +258,4 @@ Below we provide some example file names adhering to the **NeuroBlueprint** nami

## Metadata conventions

**NeuroBlueprint** imposes no absolute requirements on how to store metadata. That said, we do outline some best practices, in accordance with the [BIDS specification on tabular files](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#tabular-files).

### Tabular metadata
Tabular metadata, e.g. a table describing the animals in the project, *should* be saved as a tab-separated value file (TSV, ending with `.tsv`) , that is, a CSV file where commas are replaced by tabs. The tab character is a less ambiguous delimiter compared to commas, as it is less likely to appear in data. This makes TSV less prone to parsing errors.

If you are using TSV files, we recommend adhering to the following conventions:
* The first row of the file *should* contain descriptive column names, formatted as snake_case (e.g. `participant_id`, `species`, `date_of_birth`, `sex`, `group`). Avoid blank (that is, an empty string) or duplicate columns names.
* Missing and non-applicable values *should* be coded as `n/a`.
* Numerical values *should* employ the dot (.) as decimal separator and *may* be specified in scientific notation, using e or E to separate the significand from the exponent (e.g. `1.23e-4`)
* TSV files *should* be in UTF-8 encoding.


Here is an example table containing metadata for animal subjects:

```
subject_id species sex group
sub-01 mus musculus M control
sub-02 mus musculus F control
sub-03 mus musculus M treatment
```
See our [metadata](metadata.md) page for details on the metadata specification.