Enforce data structure and data type consistency for JSON metadata#1421

Open
svogt0511 wants to merge 120 commits into master from pb325-json-metadata-validation

Contributor

svogt0511 commented Oct 30, 2025

Purpose

closes: https://github.com/datacite/product-backlog/issues/325

Approach

See #1341 for the approach

Open Questions and Pre-Merge TODOs

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

Summary by CodeRabbit

  • New Features

    • Expanded JSON Schema coverage for DOI metadata (updated draft) with richer controlled vocabularies and new schema validations.
  • Refactor

    • Validation now resolves schemas dynamically and centralizes JSON-schema checks for many metadata fields.
  • Chore

    • Upgraded a JSON validation dependency to a newer version series.
  • Tests

    • Fixtures and specs adjusted to array-based metadata shapes and stricter validation/error expectations.

svogt0511 self-assigned this Oct 30, 2025
…led vocabulary term because checking happens prior to save and the API adds null as the value.

coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/schemas/doi/title.json`:
- Around line 6-8: The "title" schema currently only enforces type:string so
empty or whitespace-only titles are allowed; update the "title" property (in
app/models/schemas/doi/title.json) to require non-empty, non-blank values by
adding constraints such as "minLength": 1 and a "pattern" that requires at least
one non-whitespace character (e.g., \\S), and ensure the existing required
declaration that references "title" still applies; apply the same constraints
for any other title occurrence mentioned in the schema.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c82f3ce-0cef-4020-b4ee-a276dd35c1ef

📥 Commits

Reviewing files that changed from the base of the PR and between 30cf06e and 13e2288.

📒 Files selected for processing (12)
  • app/models/schemas/doi/controlled_vocabularies/contributor_type.json
  • app/models/schemas/doi/controlled_vocabularies/date_type.json
  • app/models/schemas/doi/controlled_vocabularies/description_type.json
  • app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/name_type.json
  • app/models/schemas/doi/controlled_vocabularies/number_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_item_type.json
  • app/models/schemas/doi/controlled_vocabularies/relation_type.json
  • app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
  • app/models/schemas/doi/controlled_vocabularies/title_type.json
  • app/models/schemas/doi/title.json
🚧 Files skipped from review as they are similar to previous changes (8)
  • app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
  • app/models/schemas/doi/controlled_vocabularies/title_type.json
  • app/models/schemas/doi/controlled_vocabularies/number_type.json
  • app/models/schemas/doi/controlled_vocabularies/date_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/relation_type.json
  • app/models/schemas/doi/controlled_vocabularies/contributor_type.json

Comment on lines +6 to +8
"title": {
"type": "string"
},
⚠️ Potential issue | 🟠 Major

Disallow empty/blank required titles.

Lines 6-8 only enforce type: "string", so "" (or whitespace-only values) will still pass even though lines 17-19 mark title as required. Please add content constraints.

Suggested schema patch
     "title": { 
-      "type": "string"
+      "type": "string",
+      "minLength": 1,
+      "pattern": ".*\\S.*"
     },

Also applies to: 17-19
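
The effect of the suggested constraints can be checked with a small Ruby sketch. It assumes a schema fragment shaped like the patch above and hand-rolls only the `type`, `minLength`, and `pattern` keywords using the standard library. `TITLE_SCHEMA` and `title_valid?` are hypothetical names, and this is not how the PR's validator is wired up.

```ruby
require "json"

# Hypothetical fragment mirroring the suggested patch; not the actual file contents.
TITLE_SCHEMA = JSON.parse(<<~'SCHEMA_JSON')
  {
    "type": "string",
    "minLength": 1,
    "pattern": ".*\\S.*"
  }
SCHEMA_JSON

# Hand-rolled check of only the three keywords above (type, minLength, pattern).
# Assumption: the application would delegate this to a JSON Schema validator gem.
def title_valid?(value, schema = TITLE_SCHEMA)
  return false unless value.is_a?(String)
  return false if value.length < schema.fetch("minLength", 0)

  pattern = schema["pattern"]
  # JSON Schema patterns are unanchored, so a plain search is the right semantics.
  pattern.nil? || Regexp.new(pattern).match?(value)
end

["", "   ", " \t\n", "A title"].each do |candidate|
  puts "#{candidate.inspect}: #{title_valid?(candidate)}"
end
```

Because JSON Schema patterns are not implicitly anchored, `\\S` alone would accept the same strings as `.*\\S.*`. Any string that satisfies the pattern already has at least one character, so `minLength` here mainly documents intent.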

svogt0511 added 27 commits March 5, 2026 13:12
…'uri' for uris, use long form of dependentRequired (if)
…, xml xs:anyURI, dependencyRequired (documentation 'suggestions' vs xsd.
…, xml xs:anyURI, dependencyRequired (documentation 'suggestions' vs xsd. Also, just use default minitems in the array of objects instead of explicitly specifying minitems: 0
…1 date ranges, and the standard vocab for unknown information do validate date fields (as is in our documentation).
…ts.json. (Breaks spec/requests/repositories_spec.rb:458 otherwise).

4 participants