[#73373] Background job for migrating pre-existing data to semantic identifiers#22566

Open
thykel wants to merge 135 commits into dev from poc/71645-convert-to-wp-semantic-ids

Conversation

@thykel
Contributor

@thykel thykel commented Mar 29, 2026

Ticket

https://community.openproject.org/projects/communicator-stream/work_packages/71645/activity

What are you trying to accomplish?

Add a background procedure to convert project & work package identifiers from classic (project cool_project, WP 1) to semantic (project CP, WP CP-1).

The procedure is invoked by switching to the semantic ("alphanumeric") identifier format via the admin settings.

Screenshots

What approach did you choose and why?

The original noop background job is now replaced by ProjectIdentifiers::ConvertInstanceToSemanticIdsJob. This is the flow:

  1. Click the admin button to switch identifier mode to semantic.
  2. Invoke ProjectIdentifiers::ConvertInstanceToSemanticIdsJob
  3. If the database does not require any identifier transformation (e.g. proj => PROJ), simply set the identifier mode to semantic and exit with success. Otherwise, continue.
  4. Spawn a batch of per-project jobs that fix the project identifier & populate work package identifiers (ProjectIdentifiers::BackfillProjectJob)
  5. Once the batch has finished processing, go back to step 2. If this has already happened 10 times, abort the job, as we're likely in an infinite loop.
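
The loop in the steps above can be sketched in plain Ruby, without Rails or GoodJob. This is only an illustration of the bounded re-check idea: `convert_instance`, `ConversionAborted`, `MAX_ITERATIONS` and the hash-based project store are hypothetical stand-ins, not the classes added in this PR.

```ruby
# Plain-Ruby sketch of the bounded retry loop; all names are hypothetical.
MAX_ITERATIONS = 10

class ConversionAborted < StandardError; end

# projects: Hash of project id => true while the project still needs a backfill
# backfill: callable that processes one batch of project ids
def convert_instance(projects, backfill, iteration: 0)
  remaining = projects.select { |_id, pending| pending }.keys
  return :semantic if remaining.empty? # nothing left to transform: flip the mode

  if iteration >= MAX_ITERATIONS
    raise ConversionAborted, "#{remaining.size} project(s) still pending after #{iteration} passes"
  end

  backfill.call(projects, remaining) # stands in for the per-project batch
  convert_instance(projects, backfill, iteration: iteration + 1) # re-check afterwards
end

projects = { 1 => true, 2 => true, 3 => false }
fix_all = ->(all, ids) { ids.each { |id| all[id] = false } }
puts convert_instance(projects, fix_all)
```

A backfill that never makes progress would be attempted ten times in this sketch before `ConversionAborted` is raised, which mirrors the abort condition in step 5.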

Additionally, tweak the interface of the admin setting controller to have some proper shape -- previously it required presence of settings hash + a separate confirm_dangerous_action param, now it only needs a properly set settings hash.

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

Contributor

@judithroth judithroth left a comment

This is coming along nicely! I'm sorry I found another edge case that keeps me from approving 😅

# or by GoodJob as an on_success batch callback with (batch, params).
def perform(_batch = nil, params = nil)
  iteration = params.to_h.with_indifferent_access.fetch(:iteration, 0).to_i
  remaining = project_ids_needing_backfill
Contributor

Okay, I needed a moment to understand what's going on here with the initial dispatch or callback 😅
You coupled these to catch all work packages and projects that were created while the job was running. That's hard to understand for someone not so deep into the topic and I am not sure how well this will work if we really have to iterate over all projects / work packages to catch the moves in the switch back-and-forth scenario.
I would prefer to handle those separately: in the case of "the work package / project was created while the conversion migration was already running" we really can check for empty identifier values; in the case of "initial dispatch" we cannot. An extra "check and cleanup" method after the setting was flipped should be fast enough and easy to understand, no?

Contributor Author

Fair enough -- I will run a proper batched parallel processing for the first pass, and then just invoke a quick synchronous scanner in a dedicated final batch job.
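
A rough plain-Ruby sketch of that split, assuming work packages created during the conversion can be recognised by an empty identifier (as suggested in the review comment). `WorkPackageRow`, `first_pass` and `final_sweep` are hypothetical names for illustration, and threads stand in for the parallel batch jobs.

```ruby
# Hypothetical in-memory stand-in for work packages; identifier is nil until backfilled.
WorkPackageRow = Struct.new(:id, :project_prefix, :sequence, :identifier)

def assign_identifier(row)
  row.identifier ||= "#{row.project_prefix}-#{row.sequence}"
end

# Phase 1: process the snapshot taken at dispatch time in parallel slices.
def first_pass(snapshot, slice_size: 2)
  threads = snapshot.each_slice(slice_size).map do |slice|
    Thread.new { slice.each { |row| assign_identifier(row) } }
  end
  threads.each(&:join)
end

# Phase 2: one synchronous sweep over rows that still have no identifier.
# Returns the rows it had to fix.
def final_sweep(rows)
  rows.select { |row| row.identifier.nil? }.each { |row| assign_identifier(row) }
end

rows = [
  WorkPackageRow.new(1, "CP", 1, nil),
  WorkPackageRow.new(2, "CP", 2, nil),
  WorkPackageRow.new(3, "CP", 3, nil)
]
snapshot = rows.dup
rows << WorkPackageRow.new(4, "CP", 4, nil) # created while the first pass was running

first_pass(snapshot)
late_rows = final_sweep(rows)
puts late_rows.map(&:identifier).inspect
```

The sweep only touches rows the first pass could not have seen, so it should stay cheap as long as few records are created during the conversion.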

@thykel thykel requested a review from akabiru April 9, 2026 09:28
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

app/models/work_package/semantic_identifier.rb:103

  • The removal of private makes alias_rows_for_sequence_number a public instance method on WorkPackage, which looks like an internal helper used only by allocate_and_register_semantic_id. Consider restoring private (or marking just this method private) to avoid accidentally expanding the public API surface.

  # Builds alias rows for every identifier this project has ever used at the given sequence (including the current one).
  # This also includes "ghost identifiers" -- i.e. those that weren't ever actually generated, but should work
  # as a historical alias (e.g. OLDPROJ-42 should work even if WP #42 was created after rename to NEWPROJ)
  def alias_rows_for_sequence_number(seq)
    project.slugs
           .pluck(:slug)
           .map { |prefix| { identifier: "#{prefix}-#{seq}", work_package_id: id } }
  end
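
To illustrate the suggestion, here is a minimal sketch outside Rails. `WorkPackageSketch` and `aliases_for` are hypothetical, and a plain array of prefixes replaces `project.slugs.pluck(:slug)`; the point is only that the helper keeps working for internal callers once it sits below `private`.

```ruby
# Stand-in for WorkPackage; the slugs array replaces project.slugs.pluck(:slug).
class WorkPackageSketch
  attr_reader :id

  def initialize(id, slugs)
    @id = id
    @slugs = slugs
  end

  # Public entry point that uses the helper internally.
  def aliases_for(seq)
    alias_rows_for_sequence_number(seq).map { |row| row[:identifier] }
  end

  private

  # Same row shape as in the snippet above, but no longer part of the public API.
  def alias_rows_for_sequence_number(seq)
    @slugs.map { |prefix| { identifier: "#{prefix}-#{seq}", work_package_id: id } }
  end
end

wp = WorkPackageSketch.new(42, %w[OLDPROJ NEWPROJ])
puts wp.aliases_for(42).inspect
puts wp.respond_to?(:alias_rows_for_sequence_number)
```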

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

4 participants