Reexamine Nexus image pull process #4917

@rudolphjacksonm

Is your feature request related to a problem? Please describe.
Pipeline runs fail with a 429 TOOMANYREQUESTS error whenever we hit the Docker Hub rate limit. I've seen this happen several times over the past week, and each time it blocks all development and CI/CD pipeline runs for 6 hours, until the limit expires. Here's an example.

We need to rethink how often we pull images from Docker Hub into ACR, and whether to run a preflight check that the image is already in ACR (and that its SHA matches). As far as I can see only the Nexus image is affected at present, but a good solution should provide a standard procedure for any public image we pull.
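A rough sketch of what the SHA part of such a preflight check could look like, comparing the digest stored in ACR with the one Docker Hub currently serves for the same tag. The function names (`acr_digest`, `hub_digest`, `needs_import`) are hypothetical and not part of the repo. It assumes that `az acr import` preserves the manifest digest, and that the Hub digest is fetched with a HEAD request, which to my understanding does not count against the pull rate limit; both points should be verified before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these functions exist in the repo today.

# Print the digest of image:tag in ACR, or nothing if the tag is absent.
acr_digest() {
  local acr="$1" image="$2" tag="$3"
  az acr repository show --name "$acr" --image "${image}:${tag}" \
    --query digest --output tsv 2>/dev/null || true
}

# Print the digest Docker Hub serves for repo:tag, using an anonymous token
# and a HEAD request on the manifest URL (no image data is downloaded).
hub_digest() {
  local repo="$1" tag="$2" token
  token="$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" \
    | sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')"
  curl -fsSI \
    -H "Authorization: Bearer ${token}" \
    -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.index.v1+json, application/vnd.docker.distribution.manifest.v2+json, application/vnd.oci.image.manifest.v1+json" \
    "https://registry-1.docker.io/v2/${repo}/manifests/${tag}" \
    | tr -d '\r' \
    | awk -F': ' 'tolower($1) == "docker-content-digest" { print $2; exit }'
}

# Return 0 when an import is needed, 1 when the ACR copy can be reused.
needs_import() {
  local acr="$1" image="$2" hub_repo="$3" tag="$4" local_digest remote_digest
  local_digest="$(acr_digest "$acr" "$image" "$tag")"
  [ -n "$local_digest" ] || return 0          # tag not in ACR yet: import
  remote_digest="$(hub_digest "$hub_repo" "$tag")"
  [ -n "$remote_digest" ] || return 1         # Hub unreachable: keep the cached copy
  [ "$local_digest" != "$remote_digest" ]     # import only if the digests differ
}
```

For the Nexus image this might be called as `needs_import "${ACR_NAME}" sonatype/nexus3 sonatype/nexus3 3.77.2`. Keeping the cached copy when Docker Hub cannot be reached is a design choice in this sketch, made so that a rate-limited or offline Hub does not block a publish.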

At present the flow is:

  1. GitHub Actions publishes shared bundles. The Nexus bundle is included in .github/workflows/deploy_tre_reusable.yml at templates/shared_services/sonatype-nexus-vm/, and the job runs: make bundle-build bundle-publish DIR=${{ matrix.BUNDLE_DIR }}
  2. make bundle-build calls devops/scripts/bundle_runtime_image_build.sh:8.
  3. The Nexus Porter file declares an imported runtime image:
templates/shared_services/sonatype-nexus-vm/porter.yaml:9
     custom:
       runtime_image:
         name: sonatype/nexus3
         import:
           source: docker.io/sonatype/nexus3
           tag: "3.77.2"
  4. Because that import block exists, bundle_runtime_image_build.sh runs:
devops/scripts/bundle_runtime_image_build.sh:14
     az acr import --name "${ACR_NAME}" \
       --source "${source_image}:${version}" \
       --image "${image_name}:${version}" \
       --force

At deploy time, Terraform creates an Ubuntu VM. The VM uses a normal Canonical marketplace image in templates/shared_services/sonatype-nexus-vm/terraform/vm.tf:123. Cloud-init then runs /shared_services/sonatype-nexus-vm/scripts/deploy_nexus_container.sh:7, which pulls: ${ACR_NAME}.azurecr.io/sonatype/nexus3:${NEXUS_IMAGE_TAG}.

Describe the solution you'd like
For our pipelines, implement a cache-hit mechanism: check ACR before pulling from Docker Hub, and skip the pull for any image that already exists in ACR. This would limit Docker Hub pulls to the images that are actually missing at build time.

At present, if the container image import worked, the VM pulls from ACR. It only falls back to Docker Hub if ACR_NAME is empty. Before az acr import is run in devops/scripts/bundle_runtime_image_build.sh:14, we should add a check to see whether the relevant Sonatype Nexus image exists (e.g. sonatype/nexus3:3.77.2) in the target ACR, then skip the import if the image is present:

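# Skip the import when the tag is already in ACR. stderr is silenced because
# show-tags returns an error when the repository does not exist yet.
if az acr repository show-tags \
  --name "${ACR_NAME}" \
  --repository "${image_name}" \
  --query "[?@=='${version}'] | [0]" \
  --output tsv 2>/dev/null | grep -qx -- "${version}"; then
  echo "Image ${image_name}:${version} already exists in ACR ${ACR_NAME}; skipping import"
  exit 0
fi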

That would avoid hitting Docker Hub on every publish when the image is already cached in ACR.
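To turn this into a standard procedure for any public image, the check and the import could be wrapped in a function. The following is a minimal sketch, not the repo's actual code: `acr_has_tag` and `import_if_missing` are hypothetical names, and the `az` calls are the same ones already shown in this issue. Returning from a function instead of calling `exit 0` means the calling script can carry on with later steps, or loop over several images.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# Return 0 if image:tag is already present in the given ACR.
acr_has_tag() {
  local acr="$1" image="$2" tag="$3" found
  # A missing repository makes show-tags fail; treat that as "not present".
  found="$(az acr repository show-tags \
    --name "$acr" \
    --repository "$image" \
    --query "[?@=='${tag}'] | [0]" \
    --output tsv 2>/dev/null || true)"
  [ "$found" = "$tag" ]
}

# Import source:tag into ACR as image:tag unless it is already there.
import_if_missing() {
  local acr="$1" source="$2" image="$3" tag="$4"
  if acr_has_tag "$acr" "$image" "$tag"; then
    echo "Image ${image}:${tag} already exists in ACR ${acr}; skipping import"
    return 0
  fi
  echo "Importing ${source}:${tag} into ACR ${acr}"
  az acr import --name "$acr" \
    --source "${source}:${tag}" \
    --image "${image}:${tag}" \
    --force
}
```

In bundle_runtime_image_build.sh this might be called as `import_if_missing "${ACR_NAME}" "${source_image}" "${image_name}" "${version}"`. Note that a tag-only check will not notice if the upstream tag is later re-pushed with different content.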

Another thing to consider, in templates/shared_services/sonatype-nexus-vm/scripts/deploy_nexus_container.sh:48: after the ACR login but before docker pull, run docker manifest inspect "$NEXUS_IMAGE" and fail with a clear “missing from ACR” message if the image is not there.
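A minimal sketch of that deploy-time check, assuming the Docker CLI on the VM supports `docker manifest inspect` and that the ACR login has already happened. The function name `require_image_in_acr` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper for deploy_nexus_container.sh.

# Fail fast, with an explicit message, when the image cannot be found in the registry.
require_image_in_acr() {
  local image="$1"
  # manifest inspect only queries the registry; it does not download layers.
  if ! docker manifest inspect "$image" >/dev/null 2>&1; then
    echo "ERROR: ${image} is missing from ACR (or not readable with the current login)." >&2
    echo "Re-run the bundle publish step so the image is imported, then redeploy." >&2
    return 1
  fi
}
```

In the deploy script this might be used as `require_image_in_acr "$NEXUS_IMAGE" || exit 1` just before the existing docker pull. The check cannot tell a missing image apart from an authentication failure, which is why the message mentions both.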

Describe alternatives you've considered
I haven't considered any alternatives other than the above just yet, happy to be challenged.

