Is your feature request related to a problem? Please describe.
Pipeline runs fail with a 429: TOOMANYREQUESTS error whenever we hit the Docker Hub rate limit. I've seen this several times over the past week, and each occurrence blocks all development and CI/CD pipeline runs for 6 hours, until the limit expires. Here's an example.
We need to rethink how often we pull images from Docker Hub into ACR, and whether to add a preflight check that the image is already in ACR (and that its SHA matches). At present the Nexus image is the only one I can see, but a good solution should provide a standard procedure for any public image we pull.
At present the flow is:
- GitHub Actions publishes shared bundles. The Nexus bundle is included in .github/workflows/deploy_tre_reusable.yml at templates/shared_services/sonatype-nexus-vm/, and the job runs:
  make bundle-build bundle-publish DIR=${{ matrix.BUNDLE_DIR }}
- make bundle-build calls devops/scripts/bundle_runtime_image_build.sh:8.
- The Nexus Porter file declares an imported runtime image:
  templates/shared_services/sonatype-nexus-vm/porter.yaml:9
    custom:
      runtime_image:
        name: sonatype/nexus3
        import:
          source: docker.io/sonatype/nexus3
          tag: "3.77.2"
- Because that import block exists, bundle_runtime_image_build.sh runs:
  devops/scripts/bundle_runtime_image_build.sh:14
    az acr import --name "${ACR_NAME}" \
      --source "${source_image}:${version}" \
      --image "${image_name}:${version}" \
      --force
- At deploy time, Terraform creates an Ubuntu VM. The VM uses a normal Canonical marketplace image in templates/shared_services/sonatype-nexus-vm/terraform/vm.tf:123. Cloud-init then runs /shared_services/sonatype-nexus-vm/scripts/deploy_nexus_container.sh:7, which pulls ${ACR_NAME}.azurecr.io/sonatype/nexus3:${NEXUS_IMAGE_TAG}.
Describe the solution you'd like
For our pipelines, implement a cache-hit check: before importing an image from Docker Hub, check whether it already exists in ACR and skip the import if it does. This would limit Docker Hub pulls to images that are actually missing at build time.
At present, if the container image import worked, the VM pulls from ACR. It only falls back to Docker Hub if ACR_NAME is empty. Before az acr import is run in devops/scripts/bundle_runtime_image_build.sh:14, we should add a check to see whether the relevant Sonatype Nexus image exists (e.g. sonatype/nexus3:3.77.2) in the target ACR, then skip the import if the image is present:
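# Skip the import when the tag is already present in the target ACR.
# show-tags fails if the repository does not exist yet; treat that as "not cached".
existing_tag="$(az acr repository show-tags \
  --name "${ACR_NAME}" \
  --repository "${image_name}" \
  --query "[?@=='${version}'] | [0]" \
  --output tsv 2>/dev/null || true)"
if [ "${existing_tag}" = "${version}" ]; then
  echo "Image ${image_name}:${version} already exists in ACR ${ACR_NAME}; skipping import"
  exit 0
fi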
That would avoid hitting Docker Hub on every publish when the image is already cached in ACR.
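To make this a standard procedure for any public image rather than a Nexus-only check, the same idea could be wrapped in a reusable function. The following is a minimal sketch, not existing repo code: image_in_acr and import_if_missing are hypothetical names, and their arguments mirror the variables used in bundle_runtime_image_build.sh. It returns instead of exiting, so the calling script can carry on with any later steps.

```shell
# Sketch only: image_in_acr and import_if_missing are hypothetical helpers.

# Returns 0 if <repository>:<tag> is already present in the given ACR.
image_in_acr() {
  local acr_name="$1" repository="$2" tag="$3"
  local tags
  # show-tags exits non-zero when the repository does not exist yet;
  # treat that the same as "tag not found".
  tags="$(az acr repository show-tags \
    --name "${acr_name}" \
    --repository "${repository}" \
    --output tsv 2>/dev/null)" || return 1
  # -F: literal match (dots are not wildcards), -x: whole line, -q: quiet.
  printf '%s\n' "${tags}" | grep -Fqx -- "${tag}"
}

# Imports <source_image>:<tag> into ACR as <repository>:<tag>
# unless that tag is already there.
import_if_missing() {
  local acr_name="$1" source_image="$2" repository="$3" tag="$4"
  if image_in_acr "${acr_name}" "${repository}" "${tag}"; then
    echo "Image ${repository}:${tag} already exists in ACR ${acr_name}; skipping import"
    return 0
  fi
  echo "Importing ${source_image}:${tag} into ACR ${acr_name}"
  az acr import --name "${acr_name}" \
    --source "${source_image}:${tag}" \
    --image "${repository}:${tag}" \
    --force
}
```

A call for the Nexus image might look like: import_if_missing "${ACR_NAME}" docker.io/sonatype/nexus3 sonatype/nexus3 3.77.2. Note that this only compares tags; confirming that the SHA also matches would need an additional digest lookup.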
Another thing to consider is in templates/shared_services/sonatype-nexus-vm/scripts/deploy_nexus_container.sh:48: after ACR login but before docker pull, run docker manifest inspect "$NEXUS_IMAGE" and, if the image is not found, fail with a clear “missing from ACR” message.
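As a rough illustration of that deploy-time preflight, the check could look something like the sketch below. require_image_in_acr is a hypothetical function name and the error wording is only a suggestion; it also assumes the Docker CLI on the VM is recent enough that docker manifest inspect works without enabling experimental features.

```shell
# Sketch only: require_image_in_acr is a hypothetical helper.
# Fails early with a clear message when the image cannot be found in the registry.
require_image_in_acr() {
  local image="$1"
  # manifest inspect queries the registry without pulling any layers.
  if ! docker manifest inspect "${image}" >/dev/null 2>&1; then
    echo "ERROR: ${image} is missing from ACR; publish the bundle so the image is imported before deploying." >&2
    return 1
  fi
}
```

In deploy_nexus_container.sh this would sit between the ACR login and the docker pull, for example: require_image_in_acr "${NEXUS_IMAGE}" || exit 1. One limitation is that an authentication or network failure would produce the same message as a genuinely missing image.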
Describe alternatives you've considered
I haven't considered any alternatives other than the above just yet; happy to be challenged.
Additional context
Add any other context or screenshots about the feature request here.