
Revise the helm chart and update the poetry install lists #24

Open
xianglic wants to merge 5 commits into master from server_ver_update

Conversation

@xianglic
Collaborator

Revise the helm chart to make it compatible with sinphonia

Update the poetry pyproject.toml, bumping ultralytics from ^8.0.1 to ^8.0.145 so that the new yolov5 model can be used.
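A sketch of what the corresponding pyproject.toml dependency entry might look like (the surrounding table layout is assumed, not taken from the diff):

```toml
[tool.poetry.dependencies]
ultralytics = "^8.0.145"  # bumped from ^8.0.1 for the new yolov5 model
```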

@teiszler
Contributor

The changes to the Dockerfile and pyproject.toml look fine. @jaharkes and/or @jblakley, can you review the helm chart changes? It mostly seems like removal of the namespace tag from every chart.

@jblakley
Contributor

jblakley commented Aug 15, 2023 via email

@jaharkes
Member

jaharkes commented Aug 15, 2023 via email

@jblakley
Contributor

jblakley commented Aug 15, 2023 via email

@xianglic
Collaborator Author

xianglic commented Aug 15, 2023

Hi Jan and Jim,

Sorry for the confusion about the hard-coded lines. Those are the lines with vm039.elijah in the init container section of the object engine deployment file. They are for downloading ultralytics model frames and models from the edge cloudlet (vm039.elijah) I am running.

For model frames, in this case the master.zip file under the prepare-model-2 section, I added that just for testing the compatibility of different versions of model frames with different models and ultralytics library versions. This whole section could be deleted entirely later on.

For models, in this case coco.pt under the prepare-model-1 section, I think the openscout object engine expects models like coco.pt and robomaster.pt to be in the /openscout-server/models dir at startup. Therefore, I have the init container download a model that I have exposed from the edge cloudlet. Users can change the model URL specified in the values.yml file if they need other models.

Ant

@jaharkes
Member

What do you mean by "helm can define it"? The pre-PR method is to specify namespace in values.yaml. How is it set post-PR?

`helm install --namespace ...`, and you can even add a `--create-namespace` flag to have helm create the namespace if it does not yet exist.

@jaharkes
Member

It may be useful to have a generic model like coco.pt as a default in the Docker image; basically, anything we download anyway when the container is started should probably be part of the container. If it is part of the image, it is effectively cached between invocations, but if we have to download anything on startup we always have to wait for that download to complete. If the download is small it is cheap, but so is adding it to the container. And if it is large, having it pre-cached is better than having to re-download it every time.

@xianglic xianglic closed this Aug 15, 2023
@xianglic xianglic deleted the server_ver_update branch August 15, 2023 16:29
@xianglic xianglic restored the server_ver_update branch August 15, 2023 16:29
@xianglic
Collaborator Author

I see. I'll try to bundle a default model in the image, and maybe add a condition so that if no specific URL is defined by the user, openscout loads that default model.
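A minimal sketch of that fallback logic (hypothetical function and environment variable names, not the actual openscout code):

```python
import os
from typing import Optional

# Hypothetical default bundled with the image; path per the discussion above.
DEFAULT_MODEL = "/openscout-server/models/coco.pt"

def resolve_model(url: Optional[str] = None) -> str:
    """Return the user-specified model URL if one is configured,
    otherwise fall back to the default model baked into the image."""
    url = url or os.environ.get("MODEL_URL")
    return url if url else DEFAULT_MODEL
```

The same pattern would work whether the override comes from a helm values entry, an environment variable, or a command-line flag.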

@xianglic xianglic reopened this Aug 15, 2023
@jblakley
Contributor

jblakley commented Aug 15, 2023

I tested the basic object detection function of the PR on the ARM JIT Cloudlet. I did not run an ELK breakage test, but those containers deployed OK. Object detection is working.

The deployment, however, is going into default namespace rather than the openscout namespace. I'm assuming that the intent of the change is to force use of the --namespace option on the helm command line. If so, I don't like that since it creates a more complex deployment process and breaks the k8s/helm declarative model -- i.e., the deployment should be defined by the yaml files as much as possible. Or, am I missing something and there is a declarative (non-command line) way of specifying namespace?

Regardless, the PR broke my deployment by putting it in the wrong namespace.

A few other things I noticed:

  • The GPU resource usage declaration was removed from server/charts/openscout/templates/openscout-object-engine.yaml. While it was commented out, I have used this as a deployment-time option that gets uncommented on a GPU machine. Admittedly, that is the wrong way to do it, but I've always intended to go back and put in a values.yaml option to select if and how many GPUs.
          #resources:
          #  limits:
          #    nvidia.com/gpu: 1
  • The arm docker build is failing with error: Unable to find installation candidates for nvidia-cuda-nvrtc-cu11. I tried backing the base image out from nvidia/cuda:11.5.2-cudnn8-devel-ubuntu18.04 to nvidia/cuda:11.5.1-cudnn8-devel-ubuntu18.04, but that image seems to be gone now! This is also a problem for the current master branch.
  • The default values.yaml file is pointing to xianglic/openscout image. Why can't this be cmusatyalab/openscout since the arm images are there?
  • The comment on the vm039 hardcoding has already been noted. As a longer-term thing, it seems like the models and the packages needed for models (e.g., ultralytics) need a better solution.
  • On the positive side, I like how you cleaned up the volume mounts and the entrypoint arguments.

@xianglic
Collaborator Author

xianglic commented Aug 15, 2023

Hi Jim,

I'm not sure about the namespace either. I'll take a look at it and keep you updated.

For the things that you noted:

  • I'll put the GPU specification back using an if-defined condition.

  • Yes, nvidia/cuda:11.5.1-cudnn8-devel-ubuntu18.04 no longer exists on Docker Hub (but I believe Tom has that version cached in his local Docker registry). I don't know exactly how the ARM build fails; I'll look into it. But based on my build experience with 11.5.2, could you try using sudo docker build?

  • I have rebuilt the docker image for openscout just to update the version of the ultralytics lib, as the older version is no longer compatible with the updated yolov5 model frame. It is currently in my repo just for testing purposes. The image will be pushed to cmusatyalab/openscout once it tests out okay, and I'll change the address that values.yaml points to.

  • I'll revise the hard-coded lines following the advice from Jan.
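The if-defined GPU condition in the first item above might be sketched as follows in the object engine template (the gpu.enabled / gpu.count values keys are hypothetical, not from this PR):

```yaml
# openscout-object-engine.yaml (sketch; hypothetical values.yaml keys)
{{- if .Values.gpu.enabled }}
          resources:
            limits:
              nvidia.com/gpu: {{ .Values.gpu.count | default 1 }}
{{- end }}
```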

Ant

@jblakley
Contributor

Anthony, that sounds good. I will look at creating a helmfile to allow namespace declaration. Once @jaharkes has rebuilt the ARM container, I can test it out in the JIT Cloudlet.

Comment thread git_diff.txt Outdated
Comment thread server/charts/openscout/index.yaml Outdated
Comment thread server/charts/openscout/openscout-1.0.0.tgz Outdated
name: openscout
repository: cmusatyalab/openscout
tag: latest
repository: xianglic/openscout
Member


We should not refer to a non-cmusatyalab docker registry here

Comment thread server/charts/openscout/values.yaml Outdated
Comment thread server/charts/openscout/values.yaml Outdated
Comment thread server/charts/openscout/templates/openscout-object-engine.yaml Outdated
@jblakley
Contributor

I looked at helmfile this morning and it looks like it can easily serve as the mechanism for customizing the namespace outside of the charts themselves. For example, this helmfile.yaml worked for me to deploy openscout and openrtist in their own namespaces:

values:
- openscout_namespace: openscout1
- openrtist_namespace: openrtist1
releases:
- name: openscout
  namespace: {{ .Values.openscout_namespace }}
  chart: ./openscout
  installed: true
  values:
  - openscout/values_arm64.yaml
- name: openrtist
  namespace: {{ .Values.openrtist_namespace }}
  chart: ./openrtist
  installed: true
  values:
  - openrtist/values.yaml

To deploy this, it becomes simply:

helmfile sync

To change the namespace on the command line:

helmfile sync --state-values-set openscout_namespace=my_openscout_namespace

With this capability, I'm comfortable removing the namespace definition from all of the charts. Up to you whether you want to include a default helmfile.yaml in the repository.

Anthony Chen added 3 commits August 16, 2023 17:06
…te the prepare model2 init container; changed the prepare model 1 init container based on the rebuilt docker image; reposition the model url, name specification in the values.yml file
…models dir with models (coco.pt; robomaster.pt, and yolov5m) in the docker image; commented out the volume mount of ./models in the docker-compose file since we have now have one in the image itself
@xianglic
Collaborator Author

xianglic commented Aug 16, 2023

Hi Jan, Jim, Tom,

I have revised several things as follows:

For the docker aspect:

  1. Rebuilt the docker image, bringing the models dir into the container under the openscout-server dir (note: there are currently three models in the ./models dir: coco.pt, robomaster.pt, and yolov5m. I'll trim them down to one default based on what you want to keep in the future)
  2. Deleted ./models from the dockerignore file
  3. Commented out the ./models volume mount for the object engine service

For the helm chart aspect:

  1. Brought back the GPU specification
  2. Deleted the prepare model 2 init container
  3. Deleted the modeldir and cachedir volume mounts in the object engine deployment file, as we already have ./models in the rebuilt image and we no longer need cachedir for the prepare model 2 init container
  4. Changed the nodeport back for the httpd service in values.yml (I have overridden the cluster IP for this in recipes for sinphonia usage)
  5. Commented out the url under model, and repositioned the model section under images.objectengine as I presume it is more logically related there
  6. Deleted the tgz and index files

For other aspect:

  1. Deleted the git diff file

@jblakley
Contributor

My main remaining issue is being able to build a working ARM container image. I pushed the old base image to my dockerhub account so we don't lose the ability to recreate the current version. But we still need a working updated version.

I don't have strong input on the implementation of the model folders. But, here are some thoughts on design requirements:

  1. Should be a default "hello world" model bundled in the image. (e.g., coco.pt)
  2. Should be a mounted folder where additional models can be provisioned from the host at deployment/runtime.
  3. Fetching models at runtime from torch.hub or other central/public repositories should be deprecated. Specific required models should be loaded into the mounted folder by the user.
  4. Documentation to access commonly used models (e.g., yolov5, yolov8) should be included in the README
  5. I'm not sure of the best way to handle packages required to access specific models (e.g., ultralytics). On one hand, pre-installing all of them seems difficult and unbounded. On the other, runtime installation into the container seems bad.

These are certainly debatable requirements. They assume a paradigm of an evolving and growing number of models as the cognitive engines evolve. Another paradigm assumes that we lock things down to a specific set of models and packages and leave extensions to a developer. Given the challenge of #5, that may be the only option anyway. This seems somewhat tied to the openscout approach of modular cognitive engines. If I'm a developer adding a new engine, what's the process?

@jaharkes
Member

My main remaining issue is being able to build a working ARM container image. .... But, need to have a working updated version.

I have a somewhat ugly but working fix for that. When I tried to do it the official way by adding the download.pytorch.org repositories, it was still unable to find the right aarch64 torch wheels, so I just hardcoded the download urls for different python releases.

poetry lock is taking about 10-20 minutes to pin down the right package versions though, so right now I'm trying to see what is causing that and will probably tighten up some of the versions (or add some additional restrictions). It seems like the dependency resolver is cycling on botocore and roboflow, which are indirect dependencies.
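Tightening those indirect dependencies might look like the following pyproject.toml fragment (the version bounds here are purely illustrative assumptions, not the ones actually chosen):

```toml
[tool.poetry.dependencies]
# Hypothetical pins to keep the resolver from backtracking through
# every botocore/roboflow release (both are indirect dependencies)
botocore = ">=1.29,<2.0"
roboflow = ">=1.0"
```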

I don't have strong input on the implementation of the model folders. But, here are some thoughts on design requirements:

Complete agreement on the models folder: just include one example model, hopefully a relatively small one, but generic enough that people can test their openscout client/backend with common objects (person/keyboard/tv), which I guess would be the coco.pt model.

3. Fetching models at runtime from torch.hub or other central/public repositories should be deprecated. Specific required models should be loaded into the mounted folder by the user.

Actually there are 3 possible methods, each with their own advantages and disadvantages.

  • Include the model in the Docker image. People can create their own container with a tiny dockerfile that is just 'FROM cmusatyalab/openscout COPY mymodel.pt /openscout/models', publish that container image, and then deploy with the right helm values override to install their own container.
    Advantage: Automatically benefits from docker caching; it just adds the minimal possible layer and will persist on the host node between executions.
    Disadvantage: Need to build/publish a customized container.
  • Specify one or more urls to fetch from with helm values override.
    Advantage: Very easy to add custom models.
    Disadvantage: Models need to be refetched every single time the container is restarted/redeployed. Models need to be (publicly?) available for download.
  • Expose local directory as a mounted volume.
    Advantage: Easy if you already have models on your host, avoids duplications.
    Disadvantage: Will never work in an actual cloudlet environment, because you simply don't get to choose which cloudlet to deploy to, and even then you don't get access to the cloudlet's host environment to copy the models over.
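The first method above might look like this tiny Dockerfile (the model filename and image tag are hypothetical; the destination path follows the one-line dockerfile quoted above):

```dockerfile
# Layer a custom model on top of the published openscout image (sketch)
FROM cmusatyalab/openscout:latest
COPY mymodel.pt /openscout/models/mymodel.pt
```

Publishing this image and overriding the image repository/tag in the helm values would then deploy the customized container.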
