Stacking Training

This is an ensemble training benchmark consisting of four functions:

The Driver orchestrates the entire flow. It uploads the dataset for the trainers and the metatrainer, and collects the final models.
The Trainers each fit one model (tested with 4 and 16 trainers, both sequentially and in parallel).
The Reducer collects the models and predictions from each trainer.
The Metatrainer trains the second layer on top of the layer of trained models, finalizing the 2-layer model.
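
The flow of the four functions can be sketched in plain Python as follows. This is a minimal illustration, not the benchmark's implementation: the toy least-squares models, the slicing of the dataset between trainers, the averaging of first-layer predictions, and all function signatures are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in points) / var
    return a, y_bar - a * x_bar


def trainer(index, dataset, trainers_num):
    """First layer: fit one model on this trainer's slice, predict on the full set."""
    model = fit_line(dataset[index::trainers_num])
    predictions = [model[0] * x + model[1] for x, _ in dataset]
    return model, predictions


def reducer(results):
    """Collect the models and predictions returned by every trainer."""
    models = [m for m, _ in results]
    predictions = [p for _, p in results]
    return models, predictions


def metatrainer(dataset, predictions):
    """Second layer (simplified): fit a line from the averaged first-layer
    prediction to the target value."""
    averaged = [mean(column) for column in zip(*predictions)]
    return fit_line([(avg, y) for avg, (_, y) in zip(averaged, dataset)])


def driver(dataset, trainers_num=4, concurrent=True):
    """Orchestrate the flow and return both layers of the final model."""
    if concurrent:
        with ThreadPoolExecutor(max_workers=trainers_num) as pool:
            results = list(
                pool.map(lambda i: trainer(i, dataset, trainers_num), range(trainers_num))
            )
    else:
        results = [trainer(i, dataset, trainers_num) for i in range(trainers_num)]
    models, predictions = reducer(results)
    return models, metatrainer(dataset, predictions)


# Toy dataset on the line y = 2x + 1.
dataset = [(float(x), 2.0 * x + 1.0) for x in range(16)]
models, meta_model = driver(dataset)
```

In the real benchmark each of these functions is a separately deployed service; here they are ordinary function calls so that the data flow is easy to follow.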

The driver is the interface function and is invoked with a standard helloworld gRPC call. This benchmark is unique in that it relies on S3 transfer for saving and loading models, so inline transfer will not work.
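
To illustrate what saving and loading models through an object store looks like, here is a small sketch. It assumes that a function stores a serialized model under a key and that the next function loads it by that key. The `InMemoryBucket` class is a hypothetical stand-in for an S3 bucket, and the use of `pickle` and the key name are likewise assumptions, not details taken from the benchmark.

```python
import pickle


class InMemoryBucket:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, obj):
        # Serialize the object, store it under the key, and return the key.
        self._objects[key] = pickle.dumps(obj)
        return key

    def get(self, key):
        # Load the stored bytes and deserialize them back into an object.
        return pickle.loads(self._objects[key])


bucket = InMemoryBucket()

# A trainer saves its model and hands over only the object key.
model_key = bucket.put("model_0", {"slope": 2.0, "intercept": 1.0})

# A downstream function (for example the reducer) loads the model by key.
model = bucket.get(model_key)
```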

Running this Benchmark

Make sure to set the BUCKET_NAME, AWS_ACCESS_KEY, and AWS_SECRET_KEY environment variables. The kn_deploy script will then substitute these values into the knative manifests. Example:
```
export BUCKET_NAME=vhive-stacking
export AWS_ACCESS_KEY=ABCDEFGHIJKLMNOPQRST
export AWS_SECRET_KEY=ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMN
```
Deploy the necessary functions using the kn_deploy script.
```
../../tools/kn_deploy.sh ./knative_yamls/s3/*
```
Only one set of manifests is provided by default for this benchmark. All four manifests in the knative_yamls/s3 folder must be deployed. These default manifests deploy the functions with the S3 transfer type enabled and with tracing turned off.
Invoke the benchmark. The interface function of this benchmark is named driver. It can be invoked using the invoker or our test client, as described in the running benchmarks document.

Instances

Number of instances per function in a stable flow:

| Function | Instances | Is Configurable |
| --- | --- | --- |
| Driver | 1 | No |
| Trainer | 4 | Yes - Set in trainer knative manifest and must equal `TrainersNum` driver env var |
| Reducer | 1 | No |
| Metatrainer | 1 | No |

Parameters

Flags

tAddr - The address of the Trainer
rAddr - The address of the Reducer
mAddr - The address of the Metatrainer
trainersNum - The number of models to train (one per trainer)
sp - The port on which the driver listens (used for invocation)
zipkin - The address of the Zipkin span collector
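
As a rough sketch of how these flags could be parsed, the snippet below uses `argparse`. The single-dash spelling, the integer types, and every default value are assumptions chosen for illustration; check the driver source for the actual definitions.

```python
import argparse


def parse_flags(argv=None):
    """Parse the driver flags. All defaults are placeholders, not the real ones."""
    parser = argparse.ArgumentParser(description="Stacking-training driver (sketch)")
    parser.add_argument("-tAddr", default="trainer:80", help="address of the Trainer")
    parser.add_argument("-rAddr", default="reducer:80", help="address of the Reducer")
    parser.add_argument("-mAddr", default="metatrainer:80", help="address of the Metatrainer")
    parser.add_argument("-trainersNum", type=int, default=4, help="number of models to train")
    parser.add_argument("-sp", type=int, default=80, help="port the driver listens on")
    parser.add_argument("-zipkin", default="http://localhost:9411/api/v2/spans",
                        help="address of the Zipkin span collector")
    return parser.parse_args(argv)


# Example invocation with a hypothetical trainer address and 16 trainers.
args = parse_flags(["-tAddr", "10.0.0.5:80", "-trainersNum", "16", "-sp", "50051"])
```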

Environment Variables

TRANSFER_TYPE - The transfer type to use. Can be INLINE (default), S3, or XDT. Not all benchmarks support all transfer types.
AWS_ACCESS_KEY, AWS_SECRET_KEY, AWS_REGION - Standard S3 credentials and region. Only needed if the S3 transfer type is used.
BUCKET_NAME - A custom S3 bucket name. Only needed if the S3 transfer type is used. The default bucket name is 'vhive-stacking'.
ENABLE_TRACING - Toggles tracing.
TrainersNum - The number of trainers to be used.
CONCURRENT_TRAINING - Toggles concurrent training. When disabled, training is carried out for one model at a time.
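
The variables above could be read and validated along these lines. This is a sketch under stated assumptions: the `Settings` class and `load_settings` helper are hypothetical, the accepted boolean spellings are a guess, and the defaults for `TrainersNum`, `ENABLE_TRACING`, and `CONCURRENT_TRAINING` are placeholders. Only the `INLINE` default and the 'vhive-stacking' bucket name come from the descriptions above.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    # Assumption: "1", "true", "yes", and "on" (any case) mean enabled.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    transfer_type: str
    bucket_name: str
    trainers_num: int
    enable_tracing: bool
    concurrent_training: bool


def load_settings(env=os.environ):
    """Build a Settings object from a mapping of environment variables."""
    transfer_type = env.get("TRANSFER_TYPE", "INLINE").upper()
    if transfer_type not in {"INLINE", "S3", "XDT"}:
        raise ValueError(f"unsupported TRANSFER_TYPE: {transfer_type}")
    if transfer_type == "S3":
        # The S3 credentials are only required for the S3 transfer type.
        missing = [n for n in ("AWS_ACCESS_KEY", "AWS_SECRET_KEY") if not env.get(n)]
        if missing:
            raise ValueError(f"missing S3 credentials: {', '.join(missing)}")
    return Settings(
        transfer_type=transfer_type,
        bucket_name=env.get("BUCKET_NAME", "vhive-stacking"),
        trainers_num=int(env.get("TrainersNum", "4")),  # placeholder default
        enable_tracing=_as_bool(env.get("ENABLE_TRACING", "false")),  # placeholder default
        concurrent_training=_as_bool(env.get("CONCURRENT_TRAINING", "true")),  # placeholder default
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({
    "TRANSFER_TYPE": "S3",
    "AWS_ACCESS_KEY": "ABCDEFGHIJKLMNOPQRST",
    "AWS_SECRET_KEY": "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMN",
    "TrainersNum": "16",
})
```

Passing a plain dictionary keeps the example deterministic. Calling `load_settings()` with no argument reads the process environment instead.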
