Name
lib
python
scripts
test
README.md
external-tools.json
package.json
vitest.config.mts

codet5-models-builder

Downloads the CodeT5 model from HuggingFace, converts it to ONNX format, and quantizes it so it can run efficiently via ONNX Runtime inside a Node.js process. CodeT5 produces code-aware embeddings used by Socket for similarity search and classification tasks.

The output is consumed by the models package, which bundles it alongside MiniLM.
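To illustrate how embeddings like these typically feed a similarity search, here is a minimal sketch in plain Python. It is not Socket's implementation: the function names cosine_similarity and most_similar are hypothetical, and the vectors in the usage note are made-up toy values, not real CodeT5 output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], corpus: dict[str, list[float]]) -> str:
    """Return the key in `corpus` whose embedding is closest to `query`."""
    return max(corpus, key=lambda name: cosine_similarity(query, corpus[name]))
```

For example, most_similar([1.0, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0]}) picks "a", because the query points almost in the same direction as that vector.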

Build

pnpm --filter codet5-models-builder run build        # dev build (INT8 quantization)
pnpm --filter codet5-models-builder run build --int4 # prod build (INT4, smaller)

The first run downloads ~900MB from HuggingFace and converts the model to ONNX; subsequent runs reuse the checkpoint cache.
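The README does not describe how the checkpoint cache works internally. A common pattern, sketched below as an assumption, is to write a marker file once a step succeeds and skip that step when the marker already exists. The function run_with_checkpoint, the ".done" marker naming, and the step names are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def run_with_checkpoint(
    checkpoint_dir: Path, step: str, action: Callable[[], None]
) -> bool:
    """Run `action` unless a marker for `step` exists. Returns True if it ran."""
    marker = checkpoint_dir / f"{step}.done"  # hypothetical marker naming
    if marker.exists():
        return False  # cache hit: the step completed on an earlier run
    action()
    # Write the marker only after the action succeeded, so a failed step
    # is retried on the next run.
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text("ok\n")
    return True
```

With this shape, an expensive step such as the download runs once, and a later call with the same step name returns False without doing any work.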

Prerequisites: Python 3.11+ and the pinned transformers/torch/onnx pip packages. The preflight auto-creates a venv at ~/.socket-btm-venv and installs the pinned versions from external-tools.json, so no manual pip install is needed.
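For readers curious what a preflight of this kind might look like, here is a rough sketch using Python's standard venv module. It is an assumption, not the builder's actual code: the schema of external-tools.json is not shown in this README, so the sketch takes a plain name-to-version mapping, and the helper names pinned_requirements, ensure_venv, and pip_install_command are hypothetical.

```python
import os
import venv
from pathlib import Path

VENV_DIR = Path.home() / ".socket-btm-venv"  # venv location named in the README


def pinned_requirements(tools: dict[str, str]) -> list[str]:
    """Turn an assumed {package: version} mapping into pip requirement strings."""
    return [f"{name}=={version}" for name, version in sorted(tools.items())]


def ensure_venv(venv_dir: Path = VENV_DIR) -> Path:
    """Create the venv if it does not exist yet (pyvenv.cfg marks a venv)."""
    if not (venv_dir / "pyvenv.cfg").exists():
        venv.EnvBuilder(with_pip=True).create(venv_dir)
    return venv_dir


def pip_install_command(venv_dir: Path, requirements: list[str]) -> list[str]:
    """Build the pip command that installs into the venv's own interpreter."""
    on_windows = os.name == "nt"
    python = venv_dir / ("Scripts" if on_windows else "bin") / (
        "python.exe" if on_windows else "python"
    )
    return [str(python), "-m", "pip", "install", *requirements]
```

The resulting command list could be passed to subprocess.run(..., check=True). Invoking pip through the venv's interpreter, rather than whichever pip is on PATH, keeps the pinned packages out of the system Python.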

Output: build/<mode>/<platform-arch>/<int4|int8>/output/ containing encoder.onnx, decoder.onnx, and tokenizer.json (CodeT5 is a seq2seq model, so the encoder and decoder ship as separate ONNX graphs).
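The layout above can be checked with a small script after a build. This is a sketch under stated assumptions: the helper names output_dir and missing_artifacts are hypothetical, and the concrete values for <mode> and <platform-arch> are not listed in this README, so the ones in the usage note are placeholders.

```python
from pathlib import Path

# File names taken from the README's description of the output directory.
EXPECTED_FILES = ("encoder.onnx", "decoder.onnx", "tokenizer.json")


def output_dir(build_root: Path, mode: str, platform_arch: str, quant: str) -> Path:
    """Compose build/<mode>/<platform-arch>/<int4|int8>/output/."""
    if quant not in ("int4", "int8"):
        raise ValueError(f"quant must be 'int4' or 'int8', got {quant!r}")
    return build_root / mode / platform_arch / quant / "output"


def missing_artifacts(directory: Path) -> list[str]:
    """List the expected files that are absent from `directory`."""
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

For instance, missing_artifacts(output_dir(Path("build"), "dev", "linux-x64", "int8")) returns an empty list when the encoder, decoder, and tokenizer are all present; "dev" and "linux-x64" here are placeholder values.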
