oss-llm-tools

OSS LLM Tools for Conversion, Evaluation, Numerical Debugging and Benchmarking

lm-eval bisect tools

Find the first bad commit that caused a model's accuracy to drop.

cd <target repo>
python ../oss-llm-tools/bisect_accuracy.py --good <Good Commit> --bad <Bad Commit> \
  --model google/gemma-3-12b-it --task gsm8k --target 0.4 --limit 100 --bisect_log_file \
  --model_args '{"tensor_parallel_size": 4}' --eval_args '{"num_fewshot":5}' \
  --stop_with_exception --bisect_log /tmp/bisect.log
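
The search behind this command can be pictured as a binary search over the commits between the good and the bad one. The following is a minimal sketch of that idea, not the script's actual implementation: `first_bad_commit` and the `accuracy_of` callback are hypothetical names, the scores are made up, and the sketch assumes accuracy stays at or above the target before the regression and below it afterwards.

```python
from typing import Callable, Sequence


def first_bad_commit(
    commits: Sequence[str],
    accuracy_of: Callable[[str], float],
    target: float,
) -> str:
    """Return the earliest commit whose accuracy is below `target`.

    `commits` is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. Hypothetical helper, for illustration only.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accuracy_of(commits[mid]) >= target:
            lo = mid  # still good, so the regression is later
        else:
            hi = mid  # already bad, so the regression is here or earlier
    return commits[hi]


# Toy history with made-up scores: accuracy drops at commit "d".
history = ["a", "b", "c", "d", "e", "f"]
scores = {"a": 0.62, "b": 0.61, "c": 0.60, "d": 0.12, "e": 0.11, "f": 0.10}
culprit = first_bad_commit(history, scores.__getitem__, target=0.4)
print(culprit)
```

Each step halves the remaining range, so a range of N commits needs roughly log2(N) evaluations rather than N.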

create_dummy_model

Create a dummy transformer model with optional custom weights. This script combines model initialization from a config file or Hugging Face model with the ability to add or update specific tensors. This is useful for:

  • Testing code that requires a model with a specific architecture without needing the actual trained weights
  • Creating smaller test models by overriding parameters like number of layers
  • Generating placeholder models with specific tensor shapes for development and testing
  • Creating sharded safetensors files for large model testing

Usage

# Basic usage with local model directory path
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output

# Using a Hugging Face model ID directly (downloads necessary files automatically)
python create_dummy_model.py --model_path meta-llama/Llama-3-8B --output_dir ./llama3_dummy

# Create a smaller model with only 3 hidden layers
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
  --config_override '{"num_hidden_layers": 3}'

# Create a model with custom weights from a JSON file
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
  --weights_json example_weights.json

# Create a sharded model with custom weights
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
  --weights_json example_weights.json --max_shard_size "500MB"

Requirements

  • Python 3.6+
  • PyTorch
  • Hugging Face Transformers
  • huggingface_hub
  • safetensors

Parameters

  • --model_path: Path to a model directory or Hugging Face model ID (e.g., 'meta-llama/Llama-3-8B')
  • --output_dir: Directory to save the model
  • --config_override: (Optional) JSON string with config parameters to override
  • --weights_json: (Optional) JSON file containing weights info (name, shape, dtype)
  • --max_shard_size: (Optional) Maximum size of each shard (e.g., '2GB', '500MB')
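
The effect of `--config_override` can be illustrated with plain dictionaries. This is a sketch under assumptions: `apply_config_override` is a hypothetical helper, the base values are illustrative, and the real script presumably applies the override to a Transformers config object rather than a dict.

```python
import json


def apply_config_override(base_config: dict, override_json: str) -> dict:
    """Return a copy of `base_config` with top-level keys replaced by the override."""
    override = json.loads(override_json)
    if not isinstance(override, dict):
        raise ValueError("config override must be a JSON object")
    merged = dict(base_config)  # shallow copy, so the base config stays untouched
    merged.update(override)
    return merged


# Illustrative base values, not taken from any real model config.
base = {"hidden_size": 5120, "num_hidden_layers": 40, "num_attention_heads": 40}
small = apply_config_override(base, '{"num_hidden_layers": 3}')
print(small)
```

Only the keys named in the override change; everything else is inherited from the base config.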

Weights Format

When using --weights_json, the weights can be specified in several formats:

  1. Full format with shape and dtype specified:
"model.embed_tokens.weight": {
  "shape": [151552, 5120],
  "dtype": "float16"
}
  2. Simple format with just the shape as a list:
"model.layers.0.input_layernorm.weight": [5120]
  3. String format for shapes using 'x' as separator:
"model.layers.0.self_attn.q_proj.weight": "12288x5120"
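
One way to read the three formats is that each reduces to a shape plus a dtype. The sketch below normalizes them into that pair. `normalize_weight_spec` is a hypothetical helper, and the `float16` fallback for entries without a dtype is an assumption; the script's actual default is not stated here.

```python
from typing import List, Tuple, Union

WeightSpec = Union[dict, list, str]


def normalize_weight_spec(
    spec: WeightSpec, default_dtype: str = "float16"
) -> Tuple[List[int], str]:
    """Convert any of the three formats into a (shape, dtype) pair."""
    if isinstance(spec, dict):
        # Full format: {"shape": [...], "dtype": "..."}
        shape = spec["shape"]
        dtype = spec.get("dtype", default_dtype)
    elif isinstance(spec, str):
        # String format: "12288x5120"
        shape = spec.split("x")
        dtype = default_dtype
    elif isinstance(spec, list):
        # Simple format: [5120]
        shape = spec
        dtype = default_dtype
    else:
        raise TypeError(f"unsupported weight spec: {spec!r}")
    return [int(dim) for dim in shape], dtype


full = normalize_weight_spec({"shape": [151552, 5120], "dtype": "float16"})
simple = normalize_weight_spec([5120])
compact = normalize_weight_spec("12288x5120")
print(full, simple, compact)
```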

create_safetensors

Create safetensors files with specified tensor names and shapes. This is useful for:

  • Creating dummy models with specific tensor shapes and dtypes
  • Testing model loading and processing code without real weights
  • Generating sharded model files for large model testing
  • Creating placeholder weights for development and testing

Usage

# Basic usage with weights specified as a JSON string
python create_safetensors.py --weights_dict '{"model.layers.0.self_attn.q_proj.weight": [1024, 1024], "model.layers.0.self_attn.k_proj.weight": [1024, 1024]}'

# Using a JSON file containing weights information
python create_safetensors.py --weights_json example_weights.json --output_dir ./dummy_weights

# Creating sharded safetensors files for large models
python create_safetensors.py --weights_json example_weights.json --output_dir ./sharded_model --max_shard_size "2GB"

Requirements

  • Python 3.6+
  • PyTorch
  • safetensors

Parameters

  • --output_dir: Directory to save the safetensors file(s)
  • --weights_json: JSON file containing weights info (name, shape, dtype)
  • --weights_dict: JSON string with weights dictionary (name: shape)
  • --max_shard_size: Maximum size of each shard (e.g., '2GB', '500MB')
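
To get a feel for how `--max_shard_size` interacts with tensor shapes, the sketch below estimates tensor sizes and groups them greedily into shards. It is an illustration under assumptions, not the script's implementation: `parse_size`, `tensor_bytes`, and `plan_shards` are hypothetical helpers, units are taken as decimal (1 MB = 10^6 bytes), and every tensor is assumed to be `float16` unless stated otherwise.

```python
import re
from typing import Dict, List, Sequence

DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}
UNIT_BYTES = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # decimal units (assumption)


def parse_size(text: str) -> int:
    """Parse a size string such as '2GB' or '500MB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(KB|MB|GB)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNIT_BYTES[unit.upper()])


def tensor_bytes(shape: Sequence[int], dtype: str = "float16") -> int:
    """Number of bytes needed to store a dense tensor of this shape and dtype."""
    count = 1
    for dim in shape:
        count *= dim
    return count * DTYPE_BYTES[dtype]


def plan_shards(
    shapes: Dict[str, Sequence[int]], max_shard_size: str, dtype: str = "float16"
) -> List[List[str]]:
    """Greedily assign tensors to shards, opening a new shard when the limit would be exceeded."""
    limit = parse_size(max_shard_size)
    shards: List[List[str]] = []
    used = 0
    for name, shape in shapes.items():
        size = tensor_bytes(shape, dtype)
        if not shards or used + size > limit:
            shards.append([])
            used = 0
        shards[-1].append(name)
        used += size
    return shards


shapes = {
    "model.layers.0.self_attn.q_proj.weight": [1024, 1024],
    "model.layers.0.self_attn.k_proj.weight": [1024, 1024],
    "model.layers.0.self_attn.v_proj.weight": [1024, 1024],
}
plan = plan_shards(shapes, "5MB")
print(plan)
```

Each 1024 x 1024 `float16` tensor takes 2,097,152 bytes, so two of them fit under a 5 MB limit and the third starts a second shard.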

Weights Format

The weights can be specified in several formats as shown in the example_weights.json file:

  1. Full format with shape and dtype specified:
"model.embed_tokens.weight": {
  "shape": [151552, 5120],
  "dtype": "float16"
}
  2. Simple format with just the shape as a list:
"model.layers.0.input_layernorm.weight": [5120]
  3. String format for shapes using 'x' as separator:
"model.layers.0.self_attn.q_proj.weight": "12288x5120"

The repository includes an example weights JSON file (example_weights.json) that demonstrates all supported formats for specifying tensor shapes and dtypes.
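
A weights file of your own can be generated programmatically. The sketch below assumes the file is a single top-level JSON object mapping tensor names to specs, which matches the shape of the `--weights_dict` argument but has not been checked against example_weights.json itself. The file name `my_weights.json` is hypothetical.

```python
import json
import tempfile
from pathlib import Path

# One entry per supported format: full, simple list, and 'x'-separated string.
weights = {
    "model.embed_tokens.weight": {"shape": [151552, 5120], "dtype": "float16"},
    "model.layers.0.input_layernorm.weight": [5120],
    "model.layers.0.self_attn.q_proj.weight": "12288x5120",
}

# Write to a temporary directory so the example leaves the working tree untouched.
out_path = Path(tempfile.mkdtemp()) / "my_weights.json"
out_path.write_text(json.dumps(weights, indent=2))

# Read the file back to confirm it round-trips as valid JSON.
loaded = json.loads(out_path.read_text())
print(sorted(loaded))
```

The resulting path can then be passed to `--weights_json`.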