Running llama2-7B on an Ascend NPU in a container

Author: @Yikun

### 0. Prerequisites

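Set up the PyTorch environment by following https://github.com/cosdt/cosdt.github.io/issues/7, then check that the NPU is visible and that a tensor can be computed on it: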



```shell
(.llm-venv) # npu-smi info

(.llm-venv) # python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[ 1.2800,  1.3105,  0.4513, -1.1650],
        [ 3.5199, -0.2590,  2.6664, -1.9602],
        [ 2.3262, -2.4671,  2.3252, -2.1502]], device='npu:0')
```

### 1. Install Transformers
```
python3 -m pip install --upgrade pip
pip install transformers accelerate xformers
# "sentencepiece" and "protobuf==3.20.0" are required by convert_llama_weights_to_hf.py
pip install sentencepiece protobuf==3.20.0
```
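
Before moving on to the conversion step, it can help to confirm which versions pip actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of the packages above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["transformers", "accelerate", "sentencepiece", "protobuf"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If `protobuf` reports something other than 3.20.0, the conversion in the next section may be worth retrying after reinstalling the pinned version.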

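### 2. Prepare the llama model
Arrange the downloaded weights so that the checkpoint files sit in a `7B` subdirectory next to `tokenizer.model`: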

```
# tree llama/llama-2-7b/
llama/llama-2-7b/
├── checklist.chk
├── consolidated.00.pth
└── params.json

cd llama/llama-2-7b
mkdir 7B
mv *.* 7B
cp ../tokenizer.model .

# tree -h llama/llama-2-7b/
llama/llama-2-7b/
|-- [4.0K]  7B
|   |-- [ 100]  checklist.chk
|   |-- [ 13G]  consolidated.00.pth
|   `-- [ 102]  params.json
`-- [488K]  tokenizer.model
```
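
The same rearrangement can be scripted, which is convenient when preparing several checkpoints. This is a rough sketch, not the author's method: the function name `prepare_llama_layout` is hypothetical, and it assumes the model directory contains only the downloaded checkpoint files and that `tokenizer.model` lives outside it (as in the listing above).

```python
import shutil
from pathlib import Path


def prepare_llama_layout(model_dir, tokenizer_path, size_dir="7B"):
    """Move the checkpoint files into <model_dir>/<size_dir> and copy the tokenizer next to it."""
    model_dir = Path(model_dir)
    target = model_dir / size_dir
    target.mkdir(exist_ok=True)
    # Snapshot the listing first, then move every regular file into the size directory.
    for entry in sorted(model_dir.iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(target / entry.name))
    # The conversion script expects tokenizer.model at the top level of the input directory.
    shutil.copy(str(tokenizer_path), str(model_dir / "tokenizer.model"))
    return target


if __name__ == "__main__":
    # Paths follow the layout shown above; adjust them to your environment.
    if Path("llama/llama-2-7b").is_dir():
        prepare_llama_layout("llama/llama-2-7b", "llama/tokenizer.model")
```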

Convert the model. First locate the conversion script shipped with `transformers`, then run it:
```
# find / -name convert_llama_weights_to_hf.py
/root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py
```
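
Scanning the whole filesystem with `find /` can be slow. As an alternative, the script can be located through Python's import machinery without importing the package itself. This is a sketch under the assumption that the script sits at `models/llama/convert_llama_weights_to_hf.py` inside the installed `transformers` package, as the path above suggests; `locate_in_package` is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path


def locate_in_package(package, relative_path):
    """Return the path of a file inside an installed package, or None if it is not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / relative_path
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(locate_in_package("transformers", "models/llama/convert_llama_weights_to_hf.py"))
```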

```
python  /root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir llama/llama-2-7b --model_size 7B --output_dir transformer/llama-2-7b
```

The converted model directory looks like this:
```
# tree -h transformer/llama-2-7b/
transformer/llama-2-7b/
|-- [ 578]  config.json
|-- [ 132]  generation_config.json
|-- [9.3G]  pytorch_model-00001-of-00002.bin
|-- [3.3G]  pytorch_model-00002-of-00002.bin
|-- [ 26K]  pytorch_model.bin.index.json
|-- [ 411]  special_tokens_map.json
|-- [1.8M]  tokenizer.json
|-- [488K]  tokenizer.model
`-- [ 745]  tokenizer_config.json
```
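
A quick sanity check after conversion is to confirm that every weight shard named in `pytorch_model.bin.index.json` is actually on disk. The sketch below assumes the index file has a top-level `weight_map` object mapping parameter names to shard file names; `missing_shards` is a hypothetical helper name.

```python
import json
from pathlib import Path


def missing_shards(model_dir, index_name="pytorch_model.bin.index.json"):
    """Return the shard file names referenced by the index that are absent from model_dir."""
    model_dir = Path(model_dir)
    with open(model_dir / index_name, encoding="utf-8") as f:
        index = json.load(f)
    # Many parameters share one shard, so deduplicate the file names.
    shard_names = sorted(set(index["weight_map"].values()))
    return [name for name in shard_names if not (model_dir / name).is_file()]


if __name__ == "__main__":
    model_dir = Path("transformer/llama-2-7b")
    if model_dir.is_dir():
        print(missing_shards(model_dir) or "all shards present")
```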

### 3. Run the model
```
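# transformers must be imported before torch and torch_npu, see: https://github.com/cosdt/llm/issues/5
from transformers import AutoTokenizer, LlamaForCausalLM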
import torch
import torch_npu

# Avoid ReduceProd operator core dump, see more in: https://github.com/cosdt/llm/issues/4
option={}
option["NPU_FUZZY_COMPILE_BLACKLIST"]="ReduceProd"
torch.npu.set_option(option)

npu_id = 0
torch.npu.set_device(npu_id)

device = "npu:{}".format(npu_id)
model_path = "/opt/yikun/transformer/llama-2-7b"
model = LlamaForCausalLM.from_pretrained(model_path).to(device)

tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = "Deep learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
generate_ids = model.generate(inputs.input_ids, max_length=50)

output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)
# Output:
# Deep learning is a branch of machine learning that is based on artificial neural networks. Deep learning is a subset of machine learning that is based on artificial neural networks. Neural networks are a type of machine learning algorithm that is inspired by the structure and
```

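### Pitfalls encountered:
1. `torch.npu.set_device`: once a wrong NPU ID has been set, errors keep occurring even after the ID is corrected: https://github.com/cosdt/llm/issues/3
2. torch `ReduceProd` operator problem (core dump): https://github.com/cosdt/llm/issues/4
3. `transformers` must be imported before `torch` and `torch_npu`: https://github.com/cosdt/llm/issues/5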
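
Since a wrong NPU ID is hard to recover from, one defensive option is to validate the ID before it ever reaches `torch.npu.set_device`. The sketch below is a plain-Python guard; `check_npu_id` is a hypothetical helper, and the device count is passed in by the caller (for example from `torch.npu.device_count()`, assuming your torch_npu version provides it).

```python
def check_npu_id(npu_id, device_count):
    """Validate an NPU index against the number of visible devices and return the device string."""
    if isinstance(npu_id, bool) or not isinstance(npu_id, int):
        raise TypeError(f"npu_id must be an int, got {type(npu_id).__name__}")
    if not 0 <= npu_id < device_count:
        raise ValueError(f"npu_id {npu_id} is out of range, {device_count} device(s) visible")
    return f"npu:{npu_id}"


if __name__ == "__main__":
    # Example with a single visible device; a real script would pass the actual count.
    print(check_npu_id(0, 1))
```

In the script from section 3, the returned string could be used as `device`, with `torch.npu.set_device(npu_id)` called only after the check passes.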
