Preface

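In Chapter 2 of "AI Tutorial: LLaMA Model Deployment and Running (Part 1)", we tried to install and run the LLaMA model on Google Colab's free CPU and GPU resources. Because the free tier provides only 12GB of memory and does not support a swap partition, loading the LLaMA model weights failed. This chapter instead deploys the LLaMA model on a public cloud instance: a Tesla P40 GPU with 24GB of GPU memory, priced at 70 yuan per 15 days.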

Prepare CPU and GPU resources

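When purchasing, select Ubuntu 18.04 as the operating system. After the purchase succeeds, log in to the console, click Instances, and find the GPU instance you purchased. Then click More > Password/Key > Reset Password and enter a password of your own.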

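Connect to the GPU instance with an SSH tool: enter the public IP address and the default username ubuntu, click OK to connect, and enter the password you just reset. Then use the commands below to check the instance's resources.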

# Switch to the root user
sudo -i
# Check the GPU: the model is Tesla P40 with 24GB of GPU memory
nvidia-smi
# Check the disk size: 100GB of space
df -h
# Check the memory
free -h
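
The same checks can be scripted so that an undersized instance is caught before any large download starts. The following is a minimal sketch rather than part of the original procedure: the helper names `parse_gpu_csv`, `query_gpus`, and `free_disk_gib` are hypothetical, and it assumes `nvidia-smi` is on the PATH and supports its standard `--query-gpu` CSV output.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse CSV output of nvidia-smi (name, memory.total in MiB, no header).

    Returns a list of (gpu_name, total_memory_mib) tuples, one per GPU.
    """
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the name is tolerated.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus():
    """Run nvidia-smi and return parsed GPU info (needs an NVIDIA driver)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_csv(out)


def free_disk_gib(path="/"):
    """Free disk space at `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30
```

On the instance, calling `query_gpus()` should list the Tesla P40 with its total memory in MiB, and `free_disk_gib("/opt")` should report enough room for the roughly 13GB weight file downloaded later. The sketch avoids Python 3.7+ features because the system interpreter is still 3.6.9 at this point in the tutorial.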

Prepare the model

Download the LLaMA model code repository

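[Click here] to download the LLaMA model code repository, then upload **llama-main.zip** to the /opt directory on the GPU instance. Alternatively, download it with the commands below.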
# Switch to the root user
sudo -i
# Go to the /opt directory
cd /opt
# Download the LLaMA model code repository
wget https://www.mysteryforai.com/cn/3852225535/llama-main.zip
# Unzip it
unzip llama-main.zip

Download the LLaMA model weights

Download them with the commands below.

# Switch to the root user
sudo -i
# Create the directory
mkdir -p /opt/llama
# Go to the directory
cd /opt/llama
# Download the LLaMA model weights
wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model
wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/7B/params.json
wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk
# About 13GB, so download it in the background
nohup wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth &
# Check whether the background download has finished (wget progress is also written to nohup.out)
jobs
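
Because the weight file is large and downloaded in the background, it is worth confirming that it arrived intact before loading it. The sketch below is one possible way to do that, under the assumption that the downloaded `.chk` files use the `md5sum` format (one `<hex digest>  <file name>` pair per line). The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checklist(text):
    """Parse md5sum-style lines into a {file name: expected digest} dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading "*" on the file name.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify_dir(directory, checklist_name="checklist.chk"):
    """Return {file name: status}; status is "ok", "mismatch", or "missing"."""
    directory = Path(directory)
    expected = parse_checklist((directory / checklist_name).read_text())
    results = {}
    for name, digest in expected.items():
        target = directory / name
        if not target.is_file():
            results[name] = "missing"
        else:
            results[name] = "ok" if md5_of(target) == digest else "mismatch"
    return results
```

For example, `verify_dir("/opt/llama")` would check the files named in checklist.chk, and `verify_dir("/opt/llama", "tokenizer_checklist.chk")` would check the tokenizer. A "mismatch" on consolidated.00.pth most likely means the download was interrupted and should be repeated.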

Install the LLaMA model dependencies and run the model

Upgrade the Python environment

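The operating system is Ubuntu 18.04, whose default Python version is 3.6.9, so Python must be upgraded to 3.8. With Python 3.6.9, installing the dependencies in requirements.txt fails with errors.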

# Before the upgrade
root@VM-0-13-ubuntu:/opt/llama-main# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
root@VM-0-13-ubuntu:~# python3 -V
Python 3.6.9

# Upgrade
apt install python3.8
apt install python3.8-venv
python3.8 -V
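
To avoid discovering the version problem halfway through a dependency install, a small guard can be run with whichever interpreter is about to be used. This is a minimal sketch. The 3.8 minimum comes from the text above, while the function name `is_supported` is hypothetical.

```python
import sys

MIN_VERSION = (3, 8)  # the version this tutorial upgrades to


def is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)
```

Saved to a file and extended with a line such as `print(is_supported())`, this would print False under the system Python 3.6.9 and True under python3.8.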

Install dependencies

# Create the Python virtual environment
cd /opt/llama-main
python3.8 -m venv venv
source venv/bin/activate

# Upgrade pip, then check the pip and Python versions
pip install --upgrade pip
pip -V
python -V

# Install the dependencies
pip install -r requirements.txt
pip install -e .
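
Once the install finishes, a quick sanity check is to ask PyTorch whether it can see the GPU. The sketch below assumes that `torch` is among the packages installed from requirements.txt (it provides the `torchrun` command used later). The function name `torch_cuda_report` is hypothetical.

```python
import importlib


def torch_cuda_report():
    """Return a one-line summary of whether PyTorch imports and sees a GPU."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA is not available"
    props = torch.cuda.get_device_properties(0)
    return (
        f"torch {torch.__version__}: {props.name}, "
        f"{props.total_memory / 2**30:.1f} GiB"
    )
```

Run inside the activated venv, the returned string should name the Tesla P40. If it reports that CUDA is not available, the installed PyTorch build or the driver is the first thing to check.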

After a successful installation, the output looks like the figure below.

Run the model

# Enter the environment
cd /opt/llama-main
source venv/bin/activate

# Run the model
torchrun --nproc_per_node 1 example.py --ckpt_dir /opt/llama --tokenizer_path /opt/llama/tokenizer.model
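
The value 1 passed to `--nproc_per_node` matches the single weight file, consolidated.00.pth, downloaded earlier. As a hedged sketch, assuming (as in the original LLaMA release) that larger models ship as several `consolidated.NN.pth` shards and need one process per shard, the value could be derived from the checkpoint directory instead of being hard-coded. The helper names below are hypothetical.

```python
from pathlib import Path


def count_shards(ckpt_dir):
    """Count weight shards (*.pth files) in a checkpoint directory."""
    return len(sorted(Path(ckpt_dir).glob("*.pth")))


def torchrun_command(ckpt_dir, tokenizer_path, script="example.py"):
    """Build a torchrun argument list with one process per weight shard."""
    shards = count_shards(ckpt_dir)
    if shards == 0:
        raise FileNotFoundError(f"no .pth files found in {ckpt_dir}")
    return [
        "torchrun", "--nproc_per_node", str(shards), script,
        "--ckpt_dir", str(ckpt_dir),
        "--tokenizer_path", str(tokenizer_path),
    ]
```

With the layout used in this chapter, `torchrun_command("/opt/llama", "/opt/llama/tokenizer.model")` produces the same arguments as the command above, and the list can be passed to `subprocess.run`.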

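As the figure above shows, running the LLaMA model used about 22GB of GPU memory, GPU utilization reached 100%, and the GPU drew about 150W. The output of example.py is as follows: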

(venv) root@VM-0-13-ubuntu:/opt/llama-main# torchrun --nproc_per_node 1 example.py --ckpt_dir /opt/llama --tokenizer_path /opt/llama/tokenizer.model
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loading
Loaded in 10.31 seconds # Loading the 13GB of model weights into memory took only about 10 seconds, which is fast
I believe the meaning of life is to find happiness and be satisfied with what you have.
But sometimes we have to struggle to find it. So, do we know the best way to achieve happiness?
Is happiness merely a mental state?
To be happy, you need to accept yourself.
I’m sure everyone has heard that self-acceptance is the best way to achieve happiness.
But is it really the case? I’m going to show you why self-acceptance is not the right way to be happy.
Accepting yourself means embracing all aspects of you. You don’t need to change anything about you, you need to accept your flaws, weaknesses, and strengths.
But is it really so? Accepting yourself means to love yourself unconditionally, even when you fail or make mistakes.
You might think that embracing all aspects of you is the best way to be happy. You will feel more secure about yourself and love yourself more.
However, I strongly believe that accepting yourself is not the best way to be happy. Let me show you why.
I believe that in order to find happiness, you need to find and build your self-esteem.
As we all know, self-este

==================================

Simply put, the theory of relativity states that 1) there is no absolute time or space and 2) the speed of light in a vacuum is the fastest speed possible. There are two key principles in relativity:
(1) The laws of physics are the same in all inertial reference frames.
(2) The speed of light is constant in all inertial reference frames.
The second of these principles has allowed us to prove the first.
Before Einstein, scientists believed that the speed of light was constant in all frames, but that the speed of light was not constant. This was called the luminiferous aether and was used to explain why light has a finite speed. However, with the advent of special relativity, we can now explain why the speed of light is constant in all inertial reference frames, without the need for the luminiferous aether.
The special theory of relativity is based on two postulates:
1) The laws of physics are the same in all inertial reference frames.
2) The speed of light is constant in all inertial reference frames.
The first of these postulates is based on the assumption that the laws of physics are the same everywhere in the universe

==================================

Building a website can be done in 10 simple steps:
Dream up the idea for the website.
Think of a name.
Set up a hosting account.
Design the site.
Write the text.
Edit the text.
Build the site.
In this article, we are going to look at how to create a website that fits in with the rest of your business. We'll look at what you need to do to get the name, host your site, design it and build it.
Creating a website is not the same as writing a website. A website is a tool for communicating with your customers or clients. It should be designed to build relationships and to create a positive experience.
If you are writing a brochure or a magazine article, the story that you want to tell is important. You need to be clear about your purpose and your audience. If you're writing a website, you need to have the same clarity about what you are trying to achieve. You're not writing to yourself or to your friends. Your audience is your potential customers, clients or patients.
If you can't answer these questions, then you need to go back and work out what you are trying to achieve by having a website. If you can't describe

==================================

Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been 👍"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredibile"
Sentiment: Positive
###
Tweet: "No workout today #feelinglazy"
Sentiment: Negative
###
Tweet: "I want to be a #Boss in my workplace"
Sentiment: Positive
###
Tweet: "Feeling sad today"
Sentiment: Negative
###
Tweet: "It was a great day"
Sentiment: Positive
###
Tweet: "I can't wait to go for a run"
Sentiment: Positive
###
Tweet: "This new website is so cool"
Sentiment: Positive
###
Tweet: "My weekend was so great"
Sentiment: Positive
###
Tweet: "I'm so hungry now"
Sentiment: Negative
###
Tweet: "I'm really excited about this project"
Sentiment: Positive
###
Tweet: "I wish I could go to bed right now"
Sentiment: Negative
###
Tweet: "I'm really excited about this project"


==================================

Translate English to French:

sea otter => loutre de mer

peppermint => menthe poivrée

plush girafe => girafe peluche

cheese => fromage

Translate French to English:

sushi => sushi

soup => soupe

snails => escargot

### Glossary

translation (noun) = traduction

reference (noun) = référence

reference (verb) = référer

reference (verb) = référer

reference (verb) = référer

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête

head (noun) = tête
==================================
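
The roughly 22GB of GPU memory observed during the run can be approximated with some back-of-the-envelope arithmetic. The sketch below is an estimate under several assumptions that are not stated in this chapter: the 7B params.json sets dim=4096, n_layers=32, and multiple_of=256; the tokenizer has 32000 entries; weights and cache are stored in 16-bit floats; and example.py preallocates a key/value cache for max_batch_size=32 and max_seq_len=512.

```python
def ffn_hidden_dim(dim, multiple_of):
    """Feed-forward width: 2/3 of 4*dim, rounded up to a multiple_of boundary."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def count_parameters(dim, n_layers, vocab_size, multiple_of):
    """Approximate parameter count of a LLaMA-style decoder-only transformer."""
    hidden = ffn_hidden_dim(dim, multiple_of)
    attention = 4 * dim * dim          # query, key, value, output projections
    feed_forward = 3 * dim * hidden    # three feed-forward weight matrices
    norms = 2 * dim                    # two normalization weights per layer
    per_layer = attention + feed_forward + norms
    embeddings = 2 * vocab_size * dim  # token embedding + output projection
    return n_layers * per_layer + embeddings + dim  # + final normalization


def kv_cache_bytes(dim, n_layers, max_batch_size, max_seq_len, bytes_per_value=2):
    """Size of preallocated key and value caches across all layers."""
    return 2 * n_layers * max_batch_size * max_seq_len * dim * bytes_per_value


# Assumed 7B settings (see the lead-in above); adjust to the real params.json.
params = count_parameters(dim=4096, n_layers=32, vocab_size=32000, multiple_of=256)
weights_gib = params * 2 / 2**30
cache_gib = kv_cache_bytes(4096, 32, max_batch_size=32, max_seq_len=512) / 2**30
print(f"parameters: {params:,}")
print(f"weights: {weights_gib:.1f} GiB, KV cache: {cache_gib:.1f} GiB")
```

Under these assumptions the weights come to about 12.6 GiB (consistent with the roughly 13GB download) and the cache to 8 GiB, for a total near 20.6 GiB. The remainder of the observed usage would then be the CUDA context and temporary activations, so the 24GB P40 is close to the practical minimum for these defaults.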

Summary

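This chapter completed the installation and deployment of the LLaMA model on a public cloud GPU instance, ran example.py successfully, and obtained the corresponding inference output.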