Kimi-K2.5 / Kimi-K2.6
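- Source code: https://github.com/jd-opensource/xllm
- Mirror reachable from mainland China: https://gitcode.com/xLLM-AI/xllm
- Kimi-K2.5 W8A8 weights: modelscope-Kimi-K2.5-W8A8-xLLM
- Kimi-K2.6 W8A8 weights: modelscope-Kimi-K2.6-w8a8-xllm
P.S. Kimi-K2.5 and Kimi-K2.6 share the same model architecture; the rest of this guide uses Kimi-K2.5 as the example for the overall deployment flow.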
0. Prepare the weights

Download the weights from ModelScope

    export MODELSCOPE_CACHE=path-to-model  # default: ~/.cache/modelscope/hub
    pip install modelscope
    modelscope download --model Eco-Tech/Kimi-K2.5-W8A8-xLLM

1. Pull the image
First, pull the image provided by xLLM:

    # A3 arm
    docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-a3-arm-20260429

Then create the container from it:

    sudo docker run -it --ipc=host -u 0 --privileged --name xllm_kimi_k25 --network=host \
      -v /var/queue_schedule:/var/queue_schedule \
      -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
      -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
      -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
      -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \
      -v /var/log/npu/slog/:/var/log/npu/slog \
      -v ~/.ssh:/root/.ssh \
      -v /var/log/npu/profiling/:/var/log/npu/profiling \
      -v /var/log/npu/dump/:/var/log/npu/dump \
      -v /runtime/:/runtime/ \
      -v /etc/hccn.conf:/etc/hccn.conf \
      -v /export/home:/export/home \
      -v /home/:/home/ \
      -w /export/home \
      quay.io/jd_xllm/xllm-ai:xllm-dev-a3-arm-20260429

2. Pull the source and build
Clone the official repository and fetch its submodule dependencies:

    git clone https://github.com/jd-opensource/xllm
    cd xllm
    git checkout main
    git submodule init
    git submodule update

Install the build dependencies:

    pip install --upgrade pre-commit
    yum install numactl

Run the build; the executable is generated under build/:

    python setup.py build

Build artifact path: build/xllm/core/server/xllm
3. Launch the model

If this is the first service launch after a machine reboot, run the following script first to initialize the devices

If this step is skipped while the NPUs are still uninitialized, the xllm process may fail to start.

    python -c "import torch_npu
    for i in range(16):
        torch_npu.npu.set_device(i)"

    ##### 1. Environment variables for dependency paths
    export PYTHON_INCLUDE_PATH="$(python3 -c 'from sysconfig import get_paths; print(get_paths()["include"])')"
    export PYTHON_LIB_PATH="$(python3 -c 'from sysconfig import get_paths; print(get_paths()["include"])')"
    export PYTORCH_NPU_INSTALL_PATH=/usr/local/libtorch_npu/
    export PYTORCH_INSTALL_PATH="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"
    export LIBTORCH_ROOT="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"

    export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/xllm/op_api/lib/:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=/usr/local/libtorch_npu/lib:$LD_LIBRARY_PATH
    export LD_PRELOAD=/usr/lib64/libjemalloc.so.2:$LD_PRELOAD

    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    source /usr/local/Ascend/nnal/atb/set_env.sh

    ##### 2. Environment variables for logging
    rm -rf /root/atb/log/
    rm -rf /root/ascend/log/
    rm -rf core.*
    export ASDOPS_LOG_LEVEL=ERROR
    export ASDOPS_LOG_TO_STDOUT=1
    export ASDOPS_LOG_TO_FILE=1

    ##### 3. Environment variables for performance and communication
    export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
    export NPU_MEMORY_FRACTION=0.96
    export ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=3
    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1

    export OMP_NUM_THREADS=12
    export ALLOW_INTERNAL_FORMAT=1

    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
    export ATB_LLM_ENABLE_AUTO_TRANSPOSE=0
    export ATB_CONVERT_NCHW_TO_ND=1
    export ATB_LAUNCH_KERNEL_WITH_TILING=1
    export ATB_OPERATION_EXECUTE_ASYNC=2
    export ATB_CONTEXT_WORKSPACE_SIZE=0
    export INF_NAN_MODE_ENABLE=1
    export HCCL_EXEC_TIMEOUT=0
    export HCCL_CONNECT_TIMEOUT=7200
    export HCCL_OP_EXPANSION_MODE="AIV"
    export HCCL_IF_BASE_PORT=2864

Launch command - Kimi_k25 (two machines, 16 cards / 32 dies; tp=4, dp=8, ep=32)
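The figures in this heading can be sanity-checked before launching anything. The sketch below is only an illustration: `check_layout` is a hypothetical helper, not part of xLLM, and it assumes the tensor-parallel size is whatever remains after dividing the total process count (`--nnodes`) by `--dp_size`. That assumption matches 32 / 8 = 4 here, but the source does not state how xLLM derives it.

```python
def check_layout(world_size: int, dp_size: int, ep_size: int) -> int:
    """Return the tensor-parallel size implied by world_size and dp_size.

    Assumption (not confirmed by the xLLM docs): tp_size = world_size / dp_size,
    and ep_size must evenly divide the total number of processes.
    """
    if world_size % dp_size != 0:
        raise ValueError("world_size must be divisible by dp_size")
    if ep_size > world_size or world_size % ep_size != 0:
        raise ValueError("ep_size must evenly divide world_size")
    return world_size // dp_size


# Values from the heading: 32 dies in total, dp=8, ep=32.
tp_size = check_layout(world_size=32, dp_size=8, ep_size=32)
print(tp_size)  # 4
```

Running a check like this first makes a mismatched `--dp_size` or `--ep_size` fail immediately rather than after 32 processes have been started.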
Node0 (master)

    MASTER_NODE_ADDR="11.87.49.110:19990"
    LOCAL_HOST="11.87.49.110"
    START_PORT=15890
    START_DEVICE=0
    LOG_DIR="logs"
    NNODES=32
    LOCAL_NODES=16
    export HCCL_IF_BASE_PORT=48439
    unset HCCL_OP_EXPANSION_MODE

    for (( i=0; i<$LOCAL_NODES; i++ ))
    do
      PORT=$((START_PORT + i))
      DEVICE=$((START_DEVICE + i));
      LOG_FILE="$LOG_DIR/node_$i.log"
      nohup numactl -C $((DEVICE*40))-$((DEVICE*40+39)) $XLLM_PATH \
        --model $MODEL_PATH \
        --host $LOCAL_HOST \
        --port $PORT \
        --devices="npu:$DEVICE" \
        --master_node_addr=$MASTER_NODE_ADDR \
        --nnodes=$NNODES \
        --node_rank=$i \
        --max_memory_utilization=0.85 \
        --max_tokens_per_batch=8192 \
        --max_seqs_per_batch=20 \
        --block_size=128 \
        --enable_prefix_cache=false \
        --enable_chunked_prefill=false \
        --communication_backend="hccl" \
        --enable_schedule_overlap=true \
        --enable_graph=false \
        --enable_shm=true \
        --ep_size=32 \
        --dp_size=8 \
        --input_shm_size=4096 \
        --rank_tablefile=/yourPath/ranktable.json \
        > $LOG_FILE 2>&1 &
    done

Node1 (worker)
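Both node scripts derive each process's port, device, CPU core range, and `node_rank` from the loop index; the worker differs only in `LOCAL_HOST` and in offsetting `node_rank` by `LOCAL_NODES`. The following sketch reproduces that arithmetic so the per-process values can be inspected without launching anything. `ProcArgs` and `plan_node` are hypothetical names for illustration, and the 40 cores per device mirrors the `DEVICE*40` expression in the scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcArgs:
    port: int
    device: int
    node_rank: int
    cores: str  # CPU range passed to `numactl -C`


def plan_node(server_index: int, local_nodes: int = 16, start_port: int = 15890,
              start_device: int = 0, cores_per_device: int = 40) -> list[ProcArgs]:
    """Mirror the shell loop: one xllm process per local device."""
    plans = []
    for i in range(local_nodes):
        device = start_device + i
        first_core = device * cores_per_device
        plans.append(ProcArgs(
            port=start_port + i,
            device=device,
            # Node0 uses i, Node1 uses i + LOCAL_NODES.
            node_rank=server_index * local_nodes + i,
            cores=f"{first_core}-{first_core + cores_per_device - 1}",
        ))
    return plans


for args in plan_node(server_index=1)[:2]:
    print(args)
```

Ports and devices repeat on every machine, while `node_rank` has to be unique across all 32 processes, which is why only the rank carries the per-machine offset.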

    MASTER_NODE_ADDR="11.87.49.110:19990"
    LOCAL_HOST="11.87.49.111"
    START_PORT=15890
    START_DEVICE=0
    LOG_DIR="logs"
    NNODES=32
    LOCAL_NODES=16
    export HCCL_IF_BASE_PORT=48439
    unset HCCL_OP_EXPANSION_MODE

    for (( i=0; i<$LOCAL_NODES; i++ ))
    do
      PORT=$((START_PORT + i))
      DEVICE=$((START_DEVICE + i));
      LOG_FILE="$LOG_DIR/node_$i.log"
      nohup numactl -C $((DEVICE*40))-$((DEVICE*40+39)) $XLLM_PATH \
        --model $MODEL_PATH \
        --host $LOCAL_HOST \
        --port $PORT \
        --devices="npu:$DEVICE" \
        --master_node_addr=$MASTER_NODE_ADDR \
        --nnodes=$NNODES \
        --node_rank=$((i + LOCAL_NODES)) \
        --max_memory_utilization=0.85 \
        --max_tokens_per_batch=8192 \
        --max_seqs_per_batch=20 \
        --block_size=128 \
        --enable_prefix_cache=false \
        --enable_chunked_prefill=false \
        --communication_backend="hccl" \
        --enable_schedule_overlap=true \
        --enable_graph=false \
        --enable_shm=true \
        --ep_size=32 \
        --dp_size=8 \
        --input_shm_size=4096 \
        --rank_tablefile=/yourPath/ranktable.json \
        > $LOG_FILE 2>&1 &
    done

ranktable sample
ranktable configuration guide: https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/hccl/hcclug/hcclug_000014.html

    ln -s /usr/local/Ascend/driver/tools/hccn_tool /usr/sbin/

    # device_ip
    for i in {0..15}; do hccn_tool -i $i -vnic -g; done

    # super_device_id
    for i in {0..7}; do for j in {0..1}; do npu-smi info -t spod-info -i $i -c $j; done; done

    {
      "status": "completed",
      "version": "1.2",
      "server_count": "2",
      "server_list": [
        {
          "server_id": "10.87.191.98",
          "host_nic_ip": "reserve",
          "host_ip": "10.87.191.98",
          "container_ip": "10.87.191.98",
          "device": [
            { "device_id": "0", "device_ip": "192.24.2.199", "super_device_id": "100663296", "rank_id": "16" },
            ...
            { "device_id": "15", "device_ip": "192.24.3.184", "super_device_id": "102563855", "rank_id": "31" }
          ]
        },
        {
          "server_id": "10.87.191.102",
          "host_nic_ip": "reserve",
          "host_ip": "10.87.191.102",
          "container_ip": "10.87.191.102",
          "device": [
            { "device_id": "0", "device_ip": "192.28.2.199", "super_device_id": "117440512", "rank_id": "0" },
            ...
            { "device_id": "15", "device_ip": "192.28.3.184", "super_device_id": "119341071", "rank_id": "15" }
          ]
        }
      ],
      "super_pod_list": [
        {
          "super_pod_id": "2",
          "server_list": [
            { "server_id": "10.87.191.98" },
            { "server_id": "10.87.191.102" }
          ]
        }
      ]
    }

The service has started successfully once "Application startup complete." appears in the log.
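Because the ranktable sample follows a regular layout, it can be assembled from the values collected with `hccn_tool` and `npu-smi` instead of being edited by hand. The sketch below is a minimal illustration under stated assumptions: `build_server` and `build_ranktable` are hypothetical helpers, the field names and the all-strings convention are copied from the sample, and only the device 0 entries that appear in the sample are used as input. The elided entries would have to come from your own machines.

```python
import json


def build_server(host_ip: str, device_ips: list[str],
                 super_device_ids: list[str], rank_offset: int) -> dict:
    """Build one server_list entry; every value is a string, as in the sample."""
    devices = [
        {
            "device_id": str(i),
            "device_ip": ip,
            "super_device_id": str(sdid),
            "rank_id": str(rank_offset + i),
        }
        for i, (ip, sdid) in enumerate(zip(device_ips, super_device_ids))
    ]
    return {
        "server_id": host_ip,
        "host_nic_ip": "reserve",
        "host_ip": host_ip,
        "container_ip": host_ip,
        "device": devices,
    }


def build_ranktable(servers: list[dict], super_pod_id: str) -> dict:
    """Wrap the server entries in the top-level ranktable structure."""
    return {
        "status": "completed",
        "version": "1.2",
        "server_count": str(len(servers)),
        "server_list": servers,
        "super_pod_list": [{
            "super_pod_id": str(super_pod_id),
            "server_list": [{"server_id": s["server_id"]} for s in servers],
        }],
    }


# Only device 0 of each server, taken from the sample; extend the lists
# with the values reported on your own machines.
table = build_ranktable(
    [
        build_server("10.87.191.98", ["192.24.2.199"], ["100663296"], rank_offset=16),
        build_server("10.87.191.102", ["192.28.2.199"], ["117440512"], rank_offset=0),
    ],
    super_pod_id="2",
)
print(json.dumps(table, indent=2))
```

Generating the file this way keeps `rank_id` values contiguous within each server and makes `server_count` follow automatically from the number of entries.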