Deploy MiniCPM-V-2_6
1. Introduction
MiniCPM-V-2_6 is a multimodal pre-trained model in the MiniCPM family. It targets vision-language tasks and processes images and text efficiently. The model is built on SigLip-400M and Qwen2-7B, with 8B parameters in total.
1. Features
- Supports Chinese.
- Supports uploading single or multiple pictures.
2. Operation steps
1. Clone the project, install the environment, and download the model
1.1 Clone the LLM-TPU project
git clone https://github.com/sophgo/LLM-TPU.git
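Alternatively, download the project as a ZIP archive and transfer it to the /data directory on the board. With MobaXterm, you can log in over SSH, drag the archive in through the built-in SFTP panel, and then unzip it under /data. The paths below assume the unzipped directory name LLM-TPU-main; a git clone creates a directory named LLM-TPU instead, so adjust the paths accordingly.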
1.2 Install the environment (skip this step if it is already installed; on non-Ubuntu systems, use yum or another package manager as appropriate)
sudo apt-get update
sudo apt-get install pybind11-dev
pip3 install sentencepiece transformers==4.40.0
pip3 install gradio==3.39.0 mdtex2html==1.2.0 dfss
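To confirm that the installation above took effect, a small check like the following can help. It is only a sketch: the pinned versions are copied from the pip3 commands above, while the helper names (installed_version, check_requirements) are made up for this illustration.

```python
from importlib import metadata

# Pins taken from the pip3 commands above; unpinned packages map to None.
REQUIRED = {
    "sentencepiece": None,
    "transformers": "4.40.0",
    "gradio": "3.39.0",
    "mdtex2html": "1.2.0",
    "dfss": None,
}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(required, lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_requirements(REQUIRED)
    print("\n".join(issues) if issues else "All required packages look fine.")
```

An empty result means every listed package is present and the pinned ones match; otherwise each line names the package to reinstall.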
1.3 Download the model
##First, enter the python_demo directory.
cd /data/LLM-TPU-main/models/MiniCPM-V-2_6/python_demo/
##Download the precompiled model directly.
python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/minicpmv26_bm1684x_int4_seq1024_imsize448.bmodel
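The model file name appears to encode the target chip (bm1684x), the quantization (int4), the sequence length (1024), and the input image size (448). That reading is an assumption based on the name alone, not a documented convention. Under that assumption, the sketch below splits a name into fields, which can be handy when several bmodel files sit in the same directory. The function name describe_bmodel is hypothetical.

```python
import re
from pathlib import Path

# Assumed naming pattern: <model>_<chip>_<quant>_seq<N>_imsize<N>.bmodel
NAME_PATTERN = re.compile(
    r"^(?P<model>[^_]+)_(?P<chip>[^_]+)_(?P<quant>[^_]+)"
    r"_seq(?P<seq_len>\d+)_imsize(?P<image_size>\d+)\.bmodel$"
)


def describe_bmodel(path):
    """Split a bmodel file name into its (assumed) fields."""
    match = NAME_PATTERN.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected bmodel name: {path}")
    info = match.groupdict()
    # Convert the numeric fields so they can be compared or used directly.
    info["seq_len"] = int(info["seq_len"])
    info["image_size"] = int(info["image_size"])
    return info


if __name__ == "__main__":
    print(describe_bmodel("minicpmv26_bm1684x_int4_seq1024_imsize448.bmodel"))
```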
2. Python routine
2.1 Compile library files
mkdir build && cd build ##Create a compilation directory and enter it
cmake .. ##Generate the Makefile using CMake
make ##Compile the project
cp *chat* .. ##Copy the compiled libraries to the running directory
2.2 Run the demo
cd /data/LLM-TPU-main/models/MiniCPM-V-2_6/python_demo/ ##Enter the python_demo directory
python3 pipeline.py --model_path minicpmv26_bm1684x_int4_seq1024_imsize448.bmodel --processor_path ../support/processor_config/ --devid 0 ##Run the demo
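Before launching the demo, it can save time to verify that the three things it depends on are in place: the downloaded bmodel, the processor configuration directory, and the library built in step 2.1. The following is a minimal sketch to be run from the python_demo directory. The helper names are hypothetical, and the pattern "*chat*.so" assumes the build produces a shared object with "chat" in its name; adjust it if your build output differs.

```python
import glob
import os
import sys


def preflight(model_path, processor_path, lib_pattern="*chat*.so"):
    """Return a list of missing prerequisites for running pipeline.py."""
    missing = []
    if not os.path.isfile(model_path):
        missing.append(f"model file not found: {model_path}")
    if not os.path.isdir(processor_path):
        missing.append(f"processor directory not found: {processor_path}")
    # Step 2.1 copies the build outputs into the current (python_demo) directory.
    if not glob.glob(lib_pattern):
        missing.append(f"no compiled library matching {lib_pattern!r} in {os.getcwd()}")
    return missing


def build_command(model_path, processor_path, devid=0):
    """Assemble the argument list used in step 2.2."""
    return [
        sys.executable, "pipeline.py",
        "--model_path", model_path,
        "--processor_path", processor_path,
        "--devid", str(devid),
    ]


if __name__ == "__main__":
    model = "minicpmv26_bm1684x_int4_seq1024_imsize448.bmodel"
    processor = "../support/processor_config/"
    problems = preflight(model, processor)
    if problems:
        print("\n".join(problems))
    else:
        print(" ".join(build_command(model, processor)))
```

If nothing is missing, the script prints the command line from step 2.2; otherwise it lists what to fix first.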
3. Running results
linaro@bm1684:/data/LLM-TPU-main/models/MiniCPM-V-2_6/python_demo$ python3 pipeline.py --model_path minicpmv26_bm1684x_int4_seq1024_imsize448.bmodel --processor_path ../support/processor_config/ --devid 0
Load ../support/processor_config/ ...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Device [ 0 ] loading .....
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
Model[minicpmv26_bm1684x_int4_seq1024_imsize448.bmodel] loading ....
Done!
=================================================================
1. If you want to quit, please enter one of [q, quit, exit]
2. To create a new chat session, please enter one of [clear, new]
=================================================================
Question: Please analyze it step by step in detail. Are there two dogs in this picture?
Image Num: 1
Image Path 0: cat.png
Please note that currently, variable image sizes are not supported, so the images will be resized. The target size is the image size when exporting the ONNX model.
Please note that if you use a different image size when running export_onnx.py, please modify the following line of code: single_imsize = (448, 448).
Answer:
There is one dog in this picture, not two. The fur color and features of this dog, such as the pointed ears, black muzzle, and red collar, can be identified. The intimate posture between the cat and the dog, along with their similar fur colors, may cause confusion. However, by carefully observing their features, such as the shape of the dog's muzzle and its ears, it is clear that there are differences between them. Therefore, there is one dog in this picture.
FTL: 1.784 s
TPS: 9.662 token/s
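The last two lines of the output report performance. Assuming FTL is the first-token latency in seconds and TPS is the decoding rate in tokens per second for the tokens after the first (an interpretation of the abbreviations, not something the log states), the sketch below pulls both values out of the demo output and gives a rough estimate of how long an answer of a given length would take. The function names are hypothetical.

```python
import re


def parse_metrics(log_text):
    """Extract the FTL (seconds) and TPS (tokens/second) values from demo output."""
    ftl = re.search(r"FTL:\s*([\d.]+)\s*s", log_text)
    tps = re.search(r"TPS:\s*([\d.]+)\s*token/s", log_text)
    if ftl is None or tps is None:
        raise ValueError("FTL/TPS lines not found")
    return float(ftl.group(1)), float(tps.group(1))


def estimate_seconds(ftl, tps, num_tokens):
    """Rough total time: first-token latency plus the remaining tokens at TPS."""
    if num_tokens < 1:
        return 0.0
    return ftl + (num_tokens - 1) / tps


if __name__ == "__main__":
    ftl, tps = parse_metrics("FTL: 1.784 s\nTPS: 9.662 token/s\n")
    print(f"~{estimate_seconds(ftl, tps, 100):.1f} s for a 100-token answer")
```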
4. Author's Environment
Package Version
------------------------- -----------
aiofiles 23.2.1
aiohappyeyeballs 2.4.4
aiohttp 3.10.11
aiosignal 1.3.1
altair 5.4.1
annotated-types 0.7.0
anyio 4.5.2
async-timeout 5.0.1
attrs 25.1.0
certifi 2019.11.28
chardet 3.0.4
click 8.1.3
contourpy 1.1.1
cycler 0.12.1
dbus-python 1.2.16
dfss 1.9.2
distlib 0.3.9
distro 1.9.0
distro-info 0.23ubuntu1
exceptiongroup 1.2.2
fastapi 0.115.8
ffmpy 0.5.0
filelock 3.16.1
Flask 2.2.2
fonttools 4.54.0
frozenlist 1.5.0
fsspec 2025.2.0
gradio 3.39.0
gradio_client 1.3.0
h11 0.14.0
httpcore 1.0.7
httpx 0.28.1
huggingface-hub 0.29.1
idna 2.8
importlib-metadata 6.0.0
importlib_resources 6.4.5
itsdangerous 2.1.2
Jinja2 3.1.2
jiter 0.9.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.7
latex2mathml 3.77.0
linkify-it-py 2.0.3
Markdown 3.7
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.5
mdit-py-plugins 0.3.3
mdtex2html 1.2.0
mdurl 0.1.2
mpmath 1.3.0
multidict 6.1.0
narwhals 1.28.0
netifaces 0.10.4
networkx 3.1
numpy 1.24.1
openai 1.74.0
orjson 3.10.15
packaging 24.1
pandas 2.0.3
pillow 10.4.0
pip 25.0.1
pkgutil_resolve_name 1.3.10
platformdirs 4.3.6
propcache 0.2.0
psutil 5.9.4
pydantic 2.10.6
pydantic_core 2.27.2
pydub 0.25.1
PyGObject 3.36.0
pymacaroons 0.13.0
PyNaCl 1.3.0
pyparsing 3.1.4
pyserial 3.5
python-apt 2.0.0
python-dateutil 2.9.0.post0
python-multipart 0.0.20
pytz 2025.1
PyYAML 5.3.1
referencing 0.35.1
regex 2024.11.6
requests 2.22.0
requests-unixsocket 0.2.0
rpds-py 0.20.1
safetensors 0.5.2
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 45.2.0
six 1.14.0
sniffio 1.3.1
sophon-arm 3.10.0
sse-starlette 2.1.3
ssh-import-id 5.10
starlette 0.44.0
sympy 1.13.3
tokenizers 0.19.1
torch 2.4.1
torchaudio 2.4.1
torchvision 0.19.1
tqdm 4.67.1
transformers 4.40.0
typing_extensions 4.12.2
tzdata 2025.1
ubuntu-advantage-tools 20.3
uc-micro-py 1.0.3
unattended-upgrades 0.1
urllib3 1.25.8
uvicorn 0.33.0
virtualenv 20.30.0
websockets 11.0.3
Werkzeug 2.2.2
wheel 0.34.2
yarl 1.15.2
zipp 3.11.0