Sophon LLM_api_server development

1. Introduction

The LLM_api_server example is an OpenAI-API-compatible LLM service built on the BM1684X. It currently supports ChatGLM3, Qwen, Qwen1.5, and Qwen2.

1. Features

  • Supports BM1684X (PCIe, SoC) and BM1688 (SoC).
  • Can be called through the openai library.
  • Can be called through a web (HTTP) interface.

2. Project Directory

    LLM_api_server
    ├── models
    │   ├── BM1684X
    │   │   ├── chatglm3-6b_int4.bmodel              # chatglm3-6b model for BM1684X
    │   │   ├── qwen2-7b_int4_seq512_1dev.bmodel     # qwen2-7b model for BM1684X
    ├── python
    │   ├── utils                         # Utility library
    │   ├── api_server.py                 # Service startup program
    │   ├── config.yaml                   # Service configuration file
    │   ├── request.py                    # Example request program
    │   └── requirements.txt              # Python dependencies
    └── scripts
        ├── download_model.sh       # Model download script
        ├── download_tokenizer.sh   # Tokenizer download script

2. Operation steps

1. Prepare data and models

1.1 Clone Sophgo's official sophon-demo repository (or copy LLM_api_server and upload it to /data on the box)

    git clone https://github.com/sophgo/sophon-demo.git
    cd sophon-demo/application/LLM_api_server
    # If you uploaded only LLM_api_server to the box, enter that directory instead:
    # cd /data/LLM_api_server

1.2 Install unzip and the other dependencies. Skip this step if they are already installed. On non-Ubuntu systems, use yum or another package manager as appropriate.

    sudo apt-get update
    sudo apt-get install pybind11-dev
    pip3 install sentencepiece transformers==4.30.2
    pip3 install gradio==3.39.0 mdtex2html==1.2.0 dfss
    sudo apt install unzip
    chmod -R +x scripts/
    ./scripts/download_tokenizer.sh  ##Download the tokenizer
    ./scripts/download_model.sh  ##Download the model file
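
After the download scripts finish, it can be worth confirming that the files landed where the rest of this guide expects them. The following is a minimal Python sketch and is not part of the LLM_api_server project. The function name find_missing and the list of expected paths are assumptions derived from the project directory listing and the config.yaml shown in this guide, so adjust them to the model you actually downloaded.

```python
from pathlib import Path

# Paths relative to the LLM_api_server root. These are assumptions taken from
# the directory listing and config.yaml in this guide, not an official manifest.
EXPECTED = [
    "models/BM1684X/qwen2-7b_int4_seq512_1dev.bmodel",
    "python/utils/qwen/token_config",
    "python/api_server.py",
    "python/config.yaml",
]


def find_missing(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # Run from the LLM_api_server directory.
    missing = find_missing(".")
    if missing:
        print("Missing files or directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

If anything is reported missing, rerunning the corresponding download script is the first thing to try.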

2. Python routine

2.1 Environment Preparation

    pip3 install -r python/requirements.txt

    # This example needs a relatively recent sophon-sail, so a usable sophon-sail whl package is provided. In the SoC environment, download it with the following command:
    python3 -m dfss --url=open@sophgo.com:sophon-demo/Qwen/sophon_arm-3.8.0-py3-none-any.whl
    python3 -m dfss --install sail  ##Install sophon_sail
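
Before starting the service, a quick check that the key Python packages can be found may save a failed launch. This is a small sketch under assumptions: has_module is a hypothetical helper, and "sophon" is assumed to be the top-level package name installed by sophon-sail (as in the usual `import sophon.sail`).

```python
import importlib.util


def has_module(name):
    """Return True if the named top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "sophon" is assumed to be the package provided by sophon-sail; the other
    # two names are the packages installed with pip3 earlier in this guide.
    for name in ("sophon", "sentencepiece", "transformers"):
        status = "found" if has_module(name) else "NOT found"
        print(f"{name}: {status}")
```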

2.2 Start the service

Parameter Description

api_server.py reads its parameters from the config.yaml configuration file.

The content of config.yaml is as follows:

    models:  # Model list
      - name: qwen   # Model name; either qwen or chatglm3
        bmodel_path: ../models/BM1684X/qwen2-7b_int4_seq512_1dev.bmodel # Model path; modify it to match your setup
        token_path: ./utils/qwen/token_config  # Tokenizer path
        dev_id: 0  # TPU ID

    port: 18080 # Service port

How to use

    cd python  # Switch the working directory
    python3 api_server.py --config ./config.yaml
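
Once api_server.py has been launched, the service may need some time before it accepts requests. The sketch below polls the service port until a TCP connection succeeds. It is an illustration only: wait_for_port is a hypothetical helper, the timeout values are arbitrary, and a successful connection shows only that the port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass.

    Returns True when a connection was established, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Closing the socket immediately is fine; only reachability matters.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 18080) checks the port from config.yaml when run on the box itself. From another machine, pass the IP address of the box instead.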

3. Service Call

1. Call the service with the OpenAI library

    python3 request.py  # To ask the model a different question, change the "content" value of messages in request.py.
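
For readers who want to see roughly what such a call looks like, here is a minimal sketch. It is not the project's request.py. It assumes version 1.x of the openai Python package, a base URL of http://<box-ip>:18080/v1, and that the service does not validate the API key. The helper names build_base_url, build_messages, and ask are hypothetical.

```python
def build_base_url(host, port=18080):
    """Build the base URL for the OpenAI client (the /v1 prefix of the service)."""
    return f"http://{host}:{port}/v1"


def build_messages(question):
    """Wrap a single user question in the chat 'messages' structure."""
    return [{"role": "user", "content": question}]


def ask(host, question, model="qwen", port=18080):
    """Send one question and return the complete streamed reply as a string."""
    # Imported here so the helpers above work even without the package installed.
    from openai import OpenAI

    # The api_key is a placeholder: the client requires a non-empty string, and
    # this sketch assumes the service ignores it.
    client = OpenAI(base_url=build_base_url(host, port), api_key="EMPTY")
    stream = client.chat.completions.create(
        model=model,
        messages=build_messages(question),
        stream=True,
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)
```

Calling print(ask("172.26.13.98", "Hello")) would send one question to a box at that example address. Replace the address with the IP address of your own box, and set model to the name configured in config.yaml.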

2. Call the service over the HTTP interface

The interface details (IP address, etc.) are set in request.py and can be modified there as needed.

Interface URL: ip:port/v1/chat/completions, for example: 172.26.13.98:18080/v1/chat/completions


Interface parameters (JSON format)

    {
        "model": "qwen",
        "messages": [
            {"role": "user", "content": "你好"}
        ],
        "stream": true
    }
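
The same request can also be sent without any third-party package. The sketch below uses only the Python standard library to POST the JSON body above and read the streamed reply. It assumes the service streams OpenAI-style server-sent events ("data: {...}" lines followed by "data: [DONE]"), which this guide does not confirm. The names build_request, parse_stream_line, and ask are hypothetical.

```python
import json
import urllib.request


def build_request(host, question, model="qwen", port=18080, stream=True):
    """Build the POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_stream_line(line):
    """Return the text fragment carried by one streamed line, or None.

    Assumes OpenAI-style server-sent events: 'data: {json}' lines that end
    with a 'data: [DONE]' marker.
    """
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return None
    body = text[len("data:"):].strip()
    if not body or body == "[DONE]":
        return None
    choices = json.loads(body).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def ask(host, question):
    """Send one question and return the concatenated streamed reply."""
    with urllib.request.urlopen(build_request(host, question), timeout=120) as resp:
        fragments = (parse_stream_line(line) for line in resp)
        return "".join(f for f in fragments if f)
```

As with the other examples, host must be the IP address of the box, for instance ask("172.26.13.98", "你好").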

You can use Postman to test the interface.

Download address: https://www.postman.com/downloads/

Usage Examples

!!! The IP address must be the IP address of the box !!!

Contributors: zsl, zwhuang