PP-OCR (Optical Character Recognition)

1. Introduction

PP-OCR is a practical optical character recognition (OCR) toolkit open-sourced by Baidu PaddlePaddle. It aims to provide accurate, easy-to-use, and flexible text recognition, building on Baidu's experience in computer vision, and it supports text detection and recognition across multiple languages and scenarios. It is widely used for document digitization, license plate recognition, industrial quality control, smart office applications, and similar tasks. Its core features are a balance of accuracy and practicality and broad language coverage. It is optimized for real business scenarios: lightweight model design (such as the mobile model PP-OCRv3-mobile) balances speed and deployment cost while preserving recognition accuracy. It supports Chinese and English, other languages (Japanese, Korean, French, etc.), and special scenarios such as curved or blurred text.

Project Directory

PP-OCR
├─cpp
│  ├─dependencies		##C++ example dependencies
│  │
│  └─ppocr_bmcv
│      │  CMakeLists.txt	##Files required for cross-compilation
│      │  ppocr_bmcv.soc	##Provided cross-compiled executable
│      │
│      ├─include			##Cross-compilation dependencies
│      │      clipper.h
│      │      postprocess.hpp
│      │      ppocr_cls.hpp
│      │      ppocr_det.hpp
│      │      ppocr_rec.hpp
│      │
│      ├─src				##Cross-compilation source code
│      │      clipper.cpp
│      │      main.cpp
│      │      postprocess.cpp
│      │      ppocr_cls.cpp
│      │      ppocr_det.cpp
│      │      ppocr_rec.cpp
│      │
│      └─thirdparty			##Cross-compilation third-party libraries
│              cnpy.cpp
│              cnpy.h
│
├─docs		##Help documentation
│  │  PP-OCR.md
│  │
│  └─images
├─python	##Python example required files
│      ppocr_cls_opencv.py
│      ppocr_det_opencv.py
│      ppocr_rec_opencv.py
│      ppocr_system_opencv.py
│      requirements.txt
│
├─scripts
│      download.sh		##Script files for downloading datasets and models
│
└─tools					##Files for comparison and evaluation
        compare_statis.py
        eval_icdar.py

2. Running Steps

Before running the test example, you need to download the required datasets and models.

#Install download tool
pip3 install dfss --upgrade
#Execute download script
bash scripts/download.sh

1. Python Examples

1.1 Text Detection Inference Testing

Parameters for ppocr_det_opencv.py are as follows:

usage: ppocr_det_opencv.py [-h] [--dev_id DEV_ID] [--input INPUT] [--bmodel_det BMODEL_DET]

optional arguments:
  -h, --help            show this help message and exit
  --dev_id DEV_ID       tpu card id
  --input INPUT         input image directory path
  --bmodel_det BMODEL_DET
                        bmodel path

A text detection test example is as follows:

#The program automatically selects 1-batch or 4-batch inference based on the number of images in the folder, preferring 4-batch.
python3 python/ppocr_det_opencv.py --input datasets/cali_set_det --bmodel_det models/BM1684X/ch_PP-OCRv4_det_fp32.bmodel --dev_id 0

After execution, predicted images will be saved in the results/det_results folder.
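
The comment in the example above says that 4-batch inference is preferred and 1-batch is the fallback. As a rough illustration only, here is one way such a split could work, assuming leftover images that do not fill a 4-batch are run one at a time. `split_batches` is a hypothetical helper and is not taken from ppocr_det_opencv.py.

```python
from typing import List, Sequence


def split_batches(items: Sequence[str], preferred: int = 4) -> List[List[str]]:
    """Group items into full batches of `preferred`, then single-item batches."""
    batches: List[List[str]] = []
    # Number of items that fit into complete preferred-size batches.
    full = len(items) // preferred * preferred
    for i in range(0, full, preferred):
        batches.append(list(items[i:i + preferred]))
    # Assumption: the remainder is processed with 1-batch inference.
    for item in items[full:]:
        batches.append([item])
    return batches


images = [f"img_{i}.jpg" for i in range(6)]  # hypothetical file names
print([len(b) for b in split_batches(images)])
```

With six images this yields one batch of four followed by two single-image batches.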

1.2 Text Recognition Inference Testing

Parameters for ppocr_rec_opencv.py are as follows:

usage: ppocr_rec_opencv.py [-h] [--dev_id DEV_ID] [--input INPUT] [--bmodel_rec BMODEL_REC] [--img_size IMG_SIZE] [--char_dict_path CHAR_DICT_PATH] [--use_space_char USE_SPACE_CHAR] [--use_beam_search]
                           [--beam_size {1~40}]

optional arguments:
  -h, --help            show this help message and exit
  --dev_id DEV_ID       tpu card id
  --input INPUT         input image directory path
  --bmodel_rec BMODEL_REC
                        recognizer bmodel path
  --img_size IMG_SIZE   You should set inference size [width,height] manually if using multi-stage bmodel.
  --char_dict_path CHAR_DICT_PATH
  --use_space_char USE_SPACE_CHAR
  --use_beam_search     Enable beam search
  --beam_size {1~40}    Only valid when using beam search, valid range 1~40
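
The --use_beam_search and --beam_size options switch the recognizer to beam search decoding. For orientation, the sketch below shows greedy decoding, the simplest alternative to beam search. It assumes a CTC-style output (per-timestep class probabilities with the blank class at index 0), which this document does not confirm. `ctc_greedy_decode` and the toy character set are hypothetical.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      charset: Sequence[str]) -> Tuple[str, float]:
    """Take the best class per timestep, collapse repeats, and drop blanks."""
    chars: List[str] = []
    scores: List[float] = []
    prev = BLANK
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)
        if idx != BLANK and idx != prev:
            chars.append(charset[idx - 1])  # charset does not include the blank
            scores.append(step[idx])
        prev = idx
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return "".join(chars), mean_score


toy_charset = ["a", "b"]
toy_probs = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a again (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # a (new character after the blank)
    [0.1, 0.2, 0.7],    # b
]
text, score = ctc_greedy_decode(toy_probs, toy_charset)
print(text, round(score, 3))
```

Beam search keeps several candidate sequences per timestep instead of only the best one, which is why a larger beam size costs more time. A mean score like the one returned here is the kind of value a threshold such as rec_thresh could be compared against, though the scripts' actual scoring is not described in this document.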

A text recognition test example is as follows:

#The program automatically selects 1-batch or 4-batch inference based on the number of images in the folder, preferring 4-batch.
python3 python/ppocr_rec_opencv.py --input datasets/cali_set_rec --bmodel_rec models/BM1684X/ch_PP-OCRv4_rec_fp32.bmodel --dev_id 0 --img_size [[640,48],[320,48]] --char_dict_path datasets/ppocr_keys_v1.txt
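
The --img_size value in the command above is a list of [width,height] pairs, one per stage of a multi-stage bmodel. The sketch below shows one plausible way to parse that value and pick a stage for a text crop. The selection rule (the narrowest stage that still fits the crop after scaling to the stage height) is an assumption, and `parse_img_size` and `pick_stage` are hypothetical names rather than functions from ppocr_rec_opencv.py.

```python
import json
import math
from typing import List, Tuple


def parse_img_size(text: str) -> List[Tuple[int, int]]:
    """Parse '[[640,48],[320,48]]' into (width, height) pairs sorted by width."""
    return sorted((int(w), int(h)) for w, h in json.loads(text))


def pick_stage(sizes: List[Tuple[int, int]], crop_w: int, crop_h: int) -> Tuple[int, int]:
    """Return the narrowest stage that fits the crop once scaled to the stage height."""
    for w, h in sizes:
        scaled_w = math.ceil(crop_w * h / crop_h)
        if scaled_w <= w:
            return (w, h)
    # Nothing fits: fall back to the widest stage (the crop would be squeezed).
    return sizes[-1]


sizes = parse_img_size("[[640,48],[320,48]]")
print(pick_stage(sizes, crop_w=200, crop_h=40))  # scaled width 240
print(pick_stage(sizes, crop_w=400, crop_h=40))  # scaled width 480
```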

1.3 Full Pipeline Inference Testing

Parameters for ppocr_system_opencv.py are as follows:

usage: ppocr_system_opencv.py [-h] [--input INPUT] [--dev_id DEV_ID] [--batch_size BATCH_SIZE] [--bmodel_det BMODEL_DET] [--det_limit_side_len DET_LIMIT_SIDE_LEN] [--bmodel_rec BMODEL_REC] [--img_size IMG_SIZE]
                              [--char_dict_path CHAR_DICT_PATH] [--use_space_char USE_SPACE_CHAR] [--use_beam_search]
                              [--beam_size {1~40}] [--rec_thresh REC_THRESH] [--use_angle_cls]
                              [--bmodel_cls BMODEL_CLS] [--label_list LABEL_LIST] [--cls_thresh CLS_THRESH]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         input image directory path
  --dev_id DEV_ID       tpu card id
  --batch_size BATCH_SIZE
                        img num for a ppocr system process launch.
  --bmodel_det BMODEL_DET
                        detector bmodel path
  --det_limit_side_len DET_LIMIT_SIDE_LEN
  --bmodel_rec BMODEL_REC
                        recognizer bmodel path
  --img_size IMG_SIZE   You should set inference size [width,height] manually if using multi-stage bmodel.
  --char_dict_path CHAR_DICT_PATH
  --use_space_char USE_SPACE_CHAR
  --use_beam_search     Enable beam search
  --beam_size {1~40}    Only valid when using beam search, valid range 1~40
  --rec_thresh REC_THRESH
  --use_angle_cls
  --bmodel_cls BMODEL_CLS
                        classifier bmodel path
  --label_list LABEL_LIST
  --cls_thresh CLS_THRESH
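
The help text shows --use_beam_search as a value-less flag and --beam_size restricted to 1~40. A minimal sketch of how such options can be declared with Python's argparse follows. It is not the script's actual parser, and the default of 3 is borrowed from the C++ program's help text as an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="beam_options_sketch")
    # store_true: the flag takes no value, matching [--use_beam_search] in the usage line.
    parser.add_argument("--use_beam_search", action="store_true",
                        help="Enable beam search")
    # choices limits the value to 1..40; metavar reproduces the {1~40} shown in the help.
    parser.add_argument("--beam_size", type=int, default=3,
                        choices=range(1, 41), metavar="{1~40}",
                        help="Only valid when using beam search, valid range 1~40")
    return parser


args = build_parser().parse_args(["--use_beam_search", "--beam_size", "5"])
print(args.use_beam_search, args.beam_size)
```

With this declaration, a value outside the range makes argparse print an error and exit instead of starting inference.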

A full pipeline test example is as follows:

python3 python/ppocr_system_opencv.py --input datasets/train_full_images_0 \
                           --batch_size 4 \
                           --bmodel_det models/BM1684X/ch_PP-OCRv4_det_fp32.bmodel \
                           --bmodel_rec models/BM1684X/ch_PP-OCRv4_rec_fp32.bmodel \
                           --dev_id 0 \
                           --img_size [[640,48],[320,48]] \
                           --char_dict_path datasets/ppocr_keys_v1.txt

After execution, predicted fields will be printed, visualization results will be saved in results/inference_results folder, and inference results will be saved in results/ppocr_system_results_b4.json.
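
To inspect the saved results afterwards, something like the following could be used. It assumes the b4 suffix in the file name reflects --batch_size 4. It does not assume anything about the JSON schema, which is not described here. `result_path` and `load_results` are hypothetical helpers.

```python
import json
import os
from typing import Any, Optional


def result_path(batch_size: int, results_dir: str = "results") -> str:
    # Assumption: the "b4" suffix in the documented file name encodes the batch size.
    return f"{results_dir}/ppocr_system_results_b{batch_size}.json"


def load_results(path: str) -> Optional[Any]:
    """Return the parsed JSON, or None if the file has not been produced yet."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


path = result_path(4)
data = load_results(path)
if data is None:
    print(f"{path} not found; run the pipeline first")
else:
    # Only report the top-level type, since the schema is not documented here.
    print(f"loaded {path}: top-level type is {type(data).__name__}")
```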

2. C++ Examples

1.1 Cross-compilation Environment Setup

C++ programs must be compiled against their dependency files before they can run on the board. To reduce the load on edge devices, we cross-compile in an x86 Linux environment.

Two methods are provided for setting up the cross-compilation environment:

(1) Install cross-compilation toolchain via apt:

If your system and the target SoC platform have the same libc version (check with the ldd --version command), you can install the toolchain with the following command:

sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

Uninstall method:

sudo apt remove cpp-*-aarch64-linux-gnu

If your environment does not meet the above requirements, it is recommended to use method (2).
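
To compare libc versions between the host and the board, the output of ldd --version can be parsed on each side. The sketch below assumes glibc's usual format, where the version number ends the first line. The sample string is illustrative and `parse_glibc_version` is a hypothetical helper.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_glibc_version(ldd_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the end of the first line of `ldd --version`."""
    lines = ldd_output.splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+)\.(\d+)\s*$", lines[0])
    return (int(match.group(1)), int(match.group(2))) if match else None


def local_glibc_version() -> Optional[Tuple[int, int]]:
    """Run `ldd --version` locally; return None if ldd is unavailable."""
    try:
        result = subprocess.run(["ldd", "--version"], capture_output=True,
                                text=True, check=False)
    except OSError:
        return None
    return parse_glibc_version(result.stdout)


# Illustrative sample of a first line in the assumed format.
sample = "ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31\nCopyright notice follows"
print(parse_glibc_version(sample))
```

Running `local_glibc_version()` on both machines and comparing the tuples gives a quick check of whether method (1) applies.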

(2) Set up cross-compilation environment via docker:

You can use the provided Docker image, stream_dev.tar, as the cross-compilation environment.

If you are using Docker for the first time, run the following commands to install and configure it (only needed once):

sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

Load the image from the directory where it was downloaded:

docker load -i stream_dev.tar

You can list the loaded images with the docker images command; the default image is stream_dev:latest.

Create container

docker run --privileged --name stream_dev -v $PWD:/workspace  -it stream_dev:latest
# stream_dev is only an example name; specify your own container name

The workspace directory in the container is mounted from the host directory in which you ran docker run, so you can compile the project inside the container. The workspace directory sits directly under the container's root directory, and changes made in it are reflected in the corresponding files on the host.

Note: Create the container from the parent directory of soc-sdk (the dependency build environment) or from a directory above it, so that soc-sdk is visible inside the container.

1.2 Package Dependency Files

Package libsophon

Extract libsophon_soc_x.y.z_aarch64.tar.gz, where x.y.z is the version number.

# Create root directory for dependency files
mkdir -p soc-sdk
# Extract libsophon_soc_x.y.z_aarch64.tar.gz (${x.y.z} is a placeholder, not a shell variable; substitute the actual version number)
tar -zxf libsophon_soc_${x.y.z}_aarch64.tar.gz
# Copy related library directories and header file directories to the dependency root directory
cp -rf libsophon_soc_${x.y.z}_aarch64/opt/sophon/libsophon-${x.y.z}/lib soc-sdk
cp -rf libsophon_soc_${x.y.z}_aarch64/opt/sophon/libsophon-${x.y.z}/include soc-sdk

Package sophon-ffmpeg and sophon-opencv

Extract sophon-mw-soc_x.y.z_aarch64.tar.gz, where x.y.z is the version number.

# Extract sophon-mw-soc_x.y.z_aarch64.tar.gz (${x.y.z} is a placeholder, not a shell variable; substitute the actual version number)
tar -zxf sophon-mw-soc_${x.y.z}_aarch64.tar.gz
# Copy ffmpeg and opencv library directories and header file directories to soc-sdk directory
cp -rf sophon-mw-soc_${x.y.z}_aarch64/opt/sophon/sophon-ffmpeg_${x.y.z}/lib soc-sdk
cp -rf sophon-mw-soc_${x.y.z}_aarch64/opt/sophon/sophon-ffmpeg_${x.y.z}/include soc-sdk
cp -rf sophon-mw-soc_${x.y.z}_aarch64/opt/sophon/sophon-opencv_${x.y.z}/lib soc-sdk
cp -rf sophon-mw-soc_${x.y.z}_aarch64/opt/sophon/sophon-opencv_${x.y.z}/include soc-sdk
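
Because x.y.z is a placeholder in the commands above, it can help to generate the source paths from the real version numbers before copying. The sketch below only builds the list of (source, destination) pairs that mirror those cp commands and does not copy anything. `sdk_copy_plan` is a hypothetical helper, and allowing the libsophon and sophon-mw versions to differ is an assumption.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def sdk_copy_plan(libsophon_ver: str, mw_ver: str,
                  sdk_root: str = "soc-sdk") -> List[Tuple[str, str]]:
    """List the lib/include directories to copy into the dependency root."""
    libsophon = (PurePosixPath(f"libsophon_soc_{libsophon_ver}_aarch64")
                 / "opt" / "sophon" / f"libsophon-{libsophon_ver}")
    mw_base = PurePosixPath(f"sophon-mw-soc_{mw_ver}_aarch64") / "opt" / "sophon"
    sources = [
        libsophon,
        mw_base / f"sophon-ffmpeg_{mw_ver}",
        mw_base / f"sophon-opencv_{mw_ver}",
    ]
    plan: List[Tuple[str, str]] = []
    for src in sources:
        for sub in ("lib", "include"):
            plan.append((str(src / sub), sdk_root))
    return plan


# "1.2.3" and "4.5.6" are made-up version numbers for illustration only.
for src, dst in sdk_copy_plan("1.2.3", "4.5.6"):
    print(f"cp -rf {src} {dst}")
```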

1.3 Perform Cross-compilation

After setting up the cross-compilation environment, use the cross-compilation toolchain to compile and generate executable files:

cd cpp/ppocr_bmcv
mkdir build && cd build
#Set -DSDK to the actual location of your soc-sdk directory, using an absolute path.
cmake -DTARGET_ARCH=soc -DSDK=/workspace/soc-sdk/ ..
make

After compilation completes, a .soc file is generated in the corresponding directory, for example cpp/ppocr_bmcv/ppocr_bmcv.soc. A prebuilt copy of this file is also provided and can be used directly.
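
The cmake step above requires an absolute -DSDK path. If the build is driven from a script, the path can be resolved before the command is assembled. The following is a small sketch that uses only the two flags shown above; `cmake_command` is a hypothetical helper.

```python
import os
from typing import List


def cmake_command(sdk_dir: str, source_dir: str = "..") -> List[str]:
    """Build the cmake argument list with the SDK path resolved to an absolute path."""
    sdk_abs = os.path.abspath(sdk_dir)  # also normalizes a trailing slash
    return ["cmake", "-DTARGET_ARCH=soc", f"-DSDK={sdk_abs}", source_dir]


cmd = cmake_command("soc-sdk")
print(" ".join(cmd))
# To run it from cpp/ppocr_bmcv/build, pass the list to subprocess.run(cmd, check=True).
```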

2. Inference Testing

Copy the cross-compiled executable, together with the required models and test data, to the SoC platform (i.e., the BM1684X development board) for testing.

Parameter Description

The executable has a default set of parameters; pass your own values as your setup requires. The parameters of ppocr_bmcv.soc are as follows:

Usage: ppocr_bmcv.soc [params]

        --batch_size (value:4)
                ppocr system batchsize
        --beam_size (value:3)
                beam size, default 3, available 1-40, only valid when using beam search
        --bmodel_cls (value:../../models/BM1684X/ch_PP-OCRv3_cls_fp32.bmodel)
                cls bmodel file path, unsupport now.
        --bmodel_det (value:../../models/BM1684X/ch_PP-OCRv4_det_fp32.bmodel)
                det bmodel file path
        --bmodel_rec (value:../../models/BM1684X/ch_PP-OCRv4_rec_fp32.bmodel)
                rec bmodel file path
        --dev_id (value:0)
                TPU device id
        --help (value:true)
                print help information.
        --input (value:../../datasets/cali_set_det)
                input path, images directory
        --labelnames (value:../../datasets/ppocr_keys_v1.txt)
                class names file path
        --rec_thresh (value:0.5)
                recognize threshold
        --use_beam_search (value:false)
                beam search trigger

Image Testing

The following example runs a test on an entire image folder.

#Add executable permission to the file
chmod 755 cpp/ppocr_bmcv/ppocr_bmcv.soc
#Execute file
./cpp/ppocr_bmcv/ppocr_bmcv.soc --input=datasets/train_full_images_0 \
                  --batch_size=4 \
                  --bmodel_det=models/BM1684X/ch_PP-OCRv4_det_fp32.bmodel \
                  --bmodel_rec=models/BM1684X/ch_PP-OCRv4_rec_fp32.bmodel \
                  --labelnames=datasets/ppocr_keys_v1.txt

After testing, predicted images are saved in results/images, prediction results are saved in results/, and the prediction results, inference time, and other information are printed.
