Last Updated on 2024-07-20 by Clay
Introduction
Recently, I have been exploring models used for Optical Character Recognition (OCR). In the past, OCR was a very popular research field as it was one of the earliest practical applications of computer vision. Today, OCR has become a very mature task, and you can easily find high-performance open-source models online.
In this post, I will focus on Chinese OCR, specifically the PaddleOCR framework and its models. For English OCR, there would likely be more models to choose from!
Between 2018 and 2020, while I was in graduate school, I assisted my advisor with OCR research. At that time, we used UNet to locate every Chinese character in an image and cropped each character into its own image. After preprocessing steps such as scaling and contrast enhancement, we classified each crop with a fine-tuned VGG model. Although we achieved some results on our own dataset, we could not take on datasets from other domains, which was a bit regrettable.
Back to the topic, today I want to introduce PaddleOCR, an open-source OCR framework by Baidu's PaddlePaddle AI Studio. Its performance in Chinese recognition tasks is so impressive that I couldn't help but explore its methods, and I've documented my findings here as a shareable note.
PaddleOCR Architecture
Below, I will introduce the process in two parts. First, I will explain the methods used to detect text boxes in images. Next, I will discuss the models used to recognize the text inside those boxes. However, since elaborating on each part would be too lengthy, I may leave the details for future exploration; for now, they are a lower priority for me.
It's worth noting that the process described above is not unique to PaddleOCR; many OCR systems follow a similar workflow.
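To make the two-part workflow concrete, here is a minimal sketch of a detect-then-recognize pipeline. It is not PaddleOCR's actual implementation: the names `run_two_stage_ocr`, `OcrLine`, `fake_detect`, and `fake_recognize` are hypothetical, and the two stand-in functions only exist so the sketch runs without any OCR library.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Four (x, y) corner points of a text region (assumed box format)
Box = List[Tuple[float, float]]


@dataclass
class OcrLine:
    box: Box
    text: str
    score: float


def run_two_stage_ocr(
    image: Any,
    detect: Callable[[Any], List[Box]],
    recognize: Callable[[Any, Box], Tuple[str, float]],
) -> List[OcrLine]:
    """Stage 1 finds text boxes; stage 2 reads the text inside each box."""
    lines = []
    for box in detect(image):
        text, score = recognize(image, box)
        lines.append(OcrLine(box=box, text=text, score=score))
    return lines


# Hypothetical stand-ins for a real detector and a real recognizer
def fake_detect(image):
    return [[(0, 0), (10, 0), (10, 5), (0, 5)]]


def fake_recognize(image, box):
    return "hello", 0.99


lines = run_two_stage_ocr(image=None, detect=fake_detect, recognize=fake_recognize)
print(lines[0].text, lines[0].score)
```

Keeping the detector and recognizer as separate callables mirrors the point above: either stage can be swapped for a different model without touching the other.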
How to Use PaddleOCR
If you want to use a GPU to accelerate OCR recognition, you need to ensure that the CUDA and cuDNN libraries are installed before using PaddleOCR. If you prefer to deploy and test using a Docker container, I personally recommend the image nvidia/cuda:12.2.2-cudnn8-devel-ubuntu20.04, which allows you to use PaddleOCR seamlessly.
Choose one of the following installation methods depending on whether you want the GPU version:
# GPU
pip install paddlepaddle-gpu
# CPU
pip install paddlepaddle
Next, install the PaddleOCR package itself:
pip install paddleocr
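After installing, a quick sanity check can confirm that both packages are visible to your Python environment. This is only a sketch: `check_installed` is a hypothetical helper, and it relies on the fact that the paddlepaddle package is imported as `paddle` and the paddleocr package as `paddleocr`. It only checks that the modules can be found; it does not verify that the GPU build actually works.

```python
import importlib.util


def check_installed(modules=("paddle", "paddleocr")):
    """Return {module_name: True/False} without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_installed()
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```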
You can refer to the following code to test OCR. Interestingly, the official code requires the simfang.ttf font. If you need it, you can download it here.
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# OCR: load the Chinese model with the text-angle classifier enabled
ocr = PaddleOCR(use_angle_cls=True, lang="ch")
img_path = "./program_data/clay_blog.png"
result = ocr.ocr(img_path, cls=True)

# Print Text: each line is [box, (text, score)]
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)

# Show the image with bounding boxes
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./program_data/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save("ocr_result.jpg")
Output:
This is a thumbnail and summary of one of my blog articles. You can see that the text detection is quite accurate, even though it missed one line. Given the low resolution I provided, I am very satisfied with the precision achieved.
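If you only need the recognized text, the per-image result used in the script above (a list of [box, (text, score)] entries) can be post-processed in plain Python. The following is a minimal sketch: `extract_text` and the `min_score` threshold are my own hypothetical additions, the sample data is hand-written rather than real OCR output, and sorting by the top-left corner is only a naive approximation of reading order.

```python
def extract_text(page_result, min_score=0.5):
    """Return recognized strings, roughly top-to-bottom then left-to-right."""
    kept = []
    for box, (text, score) in page_result or []:
        if score >= min_score:
            top = min(y for _, y in box)
            left = min(x for x, _ in box)
            kept.append((top, left, text))
    kept.sort()
    return [text for _, _, text in kept]


# Hand-written sample in the same shape as result[0] (not real OCR output)
sample = [
    [[[10, 40], [90, 40], [90, 60], [10, 60]], ("second line", 0.93)],
    [[[10, 10], [90, 10], [90, 30], [10, 30]], ("first line", 0.98)],
    [[[10, 70], [90, 70], [90, 90], [10, 90]], ("noise", 0.21)],
]
print(extract_text(sample))  # ['first line', 'second line']
```

Raising `min_score` drops more low-confidence lines, which can help with low-resolution inputs like the one used here, at the cost of possibly losing correct text.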
References
Recognizing Text Boxes
1. EAST (Efficient and Accurate Scene Text Detector)
Paper: EAST: An Efficient and Accurate Scene Text Detector
2. DB (Differentiable Binarization)
Paper: Real-time Scene Text Detection with Differentiable Binarization
3. SAST (Segmentation-based Text Detector)
Paper: A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning
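To give a feel for one of the detectors listed above, here is a small sketch of the core idea in the DB paper: instead of a hard threshold, each pixel's probability P is compared to a learned threshold T through a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), with the amplifying factor k set to 50 in the paper. The function name and the scalar (single-pixel) form are my own simplification; the real model applies this over whole probability and threshold maps.

```python
import math


def differentiable_binarization(p, t, k=50.0):
    """Approximate a hard threshold (p > t) with a steep, differentiable sigmoid."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# A pixel well above its threshold maps close to 1, well below maps close to 0
print(differentiable_binarization(0.9, 0.3))
print(differentiable_binarization(0.1, 0.3))
print(differentiable_binarization(0.5, 0.5))  # exactly at the threshold: 0.5
```

Because the function is smooth, gradients can flow through the binarization step during training, which is what lets the threshold map be learned together with the probability map.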