How do you implement optical character recognition (OCR) in Python?

2026-05-24 13:41

Install Pillow and pytesseract with pip, then install Tesseract OCR:


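pip install pillow
pip install pytesseract
# pytesseract is only a wrapper: the Tesseract OCR engine itself must be installed
# separately (for example with your system package manager or a Windows installer).
# If that installation fails, download it from jaist.dl.sourceforge.net/project/tesseract-ocr-alt/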
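
After installing, it can help to confirm that the Python packages are importable before running any OCR code. The following is a minimal sketch and not part of the original instructions. The helper name `check_modules` is hypothetical, and it only checks that the modules can be found, not that the Tesseract engine itself is installed.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # PIL is the import name of the Pillow package.
    for name, found in check_modules(["PIL", "pytesseract"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either module is reported as missing, rerun the corresponding `pip install` command in the same Python environment that runs the script.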
USAGE
try:
    # Legacy PIL installs exposed a top-level Image module
    import Image
except ImportError:
    # Pillow exposes it under the PIL package
    from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

print(pytesseract.image_to_string(Image.open('test.png')))
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
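
Setting `tesseract_cmd` by hand is only needed when the executable is not on PATH. A rough sketch of how that lookup could be automated is shown here. The helper name `find_executable` is hypothetical, and the candidate path is an assumption based on the example path above with `.exe` appended. Adjust it to wherever Tesseract is installed on your machine.

```python
import os
import shutil


def find_executable(name="tesseract", candidates=()):
    """Return a path to the executable, or None if it cannot be located.

    Looks on PATH first, then tries each explicit candidate path in order.
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Assumed install location; replace with your own if it differs.
    fallback = [r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"]
    path = find_executable("tesseract", fallback)
    if path is None:
        print("tesseract not found; install it or add more candidate paths")
    else:
        # This value can then be assigned to pytesseract.pytesseract.tesseract_cmd
        print("tesseract found at:", path)
```

Checking PATH first means that explicit paths act only as a fallback, so the same script can run unchanged on machines where Tesseract is already on PATH.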
Add the following config if you get a tessdata error such as “Error opening data file…”:

tessdata_dir_config = '--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
# Example config: '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
# It's important to add double quotes around the dir path.
# If that directory has no chi_sim.traineddata, download it from jaist.dl.sourceforge.net/project/tesseract-ocr-alt/

# image can be any PIL image, for example the test.png used earlier
image = Image.open('test.png')
pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
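
Because a missing `.traineddata` file is a common cause of the data file error, it may be worth checking the tessdata directory before calling `image_to_string`. This is a minimal sketch under stated assumptions: the helper names `build_tessdata_config` and `missing_languages` are hypothetical, and the check only looks for `<lang>.traineddata` files directly inside the given directory.

```python
import os
import tempfile


def build_tessdata_config(tessdata_dir):
    """Build the --tessdata-dir option, with double quotes around the path."""
    return f'--tessdata-dir "{tessdata_dir}"'


def missing_languages(tessdata_dir, lang):
    """Return the codes in `lang` (e.g. 'chi_sim+eng') that lack a .traineddata file."""
    codes = [code for code in lang.split("+") if code]
    return [
        code for code in codes
        if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata"))
    ]


if __name__ == "__main__":
    # Demonstrate with a throwaway directory holding an empty chi_sim.traineddata.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "chi_sim.traineddata"), "w").close()
        print(build_tessdata_config(demo_dir))
        print(missing_languages(demo_dir, "chi_sim+eng"))  # only eng is missing here
```

In real use, pass your actual tessdata directory instead of the temporary one, and download any language files reported as missing before running OCR.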

