Optical character recognition

Optical character recognition (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source image may be a scanned document, a photo of a document, a scene photo (such as a picture of a sign or billboard), or an image with superimposed subtitle text (such as a frame from a television broadcast).

OCR is widely used as a form of data entry for digitizing printed records such as invoices, bank statements, passport documents, business cards, and mail. Once digitized, the text can be edited, searched, stored, and displayed electronically, and it can be used in machine processes such as text-to-speech, machine translation, cognitive computing, and text and data mining. OCR is also a subject of research in artificial intelligence, pattern recognition, and computer vision.

Early versions of OCR had to be trained with images of each character and could recognize only one font at a time. Advanced systems that recognize most fonts with a high degree of accuracy are now widely available, and they accept a variety of digital image file formats as input. Some systems can also reproduce the layout of the original page, including images, columns, and other non-textual components.
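
The single-font, per-character training of early systems can be illustrated with a toy template matcher. The sketch below is an illustration only, not a reconstruction of any historical system: the 3x5 bitmaps, the four-letter alphabet, and the names TEMPLATES, mismatches, recognize, and recognize_line are all assumptions made for this example. Each character has exactly one stored image, and an input glyph is assigned to the character whose image differs from it in the fewest pixels.

```python
# Toy template matching: one stored bitmap per character, i.e. a single "font".
# '#' is an ink pixel, '.' is background. All glyphs are 3 pixels wide, 5 tall.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def mismatches(glyph, template):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the character whose template differs from the glyph in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Recognize a sequence of already segmented glyphs and join the results."""
    return "".join(recognize(g) for g in glyphs)


# A "scanned" T in which noise has turned one background pixel into ink.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]

if __name__ == "__main__":
    print(recognize(noisy_t))
    print(recognize_line([TEMPLATES["L"], TEMPLATES["O"], noisy_t]))
```

A matcher of this kind tolerates a small amount of noise, but it has no way to handle a glyph drawn in a different typeface or at a different size, which is why such a system would need separate training for each font.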

