How to OCR a .PDF File

Suppose you want to digitize a magazine article or a chapter from a textbook. Maybe you’re posting a great research article on Moodle, and you want your students or a colleague to read it. You could spend hours retyping the article, or you could simply scan it and save it as a PDF. However, you still can’t search the result, because the PDF is just a scanned image of the original document. To fix that, you can use Optical Character Recognition (OCR) technology to make your PDF a searchable document. You can then search within the PDF for a specific name or term, instead of scrolling through the entire document and looking for it by eye. To OCR a .PDF file, follow these easy steps:

Step 1: Open the .PDF file you wish to OCR in Adobe Acrobat Pro.

Step 2: Locate “Tools” in the toolbar. Click on it.

Step 3: Several different options should appear. Locate “Recognize Text” and click on it.

Step 4: Select “In This File.”

Step 5: Depending on your preferences, you may select all pages, the current page you are on, or a range of pages.

Step 6: When you’ve decided which pages of the PDF you would like to OCR, click “OK.”

Step 7: Adobe Acrobat Pro will begin to recognize text in the file.

Step 8: When the process is finished, save your newly recognized PDF.

Step 9: Open your new PDF in Preview and try searching within the PDF using the search tool, located in the top right and marked with a small magnifying glass.
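
If you don’t have Adobe Acrobat Pro, the same idea can be scripted. The sketch below is not part of the Acrobat workflow above. It assumes the separate open-source command-line tool OCRmyPDF is installed, and it uses that tool’s `--pages` option to mirror the page choice in Step 5. The function names and file names are made up for illustration.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, pages=None):
    """Return the argument list for an OCRmyPDF run.

    pages: None to OCR every page, or a string such as "3" or "2-5"
    to limit OCR to one page or a range of pages.
    """
    cmd = ["ocrmypdf"]
    if pages:
        cmd += ["--pages", pages]
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, pages=None):
    """Run OCRmyPDF, assuming it is installed and on the PATH."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocr_command(src, dst, pages), check=True)


# Show the command that would be run for pages 2 through 5
# of a hypothetical scanned article.
print(build_ocr_command("article.pdf", "article_ocr.pdf", pages="2-5"))
```

Calling `run_ocr("article.pdf", "article_ocr.pdf")` would write a new, searchable copy and leave the original scan untouched, much like saving the recognized PDF in Step 8.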

Visit GTS’s tutorials page to watch a screencast of the OCR process.

An important note – When trying to OCR a .PDF file, you may receive this message: “Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text.” If that is the case, save the .PDF file as a .tiff file, and then save it back to a .PDF file. This should remove the renderable text and allow you to OCR your .PDF file.
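
For readers who script their OCR instead of using Acrobat, a similar situation comes up when a page already contains text. The sketch below is an assumption-laden illustration, not Acrobat’s behavior. It assumes the separate open-source tool OCRmyPDF and its `--force-ocr` option (rasterize each page, then OCR it, which is roughly what the .tiff round trip achieves) and its `--skip-text` option (leave pages that already have text alone). The function name and file names are hypothetical.

```python
def ocr_args_for_existing_text(src, dst, mode="force"):
    """Build an OCRmyPDF command for a PDF that already contains text.

    mode="force": rasterize every page and OCR it again.
    mode="skip":  OCR only the pages that have no text yet.
    """
    flags = {"force": "--force-ocr", "skip": "--skip-text"}
    if mode not in flags:
        raise ValueError("mode must be 'force' or 'skip'")
    return ["ocrmypdf", flags[mode], src, dst]


# Show the command for re-recognizing a hypothetical file from scratch.
print(ocr_args_for_existing_text("article.pdf", "article_ocr.pdf"))
```

Forcing OCR discards the existing text layer, so the "skip" mode is the gentler choice when only some pages are scanned images.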
