About Text Recognition in Keep It

Keep It will perform text recognition on PDF documents and images (including attachments, as of v1.7), using computer vision and machine learning technologies to minimize work and produce the most accurate results.

Keep It doesn’t modify PDFs or convert images to make them searchable. Instead, it indexes the text so that it can be found again, and stores that text in iCloud so the work does not have to be repeated on other devices.

Keep It has always been able to index the text in PDFs that have selectable text or had OCR performed on them already, and does not perform unnecessary text recognition on those, or on images that do not appear to contain any text.
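
The rule described above can be sketched in a few lines of Python. This is only an illustration of the decision, not Keep It's actual code: the Item type, its field names, and the needs_text_recognition function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a file in the library (not Keep It's real model)."""
    kind: str                              # "pdf" or "image"
    has_indexable_text: bool = False       # PDF with selectable text or earlier OCR
    appears_to_contain_text: bool = False  # image in which text areas were detected


def needs_text_recognition(item: Item) -> bool:
    """Return True only when text recognition would not be wasted work."""
    if item.kind == "pdf":
        # Existing text is indexed directly, so recognition is skipped.
        return not item.has_indexable_text
    if item.kind == "image":
        # Images with no apparent text are skipped entirely.
        return item.appears_to_contain_text
    return False


print(needs_text_recognition(Item("pdf", has_indexable_text=True)))
print(needs_text_recognition(Item("image", appears_to_contain_text=True)))
```

In this sketch, a PDF that already has selectable text is never sent for recognition, while a scanned PDF or an image with detected text areas is.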

On iOS, Keep It will temporarily download items to perform text recognition on them, while the app is connected to Wi-Fi (unless the "Use Mobile Data for Indexing" setting is enabled).

Text recognition may take some time; see below for more details.

PDF Documents

For PDFs, Keep It will not perform text recognition if there is indexable text in the document already. Instead, the text stored in the document will be indexed so that it can be searched.

After performing text recognition, Keep It does not modify PDF documents to add an invisible text layer, but instead indexes that text so it can be searched. The text is also stored in iCloud, if in use, to avoid repeating that work on other devices. This data will take up minimal space, typically between 1 and 2 kilobytes per page.
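
As a rough illustration of the figure above, the following Python sketch estimates how much space the stored text for a document might take, assuming the range of 1 to 2 kilobytes per page quoted here. The function name and constants are hypothetical and are not part of Keep It.

```python
# Hypothetical helper for a back-of-the-envelope estimate; not part of Keep It.
KB_PER_PAGE_LOW = 1   # lower bound quoted in the text above
KB_PER_PAGE_HIGH = 2  # upper bound quoted in the text above


def estimate_index_size_kb(page_count: int) -> tuple[int, int]:
    """Return a (low, high) estimate in kilobytes for a document's stored text."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return (page_count * KB_PER_PAGE_LOW, page_count * KB_PER_PAGE_HIGH)


low, high = estimate_index_size_kb(250)
print(f"A 250-page PDF would need roughly {low} to {high} KB")
```

By this estimate, even a library of several thousand scanned pages would add only a few megabytes of recognized text to iCloud.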

Larger PDF documents and those with more complex layouts take longer to process. While recognizing the text on a single page might take a few seconds, larger and more complex documents could take a few minutes.

Keep It always performs text recognition in the background (and on Mac, in a completely separate process). To see progress on Mac, choose Window > Activity from the menu and check whether Keep It is performing any “Fetching metadata” operations. On iOS, tap the status bar below either of the lists.

Images

For images, Keep It uses computer vision to detect areas of text in the image, and only performs text recognition on any areas found. Keep It will take steps to refine the quality of the text, but only text where there is a high contrast between foreground and background colors is likely to yield good results.

As with PDFs, image files are not modified, but rather the text is indexed so that it can be searched, and stored in iCloud (if in use) to avoid duplication of work across devices.

Text recognition for images may not be as accurate on macOS High Sierra and iOS 11 as on Mojave and iOS 12 or later, due to advancements in Apple’s computer vision technology.

Screenshots

OCR works best with higher resolution images such as photos, and for text where there is a high contrast between foreground and background colors. Screenshots may produce good results where larger fonts are used, or the screenshot was taken from a higher resolution screen, such as a Retina display.

Attachments

Keep It will perform text recognition if necessary on attachments to notes, RTFD files, and mail messages.

Handwriting

Handwriting is unlikely to produce usable results.

Seeing the Recognized Text

There is no way to see the recognized text in the app, but it will be indexed so that it can be searched.

Languages and Scripts

Text recognition relies on knowing which language it needs to recognize. By default, Keep It uses the same language as your Mac or iOS device. This can be changed to another language, or a script that encompasses many related languages (e.g. Latin). 

It is not possible to override the language on a per-document basis, or to specify multiple languages or scripts. However, most of the scripts, with the exception of Cyrillic, also include support for English.

In cases where languages can be written both horizontally and vertically (e.g. Japanese), the vertical version may be used as a secondary language.

To change the language, or choose a script:

Keep It will offer to reindex documents when the language is changed.

Disabling Text Recognition

Text recognition can be enabled or disabled on a per-device basis. To disable text recognition: