The Department of Internal Affairs

Te Tari Taiwhenua | Department of Internal Affairs




 

Text corrections new feature for Papers Past


12 September 2024

Text corrections are now available in the Newspapers section of Papers Past, improving the quality of the searchable text and the experience for users of the National Library platform.

Papers Past delivers digitised, full-text New Zealand and Pacific newspapers, magazines and journals, books, and other formats, including parliamentary papers, letters and diaries.

The Papers Past site uses Optical Character Recognition (OCR) software to read the source images that make up the digitised collections. The OCR software generates a text file by recognising the shapes of the letters. This text file is then used to support searching across the collections. However, OCR-generated text isn’t always correct. Issues with poor-quality paper, small print, mixed fonts, multiple-column layouts, or damaged pages may cause poor OCR accuracy.

“Text corrections has been the most requested feature by users on Papers Past,” says Tim Kong, Director, Digital Experience at the National Library.

“This functionality gives logged-in users the ability to correct OCR text so it better matches the original page image. Our intention is to improve the experience across the site and encourage engagement with these historic documents.

“This feature will enable better search results and a richer experience for all users.”

Anyone can help improve Papers Past if they have created an account and are logged in. You can use the same RealMe account used for other government services to create a Papers Past account.

“We have lightly refreshed our page design and content to support text corrections and the use of Papers Past,” says Tim Kong.

“This includes te reo Māori navigation and page titles, and expanded help resources.

“Researchers can continue to use Papers Past as they ordinarily would, without creating an account and correcting text. The search functions haven’t changed.”

Community moderation of the text corrections feature is an established model used successfully by similar sites. Trove, the National Library of Australia’s equivalent to Papers Past, has been using community-moderated text corrections for over a decade. The text corrections are made to an automated search index, not the source files themselves.

“In addition to community moderation, we do have daily reporting set up to pick up unusual text correction activity. Also, the editable index is a copy, and unwanted changes can be rolled back to their original state.

“We’ve involved Papers Past users in creating the new structure and content and will continue to refine the site based on user feedback.”
