Google Cloud Vision
2 September 2021
When the collection of questionnaires was first assessed, it became clear that there were too many documents for volunteers to transcribe within the lifetime of the project. We considered simply digitising the remaining documents and adding them to the Museum’s Collections Online website, but we really wanted the content of the questionnaires to be searchable. We therefore concluded that a blended solution, supplementing the work of volunteer transcribers with automated transcription, might be the best way forward.
After discussions with our software developers, we decided to look at the Google Cloud Vision API as a possible way of automating the transcription of the documents. This is a powerful machine learning tool that, among other things, can read printed and handwritten text and create a set of metadata that can be indexed and therefore searched by members of the public. It also records the properties of the document and assesses the nature of the transcribed text.
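As an illustration of the kind of request involved, the sketch below uses the Python client library for the Vision API to run document text detection on a single scanned page. The file name is purely illustrative and this is not code from our actual pipeline.

```python
from google.cloud import vision


def transcribe_page(image_path: str) -> str:
    """Send a scanned questionnaire page to Cloud Vision and return the detected text."""
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    # Document text detection is intended for dense printed and handwritten text
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


# Illustrative file name only
print(transcribe_page("questionnaire_page_001.jpg"))
```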
Proposed Workflow
Currently, our object records are stored in our Collections databases and our images in our Digital Asset Management System (DAMs). Images and object records are automatically extracted into a middleware product called the CIIM, designed by Knowledge Integration. Within the CIIM the object records are linked to their images and pushed through to Collections Online.
We needed to create a workflow pipeline that allowed us to identify the required images in our DAMs (iBase), push them through to Google Cloud Vision in the correct order and then store the enhanced media records in the CIIM so that the marked-up document and the transcription metadata can be displayed alongside the image online.
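At a high level the pipeline amounts to three steps: find the tagged page images, send them through Cloud Vision in the correct order, and write the results back. The Python sketch below illustrates that shape only; the data model, function names and iBase/CIIM calls are hypothetical placeholders, as the real integration is handled inside the CIIM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    """One scanned questionnaire page as identified in the DAMs (hypothetical model)."""
    accession_number: str
    sequence: int       # position of the page within the document
    image_path: str     # location of the scan exported from iBase


def fetch_tagged_pages(accession_number: str) -> List[Page]:
    """Placeholder for the query that finds the tagged page images in iBase."""
    raise NotImplementedError("handled by the CIIM middleware in practice")


def store_enhanced_record(page: Page, transcription: str) -> None:
    """Placeholder for writing the enhanced media record back into the CIIM."""
    raise NotImplementedError("handled by the CIIM middleware in practice")


def run_pipeline(accession_number: str, transcribe: Callable[[str], str]) -> None:
    """DAMs -> Cloud Vision -> CIIM, with pages processed in their defined order."""
    for page in sorted(fetch_tagged_pages(accession_number), key=lambda p: p.sequence):
        transcription = transcribe(page.image_path)   # e.g. the Cloud Vision call above
        store_enhanced_record(page, transcription)
```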
Software developments
None of the developers we work with had any experience of using Google Vision in this way so there was a certain amount of experimentation required to refine the workflow.
We soon discovered that to access the required functionality we needed to upgrade to the latest version of the CIIM middleware and expand our metadata framework, with new corresponding lookup fields within iBase.
Currently, all images are stored within the DAMs on a single level, with no hierarchy that allows us to define relationships such as a book and its pages. In addition, we have only ever been able to identify the ‘main image’ for a group of images, with no functionality in place to sequence multiple pages. We therefore needed to develop a new framework dedicated to pages, or ‘connected items’ as we have termed them. When the software developments are complete, we will be able to link pages via the accession number and store a dedicated set of metadata specifically for the connected items.
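As a rough illustration of the kind of record this framework will hold, each connected item carries the parent accession number, its position in the sequence and its own descriptive metadata. The field names and values below are hypothetical, not the actual iBase schema.

```python
# Hypothetical connected-item record; field names and values are illustrative only
connected_item = {
    "accession_number": "ACC/0000/1",   # links the page to its parent object record
    "sequence": 3,                      # position of this page within the document
    "is_main_image": False,             # only one image per group is flagged as main
    "title": "Questionnaire, page 3",
    "language": "cy",                   # Welsh, English or mixed
}
```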
Results
The workflow is now in place to automatically select images from the DAMs and push them through Google Cloud Vision and back into the middleware.
We have put tags in place to identify which items should be sent to Google Cloud Vision and in what order they should be sequenced, allowing us to test the pipeline and confirm ‘proof of concept’.
The CIIM pushes these tagged images through Google Vision in batches. We have clearly defined limits on the volume pushed through (both in the CIIM and in our Google Cloud Vision settings) that prevent us from exceeding the free usage tier offered by Google.
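The safeguard itself is simple: keep a running total and stop submitting once the agreed limit is reached. The sketch below illustrates the idea; the batch size and monthly cap are illustrative values rather than our actual configuration.

```python
from typing import Callable, List

# Illustrative values only; the real limits live in the CIIM and Cloud Vision settings
BATCH_SIZE = 16
MONTHLY_CAP = 1000   # assumed free-tier allowance; check the current terms


def submit_in_batches(image_paths: List[str],
                      transcribe: Callable[[str], str],
                      already_used: int = 0) -> int:
    """Send images in batches, stopping before the monthly cap is exceeded."""
    used = already_used
    for start in range(0, len(image_paths), BATCH_SIZE):
        batch = image_paths[start:start + BATCH_SIZE]
        if used + len(batch) > MONTHLY_CAP:
            break   # defer the remaining pages until the next billing period
        for path in batch:
            transcribe(path)   # e.g. the Cloud Vision call sketched earlier
        used += len(batch)
    return used
```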
The returned data is then modelled and published on the CIIM User Interface.
Initial results for the transcribed documents are mixed. The documents contain both typed and handwritten text in both Welsh and English, and some contain small sketches. Google Vision coped well with printed text in both languages, reasonably well with handwritten English, but less well with handwritten Welsh.
Google Cloud Vision created metadata both for the text and for the document itself, and this latter mark-up metadata was enormous, running to thousands of lines per document. When displaying this within the CIIM middleware we need to decide how much of the metadata should be made available within the record itself, and where to store the rest.
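One option we are weighing is to keep only a compact summary in the record itself, for example the plain transcription, per-page confidence scores and detected languages, and to archive the full response elsewhere. The sketch below illustrates that idea, assuming the standard full_text_annotation structure returned by the API; the choice of fields is ours, not prescribed by Google.

```python
from google.cloud import vision


def summarise_annotation(response: vision.AnnotateImageResponse) -> dict:
    """Reduce a full Cloud Vision response to the fields worth showing in the record."""
    annotation = response.full_text_annotation
    return {
        "text": annotation.text,  # the plain transcription
        # Per-page confidence scores; the word-level geometry is left out
        "page_confidences": [page.confidence for page in annotation.pages],
        "detected_languages": [
            lang.language_code
            for page in annotation.pages
            for lang in page.property.detected_languages
        ],
    }
```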
Volunteer transcribers
Whilst all the software development work was going on, a group of volunteer transcribers started working through batches of documents, producing transcriptions in Word format. It is not possible to edit the Google Cloud Vision output, so the volunteers’ work cannot be integrated into the enhanced metadata.
Instead, we are pasting the transcriptions into the ‘Description’ field of our back-end CMS. This text is then pulled through directly into Collections Online, where it can be displayed alongside the document images.
Next steps
The next phase of the project will be to display the Google Cloud Vision enhancements through Collections Online so that the text can be searched using our standard search interface. We also need to ensure that both the volunteer transcriptions and the automated transcriptions can be viewed and searched in the same way.
We will deliver zoomable images using IIPImage, a server system for web-based streamed viewing and zooming of high-resolution images. These images will be compliant with IIIF, the International Image Interoperability Framework, allowing us to use one of the many available IIIF web viewers to display the Google Cloud Vision enhancements through Collections Online.

In addition, we are keen to see whether Google Cloud Vision’s transcriptions improve over time. We are experimenting with specifying the language of the document via ‘languageHints’ rather than relying solely on automatic language detection, as sketched below. Will this improve the results for the Welsh handwritten material? We are also keen to see whether pushing thousands of Welsh documents through the machine learning interface has any effect on the quality of the transcription. We plan to re-run the first 100 documents through Google Vision at the end of the project to compare the transcriptions and identify any noticeable improvements.
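The language hint itself is just an extra parameter on the request. A minimal sketch, assuming the Python client library and the ISO 639-1 codes for Welsh (‘cy’) and English (‘en’):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("questionnaire_page_001.jpg", "rb") as f:   # illustrative file name
    image = vision.Image(content=f.read())

# Hint that the document is Welsh and/or English rather than relying on auto-detection
context = vision.ImageContext(language_hints=["cy", "en"])
response = client.document_text_detection(image=image, image_context=context)
print(response.full_text_annotation.text)
```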
In the long term this could be a hugely important development, allowing us to digitise thousands of pages of documents held within our collections, transcribe them automatically, and make the text searchable in both Welsh and English.