In my experience, optical character recognition (OCR) works best with print of recent vintage. Thus, the older the typeface, the greater the chances that the program will bungle particular letters. In particular, I have found that attempts to digitize words printed in Gothic type (what folk in the German-speaking world call Fraktur) often result in a large number of mistakes. For example, OCR software will read the ‘ch’ combination as a ‘d’ and the long ‘s’, beloved of Benjamin Franklin and his ilk, as an ‘f’.
As a result of this phenomenon, I found it hard to work with digital copies of older works that have been formatted as ‘.txt’ files. Indeed, there were many instances when I found it quicker to type out a fresh copy of a given document than to locate, and correct, the many mangled words in pages produced by OCR.
Recently, when I was in the market for a fresh translation of a long passage from an old German book, I asked ChatGPT to correct the OCR-made text file of the work that I had found on Archive.org. On the whole, I am happy to say, this method worked well. Even though the text contained a number of phrases written in Latin and sentences in seventeenth-century French, ChatGPT did a good job of correcting mistakes made by the OCR program.
Nonetheless, every once in a while, ChatGPT took liberties with the author’s choice of words. In one case, it transformed a well-established word (Überlieferung) into one of its own coinage (Werbelieferung*).[1] In a second instance, it replaced a commonly used word (Erdkreis) with one (Erdball) that, while related, was not, strictly speaking, a synonym.[2] Worst of all, it changed the meaning of an important noun, turning ‘pikes’ (long pointy things made mostly out of wood) into ‘swords’ (much shorter pointy things made mostly out of steel).
ChatGPT also altered grammatical forms. Thus, ‘had derived’ (geschöpft haben) became ‘derived’ (schöpften) and ‘differed quite significantly’ (recht wesentliche differierend) was truncated to form ‘quite significantly’ (recht wesentliche).[3]
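One way to catch substitutions of this kind is to compare the raw OCR text with the corrected text, word by word, and review every place where the two differ. What follows is a minimal sketch of that idea, not a description of how I actually checked my text. It assumes that both versions are available as plain strings and uses Python's standard difflib module. The function name word_changes and the sample sentences are invented for illustration.

```python
import difflib
import re


def word_changes(ocr_text, corrected_text):
    """Return (before, after) pairs for every stretch of words that differs."""
    # Split both versions into words; \w also matches letters such as 'Ü'.
    before = re.findall(r"\w+", ocr_text)
    after = re.findall(r"\w+", corrected_text)
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what the OCR file said and what the corrected file says.
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes


if __name__ == "__main__":
    # Invented sample: 'nidt' is a typical OCR slip, the other two
    # differences are unwanted changes to the author's wording.
    ocr = "die Ueberlieferung ist nidt auf dem Erdkreis bekannt"
    corrected = "die Werbelieferung ist nicht auf dem Erdball bekannt"
    for old, new in word_changes(ocr, corrected):
        print(f"{old!r} -> {new!r}")
```

A list of this sort will contain genuine repairs (such as ‘nidt’ to ‘nicht’) alongside unwanted rewordings (such as ‘Erdkreis’ to ‘Erdball’), so a human reader still has to judge each pair. The benefit is that the reader need only examine the words that changed, not the whole text.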
[1] Überlieferung refers to the handing down of something, such as a tradition or a practice, from one generation to the next. If it found a home in the lexicon of the German language, Werbelieferung* would mean something like ‘delivery of advertising’.
[2] Both Erdkreis and Erdball can be translated into English as ‘globe’. However, while the former refers to the full extent of our planet (as in ‘all over the globe’), the latter describes a spherical model of the Earth (as in ‘the globe on the geographer’s desk’).
[3] Cousin to the English word ‘scoop’, the verb schöpfen can mean ‘create’, ‘draw’, ‘invent’, or ‘derive’.