Many translators working outside the field of legal translation are surprised at how often our source documents in the legal realm are still scanned images, the original Word, Excel or PowerPoint versions of which are unavailable to us.
Not only unavailable to us translators, but unavailable to the litigators too. This is because documents are still often delivered to the attorneys in paper form and then scanned, Bates stamped (although not always in that order despite the ease of electronic stamping with today’s technology) and organized in a database.
E-Discovery is certainly changing things, but Blake Miller and Mary Mark of the Utah Bar Journal discuss the fact that many attorneys are slow to embrace the use of electronically stored information (ESI) in its native form, opting instead for image files or even good old-fashioned paper.
As many translators can attest, legibility is still often an issue. Scanned files frequently show signs of having been annotated, faxed and re-faxed, 3-hole-punched, xeroxed with little attention paid to the copier’s contrast settings, dog-eared, recopied with post-it notes covering essential parts of a sentence, positioned askew on the scanner cutting off text on the sides of a page, etc., etc. Thus a magnifying glass and the notation “[illegible]” remain useful tools in legal translation.
Now, OCR technology has come a long way and many programs are extremely sensitive in detecting hundreds of fonts and languages even when quality is less than optimal. And the formatting anomalies caused when an OCR program tries to approximate tabs and columns and tables are gradually being worked out. So typing a document from scratch would seem like an old and inefficient way of doing things.
OCR-ing images is certainly a time saver. But we still have to proceed cautiously. Two past projects come to mind. With the first, the translator generated a Word file from the PDF scan of a 100-page Spanish file using an OCR program, which did a good job despite the font in the source being small and the contrast low. But the final product contained some odd spellings of names, usually caused by merging an r and an n into an m to give us the name Amez instead of Arnez. Or mistaking an e for an o, compounding the problem and giving us Amoz. Also, although the OCR software did its best to guess at indents and margin widths, it made them slightly different on every page.
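For anyone who wants to catch this kind of name drift before delivery, here is a minimal Python sketch. It assumes the only confusions worth checking are the two mentioned above (rn read as m, e read as o), and the function names and sample sentence are made up for illustration. It collapses each capitalized word to a rough "skeleton" and reports words that are spelled differently but share one.

```python
import re
from collections import defaultdict

# Hypothetical confusion pairs, taken only from the examples in this post.
# A real list would be longer and would depend on the font and the OCR program.
CONFUSIONS = [("rn", "m"), ("e", "o")]


def skeleton(word):
    """Collapse a word so that OCR look-alikes end up with the same key."""
    key = word.lower()
    for original, misread in CONFUSIONS:
        key = key.replace(original, misread)
    return key


def suspicious_name_groups(text):
    """Return groups of capitalized words that differ in spelling but share a skeleton."""
    groups = defaultdict(set)
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        groups[skeleton(word)].add(word)
    return [sorted(spellings) for spellings in groups.values() if len(spellings) > 1]


sample = "Sr. Arnez signed the contract. Amez and Amoz do not appear in the register."
print(suspicious_name_groups(sample))
```

A check like this only produces a list of candidates. It cannot say which spelling is right, and it will sometimes flag names that really are different, so each group still has to be compared against the scanned source by eye.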
Problems on the second project resulted when the default setting on a German translator’s Adobe Acrobat automatically OCR’d the ugly-looking 200-page scan we had sent him. He did not realize until halfway through the project that he had not even set eyes on the actual source document. This meant he believed the original contained hundreds of typos and spelling errors when it did not. Words appeared that were not at all German and the resulting translation was abysmal. The formatting was abominable to boot. Fortunately we had built enough time into the project to have the translation reworked and reviewed by two other translators.
All project managers have horror stories like this. They remind us, both agencies and translators, that we must remain up to date on, and on top of, the tools we use in our jobs.

3 responses so far ↓
1 MT // Nov 14, 2008 at 10:13 pm
Great post!
Unlike machine translation, I think there is real hope for spectacular OCR technology within the next five to ten years. It’s come a long way already, and every year there are new software titles that seem to do an even better job.
The upside to scanned documents, however, is that they are often repetitive but cannot be used with TM software. Combine those features with the rate that a law firm pays (usually a bit higher), and many discovery projects are little gold mines.
-MT
2 Glenn // Nov 14, 2008 at 11:08 pm
Masked,
Thanks for reading despite my lull in posting recently. And thanks for your insightful comment.
Yes, OCR has made great strides. I remember working in an office in 1995 where we tested whether it would be faster to OCR a page of text and then retouch it or to retype it altogether. Retyping was always quicker. No longer.
And you’re right, scanned files preclude using TM, as do the speed and volume law firms often require, which necessitate sharing a translation among many translators.
Not ideal of course, but many on the project management side of legal translation jobs have fine-tuned the process to be able to handle these types of jobs pretty well, all the while keeping an eye on the latest technology, which hasn’t assisted us as much as it has other fields of translation.
3 Andres Heuberger // Feb 21, 2009 at 2:35 pm
Good post! I am linking to it from http://blog.fxtrans.com.