OCR madness .....

LesleyB
Posts: 8184
Joined: Fri Mar 18, 2005 12:18 am
Location: Scotland

Post by LesleyB » Sun Dec 03, 2006 6:13 pm

I'm sorry folks, I'm crackin' up here:
According to Reid, automating document capture via the system is helping the company reap big gains on efficiency and accuracy.
They cannot be serious!!!
:lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol:

DavidWW
Posts: 5057
Joined: Sat Dec 11, 2004 9:47 pm

Post by DavidWW » Sun Dec 03, 2006 6:13 pm

LesleyB wrote:Hi Sally
With regard to the link you posted above...did you understand the article? If so, you have waaaaay more patience and willing brain cells than I do by this time on a Sunday afternoon! I dipped into a paragraph ...and promptly gave up:
AncestryDPS then turned to integrator DoxTek, which recommended that the company implement the Ascent platform and INDICIUS solution from Kofax because of these products’ abilities to process and classify unstructured (e.g. handwritten census forms), semistructured (e.g. printed birth certificates containing some handwritten data), and structured (e.g. telephone directories) documents. The availability of an open application program interface (API) from the vendor helped clinch the deal.
:shock: eh??
I think they may already have invented "Tichmeal Phrose" or gobbledegook as it is also known.... :lol:

Best wishes
Lesley
Whatever the ease or difficulty of understanding the article, the critical component is the reference to Ancestry's use of OCR and allied technology.

David

SarahND
Site Admin
Posts: 5647
Joined: Thu Apr 27, 2006 12:47 am
Location: France

Post by SarahND » Sun Dec 03, 2006 6:25 pm

Okay, with David's comment about how this one may be a comment applying to the schedule as a whole...
John Lindsay in Montrose is a Coal Fitter A Tichmeal Phrose Applud Person And Employ Loal Gar Scholar
How about Coal Fitter A common phrase applied persons employed as coal fire installers

DavidWW
Posts: 5057
Joined: Sat Dec 11, 2004 9:47 pm

Re: OCRs

Post by DavidWW » Sun Dec 03, 2006 6:29 pm

sporran wrote:Hello all,


the indexing by Ancestry has caused much amusement with English and Welsh censuses long before they turned to Scotland. However, I am certain that indexing is done by humans and OCR is not to blame. It would appear that little or no training takes place and quality-control is missing.

OCR has been used for a long time with special characters, such as those printed on the bottom of cheques, and it is becoming more successful with typewritten fonts. PDAs (hand-held computers) also use OCR but generally they have to "learn" a person's writing and the person must write each character separately. The technology used in Palm Pilots seemed better than the Compaq PDA that I formerly used, where anything other than Ladybird-style letters caused problems. Even the best OCR software will have severe problems with cursive handwriting; and lots of lines, such as on census forms, would add to the woe.

If OCR were any good with handwriting, it would be used. As FreeBMD puts it, with my apologies for the dreadful conversion of an acronym into a verb:
"Handwritten records
No-one thinks that OCRing handwritten records is feasible."


Regards,

John
Hi John

See Sally's post re the www reference that she found in relation to proven use by Ancestry of OCR and allied technology ........

This reinforces my belief, based on the many '51 and '61 entries that I have now seen, that human error alone cannot explain them. However untrained the personnel, be they in the Indian sub-continent or SE Asia; however limited their knowledge of the English language, never mind Scottish names, place and personal; however deficient the training; however lacking the lookup tables for occupations, locations, and surnames; and however lacking the quality control procedures involved, it is extremely unlikely that many of the 1851 and 1861 census entries on Ancestry could have been produced other than by a method involving OCR technology in some manner.

That's not to say that previous inanities in earlier English datasets available on Ancestry weren't the result of untrained, unsupervised human beings, most probably furth of the UK, with no proper QC procedures !!

David

DavidWW
Posts: 5057
Joined: Sat Dec 11, 2004 9:47 pm

Post by DavidWW » Sun Dec 03, 2006 6:58 pm

LesleyB wrote:I'm sorry folks, I'm crackin' up here:
According to Reid, automating document capture via the system is helping the company reap big gains on efficiency and accuracy.
They cannot be serious!!!
:lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol:
They appear to believe so.....

I can only quote the new Scottish county of Resburlishire as evidence :!:

David

emanday
Global Moderator
Posts: 2927
Joined: Tue May 30, 2006 12:50 am
Location: Born in Glasgow: now in Bristol

Post by emanday » Sun Dec 03, 2006 7:06 pm

I can only quote the new Scottish county of Resburlishire as evidence
Aah! Could that be where aw ma missin rellies are hidin :!: :lol: :lol: :lol: :lol: :lol:
Mary
A cat leaves pawprints on your heart
McDonald or MacDonald (some couldn't make up their mind!), Bonner, Crichton, McKillop, Campbell, Cameron, Gitrig (+other spellings), Clark, Sloan, Stewart, McCutcheon, Ireland (the surname)

DavidWW
Posts: 5057
Joined: Sat Dec 11, 2004 9:47 pm

Post by DavidWW » Sun Dec 03, 2006 8:05 pm

emanday wrote:
I can only quote the new Scottish county of Resburlishire as evidence
Aah! Could that be where aw ma missin rellies are hidin :!: :lol: :lol: :lol: :lol: :lol:
I can handle and understand transcription errors when the several thousand possible Scottish surnames are involved.

I can equally well handle transcription errors involving the near 1,000 Scottish parish names, never mind the substantial number of additional non-parish place names within those near 1,000 parishes that regularly appear in the censuses, in place of the parish that should be involved ....

But, hopefully understandably, I have a major problem in understanding how one of the 33 pre-1974 Scottish counties can be so mis-transcribed ....

Just who is kidding whom about the lookup tables provided to the transcribers, if indeed there were any, or about the quality control that would have us believe that there was a Scottish county by the name of Resburlishire?

Incidentally, on the image involved, it's very clear that the county name is Roxburghshire. In other words, by no means could the hand of the enumerator reasonably be interpreted as "Resburlishire".

David
Last edited by DavidWW on Mon Dec 04, 2006 10:15 am, edited 1 time in total.
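
For anyone wondering what the lookup-table check David describes might look like in practice, here is a minimal sketch in Python. It is only an illustration and makes no claim about how Ancestry or its suppliers actually work. The county list is a small illustrative subset, not the full 33, and the names `KNOWN_COUNTIES`, `is_known_county` and `suggest_counties` are made up for this example. The only library call used is `difflib.get_close_matches` from the Python standard library.

```python
from difflib import get_close_matches

# Illustrative subset only, NOT the full list of 33 pre-1974 Scottish counties.
KNOWN_COUNTIES = [
    "Roxburghshire",
    "Berwickshire",
    "Selkirkshire",
    "Peeblesshire",
    "Dumfriesshire",
    "Lanarkshire",
]

# Map lower-cased names back to their canonical spelling so the
# comparison ignores capitalisation.
_CANONICAL = {name.lower(): name for name in KNOWN_COUNTIES}


def is_known_county(transcribed: str) -> bool:
    """Return True if the transcribed value is in the lookup table."""
    return transcribed.strip().lower() in _CANONICAL


def suggest_counties(transcribed: str, limit: int = 3) -> list[str]:
    """Return up to `limit` known counties that look similar to the input.

    The 0.6 similarity cutoff is an arbitrary choice for this sketch.
    """
    matches = get_close_matches(
        transcribed.strip().lower(), list(_CANONICAL), n=limit, cutoff=0.6
    )
    return [_CANONICAL[m] for m in matches]


# Hypothetical QC pass: accept known counties, flag everything else.
for entry in ("Roxburghshire", "Resburlishire"):
    if is_known_county(entry):
        print(f"{entry}: accepted")
    else:
        candidates = suggest_counties(entry)
        print(f"{entry}: not in lookup table, refer to a human; candidates: {candidates}")
```

A check like this can only flag an entry and offer candidates. Similar-looking county names can score close together, so someone would still have to confirm the right one against the census image.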

pinkshoes
Posts: 461
Joined: Thu Aug 11, 2005 6:28 pm
Location: Yorkshire

Post by pinkshoes » Sun Dec 03, 2006 11:48 pm

If anyone's looking for a Grace McGregor born in Bermfshire abt 1781, she's in Glasgow in 1851 :roll:

Bostwychyz
Pinkshoes :lol:

emanday
Global Moderator
Posts: 2927
Joined: Tue May 30, 2006 12:50 am
Location: Born in Glasgow: now in Bristol

Post by emanday » Mon Dec 04, 2006 12:15 am

Thisisgetinridiklus!

I just know someone is going to find out that Scotland is something entirely different. That will be the final straw :!:
Mary

DavidWW
Posts: 5057
Joined: Sat Dec 11, 2004 9:47 pm

Post by DavidWW » Mon Dec 04, 2006 10:12 am

Robt H Hall in St Ninians is a Proppirval Phrendogist

Henry L Lewis in Edinburgh St Andrew is a Lectrirer On Phrenotoye

Robert Burn in Stirling is Retired Phr Royalhary

Hugh Connel in Galston is a Sovaneno Phranaker

Joanna W Lennox in Hamilton is (of The Faculty Of Phrjeicitian Boycon)

John Lindsay in Montrose is a Coal Fitter A Tichmeal Phrose Applud Person And Employ Loal Gar Scholar


Elisabeth Padge in Abbotshall is a Hawker Of Phread & Needles

John Kerr in Annan is a Farmer Of 150 Acceres Feplaying Phree 3 Labourers

Archibald Millar in Leith South is a simple Phryhman


Any more bids :?: [book]

David