Contesters,
Back in December there was talk on this reflector of updating the 10
meter contest records. Ken Harker WM5R decided to take on the project.
It is a big project if you try to do the section and DX records too,
since there are about a million categories in the 10 meter contest.
OK, there really are only 10, but when you start generating records
for all 10 in each ARRL section it seems like many more than 10.
Ken started by creating a database with all of the scores from the
ARRL web page. I was able to provide him with more data, as I have
ARRL scores going back a few more years in an old Access database.
Then he started looking at what it would take to get all of the ARRL
10 meter contest scores ever submitted into a database.
This effort has gone well. Ken has done tons of work and is getting
fairly close to being finished, and he has also gotten some help from
N2IC. Ken found that he can OCR the results if he scans them in from
QST, and it works pretty well, though he then has to do some manual
cleanup on the resulting text files. This seems to be faster than
typing all of it in by hand, since many of the errors repeat
throughout the file and can be fixed with find and replace operations
in a text editor. Ken has also written a number of Perl scripts that
aid in the cleanup of these files. Some
years work better than others, depending on the point size and font
used for the score listings, the age of the magazine being scanned, or
the quality of the original page layout. Ken has also had help from
NC1L at the DXCC desk to identify a handful of stations whose DXCC
entity was ambiguous (e.g. /JD1) and who submitted scores in the
years when the DXCC entity names were not included in the DX score
listings.
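For anyone wondering what that kind of repeated-error cleanup looks
like, here is a rough sketch in Python. It is not Ken's Perl code. The
line layout (call, score, QSOs, multipliers separated by spaces), the
list of character mix-ups and the sample callsign are all assumptions
made up for illustration. The real listings vary from year to year.

```python
import re

# Characters that OCR tends to confuse with digits. This list is an
# assumption; in practice it would be tuned for each scanned year.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def fix_number(field):
    """Repair a field that should be purely numeric, or return None."""
    cleaned = field.translate(DIGIT_FIXES).replace(",", "").replace(".", "")
    return int(cleaned) if re.fullmatch(r"[0-9]+", cleaned) else None


def clean_line(line):
    """Split one score line into (call, score, qsos, mults).

    Returns None when the line does not fit the assumed four-field
    layout, so it can be set aside for manual review.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    call = parts[0].upper()
    numbers = [fix_number(p) for p in parts[1:]]
    if None in numbers:
        return None
    return (call, *numbers)
```

Lines that come back as None are the ones that still need a human to
look at them. Everything else can go straight into the database.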
As it turns out this is something that I have been wishing we had for
all contests, but the big stumbling block has been how to get all of
the data into the database. We have looked at what can be done on the
OCR front using the QST CD-ROMs, but so far that has not proven to be
very useful, as the scans on most if not all of the QST CD-ROMs are
not very high quality. They are on the order of 200 dpi or less, and
all of the OCR software we have tried so far has done a poor job with
those scans. In fact some of the years are difficult for a human to
read. Perhaps with more up-to-date or better OCR software we might be
able to lessen the manual work that needs to be done to get from
printed results to electronic data.
I doubt that WM5R will be able to spend as much time on other
contests once he is finished with the 10 meter contest, but I would
like to keep working on more contests using the tools that Ken has
created. It would be great if I/we could get some help with this
effort. The bulk of the work is in getting from the printed results
of a contest to electronic data.
Currently I have someone heading up this effort for the ARRL 160 and
ARRL RTTY contests. I am sure he will need some help.
I would then like to move on to other contests. The idea is to create
a database that can be put online and used by anyone to slice and dice
the contest results for the entire life of a contest.
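To give an idea of the kind of slicing I have in mind, here is a
small sketch using Python and SQLite. The table layout, the column
names and the sample rows in the usage note are all invented for
illustration. This is not the schema Ken is using.

```python
import sqlite3


def build_db(rows):
    """Load (year, call, section, category, score) rows into memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (year INTEGER, call TEXT, section TEXT, "
        "category TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def section_records(conn):
    """Return the top score for each (section, category) pair."""
    query = """
        SELECT s.section, s.category, s.call, s.year, s.score
        FROM scores AS s
        JOIN (SELECT section, category, MAX(score) AS best
              FROM scores
              GROUP BY section, category) AS m
          ON s.section = m.section
         AND s.category = m.category
         AND s.score = m.best
        ORDER BY s.section, s.category
    """
    return conn.execute(query).fetchall()
```

Feeding build_db() a list of rows such as (2001, "BB2BBB", "STX",
"SO Mixed HP", 750000) and then calling section_records() gives one
record line per section and category, which is the list that has been
so tedious to build by hand. If two entries tie for the top score,
both are returned.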
If you think you would have the time to clean up a few text files that
have been created from OCRed scans, or if you are an OCR genius and can
create better OCR results, I would love to hear from you.
I can coordinate the effort and we can work through the contests until
we have them all in a database.
--
George Fremin III - K5TR
geoiii@kkn.net
http://www.kkn.net/~k5tr
_______________________________________________
CQ-Contest mailing list
CQ-Contest@contesting.com
http://lists.contesting.com/mailman/listinfo/cq-contest