Well, here is what I found last night.
With two computers hooked up, both of them generating QSOs
in the simulator debug mode (about 1500 QSOs/hour total rate),
the network works pretty well - for about 30 minutes.
Then things slow to a crawl - exactly the symptoms that were reported
by those having problems.
I am still in the process of finding the exact cause.
Messages sent over the network are saved in a buffer in the computer
that originated the message. When the message makes it around the
network, the originating computer deletes the message from its
buffer. It resends any messages left in the buffer after 4, 14, 24 ...
seconds. If it resends a message 10 times without a response, it puts
up the multi network broken message.
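
In case it helps to see the idea in one place, here is a rough
sketch of that bookkeeping in Python. This is only an illustration
of the scheme I just described - the names and details are made up
and it is not the actual TR code:

import time

RESEND_DELAYS = [4 + 10 * n for n in range(10)]   # 4, 14, 24 ... 94 seconds
MAX_RESENDS = 10

class PendingMessage:
    def __init__(self, data):
        self.data = data
        self.originated_at = time.time()
        self.resend_count = 0

class OriginatorBuffer:
    # Holds every message this computer originates until that message
    # makes it all the way around the network and comes back to us.
    def __init__(self, send_func, alarm_func):
        self.pending = {}            # message id -> PendingMessage
        self.send = send_func        # puts a message out on the network
        self.alarm = alarm_func      # shows the "multi network broken" warning

    def originate(self, msg_id, data):
        self.pending[msg_id] = PendingMessage(data)
        self.send(data)

    def message_made_it_around(self, msg_id):
        # Seeing our own message come back around acts as the
        # acknowledgement, so we delete our buffered copy.
        self.pending.pop(msg_id, None)

    def poll(self):
        # Called periodically; resends anything that is overdue.
        now = time.time()
        for msg_id, msg in list(self.pending.items()):
            if msg.resend_count >= MAX_RESENDS:
                self.alarm()                 # 10 resends with no response
                del self.pending[msg_id]     # give up on this message
            elif now - msg.originated_at >= RESEND_DELAYS[msg.resend_count]:
                msg.resend_count += 1
                self.send(msg.data)

The failure I am chasing is in the real equivalent of the
delete-it-when-it-comes-back-around step.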
For some reason, this works fine the first or second time
a message is sent. However, once a message has been resent a second
time, it NEVER gets deleted from the buffer (even though the data
appears to make it to the other computer eventually). This can
happen once or twice without a problem, but at some point a whole
bunch of these occur at the same time, and the network gets jammed
up with all of these repeated messages.
Obviously, in my prior testing I didn't run it long enough (or
hard enough) to see this failure.
I improved my data-collecting ability and ran a new test overnight.
The failure was the same, and I found that each computer sent about
2.7 megabytes of data, but it appears only about a third of that was
actually received by the other computer (comparing the data bit by bit).
I hope to analyze this data at lunch time and come up with some
good ideas on root cause.
I am very close to solving this problem AND making the network feature
one of TR's strengths instead of a liability. The basic idea of
doing the retries should eliminate problems from RFI and computers
that go down for a minute. It just has an implementation problem.
Hopefully I will have some good news tomorrow morning - and
a new release well ahead of the CQ WW contest.
73 Tree N6TR
tree@contesting.com