| Author |
Message |
|
|
Well it could and should be done on the server side as a round robin function IMO. My DNS server did not in any situation try the working IP, it always tried the non working, and that for days. Flushing the DNS cache made no difference, rebooting made no difference, and I'm sure I'm not alone with this problem, which easily could be fixed on the server side.
Edit, added: Be that as it may, I just have to edit the hosts file when needed.
AFAIK, it is implemented as round robin DNS - it's always looked that way when I've tracked it down. It's worth trying ipconfig /displaydns to find out what your local machine's DNS resolver currently thinks the IP address should be before/during/after a download request - it shows the current TTL timer countdown as well, which is useful.
If displaydns consistently shows the wrong address, then something upstream (DNS server/proxy/ISP) is mis-handling TTL. Or there might, indeed, be a mis-configuration at SETI - that would affect us all, and we can check that by comparing notes here.
There used to be a bug in BOINC, which Ned Ludd and I finally got them to acknowledge and fix in v6.10.33 (March 2009) - if BOINC had already tried a download, and failed, it carried on attempting to download from the same IP address for evermore, rather than re-querying DNS (which would pick up the round robin). It wasn't BOINC's fault - it was a bug in the underlying libcurl library that handles the TCP/IP layer. And it shouldn't be a problem in any current version of BOINC.
I'm on 6.10.18 on both machines, and I refuse to upgrade, so I just have to live with it :-)
In that case, half your downloads will stall, and you will have to do a (carefully-timed) restart of BOINC to free them while the 'right' server is on DNS duty.
That's what I like about crunching for SETI, rather than other projects - it actually feels like you're doing some of the work yourself, not just leaving it to the computer. ;-) |
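For anyone who wants to watch the round robin from a script rather than with ipconfig /displaydns, here is a minimal Python sketch. It is not how BOINC or libcurl do it; it only illustrates the idea of asking the resolver again after a failure instead of sticking to one address. The function names are made up for illustration, and the hostname is left as a parameter because the thread doesn't settle on one.

```python
import socket


def resolve_ipv4(hostname, port=80):
    """Ask the system resolver for the IPv4 addresses it currently returns.

    Calling this again later shows whether the answer rotates
    (subject to whatever caching sits between you and the DNS server).
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr is (ip, port) for IPv4.
    return sorted({info[4][0] for info in infos})


def pick_next_address(addresses, failed):
    """Return the first address that has not already failed, or None.

    This is the behaviour the old bug lacked: after a failure the client
    should consider a fresh lookup result instead of reusing the same IP.
    """
    for addr in addresses:
        if addr not in failed:
            return addr
    return None
```

Typical use would be to call resolve_ipv4() on the download hostname, try the address from pick_next_address(), add it to the failed set if the download errors out, and resolve again before the next attempt.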
|
|
|
|
Well, it's never happened before, or maybe it has but I've forgotten. It must have happened, since I had already edited my hosts file a long time ago, but commented out the BOINC server entries.
Don't mind me, it's Alzheimers light I guess :-)
____________
/The grumpy old Swede.
"I'm so old, that 98% of all trees in the forest, are younger than I am" |
|
|
|
|
It is not different, it is the same .13. Well that is for my Win 7 with or without flushdns. On my Vista though it changes every time I do a flushdns, no matter if it's been 5 minutes or not in between. The Vista machine did not have any download problems either.
I have tried it with Win XP in a VirtualBox. I run BIND as the nameserver
on my local network. The IP changes every 5 minutes. Do you have a local
nameserver, or do you use the nameserver of your ISP?
|
|
|
|
|
|
I have posted this in the other thread about these problems: is this maybe something better than the current round robin DNS?
Since it's not the first time that we have had problems like that here, I wonder if it would cause fewer problems if SETI had two different download server URLs, for example dl1.ssl.berkeley.edu and dl2.ssl.berkeley.edu, and sent both as possible download locations, like Rosetta is doing for example:
<url>http://srv3.bakerlab.org/rosetta/download/262/avgE_from_pdb.gz</url>
<url>http://boinc.bakerlab.org/rosetta/download/262/avgE_from_pdb.gz</url>
<url>http://srv4.bakerlab.org/rosetta/download/262/avgE_from_pdb.gz</url>
<url>http://srv3.bakerlab.org/rosetta/download/262/avgE_from_pdb.gz</url>
<url>http://boinc.bakerlab.org/rosetta/download/262/avgE_from_pdb.gz</url>
<url>http://srv4.bakerlab.org/rosetta/download/262/avgE_from_pdb.gz</url>
So for a SETI WU it could be:
<url>http://dl1.ssl.berkeley.edu/sah/download_fanout/61/08ap11ae.3480.1703.14.10.29</url>
<url>http://dl2.ssl.berkeley.edu/sah/download_fanout/61/08ap11ae.3480.1703.14.10.29</url>
I don't know how the load balancing works in that case. If the BOINC client just picks one of them, then that would be pretty easy, with no need for any big server side changes. If the client starts from the top and tries one after the other, then the scheduler would have to send dl1,dl2 for all even-numbered results (_0, _2,...) and dl2,dl1 for all odd-numbered results. I think it might work better than the current way... but I might be wrong of course.
____________
.
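The even/odd ordering suggested above can be sketched in a few lines of Python. This is only an illustration of the proposal, not anything the SETI scheduler actually does: dl1 and dl2 are the hypothetical hostnames from the post, the function name is made up, and it assumes result names end in _0, _1, _2 and so on.

```python
def ordered_download_urls(result_name, path,
                          hosts=("dl1.ssl.berkeley.edu", "dl2.ssl.berkeley.edu")):
    """Return the download URLs for a result, rotated by its index.

    Even-numbered results (_0, _2, ...) get dl1 first, odd-numbered
    results get dl2 first, so a client that always starts from the top
    of the list still spreads the load across both servers.
    """
    # The result index is the number after the last underscore.
    index = int(result_name.rsplit("_", 1)[1])
    first = index % len(hosts)
    rotated = hosts[first:] + hosts[:first]
    return ["http://{}/{}".format(host, path) for host in rotated]


wu_path = "sah/download_fanout/61/08ap11ae.3480.1703.14.10.29"
urls_even = ordered_download_urls("08ap11ae.3480.1703.14.10.29_0", wu_path)
urls_odd = ordered_download_urls("08ap11ae.3480.1703.14.10.29_1", wu_path)
```

If the client instead picks one URL at random, no rotation would be needed and the scheduler could send the same list to everyone.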
|
|
|
|
|
Well, it's never happened before, or maybe it has but I've forgotten. It must have happened, since I had already edited my hosts file a long time ago, but commented out the BOINC server entries.
Don't mind me, it's Alzheimers light I guess :-)
It has happened before, but it's an intermittent problem which keeps cropping up, hanging around for a while, and going away again.
I guess that because downloads are sort-of working, and they all go out over the same link, it doesn't show up as a problem on the lab monitoring tools: and they don't know it needs kicking until we kick up a fuss here, or someone on the 'inside' mailing distribution circuit passes on a message. Hint to mods? |
|
|
|
|
|
Now if we could just get the splitters back in high gear......
Meowgrrrrrrrrr.
____________
******
"Ask not, what your kitty can do for you. Ask what you can do for your kitty."
As it is kitten, so shall it be done.
|
|
|
|
|
|
Could it be???
Did the boyz kick something into gear before locking up the lab for the weekend?
The Cricket graphs just maxxed for the first time in a while and splitter speed is up.
More power, Scotty!!!!
____________
******
"Ask not, what your kitty can do for you. Ask what you can do for your kitty."
As it is kitten, so shall it be done.
|
|
|
|
|
|
Well, it was nice while it lasted.
They're back to producing just a trickle again.
____________
Grant
Darwin NT.
|
|
|
|
|
Well, it was nice while it lasted.
They're back to producing just a trickle again.
Yeah, shucks.
Dunno what's limiting it.
____________
******
"Ask not, what your kitty can do for you. Ask what you can do for your kitty."
As it is kitten, so shall it be done.
|
|
|
|
|
|
Nothing has changed with 208.68.240.13. The forwarding of port 80
doesn't work. Interestingly, the forwarding of port 443 (https)
is working and connects me to vader.
|
|
|
|
|
|
Well, little work is making its way down to the kitties.
Not that they cannot connect or anything, but the scheduler is not sending out any tunas.
If my cache is running down, some faster fishes than mine are gonna be flopping on the beach soon.
____________
******
"Ask not, what your kitty can do for you. Ask what you can do for your kitty."
As it is kitten, so shall it be done.
|
|
|
|
|
|
Yep, both my machines are getting work. Just not very much of it & only on every 10-20th request. Both caches are running down.
____________
Grant
Darwin NT.
|
|
|
|
|
|
First time now in 20 hours that the splitters seem to be building up a cache of Results ready to send, and the bandwidth utilization is above 90 Mbits/sec.
Let's hope it can stay this way for a bit longer than the last time.
____________
/The grumpy old Swede.
"I'm so old, that 98% of all trees in the forest, are younger than I am" |
|
|
|
|
|
My cache ran out last night due to "no tasks available". BOINC has managed to grab a few tasks to keep going today and is now filling the cache back up quite well. Pretty much all shorties; my CPU is flying through them. |
|
|
|
|
Let's hope it can stay this way for a bit longer than the last time.
Fingers crossed.
Now if they could sort out the dodgy download server all should be right in time for the next outage.
____________
Grant
Darwin NT. |
|
|
Helli Volunteer tester
Joined: 15 Dec 99 Posts: 692 Credit: 66,779,299 RAC: 2,362

|
|
Similar here. Cache ran empty three hours ago, but 1054 WUs are stuck in
the download queue: HTTP error. No downloads at all, actually...
Helli |
|
|
|
|
|
Suspend network activity, then re-enable it a couple of seconds later.
Usually gets things going for me.
____________
Grant
Darwin NT.
|
|
|
|
|
Yup, a few are flowing. Shorties. 5 minutes each. But if you can do
16 workunits in five minutes then you have to be very patient. ;-)
Helli |
|
|
|
|
|
It was nice while it lasted, looks like about 4 hours ago we ran out of MB work to split.
____________
Grant
Darwin NT.
|
|
|
|
|
|
Well, it's Monday morning and the splitters are running out of tapes to split, which is a good thing: it means they more or less got the right number of tapes loaded on Friday.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe? |
|
|