The End of All Things (Oct 30 2008)

Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 143
Credit: 6,652,341
RAC: 0
Canada
Message 825576 - Posted: 1 Nov 2008, 4:09:20 UTC

Unless someone has kicked the server, I have not had any problems with uploads / downloads at all.

Had 2 work units earlier today that required a manual "retry now" but went through straight away once I did that.

I have a large enough cache to keep me going until Monday or, if something major needs sorting out, until Tuesday's outage.

Patience, I think, is the key to getting uploads through; they are slow at times but they do work eventually. I downloaded 4 work units as a test and they came through straight away.
ID: 825576
gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 825584 - Posted: 1 Nov 2008, 4:55:07 UTC

Well, FWIW my 6 Windoz machines are now OK in all respects, as are uploading and reporting on all 10 machines. However, 3 of my 4 Linux boxes still cannot receive a single byte of download. The machines complete work, upload results, report, and ask for more work as normal. More work is "promised" but it just goes into the growing queue of DLs and nothing at all comes down the wire. They're getting your basic "Temporarily failed download of . . . : http error." I've done all of the usual things like restarting the client, warm boot, cold boot; I even power cycled the router. I flushed DNS, at least I think I did (I'm NOT a Linux person really), but still nothing.

I've finally set NNT and will wait 24 hours in case it is a DNS thing and something needs to propagate or time out.

It's hard to believe that 3 boxes broke at the same time on my end when nothing was changed on any of them, but it's also hard to believe that, if it's on Berkeley's end, the masses are not screaming bloody murder in Number Crunching. At this point I don't know what to think. Also, why is one Linux box OK (after the backlog cleared up) but the others are not? They are all running the same distro.

The only thing that comes to mind is I seem to remember something similar way back when there was some trouble with the even/odd work distribution scheme and one of the two stopped feeding. But why only the Linux boxes? Weird.
ID: 825584
ruben
Joined: 30 Oct 08
Posts: 2
Credit: 55,124
RAC: 0
Belgium
Message 825664 - Posted: 1 Nov 2008, 11:20:17 UTC

Same for me: Windows OK, Linux not OK.
ID: 825664
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 825676 - Posted: 1 Nov 2008, 12:30:37 UTC - in response to Message 825664.  

I can third it, Windows, Mac: Fine, Linux: nothing.

ID: 825676
Azhren
Joined: 28 Aug 02
Posts: 6
Credit: 5,978,210
RAC: 7
Australia
Message 825734 - Posted: 1 Nov 2008, 17:40:30 UTC

I have had no work for at least 2 days. Is it the server or is it a Debt sort of thing? How do I find out if it is a LTD(?)?
Vista 64bit
Radeon HD5700
4GB RAM
AMD Athlon II x4 640 Processor 3.23GHz
ID: 825734
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 825736 - Posted: 1 Nov 2008, 17:45:24 UTC - in response to Message 825734.  

I have had no work for at least 2 days. Is it the server or is it a Debt sort of thing? How do I find out if it is a LTD(?)?

Best to post this question in the Number Crunching forum for help......

Post there, and copy some of the Boinc messages that it sends you when it does a work request....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 825736
Lemat
Volunteer tester
Joined: 3 Apr 99
Posts: 16
Credit: 14,968,143
RAC: 0
Poland
Message 825756 - Posted: 1 Nov 2008, 18:51:05 UTC

208.68.240.13 seems to be not responding
and the boinc client does not attempt to connect to the secondary IP 208.68.240.18

the *temporary* solution is to modify /etc/hosts adding one line:
208.68.240.18 boinc2.ssl.berkeley.edu.
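
For anyone wondering what "attempt the secondary IP" would mean in practice, here is a rough sketch in Python. It is not BOINC's code: `first_reachable` and its injectable `connect` parameter are hypothetical names made up for illustration, and port 80 is an assumption (plain HTTP downloads).

```python
import socket


def first_reachable(addresses, port=80, timeout=5.0, connect=socket.create_connection):
    """Return the first address that accepts a TCP connection, or None.

    `connect` is injectable so the ordering logic can be exercised
    without touching the network (hypothetical helper, not BOINC code).
    """
    for ip in addresses:
        try:
            conn = connect((ip, port), timeout=timeout)
        except OSError:
            # This address is down or refusing connections: try the next one.
            continue
        conn.close()
        return ip
    return None
```

Called with the two addresses above, e.g. `first_reachable(["208.68.240.13", "208.68.240.18"])`, it would fall through to the second address whenever the first one does not answer.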

ID: 825756
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 825758 - Posted: 1 Nov 2008, 19:08:36 UTC - in response to Message 825756.  

208.68.240.13 seems to be not responding
and the boinc client does not attempt to connect to the secondary IP 208.68.240.18

the *temporary* solution is to modify /etc/hosts adding one line:
208.68.240.18 boinc2.ssl.berkeley.edu.

That'll be Vader again.

Note that according to the Server Status Page, there's very little work available at the moment (disks full, probably), so even the temporary fix - which seems like a good one - won't necessarily get you any.
ID: 825758
speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 825783 - Posted: 1 Nov 2008, 20:17:05 UTC

That'll be Vader again.


Indeed, as mentioned before...

Note that according to the Server Status Page, there's very little work available at the moment (disks full, probably), so even the temporary fix - which seems like a good one - won't necessarily get you any.


...at least it helps to get those through that are stuck.

The main question is why the round robin dns does not work for some clients.
Here's what I experience:
All my Windows clients do download, which means they have bane in their (Windows-internal) DNS cache.
The Linux clients in the office do download after some tries; those are connected directly to the DNS server and get vader one time and bane the next.
My Linux box at home doesn't connect; it gets its DNS information from a router which seems to have a cache, and vader is in there.
So what's the difference between those two caches (Windows / router)?

I read about an internal DNS cache in the BOINC client... what about that one?

Wouldn't it be easier to store those two IPs in client_state.xml (instead of the single dns-name) and let the boinc-client do the round robin stuff?
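
Roughly what I have in mind, as a minimal Python sketch. This is only an illustration of the idea: `RoundRobinAddresses` is a hypothetical class, it does not read or write client_state.xml, and it is not how the BOINC client actually works.

```python
import itertools


class RoundRobinAddresses:
    """Hand out a fixed list of server addresses in rotation."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one address is required")
        self._count = len(addresses)
        self._cycle = itertools.cycle(addresses)

    def next_address(self):
        """Return the next address in the rotation."""
        return next(self._cycle)

    def attempt_order(self):
        """Return every address once, starting at the current rotation point.

        A caller can walk this list and stop at the first address that
        works, so one dead server does not block every download.
        """
        return [next(self._cycle) for _ in range(self._count)]


picker = RoundRobinAddresses(["208.68.240.13", "208.68.240.18"])
print(picker.next_address())   # 208.68.240.13
print(picker.attempt_order())  # ['208.68.240.18', '208.68.240.13']
```

The point of doing the rotation in the client is that a stale entry in a router or OS cache no longer decides which server every request goes to.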

mic.


ID: 825783
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 825786 - Posted: 1 Nov 2008, 20:22:36 UTC

Somehow, even though I'm set to "English" as my language of choice, when I came to the "technical news" page, all my headers turned to Spanish!
.

Hello, from Albany, CA!...
ID: 825786
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30651
Credit: 53,134,872
RAC: 32
United States
Message 825794 - Posted: 1 Nov 2008, 21:24:29 UTC - in response to Message 825786.  

Somehow, even though I'm set to "English" as my language of choice, when I came to the "technical news" page, all my headers turned to Spanish!

I wonder if that in any way is related to the page load stalls?


ID: 825794
Lemat
Volunteer tester
Joined: 3 Apr 99
Posts: 16
Credit: 14,968,143
RAC: 0
Poland
Message 825851 - Posted: 2 Nov 2008, 0:11:26 UTC - in response to Message 825758.  

208.68.240.13 seems to be not responding
and the boinc client does not attempt to connect to the secondary IP 208.68.240.18

the *temporary* solution is to modify /etc/hosts adding one line:
208.68.240.18 boinc2.ssl.berkeley.edu.

That'll be Vader again.

Note that according to the Server Status Page, there's very little work available at the moment (disks full, probably), so even the temporary fix - which seems like a good one - won't necessarily get you any.


Well, there is no problem with the scheduler: it assigned ~40 new units to my quad-core, which are currently waiting (with more or less luck) to be downloaded. And since my previous post it has downloaded another 30 (or so) units. Therefore it is a problem with downloading, not with too little work. Maybe something is stuck with the hard drives again.
ID: 825851
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 825852 - Posted: 2 Nov 2008, 0:17:44 UTC - in response to Message 825794.  

Somehow, even though I'm set to "English" as my language of choice, when I came to the "technical news" page, all my headers turned to Spanish!

I wonder if that in any way is related to the page load stalls?


Dunno, but it doesn't look like Matt turned off the Google Analytics thingy yet....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 825852
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 826152 - Posted: 2 Nov 2008, 17:25:08 UTC - in response to Message 825783.  


The main question is why the round robin dns does not work for some clients.

Here is the answer:

The relevant RFCs say that the record order in DNS should be randomized to help balance load.

Trouble is, the RFCs do not clearly state who is supposed to randomize:


  • Is it the server?
  • Is it the resolver?
  • Is it the client?



It appears that many DNS server developers have assumed that the resolver or client will randomize. Resolver developers have assumed that what they get is already randomized, or that the client will, and many clients assume that what they get is already suitably randomized.

This is particularly true in most versions of Windows.
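
If a client wanted to stop relying on anyone upstream, it could randomize the list itself. A minimal Python sketch under that assumption follows; `shuffled_addresses` is a hypothetical helper, and the `resolve` and `rng` parameters exist only so the logic can be tried without a live DNS lookup.

```python
import random
import socket


def shuffled_addresses(hostname, port=80, resolve=socket.getaddrinfo, rng=random):
    """Resolve hostname and return its unique IP addresses in random order."""
    infos = resolve(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    # Shuffle locally instead of trusting the order the resolver handed back.
    rng.shuffle(addresses)
    return addresses
```

With this, whatever order a server, resolver, or cache returns, each request starts from a randomly chosen record.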


ID: 826152
Thierry Godefroy
Joined: 4 Jul 00
Posts: 12
Credit: 1,043,682
RAC: 0
France
Message 826159 - Posted: 2 Nov 2008, 17:36:25 UTC - in response to Message 826152.  
Last modified: 2 Nov 2008, 17:37:47 UTC

This does not look like a DNS problem to me. I tried with various DNSes (my ISP's and open DNSes) and also used /etc/hosts to hard bind boinc2.ssl.berkeley.edu with 208.68.240.13 then with 208.68.240.18, and the result is always the same: HTTP error while BOINC is trying to download the already granted work.

I can also ping both 208.68.240.13 and 208.68.240.18 without problem.

I even reduced the MTU (down to 1024 bytes), just in case it was the result of an added VPN or encapsulation on the route, but here again to no avail... It has been almost 48 hours since the downloads suddenly stopped working for SETI (the other projects are fine in this respect).
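
Since ping only exercises ICMP, a separate check of the HTTP service on each address is useful. Here is a minimal Python sketch of such a check; `http_status` is a hypothetical helper, and port 80 and the path "/" are assumptions rather than the real download URL.

```python
import http.client


def http_status(ip, hostname, port=80, path="/", timeout=10.0):
    """Send a HEAD request to one specific IP and return the HTTP status code.

    Returns None if the TCP connection or the HTTP exchange fails.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Passing Host explicitly keeps http.client from adding its own,
        # so the server sees the real hostname even though we dial an IP.
        conn.request("HEAD", path, headers={"Host": hostname})
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

Running it once per address, e.g. `http_status("208.68.240.13", "boinc2.ssl.berkeley.edu")` and the same for 208.68.240.18, shows whether a host that answers ping is also answering HTTP.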
ID: 826159
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 826271 - Posted: 2 Nov 2008, 20:58:26 UTC - in response to Message 825852.  

Dunno, but it doesn't look like Matt turned off the Google Analytics thingy yet....


I didn't turn it off because I didn't turn it on. Reminder: Eric and Jeff and Bob do stuff here too - just because I'm the only one reporting around here doesn't mean I'm in charge or even know what they are doing all the time.

Anyway... looks like the root drive on vader filled up again, so half the downloads have been choked. Mounts are all messed up, I'm rebooting now, etc.

- Matt



-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 826271
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 826282 - Posted: 2 Nov 2008, 21:12:49 UTC

Looks like Luke got Vader

ID: 826282
speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 826299 - Posted: 2 Nov 2008, 22:14:20 UTC

It appears that many DNS server developers have assumed that the resolver or client will randomize. Resolver developers have assumed that what they get is already randomized, or that the client will, and many clients assume that what they get is already suitably randomized.


As far as the servers are concerned, BIND does randomize. In combination with a Linux client that works out fine.
Problems arise when caches on the resolver/client side come into play.

Anyway, vader seems to find it hard to come back to life...
mic.


ID: 826299
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 826301 - Posted: 2 Nov 2008, 22:20:22 UTC - in response to Message 826271.  

Dunno, but it doesn't look like Matt turned off the Google Analytics thingy yet....


I didn't turn it off because I didn't turn it on. Reminder: Eric and Jeff and Bob do stuff here too - just because I'm the only one reporting around here doesn't mean I'm in charge or even know what they are doing all the time.

Anyway... looks like the root drive on vader filled up again, so half the downloads have been choked. Mounts are all messed up, I'm rebooting now, etc.

- Matt




Thanks for the weekend support, Matt.
ID: 826301
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 826306 - Posted: 2 Nov 2008, 22:33:04 UTC


. . . Thanks for takin' the time to explain Matt


BOINC Wiki . . .

Science Status Page . . .
ID: 826306
 