Panic Mode On (20) Server problems

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 918071 - Posted: 15 Jul 2009, 16:14:26 UTC - in response to Message 917959.  

I have been looking at this:

http://setiathome.berkeley.edu/sah_status.html

and this:

http://bluenorthernsoftware.com/scarecrow/sahstats/graphs.php?t=48

and this:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw%3Am%3Ay

If the bandwidth was being used by CUDA, the turnaround time would not be rising. If the bandwidth was being used by so-called older, slower machines would the "bytes out" not be higher? If it's database queries the "queries/sec" would be much higher.

I expect almost all the faster hosts have too many pending uploads to allow work fetch. A bit of statistical analysis I did about a week ago indicated that the top 10000 hosts do about 39% of all work, so even if they're all cut off there's a lot of crunching power still. The average RAC across all 292000 active hosts is around 216, and a host with RAC under 200 because it's doing multiple projects might not be cut off from SETI work yet. Enough hosts are cut off that demand is less than production, though.
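A rough worked version of those figures, as a sketch only. It assumes the 39% share of "all work" can be read as a share of total RAC, which the post does not state explicitly.

```python
# Figures quoted in the post above.
ACTIVE_HOSTS = 292_000
AVERAGE_RAC = 216
TOP_HOSTS = 10_000
TOP_SHARE = 0.39  # assumption: share of work is treated as share of total RAC

# Total RAC across all active hosts.
total_rac = ACTIVE_HOSTS * AVERAGE_RAC

# Average RAC inside and outside the top group under that assumption.
top_avg = total_rac * TOP_SHARE / TOP_HOSTS
rest_avg = total_rac * (1 - TOP_SHARE) / (ACTIVE_HOSTS - TOP_HOSTS)

print(f"top {TOP_HOSTS} hosts: about {top_avg:.0f} RAC each")
print(f"remaining hosts: about {rest_avg:.0f} RAC each")
```

Under that assumption the remaining 282000 hosts still carry 61% of the work, which is why a lot of crunching power is left even if every top host is cut off.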

So, is it possible that "ACK" (hey, other computer, are you there and can we talk) is accounting for 30-40Mbits/sec of bandwidth?!?! If so, it certainly would explain the congestion and dataflow problems of late.

No. With less than 4 MBits/sec going to SSL, handshake data on the download side would be about the same. That 30-40 MBits/sec of download is mostly WUs being delivered to hosts which are still able to request work. For instance, an older host which is approaching the end of an AP_v505 WU it got a week ago would have no results awaiting upload. Because a "Results ready to send" queue has built up for both types of work, the ratio should be about 32:1 MB:AP_v505 work going out. 7 or 8 MB WUs and 1/5 to 1/4 of an AP_v505 being downloaded each second would give the shown download rate.
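As a rough check of that arithmetic, here is a minimal sketch. The workunit sizes are assumptions not given in the post: roughly 367 KB for an MB (multibeam) WU and roughly 8 MB for an AP_v505 WU.

```python
# Assumed workunit sizes (not stated in the post).
MB_WU_BYTES = 367 * 1024        # assumed size of one multibeam WU
AP_WU_BYTES = 8 * 1024 * 1024   # assumed size of one AP_v505 WU


def download_mbits(mb_wus_per_sec, ap_wus_per_sec):
    """Return the download rate in Mbit/s for the given WU delivery rates."""
    bytes_per_sec = mb_wus_per_sec * MB_WU_BYTES + ap_wus_per_sec * AP_WU_BYTES
    return bytes_per_sec * 8 / 1e6


# Low and high ends of the rates quoted in the post.
low = download_mbits(7, 1 / 5)
high = download_mbits(8, 1 / 4)
print(f"estimated download rate: {low:.1f} to {high:.1f} Mbit/s")
```

With those assumed sizes the estimate lands in the same range as the 30-40 MBits/sec seen on the graph, and 7 to 8 MB WUs against 1/5 to 1/4 of an AP_v505 is close to the quoted 32:1 ratio.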
ID: 918071 · Report as offensive
Profile craggyislander
Volunteer tester
Joined: 28 Oct 01
Posts: 100
Credit: 206,709
RAC: 0
United Kingdom
Message 918075 - Posted: 15 Jul 2009, 16:23:30 UTC

Bruno -up :-) happy day
"The longest journey begins with a single step" Confucius

ID: 918075 · Report as offensive
Profile Zeus Fab3r
Joined: 17 Jan 01
Posts: 649
Credit: 275,335,635
RAC: 597
Serbia
Message 918080 - Posted: 15 Jul 2009, 16:40:37 UTC - in response to Message 918075.  
Last modified: 15 Jul 2009, 16:41:54 UTC

Bruno -up :-) happy day


If I could be at Bruno's place right now, I certainly wouldn't be so happy ... lol

Who the hell is General Failure and why is he reading my harddisk?¿
ID: 918080 · Report as offensive
Profile jay_e

Joined: 6 Apr 03
Posts: 62
Credit: 1,072,112
RAC: 0
United States
Message 918083 - Posted: 15 Jul 2009, 16:44:59 UTC

Greetings!
Have similar problems doing upload since Sunday.
(Have 10 WU that are getting the -107 status)

I would like some guidance.
How long is a 'normal' (bwaahaaahaaa) wait-time to get a chance for an upload to succeed????

In the meantime, I have tried several things.
With my luck, they made things worse.

I have tried:
- rebooting the PC
- trying a ping to setiboincdata.ssl.berkeley.edu (ping was OK):
10 bytes from 208.68.240.16: icmp_seq=0 ttl=46
10 bytes from 208.68.240.16: icmp_seq=1 ttl=46
- set DNS cache TTL to 16 seconds
- tried upload with antivirus turned off.
- waited through the Tuesday outage for backup. But it's Wednesday now...
- Looked at the seti I/O graphs (THANK YOU) to see when there is a 'normal'
amount of load


Suggestions are welcome.

Environment stuff
7/15/2009 12:09:26 PM Processor: 2 GenuineIntel Intel(R) Core(TM) Duo CPU T2300 @ 1.66GHz [x86 Family 6 Model 14 Stepping 12]
7/15/2009 12:09:26 PM Processor features: fpu tsc pae nx sse sse2 mmx
7/15/2009 12:09:26 PM OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
7/15/2009 12:09:26 PM Memory: 2.00 GB physical, 3.84 GB virtual
7/15/2009 12:09:26 PM Disk: 107.41 GB total, 85.03 GB free
7/15/2009 12:09:26 PM Local time is UTC -4 hours
7/15/2009 12:09:26 PM No CUDA devices found
7/15/2009 12:09:26 PM No coprocessors
7/15/2009 12:09:26 PM Not using a proxy


Here is a snap of some upload messages (There are 2 overlapping uploads)
7/15/2009 12:11:16 PM SETI@home Started upload of 01dc08ad.14412.13569.8.8.222_0_0
7/15/2009 12:11:16 PM SETI@home [file_xfer_debug] URL: http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler
7/15/2009 12:11:18 PM Internet access OK - project servers may be temporarily down.
7/15/2009 12:11:38 PM Project communication failed: attempting access to reference site
7/15/2009 12:11:38 PM SETI@home [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -107
7/15/2009 12:11:38 PM SETI@home [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -107
7/15/2009 12:11:38 PM SETI@home [file_xfer_debug] file transfer status -107
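For anyone comparing many such log snippets, a small sketch like the following pulls the transfer status codes out of the message text. The sample lines are copied from the log above; the function name and regular expression are illustrative only, and the sketch makes no claim about what -107 means.

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
7/15/2009 12:11:16 PM SETI@home Started upload of 01dc08ad.14412.13569.8.8.222_0_0
7/15/2009 12:11:38 PM SETI@home [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -107
7/15/2009 12:11:38 PM SETI@home [file_xfer_debug] file transfer status -107
"""

# Matches the final status line that the client logs for each transfer.
STATUS_RE = re.compile(r"file transfer status (-?\d+)")


def transfer_statuses(log_text):
    """Return every 'file transfer status' code found in the log text."""
    return [int(m.group(1)) for m in STATUS_RE.finditer(log_text)]


print(transfer_statuses(LOG))
```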



Thanks again,
Jay
ID: 918083 · Report as offensive
Profile # Bob Ahlers #

Joined: 30 Mar 01
Posts: 18
Credit: 10,209,954
RAC: 0
Netherlands
Message 918085 - Posted: 15 Jul 2009, 16:47:06 UTC - in response to Message 918071.  

Now that the server is back online, let's all upload at the same time :-)
THNX, Seti (Matt)
ID: 918085 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918086 - Posted: 15 Jul 2009, 16:48:49 UTC - in response to Message 917837.  

If simplex is the main issue on the connection, a possible (work around) solution could be to switch upload and download servers on-line and offline with a window of 24 hours or so.

In my opinion, this would actually be worse. The BOINC client doesn't have a way to know that "Monday is download day" and will keep trying to upload.

The best way to avoid getting hammered on uploads is to take them -- then they don't come back later.

ID: 918086 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 918087 - Posted: 15 Jul 2009, 16:50:19 UTC


Yes.. the UL server is now green..
http://setiathome.berkeley.edu/sah_status.html

BUT, I can't upload..

ID: 918087 · Report as offensive
HarryM
Volunteer tester

Joined: 24 Jul 08
Posts: 68
Credit: 3,812,695
RAC: 0
United States
Message 918089 - Posted: 15 Jul 2009, 16:51:54 UTC - in response to Message 918087.  

You'll have to get in line with everyone else.
ID: 918089 · Report as offensive
Profile Samdani
Joined: 21 Oct 00
Posts: 85
Credit: 13,480,553
RAC: 0
Pakistan
Message 918091 - Posted: 15 Jul 2009, 16:52:34 UTC - in response to Message 918057.  

Thanks Bill Walker and SuperJoker. It certainly makes sense but how about relaxing the linking rule under special circumstances like this. Would that be a feasible option?
ID: 918091 · Report as offensive
Profile craggyislander
Volunteer tester
Joined: 28 Oct 01
Posts: 100
Credit: 206,709
RAC: 0
United Kingdom
Message 918092 - Posted: 15 Jul 2009, 16:54:39 UTC - in response to Message 918080.  

Bruno -up :-) happy day


If I could be at Bruno's place right now, I certainly wouldn't be so happy ... lol



True! But we are going forward (maybe, for the moment) :-)
"The longest journey begins with a single step" Confucius

ID: 918092 · Report as offensive
Profile erik

Joined: 27 Oct 05
Posts: 1
Credit: 4,350,996
RAC: 0
Netherlands
Message 918093 - Posted: 15 Jul 2009, 16:55:32 UTC

I have been letting my PC work for SETI for a few years now.
But often my PC does nothing because of problems with upload, download, no work available, etc.
Are there too many people helping? It's rather annoying to find out in the morning that my PC was on all night for nothing once again.
And it does not seem to get better but worse through the years.

greetings from Holland
Erik Muller
ID: 918093 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918095 - Posted: 15 Jul 2009, 16:57:47 UTC - in response to Message 918083.  

Suggestions are welcome.

Next time, check the server status page first - a server was disabled until about half an hour ago.

It's always a good idea to do whatever you can to distinguish whether it's a problem at 'your end' or 'their end'. Sometimes people complain about the project servers being down for weeks on end - and it turns out to be a problem on their own computer.

This time, it was a problem at the project's end - and no amount of fiddling with your computer is going to overcome that.
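One way to sketch that 'your end' versus 'their end' check is to compare a connection to the project server with a connection to an unrelated reference site. This is only an illustration: the function names are hypothetical, and a successful TCP connection does not prove the upload handler is accepting files, so the server status page remains the better first check.

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(project_ok, reference_ok):
    """Classify a failure from two reachability results."""
    if project_ok:
        return "no problem detected"
    if reference_ok:
        # The wider internet works, so the project side is the likely culprit.
        return "their end"
    # Nothing is reachable, so suspect the local machine or network.
    return "your end"
```

A call such as `diagnose(can_connect("setiboincdata.ssl.berkeley.edu"), can_connect("example.com"))` would then give a first guess, where `example.com` is only a placeholder for any site you trust to be up.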
ID: 918095 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 918096 - Posted: 15 Jul 2009, 17:00:05 UTC - in response to Message 918089.  
Last modified: 15 Jul 2009, 17:06:53 UTC

You'll have to get in line with everyone else.


???

I don't know what you mean.

I have now maybe ~ 1,600 results ready for UL.

Every few seconds another result tries to start its UL but can't, then a new one starts and also can't reach the UL server.

I guess it's now again a prob with DNS, http or whatever.
With my ISP from Germany - I can't reach the UL server.

ID: 918096 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918099 - Posted: 15 Jul 2009, 17:07:42 UTC - in response to Message 918051.  

What is the harm in allowing downloads if the work is available.

... because downloads eventually become uploads, and if uploads are a problem, you don't want to add to it.
ID: 918099 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 918100 - Posted: 15 Jul 2009, 17:08:16 UTC - in response to Message 918096.  
Last modified: 15 Jul 2009, 17:09:19 UTC

Well, it's no better in Southern California either. Seems to be the same thing as yesterday when supposedly the upload server was up during the maintenance. 4 PCs and not one uploaded. Hang in there, we'll get there.
ID: 918100 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918101 - Posted: 15 Jul 2009, 17:09:28 UTC - in response to Message 918091.  

Thanks Bill Walker and SuperJoker. It certainly makes sense but how about relaxing the linking rule under special circumstances like this. Would that be a feasible option?

"Circumstances like this" are exactly why you want to link uploads and downloads.

Trouble uploading + more downloads + time means even more upload problems later.
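The effect can be illustrated with a toy model of a single host. Every number here is hypothetical (crunch rate, accepted uploads per hour, cache size, and the link threshold), and the loop is not BOINC's actual work-fetch logic; it only shows why cutting off downloads keeps the upload backlog bounded.

```python
def pending_after(hours, crunch_rate=10, accept_rate=2,
                  cache_target=20, link_limit=None):
    """Return the upload backlog of one host after the given number of hours.

    link_limit=None means downloads are never blocked; otherwise new work
    is fetched only while the backlog is at or below link_limit.
    """
    cache, pending = cache_target, 0
    for _ in range(hours):
        done = min(crunch_rate, cache)        # results finished this hour
        cache -= done
        pending += done
        pending -= min(pending, accept_rate)  # uploads the server accepted
        if link_limit is None or pending <= link_limit:
            cache = cache_target              # refill the work cache
    return pending


unlinked = pending_after(48)
linked = pending_after(48, link_limit=16)
print(f"backlog after 48 h: unlinked={unlinked}, linked={linked}")
```

In this model the unlinked host's backlog grows without bound, while the linked host simply idles until its uploads drain.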
ID: 918101 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918102 - Posted: 15 Jul 2009, 17:12:33 UTC - in response to Message 918096.  


I have now maybe ~ 1,600 results ready for UL.

Your machine has 1,600 results, and it tries to connect 1600 times every four hours, more or less, depending on the backoff.

How many others out there have lots of uploads, and are retrying just as aggressively?
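To put a number on that, a quick sketch: the per-host figures come from the post above, while the count of similarly loaded hosts is purely hypothetical.

```python
def attempts_per_second(pending_results, retry_interval_hours):
    """Average connection attempts per second if each result retries once per interval."""
    return pending_results / (retry_interval_hours * 3600)


# Figures from the post: 1,600 results, each retried about every four hours.
per_host = attempts_per_second(1600, 4)

# Hypothetical number of hosts with a similar backlog (not from the thread).
HYPOTHETICAL_HOSTS = 1000
total = per_host * HYPOTHETICAL_HOSTS

print(f"one host: {per_host:.2f} attempts/s")
print(f"{HYPOTHETICAL_HOSTS} such hosts: {total:.0f} attempts/s")
```

One such host averages about one attempt every nine seconds, so even a modest number of them adds up to a steady stream of connection attempts at the upload server.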

ID: 918102 · Report as offensive
Profile Samdani
Joined: 21 Oct 00
Posts: 85
Credit: 13,480,553
RAC: 0
Pakistan
Message 918104 - Posted: 15 Jul 2009, 17:14:34 UTC - in response to Message 918101.  

Ok. I give up. There seems to be no room for any creative ideas :)
ID: 918104 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 918105 - Posted: 15 Jul 2009, 17:16:01 UTC
Last modified: 15 Jul 2009, 17:17:12 UTC


@ Byron S Goodgame

Of course, I'm patient.. I'm a SETI@home member.. ;-)

My GPU cruncher still has maybe two days of WU cache.. I hope in this time the UL server will be reachable and new work will be available.. :-D

ID: 918105 · Report as offensive
Profile Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 918106 - Posted: 15 Jul 2009, 17:18:44 UTC - in response to Message 918101.  
Last modified: 15 Jul 2009, 17:28:00 UTC

As usual, Ned has a good point.

Also, I believe the "linkage" occurs in BOINC, not just in SETI. Even if disabling the linking would help now, it would defeat the concept of BOINC being universal if each project could start manipulating it to work around their unique issues.

NEWS FLASH - I just had one WU upload! Only about a dozen left now...

15/07/2009 1:04:50 PM SETI@home Finished upload of 24fe09ab.20328.11115.5.8.44_3_0

ID: 918106 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.