Panic Mode On (5) Server Problems!

anders n

Joined: 26 Jul 99
Posts: 69
Credit: 916,751
RAC: 0
Sweden
Message 595859 - Posted: 30 Jun 2007, 14:52:57 UTC - in response to Message 595849.  


You're going to get responses that are all over the board. I have mine set at 10 days so I don't have to deal with the somewhat daily issues of 'no work available', 'can't get downloads', 'servers down', etc. You will also hear that with a 10-day cache you will not be able to crunch everything and work will 'expire'. That is not the case: BOINC crunches WUs based on the WU expiration date, not on when you got them. So from my perspective, think BIG :)


On the other hand (or other side of the board), with a 10-day cache and using the latest BOINC clients, many of those WUs will be aborted before they get a chance to run because the quorum will already have been met. The penalty for this is the space they take up on your hard drive plus the bandwidth to download them and their replacements.

I run with a 0 days connect setting and a 1.5 days additional work value. But I run multiple projects and have broadband, so I'm 'always on' anyway. YMMV.


That's just what I am seeing on my Mac: a lot of aborted WUs.

I started out with 0 connect and 1.5 days additional work value as well, but the Mac is also doing Ralph and there is not always work there. So when Ralph is out of work it downloads a lot of SETI work, and when new Ralph work comes in, much of the downloaded SETI work is aborted as the quorum is met. Maybe that's the price I have to pay, because in order to get enough Ralph work to meet the 50/50 Ralph/SETI setting on the Mac I need to cache work on Ralph.

I'll be trying 0 connect / 1 day additional work value for now.

Anders n

ID: 595859
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 595886 - Posted: 30 Jun 2007, 15:12:56 UTC

Beware the 0.0 day C.I. (connect interval); it may increase the probability of Validate Errors. Joe Segur has recommended setting it no lower than 0.001 days (86.4 seconds) to ensure that results have been written to disk before they are reported.

I believe this advice applies to SETI main as well as Beta.
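
For anyone wanting to check the arithmetic, here is a minimal sketch of the days-to-seconds conversion behind the 0.001 figure. The function name is made up for illustration and is not part of BOINC; the preference values are simply the ones mentioned in this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def days_to_seconds(days: float) -> float:
    """Convert a preference given in days (as BOINC shows it) into seconds.

    Hypothetical helper for illustration only; not a BOINC function.
    """
    return days * SECONDS_PER_DAY


if __name__ == "__main__":
    # The 0.001-day connect interval recommended above
    print(round(days_to_seconds(0.001), 1))       # 86.4 seconds
    # The 1.5-day additional work value mentioned earlier, in hours
    print(days_to_seconds(1.5) / 3600)            # 36.0 hours
```

So 0.001 days is a bit under a minute and a half, which is the delay being suggested between finishing a result and reporting it.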
Sir Arthur C Clarke 1917-2008
ID: 595886
anders n
Joined: 26 Jul 99
Posts: 69
Credit: 916,751
RAC: 0
Sweden
Message 595889 - Posted: 30 Jun 2007, 15:19:14 UTC - in response to Message 595886.  

Beware the 0.0 day C.I. (connect interval); it may increase the probability of Validate Errors. Joe Segur has recommended setting it no lower than 0.001 days (86.4 seconds) to ensure that results have been written to disk before they are reported.

I believe this advice applies to SETI main as well as Beta.


Thank you, will do :)
ID: 595889
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 595912 - Posted: 30 Jun 2007, 16:02:30 UTC - in response to Message 595849.  

What is the best cache size to have?



You're going to get responses that are all over the board. I have mine set at 10 days so I don't have to deal with the somewhat daily issues of 'no work available', 'can't get downloads', 'servers down', etc. You will also hear that with a 10-day cache you will not be able to crunch everything and work will 'expire'. That is not the case: BOINC crunches WUs based on the WU expiration date, not on when you got them. So from my perspective, think BIG :)


On the other hand (or other side of the board), with a 10-day cache and using the latest BOINC clients, many of those WUs will be aborted before they get a chance to run because the quorum will already have been met. The penalty for this is the space they take up on your hard drive plus the bandwidth to download them and their replacements.

I run with a 0 days connect setting and a 1.5 days additional work value. But I run multiple projects and have broadband, so I'm 'always on' anyway. YMMV.


I run 'Chicken Soup', so all my WUs get run and validated whether needed or not. I don't know which SETI version the upcoming 'New And Improved Chicken Soup' will run on.

As an aside, if and when there are outages and issues getting WUs, having 10 days will ensure I don't run into crunching downtime.
ID: 595912
anders n
Joined: 26 Jul 99
Posts: 69
Credit: 916,751
RAC: 0
Sweden
Message 595946 - Posted: 30 Jun 2007, 16:38:31 UTC - in response to Message 595912.  

I run 'Chicken Soup', so all my WUs get run and validated whether needed or not. I don't know which SETI version the upcoming 'New And Improved Chicken Soup' will run on.

As an aside, if and when there are outages and issues getting WUs, having 10 days will ensure I don't run into crunching downtime.


I see your point.
I stopped doing SETI about 1.5 years ago, mostly because I felt there was wasted computer time with the high quorum at the time. Now with the new system I'll try doing some work here again :)

Anders n


ID: 595946
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 596043 - Posted: 30 Jun 2007, 18:59:21 UTC - in response to Message 595912.  

I run 'Chicken Soup', so all my WUs get run and validated whether needed or not. I don't know which SETI version the upcoming 'New And Improved Chicken Soup' will run on.

The version of the SETI application has nothing to do with the automatic aborts. You're running <core_client_version>5.8.15</core_client_version>, which doesn't do them.

The 'New And Improved Chicken Soup' is an upgrade with the few changes needed for Multibeam plus some slight optimization improvements. It's based on the same code branch as 2.2B and should run with any version of BOINC.
Joe
ID: 596043
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 596267 - Posted: 1 Jul 2007, 1:24:24 UTC - in response to Message 595886.  

Beware the 0.0 day C.I. (connect interval); it may increase the probability of Validate Errors. Joe Segur has recommended setting it no lower than 0.001 days (86.4 seconds) to ensure that results have been written to disk before they are reported.

I believe this advice applies to SETI main as well as Beta.

I've had mine set to 0.0 and have been crunching SETI WUs for 2 days now with no problems. My additional work setting is at 0.5 days. I don't keep large caches; there's no need to with the many projects out there. :-)
ID: 596267
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 612231 - Posted: 31 Jul 2007, 0:46:07 UTC

Oh No...............

Uploads ok......no downloads

Panic Mode ON???


Boinc....Boinc....Boinc....Boinc....
ID: 612231
Scarecrow
Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 612294 - Posted: 31 Jul 2007, 3:13:43 UTC - in response to Message 612231.  

Panic Mode ON???


At least we're seeing a bit of a pattern.... makes life easier being able to plan my panic attacks in advance.


ID: 612294
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 612322 - Posted: 31 Jul 2007, 4:41:44 UTC
Last modified: 31 Jul 2007, 4:51:56 UTC

Hmm. Server stats say 34,000 ready to download, but the Scheduler says "No work from project" available.


Edit- after about the 5th attempt, it finally got a Work Unit to download.
The only problem now is that this keeps occurring:
31/07/2007 14:20:52|SETI@home|[file_xfer] Temporarily failed download of 19jn00aa.15360.1920.822164.3.190: system connect
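
If you want to count how often this happens, the message log line above can be split on its '|' separators. The following is only a rough sketch: the field layout (timestamp|project|message) and the day/month/year date format are assumptions read off the single line quoted in this post, and the function and class names are made up for illustration rather than taken from BOINC.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: datetime
    project: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if it doesn't fit.

    Assumes the day/month/year format seen in the quoted line; other
    locales may print the date differently.
    """
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    except ValueError:
        return None
    return LogEntry(when, project, message)


def is_failed_download(entry: LogEntry) -> bool:
    """True for the 'Temporarily failed download' messages quoted above."""
    return "Temporarily failed download" in entry.message


if __name__ == "__main__":
    line = ("31/07/2007 14:20:52|SETI@home|[file_xfer] Temporarily failed "
            "download of 19jn00aa.15360.1920.822164.3.190: system connect")
    entry = parse_log_line(line)
    if entry is not None and is_failed_download(entry):
        print(entry.timestamp.isoformat(), entry.project, "download failed")
```

Running every line of a saved log through parse_log_line and counting the ones where is_failed_download is true gives a quick feel for how bad the download problem is over time.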

Grant
Darwin NT
ID: 612322
Compukatt
Joined: 5 Oct 99
Posts: 26
Credit: 27,325,826
RAC: 13
New Zealand
Message 612398 - Posted: 31 Jul 2007, 8:24:23 UTC - in response to Message 595788.  

What is the best cache size to have?



I got a few validate errors, so I have set the connection interval to 0.001 days.
I normally run the work cache at 1 day which gives me about 15 hours work in real time. Before the weekend starts I put the cache up to 7 days and then put it back to 1 after all machines have finished stocking up. This has paid off well over recent weeks when there have been problems over the weekend or after the scheduled maintenance on Tuesday.
Bill
Auckland, NZ
ID: 612398
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 612402 - Posted: 31 Jul 2007, 8:49:35 UTC - in response to Message 612398.  

What is the best cache size to have?

If running more than 1 project, less than a day.
If running just one project a cache of 3 days will see you through most outages.
Grant
Darwin NT
ID: 612402
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 614623 - Posted: 4 Aug 2007, 5:06:54 UTC
Last modified: 4 Aug 2007, 5:09:18 UTC



Since 04:44 UTC 'no work from project'

No update of the 'server status page' since 8/3/2007 17:20:08 UTC

What's going on?





ID: 614623
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 614638 - Posted: 4 Aug 2007, 5:38:53 UTC



Was a short panic-time.. ;-)

Now I got new work.. :-)


ID: 614638
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 617265 - Posted: 9 Aug 2007, 22:35:49 UTC
Last modified: 9 Aug 2007, 22:36:27 UTC



I thought it's time to post in this thread.. ;-)





ID: 617265
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 618171 - Posted: 12 Aug 2007, 18:23:56 UTC


Looks like it's back online; the network graphs show the flood gates wide open. I think I'll give it 12 hours or so before I give my system network access again.
Grant
Darwin NT
ID: 618171
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 618480 - Posted: 13 Aug 2007, 6:52:17 UTC
Last modified: 13 Aug 2007, 7:07:08 UTC

Well, I've been allocated heaps of Work Units, but actually being able to download them is another thing.
The first dozen or so wouldn't download due to
13/08/2007 16:17:01|SETI@home|[file_xfer] Temporarily failed download of 02ap00ab.27814.5024.228398.3.36: system connect

It then downloaded the next dozen or two OK, although as it went through them it took longer and longer, from the time it started trying to download, before the data would start flowing. Then it kept giving
13/08/2007 16:17:01|SETI@home|[file_xfer] Temporarily failed download of 02ap00ab.27814.5024.228398.3.36: system connect
errors again on the next half dozen or so, then it started downloading (although very, very slowly) the next Work Units in line.



EDIT- it's taken about 30min or so, but all the allocated Work Units have eventually downloaded.
Grant
Darwin NT
ID: 618480
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 618481 - Posted: 13 Aug 2007, 6:54:41 UTC - in response to Message 618480.  


Well, I've been allocated heaps of Work Units, but actually being able to download them is another thing.
The first dozen or so wouldn't download due to
13/08/2007 16:17:01|SETI@home|[file_xfer] Temporarily failed download of 02ap00ab.27814.5024.228398.3.36: system connect

It then downloaded the next dozen or two OK, although as it went through them it took longer and longer, from the time it started trying to download, before the data would start flowing. Then it kept giving
13/08/2007 16:17:01|SETI@home|[file_xfer] Temporarily failed download of 02ap00ab.27814.5024.228398.3.36: system connect
errors again on the next half dozen or so, then it started downloading (although very, very slowly) the next Work Units in line.


Same here, buddy. Transfer rates are all over the place when downloads do work, but I am getting a lot of 'system connect' errors as well. Either server problems in the background, or the traffic is still overloading the servers due to the pent up demand after the outage.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 618481
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 618486 - Posted: 13 Aug 2007, 7:00:08 UTC


Just checked the time over there; it's another 8 hours or so before they're back at work and have a chance to sort it out.
Looks like I've got enough now for about 48 hours or so, so at least the system will be busy while they get things straightened out.
Grant
Darwin NT
ID: 618486
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 618489 - Posted: 13 Aug 2007, 7:10:05 UTC - in response to Message 618486.  


Just checked the time over there; it's another 8 hours or so before they're back at work and have a chance to sort it out.
Looks like I've got enough now for about 48 hours or so, so at least the system will be busy while they get things straightened out.


I am hoping we are on the cusp of a new age of Seti stability. Now that the MB work has been rolled out and new apps released, maybe Matt and Eric will have the time to grapple with the few niggling server issues that keep bringing the project to its knees now and again. There must be some server software configuration settings or such that need some tweaking, as I hardly believe that the new hardware they have brought online lately is not up to the task.

I know, the kitties and I are ever the optimists.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 618489


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.