Panic Mode On (57) Server problems?

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22737
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1158450 - Posted: 3 Oct 2011, 6:54:16 UTC

Simple - the limit is a simple numeric base, not a performance base.
For each CPU you get a number of WUs.
For each GPU you get a larger number of WUs.

It appears that a multi-core CPU is only counted as one CPU, in the same way that a multi-core GPU only counts as one GPU.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1158450 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19592
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1158451 - Posted: 3 Oct 2011, 6:57:06 UTC - in response to Message 1158448.  

I'm starting to wonder about how my single-core machine can have more than 50 in progress tasks. I know those of you with GPUs have demonstrated that once you get to 450, you get the limit message, but has there been a test where there is no GPU at all? I know you can just disable GPU crunching, but I'm trying to figure out if this is just one of those things where it's an old, pre-GPU client, or if it is because I don't have a GPU and haven't hit 450, or what?

Any quad-core people out there want to pull their GPU and see if that's the case?

In the last hour or so my computer has started making separate requests for CPU and GPU. I haven't counted, but I get the reached-limit message for the CPU and sometimes get tasks for the GPU. At present the total in progress is 192.
ID: 1158451 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 37735
Credit: 261,360,520
RAC: 489
Australia
Message 1158453 - Posted: 3 Oct 2011, 7:06:37 UTC - in response to Message 1158451.  

I've been bouncing off the rev limiter (450 WUs) for days now and am still bouncing as we speak.

Cheers.
ID: 1158453 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1158454 - Posted: 3 Oct 2011, 7:16:16 UTC - in response to Message 1158450.  

Simple - the limit is a simple numeric base, not a performance base.
For each CPU you get a number of WUs.
For each GPU you get a larger number of WUs.

It appears that a multi-core CPU is only counted as one CPU, in the same way that a multi-core GPU only counts as one GPU.

I get that. 1-200+ cores is still only supposed to get 50 tasks total. I've got right around double that and am not getting the message about a limit. Not at all complaining or trying to boast about my good fortune, but either there's a glitch in the system, or something makes my machine exempt/unique.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1158454 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 1158457 - Posted: 3 Oct 2011, 7:41:19 UTC - in response to Message 1158454.  

Simple - the limit is a simple numeric base, not a performance base.
For each CPU you get a number of WUs.
For each GPU you get a larger number of WUs.

It appears that a multi-core CPU is only counted as one CPU, in the same way that a multi-core GPU only counts as one GPU.

I get that. 1-200+ cores is still only supposed to get 50 tasks total.

That's not my understanding.
I understand it as the limit being per processor. My E6600 is dual core, so it counts as 2 processors. My i7 is quad core with HyperThreading, so it counts as 8 processors.
With a limit of 50 per processor my E6600 can get up to 100, and the i7 up to 400 (would be nice if it were possible, but with the rate I return them at, and the amount of time the Scheduler is not available, I'm not likely to hit those limits any time soon).
Grant
Darwin NT
ID: 1158457 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 1158458 - Posted: 3 Oct 2011, 7:45:20 UTC - in response to Message 1158450.  

It appears that a multi-core CPU is only counted as one CPU, in the same way that a multi-core GPU only counts as one GPU

AFAIK a multi-core CPU counts as multiple processors. Multiple cores in a GPU aren't the same thing as multiple cores in a CPU, hence 1 GPU is 1 GPU. A video card with 2 GPUs on it counts as 2 GPUs.
Grant
Darwin NT
ID: 1158458 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34490
Credit: 79,922,639
RAC: 80
Germany
Message 1158459 - Posted: 3 Oct 2011, 7:49:25 UTC

The limit is for one host.
I checked that yesterday: no one gets more than 450 units, no matter how many CPUs and GPUs they have running.

With each crime and every kindness we birth our future.
ID: 1158459 · Report as offensive
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1158463 - Posted: 3 Oct 2011, 8:14:03 UTC - in response to Message 1158459.  
Last modified: 3 Oct 2011, 8:18:30 UTC

The limit is for one host.
I checked that yesterday: no one gets more than 450 units, no matter how many CPUs and GPUs they have running.


I provide this information as information. I am not being argumentative.

6 of my 19 machines have more than 450 WUs right now. None of the 6 machines has a GPU; they are dual-core Pentiums.

I believe that Boinc/Seti has issues and is somewhat out of control.
Dave

ID: 1158463 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34490
Credit: 79,922,639
RAC: 80
Germany
Message 1158470 - Posted: 3 Oct 2011, 8:42:48 UTC - in response to Message 1158463.  

The limit is for one host.
I checked that yesterday: no one gets more than 450 units, no matter how many CPUs and GPUs they have running.


I provide this information as information. I am not being argumentative.

6 of my 19 machines have more than 450 WUs right now. None of the 6 machines has a GPU; they are dual-core Pentiums.

I believe that Boinc/Seti has issues and is somewhat out of control.


I didn't say the code works very well.

I don't even get 100 to keep my machine busy.
Others are getting too many.
That's life.

With each crime and every kindness we birth our future.
ID: 1158470 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 1158478 - Posted: 3 Oct 2011, 10:00:29 UTC - in response to Message 1158459.  

The limit is for one host.
I checked that yesterday: no one gets more than 450 units, no matter how many CPUs and GPUs they have running.

Bugger.
That sucks.
Grant
Darwin NT
ID: 1158478 · Report as offensive
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 371
Credit: 20,533,537
RAC: 0
United States
Message 1158500 - Posted: 3 Oct 2011, 14:54:09 UTC - in response to Message 1158457.  

That's not my understanding.
I understand it as the limit being per processor. My E6600 is dual core, so it counts as 2 processors. My i7 is quad core with HyperThreading, so it counts as 8 processors.
With a limit of 50 per processor my E6600 can get up to 100, and the i7 up to 400 (would be nice if it were possible, but with the rate I return them at, and the amount of time the Scheduler is not available, I'm not likely to hit those limits any time soon).



That might be the way your PC is working, but it doesn't seem that way on mine, at least as far as CPU work is concerned. I have a dual core with a WU being worked on by each core, but the limit seems to be a total of 50. Whenever it drops to 49 or 48 it gets bumped back up to 50 (and only 50) within 30 minutes, so I do not believe it is a case of there not being any work or the machine not being able to receive it. A limit is being set somewhere, of either 25 WUs per core or 50 per machine. If you are receiving 50 per core, all I can say is lucky you, and I am envious.


17567 ? RNl 42:51 ../../projects/setiathome.berkeley.edu/AK_V8_linux32_ssse3
21529 ? RNl 30:09 ../../projects/setiathome.berkeley.edu/AK_V8_linux32_ssse3

ID: 1158500 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1158504 - Posted: 3 Oct 2011, 15:09:50 UTC

With twin Xeons (8 physical / 16 HT cores) and a GTX 590 (2 GPUs)
I can confirm: I reach a max of 450 tasks, when the scheduler connection permits.
Janice
ID: 1158504 · Report as offensive
Profile John Neale
Volunteer tester
Joined: 16 Mar 00
Posts: 634
Credit: 7,246,513
RAC: 9
South Africa
Message 1158522 - Posted: 3 Oct 2011, 16:05:07 UTC

My Intel® Core™2 Duo CPU T8100 has two processors, and the current limit for this rig is 50 tasks.
ID: 1158522 · Report as offensive
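The reports so far can be lined up against the two readings of the limit proposed in this thread. This is only a rough sketch: the function names are made up for illustration, the 400-task GPU allowance is an assumption inferred from the 50 and 450 figures quoted above (nothing here confirms how the server actually counts), and the OTS and John Neale machines are treated as CPU-only because their posts only mention CPU work.

```python
# Two hypothetical readings of the task limit discussed in this thread.
# Neither is confirmed server behaviour; all names are illustrative only.

CPU_LIMIT = 50    # figure quoted in the thread
GPU_LIMIT = 400   # ASSUMPTION: the 450 total minus the 50 CPU tasks


def per_processor_limit(logical_cpus: int) -> int:
    """Grant's reading: 50 tasks for every logical processor."""
    return CPU_LIMIT * logical_cpus


def per_host_limit(has_gpu: bool) -> int:
    """Alternative reading: 50 CPU tasks per host, plus a GPU allowance."""
    return CPU_LIMIT + (GPU_LIMIT if has_gpu else 0)


# (description, logical CPUs, GPU present?, cap reported in the thread)
# ASSUMPTION: the first two hosts are treated as CPU-only.
reports = [
    ("OTS dual core", 2, False, 50),
    ("John Neale Core 2 Duo T8100", 2, False, 50),
    ("soft^spirit twin Xeons + GTX 590", 16, True, 450),
]

for name, cpus, gpu, observed in reports:
    print(f"{name}: observed {observed}, "
          f"per-processor reading {per_processor_limit(cpus)}, "
          f"per-host reading {per_host_limit(gpu)}")
```

Under these assumptions the per-host reading matches all three reports, while the per-processor reading would give the dual-core machines 100 and the twin-Xeon box 800.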
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1158529 - Posted: 3 Oct 2011, 16:20:16 UTC

Can +1 on being over limits for maybe 24 hours. Now back under, due mostly to connect failures and intermittent limits response on successful connects. Don't think it is platform. Will provide additional details/comments if there is interest.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1158529 · Report as offensive
Miklos M.

Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 1158559 - Posted: 3 Oct 2011, 17:26:18 UTC

Close to running on empty.
ID: 1158559 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1158560 - Posted: 3 Oct 2011, 17:28:27 UTC - in response to Message 1158463.  

The limit is for one host.
I checked that yesterday: no one gets more than 450 units, no matter how many CPUs and GPUs they have running.


I provide this information as information. I am not being argumentative.

6 of my 19 machines have more than 450 WUs right now. None of the 6 machines has a GPU; they are dual-core Pentiums.

I believe that Boinc/Seti has issues and is somewhat out of control.

You and Cosmic_Ocean are both running older versions of BOINC, and it looks like that may be the reason the limits are not being applied. Looking in the top_hosts list starting several thousand back so there would be a reasonable number of hosts running only CPU crunching, those with BOINC 6.6.20 and later are being limited, but those with 6.4.5 and earlier are not.

The most obvious client change between those versions is that 6.4 and earlier don't report runtime, but there were many other changes too.

Note that the limits are at least in part being applied as protection against overfetching as the capping of server scaling of task estimates is returned toward normal operation. IOW if you have a large cache setting and a small fractional DCF now, if the servers send much shorter estimates your host will ask for more work than you really want, and continue doing so until one of those new tasks is crunched and drives DCF up to something near 1. Anyone considering going to an old version of BOINC to build cache will risk that overfetch unless they also set a reasonably small cache, and of course the older versions are not practical for those doing GPU crunching.
                                                                  Joe
ID: 1158560 · Report as offensive
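The overfetch risk Joe describes can be put into rough numbers. The sketch below uses a deliberately simplified model, not the actual BOINC work-fetch code: it assumes the client's runtime estimate is the server-supplied estimate multiplied by DCF, and that a cache of N days is filled with as many tasks as fit at that estimate. The function name and all the figures are hypothetical.

```python
def tasks_to_fill_cache(cache_days: float, server_estimate_s: float, dcf: float) -> int:
    """Simplified model: tasks needed to fill the cache as the client sees it.

    ASSUMPTION: client estimate = server estimate * DCF. Real work fetch
    has more inputs (resource share, on-fraction, several resources).
    """
    requested_s = cache_days * 86400           # cache size in seconds
    client_estimate_s = server_estimate_s * dcf
    return int(requested_s // client_estimate_s)


# Hypothetical host: tasks really take 8400 s, and DCF has settled at 0.25
# because the server used to send inflated estimates of 33600 s.
print(tasks_to_fill_cache(10, 33600, 0.25))  # old estimates: 102 tasks
print(tasks_to_fill_cache(10, 8400, 0.25))   # shorter estimates, stale DCF: 411 tasks
print(tasks_to_fill_cache(10, 8400, 1.0))    # once DCF recovers to about 1: 102 tasks
```

In this model the host asks for about four times the work it wants until a completed task pushes DCF back up, which is why a small cache setting limits the damage.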
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1158567 - Posted: 3 Oct 2011, 17:52:35 UTC - in response to Message 1158560.  

The limit is for one host.
I checked that yesterday: no one gets more than 450 units, no matter how many CPUs and GPUs they have running.


I provide this information as information. I am not being argumentative.

6 of my 19 machines have more than 450 WUs right now. None of the 6 machines has a GPU; they are dual-core Pentiums.

I believe that Boinc/Seti has issues and is somewhat out of control.

You and Cosmic_Ocean are both running older versions of BOINC, and it looks like that may be the reason the limits are not being applied. Looking in the top_hosts list starting several thousand back so there would be a reasonable number of hosts running only CPU crunching, those with BOINC 6.6.20 and later are being limited, but those with 6.4.5 and earlier are not.

The most obvious client change between those versions is that 6.4 and earlier don't report runtime, but there were many other changes too.

Note that the limits are at least in part being applied as protection against overfetching as the capping of server scaling of task estimates is returned toward normal operation. IOW if you have a large cache setting and a small fractional DCF now, if the servers send much shorter estimates your host will ask for more work than you really want, and continue doing so until one of those new tasks is crunched and drives DCF up to something near 1. Anyone considering going to an old version of BOINC to build cache will risk that overfetch unless they also set a reasonably small cache, and of course the older versions are not practical for those doing GPU crunching.
                                                                  Joe

Well I don't know about right now, but a few weeks ago when my same single-core machine in question went from AP-only to MB-only, the estimates were ridiculously tiny. So much so, that shorties take right at 2:20 (h:mm) but the estimates were showing between 3 and 8 minutes. This led my 10-day cache to turn into.. I think it was 586 WUs. I was completing these WUs as the runaway cache was being filled and the estimates were increasing, but not drastically. 8 minutes turned into 12, then just under 20, etc.

It was a gradual increase that resulted in a reasonable decrease in the number of seconds in the work request. Eventually, all the estimates were pretty close to correct and EDF mode began. I pulled up BoincTasks and it said it was ~267 days worth of cache. I selected all but the first 20 in the list and aborted them.

Point is.. if anyone is going to do that.. go with a tiny cache.. like 0.5 days max, until you figure out what the estimates are really going to be, then increase it from there.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1158567 · Report as offensive
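For anyone wanting to see why a tiny cache is the safer starting point, here is a back-of-the-envelope calculation using the figures from the post above (shorties taking 2:20, estimates of 3 to 8 minutes). It assumes a single core and that the estimates stay fixed while the cache fills, which overstates things since the post says they rose gradually. The function name is made up for illustration.

```python
def real_days_of_work(cache_days: float, estimated_min: float, actual_min: float) -> float:
    """Days of actual crunching fetched when each task is under-estimated.

    ASSUMPTION: one core, and the estimate does not change while fetching.
    """
    return cache_days * actual_min / estimated_min


ACTUAL_MIN = 140  # 2 h 20 min per shorty, as reported in the post

for cache_days in (10, 0.5):
    for estimate_min in (3, 8):
        days = real_days_of_work(cache_days, estimate_min, ACTUAL_MIN)
        print(f"{cache_days}-day cache, {estimate_min}-min estimate: "
              f"about {days:.1f} days of real work")
```

With an 8-minute estimate, a 10-day setting works out to 175 days of real crunching under these assumptions, while a 0.5-day setting stays under 9 days.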
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 1158571 - Posted: 3 Oct 2011, 18:30:42 UTC


Given the difficulties getting work over the last month or so, I can't see too big a cache being a problem for a while yet.
Grant
Darwin NT
ID: 1158571 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1158592 - Posted: 3 Oct 2011, 19:13:15 UTC
Last modified: 3 Oct 2011, 19:15:51 UTC

With the current scheduler difficulties, I think it would be virtually impossible to get an overload of units in under a 24 hour period.

Edit: plus, I just got a bunch of these...



<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>20ap11ad.30869.11928.12.10.59</file_name>
<error_code>-200</error_code>
</file_xfer_error>

</message>
]]>
Janice
ID: 1158592 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1158593 - Posted: 3 Oct 2011, 19:14:44 UTC - in response to Message 1158589.  

Ok, things are changing. After several days of not running, ap_validate3 is now online.

Wow. I just looked at the server status page and somehow didn't notice that. It's amazing. Maybe a bunch of my pendings will finally be validated and moved into the ridiculous purge backlog.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1158593 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.