Panic Mode On (98) Server Problems?

Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1703476 - Posted: 20 Jul 2015, 19:53:48 UTC - in response to Message 1703473.  
Last modified: 20 Jul 2015, 19:54:56 UTC

Yeah, there seems to be more in play than just the GPU, as when wider distribution was tried some reported extreme problems (with anger). Maybe the next generation of OpenCL and Cuda applications will handle these more to everyone's liking.

Yes, as in VLARS take more GPU resources, so, all those cards set to run more than 1 or 2 at a time would start choking...and then the complaints would roll in.

ATI cards receive VLARs at Beta, as do nVidias. Neither receive VLARs on Main.



True, but if there were an option in preferences to let users choose y/n for Run VLAR, then many could just opt out of running them.

For those who choose to run them, it's an easy fix: add an opencl_nvidia_sah entry to app_config.xml, as shown in mine from the Beta site:

<app_config>
    <app_version>
        <app_name>setiathome_v7</app_name>
        <plan_class>opencl_nvidia_sah</plan_class>
        <avg_ncpus>1</avg_ncpus>
        <ngpus>1</ngpus>
    </app_version>
    <app_version>
        <app_name>setiathome_v7</app_name>
        <plan_class>cuda50</plan_class>
        <avg_ncpus>1</avg_ncpus>
        <ngpus>1</ngpus>
    </app_version>
    <app_version>
        <app_name>setiathome_v7</app_name>
        <plan_class>cuda42</plan_class>
        <avg_ncpus>1</avg_ncpus>
        <ngpus>1</ngpus>
    </app_version>
</app_config>
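
For anyone who would rather not hand-edit the file, here is a minimal Python sketch of the same change. It assumes app_config.xml is well-formed XML; the function name add_plan_class is made up for illustration, and the defaults simply mirror the values in the file above. Treat it as a starting point under those assumptions, not a tested tool.

```python
import xml.etree.ElementTree as ET


def add_plan_class(app_config_xml, app_name, plan_class, avg_ncpus=1, ngpus=1):
    """Return app_config XML text with an <app_version> entry for plan_class.

    Hypothetical helper: the element names are the ones used in the
    app_config.xml posted above.
    """
    root = ET.fromstring(app_config_xml)
    for entry in root.findall("app_version"):
        same_app = (entry.findtext("app_name") or "").strip() == app_name
        same_class = (entry.findtext("plan_class") or "").strip() == plan_class
        if same_app and same_class:
            return app_config_xml  # entry already present, nothing to change
    entry = ET.SubElement(root, "app_version")
    for tag, value in (("app_name", app_name), ("plan_class", plan_class),
                       ("avg_ncpus", avg_ncpus), ("ngpus", ngpus)):
        ET.SubElement(entry, tag).text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or newer
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Calling add_plan_class(text, "setiathome_v7", "opencl_nvidia_sah") on the contents of an existing file returns the updated text, or the input unchanged if the entry is already there. As far as I know, an ngpus value below 1 (such as 0.5) is what lets BOINC run more than one task per card, so leaving it at 1 is the conservative choice for VLARs.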
ID: 1703476 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703477 - Posted: 20 Jul 2015, 19:56:10 UTC - in response to Message 1703473.  

Yeah, there seems to be more in play than just the GPU, as when wider distribution was tried some reported extreme problems (with anger). Maybe the next generation of OpenCL and Cuda applications will handle these more to everyone's liking.

Yes, as in VLARS take more GPU resources, so, all those cards set to run more than 1 or 2 at a time would start choking...and then the complaints would roll in.

ATI cards receive VLARs at Beta, as do nVidias. Neither receive VLARs on Main.


Well, they can blame me for requesting them on beta :) That request was because we need somewhere to test new code etc. targeted at improving them. Though CUDA work along those lines is making slow (but non-zero) progress due to technology shifts (stuff happens), I hear the OpenCL work there is promising.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703477 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703480 - Posted: 20 Jul 2015, 20:08:49 UTC - in response to Message 1703476.  
Last modified: 20 Jul 2015, 20:12:36 UTC

True, but if there were an option in preferences to let users choose y/n for Run VLAR, then many could just opt out of running them.


I've been noticing two main kinds of user since I put some fairly mild options in CUDA MB: those who leave things at defaults (complete with conservative underutilisation), and those who crank everything into overcommit. I'm sure there are some in between who take the time to dial things in, but I didn't really find them. I'm feeling the next couple of generations of application will become more adaptive to user demands, using measurable indicators, as opposed to slamming a lead foot on the throttle.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703480 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1703482 - Posted: 20 Jul 2015, 20:14:52 UTC - in response to Message 1703480.  
Last modified: 20 Jul 2015, 20:21:05 UTC

0_0
ID: 1703482 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1703486 - Posted: 20 Jul 2015, 20:23:31 UTC - in response to Message 1703466.  

Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could take a try? (2000 seconds one at a time)
(Fake they are ATI/AMD ...)

No idea. But maybe you can use the Rescheduler to move them from CPU to GPU?


I could reschedule one task by hand. How? Editing some file?
I will not try to find a rescheduler for my linux machine.

Just edit the client state file by changing the task's <result> entry to the version number and plan class of your CUDA App in your app_info. Works for ATIs.
Just watch the estimated time, it may be too short after the change.
Best way is to suspend the VLAR, stop BOINC, then edit the entry containing the suspended line. In my case I would change:
<version_num>700</version_num>
to
<version_num>708</version_num>
<plan_class>opencl_ati5_sah</plan_class>

Don't make a mistake...or you Lose All your cache...



Thanks TBar, that is what I did. I had to do it to both the client_state.xml and client_state_prev.xml files.
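
For Linux hosts without a rescheduler, the manual edit described above could be scripted roughly as follows. This is only a sketch under stated assumptions: that each task in the client state file is a <result>...</result> block, and that a suspended task carries a <suspended_via_gui/> line. The function name retarget_suspended is made up for illustration; the values 708 and opencl_ati5_sah in the usage note are TBar's example, not something to copy blindly.

```python
import re

# Assumption: each task in the client state file is a <result>...</result> block.
RESULT_BLOCK = re.compile(r"<result>.*?</result>", re.DOTALL)


def retarget_suspended(state_xml, new_version, plan_class):
    """Return state_xml with every suspended task pointed at another app version."""
    def fix(match):
        block = match.group(0)
        # Assumption: a suspended task contains a <suspended_via_gui/> line.
        if "<suspended_via_gui/>" not in block:
            return block  # leave every other task untouched
        # Drop any old plan class so the block does not end up with two.
        block = re.sub(r"[ \t]*<plan_class>.*?</plan_class>\n?", "", block)
        replacement = (f"<version_num>{new_version}</version_num>\n"
                       f"    <plan_class>{plan_class}</plan_class>")
        # A function replacement avoids backslash processing in the new text.
        return re.sub(r"<version_num>\d+</version_num>",
                      lambda _: replacement, block, count=1)
    return RESULT_BLOCK.sub(fix, state_xml)
```

Usage would be along the lines of retarget_suspended(text, 708, "opencl_ati5_sah"), applied to the contents of both client_state.xml and client_state_prev.xml. Run it only while BOINC is stopped, and keep backup copies of both files, since a bad edit can cost the whole cache.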

Now I'm going to check how it did (WU). It may require a third run by someone else. My version still has some accuracy problems.

Hmmm, not much difference. Except mine didn't overflow;
Yours, Run time: 28 min 46 sec
Mine, Run time: 28 min 37 sec
I think the Grump found the nVidia OpenCL App was a little faster on MBs, but it ate a whole CPU core.
Or, maybe it was someone else...


Here is one that did not overflow. Runtime a bit high. (WU)
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1703486 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703487 - Posted: 20 Jul 2015, 20:30:05 UTC - in response to Message 1703486.  

My version still has some accuracy problems.


I still haven't found the complete story there, though I have the full team winding up to put each bit in and see what breaks (watch that Github soon). The Chirp explains a little, but not all of the issue. It'll be interesting to see what falls out over the next few weeks in the background.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703487 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1703489 - Posted: 20 Jul 2015, 20:34:52 UTC - in response to Message 1703477.  

...I hear the OpenCL work there is promising.

Yes, the only problem seems to be the nVidia App requires the latest driver, and that appears to be the show stopper.

Of course my OSX CPU App doesn't Require any drivers, and completes those nasty VLARs almost twice as fast as the current OSX CPU App. Alas, it too is just sitting there...

Here are a couple similar OSX CPUs, note the difference,
Fast, http://setiathome.berkeley.edu/results.php?hostid=7366188&offset=200
Not so Fast, http://setiathome.berkeley.edu/results.php?hostid=7362183&offset=120
ID: 1703489 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703491 - Posted: 20 Jul 2015, 20:39:48 UTC - in response to Message 1703489.  

Yes, the only problem seems to be the nVidia App requires the latest driver, and that appears to be the show stopper.


Well, that's a turnaround for the books. It used to be the OpenCL crowd encouraging sticking with old nvidia drivers. Not that I mind at all, just sayin'
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703491 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1703498 - Posted: 20 Jul 2015, 20:44:51 UTC - in response to Message 1703489.  


Yes, the only problem seems to be the nVidia App requires the latest driver


Double edged sword. Newest NV drivers work best for both the Opencl_nvidia_sah and Astropulse v7.10

But the latest NV drivers seem to have issues with lower-end GPUs and Astropulse v7.10 when combined, causing freeze-ups, instability and crashes. This has led people to revert to older NV driver versions, such as 347.88, where this doesn't seem to cause any problems.
ID: 1703498 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703499 - Posted: 20 Jul 2015, 20:46:11 UTC - in response to Message 1703498.  


Yes, the only problem seems to be the nVidia App requires the latest driver


Double edged sword. Newest NV drivers work best for both the Opencl_nvidia_sah and Astropulse v7.10

But the latest NV drivers seem to have issues with lower-end GPUs and Astropulse v7.10 when combined, causing freeze-ups, instability and crashes. This has led people to revert to older NV driver versions, such as 347.88, where this doesn't seem to cause any problems.


Hmmm interesting. I wonder what would do that (code wise).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703499 · Report as offensive
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1703524 - Posted: 20 Jul 2015, 22:11:40 UTC - in response to Message 1703498.  

But the latest NV drivers seem to have issues with lower-end GPUs and Astropulse v7.10 when combined, causing freeze-ups, instability and crashes. This has led people to revert to older NV driver versions, such as 347.88, where this doesn't seem to cause any problems.

Not just low-end GPUs. I found it a problem on high-end Fermi and Kepler, namely GF110 and GK110. It was exacerbated by BOINC 7.6.x GUI issues (which I encounter on both Windows and Linux, CPU-only and CPU/GPU hosts) to the point of display corruption, freezing etc, but even after going back to BOINC 7.4.x, I had problems until reverting to 347.88 (last release before OpenCL 1.2 support).
Soli Deo Gloria
ID: 1703524 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1703526 - Posted: 20 Jul 2015, 22:15:00 UTC - in response to Message 1703383.  
Last modified: 20 Jul 2015, 22:15:18 UTC

Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could take a try? (2000 seconds one at a time)
(Fake they are ATI/AMD ...)


You could pretend they are CPUs instead.
And revive my old @teammod@ to process both CPU and GPU apps via CPU-only BOINC scheduling. Good old days, before BOINC even knew what a GPU was :DDD
ID: 1703526 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1703527 - Posted: 20 Jul 2015, 22:15:59 UTC - in response to Message 1703524.  

Had to click USE CPU as no work units for the GPUs for several hours.

RAC is in free fall as can't get any work for the GPUs.

CPU is slowly cranking out work....
ID: 1703527 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1703530 - Posted: 20 Jul 2015, 22:19:36 UTC - in response to Message 1703438.  

Sorry, didn't see you're on linux. I think there's just a rescheduler for windows and even that seems to be hard to find.

Rescheduling work was found to screw up the credit issued once we switched to CreditNew & as far as I know no version that supports MB v7 was released.



Rescheduler 2.7 supports both v7 AP and MB, via the "other" tab.
ID: 1703530 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1703532 - Posted: 20 Jul 2015, 22:24:18 UTC - in response to Message 1703472.  
Last modified: 20 Jul 2015, 22:32:23 UTC


I think the Grump found the nVidia OpenCL App was a little faster on MBs, but it ate a whole CPU core.
Or, maybe it was someone else...


Actually it was me and Grump...

I'm still willing to do the VLARs if I knew how to do it. Time to complete 1 VLAR 18-20 minutes


I think the better way is to complete beta testing on the beta project for that.
Once we know which cards can handle VLARs, and for which cards OpenCL MB is better, we could ask Eric to devise a corresponding plan class and release it to main.
Also, I'm developing a new approach to signal logging for MultiBeam. Testers with modern NV hardware are welcome as alpha testers. Currently I'm short of modern NV hardware to play with.
ID: 1703532 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1703533 - Posted: 20 Jul 2015, 22:27:43 UTC - in response to Message 1703532.  

Ok, I'll head back over to beta for the next 12 hours, though I think you've already seen these machines. Who's going to join me??
ID: 1703533 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1703534 - Posted: 20 Jul 2015, 22:33:39 UTC - in response to Message 1703533.  

Ok, I'll head back over to beta for the next 12 hours, though I think you've already seen these machines. Who's going to join me??

Yep, I saw your results. What about mastering offline benches and getting promoted from beta to alpha? ;)
ID: 1703534 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1703579 - Posted: 21 Jul 2015, 1:51:03 UTC

SSP:
Data Distribution State        SETI@home #     Astropulse #    As of*
Results ready to send          331,529         0               0m
Current result creation rate   34.0774/sec     -0.3333/sec     5m


Interesting creation rate for AP WUs... ;-)
ID: 1703579 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 20798
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1703607 - Posted: 21 Jul 2015, 5:00:40 UTC

...Happens when a task is returned (and validates) after its deadline, but before the next version of the task is created.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1703607 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13334
Credit: 208,696,464
RAC: 304
Australia
Message 1703622 - Posted: 21 Jul 2015, 6:01:07 UTC - in response to Message 1703607.  

MB splitter output has improved, but it's still off the pace.
If not for all the VLARs going through we'd be out of ready-to-send work by now.
Grant
Darwin NT
ID: 1703622 · Report as offensive


 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.