Panic Mode On (84) Server Problems?

Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1373211 - Posted: 30 May 2013, 1:08:18 UTC

In honor of the SETI@home v7 rollout, it is time for a new thread.

ID: 1373211
Darth Beaver (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1373213 - Posted: 30 May 2013, 1:11:41 UTC - in response to Message 1373211.  

In honor of the SETI@home v7 rollout, it is time for a new thread.


+1.....hehehehe
ID: 1373213
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1373219 - Posted: 30 May 2013, 1:38:09 UTC
Last modified: 30 May 2013, 1:39:47 UTC

V7 is running, but it seems something is not working right: on a 2x690 host it runs cuda42 and some cuda32, not the right ones for this type of GPU (I expect it to run cuda50, or am I wrong?).

http://setiathome.berkeley.edu/show_host_detail.php?hostid=6269362

Any clue?
ID: 1373219
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1373222 - Posted: 30 May 2013, 1:41:44 UTC - in response to Message 1373219.  
Last modified: 30 May 2013, 1:47:30 UTC

Any clue?


The server needs to try each compatible version (gather statistics), to determine which is best. This should converge on Cuda5 for those after many tasks. If it converges on the wrong one, there will be a reset statistics button (at some stage).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1373222
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1373224 - Posted: 30 May 2013, 1:48:53 UTC - in response to Message 1373222.  
Last modified: 30 May 2013, 1:49:26 UTC

Any clue?


The server needs to try each compatible version, to determine which is best. This should converge on Cuda5 for those after many tasks. If it converges on the wrong one, there will be a reset statistics button (at some stage).

So a select-version option in app_config.xml could be a good idea...

That will be a long night/day for you guys... I hope you all have a good stock of beer & coffee to help.
ID: 1373224
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6651
Credit: 121,090,076
RAC: 0
United States
Message 1373226 - Posted: 30 May 2013, 1:51:16 UTC

I am running stock now, and doing exactly what Jason suggests. It will all balance out in the long run, and with a project that has the potential to go on long past my expected lifetime, I am happy. It will all balance out on its own. Then the tweaking begins......

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1373226
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1373228 - Posted: 30 May 2013, 1:55:40 UTC - in response to Message 1373224.  

Any clue?


The server needs to try each compatible version, to determine which is best. This should converge on Cuda5 for those after many tasks. If it converges on the wrong one, there will be a reset statistics button (at some stage).

So a select-version option in app_config.xml could be a good idea...

That will be a long night/day for you guys... I hope you all have a good stock of beer & coffee to help.


Forcing an application version can already be done with app_info.xml, as the installers do. From the server's perspective it would need your knowledge about versions, but it starts as a blank slate. For credits to dial in, and for your APRs to correctly confirm what you already know (or break horribly), it's best to let it run stock for a while and see if it works out sensible numbers, or David & Eric need to be locked in a small dark room together until they work it out :D
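
For anyone who hasn't seen one, here is a minimal app_info.xml sketch of the idea: it pins the v7 MultiBeam app to the cuda50 plan class. It assumes the app name is setiathome_v7; the executable filename and version number below are placeholders for whatever your install actually puts in the project directory, not what any particular installer writes:

<app_info>
  <app>
    <name>setiathome_v7</name>
  </app>
  <file_info>
    <name>Lunatics_x41zc_win32_cuda50.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <version_num>700</version_num>
    <plan_class>cuda50</plan_class>
    <avg_ncpus>0.04</avg_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>Lunatics_x41zc_win32_cuda50.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

The usual caveat: adding or editing app_info.xml while tasks are still in the cache can get them thrown out, so drain the cache first.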

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1373228
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1373229 - Posted: 30 May 2013, 1:56:09 UTC - in response to Message 1373222.  



The server needs to try each compatible version (gather statistics), to determine which is best.


We could save it a lot of trouble if that were an option.
ID: 1373229
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11354
Credit: 29,581,041
RAC: 66
United States
Message 1373230 - Posted: 30 May 2013, 1:59:49 UTC - in response to Message 1373226.  

I am running stock now, and doing exactly what Jason suggests. It will all balance out in the long run, and with a project that has the potential to go on long past my expected lifetime, I am happy. It will all balance out on its own. Then the tweaking begins......

Steve

Steve, that is too damn rational.
ID: 1373230
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1373231 - Posted: 30 May 2013, 2:01:02 UTC - in response to Message 1373229.  
Last modified: 30 May 2013, 2:01:34 UTC



The server needs to try each compatible version (gather statistics), to determine which is best.


We could save it a lot of trouble if that were an option.


As per previous post, yeah you already have that option with app_info.xml.
In the short term, it's more about dialling in credits, which will probably be all over the place for some time.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1373231
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1373239 - Posted: 30 May 2013, 2:25:22 UTC

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my GPU temperature is more than 10 degrees below normal. No downclock, and GPU utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1373239
ExchangeMan
Volunteer tester

Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1373242 - Posted: 30 May 2013, 2:36:35 UTC - in response to Message 1373239.  

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my GPU temperature is more than 10 degrees below normal. No downclock, and GPU utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


I see the exact same thing. My guess is that since VLARs don't parallelize well, you can't keep as many cores busy on the GPU. However, Precision X reports high CPU usage, especially with another task running on that same GPU. Fewer cores in use means less heat.

I can see this is going to be a problem, since not only does a VLAR run much longer than a normal work unit, it also degrades the other jobs running on that same GPU. I don't know if there is a workaround to this except for running only a single task at a time on all my GPUs. That doesn't seem to be a very efficient use of GPU resources. I would much rather the VLARs stay on the CPUs, which do pretty well with them. I wouldn't care if all my CPU tasks were VLARs.

ID: 1373242
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1373246 - Posted: 30 May 2013, 2:43:59 UTC - in response to Message 1373242.  

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my GPU temperature is more than 10 degrees below normal. No downclock, and GPU utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


I see the exact same thing. My guess is that since VLARs don't parallelize well, you can't keep as many cores busy on the GPU. However, Precision X reports high CPU usage, especially with another task running on that same GPU. Fewer cores in use means less heat.

I can see this is going to be a problem, since not only does a VLAR run much longer than a normal work unit, it also degrades the other jobs running on that same GPU. I don't know if there is a workaround to this except for running only a single task at a time on all my GPUs. That doesn't seem to be a very efficient use of GPU resources. I would much rather the VLARs stay on the CPUs, which do pretty well with them. I wouldn't care if all my CPU tasks were VLARs.


I have something weird going on with my 2 machines: they were assigned VLARs, but the tasks are showing up as suspended by user, and I did not suspend them.

ID: 1373246
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1373252 - Posted: 30 May 2013, 2:55:14 UTC

Is there a need to change the preferences for amount of work and additional work with V7? I seem to remember they are now backwards, or have I read too many posts?
Dave

ID: 1373252
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1373257 - Posted: 30 May 2013, 3:10:51 UTC

Is there a need to change the preferences for amount of work and additional work with V7? I seem to remember they are now backwards, or have I read too many posts?
Dave

Think you're remembering the difference in work fetch settings between BOINC 6 (and earlier) and BOINC 7, where you have to switch the settings. Unless you upgrade BOINC, there's no need to change those settings. You do need to make sure SETI@home v7 is selected in your website project preferences.

Update on my earlier VLAR comment. Lag time got out of hand - had trouble making that post. Dropped down to 3 at a time (2 VLARs) and it is still bad. Took 10 seconds to open this thread. VLARs on Nvidia aren't going to work for me. They also have an adverse impact on other GPU tasks - their run time is abnormally long.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1373257
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1373259 - Posted: 30 May 2013, 3:16:32 UTC

I agree, the Nvidias do not like the VLARs...

Is there any configuration we could do in order to avoid the GPUs receiving VLARs?
ID: 1373259
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65690
Credit: 55,293,173
RAC: 49
United States
Message 1373260 - Posted: 30 May 2013, 3:18:23 UTC

Me, I'm working through my CPU cache before I can switch over; so far that's 35 hours of CPU work, or 96 WUs, plus the 3 I'm working on now.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1373260
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1373264 - Posted: 30 May 2013, 3:43:34 UTC - in response to Message 1373257.  
Last modified: 30 May 2013, 3:46:25 UTC

...Update on my earlier VLAR comment. Lag time got out of hand - had trouble making that post. Dropped down to 3 at a time (2 VLARs) and it is still bad. Took 10 seconds to open this thread. VLARs on Nvidia aren't going to work for me. They also have an adverse impact on other GPU tasks - their run time is abnormally long.

Don't send those things to AMDs either. I tried a few on my 6850 with MB7_win_x86_SSE_OpenCL_ATi_HD5_r1817.exe. They work, but the computer has 'spikes' of unresponsiveness. You can actually see it in the SIV CPU meter as a clear line every 30 seconds or so. That is with the period of iterations set at 32. Not to mention they took ~40 minutes to complete. The 6850 does an unblanked AP in less time. The GPU temp was lower, and so were the credits...
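
If memory serves (treat the exact option name as an assumption rather than gospel), that 'period of iterations' tuning is passed to the OpenCL MultiBeam app on its command line, either in the mb_cmdline*.txt file that ships with the install or via a <cmdline> element in app_info.xml, along the lines of:

-period_iterations_num 32

A higher number splits the PulseFind work into more, shorter kernel launches, which trades a little throughput for fewer of those unresponsiveness spikes.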
ID: 1373264
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1373268 - Posted: 30 May 2013, 3:56:13 UTC - in response to Message 1373259.  

I agree, the Nvidias do not like the VLARs...


My Nvidias don't have a problem with them.



Is there any configuration we could do in order to avoid the GPUs receiving VLARs?

ID: 1373268
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1373276 - Posted: 30 May 2013, 4:34:25 UTC - in response to Message 1373239.  
Last modified: 30 May 2013, 4:52:50 UTC

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my GPU temperature is more than 10 degrees below normal. No downclock, and GPU utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


It's surprising this class of GPUs didn't show pressure here under Beta test with the expected new 2-tasks-per-GPU optimum. If the reported experiences match the general consensus on these cards, I would request review of either:

- Removing VLARs from being sent to these GPUs, OR
- a change in default settings, OR
- an Opt-in/Opt-out feature. [e.g. My own aging Core2Duo with GTX 680 happily crunches them while watching the Starship Troopers Trilogy, I'd like to crunch them because they are longer & should hopefully get more credit]

In general there are a few things to be aware of (VLAR or not):
- V7 does new processing (Autocorrelations) that changes the dynamics quite substantially, including making all task times longer, not comparable to V6.
- If you were running 3, 4 or more tasks on the same GPU before, that is quite likely too many under V7. Autocorrelations are very memory intensive; reduce it to 2 at once per device. This is the 'main' reason for running cooler.

VLAR in particular:
- will be noticeable if you have too many running at once. If you experience any display lag with these, reduce the # of instances from 4 or 3 to 2 (there's an app_config.xml sketch for that after this list).
- If problems persist, suspect 'system overcommit'; try the following settings in the empty supplied cfg file for the app:
[mbcuda]
processpriority = normal
pfblockspersm = 4
pfperiodsperlaunch = 50


These are settings overrides that improve CPU responsiveness to the app while reducing pressure from the VLAR-specific pulsefind loadings.
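
For the 'reduce to 2 at once' part, a minimal app_config.xml sketch for newer 7.x clients (again assuming the app name is setiathome_v7; the cpu_usage value is just an illustrative reservation):

<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

With gpu_usage at 0.5, two tasks share each GPU; set it back to 1.0 to drop to one at a time. The file goes in the project directory, and the client needs a restart (or a 'Read config files' in newer managers) to pick it up.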

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1373276