Panic Mode On (84) Server Problems?


Message boards : Number crunching : Panic Mode On (84) Server Problems?

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3721
Credit: 48,768,260
RAC: 1,737
United States
Message 1373211 - Posted: 30 May 2013, 1:08:18 UTC

In honor of the SETI@home v7 rollout, it is time for a new thread.
____________

Glenn savill
Joined: 20 Aug 99
Posts: 2720
Credit: 4,209,885
RAC: 9,896
Australia
Message 1373213 - Posted: 30 May 2013, 1:11:41 UTC - in response to Message 1373211.

In honor of the SETI@home v7 rollout, it is time for a new thread.


+1.....hehehehe
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5470
Credit: 313,419,242
RAC: 151,755
Brazil
Message 1373219 - Posted: 30 May 2013, 1:38:09 UTC
Last modified: 30 May 2013, 1:39:47 UTC

V7 is running, but something doesn't seem to be working right: on a 2x690 host it runs cuda42 and some cuda32 tasks, not the right version for this type of GPU (I expected it to run cuda50, or am I wrong?).

http://setiathome.berkeley.edu/show_host_detail.php?hostid=6269362

Any clue?
____________

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,107,062
RAC: 6,452
Australia
Message 1373222 - Posted: 30 May 2013, 1:41:44 UTC - in response to Message 1373219.
Last modified: 30 May 2013, 1:47:30 UTC

Any clue?


The server needs to try each compatible version (gather statistics), to determine which is best. This should converge on Cuda5 for those after many tasks. If it converges on the wrong one, there will be a reset statistics button (at some stage).
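The idea of trying each compatible version, gathering statistics, and then converging on the fastest one can be sketched roughly as below. This is only an illustration of the general technique, not the actual BOINC scheduler code: the class name `VersionPicker`, the `min_samples` threshold, and the use of a plain average processing rate are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


class VersionPicker:
    """Hypothetical sketch: sample every app version, then favour the fastest."""

    def __init__(self, versions, min_samples=10):
        self.versions = list(versions)
        self.min_samples = min_samples      # assumed threshold, not a real BOINC value
        self.rates = defaultdict(list)      # version -> observed processing rates

    def record(self, version, rate):
        """Store the processing rate measured for one completed task."""
        self.rates[version].append(rate)

    def next_version(self):
        """Pick the version to send next."""
        # Still gathering statistics: send the least-sampled version.
        under = [v for v in self.versions if len(self.rates[v]) < self.min_samples]
        if under:
            return min(under, key=lambda v: len(self.rates[v]))
        # Enough samples for every version: converge on the best average rate.
        return max(self.versions, key=lambda v: mean(self.rates[v]))

    def reset(self):
        """Analogue of a 'reset statistics' button: forget everything."""
        self.rates.clear()


picker = VersionPicker(["cuda32", "cuda42", "cuda50"], min_samples=2)
for version, rate in [("cuda32", 100.0), ("cuda42", 120.0), ("cuda50", 150.0)] * 2:
    picker.record(version, rate)
print(picker.next_version())
```

With the made-up rates above, every version gets sampled first and the picker then settles on the one with the highest average, which is why a host can run cuda32 and cuda42 tasks for a while before settling.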
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5470
Credit: 313,419,242
RAC: 151,755
Brazil
Message 1373224 - Posted: 30 May 2013, 1:48:53 UTC - in response to Message 1373222.
Last modified: 30 May 2013, 1:49:26 UTC

Any clue?


The server needs to try each compatible version, to determine which is best. This should converge on Cuda5 for those after many tasks. If it converges on the wrong one, there will be a reset statistics button (at some stage).

So a select-version option in app_config.xml could be a good idea...

That will be a long night/day for you guys... hope you all have a good beer & coffee stock to help.
____________

SciManStev (Project donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 4894
Credit: 83,862,208
RAC: 18,849
United States
Message 1373226 - Posted: 30 May 2013, 1:51:16 UTC

I am running stock now, and doing exactly what Jason suggests. It will all balance out in the long run, and with a project that has the potential for going long past my expected lifetime, I am happy. It will all balance out on its own. Then the tweaking begins......

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,107,062
RAC: 6,452
Australia
Message 1373228 - Posted: 30 May 2013, 1:55:40 UTC - in response to Message 1373224.

Any clue?


The server needs to try each compatible version, to determine which is best. This should converge on Cuda5 for those after many tasks. If it converges on the wrong one, there will be a reset statistics button (at some stage).

So a select-version option in app_config.xml could be a good idea...

That will be a long night/day for you guys... hope you all have a good beer & coffee stock to help.


Forcing an application version can already be done with app_info.xml, as the installers will do. From the server's perspective, it needs the knowledge you already have about versions, but it starts as a blank slate. For credits to dial in, and for your APRs to correctly confirm what you already know (or break horribly), it's best to let it run stock for a while & see if it works out sensible numbers. Otherwise David & Eric need to be locked in a small dark room together until they work it out :D

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2897
Credit: 218,381,374
RAC: 62,793
United States
Message 1373229 - Posted: 30 May 2013, 1:56:09 UTC - in response to Message 1373222.



The server needs to try each compatible version (gather statistics), to determine which is best.


We could save it a lot of trouble if that were an option.

betreger (Project donor)
Joined: 29 Jun 99
Posts: 2578
Credit: 5,382,801
RAC: 4,497
United States
Message 1373230 - Posted: 30 May 2013, 1:59:49 UTC - in response to Message 1373226.

I am running stock now, and doing exactly what Jason suggests. It will all balance out in the long run, and with a project that has the potential for going long past my expected lifetime, I am happy. It will all balance out on its own. Then the tweaking begins......

Steve

Steve, that is too damn rational.
____________

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,107,062
RAC: 6,452
Australia
Message 1373231 - Posted: 30 May 2013, 2:01:02 UTC - in response to Message 1373229.
Last modified: 30 May 2013, 2:01:34 UTC



The server needs to try each compatible version (gather statistics), to determine which is best.


We could save it a lot of trouble if that were an option.


As per my previous post, yeah, you already have that option with app_info.xml.
In the short term, it's more about dialling in credits, which will probably be all over the place for some time.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 0
United States
Message 1373239 - Posted: 30 May 2013, 2:25:22 UTC

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my gpu temperature is more than 10 degrees below normal. No downclock and gpu utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 115
Credit: 147,072,092
RAC: 87,652
United States
Message 1373242 - Posted: 30 May 2013, 2:36:35 UTC - in response to Message 1373239.

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my gpu temperature is more than 10 degrees below normal. No downclock and gpu utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


I see the exact same thing. My guess is that since VLARs don't parallelize well, you can't keep as many cores busy in the GPU. However, Precision X reports high CPU usage, especially with another task running on that same GPU. Fewer cores in use means less heat.

I can see this is going to be a problem, since not only does the VLAR run much longer than a normal work unit, but it also degrades the other jobs running on that same GPU. I don't know of a workaround except running only a single task at a time on all my GPUs. That doesn't seem to be a very efficient use of GPU resources. I would much rather the VLARs stay on the CPUs, which do pretty well with them. I wouldn't care if all my CPU tasks were VLARs.

____________

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3721
Credit: 48,768,260
RAC: 1,737
United States
Message 1373246 - Posted: 30 May 2013, 2:43:59 UTC - in response to Message 1373242.

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my gpu temperature is more than 10 degrees below normal. No downclock and gpu utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


I see the exact same thing. My guess is that since VLARs don't parallelize well, you can't keep as many cores busy in the GPU. However, Precision X reports high CPU usage, especially with another task running on that same GPU. Fewer cores in use means less heat.

I can see this is going to be a problem, since not only does the VLAR run much longer than a normal work unit, but it also degrades the other jobs running on that same GPU. I don't know of a workaround except running only a single task at a time on all my GPUs. That doesn't seem to be a very efficient use of GPU resources. I would much rather the VLARs stay on the CPUs, which do pretty well with them. I wouldn't care if all my CPU tasks were VLARs.


I have something weird going on with my 2 machines: they were assigned VLARs, but the tasks are showing up as suspended by user, and I did not suspend them.
____________

Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 476
Credit: 41,303,535
RAC: 8,238
United States
Message 1373252 - Posted: 30 May 2013, 2:55:14 UTC

Is there a need to change the preferences for amount of work and additional work with V7? I seem to remember they are now backwards, or have I read too many posts?
____________
Dave

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 0
United States
Message 1373257 - Posted: 30 May 2013, 3:10:51 UTC

Is there a need to change the preferences for amount of work and additional work with V7? I seem to remember they are now backwards, or have I read too many posts?
____________
Dave
Think you're remembering the difference in work fetch settings between BOINC 6 (and earlier) and BOINC 7, where you have to switch the settings. Unless you upgrade BOINC, there's no need to change those settings. You do need to make sure SETI@home v7 is selected in your website project preferences.

Update on my earlier VLAR comment. Lag time got out of hand - had trouble making that post. Dropped down to 3 at a time (2 VLARs) and it is still bad. Took 10 seconds to open this thread. VLARs on Nvidia aren't going to work for me. Also has adverse impact on other GPU tasks - their run time is abnormally long.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5470
Credit: 313,419,242
RAC: 151,755
Brazil
Message 1373259 - Posted: 30 May 2013, 3:16:32 UTC

I agree, the Nvidias do not like the VLARs...

Is there any configuration we could set to keep the GPU from receiving VLARs?
____________

zoom314 (Project donor)
Joined: 30 Nov 03
Posts: 46748
Credit: 36,998,845
RAC: 3,681
United States
Message 1373260 - Posted: 30 May 2013, 3:18:23 UTC

Me, I'm working through my CPU cache before I can switch over. So far that's 35 hours of CPU work, or 96 WUs, plus 3 I'm working on now.
____________
My Facebook, War Commander, 2015

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1496
Credit: 53,060,113
RAC: 48,437
United States
Message 1373264 - Posted: 30 May 2013, 3:43:34 UTC - in response to Message 1373257.
Last modified: 30 May 2013, 3:46:25 UTC

...Update on my earlier VLAR comment. Lag time got out of hand - had trouble making that post. Dropped down to 3 at a time (2 VLARs) and it is still bad. Took 10 seconds to open this thread. VLARs on Nvidia aren't going to work for me. Also has adverse impact on other GPU tasks - their run time is abnormally long.

Don't send those things to AMDs either. I tried a few on my 6850 with MB7_win_x86_SSE_OpenCL_ATi_HD5_r1817.exe. They work, but the computer has 'spikes' of unresponsiveness. You can actually see it in the SIV CPU meter as a clear line every 30 seconds or so. That is with the period of iterations set at 32. Not to mention they took ~40 minutes to complete. The 6850 does an unblanked AP in less time. The GPU temp was lower, and so were the credits...

bill
Joined: 16 Jun 99
Posts: 861
Credit: 24,148,044
RAC: 2,510
United States
Message 1373268 - Posted: 30 May 2013, 3:56:13 UTC - in response to Message 1373259.

I agree, the Nvidias do not like the VLARs...


My Nvidias don't have a problem with them.



Is there any configuration we could set to keep the GPU from receiving VLARs?

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,107,062
RAC: 6,452
Australia
Message 1373276 - Posted: 30 May 2013, 4:34:25 UTC - in response to Message 1373239.
Last modified: 30 May 2013, 4:52:50 UTC

In the previous thread, there were some comments about VLARs going to Nvidia GPUs now. I'm still on v6 for another 10 hours while draining my cache, and I notice that I have 3 VLARs and a non-VLAR now running on my 670 with x41zc, Cuda 5.00. I don't see any adverse effects except that run times will be longer than normal and there is a little lag in the system (responsible for any typos in this post!) :)

But the odd thing is that my gpu temperature is more than 10 degrees below normal. No downclock and gpu utilization is at a constant 99%. CPU usage is below normal for 6.10 tasks. Why would these run cooler? I expected the opposite.


It's surprising that this class of GPUs didn't show pressure here under Beta test with the expected new optimum of 2 tasks per GPU. If the reported experiences match the general consensus on these cards, I would request review of either:

- Removing VLARs from being sent to these GPUs, OR
- a change in default settings, OR
- an Opt-in/Opt-out feature. [e.g. My own aging Core2Duo with GTX 680 happily crunches them while watching the Starship Troopers Trilogy, I'd like to crunch them because they are longer & should hopefully get more credit]

In general there are a few things to be aware of (VLAR or not):
- V7 does new processing (autocorrelations) that changes the dynamics quite substantially, including making all task times longer and not comparable to V6.
- If you were running 3, 4 or more tasks on the same GPU before, that is quite likely too many under V7. Autocorrelations are very memory intensive, so reduce the count to 2 at once per device. This is the 'main' reason for running cooler.
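
One possible way to run 2 tasks at once per device on a stock install is BOINC's app_config.xml, where each GPU task claims a fraction of a GPU (1/2 for two tasks). The following is only a sketch of generating such a file: the app name `setiathome_v7` and the `cpu_per_task` value are assumptions for illustration, and the helper `build_app_config` is hypothetical. Check the app name in your own client before using anything like it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task):
    """Return app_config.xml text that shares each GPU between several tasks."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so N tasks run at once per device.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    # CPU fraction reserved per GPU task: a placeholder value, tune for your host.
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    return ET.tostring(root, encoding="unicode")


# App name and CPU fraction here are assumptions, not values from this thread.
print(build_app_config("setiathome_v7", tasks_per_gpu=2, cpu_per_task=0.5))
```

Hosts that use app_info.xml instead would set the equivalent GPU count there rather than in app_config.xml.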

VLAR in particular:
- will be noticeable if you have too many running at once. If you experience any display lag with these, reduce the # of instances from 4 or 3 to 2.
- If problems persist, suspect 'system overcommit'. Try the following settings in the empty supplied cfg file for the app:
[mbcuda]
processpriority = normal
pfblockspersm = 4
pfperiodsperlaunch = 50


These are settings overrides that improve CPU responsiveness to the app while reducing pressure from VLAR-specific pulsefind loadings.
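
For anyone who prefers to generate the file rather than type it, here is a minimal sketch that produces the same `[mbcuda]` section with Python's standard `configparser`. The function name `build_mbcuda_cfg` is hypothetical, and the name and location of the cfg file are deliberately left out because the post does not give them; the three keys and values are the ones listed above.

```python
import configparser
import io


def build_mbcuda_cfg(process_priority="normal", pf_blocks_per_sm=4,
                     pf_periods_per_launch=50):
    """Return the text of an [mbcuda] section with the suggested overrides."""
    cfg = configparser.ConfigParser()
    cfg["mbcuda"] = {
        "processpriority": process_priority,
        "pfblockspersm": str(pf_blocks_per_sm),
        "pfperiodsperlaunch": str(pf_periods_per_launch),
    }
    buf = io.StringIO()
    cfg.write(buf)  # writes "key = value" lines under the section header
    return buf.getvalue()


# Paste the printed text into the cfg file shipped with the app.
print(build_mbcuda_cfg())
```

Keeping the values as function parameters makes it easy to try other settings if lag persists, but which values suit a given card is something to test rather than assume.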

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin


Copyright © 2014 University of California