Lunatics Windows Installer v0.43 Release Notes

Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes


Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1591595 - Posted: 24 Oct 2014, 22:41:03 UTC - in response to Message 1591590.  

Just curious, Mike: does your son's computer use an AMD or Intel CPU?

A long, not-logical, totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches.


Yes, AMD CPU.
A very slow 5000+

So it's another AMD without the issue.

Could you try to run the WU on a slow (old) Intel CPU like ours? That could explain why you don't have the issue and we all do.

Or maybe Raistmer, who has an AMD CPU, could try the opposite?

I know what I'm suggesting makes almost no sense.


In some way it makes sense.
AMD uses a different memory controller.


With each crime and every kindness we birth our future.
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591598 - Posted: 24 Oct 2014, 22:45:37 UTC - in response to Message 1591592.  

or getting very very drunk :-)

I'm in.
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1591623 - Posted: 24 Oct 2014, 23:23:51 UTC - in response to Message 1591598.  

Well, now that I know what I'm looking for, I saw what Juan is talking about. I just had one with 1 GB of memory usage that gave 30/30 with 0% blanking and exited at 14 minutes.

Task 3798736005

Name ap_23jn11aa_B2_P1_00279_20141023_00693.wu_0

I use the following

-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp

but it hasn't really affected my cruncher. I think there are several reasons for that. The first, pointed out by a coworker, is that I'm running a GTX 780 with 3 GB of memory. The second is that I run 16 GB of physical memory in my cruncher.

I can't be sure about my second cruncher. It crashed 3 hours ago and I just noticed it. Checking the history, I don't see any memory-hogging work units. This one is different: it has 2 GTX 780s with 3 GB and 2 GTX 750s with 2 GB. That machine has 32 GB of physical memory. Guess I'll have to keep an eye on it, wait until I run into a memory hog, and see which GPU it goes to. This one has a different command line than the first, not as aggressive.


Zalster
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591671 - Posted: 25 Oct 2014, 1:04:24 UTC - in response to Message 1591623.  
Last modified: 25 Oct 2014, 1:06:05 UTC

Well, now that I know what I'm looking for, I saw what Juan is talking about. I just had one with 1 GB of memory usage that gave 30/30 with 0% blanking and exited at 14 minutes.

Task 3798736005

Name ap_23jn11aa_B2_P1_00279_20141023_00693.wu_0

I use the following

-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp

but it hasn't really affected my cruncher. I think there are several reasons for that. The first, pointed out by a coworker, is that I'm running a GTX 780 with 3 GB of memory. The second is that I run 16 GB of physical memory in my cruncher.

I can't be sure about my second cruncher. It crashed 3 hours ago and I just noticed it. Checking the history, I don't see any memory-hogging work units. This one is different: it has 2 GTX 780s with 3 GB and 2 GTX 750s with 2 GB. That machine has 32 GB of physical memory. Guess I'll have to keep an eye on it, wait until I run into a memory hog, and see which GPU it goes to. This one has a different command line than the first, not as aggressive.


Zalster


Since you use an AMD CPU on your 780 host, that indicates the problem is not CPU-related as I suspected in my last posts, so you actually answered that question.

Try lowering to -ffa_block 8192 -ffa_block_fetch 4096 and check if the max memory usage drops to about 1/2 GB. If that happens, you'll see what I'm really talking about. A few WUs using 1 GB or more end in an out-of-memory error very fast on an 8 GB host like the ones I use.

Lower values give us less memory hogging, but there is a performance penalty.

Back to the beer-drinking task.
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1591674 - Posted: 25 Oct 2014, 1:11:09 UTC - in response to Message 1591671.  

I just saw 4 more errored work units with too many exits. I checked the memory: none of them were close to 1 GB, so I'm not sure why. I went ahead and modified the command line like you suggested. I'll keep an eye on it.



Zalster
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591676 - Posted: 25 Oct 2014, 1:13:52 UTC
Last modified: 25 Oct 2014, 1:17:33 UTC

I just saw a WU with 1 GB of memory usage even with -ffa_block 8192 -ffa_block_fetch 4096.

So now I'm totally lost: the lower (default) block size slows the host down yet doesn't fix the problem.

Will leave it with this configuration while waiting for a clue tomorrow.
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591877 - Posted: 25 Oct 2014, 12:19:35 UTC
Last modified: 25 Oct 2014, 12:20:24 UTC

Raistmer posted a few interesting tests on the Lunatics site about the memory-hogging WU problem.

http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57231.html;topicseen#msg57231

We can be sure he is working hard to find out why it happens and a possible fix for it.

I thank him for that; let's give him time to do his usual code magic.
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1592380 - Posted: 26 Oct 2014, 9:49:24 UTC - in response to Message 1591897.  

Thanks for the trust :D

Some improvement has already been reached indeed: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57241.html#msg57241

Ideas for a more radical solution require possibly non-trivial code changes, so perhaps by next weekend; we'll see...
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1592386 - Posted: 26 Oct 2014, 10:07:40 UTC - in response to Message 1592380.  

Thanks for the trust :D

Some improvement has already been reached indeed: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57241.html#msg57241

Ideas for a more radical solution require possibly non-trivial code changes, so perhaps by next weekend; we'll see...

Good news; I know we are in good hands. Take your time, and if we can help with anything, just ask.
Profile Michel Makhlouta
Volunteer tester
Joined: 21 Dec 03
Posts: 169
Credit: 41,799,743
RAC: 0
Lebanon
Message 1592388 - Posted: 26 Oct 2014, 10:24:01 UTC
Last modified: 26 Oct 2014, 10:31:22 UTC

-use_sleep -unroll 16 -oclfft_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1

I just moved from stock to lunatics, which I've been avoiding for a while but I guess my obsession with optimizing everything got the better of me. I've got an i7 4770K and 2x780. For the GPU, I went for cuda50 which was what stock was running. As for the CPU, I went for avx on both AP and MB, although stock was running sseX from time to time. Was it the right choice?

About the quoted text. I've seen this on the forums a couple of times now. From what I understood, it has some advantages when running AP? Can someone clarify the need for the command line and what's best for my setup? Also where to add the above line?

EDIT:
Adding a question: I'm running 3 WUs per GPU, allocating 1 core to AP and 0.06 of a core for MB. Any thoughts on my current values?
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1592389 - Posted: 26 Oct 2014, 10:29:58 UTC - in response to Message 1592388.  
Last modified: 26 Oct 2014, 10:32:18 UTC

-use_sleep -unroll 16 -oclfft_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1

I just moved from stock to lunatics, which I've been avoiding for a while but I guess my obsession with optimizing everything got the better of me. I've got an i7 4770K and 2x780. For the GPU, I went for cuda50 which was what stock was running. As for the CPU, I went for avx on both AP and MB, although stock was running sseX from time to time. Was it the right choice?

About the quoted text. I've seen this on the forums a couple of times now. From what I understood, it has some advantages when running AP? Can someone clarify the need for the command line and what's best for my setup? Also where to add the above line?


Check for ap_cmdline_win_x86_SSE2_OpenCL_NV.txt

In case of the memory consumption on overflow tasks, use this:

-use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1
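In other words, the switches go into a one-line plain-text file with that name in the BOINC project directory. A minimal sketch (the PROJECT_DIR path is an assumption; on a default Windows install it would be something like BOINC's data folder under projects\setiathome.berkeley.edu):

```shell
# Sketch: write the AP command-line switches into the plain-text file
# the app reads at startup. PROJECT_DIR is an assumption; point it at
# your own .../projects/setiathome.berkeley.edu directory.
PROJECT_DIR="${PROJECT_DIR:-.}"
echo "-use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1" \
  > "$PROJECT_DIR/ap_cmdline_win_x86_SSE2_OpenCL_NV.txt"
```

The app only re-reads the file when a new task starts, so changes take effect on the next WU.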


With each crime and every kindness we birth our future.
Profile Michel Makhlouta
Volunteer tester
Joined: 21 Dec 03
Posts: 169
Credit: 41,799,743
RAC: 0
Lebanon
Message 1592394 - Posted: 26 Oct 2014, 10:54:27 UTC

Thanks Mike. I've used the one in the readme file for now:
For NV x80/x70
-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 8 1 -tune 2 64 8 1

I've also freed 2 cores, utilization is still 100%, so I guess running all cores is creating a bottleneck somewhere?

I've had a crash 2 minutes ago:
Display driver nvlddmkm stopped responding and has successfully recovered.

I will wait and see if this occurs again.
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1592396 - Posted: 26 Oct 2014, 10:58:35 UTC - in response to Message 1591338.  

-oclFFT_plan is case sensitive.

Maybe it would be a good idea to change that; all the other switches use lower-case letters.


Like Raistmer said, it's for advanced users.
Everybody can snip it out of the readme.

Sorry for the typo.

My mistake.


Well, that option is in the advanced area not because it's hard to type, of course :), but because not all combos will work, and there is no fool-proofing when exercising it.

But indeed, there is an inconsistency in the option naming. FFT is a shortcut, and FFA is a similar shortcut.

Hence there should be -FFA_block and -FFA_block_fetch.
In the next builds the app will understand both the "correct" option naming (case-sensitive where upper case applies) and the lower-case "unix-style" one.
(-oclfft_plan and -oclFFT_plan will both work, along with -FFA_block.)
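The planned aliasing can be sketched roughly like this (a hypothetical illustration, not Raistmer's actual parser): fold each option name to lower case first, then map known aliases to one canonical spelling.

```shell
# Hypothetical sketch of option-name aliasing: -oclfft_plan and
# -oclFFT_plan should resolve to the same internal switch.
normalize_opt() {
  # fold to lower case, then map known aliases to the canonical name
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    -oclfft_plan)     echo "-oclFFT_plan" ;;
    -ffa_block)       echo "-FFA_block" ;;
    -ffa_block_fetch) echo "-FFA_block_fetch" ;;
    *)                echo "$1" ;;  # unknown options pass through as-is
  esac
}
normalize_opt -oclfft_plan   # prints -oclFFT_plan
normalize_opt -oclFFT_plan   # prints -oclFFT_plan
```

With this approach users can type either casing and the app sees a single canonical option.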
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1592397 - Posted: 26 Oct 2014, 10:59:53 UTC - in response to Message 1592394.  

Thanks Mike. I've used the one in the readme file for now:
For NV x80/x70
-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 8 1 -tune 2 64 8 1

I've also freed 2 cores, utilization is still 100%, so I guess running all cores is creating a bottleneck somewhere?

I've had a crash 2 minutes ago:
Display driver nvlddmkm stopped responding and has successfully recovered.

I will wait and see if this occurs again.


Yes, it will slow processing down without freeing cores.

Please reduce the -ffa_block values like I posted, to prevent the memory leak on overflowed tasks.
I'm not sure how much system RAM you have installed.
Just to be on the safe side.


With each crime and every kindness we birth our future.
Profile Michel Makhlouta
Volunteer tester
Joined: 21 Dec 03
Posts: 169
Credit: 41,799,743
RAC: 0
Lebanon
Message 1592399 - Posted: 26 Oct 2014, 11:14:44 UTC - in response to Message 1592397.  


Yes, it will slow processing down without freeing cores.

Please reduce the -ffa_block values like I posted, to prevent the memory leak on overflowed tasks.
I'm not sure how much system RAM you have installed.
Just to be on the safe side.


Alright, I've changed all the values to what you provided earlier. I've got 16 GB of RAM. I've had another display driver crash, 15 minutes after the first one. I used to have this once or twice a week after I added another card in SLI, but it has become more frequent since installing Lunatics. Any ideas?
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1592673 - Posted: 26 Oct 2014, 23:29:54 UTC - in response to Message 1591537.  

The oclFFT_plan will more than compensate it.
It speeds up at least by 10% if set correctly.
Try this for your multi GPU host.

-use_sleep -unroll 10 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1

Correcting the -oclfft_plan to -oclFFT_plan seems to have somewhat mixed results. I don't feel I have enough data points on my xw9400 yet using the above cmdline to draw firm conclusions, though the fact that it has completely locked up 5 times in 4 days is rather discouraging. ;^(

However, on my T7400 (using similar cmdline but with -unroll 12) I've found that while the corrected -oclFFT_plan has helped the GTX 660 and GTX 670, it appears to have had no material effect on the GTX 780. Here are my observations, using only 0% blanked tasks which did not reach 30 pulses of either type.

Scenario #1 is baseline, using no ap_cmdline.
Scenario #2 is using ap_cmdline with incorrect -oclfft_plan (which seems to be ignored)
Scenario #3 is using ap_cmdline with corrected -oclFFT_plan

Average Run Times (5 tasks, 0% blanked, less than 30 pulses of each type)
GTX 660: #1) 54 min 29.4 sec; #2) 1 hour 16 min 5.5 sec (+39.6%); #3) 1 hour 0 min 58 sec (+11.9%)
GTX 670: #1) 40 min 24.8 sec; #2) 48 min 2.6 sec (+18.9%); #3) 41 min 48.8 sec (+3.5%)
GTX 780: #1) 25 min 52.2 sec; #2) 35 min 10.6 sec (+36.0%); #3) 37 min 37.8 sec (+45.5%)

For all 3 GPUs, of course, the CPU Time has fallen dramatically thanks to the -use_sleep parameter. However, the Run Time results are really a net loss for the machine. The GTX 780 should be the most productive GPU, and the significant Run Time slowdown that it's exhibiting considerably overshadows the overall CPU time gains.

For the time being, I've removed everything but -use_sleep from the ap_cmdline, to see if I can get something close to the baseline Run Times along with the reduced CPU Times. (I've done the same with the xw9400 to see if the lockups go away. Perhaps there's something else going on there, but I've never had that happen even once before, and suddenly got 5 in 4 days.)

I would guess that it might be very difficult to come up with a set of ap_cmdline parameters that would be "best" for a machine with mixed GPUs like those two of mine (and, I suspect, like many others out there). Since the AP app is now attempting to internally assign default values for some of the parameters based on a GPU's CU capability, I was wondering if it might make more sense for the ap_cmdline to input a "multiplier" for those values, rather than absolutes. That way, the multiplier would apply to the default values for each GPU, rather than having a single absolute "middle ground" value apply to all. Just a thought.
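The multiplier idea can be sketched like this (a hypothetical illustration only; the per-GPU default values below are invented, not the app's real internals):

```shell
# Hypothetical sketch of the multiplier idea: instead of forcing one
# absolute -ffa_block value on every GPU, scale each GPU's own internal
# default by a user-supplied percentage.
scale_ffa_block() {
  # $1 = app's per-GPU default (invented values below), $2 = multiplier in %
  echo $(( $1 * $2 / 100 ))
}
scale_ffa_block 32768 50   # a big-GPU default scaled -> 16384
scale_ffa_block 16384 50   # a mid-GPU default scaled -> 8192
scale_ffa_block 8192  50   # a small-GPU default scaled -> 4096
```

A single multiplier in ap_cmdline would then shrink (or grow) each card's buffers proportionally, instead of one "middle ground" absolute that is too small for the 780 and too large for the 660.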
Stick Project Donor
Volunteer tester

Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1593206 - Posted: 28 Oct 2014, 1:04:30 UTC
Last modified: 28 Oct 2014, 1:08:23 UTC

Task 3804666490 is repeatedly hanging up on my Toshiba laptop's GPU at the 11.865% mark. When this happens, my screen flickers momentarily and I get a message that says the display driver stopped responding and has recovered. But the task remains hung up until I either suspend/resume the task or restart BOINC. After a restart, the task reverts to the previous checkpoint, and it starts counting up again until the 11.865% mark is reached.

I have had this issue before (with differing hang-up points), with previous Lunatics releases and with stock applications as well as with various releases of BOINC and the Catalyst.

I believe the problem is related to certain WUs. That is, it happens very rarely, maybe with 1% of the WUs I get. The other 99% complete and validate without any problems. I am guessing that with certain WUs, the program bumps up against the hardware limits of my GPU and crashes. But that is only a guess.

I am reporting it here in case anyone is interested in investigating the issue further. And I will hold the task "suspended" for a few days in case there are questions. As I said earlier, the problem occurs very rarely so I understand if it is deemed not worth pursuing.
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1593231 - Posted: 28 Oct 2014, 2:29:46 UTC

Stick, I have similar APUs to your machine and have experienced the same on Multi-Beam tasks with the GPU application. It doesn't happen all the time but often enough. In those cases I switched the task to the CPU (requires fiddling with the client_state.xml), but I've since stopped running MB on the GPU for those low-powered hosts - only AstroPulse runs on the GPU now.

Basically I came to the same conclusion - the hardware probably can't handle the complexity of the MB GPU application as well as their more powerful siblings.
Soli Deo Gloria
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1593237 - Posted: 28 Oct 2014, 2:37:14 UTC
Last modified: 28 Oct 2014, 2:40:05 UTC

Stick & Wedge,
You might just need to tweak some of the Windows OS video watchdog settings.

See more details here: http://setiathome.berkeley.edu/forum_thread.php?id=75324&postid=1553652#1553652

If bumping up the TdrDelay doesn't work you might want to just disable it with the TdrLevel setting.
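For reference, those TDR settings live under the GraphicsDrivers key; a sketch of the registry change (the 10-second delay is an illustrative value, not a recommendation; a reboot is needed for it to take effect, and back up the registry before editing):

```
Windows Registry Editor Version 5.00

; TdrDelay: seconds the GPU may be unresponsive before Windows resets
; the display driver (the default is 2). Raising it gives long OpenCL
; kernels more headroom. TdrLevel=0 would disable TDR recovery entirely
; (use with care). Values here are illustrative.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000a
```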
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1593239 - Posted: 28 Oct 2014, 2:38:52 UTC

Hmm. Might be interesting to try next time I need to try the MB GPU application. Thanks.
Soli Deo Gloria
©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.