Lunatics Windows Installer v0.42 Release Notes


Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1557384 - Posted: 15 Aug 2014, 16:49:30 UTC - in response to Message 1557365.  

Found this in the AP OpenCL Readme.txt -

****Best usage tips****

For best performance it is important to free 2 CPU cores running multiple instances.
Freeing at least 1 CPU core is necessity to get enough GPU usage.*


When I first started using multi GPUs with multi tasks (v0.41) I was using .5 CPU & .5 GPU in the app_config.xml and it was strongly suggested that I increase the CPU usage to 1 to prevent kernel thrashing. Now, I'm seeing above that that is not the case and I can run .5 CPU & .5 GPU or even .33 GPU under v0.42. Right now I'm running with 1 CPU & .33 GPU per device, which uses 1 core for each task on a GTX750Ti. I need clarification on this as I would like to free up more cores, if possible, for CPU processing but keep the same number of GPU tasks.


You don't need it because you have a fast CPU.
Of course, it's always good to have some resources left for heavily blanked APs.
You are using the sleep switch and it works quite well on your host.

Please consider that we need to make sure it works for everybody using the installer, and a lot of hosts have rather slow CPUs.



I've reduced the CPU spec to .5 for AP tasks and there doesn't appear to be any significant change in total run time for the tasks that have been reported. So for anyone with a faster machine who is running AP tasks and wants to increase the number of GPU tasks while reducing the number of cores supporting them, this is good news. I did notice a slight fluctuation in GPU usage, 94-99% instead of a steady 99%, but I can live with that. Memory usage appears to be averaging 759MB out of 2048MB for each device, so I might attempt to increase the number of tasks from .33 to .25 to see if I can use up some of that.


Beware: just using another motherboard could change that.
I have tested quite a few different host CPU/GPU combos and most still needed a free core.

You will gain nothing running 4 instances on your GPUs.
Even on my fast GPU it's not worth running 4 instances.
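
For anyone following the .5 CPU / .33 GPU numbers above, here is a rough sketch of what those app_config.xml fractions work out to. It assumes BOINC starts as many tasks on a GPU as fit into a total usage of 1.0, and it uses the app name astropulse_v6 purely as an illustration (check client_state.xml for the real name on your host). The helper names are made up for this example.

```python
import math
import xml.etree.ElementTree as ET


def tasks_per_gpu(gpu_usage):
    """How many tasks fit on one GPU if each claims gpu_usage of it (assumed rule)."""
    # Small epsilon guards against float rounding for values like 0.25 or 0.5.
    return math.floor(1.0 / gpu_usage + 1e-9)


def cpu_budget_per_gpu(gpu_usage, cpu_usage):
    """CPU cores budgeted for the GPU tasks running on one device."""
    return tasks_per_gpu(gpu_usage) * cpu_usage


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Build a minimal app_config.xml body as a string."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # illustrative name, see note above
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for gpu_usage, cpu_usage in [(0.33, 1.0), (0.33, 0.5), (0.25, 0.5)]:
        print(gpu_usage, cpu_usage,
              tasks_per_gpu(gpu_usage), cpu_budget_per_gpu(gpu_usage, cpu_usage))
    print(build_app_config("astropulse_v6", 0.33, 0.5))
```

With 1 CPU & .33 GPU that is 3 tasks and 3 cores per device; dropping to .5 CPU budgets 1.5 cores for the same 3 tasks. Whether the tasks actually get by on that is the part that depends on the host, as discussed above.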


With each crime and every kindness we birth our future.
ID: 1557384 · Report as offensive
juan BFP Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1557423 - Posted: 15 Aug 2014, 17:56:08 UTC

Mike - Could you verify: he is using a 750Ti, which has only 5 CUs. Isn't unroll 12 too much for this GPU?

Zalster - Could you confirm that on the 750Ti the optimal point for GPU crunching is 2 AP or 3 MB max?
ID: 1557423 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1557475 - Posted: 15 Aug 2014, 19:12:35 UTC - in response to Message 1557346.  


Well, I think we've drifted far enough away from relevance to the thread. My original question was, can I set Boinc for delayed start so it will find the GPU. The answer is, yes, I can set it for delayed start, but it still doesn't find the GPU so there's no point. I also looked briefly into making the video driver start at startup instead of at login, but I wasn't able to do that. (I didn't put a major effort into it, though.)

If you can do that Microsoft would like to have a word with you.

I have apps from v0.42 running on all of my machines with the exception of my 1 low end mobile ATI GPU.
It seems there are some GPUs with a max work group size of 128 that ignore the -tune command and will be unable to run the AP v6 r2399 application.
If you have such a GPU, you might want to head over to Beta and see if the test app works or if there is a different issue.

Remember that the test app at Beta, and the development work being put into it, are for Astropulse v7 only. It will not be usable here until the whole AP v7 application transfer has taken place.

(I know you wouldn't be thinking of doing that, HAL, but just pre-empting anyone who might have thought they'd spotted a possibility)


I should have included that note as well. My thinking is that apps on SETI@home Beta are only for SETI@home Beta, but I have been doing software testing as my job for 15 some odd years. So my line of thinking might not be the same as everyone else.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1557475 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1557526 - Posted: 15 Aug 2014, 21:34:18 UTC - in response to Message 1557423.  

Mike - Could you verify: he is using a 750Ti, which has only 5 CUs. Isn't unroll 12 too much for this GPU?

Zalster - Could you confirm that on the 750Ti the optimal point for GPU crunching is 2 AP or 3 MB max?


Yes, confirmed.

But he is using lower ffa_fetch values.
It's a little bit complicated.


With each crime and every kindness we birth our future.
ID: 1557526 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1557543 - Posted: 15 Aug 2014, 22:01:22 UTC - in response to Message 1557526.  

Yes, confirmed.

But he is using lower ffa_fetch values.
It's a little bit complicated.


These are the options that I'm using -use_sleep -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp, what should I be using for the ffa_fetch value?


I don't buy computers, I build them!!
ID: 1557543 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1557544 - Posted: 15 Aug 2014, 22:08:19 UTC - in response to Message 1557543.  

These are the options that I'm using -use_sleep -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp, what should I be using for the ffa_fetch value?


For your GPU that's fine.
You could try -uroll 10 ffa_fetch 1288 -ffa_fetch_block 6144.
I prefer lower unroll and higher ffa_fetch values for speed.
But your mileage might vary.


With each crime and every kindness we birth our future.
ID: 1557544 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1557550 - Posted: 15 Aug 2014, 22:16:59 UTC - in response to Message 1557365.  

... I might attempt to increase the number of tasks from .33 to .25 to see if I can use up some of that.

Keep in mind that will probably result in less work being done per hour, not more, leading to an eventual reduction in RAC. Keep a close eye on runtimes if you change to 3 at a time.
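
To put numbers on that: going from .33 to .25 means 3 to 4 tasks per GPU, and it only pays off if each task's runtime grows by less than the same ratio. A small sketch of the arithmetic follows; the 90 and 130 minute runtimes are made-up placeholders, not measurements from anyone's host, so plug in your own reported runtimes.

```python
def tasks_per_hour(concurrent, runtime_minutes):
    """Completed tasks per hour when `concurrent` tasks each take runtime_minutes."""
    return concurrent * 60.0 / runtime_minutes


def break_even_runtime(old_concurrent, old_runtime_minutes, new_concurrent):
    """Longest per-task runtime at which the new task count still matches the old throughput."""
    return old_runtime_minutes * new_concurrent / old_concurrent


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    print(tasks_per_hour(3, 90))          # 3 at a time, 90 min each
    print(tasks_per_hour(4, 130))         # 4 at a time, 130 min each
    print(break_even_runtime(3, 90, 4))   # runtime 4-at-a-time must stay under
```

If the runtimes at 4 at a time come in above the break-even figure, the extra task is costing work per hour rather than adding it.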
Grant
Darwin NT
ID: 1557550 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1557551 - Posted: 15 Aug 2014, 22:17:49 UTC

Sorry Juan, I was on the road to South Padre Island. This is my first chance to log back in. I run 2 APs and 2 MBs on 750s, but could do 3 MBs without any problems.
ID: 1557551 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1557552 - Posted: 15 Aug 2014, 22:19:19 UTC - in response to Message 1557550.  

... I might attempt to increase the number of tasks from .33 to .25 to see if I can use up some of that.

Keep in mind that will probably result in less work being done per hour, not more, leading to an eventual reduction in RAC. Keep a close eye on runtimes if you change to 3 at a time.


That's for sure.
No need to test that.
Not one GPU I've tested is gaining anything running 3 instances.


With each crime and every kindness we birth our future.
ID: 1557552 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1557726 - Posted: 16 Aug 2014, 8:15:55 UTC - in response to Message 1557544.  

For your GPU that's fine.
You could try -uroll 10 ffa_fetch 1288 -ffa_fetch_block 6144.
I prefer lower unroll and higher ffa_fetch values for speed.
But your mileage might vary.


Now I've got a typo.

Cliff, it should be

-uroll 10 ffa_fetch 12288 -ffa_fetch_block 6144


With each crime and every kindness we birth our future.
ID: 1557726 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1557740 - Posted: 16 Aug 2014, 9:02:25 UTC - in response to Message 1557726.  

Now I've got a typo.

Cliff, it should be

-uroll 10 ffa_fetch 12288 -ffa_fetch_block 6144



I saw that I was getting errors with the old settings so I pulled it back.
DATA_CHUNK_UNROLL set to:10
FFA thread block override value:1288
WARNING: incorrect FFA thread fetch block override value:6144, using defaults
I'll try the new setting and see what happens.
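
The warning in that log is what you would expect if the fetch value (6144) ended up larger than the block value (the mistyped 1288). Below is a rough pre-flight check for a command line. It rests on a guess, not on the app's documentation: that -ffa_block_fetch must not exceed -ffa_block and must divide it evenly, which fits the pairs that worked in this thread (8192/4096, 12288/6144). The list of known options is just the ones that appear in this thread, and the function names are invented for the example.

```python
# Options as spelled in this thread; not a complete list for the app.
KNOWN_FLAGS = {"-use_sleep", "-unroll", "-ffa_block", "-ffa_block_fetch", "-hp", "-tune"}


def parse_flags(cmdline):
    """Map each -option to the integer right after it, or None if there is none."""
    tokens = cmdline.split()
    flags = {}
    for i, tok in enumerate(tokens):
        if not tok.startswith("-"):
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        flags[tok] = int(nxt) if nxt.isdigit() else None
    return flags


def check_cmdline(cmdline):
    """Return a list of human-readable problems; an empty list means nothing was spotted."""
    flags = parse_flags(cmdline)
    problems = [f"unknown option {name}" for name in flags if name not in KNOWN_FLAGS]
    block = flags.get("-ffa_block")
    fetch = flags.get("-ffa_block_fetch")
    # Assumed rule (see lead-in): fetch <= block and block is a multiple of fetch.
    if block is not None and fetch is not None:
        if fetch > block or block % fetch != 0:
            problems.append(f"-ffa_block_fetch {fetch} does not fit -ffa_block {block}")
    return problems


if __name__ == "__main__":
    for line in [
        "-use_sleep -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp",
        "-unroll 10 -ffa_block 1288 -ffa_block_fetch 6144",
        "-uroll 10 ffa_fetch 12288 -ffa_fetch_block 6144",
    ]:
        print(line, "->", check_cmdline(line))
```

It only catches misspelled options that still start with a dash and obviously mismatched values; it says nothing about which values are fastest on a given card.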


I don't buy computers, I build them!!
ID: 1557740 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1557743 - Posted: 16 Aug 2014, 9:09:41 UTC - in response to Message 1557726.  

Now I've got a typo.

Cliff, it should be

-uroll 10 ffa_fetch 12288 -ffa_fetch_block 6144

More typos: -uroll should be -unroll, and ffa_fetch should be -ffa_fetch.

Claggy
ID: 1557743 · Report as offensive
Ivailo Bonev
Volunteer tester
Joined: 26 Jun 00
Posts: 247
Credit: 35,864,461
RAC: 2
Bulgaria
Message 1557766 - Posted: 16 Aug 2014, 10:18:41 UTC

ID: 1557766 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1557771 - Posted: 16 Aug 2014, 10:38:31 UTC - in response to Message 1557766.  

Something is wrong with latest Lunatics Astropulse app and my oldie GT9800:
http://setiathome.berkeley.edu/result.php?resultid=3679594214
http://setiathome.berkeley.edu/result.php?resultid=3678689319
http://setiathome.berkeley.edu/result.php?resultid=3678139786

Thanks for the report. I've got a 9800GT myself, but unfortunately it's lying cold on the bench - I replaced all the 5 year old cards with GTX 750Ti at four times the productivity and half the electricity consumption.

I doubt we can do much about AP v6 at this stage, except suggest that you pull the older app out of the v0.41 installer and use that for the time being (I can help with a private drop if you need it).

I see you've got the 'volunteer tester' tag for being attached to Beta. Would you mind running a quick test to see if your card runs the AP v7 Beta app currently in testing? It makes more sense testing that at this stage, so that we can be sure AP v7 runs on older cards, both as stock and in the next Lunatics release.
ID: 1557771 · Report as offensive
Ivailo Bonev
Volunteer tester
Joined: 26 Jun 00
Posts: 247
Credit: 35,864,461
RAC: 2
Bulgaria
Message 1557777 - Posted: 16 Aug 2014, 10:48:56 UTC - in response to Message 1557771.  

Sure, I'll jump in beta and turn off astropulse for now. Thanks :)
ID: 1557777 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1557780 - Posted: 16 Aug 2014, 10:59:18 UTC

I want to say thanks to everybody involved for this new installer!

Not sure about GPU, but CPU crunching on my laptop is definitely faster with the new apps. APs used to run ~30 hours on my lappy and now they are done in ~26 hours. I haven't done much MB yet, but they seem to be faster as well.

So again: Thank you, folks!
ID: 1557780 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1557799 - Posted: 16 Aug 2014, 12:10:14 UTC

Can someone explain, in layman's terms, what the -tune option is and how it is supposed to work?


I don't buy computers, I build them!!
ID: 1557799 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1557802 - Posted: 16 Aug 2014, 12:30:13 UTC - in response to Message 1557799.  
Last modified: 16 Aug 2014, 12:52:29 UTC

Can someone explain, in layman's terms, what the -tune option is and how it is supposed to work?


The -tune param defines how the GPU kernel is split into chunks.
Your GPU has a work group size of 256, for example.

So -tune 1 64 4 means it will load 64 four times until 256 is reached.

Possible values are 128 2, 32 8 or 16 16.
Using values that don't multiply out to 256 results in lower speed, because the value of 256 can't be reached in that case.

Without -tune params the kernel will be loaded without any structure.
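
A quick way to list the candidate pairs for a given work group size, as a sketch of the rule described above (the two numbers should multiply out to the work group size). It only models the two values from this explanation; a later post in this thread mentions a form with one more value, which is not covered here. The function names are made up for the example, and 256 is just the work group size used above.

```python
def tune_pairs(work_group_size=256):
    """Power-of-two pairs (x, y) with x >= y and x * y == work_group_size."""
    pairs = []
    y = 2
    while y * y <= work_group_size:
        if work_group_size % y == 0:
            pairs.append((work_group_size // y, y))
        y *= 2
    return pairs


def fills_work_group(x, y, work_group_size=256):
    """True if the pair reaches the work group size exactly."""
    return x * y == work_group_size


if __name__ == "__main__":
    print(tune_pairs(256))
    print(tune_pairs(128))
    print(fills_work_group(64, 4), fills_work_group(64, 3))
```

For 256 this gives 128 2, 64 4, 32 8 and 16 16, the same set as listed above; for a card limited to 128 the list is shorter.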


With each crime and every kindness we birth our future.
ID: 1557802 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1557806 - Posted: 16 Aug 2014, 12:52:21 UTC - in response to Message 1557802.  

Can someone explain, in layman's terms, what the -tune option is and how it is supposed to work?


The -tune param defines how the GPU kernel is split into chunks.
Your GPU has a work group size of 256, for example.

So -tune 1 64 4 means it will load 64 four times until 256 is reached.

Possible values are 128 2, 32 8 or 16 16.
Using values that don't multiply out to 256 results in lower speed, because the value of 256 can't be reached in that case.

Without -tune params the kernel will be loaded without any structure.


Thanks Mike, I'll try it. BTW, the corrected fetch values seem to be working. I just disregarded the option typos and changed the values.


I don't buy computers, I build them!!
ID: 1557806 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1557811 - Posted: 16 Aug 2014, 13:04:29 UTC
Last modified: 16 Aug 2014, 13:13:12 UTC

I inserted the tune option in the command line file (-use_sleep -unroll 10 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4), restarted BOINC, and immediately lost all 138 of my AP GPU tasks to computation errors.

Edit: I just noticed -tune 1 64 4 1 in the CUDA Version thread. Did the above errors occur because I was missing the last sub-parameter? If so, I'll reinsert the option and wait patiently for the next AP feeding frenzy.


I don't buy computers, I build them!!
ID: 1557811 · Report as offensive


 