CUDA Versions

Message boards : Number crunching : CUDA Versions

Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1557569 - Posted: 15 Aug 2014, 22:43:03 UTC

Which host are you talking about?


With each crime and every kindness we birth our future.
Richard Haselgrove · Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1557570 - Posted: 15 Aug 2014, 22:43:15 UTC - in response to Message 1557568.  

> Additionally, I'm unsure how the command switch construct above as inserted in
> ap_cmdline_win_x86_SSE2_OpenCL_NV.txt
> is engaged. I assume that it finds the proper place (in app_info.xml?) using aimerge.cmd or, as you suggest, with an exit and restart of Boinc.

Neither is necessary. Simply place the command line text you require into the supplied (empty) .txt file, and the application will read and act on it when the next task starts running in the normal course of events. If you are running multiple copies of the application, each will read its own copy of the parameters as each starts its own next task - so don't draw any conclusions from performance measurements until *every* task instance has completed at least one task starting with the new settings.
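A minimal Python sketch of the edit Richard describes. It assumes the file lives in the SETI@home project folder of your BOINC data directory; `CMDLINE_FILE` is a placeholder path and the helper names are hypothetical. It writes the switches as one line of plain text, matching the "paste into the supplied empty .txt file" approach used in this thread.

```python
import tempfile
from pathlib import Path

# Placeholder path (assumption): adjust to the SETI@home project folder in your BOINC data directory.
CMDLINE_FILE = Path("projects/setiathome.berkeley.edu/ap_cmdline_win_x86_SSE2_OpenCL_NV.txt")


def write_ap_cmdline(path: Path, switches: str) -> None:
    """Overwrite the command line file with the switches as one line of plain ASCII text."""
    one_line = " ".join(switches.split())  # collapse stray spaces and line breaks
    # Raises UnicodeEncodeError if e.g. a typographic dash pasted from a web page slipped in.
    path.write_bytes(one_line.encode("ascii"))  # no byte-order mark, no trailing newline


def read_ap_cmdline(path: Path) -> str:
    """Return the switches currently stored in the file ('' if the file is empty)."""
    return path.read_text(encoding="ascii").strip()


if __name__ == "__main__":
    # Demo against a temporary copy so nothing in a real BOINC folder is touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / CMDLINE_FILE.name
        write_ap_cmdline(demo, "-use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1")
        print(read_ap_cmdline(demo))
```

Nothing needs to be restarted after writing the file; each running instance picks up the new line only when it starts its next task.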
juan BFP · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1557577 - Posted: 15 Aug 2014, 23:05:38 UTC - in response to Message 1557568.  
Last modified: 15 Aug 2014, 23:12:36 UTC

> Well, have taken several actions. Moved nv driver to 337.88 (working with drivers always has to be a hassle), now working on Lunatics 0.42, and have dropped the number of wu's per GPU to 4 (vs 5). Per your advice to others, been reading up on parameters as described in ReadMe_AstroPulse_OpenCL_NV.txt and decided, again per your advice, to use your set
> -use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1
> as a starting point. However, I do not yet have a handle on the parameters since I lack an understanding about "kernel call" (to what?) and FFA as in ffa_block_fetch. I'm therefore unable to determine the relevance of a command line switch value change. Perhaps that isn't necessary or will surface with experimentation but it would be useful to know at least the most relevant value to work with.
> Additionally, I'm unsure how the command switch construct above as inserted in
> ap_cmdline_win_x86_SSE2_OpenCL_NV.txt
> is engaged. I assume that it finds the proper place (in app_info.xml?) using aimerge.cmd or, as you suggest, with an exit and restart of Boinc. I may give the latter a try awaiting your response but would appreciate your thoughts on the above - command switch values in your construct and how the construct is engaged.

Just edit the file (with Notepad only) using a simple copy & paste, then save and close it. Each time a new AP WU starts, the application automatically reads the file and applies the switches. To be sure, I prefer to restart BOINC itself, but that is not really necessary.

Today I followed Mike's suggestion and changed the parameters on my 780 hosts slightly, to:

-use_sleep -unroll 12 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1

So far everything is working fine, and I got a small gain in crunching times and video performance. I imagine you will see similar improvements on your 780Ti.

An explanation: I don't use the -hp switch because my host is not a dedicated cruncher; I use it for other tasks, which don't allow me to run BOINC at high priority. You could add it at the end of the command line and gain a little more.

> Finally, I had been advised elsewhere to use EVGA PrecisionX 15 to monitor GPU performance but it has been withdrawn due to some plagiarism issues.

Yes, you need a program to raise your GPU fan speed and keep the GPU cooler; otherwise its thermal protection will slow down the GPU clock. Most of us use EVGA Precision for that. If you can't find it, PM me your e-mail address (never post your e-mail on the open forums) and I will send it to you. There is another similar program on the MSI site called Afterburner, which works in a similar way.

> How do you measure GPU and CPU performance changes?

You could use EVGA Precision for that, or if you wish you could download a program called GPU-Z. It's free too and lets you monitor all the characteristics and sensors of practically all available GPUs (not just NV). This is the link: http://www.techpowerup.com/gpuz/

> I assume that wu feeds to CPU must be turned off in order to see changes from the -use_sleep command otherwise the CPU will be running full bore (100%) executing wu's. Alternatively, I suppose I could wait for an average of CPU times once the -use_sleep command is turned on.

Don't worry about that for now. First make sure the switch (and the command line) is working. To check, open the Stderr output of a crunched AP WU; at the beginning you will see something like this (the lines depend on the parameters you use):

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 0
Sleep() & wait for event loops will be used in some places
DATA_CHUNK_UNROLL set to:12
FFA thread block override value:16384
FFA thread fetchblock override value:8192
TUNE: kernel 1 now has workgroup size of (64,4,1)

If those lines don't appear, your command line is not working.
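The check juan describes can be scripted. A minimal Python sketch that searches a task's stderr text for the confirmation lines shown above. The patterns are copied from that single example (Lunatics AP 0.42, NV OpenCL); other switches print other lines and the wording may differ between versions, so treat a missing entry as "look manually" rather than as proof.

```python
import re

# Confirmation lines copied from the example stderr above; other app versions may word them differently.
PATTERNS = {
    "-use_sleep": re.compile(r"Sleep\(\) & wait for event loops will be used"),
    "-unroll": re.compile(r"DATA_CHUNK_UNROLL set to:(\d+)"),
    "-ffa_block": re.compile(r"FFA thread block override value:(\d+)"),
    "-ffa_block_fetch": re.compile(r"FFA thread fetchblock override value:(\d+)"),
    "-tune": re.compile(r"TUNE: kernel (\d+) now has workgroup size of \((\d+),(\d+),(\d+)\)"),
}

SAMPLE_STDERR = """Running on device number: 0
Sleep() & wait for event loops will be used in some places
DATA_CHUNK_UNROLL set to:12
FFA thread block override value:16384
FFA thread fetchblock override value:8192
TUNE: kernel 1 now has workgroup size of (64,4,1)
"""


def applied_switches(stderr_text: str) -> dict:
    """Map each switch whose confirmation line was found to the integer values it reports."""
    found = {}
    for switch, pattern in PATTERNS.items():
        match = pattern.search(stderr_text)
        if match:
            found[switch] = tuple(int(v) for v in match.groups())  # () for -use_sleep
    return found


def missing_switches(stderr_text: str, expected: list) -> list:
    """Switches you set whose confirmation line is absent; switches without a known pattern (e.g. -hp) are skipped."""
    found = applied_switches(stderr_text)
    return [s for s in expected if s in PATTERNS and s not in found]


if __name__ == "__main__":
    print(applied_switches(SAMPLE_STDERR))
```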

For now, check that everything is working. Once you are sure, we will continue to the next phase: finding the optimal number of WUs to crunch at a time on your host.

Will wait for your reply.

<edit> Mike, he is talking about his 2x780Ti host.
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1557590 - Posted: 16 Aug 2014, 0:15:20 UTC - in response to Message 1557577.  

The latest version of Precision X was having problems engaging the automatic fan regulation. I removed it from my machines after temps were spiking and the fans weren't speeding up. Fortunately I had an old installer on one computer and reinstalled that on all of mine.
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1557778 - Posted: 16 Aug 2014, 10:50:26 UTC

Can anybody please explain to me what the -use_sleep option does? I thought I'd give it a try today, and it seems to decrease the crunching speed on my system: APs take about 15% longer, while the GPU temperature is around 5 degrees lower.

My CPU is rather old and slow, so maybe this option is more useful on faster CPUs?
Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1557782 - Posted: 16 Aug 2014, 11:07:40 UTC - in response to Message 1557778.  
Last modified: 16 Aug 2014, 11:08:33 UTC

> Can anybody pls explain to me what the use sleep option does? I thought I give it a try today and it seems to decrease the crunching speed on my system. APs seem to take about 15% longer while the GPU temperature is around 5 degrees lower.
>
> My CPU is rather old and slow, so maybe this option is more useful on faster CPUs?


Exactly vice versa.

-use_sleep reduces CPU consumption for Nvidia GPUs.

It just needs to be set up correctly.
I can see you only use ffa_fetch without ffa_fetch_block.
Also, using -tune speeds things up.

Try

-unroll 6 -ffa_block 6144 -ffa_block_fetch 1536 -tune 1 64 4 1 -use_sleep


qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1557787 - Posted: 16 Aug 2014, 11:20:33 UTC

I will try this, thx Mike!
FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 1557918 - Posted: 16 Aug 2014, 19:10:05 UTC
Last modified: 16 Aug 2014, 19:14:48 UTC

Hmm, I stumbled over a nasty surprise this morning:

On this Host I found that two AP workunits had got stuck overnight, wasting GPU time on the GTX750ti for many hours without making any further progress.

Is that a known contingency requiring frequent monitoring/caretaking?

Upon quitting and restarting BOINC, they finished normally and with the expected performance.

Parameters for the Lunatics v0.42 app I'm using on that host:

-unroll 12 -ffa_block 8192 -ffa_block_fetch -hp -tune 1 64 4 1 -use_sleep

Two AP tasks were running in parallel, one using >40% CPU and one using only 0.5% CPU, but both were definitely stuck.

Is there anything I could do to prevent that in the future?
The GPUs are very well cooled and hardly exceed 50 °C.

For now, I've set Windows power management to the High Performance profile. (It basically already was: I ran the Balanced profile with every power-saving option turned off, except that the CPU was allowed to clock down when it could.)
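One detail stands out in the parameter line above: -ffa_block_fetch is followed directly by -hp, with no value. Whether that played any part in the stuck tasks is not established in this thread, but a quick check before pasting a line into the .txt file would catch it. A rough Python sketch; the table of how many integers each switch takes is an assumption inferred from the examples and ReadMe excerpts in this thread, not an official list, and the real application may treat bad input differently.

```python
# Assumed number of integer values each switch takes, inferred from this thread (not an official list).
ARITY = {
    "-use_sleep": 0, "-hp": 0, "-cpu_lock": 0,
    "-unroll": 1, "-ffa_block": 1, "-ffa_block_fetch": 1, "-v": 1,
    "-initial_single_pulse_sleep": 1, "-cpu_lock_fixed_cpu": 1, "-instances_per_device": 1,
    "-initial_ffa_sleep": 2,
    "-tune": 4,
}


def check_cmdline(cmdline: str) -> list:
    """Return a list of problems found in a command line; an empty list means none were found."""
    problems = []
    tokens = cmdline.split()
    i = 0
    while i < len(tokens):
        switch = tokens[i]
        if switch not in ARITY:
            problems.append(f"unknown or misplaced token {switch!r}")
            i += 1
            continue
        needed = ARITY[switch]
        values = []
        j = i + 1
        # Consume only tokens that are non-negative integers, so a following switch is not swallowed.
        while j < len(tokens) and len(values) < needed and tokens[j].isdigit():
            values.append(tokens[j])
            j += 1
        if len(values) < needed:
            problems.append(f"{switch} expects {needed} integer value(s), got {len(values)}")
        i = j
    return problems


if __name__ == "__main__":
    print(check_cmdline("-unroll 12 -ffa_block 8192 -ffa_block_fetch -hp -tune 1 64 4 1 -use_sleep"))
    # prints ['-ffa_block_fetch expects 1 integer value(s), got 0']
```

The same check would also flag a sentence-ending period copied along with a line, since "-use_sleep." is not a known switch.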
Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1557948 - Posted: 16 Aug 2014, 20:52:16 UTC - in response to Message 1557918.  

> For now, I've set the Windows power management to high performance profile (basically already was, I ran on balanced profile with every power-saving turned off with the exception of the CPU allowed to clock down if able to)


That's the point.

An AMD CPU should never downclock whilst running AstroPulse tasks on the GPU.


Bill Greene
Volunteer tester
Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1558034 - Posted: 17 Aug 2014, 2:10:18 UTC - in response to Message 1557570.  

> Neither is necessary. Simply place the command line text you require into the supplied (empty) .txt file, and the application will read and act on it when the next task starts running in the normal course of events.


That shortens the change process somewhat, making experimentation more efficient. Still, given the variability in results I've noted, it seems each change deserves an overnight run to average over. Fortunately, thus far I've been able to keep to a single instance on each machine, but I have some oddball GPUs lying around that I would like to put to work. Your input will be especially useful for that purpose. Thanks for the response.
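The overnight averaging can be done with a few lines. A minimal Python sketch, assuming you copy task run times (in seconds) from the task list by hand; the numbers below are made-up placeholders, not measurements from this thread. In line with Richard's advice, only include tasks that started after every instance had picked up the new settings.

```python
from statistics import mean, stdev


def summarize(label: str, run_times_s: list) -> str:
    """One-line summary: task count, mean and sample standard deviation of run times."""
    if len(run_times_s) < 2:
        raise ValueError("need at least two tasks to estimate the spread")
    return f"{label}: n={len(run_times_s)} mean={mean(run_times_s):.0f}s sd={stdev(run_times_s):.0f}s"


def percent_change(before: list, after: list) -> float:
    """Change in mean run time in percent; negative means the new setting is faster."""
    return 100.0 * (mean(after) - mean(before)) / mean(before)


# Made-up placeholder run times in seconds, for illustration only (not data from this thread).
old_setting = [2400.0, 2550.0, 2480.0, 2610.0]
new_setting = [2300.0, 2420.0, 2350.0, 2390.0]

if __name__ == "__main__":
    print(summarize("old", old_setting))
    print(summarize("new", new_setting))
    print(f"change: {percent_change(old_setting, new_setting):+.1f}%")
```

If the change in the mean is small next to the standard deviations, more tasks are probably needed before concluding anything.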
Bill Greene
Volunteer tester
Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1558038 - Posted: 17 Aug 2014, 2:38:54 UTC - in response to Message 1557577.  

Juan, just in case you did not get my reply: Precision executable received and now running on all NV machines. It revealed why one of the 480s in the dual-480 machine was throttling back: that GPU was hot, over 100 C. I must get some additional cooling on it somehow.

I have also updated most of my machines to Lunatics 0.42, along with driver 337.88 on the NV machines. With minor changes where appropriate, I've installed the command line switches you provided, including -use_sleep, on all NV-equipped machines. With your help, I feel like I've come a long way in just a few days and am now looking forward to evaluating the effects. I no longer see display stalls or driver failures on the 780 machine, and Precision shows GPU1 running at about 88% (4 WUs). Not sure how to show GPU2.

Always open to more suggestions, but I have learned a lot, and I'm now more confident that I'm not going to trash my system with these changes. Thanks again for your perseverance in getting me up to speed.
Bill Greene
Volunteer tester
Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1558045 - Posted: 17 Aug 2014, 3:08:49 UTC - in response to Message 1558038.  

Does anyone know of a way to put on-board GPUs to work when there are also bus-installed GPUs? Eligible on-board GPUs are recognized, and it seems a waste that they are sitting there idle.
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1558054 - Posted: 17 Aug 2014, 3:51:21 UTC - in response to Message 1558045.  

At the bottom of Precision X you will see the Precision log with 3 displays. If you double-click anywhere on it, it will open a new window with multiple stats to view. You can run the built-in GPU on your chip, but from what I've heard it's not worth it if you have GPUs on PCIe: from what I've read, using it tends to slow down the dedicated GPUs. Most people with dedicated GPUs don't use it. If you search the threads you will find what I'm talking about. If that's the only GPU you have, then it's fine to use it; otherwise I'd suggest you concentrate on the dedicated GPUs. My 2 cents.



Zalster
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1558070 - Posted: 17 Aug 2014, 5:25:06 UTC - in response to Message 1558038.  

> one of the GPU's was hot ... over 100 C. Must get some additional cooling on it somehow.

At that temperature I'd say it's not getting any cooling.
It's probably like my old GTX 560Ti, whose fans ended up seizing. Sleeve bearings are rubbish.
Grant
Darwin NT
juan BFP · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1558071 - Posted: 17 Aug 2014, 5:33:51 UTC - in response to Message 1558045.  
Last modified: 17 Aug 2014, 6:15:48 UTC

> Anyone know of a way to put on-board GPU's to work when there exists bus installed GPU's? Eligible on-board GPU's are recognized and it seems a waste that they are setting there idle.

Hi

Sorry for the delay, I was sleeping; you know, time zones and shifts are our enemy.

About Precision: I got the message, and I'm happy I was able to give you a hand with that. Just one tip: be sure to enable its automatic fan control and try to run your GPU fans at around 80%. That will make them last a long time. Each GPU has its own max temp, but 100C is certainly not a safe temperature; for example, I set my max temp to 75C on the 780.

I have never tried to use iGPUs, but I imagine it is not a good idea. Why? Using the iGPU normally makes the CPU produce a lot of heat, and a lot of heat is bad: it could throttle your CPU clock or force your CPU fan to the limit. IMHO the troubles are bigger than the possible gain. Your 780Ti has far more crunching capacity than the iGPU, so focus on optimizing it and leave the iGPU aside, at least for now.

I see you have already crunched AP WUs (on the 780Ti host at least), and it seems your configuration is OK and -use_sleep is working fine; look at the difference in the CPU times.

I believe you are ready for the next phase, finding the best number of WUs to run at a time. Shall we continue?
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1558096 - Posted: 17 Aug 2014, 7:36:11 UTC - in response to Message 1557778.  
Last modified: 17 Aug 2014, 7:43:22 UTC

> Can anybody pls explain to me what the use sleep option does?


After queuing several GPU kernel calls, it calls Sleep(), handing the CPU back to the system for other work.
-use_sleep is the simplest way to free up the CPU a bit.
For those who like to experiment, I have added other options:

From the ReadMe:

-initial_ffa_sleep N M: In PC-FFA will sleep N ms for short and M ms for large one before looking for results. Can decrease CPU usage.
Affects performance. Experimentation required for particular CPU/GPU/GPU driver combo. N and M should be integer non-negative numbers.
Approximation of useful values can be received via running app with -v 2 and -use_sleep switches enabled and analyzing stderr.txt log file.

-initial_single_pulse_sleep N : In SingleFind search will sleep N ms before looking for results. Can decrease CPU usage.
Affects performance. Experimentation required for particular CPU/GPU/GPU driver combo. N should be integer positive number.
Approximation of useful values can be received via running app with -v 2 and -use_sleep switches enabled and analyzing stderr.txt log file.


These options let you choose how long the program gives up the CPU. That time should be roughly equal to the time the GPU needs to finish its work; then freeing the CPU will not cause a significant drop in overall performance.

This matters most for the fastest devices (where the GPU kernels have to be made even larger via additional options) and for the slowest ones, where the GPU cannot finish processing within the interval set by -use_sleep.
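Raistmer's rule of thumb is that the sleep should roughly match the time the GPU needs to finish. A rough Python sketch of turning measured GPU wait times into starting values for -initial_ffa_sleep N M. How you collect the timings (the ReadMe points to -v 2 output in stderr.txt) is left open, because that log format is not shown here; the input lists are hypothetical, and picking a low percentile, so the sleep rarely overshoots, is an assumption of this sketch rather than anything the ReadMe prescribes.

```python
import math


def suggest_sleep_ms(wait_times_ms: list, percentile: float = 25.0) -> int:
    """Suggest a whole-millisecond sleep that the GPU usually needs at least this long for.

    Uses the nearest-rank percentile of the observed times: sleeping for a low
    percentile frees the CPU for most of the wait while rarely sleeping past the
    point where the GPU is done (the remainder is left to polling).
    For -initial_single_pulse_sleep, which the ReadMe says must be positive, use max(1, result).
    """
    if not wait_times_ms:
        raise ValueError("no timings given")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(wait_times_ms)
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return int(ordered[rank - 1])  # round down to whole ms


if __name__ == "__main__":
    # Hypothetical timings in ms; substitute values you have measured yourself.
    short_ffa = [3.2, 4.0, 4.1, 4.5, 9.8]
    large_ffa = [21.0, 22.5, 23.1, 24.0, 30.2]
    n, m = suggest_sleep_ms(short_ffa), suggest_sleep_ms(large_ffa)
    print(f"-use_sleep -initial_ffa_sleep {n} {m}")  # prints -use_sleep -initial_ffa_sleep 4 22
```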

P.S.
The options discussed here apply to AstroPulse, which is written in OpenCL, not CUDA, so the thread title may be somewhat confusing to readers on this point.
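To illustrate the mechanism described above, here is a toy model in Python, not the application's code: a background thread stands in for the GPU, and the host either spins checking for completion or first sleeps for roughly the expected GPU time and then polls. Counting the completion checks shows why busy-waiting keeps a CPU core occupied, while the sleeping variant barely touches it.

```python
import threading
import time


def fake_gpu_work(duration_s: float, done: threading.Event) -> None:
    """Stand-in for a batch of queued GPU kernels: signals completion after duration_s."""
    time.sleep(duration_s)
    done.set()


def wait_spinning(done: threading.Event) -> int:
    """Busy-wait: check for completion over and over, keeping a CPU core busy the whole time."""
    checks = 0
    while not done.is_set():
        checks += 1
    return checks


def wait_sleeping(done: threading.Event, initial_sleep_s: float, poll_s: float = 0.001) -> int:
    """Give up the CPU for roughly the expected GPU time first, then poll with short sleeps."""
    time.sleep(initial_sleep_s)
    checks = 0
    while not done.is_set():
        checks += 1
        time.sleep(poll_s)
    return checks


def run(wait, *args, gpu_time_s: float = 0.05) -> tuple:
    """Start the fake GPU work, wait for it with the given strategy, return (checks, wall seconds)."""
    done = threading.Event()
    worker = threading.Thread(target=fake_gpu_work, args=(gpu_time_s, done))
    start = time.perf_counter()
    worker.start()
    checks = wait(done, *args)
    elapsed = time.perf_counter() - start
    worker.join()
    return checks, elapsed


if __name__ == "__main__":
    print("spinning:", run(wait_spinning))            # very many checks, one core busy for ~0.05 s
    print("sleeping:", run(wait_sleeping, 0.05))      # a handful of checks at most
    print("oversleeping:", run(wait_sleeping, 0.2))   # ~0.2 s although the work took ~0.05 s
```

The last line shows the other side: when the sleep is longer than the GPU work, the wait stretches to the full sleep and the GPU sits idle for the difference, which is why the sleep time has to be matched to the device.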
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1558115 - Posted: 17 Aug 2014, 7:59:37 UTC - in response to Message 1558096.  
Last modified: 17 Aug 2014, 8:00:33 UTC

Another way to avoid leaving the CPU partly idle is the -cpu_lock option.
For ATi, this option lets other applications load the CPU to 100% at the cost of only a small loss in GPU performance (at least in some configurations). For nVidia, as far as I know, this option has not been fully tested. Nevertheless, it is available in all OpenCL AstroPulse builds, including iGPU.

For fast devices, it should be supplemented by specifying how many instances of the program run simultaneously (so that processes are distributed correctly across the logical CPUs in the system).

From the ReadMe:
-cpu_lock : Enables CPUlock feature. Results in CPUs number limitation for particular app instance. Also attempt to bind different instances to different CPU cores will be made.
Can be used to increase performance under some specific conditions. Can decrease performance in other cases though. Experimentation required.
Now this option allows GPU app to use only single logical CPU.
Different instances will use different CPUs as long as there is enough of CPU in the system.
To use CPUlock in round-robin mode GPUlock feature will be enabled. Use -instances_per_device N option if few instances per GPU device are needed.

-cpu_lock_fixed_cpu N : Will enable CPUlock too but will bind all app instances to the same N-th CPU (N=0,1,.., number of CPUs-1).

-instances_per_device N :Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances).
N - integer number of allowed instances.
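The ReadMe describes -cpu_lock as binding different instances to different logical CPUs, and -cpu_lock_fixed_cpu N as putting every instance on CPU N. A small Python model of that assignment rule, useful for reasoning about how many instances fit on a given machine. It only computes the mapping and does not set any affinity; the round-robin wraparound when there are more instances than CPUs, and the instance numbering, are assumptions, since the ReadMe only covers the case where there are enough CPUs.

```python
from typing import List, Optional, Set


def assign_cpus(n_instances: int, n_logical_cpus: int, fixed_cpu: Optional[int] = None) -> List[int]:
    """Model which logical CPU each app instance would be pinned to.

    fixed_cpu=None models -cpu_lock (different instances on different CPUs,
    wrapping round-robin when there are more instances than CPUs: an assumption);
    an integer models -cpu_lock_fixed_cpu N (every instance on CPU N).
    """
    if n_logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    if fixed_cpu is not None:
        if not 0 <= fixed_cpu < n_logical_cpus:
            raise ValueError("N must be in 0 .. number of CPUs - 1")
        return [fixed_cpu] * n_instances
    return [i % n_logical_cpus for i in range(n_instances)]


def shared_cpus(assignment: List[int]) -> Set[int]:
    """Logical CPUs that end up hosting more than one instance."""
    return {cpu for cpu in assignment if assignment.count(cpu) > 1}


if __name__ == "__main__":
    # Hypothetical box: 2 GPUs x 3 AP instances each (-instances_per_device 3), 4 logical CPUs.
    plan = assign_cpus(2 * 3, 4)
    print(plan, shared_cpus(plan))  # prints [0, 1, 2, 3, 0, 1] {0, 1}
```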

qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1558117 - Posted: 17 Aug 2014, 8:01:19 UTC - in response to Message 1557782.  

> Try
>
> -unroll 6 -ffa_block 6144 -ffa_block_fetch 1536 -tune 1 64 4 1 -use_sleep

OK, I gave those settings a try, but crunching is still slower with the "use sleep" option than without. Here's an example:

This is with "use sleep": http://setiathome.berkeley.edu/result.php?resultid=3678613272
This is without "use sleep": http://setiathome.berkeley.edu/result.php?resultid=3678619850

But, as I said, the computer runs quieter and cooler with the "use sleep" option: GPU temperature is around 57 degrees, and without the option it is around 62 degrees.



Thx Raistmer, but I'm afraid I don't understand this language. I tried translating it with Google, and it looks like you are recommending that I test those other options instead of "use sleep"?
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1558122 - Posted: 17 Aug 2014, 8:08:25 UTC - in response to Message 1558117.  

> Thx Raistmer, but I'm afaraid I don't understand this language. I tried to translate with google and it looks like you recommend me to test those other options instead of "use sleep"?


"This language" is Russian. Worth getting acquainted with, if not actually learning.
And Sheldon could ask Wolowitz - he knows this language well ;) :D

Yep, those options allow finer tuning of the sleep time.
qbit
Volunteer tester
Avatar

Send message
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1558127 - Posted: 17 Aug 2014, 8:22:26 UTC

Ah, OK. I'm sure Russian is a nice language, but I'm also sure it's rather hard to learn. I can't even read the characters. And Howard is busy with Bernadette once again ;-)

Could you recommend a command line with those options for my card? I'm new to this whole CUDA/OpenCL thing, so it's not easy for me to figure out the best values on my own yet.



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.