Lunatics Windows Installer v0.42 Release Notes
Author | Message |
---|---|
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
From app's readme: Advanced level options (some app code reading and understanding of algorithms used is recommended before use, not fool-proof even in same degree as |
Darrell Wilcox Joined: 11 Nov 99 Posts: 303 Credit: 180,954,940 RAC: 118 |
The tune param will define the kernel size of the GPU into chunks. Mike, please explain HOW you know his GPU has a work group size of 256. How can someone determine their own GPU parameters? |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The tune param will define the kernel size of the GPU into chunks. Look in the stderr output of a GPU task. Our apps list the WG sizes for all GPUs in the system, and the WG size for the GPU in use too. EDIT: example from one of your own hosts: Number of OpenCL platforms: 2 |
Darrell Wilcox Joined: 11 Nov 99 Posts: 303 Credit: 180,954,940 RAC: 118 |
Thanks for the quick reply and the information! |
Joined: 17 Feb 01 Posts: 34464 Credit: 79,922,639 RAC: 80 |
I inserted the tune option in the command line file -use_sleep -unroll 10 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 and restarted BOINC and immediately lost all of my 138 AP GPU tasks due to computational error. I don't think you lost your cache because of the param edit. Something else must have happened. With each crime and every kindness we birth our future. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
I inserted the tune option in the command line file -use_sleep -unroll 10 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 and restarted BOINC and immediately lost all of my 138 AP GPU tasks due to computational error. stderr_txt makes it look very much as if he did. <message> Processed four parameters correctly without complaint, and barfed on the fifth. |
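A quick sanity check before restarting BOINC could catch this kind of slip. The following is only a minimal sketch: it assumes that `-tune` expects exactly four integer values (`N Mx My Mz`, as the readme excerpt quoted later in this thread suggests), and `check_tune` is a hypothetical helper, not something shipped with the app.

```python
import shlex

# Assumption: -tune takes four values (N Mx My Mz), per the readme text quoted in this thread.
TUNE_ARITY = 4


def check_tune(cmdline: str) -> list[str]:
    """Return a list of problems found with -tune in an app command line."""
    tokens = shlex.split(cmdline)
    problems = []
    for i, tok in enumerate(tokens):
        if tok != "-tune":
            continue
        # Collect the values that follow -tune, stopping at the next option.
        values = []
        for nxt in tokens[i + 1:]:
            if nxt.startswith("-"):
                break
            values.append(nxt)
        if len(values) != TUNE_ARITY:
            problems.append(
                f"-tune expects {TUNE_ARITY} values, got {len(values)}: {values}"
            )
        elif not all(v.isdigit() for v in values):
            problems.append(f"-tune values must be non-negative integers: {values}")
    return problems


if __name__ == "__main__":
    line = "-use_sleep -unroll 10 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4"
    for problem in check_tune(line):
        print(problem)
```

Run against the command line quoted above, it reports that `-tune` received three values instead of four; it says nothing about whether the values themselves suit a given GPU.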
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
-tune N Mx My Mz It is not 100% clear to me: is the work group size calculated as Mx * My * Mz, or is Mz for another use? It would make sense to me if -tune 1 8 8 8 = WG size 512. Does the value of N set the function to use, fetch vs compare? So this would be more "Try each and see if one is better for your hardware"? SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
Ulrich Metzner Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
Question regarding workgroup size: From the stdout of my GPU apps, the maximum workgroup size of my GT 640 and GT 430 is 1024, but Mike always suggests 256 for my setup, for example 1,128,2,1; 1,64,4,1; 1,32,8,1 or 1,16,16,1. Would I gain performance when using workgroup size 1024 instead? For example 1,512,2,1; 1,256,4,1; 1,128,8,1; 1,64,16,1 or 1,32,32,1. Regards, Uli Aloha, Uli |
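The candidate lists above can be generated rather than typed by hand. This is just an illustrative sketch: `tune_candidates` is a hypothetical helper, and it assumes N = 1, Mz = 1, and that only power-of-two splits with Mx >= My are of interest, which matches the examples in the post but is not taken from the app's documentation.

```python
def tune_candidates(wg_size: int, n: int = 1) -> list[str]:
    """List '-tune N Mx My 1' strings whose Mx * My equals wg_size.

    Assumptions (not from the readme): My is a power of two, Mx >= My, Mz = 1.
    """
    candidates = []
    my = 1
    while my * my <= wg_size:
        if wg_size % my == 0:
            mx = wg_size // my
            candidates.append(f"-tune {n} {mx} {my} 1")
        my *= 2
    return candidates


if __name__ == "__main__":
    for size in (256, 1024):
        print(size, tune_candidates(size))
```

Each candidate still has to be benchmarked on the actual card; the sketch only enumerates shapes with the requested product.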
Joined: 17 Feb 01 Posts: 34464 Credit: 79,922,639 RAC: 80 |
I inserted the tune option in the command line file -use_sleep -unroll 10 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 and restarted BOINC and immediately lost all of my 138 AP GPU tasks due to computational error. That's weird indeed. With each crime and every kindness we birth our future. |
Joined: 17 Feb 01 Posts: 34464 Credit: 79,922,639 RAC: 80 |
Question regarding workgroup size: I can't test it because my card only has WG 256. Like the readme says, testing is required. I would run some offline benches to make sure. With each crime and every kindness we birth our future. |
Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I want to thank everyone for assisting me in understanding and applying the -tune option to the AP cmdline file. I did correct the argument for the -tune option and I've learned a very important lesson when copy & paste is involved for suggestions - copy & paste, but verify before applying. I do have 5 AP tasks in the queue, but they were not affected by the error because of their deadline, so we will wait and see what happens. Note to all developers - Attempt to ensure that parameters, options, arguments, etc. are described in more layman's terms so that those of us who are not proficient in C++ or any other PC language, but who still want to attempt more advanced options, can understand them. It does not have to be a tutorial, but it should be enough to comprehend. For example - Advanced level options (some app code reading and understanding of algorithms used is recommended before use, not fool-proof even in same degree as I would probably win a bet that most of us have not heard of the profiler mentioned above, but would understand better if the following were inserted in the text - The tune param will define the kernel size of the GPU into junks. Or even mentioning something from the following would help - The tune param will define the kernel size of the GPU into chunks. I still don't know what the profiler is or how to apply it. I don't buy computers, I build them!! |
Ulrich Metzner Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
Well, a profiler will "profile" an application while it runs, logging the time spent in different parts of the program, which lets you identify the parts in which most of the runtime is spent. These are the spots where optimization is most effective. You can also make different profiles for different sets of command line parameters to identify the set that gains the biggest performance boost. BTW: I am now using the following parameter set, which seems to be the best for my setup running 2 instances of r2399 per card: -use_sleep -unroll 4 -ffa_block 2048 -ffa_block_fetch 1024 -tune 1 128 8 1 So the GT430 is maxed out and the GT640 has a little headroom for other tasks. Aloha, Uli |
Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Well, a profiler will "profile" an application while it runs, logging the time spent in different parts of the program, which lets you identify the parts in which most of the runtime is spent. These are the spots where optimization is most effective. You can also make different profiles for different sets of command line parameters to identify the set that gains the biggest performance boost. Thanks Ulrich, and where can I find such an animal? I would like to see what else it can do. I don't buy computers, I build them!! |
Ulrich Metzner Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
Thanks Ulrich, and where can I find such an animal? I would like to see what else it can do. A profiler is normally included in development environments, for example Microsoft Visual Studio. Unfortunately it is not included in the "free" versions, aka "Express Edition"; it is only included in the "professional" or "ultimate" editions. Also, the usage is not trivial and it is most effective on so-called debug builds. For real use you need developer skills and the source code of the application under test. A simple form of profiling would be to run the same AP test WU under the same conditions, changing only the parameter set and recording the total time. Sorry, but I don't know of any profiling application that runs standalone and has a user interface suitable for laymen. :/ Aloha, Uli |
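The "simple form of profiling" described above, running the same test workunit with different parameter sets and recording the total time, could be scripted roughly like this. It is a sketch under stated assumptions: `time_runs` is a hypothetical helper, and `base_cmd` stands for whatever offline bench command you actually use; the app's executable name and any workunit arguments are not given in this thread and are not guessed here.

```python
import subprocess
import sys
import time


def time_runs(base_cmd, param_sets, repeats=1):
    """Run base_cmd once per parameter set and return the best wall time for each.

    base_cmd   -- list of strings: the bench command to launch (supplied by you)
    param_sets -- list of strings, each one a full set of command line options
    repeats    -- how many times to run each set; the fastest run is kept
    """
    results = {}
    for params in param_sets:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # check=True raises CalledProcessError if the run exits with an error.
            subprocess.run(base_cmd + params.split(), check=True)
            best = min(best, time.perf_counter() - start)
        results[params] = best
    return results


if __name__ == "__main__":
    # Stand-in command that does nothing; replace it with your real bench command.
    stand_in = [sys.executable, "-c", "pass"]
    sets = ["-tune 1 64 4 1", "-tune 1 128 2 1"]
    for params, seconds in sorted(time_runs(stand_in, sets).items(), key=lambda kv: kv[1]):
        print(f"{seconds:8.3f} s  {params}")
```

Wall-clock totals like these only rank parameter sets against each other; unlike a real profiler they do not show which kernel the time is spent in, and the runs need to happen under the same conditions (same workunit, nothing else loading the GPU) to be comparable.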
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
-tune N Mx My Mz As stated in the ReadMe, I classified this param as "advanced". That is, it is better to consult the app code before use. Yes, 8x8x8 would give a total WG size of 512 (not supported by ATi GPUs), but even 8x8x4 has no meaning for a 2D kernel. Not all combos are possible for the supported kernels. Perhaps more kernels will be supported in the future, but again, one needs to understand where most of the load goes. For now it's the fetch kernel, then the FFT ones. Hence there is not much sense in "optimizing" something that takes only a fraction of a percent of total execution time. The version included in the installer has a 1D fetch kernel, hence Mz should be 1 and My should be 1 for this type of kernel. Previous builds that support this key and APv7 include a 2D fetch kernel; there Mz should be 1, and Mx x My gives the workgroup size with which that kernel will be called. |
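The shape rules in this post can be written down as a small check. This is one reading of the post, not the app's own validation logic: `check_tune_shape` and its parameters are hypothetical, `kernel_dims` is 1 for the 1D fetch kernel and 2 for the 2D one, and `max_wg_size` is the maximum work group size your GPU reports in stderr.

```python
def check_tune_shape(mx, my, mz, kernel_dims, max_wg_size):
    """Return a list of problems with an Mx My Mz triple, per the rules stated above."""
    problems = []
    # Both the 1D and the 2D fetch kernel are described as needing Mz = 1.
    if mz != 1:
        problems.append("Mz should be 1 for the 1D and 2D fetch kernels")
    # The 1D fetch kernel additionally needs My = 1.
    if kernel_dims == 1 and my != 1:
        problems.append("My should be 1 for the 1D fetch kernel")
    # The total work group size is the product of the three dimensions.
    total = mx * my * mz
    if total > max_wg_size:
        problems.append(f"WG size {total} exceeds the device maximum {max_wg_size}")
    return problems


if __name__ == "__main__":
    # 8x8x8 on a GPU limited to 256: wrong Mz and too large a work group.
    print(check_tune_shape(8, 8, 8, kernel_dims=2, max_wg_size=256))
    # 64x4x1 with the 2D fetch kernel on the same GPU: no problems reported.
    print(check_tune_shape(64, 4, 1, kernel_dims=2, max_wg_size=256))
```

Passing this check only means a triple is consistent with the description above; whether it is faster than the default still has to be measured.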
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I want to thank everyone for assisting me in understanding and applying the -tune option to the AP cmdline file. I did correct the argument for the -tune option and I've learned a very important lesson when copy & paste is involved for suggestions - copy & paste, but verify before applying. I do have 5 AP tasks in the queue, but they were not affected by the error because of their deadline, so we will wait and see what happens. Sorry, I had better stick to coding, I'm not a writer ;) And having something new to learn, what could be better for the Homo sapiens species? ;) (joke). Actually, if you feel uncomfortable with the description, and ESPECIALLY if you see an advanced warning given, it is better to do as the manual describes... or just leave it at default. And report a bug if the default fails. General note regarding advanced options: I think it is better to have some tool for tweaking than to have hardwired values and no way to change them besides writing your own code, right? But with the tool comes responsibility for using that tool. Yes, better fool-proofing can and perhaps will be added. And this will cost time otherwise spent coding new options/optimizations. Make your choice wisely. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
CodeXL for ATi, Nsight for NV. I still have to find a good free profiler for iGPUs though. EDIT: perhaps not for laymen though. But then продвинутые опции ("advanced options") comes into play. Perhaps I need to write in my own native language instead of English next time, to make it clearer if the English word "advanced" is not comprehensible :-/ |
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I think instead of "advanced" it is more of a "developer" option. Unless you are a developer, or reading the code like one, you probably do not understand what the option is actually doing. SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
No problem, I'll change it to "for developers" ("для разработчиков"). |
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Yes, exactly! SETI@home classic workunits: 93,865 CPU time: 863,447 hours |