GBT ('guppi') .vlar tasks will be sent to GPUs, what do you think about this?

Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1784929 - Posted: 5 May 2016, 10:35:08 UTC

In the SETI-Beta thread I wrote what I think about this.

I thought a larger discussion (here) would be helpful.


I made my decision/statement...
Either...

  • an option is added to the project prefs to check/uncheck 'GBT ('guppi') .vlar tasks to GPU' (so each member can decide whether to reduce or preserve the performance of their PC),
  • or I use a tool to send GBT ('guppi') .vlar tasks from GPU to CPU (and screw up CreditNew),
  • or I search for a new primary project,
  • or I switch off my PCs.



What is your opinion on the topic of this thread?

Thanks.


ID: 1784929
Profile Mr. Kevvy (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1784930 - Posted: 5 May 2016, 11:10:19 UTC
Last modified: 5 May 2016, 11:12:57 UTC

Repost of what I wrote here:

I proposed in the Breakthrough Listen News thread that a checkbox be added to our project preferences, akin to "Allow VLAR work on GPU", with an appropriate caveat, i.e. that if run on CUDA they may complete slowly for less credit, cause machine slowness or lockups, or cause work unit failures, etc.

The box would be initially off, so no one would get them and risk these issues unless they chose to do so. Einstein@Home has similar controls, with similar caveats, in their project preferences page for setting the Count parameter that allows multiple concurrent work units on a GPU (which has the same risks of instability, plus the added risk of hardware failure if the GPU overheats), and they seem to do just fine with the disclaimer, so there is no reason it wouldn't work here. Volunteers with both mixed-use and dedicated cruncher machines could set up home/work/school profiles and put the machine types in them, so computers they actually use wouldn't be slowed.

All I can suggest is that if you would like this to be implemented, post in that thread. If enough of us ask, it may be.


If there is a control for it, from a support and user-friendliness standpoint, it should definitely be implemented as an opt-in rather than an opt-out.
ID: 1784930
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1784947 - Posted: 5 May 2016, 13:46:43 UTC

AFAIK there is currently no technical way to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.
With the flow of GBT data there can be situations where the GPU has no SETI work at all for prolonged periods.
That is definitely a waste of resources for the project. So the local solution could be either to ensure the availability of non-VLAR work for the GPU, or to enable VLAR on the GPU.

I'm not sure that a VLAR/non-VLAR mix can be ensured in unattended mode, and requiring manpower for this would waste an even more precious project resource.

A third solution could be to arbitrarily increase the credit "payment" for VLAR. That would please all those who watch their RAC too closely, and it would also emphasize the real importance of the dedicated observations that VLAR constitutes.
Unfortunately, this solution will hardly go through, due to BOINC vs. SETI cobblestone politics (any other project could do it with ease, IMO).

So, enabling VLAR for the fast GPUs that can handle it is the best realistic project-wise solution, IMO.
ID: 1784947
Profile Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1784971 - Posted: 5 May 2016, 15:48:52 UTC
Last modified: 5 May 2016, 15:49:39 UTC

AFAIK there is currently no technical way to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.


But what about GBT tasks in particular?
That would be much easier, IMO.


With each crime and every kindness we birth our future.
ID: 1784971
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785019 - Posted: 5 May 2016, 19:22:13 UTC - in response to Message 1784971.  

AFAIK there is currently no technical way to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.


But what about GBT tasks in particular?
That would be much easier, IMO.


They are not sent as GBT tasks but as ordinary MB v8 tasks, so no.
We see some difference in the server statistics representation, but not on the client side; in the client they are still the same *_v8 tasks.
ID: 1785019
Profile Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1785029 - Posted: 5 May 2016, 20:17:12 UTC - in response to Message 1785019.  

AFAIK there is currently no technical way to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.


But what about GBT tasks in particular?
That would be much easier, IMO.


They are not sent as GBT tasks but as ordinary MB v8 tasks, so no.
We see some difference in the server statistics representation, but not on the client side; in the client they are still the same *_v8 tasks.


Thanks for the clarification.


With each crime and every kindness we birth our future.
ID: 1785029
Profile Mr. Kevvy (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1785064 - Posted: 5 May 2016, 22:51:02 UTC - in response to Message 1784947.  

AFAIK there is currently no technical way to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.


Is there any detailed reason why it wouldn't be possible?
ID: 1785064
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785069 - Posted: 5 May 2016, 23:07:58 UTC - in response to Message 1785064.  

AFAIK there is currently no technical way to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.

Is there any detailed reason why it wouldn't be possible?

No hard technical reason that I know of, but I can think of a couple of reasons which have come up in the past and would be relevant again.

1) Time, manpower, priorities - other work to do. It would involve writing new code from scratch, and people need to be strongly motivated before they mess with a working (from their point of view) system.

2) Maintainability. There is a standard BOINC server component called the scheduler. This proposal would require making a separate "special for SETI" tweaked version, and then remembering to re-apply the changes whenever the underlying standard code changes in the future.
ID: 1785069
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785118 - Posted: 6 May 2016, 5:13:30 UTC - in response to Message 1785069.  
Last modified: 6 May 2016, 5:13:52 UTC

AFAIK there is currently no technical way to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.

Is there any detailed reason why it wouldn't be possible?

No hard technical reason that I know of, but I can think of a couple of reasons which have come up in the past and would be relevant again.

1) Time, manpower, priorities - other work to do. It would involve writing new code from scratch, and people need to be strongly motivated before they mess with a working (from their point of view) system.

2) Maintainability. There is a standard BOINC server component called the scheduler. This proposal would require making a separate "special for SETI" tweaked version, and then remembering to re-apply the changes whenever the underlying standard code changes in the future.


Yep. All of this can be shortened to "no technical possibility", with the addition of "while staying within the boundaries of the current BOINC infrastructure".
ID: 1785118
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785121 - Posted: 6 May 2016, 5:34:24 UTC
Last modified: 6 May 2016, 5:36:50 UTC

Instead of turning this thread into another rant, I would propose that owners of high-end GPU cards explore more deeply the quite large parameter space of the current OpenCL app and report back options that could speed up VLAR processing.

Such a set of params could then be made the new defaults for high-end devices. If we can decrease the performance drop on VLAR, that would be the most appropriate solution to the issue in this thread's topic.

Take into consideration that VLAR has an increased share of PulseFind searches on the longest time arrays (lower FFT sizes); it is the lack of parallelization there that causes the performance drop. That's why high-end devices with many CUs see a big drop, while owners of low/mid-range devices don't see as big a slowdown.

So I would start tuning with these parameters:

-period_iterations_num N : Splits a single PulseFind kernel call into N calls for the longest PulseFind calls. Can be used to reduce GUI lags or to prevent driver restarts. Can affect performance; experimentation required. The default value for v6/v7/v8 tasks is N=20. N should be a positive integer.

-pref_wg_size N : Sets the preferred workgroup size for PulseFind kernels. Should be a multiple of the wave size (32 for NVIDIA, 64 for ATI) for better performance, and should not exceed the maximum possible workgroup size for the particular device (256 for ATI and Intel; less than 2048 for NVIDIA, depending on the compute capability of the device).

-pref_wg_num_per_cu N : Sets the preferred number of workgroups per compute unit. Currently used only in PulseFind kernels.

-sbs N : Sets the maximum single buffer size for GPU memory allocations, where N is a positive integer giving the size in MBytes. Can affect performance and the total memory requirements for the application to run; experimentation required.

(The very first attempt should be to add -sbs 512 to the tuning line.)
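
To make that concrete, a full tuning line combining the options above might look like the line below. This is only an illustrative starting point, not a recommendation; the values need per-device experimentation:

    -sbs 512 -period_iterations_num 40 -pref_wg_size 64 -pref_wg_num_per_cu 2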

Also, new generations of GPUs have the ability to execute workgroups from different kernels in parallel. It is not clear how well this ability scales to workgroups from kernels launched in different contexts/processes, though.
But it is worth trying to run a few VLAR tasks per device to see if this reduces the VLAR performance drop, as in the sketch below.
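
A minimal app_config.xml sketch for trying that follows; a gpu_usage of 0.5 means two tasks per GPU. The app name setiathome_v8 assumes the stock MB v8 application, so adjust it (and the values) to your own setup:

    <app_config>
       <app>
          <name>setiathome_v8</name>
          <gpu_versions>
             <gpu_usage>0.5</gpu_usage>
             <cpu_usage>1.0</cpu_usage>
          </gpu_versions>
       </app>
    </app_config>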
ID: 1785121
Profile Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1785123 - Posted: 6 May 2016, 5:49:09 UTC - in response to Message 1785121.  

Instead of turning this thread into another rant, I would propose that owners of high-end GPU cards explore more deeply the quite large parameter space of the current OpenCL app and report back options that could speed up VLAR processing.


I'm sure this is quite a good suggestion... however, aren't you referring to parameters that are useful on a BETA application? Of what use are they to us here on Main with Nvidia cards? I would be most amenable to trying to run VLARs on my Nvidia cards IF we had the beta OpenCL app here on Main and a fully implemented Lunatics installer. When is that happening?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1785123
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785126 - Posted: 6 May 2016, 6:15:27 UTC - in response to Message 1785123.  

Instead of turning this thread into another rant, I would propose that owners of high-end GPU cards explore more deeply the quite large parameter space of the current OpenCL app and report back options that could speed up VLAR processing.


I'm sure this is quite a good suggestion... however, aren't you referring to parameters that are useful on a BETA application? Of what use are they to us here on Main with Nvidia cards? I would be most amenable to trying to run VLARs on my Nvidia cards IF we had the beta OpenCL app here on Main and a fully implemented Lunatics installer. When is that happening?

Hardly any change in the scheduling policy on Main will be made _before_ the release of the current beta apps (actually, they should have been released long ago IMO, but circumstances prevented it being done in time).

Nevertheless, all binaries are available as separate packs from my cloud storage space: https://cloud.mail.ru/public/DMkN/x4BRCYuAV

Regarding when they will be added to the Lunatics installer, I'll leave that question for Richard to answer.
ID: 1785126
Profile Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1785148 - Posted: 6 May 2016, 7:35:29 UTC - in response to Message 1785126.  

Thanks for the link to the apps, Raistmer. I have been trying to follow along with the gurus on Beta about these new OpenCL and SoG apps, but how to implement them, and even which app is the most appropriate to run, mostly goes over my head. Judging by the file names, both are OpenCL, yet they are different executables with the same plan_class. Huh? I know just enough about editing my app_info files to guarantee dumping all my current tasks; I've done it too many times. Thus, without a proven Lunatics installer, I don't believe I'm competent to try using your linked files. I looked at both apps' .aistub files and wonder how both apps can use the same SoG plan_class. Which app is the correct one to install to run VLARs on GTX 970s? Not brave enough today; I think I will wait till the apps officially make it to Main, the scheduler code gets updated to handle them, and Richard publishes the next, latest and greatest Lunatics installer.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1785148
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1785154 - Posted: 6 May 2016, 7:56:06 UTC - in response to Message 1785148.  

Isn't the determination of what is or isn't a VLAR done during splitting?
Would it be possible (without major work) to change the threshold value for what constitutes a VLAR WU, based on the data source?
i.e. original data retains the current threshold value, while GBT data gets a different threshold value, so fewer of the WUs currently being split are branded as VLAR.
Grant
Darwin NT
ID: 1785154
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22186
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1785161 - Posted: 6 May 2016, 8:40:48 UTC

Sounds like a good idea, but the majority of the guppis being split are down in the 0.01 angle-range region. This is down to the GBT being used to look intensively at one point in space, rather than scanning across a region.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1785161
Profile Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1785182 - Posted: 6 May 2016, 12:58:50 UTC - in response to Message 1785148.  

Thanks for the link to the apps, Raistmer. I have been trying to follow along with the gurus on Beta about these new OpenCL and SoG apps, but how to implement them, and even which app is the most appropriate to run, mostly goes over my head. Judging by the file names, both are OpenCL, yet they are different executables with the same plan_class. Huh? I know just enough about editing my app_info files to guarantee dumping all my current tasks; I've done it too many times. Thus, without a proven Lunatics installer, I don't believe I'm competent to try using your linked files. I looked at both apps' .aistub files and wonder how both apps can use the same SoG plan_class. Which app is the correct one to install to run VLARs on GTX 970s? Not brave enough today; I think I will wait till the apps officially make it to Main, the scheduler code gets updated to handle them, and Richard publishes the next, latest and greatest Lunatics installer.


I'm just a co-member of the installer crew, but I don't think we will redo the installer in the near future.
Of course, it's up to Richard.

The params can be added in the command-line text file which is included in each package.
I can guide you through it if you want.


With each crime and every kindness we birth our future.
ID: 1785182
Profile Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1785220 - Posted: 6 May 2016, 15:43:03 UTC - in response to Message 1785182.  

Mike, thanks for your gracious offer of help. First question, please:

Can you use the beta apps here on Main, or just in the Beta project? IOW, does the Main scheduler know about these apps so that it sends tasks to them correctly?

Second question: do you really need BOTH OpenCL apps to run VLARs?

Another question is whether both of the apps use the same "opencl_nvidia_SoG" plan_class, since they have the same entries in their aistub files. How does that work with the scheduler?

Is there a preference for which app to run for VLARs on GTX 970s, or does it really matter? Is the performance the same with either app?

Can you post the best suggested parameters for VLARs for the MB_command_line files?

Thanks in advance.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1785220
Profile Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1785297 - Posted: 6 May 2016, 21:23:13 UTC
Last modified: 6 May 2016, 21:29:14 UTC

Yes, you can use the "beta" apps here, as I do.
We just use them at Beta first to see whether they work as expected.
The scheduler doesn't care which app is in use, as long as it knows the plan class.

No, of course you only need one OpenCL app to run VLARs.
For Nvidia cards it looks like SoG is the better option.

Since one only uses one of the apps, the plan class can be the same.

You can add the following command-line params to the cmdline.txt file:

-sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

Just make sure the first character is a space.
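
To illustrate why the plan class is all that matters to the scheduler: in an anonymous-platform setup, the app_version entry in app_info.xml carries the plan_class tag, and that tag is what the scheduler matches against, whatever executable it points to. A trimmed sketch along those lines (the file name and version number here are placeholders, not the actual package contents):

    <app_version>
       <app_name>setiathome_v8</app_name>
       <version_num>800</version_num>
       <plan_class>opencl_nvidia_SoG</plan_class>
       <file_ref>
          <file_name>MB8_win_x86_SSE3_OpenCL_NV_SoG.exe</file_name>
          <main_program/>
       </file_ref>
    </app_version>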

Just for info:

No VLARs are being sent to GPUs at this time.


With each crime and every kindness we birth our future.
ID: 1785297
Profile Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1785458 - Posted: 7 May 2016, 4:39:30 UTC - in response to Message 1785297.  

OK, regarding Raistmer's idea about shifting the CPU work to the GPUs:

The app_info.xml he gave will work, but with a caveat.

I am unable to limit the amount of work distributed to the different GPUs. It will load as many work units as you have CPU cores.

The only difference was that it tried to restrict how much CPU was being used.

Example: I had it set for 2 work units per GPU, with 4 GPUs, so I expected 8 total. Unfortunately 16 loaded onto the cards, 8 with full CPU support and 8 with about 10% of a core.

I tried changing the value in his command line; if I increased it to 3 per GPU (12 total), then of the 16 loaded, 12 now had full CPU support and 4 were down around 10% CPU. Unfortunately, it got very sluggish and eventually locked up, requiring a forced quit.

I also tried limiting the number of instances in app_config.xml for the SoG plan class, but it didn't do anything.

What finally worked was setting <project_max_concurrent> to 8; that pushed 8 tasks to waiting to run and left the original 8 running with full core support.
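
For reference, a minimal app_config.xml expressing just that limit would be something like the sketch below (the value 8 matches my 4 GPUs at 2 work units each; adjust it for your own setup):

    <app_config>
       <project_max_concurrent>8</project_max_concurrent>
    </app_config>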

Time to complete a GUPPI VLAR is 21 minutes for 1 VLAR/GPU,

41 minutes for 2 VLAR/GPU.

A VLAR on the CPU is around 53 min to 1 hr.
ID: 1785458
AMDave
Volunteer tester

Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1785497 - Posted: 7 May 2016, 14:15:39 UTC - in response to Message 1785458.  

What finally worked was setting <project_max_concurrent> to 8; that pushed 8 tasks to waiting to run and left the original 8 running with full core support.

Time to complete a GUPPI VLAR is 21 minutes for 1 VLAR/GPU,

41 minutes for 2 VLAR/GPU.

A VLAR on the CPU is around 53 min to 1 hr.

What is the GPU usage percentage for 1 VLAR/GPU? And for 2 VLAR/GPU?
ID: 1785497