Message boards : Number crunching : GBT ('guppi') .vlar tasks will be sent to GPUs, what do you think about this?
Dirk Sadowski (Joined: 6 Apr 07, Posts: 7105, Credit: 147,663,825, RAC: 5)
Here in the SETI-Beta thread I wrote what I think about this. I thought a larger discussion (here) would be helpful. I made my decision/statement... Either...
Mr. Kevvy (Joined: 15 May 99, Posts: 3797, Credit: 1,114,826,392, RAC: 3,319)
Repost of what I wrote here: I proposed in the Breakthrough Listen News thread that a checkbox be added to our project preferences, akin to "Allow VLAR work on GPU", with an appropriate caveat, i.e. if run on CUDA it may complete slowly for less credit, cause machine slowness/lockups, or work unit failures, etc. If there is a control for it, from a support and user-friendliness standpoint, it should definitely be implemented as an opt-in rather than an opt-out.
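In BOINC terms, such a checkbox would live in the project-specific preferences, which are just an XML blob that the project defines and its scheduler parses. A minimal sketch of what the stored preference might look like (the tag name here is invented for illustration):

    <project_specific>
        <!-- hypothetical opt-in flag; 0 = no VLAR work on GPU (the default) -->
        <allow_vlar_gpu>0</allow_vlar_gpu>
    </project_specific>

The scheduler would of course still need new code to honour such a flag when matching workunits to GPU app versions.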
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121)
AFAIK there is no technical possibility currently to implement VLAR as a "checkbox", so no opt-in or opt-out is possible. With the flow of GBT data there can be situations where a GPU has no SETI work at all for prolonged times; that is definitely a waste of resources for the project. So, the local solution could be to ensure the availability of non-VLAR work for GPUs, or to enable VLAR for GPUs. I'm not sure that a VLAR/non-VLAR mix can be ensured in unattended mode, and requiring man-power for this would waste an even more precious project resource. A third solution could be to arbitrarily increase the "payment" in credits for VLAR. Then all those who watch RAC too closely would be pleased, and this action would emphasize the real importance of the dedicated observations that VLARs constitute. Unfortunately, this solution will hardly go through due to BOINC vs SETI cobblestone politics (any other project could do that with ease, IMO). So, enabling VLAR for fast GPUs that can handle it is the best real project-wise solution, IMO.
Mike (Joined: 17 Feb 01, Posts: 34350, Credit: 79,922,639, RAC: 80)
> AFAIK there is no technical possibility currently to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.
But what about a checkbox for GBT tasks in particular? That would be much easier, IMO.

With each crime and every kindness we birth our future.
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121)
> But what about a checkbox for GBT tasks in particular? That would be much easier, IMO.
No: they are distributed not as GBT tasks but as the usual MB v8 tasks. We see some difference in the server statistics representation, but not in the client part; in the client they are still the same *_v8 tasks.
Mike (Joined: 17 Feb 01, Posts: 34350, Credit: 79,922,639, RAC: 80)
> They are distributed not as GBT tasks but as the usual MB v8 tasks [...]
Thanks for the clarification.

With each crime and every kindness we birth our future.
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14674, Credit: 200,643,578, RAC: 874)
> AFAIK there is no technical possibility currently to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.
No hard technical reason that I know of, but I can think of a couple of reasons which have come up in the past and would be relevant again.
1) Time, manpower, priorities: there is other work to do. It would involve writing new code from scratch, and people need to be strongly motivated before they mess with a working (from their point of view) system.
2) Maintainability. There is a standard BOINC server component called the scheduler. This proposal would require making a separate, "special for SETI", tweaked version, and then remembering to re-apply the changes whenever the underlying standard code changes in the future.
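For reference, the customization point in question already exists as a stub: the BOINC server ships sched/sched_customize.cpp with a hook called wu_is_infeasible_custom(), which a project can edit to veto sending a given workunit with a given app version. A rough sketch of a VLAR-on-GPU block there might look like the following (a sketch only, not SETI's actual code, and it assumes VLAR workunits are recognisable by a ".vlar" substring in the name):

    // In BOINC's sched/sched_customize.cpp -- a sketch, not SETI's actual code.
    // Return true if this workunit must NOT be sent with this app version.
    bool wu_is_infeasible_custom(WORKUNIT& wu, APP& app, BEST_APP_VERSION& bav) {
        // Hypothetical rule: keep VLAR workunits off GPU app versions.
        // Assumes ".vlar" appears in the workunit name (an assumption).
        if (strstr(wu.name, ".vlar") && bav.host_usage.uses_gpu()) {
            return true;
        }
        return false;
    }

Making that respect a per-user opt-in preference is exactly the "writing new code from scratch" part, and this is also the file that would need re-merging whenever upstream BOINC changes.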
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121)
> No hard technical reason that I know of, but I can think of a couple of reasons which have come up in the past and would be relevant again.
Yep. All this could be shortened to "no technical possibility", with "staying within the boundaries of the current BOINC infrastructure implementation" added.
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121)
Instead of turning this thread into another rant, I would propose that owners of high-end GPU cards explore the quite big parameter space of the current OpenCL app more deeply and report back options that speed up VLAR processing. Such a set of params could then be made the new defaults for high-end devices. If we could decrease the performance drop on VLAR, this would be the most appropriate solution to the issue in the thread topic.

Taking into consideration that VLAR has an increased share of PulseFind searches on the longest time arrays (lowest FFT sizes), there is a lack of parallelization that causes the performance drop. That's why high-end devices with many CUs see a big drop while owners of low/mid-range devices don't see such a big slowdown. So I would start tuning with these parameters (a sample combined tuning line follows this list):

-period_iterations_num N : Splits a single PulseFind kernel call into N calls for the longest PulseFind calls. Can be used to reduce GUI lags or to prevent driver restarts. Can affect performance; experimentation required. The default value for v6/v7/v8 tasks is N=20. N should be a positive integer.

-pref_wg_size N : Sets the preferred workgroup size for PulseFind kernels. Should be a multiple of the wave size (32 for nVidia, 64 for ATi) for better performance, and should not exceed the maximal possible workgroup size for the particular device (256 for ATi and Intel, less than 2048 for NV, depending on the CC of the device).

-pref_wg_num_per_cu N : Sets the preferred number of workgroups per compute unit. Currently used only in PulseFind kernels.

-sbs N : Sets the maximum single buffer size for GPU memory allocations. N should be a positive integer giving the size in Mbytes. Can affect performance and the total memory requirements for the application to run. Experimentation required (and the very first attempt should be to add -sbs 512 to the tuning line).

Also, new generations of GPUs have the ability to execute workgroups from different kernels in parallel. It's not clear how this ability scales to workgroups from kernels launched in different contexts/processes, though. But it's worth trying to run a few VLAR tasks per device to see if this reduces the VLAR performance drop.
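As a concrete starting point, a combined tuning line in the app's command-line file might look like the one below (a sketch only: the values are illustrative first guesses to experiment from, not tested recommendations, and -pref_wg_size 64 assumes an nVidia card per the wave-size rule above):

    -sbs 512 -period_iterations_num 40 -pref_wg_size 64 -pref_wg_num_per_cu 2

Change one parameter at a time and compare elapsed times on similar VLAR tasks; otherwise you won't know which switch actually helped.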
Keith Myers (Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
> Instead of turning this thread into another rant, I would propose that owners of high-end GPU cards explore the quite big parameter space of the current OpenCL app more deeply and report back options that speed up VLAR processing.
I'm sure this is quite a good suggestion... however, aren't you referring to parameters that are useful on a BETA application? Of what use are they here on Main with Nvidia cards? I would be most amenable to trying to run VLARs on my Nvidia cards IF we had the Beta OpenCL app here on Main and a fully implemented Lunatics installer. When is that happening?

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121)
> I would be most amenable to trying to run VLARs on my Nvidia cards IF we had the Beta OpenCL app here on Main and a fully implemented Lunatics installer. When is that happening?
Any change in scheduling policy on Main will hardly be made _before_ the release of the current beta apps (actually, they should have been released long ago IMO, but some circumstances prevented doing it in time). Nevertheless, all binaries are available as separate packs from my cloud storage space: https://cloud.mail.ru/public/DMkN/x4BRCYuAV Regarding when they will be added to the Lunatics installer, I'll leave that question to Richard to answer.
Keith Myers (Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
Thanks for the link to the apps, Raistmer. I have been trying to follow along with the gurus on Beta about these new OpenCL and SoG apps, but it mostly just goes over my head: how to implement them, and even which app is the most appropriate to run. By looking at the file names, it seems both are OpenCL, yet they are different executables with the same plan_class. Huh? I know just enough about how to edit my app_info files to guarantee dumping all my current tasks; I've done it too many times. Thus, without a proven Lunatics installer, I don't believe I'm competent to try using your linked files. I looked at both apps' .aistub files and wonder how both apps can use the same SoG plan_class. Which app is the correct one to install to run VLARs on GTX 970s? Not brave enough today; I think I will wait till the apps officially make it to Main, the scheduler code gets updated to handle them, and Richard publishes the next, latest and greatest Lunatics installer.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) (Joined: 19 Aug 99, Posts: 13835, Credit: 208,696,464, RAC: 304)
Isn't the determination of what is or isn't a VLAR done during splitting? Would it be possible (without major work) to change the threshold value for what constitutes a VLAR WU based on the data source? i.e. original data retains the current threshold value, while GBT data gets a different threshold value, so fewer of the current WUs being split are branded as VLAR.

Grant
Darwin NT
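To make the idea concrete, the change would amount to something like the following at split time (purely illustrative C++: every name and number here is invented, since the real splitter code isn't shown in this thread):

    // Purely illustrative sketch of a per-source VLAR threshold at split time.
    // All identifiers and values are invented for this example.
    #include <cstring>

    static const double VLAR_AR_THRESHOLD_DEFAULT = 0.12;   // illustrative value
    static const double VLAR_AR_THRESHOLD_GBT     = 0.005;  // illustrative value

    // Decide whether a workunit gets branded VLAR, based on its angle range
    // and the telescope the data came from.
    static bool is_vlar(double angle_range, const char* source) {
        const double threshold = (std::strcmp(source, "GBT") == 0)
            ? VLAR_AR_THRESHOLD_GBT
            : VLAR_AR_THRESHOLD_DEFAULT;
        return angle_range < threshold;
    }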
rob smith (Joined: 7 Mar 03, Posts: 22449, Credit: 416,307,556, RAC: 380)
Sounds a good idea, but the majority of the guppis being split are down in the 0.01 region. This is due to the GBT being used to look intensively at one point in space, rather than scanning a region.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Mike (Joined: 17 Feb 01, Posts: 34350, Credit: 79,922,639, RAC: 80)
> Thanks for the link to the apps, Raistmer. [...] Thus, without a proven Lunatics installer, I don't believe I'm competent to try using your linked files. [...]
I'm just a co-worker on the installer crew, but I don't think we will redo the installer in the near future. Of course, it's up to Richard. The params can be added in the command-line text file which is included in each package. I can guide you through it if you want.

With each crime and every kindness we birth our future.
Keith Myers (Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
Mike, thanks for your gracious offer of help. First question: can you use the beta apps here on Main, or just in the Beta project? IOW, does the Main scheduler know about these apps so it can send tasks to them correctly? Second question: do you really need BOTH OpenCL apps to run VLARs? Another question: do both of the apps use the same "opencl_nvidia_SoG" plan_class, since they have the same entries in their aistub files? How does that work with the scheduler? Is there a preference for which app to run for VLARs on GTX 970s, or does it really matter? Is the performance the same with either app? Can you post the best suggested parameters for VLARs for the MB_command_line files? Thanks in advance.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Mike (Joined: 17 Feb 01, Posts: 34350, Credit: 79,922,639, RAC: 80)
Yes, you can use the "beta" apps here, as I do. We just use them at Beta first to see if they work as expected. The scheduler doesn't care which app is in use as long as it knows the plan class. No, of course you only need one OpenCL app to run VLARs; for Nvidia it looks like SoG is the better option. Since one only uses one of the apps, the plan class can be the same. You can add the following command-line params to the cmdline.txt file:

-sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

Just make sure the first character is a space. Just for info: no VLARs are being sent to GPUs at this time.

With each crime and every kindness we birth our future.
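For example, if the SoG package's command-line file is named mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt (a guessed name; the exact file name varies by package, so check the readme), then after editing, the file's entire contents would be the single line below (shown indented here; the actual line starts with the space Mike mentions):

    -sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32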
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)
OK, regarding Raistmer's idea about shifting the CPU work to the GPUs: the app_info.xml he gave will work, but with a caveat. I am unable to limit the amount of work distributed to the different GPUs; it will load as many work units as you have CPU cores. The only difference was that it was trying to restrict how much CPU was being used. Example: I had it set for 2 work units per GPU with 4 GPUs, so I expected 8 total. Unfortunately 16 loaded on the cards: 8 with full CPU support and 8 with about 10% of a core. I tried changing the value in his command line; if I increased it to 3 per GPU (12 total), then of the 16 loaded, 12 now had full CPU support and 4 were down around 10% CPU. Unfortunately, it got very sluggish and eventually locked up, requiring a forced quit. I also tried limiting the number of instances in app_config.xml for the SoG plan class, but it didn't do anything. What finally worked was setting <project_max_concurrent> to 8, which pushed 8 tasks to "waiting to run" and left the original 8 running with full core support.
Time to complete a GUPPI VLAR: 21 minutes for 1 VLAR/GPU, 41 minutes for 2 VLAR/GPU. VLAR on CPU is around 53 min to 1 hr.
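For anyone wanting to reproduce this, a minimal sketch of the app_config.xml in question follows (the stock app name setiathome_v8 is assumed; with the anonymous-platform app_info.xml in place, only the project-wide cap had any effect in the setup described above):

    <app_config>
        <!-- the project-wide cap is what finally held the total at 8 -->
        <project_max_concurrent>8</project_max_concurrent>
        <app>
            <name>setiathome_v8</name>
            <!-- per-app limit: reportedly had no effect here, shown for completeness -->
            <max_concurrent>8</max_concurrent>
        </app>
    </app_config>

Place it in the project's directory and use BOINC Manager's "Read config files" to apply it without a restart.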
AMDave (Joined: 9 Mar 01, Posts: 234, Credit: 11,671,730, RAC: 0)
> What finally worked was setting <project_max_concurrent> to 8, which pushed 8 tasks to "waiting to run" and left the original 8 running with full core support.
What is the % of GPU usage for 1 VLAR/GPU? For 2 VLAR/GPU?