GBT ('guppi') .vlar tasks will be sent to GPUs: what do you think about this?


Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1784929 - Posted: 5 May 2016, 10:35:08 UTC

Here in the SETI-Beta thread I wrote what I think about this.

I thought a larger discussion (here) would be helpful.


I made my decision/statement...
Either...

  • an option in the project prefs to check/uncheck 'GBT ('guppi') .vlar tasks to GPU' (so each member can decide whether to reduce or preserve the performance of the PC),
  • or I use a tool to move GBT ('guppi') .vlar tasks from GPU to CPU (and screw up CreditNew),
  • or I search for a new primary project,
  • or I switch off my PCs.



What is your opinion on the title/topic of this thread?

Thanks.


ID: 1784929
Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3789
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1784930 - Posted: 5 May 2016, 11:10:19 UTC
Last modified: 5 May 2016, 11:12:57 UTC

Repost of what I wrote here:

I proposed in the Breakthrough Listen News thread that a checkbox be added to our project preferences akin to "Allow VLAR work on GPU", with an appropriate caveat, i.e. that if run on CUDA these tasks may complete slowly for less credit, cause machine slowness/lockup or work unit failure, etc.

The box would be initially off, so no one would get them and risk issue(s) unless they chose to do so. Einstein@Home has similar controls with similar caveats in their project preferences page for setting the Count parameter to allow multiple concurrent work units on GPU (which has the same risks of instability plus added bonus of hardware failure risk if the GPU overheats) and they seem to do just fine with the disclaimer, so no reason it wouldn't work here. Volunteers with mixed work and dedicated cruncher machines could set home/work/school profiles and put the machine types in them so computers they actually use wouldn't be slowed.

All I can suggest is that, if you would like this to be implemented, you post in that thread. If enough of us ask, it may be.


If there is a control for it, from a support and user-friendliness standpoint, it should definitely be implemented as an opt-in rather than an opt-out.
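To make the opt-in idea concrete, here is a minimal sketch of how a default-off preference with per-venue overrides could behave. Everything in it is hypothetical: `allow_vlar_on_gpu`, `VenuePrefs`, `AccountPrefs` and `may_send_vlar_to_gpu` are names made up for illustration and are not actual BOINC or SETI@home scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class VenuePrefs:
    # Hypothetical preference; defaults to False so the feature is opt-in.
    allow_vlar_on_gpu: bool = False


@dataclass
class AccountPrefs:
    # Default preferences plus optional overrides per venue (home/work/school).
    default: VenuePrefs = field(default_factory=VenuePrefs)
    venues: dict = field(default_factory=dict)

    def for_venue(self, venue: str) -> VenuePrefs:
        # Fall back to the default preferences for venues without an override.
        return self.venues.get(venue, self.default)


def may_send_vlar_to_gpu(prefs: AccountPrefs, venue: str) -> bool:
    """Return True only if the volunteer explicitly opted in for this venue."""
    return prefs.for_venue(venue).allow_vlar_on_gpu


prefs = AccountPrefs(venues={"work": VenuePrefs(allow_vlar_on_gpu=True)})
print(may_send_vlar_to_gpu(prefs, "home"))  # everyday PC: not opted in
print(may_send_vlar_to_gpu(prefs, "work"))  # dedicated cruncher: opted in
```

With this shape, a volunteer who never touches the preference gets no VLAR work on GPU, while a dedicated cruncher placed in its own venue can be opted in separately.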
ID: 1784930
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1784947 - Posted: 5 May 2016, 13:46:43 UTC

AFAIK there is currently no technical possibility to implement VLAR as a "checkbox", so no opt-in or opt-out is possible.
With the flow of GBT data there can be situations when a GPU has no SETI work at all for prolonged periods.
That is definitely a waste of resources for the project. So the local solution could be either to ensure availability of non-VLAR work for GPUs or to enable VLAR on GPUs.

I'm not sure that a VLAR/non-VLAR mix can be ensured in unattended mode, and requiring manpower for this would waste an even more precious project resource.

A third solution could be to arbitrarily increase the "payment" in credits for VLAR, so all those who watch RAC too closely would be pleased. This would also emphasize the real importance of the dedicated observations that VLAR tasks represent.
Unfortunately, this solution is unlikely to happen because of BOINC vs SETI cobblestone politics (any other project could do it with ease, IMO).

So enabling VLAR for fast GPUs that can handle it is the best realistic project-wise solution, IMO.
ID: 1784947
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34330
Credit: 79,922,639
RAC: 80
Germany
Message 1784971 - Posted: 5 May 2016, 15:48:52 UTC
Last modified: 5 May 2016, 15:49:39 UTC

AFAIK there is no technical possibility currently to implement VLAR as "checkbox". So no opt-in or opt-out possible.


But what about GBT tasks in particular?
That would be much easier, IMO.


With each crime and every kindness we birth our future.
ID: 1784971
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785019 - Posted: 5 May 2016, 19:22:13 UTC - in response to Message 1784971.  

AFAIK there is no technical possibility currently to implement VLAR as "checkbox". So no opt-in or opt-out possible.


But GBT tasks in particular.
Would be much easier IMO.


They are sent not as GBT tasks but as ordinary MB v8 tasks, so no.
We see some difference in how the server statistics present them, but not on the client side. In the client they are still the same *_v8 tasks.
ID: 1785019
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34330
Credit: 79,922,639
RAC: 80
Germany
Message 1785029 - Posted: 5 May 2016, 20:17:12 UTC - in response to Message 1785019.  

AFAIK there is no technical possibility currently to implement VLAR as "checkbox". So no opt-in or opt-out possible.


But GBT tasks in particular.
Would be much easier IMO.


They go not as GBT but as usual MB v8 tasks so no.
We see some difference in server statistics representation but not in client part. in client it's still same *_v8 tasks.


Thanks for the clarification.


With each crime and every kindness we birth our future.
ID: 1785029
Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3789
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1785064 - Posted: 5 May 2016, 22:51:02 UTC - in response to Message 1784947.  

AFAIK there is no technical possibility currently to implement VLAR as "checkbox". So no opt-in or opt-out possible.


Is there any detailed reason why it wouldn't be possible?
ID: 1785064
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14666
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785069 - Posted: 5 May 2016, 23:07:58 UTC - in response to Message 1785064.  

AFAIK there is no technical possibility currently to implement VLAR as "checkbox". So no opt-in or opt-out possible.

Is there any detailed reason why it wouldn't be possible?

No hard, technical, reason that I know of, but I can think of a couple of reasons which have come up in the past, and would be relevant again.

1) Time, manpower, priorities - other work to do. It would involve writing new code from scratch, and people need to be strongly motivated before they mess with a working (from their point of view) system.

2) Maintainability. There is a standard BOINC server component called the scheduler. This proposal would require making a separate, "special for SETI", tweaked version. And then remembering to re-apply the changes if the underlying, standard, code needs to be changed in the future.
ID: 1785069
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785118 - Posted: 6 May 2016, 5:13:30 UTC - in response to Message 1785069.  
Last modified: 6 May 2016, 5:13:52 UTC

AFAIK there is no technical possibility currently to implement VLAR as "checkbox". So no opt-in or opt-out possible.

Is there any detailed reason why it wouldn't be possible?

No hard, technical, reason that I know of, but I can think of a couple of reasons which have come up in the past, and would be relevant again.

1) Time, manpower, priorities - other work to do. It would involve writing new code from scratch, and people need to be strongly motivated before they mess with a working (from their point of view) system.

2) Maintainability. There is a standard BOINC server component called the scheduler. This proposal would require making a separate, "special for SETI", tweaked version. And then remembering to re-apply the changes if the underlying, standard, code needs to be changed in the future.


Yep. All of this could be shortened to "no technical possibility", with the added qualifier "while staying within the boundaries of the current BOINC infrastructure implementation".
ID: 1785118
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785121 - Posted: 6 May 2016, 5:34:24 UTC
Last modified: 6 May 2016, 5:36:50 UTC

Instead of turning this thread into another rant, I would propose that owners of high-end GPU cards explore more deeply the quite large parameter space of the current OpenCL app and report back options that could speed up VLAR processing.

Such a set of parameters could be made the new defaults for high-end devices. If we could reduce the performance drop on VLAR, that would be the most appropriate solution to the issue in this thread's topic.

VLAR tasks have an increased share of PulseFind searches on the longest time arrays (lower FFTs), where a lack of parallelization causes the performance drop. That's why high-end devices with many CUs see a big drop, while owners of low/mid-range devices don't see such a big slowdown.

So I would start tuning with these parameters:

-period_iterations_num N : Splits a single PulseFind kernel call into N calls for the longest PulseFind calls. Can be used to reduce GUI lags or to prevent driver restarts. Can affect performance. Experimentation required. Default value for v6/v7/v8 tasks is N=20. N should be a positive integer.

-pref_wg_size N : Sets the preferred workgroup size for PulseFind kernels. Should be a multiple of the wave size (32 for nVidia, 64 for ATi) for better performance and should not exceed the maximum possible WG size for the particular device (256 for ATi and Intel, less than 2048 for NV, depending on the CC of the device).

-pref_wg_num_per_cu N : Sets the preferred number of workgroups per compute unit. Currently used only in PulseFind kernels.

-sbs N : Sets the maximum single buffer size for GPU memory allocations. N should be a positive integer and means the size in Mbytes. Can affect performance and the total memory requirements for the application to run. Experimentation required.

(and the very first attempt should be to add -sbs 512 to the tuning line).

Also, new generations of GPUs are able to execute workgroups from different kernels in parallel. It is not clear, though, how this ability scales to workgroups from different kernels launched in different contexts/processes.
But it is worth trying to run a few VLAR tasks per device to see if this reduces the VLAR performance drop.
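For anyone who wants to explore that parameter space systematically, the following is a minimal sketch that enumerates candidate tuning lines. The flag names and the wave-size and maximum-WG-size constraints come from the descriptions above; the search ranges and the helper names `valid_wg_sizes` and `candidate_lines` are assumptions made up for illustration, not recommended settings.

```python
from itertools import product

# Wave sizes quoted in the parameter description above.
WAVE_SIZE = {"nvidia": 32, "ati": 64}


def valid_wg_sizes(vendor: str, max_wg_size: int) -> list:
    """Workgroup sizes that are multiples of the wave size and <= max_wg_size."""
    wave = WAVE_SIZE[vendor]
    return list(range(wave, max_wg_size + 1, wave))


def candidate_lines(vendor, max_wg_size, period_iterations, sbs_values):
    """Yield one tuning line per parameter combination."""
    wg_sizes = valid_wg_sizes(vendor, max_wg_size)
    for n, wg, sbs in product(period_iterations, wg_sizes, sbs_values):
        yield f"-sbs {sbs} -period_iterations_num {n} -pref_wg_size {wg}"


# Hypothetical search ranges; only -sbs 512, the default N=20 and the
# ATi limit of 256 are taken from the post.
lines = list(candidate_lines("ati", 256,
                             period_iterations=[20, 40, 80],
                             sbs_values=[256, 512]))
for line in lines:
    print(line)
```

Each printed line can then be tried on the same VLAR task and the run times compared, changing one parameter at a time where possible.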
ID: 1785121
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1785123 - Posted: 6 May 2016, 5:49:09 UTC - in response to Message 1785121.  

Instead of turning this thread in another point of rant I would propose for high-end GPU cards owners to more deeply explore quite a big parameter space of current OpenCL app and report back options that could speedup VLAR processing.


I'm sure this is quite a good suggestion .... however, aren't you referring to parameters that are useful on a BETA application??? Of what use do we have here on Main with Nvidia cards?? I would be most amenable to trying to run VLAR's on my Nvidia cards IF we had the Beta OpenCL app here on Main and had a fully implemented Lunatics installer. When is that happening??
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1785123
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785126 - Posted: 6 May 2016, 6:15:27 UTC - in response to Message 1785123.  

Instead of turning this thread in another point of rant I would propose for high-end GPU cards owners to more deeply explore quite a big parameter space of current OpenCL app and report back options that could speedup VLAR processing.


I'm sure this is quite a good suggestion .... however, aren't you referring to parameters that are useful on a BETA application??? Of what use do we have here on Main with Nvidia cards?? I would be most amenable to trying to run VLAR's on my Nvidia cards IF we had the Beta OpenCL app here on Main and had a fully implemented Lunatics installer. When is that happening??

A change in scheduling policy on Main will hardly be made _before_ the release of the current beta apps (actually they should have been released long ago IMO, but some circumstances prevented doing it in time).

Nevertheless, all binaries are available as separate packs from my cloud storage space: https://cloud.mail.ru/public/DMkN/x4BRCYuAV

As for when they will be added to the Lunatics installer, I'll leave that question for Richard to answer.
ID: 1785126
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1785148 - Posted: 6 May 2016, 7:35:29 UTC - in response to Message 1785126.  

Thanks for the link to the apps, Raistmer. I have been trying to follow along with the gurus on Beta about these new OpenCL and SoG apps, but it is mostly just going over my head about how to implement them and even which app is the most appropriate to run. By looking at the file names, it seems both are OpenCL yet different executables with the same plan_class. Huh?? I know just enough about how to edit my app_info files to guarantee dumping all my current tasks. I've done it too many times. Thus, without a proven Lunatics installer, I don't believe I'm competent to try using your linked files. I looked at both apps' .aistub files and wonder how both apps use the same SoG plan_class. Which app is the correct one to install to run VLARs on GTX970s? Not brave enough today and I think I will wait till the apps officially make it to Main, the scheduler code gets updated to handle them and Richard publishes the next, latest and greatest Lunatics installer.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1785148
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13820
Credit: 208,696,464
RAC: 304
Australia
Message 1785154 - Posted: 6 May 2016, 7:56:06 UTC - in response to Message 1785148.  

Isn't the determination of what is or isn't a VLAR done during splitting?
Would it be possible (without major work) to change the threshold value for what constitutes a VLAR WU, based on the data source?
I.e. original data retains the current threshold value, while GBT data gets a different threshold value, so fewer of the WUs currently being split are branded as VLAR.
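As a rough illustration of that idea, here is a sketch of a per-source threshold lookup. It assumes the VLAR decision is a simple comparison of a work unit's angle range against a threshold; the threshold values, the source labels and the function name `is_vlar` are placeholders made up for this example, not the project's real splitter settings.

```python
# Hypothetical angle-range thresholds per data source. The numbers are
# placeholders for illustration only, not the values the splitter uses.
VLAR_THRESHOLD = {"original": 0.12, "gbt": 0.05}


def is_vlar(angle_range: float, source: str) -> bool:
    """Brand a work unit as VLAR if its angle range is below the source's threshold."""
    return angle_range < VLAR_THRESHOLD[source]


# The same angle range is branded differently depending on the data source.
print(is_vlar(0.08, "original"))
print(is_vlar(0.08, "gbt"))
```

How much this would help depends entirely on where the angle ranges of the GBT work units actually fall relative to whatever threshold is chosen.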
Grant
Darwin NT
ID: 1785154
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22384
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1785161 - Posted: 6 May 2016, 8:40:48 UTC

Sounds a good idea, but the majority of the guppi being split are down in the 0.01 region. This is down to the GBT being used to look intensively at one point in space, rather than scanning a region.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1785161
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34330
Credit: 79,922,639
RAC: 80
Germany
Message 1785182 - Posted: 6 May 2016, 12:58:50 UTC - in response to Message 1785148.  

Thanks for the link to the apps, Raistmer. I have been trying to follow along with the gurus on Beta about these new OpenCL and SoG apps, but it is mostly just going over my head about how to implement them and even which app is the most appropriate to run. By looking at the file names, it seems both are OpenCL yet different executables with the same plan_class. Huh?? I know just enough about how to edit my app_info files to guarantee dumping all my current tasks. I've done it too many times. Thus, without a proven Lunatics installer, I don't believe I'm competent to try using your linked files. I looked at both apps .aistub file and and wonder how both apps use the same SoG plan_class. Which app is the correct one to install to run VLAR's on GTX970's? Not brave enough today and I think I will wait till the apps officially make it to Main, the scheduler code gets updated to handle them and Richard publishes the next, latest and greatest Lunatics installer.


I'm just part of the Installer crew, but I don't think we will redo the installer in the near future.
Of course it's up to Richard.

The params can be added to the command line text file which is included in each package.
I can guide you through it if you want.


With each crime and every kindness we birth our future.
ID: 1785182
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1785220 - Posted: 6 May 2016, 15:43:03 UTC - in response to Message 1785182.  

Mike, thanks for your gracious offer of help. First question, please.

Can you use the beta apps here on Main or just in the Beta project? IOW, does the Main scheduler know about these apps to send tasks to correctly?

Second question, do you really need BOTH OpenCL apps to run VLAR's?

Another question is whether both of the apps use the same "opencl_nvidia_SoG" plan_class as they have the same entries in their aistub files? How does that work with the scheduler?

Is there a preference about which app to run for VLARs on GTX970's or does it really matter? Is the performance the same with either app?

Can you post the best suggested parameters for VLARs for the MB_command_line files?

Thanks in advance.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1785220
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34330
Credit: 79,922,639
RAC: 80
Germany
Message 1785297 - Posted: 6 May 2016, 21:23:13 UTC
Last modified: 6 May 2016, 21:29:14 UTC

Yes, you can use the "beta" apps here, as I do.
We just use them at Beta first to see if they work as expected.
The scheduler doesn't care which app is in use as long as it knows the plan class.

No, of course you only need one OpenCL app to run VLARs.
For Nvidia cards it looks like SoG is the better option.

Since one only uses one of the apps, the plan class can be the same.

You can add the following command line params to the cmdline.txt file.

-sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

Just make sure the first character is a space.
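A small sketch of how that file content could be prepared so the leading space is never forgotten. The parameter string is the one posted above; the helper names `command_line_text` and `write_command_line` are made up for illustration, and the actual file name and location depend on the app package you installed.

```python
from pathlib import Path

# Tuning parameters exactly as posted above.
TUNING = (
    "-sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 "
    "-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
    "-oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32"
)


def command_line_text(params: str) -> str:
    """Return the parameters on one line, starting with exactly one space."""
    return " " + " ".join(params.split())


def write_command_line(path: Path, params: str) -> None:
    # Overwrites the file; keep a backup of the original if in doubt.
    path.write_text(command_line_text(params), encoding="ascii")


if __name__ == "__main__":
    # Only prints the text; call write_command_line() with the path of the
    # command line text file from your package to actually write it.
    print(repr(command_line_text(TUNING)))
```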

Just for info:

No VLARs are being sent to GPUs at this time.


With each crime and every kindness we birth our future.
ID: 1785297
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1785458 - Posted: 7 May 2016, 4:39:30 UTC - in response to Message 1785297.  

OK, I gave Raistmer's idea about shifting the CPU work to the GPUs a try.

The app_info.xml he gave will work but with a caveat.

I am unable to limit the amount of work distributed to the different GPUs. It will load as many work units as CPU cores you have.

The only difference was, it was trying to restrict how much CPU was being used.

For example, I had it set for 2 work units per GPU on 4 GPUs, so I expected 8 in total. Unfortunately 16 loaded on the cards, 8 with full CPU support and 8 with about 10% of a core.

I tried changing the value in his command line, if I increased to 3 per GPU and 12 total then of the 16 loaded 12 now had full CPU support and 4 were down around 10% CPU. Unfortunately, it got very sluggish and eventually locked up requiring a forced quit.

I also tried limiting the number of instances in the app_config.xml for SoG plan class but it didn't do anything.

What finally worked was setting <project_max_concurrent> to 8; that pushed 8 tasks to "waiting to run", leaving the original 8 running with full core support.
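For reference, a minimal sketch that generates an app_config.xml containing only that limit. The `<project_max_concurrent>` element is the one named above and `<app_config>` is the root element of BOINC's app_config.xml format; the helper name `app_config_xml` is made up here, and any other settings already present in an existing file would need to be merged in by hand.

```python
import xml.etree.ElementTree as ET


def app_config_xml(max_concurrent: int) -> str:
    """Build a minimal app_config.xml limiting concurrently running tasks."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# 2 tasks per GPU on 4 GPUs, as in the setup described above.
print(app_config_xml(2 * 4))
```

The resulting text goes into app_config.xml in the project's folder inside the BOINC data directory, after which the client has to re-read its config files (or be restarted) for the limit to apply.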

Time to complete a GUPPI vlar is 21 minutes for 1 VLAR/GPU

41 minutes for 2 VLAR/GPU

VLAR on CPU is around 53 min-1 hr.
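A quick way to compare those timings is to convert them to tasks per hour. This sketch uses only the run times quoted above; the helper name `tasks_per_hour` is made up for illustration, and the figures are per GPU (or per CPU core) on this one machine.

```python
def tasks_per_hour(minutes_per_batch: float, tasks_per_batch: int = 1) -> float:
    """Throughput when tasks_per_batch tasks finish together every minutes_per_batch."""
    return tasks_per_batch * 60.0 / minutes_per_batch


one_per_gpu = tasks_per_hour(21, 1)   # 1 VLAR/GPU, 21 minutes
two_per_gpu = tasks_per_hour(41, 2)   # 2 VLAR/GPU, 41 minutes for both
cpu_fast = tasks_per_hour(53)         # one CPU core, best case
cpu_slow = tasks_per_hour(60)         # one CPU core, worst case
gain = (two_per_gpu / one_per_gpu - 1) * 100

print(f"1 VLAR/GPU: {one_per_gpu:.2f} tasks/hour")
print(f"2 VLAR/GPU: {two_per_gpu:.2f} tasks/hour")
print(f"gain from running 2 at once: {gain:.1f}%")
print(f"one CPU core: {cpu_slow:.2f} to {cpu_fast:.2f} tasks/hour")
```

On these numbers, running two VLARs per GPU raises throughput by only about 2.4%, so most of the GPU's advantage over a CPU core is already there at one task per GPU. Note that each running GPU task was also reported to occupy a full CPU core.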
ID: 1785458
AMDave
Volunteer tester
Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1785497 - Posted: 7 May 2016, 14:15:39 UTC - in response to Message 1785458.  

What finally worked was placing a <project_max_concurrent > to 8 and that pushed 8 to waiting to run and leaving the original 8 running with full core support.

Time to complete a GUPPI vlar is 21 minutes for 1 VLAR/GPU

41 minutes for 2 VLAR/GPU

VLAR on CPU is around 53 min-1 hr.

What is the % of GPU usage for 1 VLAR/GPU? for 2 VLAR/GPU?
ID: 1785497