GBT ('guppi') .vlar tasks will be sent to GPUs, what do you think about this?

Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1785503 - Posted: 7 May 2016, 14:26:56 UTC - in response to Message 1785497.  

What is the % of GPU usage for 1 VLAR/GPU? for 2 VLAR/GPU?


94-96% GPU Usage for 1 VLAR/GPU

98-99% GPU Usage for 2 VLAR/GPU, but the system starts to become less responsive. Not a problem for a dedicated cruncher, but some people might get annoyed when trying to use the computer.
ID: 1785503
Profile tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1785575 - Posted: 7 May 2016, 18:07:24 UTC

I am running 2 guppi VLARs on the CPU of my Windows 10 PC, which has an NVIDIA graphics board. I have installed an AMD board in my main Linux box, and it is running both SETI CPU tasks and SETI GPU tasks, but so far no VLARs.
Tullio
ID: 1785575
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785580 - Posted: 7 May 2016, 18:25:12 UTC - in response to Message 1785575.  

I started the same test this morning, with a view to evaluating whether OpenCL (specifically MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe) is ready for prime time in the installer yet. It's running - deliberately - "as stock" on Beta host 23492. It's running 2-up, initially on the GTX 750 Ti (headless), but now on the GTX 970 which is also the display provider for my daily driver. The questions being (a) can I live with the display while it's working on guppi VLARs, and (b) what sort of efficiency (=speed) do I see at VLAR? So far, Murphy's server has dished out mostly Arecibo mid-ARs, and I've noticed occasional display lags - mostly when the screen attempts to display movement, rather than preventing typing as CUDA used to do at VLAR.

Mid ARs are running through at about 14 minutes the pair, so I'll start the VLAR test in earnest in a bit over an hour from now.

There's no reason not to do an installer refresh at some point, if only to strip out the v7 transition content after the last 168 WUs have cleared through. But I need to be sure that SoG is ready, and that BOINC handles the switch between CUDA and OpenCL plan classes transparently when people upgrade with tasks in flight. So, I'm not rushing into it.
ID: 1785580
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1785652 - Posted: 7 May 2016, 20:58:10 UTC - in response to Message 1785580.  
Last modified: 7 May 2016, 20:59:28 UTC

Thanks for the installer update, Richard. If I understand Mike's and Zalster's posts, it doesn't really matter at the moment whether the R3430 OpenCL SoG app is run on Main, because VLARs are still not being sent to Nvidia cards there. Only at Beta are VLARs being sent to Nvidia cards. Is that correct?

I did look at the Beta host you linked and it looks like it has processed some Guppi VLARs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1785652
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1785659 - Posted: 7 May 2016, 21:08:49 UTC - in response to Message 1785652.  

VLARs are not sent to GPUs on Main. You can, though, force CPU VLARs to move to the GPU with the app_info Raistmer posted.

Otherwise you can run them on Beta where they are sent to the GPU

I processed some here on Main yesterday but suspended them last night.

I'll probably test some more tonight again.
ID: 1785659
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34348
Credit: 79,922,639
RAC: 80
Germany
Message 1785664 - Posted: 7 May 2016, 21:17:35 UTC - in response to Message 1785580.  

I started the same test this morning, with a view to evaluating whether OpenCL (specifically MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe) is ready for prime time in the installer yet. It's running - deliberately - "as stock" on Beta host 23492. It's running 2-up, initially on the GTX 750 Ti (headless), but now on the GTX 970 which is also the display provider for my daily driver. The questions being (a) can I live with the display while it's working on guppi VLARs, and (b) what sort of efficiency (=speed) do I see at VLAR? So far, Murphy's server has dished out mostly Arecibo mid-ARs, and I've noticed occasional display lags - mostly when the screen attempts to display movement, rather than preventing typing as CUDA used to do at VLAR.

Mid ARs are running through at about 14 minutes the pair, so I'll start the VLAR test in earnest in a bit over an hour from now.

There's no reason not to do an installer refresh at some point, if only to strip out the v7 transition content after the last 168 WUs have cleared through. But I need to be sure that SoG is ready, and that BOINC handles the switch between CUDA and OpenCL plan classes transparently when people upgrade with tasks in flight. So, I'm not rushing into it.


So I'd better find some nice params for you NV guys soon.


With each crime and every kindness we birth our future.
ID: 1785664
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785669 - Posted: 7 May 2016, 21:47:15 UTC - in response to Message 1785652.  
Last modified: 7 May 2016, 22:04:15 UTC

I did look at the Beta host you linked and it looks like it has processed some Guppi VLARs.

Yes, it's done a few already, and another four to report any moment now. Run times vary from 30 minutes to 39 minutes - in other words, starting from more than double mid-AR and upwards from there. Two points observed so far: some bad lags, including (rarely) complete screen freezes for several seconds. And <fraction_done> is still out of synch with <progress>: <fraction_done> is what is displayed in BOINC Manager these days, and it's getting ahead of itself. So the visible progress meter implies a shorter runtime, and slows down as it progresses - that's the opposite of what the old CUDA apps used to display with VLARs, starting slowly and speeding up as they progressed. More observations tomorrow.

Yes, with no VLARs sent to NV at main at the moment, there's no rush for SoG in the installer - but we should get it polished ready for deployment if the need arises. Eric has said (recently - see Beta) that very little data is being recorded at Arecibo these days, so the day will come when it's Guppi or nothing. And with the current state of play, that's mostly VLAR - and no AP.

Edit - forgot to mention CPU usage. This is how BoincView sees my daily driver.



The middle column is 'CPU efficiency'. An ideal CPU application will show 1.0000 - full use of one core. An ideal GPU application will show 0.0000 - all work on GPU (not monitored by this app), no CPU used.
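
As a rough illustration of that column, here is a minimal Python sketch. It assumes, which the post does not confirm, that 'CPU efficiency' is simply CPU time divided by elapsed wall-clock time; the function name `cpu_efficiency` and the numbers are hypothetical, not BoincView's own code or data.

```python
def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Return CPU time as a fraction of wall-clock time (1.0 = one full core).

    Assumption: this ratio is what the 'CPU efficiency' column reports.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


# Illustrative numbers only, not taken from the screenshot.
cpu_app = cpu_efficiency(cpu_seconds=1800.0, elapsed_seconds=1800.0)  # ideal CPU app
gpu_app = cpu_efficiency(cpu_seconds=0.0, elapsed_seconds=1800.0)     # ideal GPU app
print(f"{cpu_app:.4f} {gpu_app:.4f}")
```

Under that assumption, a GPU task that needs a lot of CPU support would land somewhere between the two extremes.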

The two Beta project tasks are SoG on GTX 970. The two SETI@Home tasks are cuda50 on GTX 750Ti. Einstein@home is an OpenCL app running on intel GPU - very well behaved.

Numberfields is a CPU app, I think mostly integer. Two instances are just about nominal, but the third has triggered the yellow 'poor efficiency' warning, down below 25% of a core. On machines which don't also have 16 browser tabs open, I can run four of those plus four cuda tasks and not drop below 80%:


(well, maybe one core is a little lower)
ID: 1785669
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34348
Credit: 79,922,639
RAC: 80
Germany
Message 1785671 - Posted: 7 May 2016, 21:55:34 UTC - in response to Message 1785669.  

I did look at the Beta host you linked and it looks like it has processed some Guppi VLARs.

Yes, it's done a few already, and another four to report any moment now. Run times vary from 30 minutes to 39 minutes - in other words, starting from more than double mid-AR and upwards from there. Two points observed so far: some bad lags, including (rarely) complete screen freezes for several seconds. And <fraction_done> is still out of synch with <progress>: <fraction_done> is what is displayed in BOINC Manager these days, and it's getting ahead of itself. So the visible progress meter implies a shorter runtime, and slows down as it progresses - that's the opposite of what the old CUDA apps used to display with VLARs, starting slowly and speeding up as they progressed. More observations tomorrow.

Yes, with no VLARs sent to NV at main at the moment, there's no rush for SoG in the installer - but we should get it polished ready for deployment if the need arises. Eric has said (recently - see Beta) that very little data is being recorded at Arecibo these days, so the day will come when it's Guppi or nothing. And with the current state of play, that's mostly VLAR - and no AP.


ATM it's no VLARs for GPUs at all.
I emailed Eric about it a while ago but he seems to be too busy atm.


With each crime and every kindness we birth our future.
ID: 1785671
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785673 - Posted: 7 May 2016, 22:06:22 UTC - in response to Message 1785671.  

ATM it's no VLARs for GPUs at all.
I emailed Eric about it a while ago but he seems to be too busy atm.

Agreed, that's the policy for Main at the moment. That's why I'm running this test against Beta, where VLARs are allowed on NV Kepler GPUs and above.
ID: 1785673
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34348
Credit: 79,922,639
RAC: 80
Germany
Message 1785676 - Posted: 7 May 2016, 22:16:07 UTC - in response to Message 1785673.  

ATM it's no VLARs for GPUs at all.
I emailed Eric about it a while ago but he seems to be too busy atm.

Agreed, that's the policy for Main at the moment. That's why I'm running this test against Beta, where VLARs are allowed on NV Kepler GPUs and above.


Yes, but my first response was that AMDs should get VLARs.
So I'm certain it's still not easy to separate the two vendors.
At least the plan was different.


With each crime and every kindness we birth our future.
ID: 1785676
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785804 - Posted: 8 May 2016, 10:36:01 UTC

As a general observation, if my car was mis-firing this badly, I'd pull over at the next garage and ask them to check it out before I completed my journey. The screen just "feels wrong" when running VLARs, in a way which would make me worry about my safety in a car.

More significantly, I've just received my first error result - our old friend "finish file present too long". The timetable is:

11:03:17 (40088): called boinc_finish(0)
08/05/2016 11:03:30 | SETI@home Beta Test | [sched_op] Reason: Unrecoverable error for task 24mr10ac.7768.9486.6.40.124_1
08/05/2016 11:03:35 | SETI@home Beta Test | Computation for task 24mr10ac.7768.9486.6.40.124_1 finished

So it looks as if there was an 18-second gap between calling finish and the app quitting, with BOINC pulling the plug at 13 seconds.
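
The arithmetic behind those two figures can be checked with a short Python sketch. The three timestamps are the ones in the log excerpt above; the helper name `seconds_between` is only illustrative, and treating the "Computation ... finished" line as the moment the app quit follows the reading given in this post.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Seconds from start to end, both given as HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


finish_called = "11:03:17"   # app log: called boinc_finish(0)
error_logged = "11:03:30"    # client log: Unrecoverable error for task
task_finished = "11:03:35"   # client log: Computation for task ... finished

gap_to_error = seconds_between(finish_called, error_logged)
gap_to_exit = seconds_between(finish_called, task_finished)
print(gap_to_error, gap_to_exit)
```

The same helper can be reused on other tasks' timetables to see whether the delay between the boinc_finish call and the application exit follows a pattern.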

That was Beta task 23790517, a normal Arecibo AR=0.429234. It was sharing the GPU with Beta VLAR task 23789897: I hope the VLAR lag didn't hold up the BOINC finish process.

BOINC on this machine is the standard v7.6.22 (Windows 7/64) - I think that one has the shortest finish file tolerance of all. I'll check a few more timetables (delay between boinc_finish call and application exit), and see if I can find a pattern.
ID: 1785804
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34348
Credit: 79,922,639
RAC: 80
Germany
Message 1785810 - Posted: 8 May 2016, 11:29:28 UTC - in response to Message 1785804.  

As a general observation, if my car was mis-firing this badly, I'd pull over at the next garage and ask them to check it out before I completed my journey. The screen just "feels wrong" when running VLARs, in a way which would make me worry about my safety in a car.

More significantly, I've just received my first error result - our old friend "finish file present too long". The timetable is:

11:03:17 (40088): called boinc_finish(0)
08/05/2016 11:03:30 | SETI@home Beta Test | [sched_op] Reason: Unrecoverable error for task 24mr10ac.7768.9486.6.40.124_1
08/05/2016 11:03:35 | SETI@home Beta Test | Computation for task 24mr10ac.7768.9486.6.40.124_1 finished

So it looks as if there was an 18-second gap between calling finish and the app quitting, with BOINC pulling the plug at 13 seconds.

That was Beta task 23790517, a normal Arecibo AR=0.429234. It was sharing the GPU with Beta VLAR task 23789897: I hope the VLAR lag didn't hold up the BOINC finish process.

BOINC on this machine is the standard v7.6.22 (Windows 7/64) - I think that one has the shortest finish file tolerance of all. I'll check a few more timetables (delay between boinc_finish call and application exit), and see if I can find a pattern.


Without increasing single buffer size it's no surprise to me.

Maybe it would be a good idea to increase the default params for NV builds.
At least single buffer size and period_iterations_num.


With each crime and every kindness we birth our future.
ID: 1785810
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785815 - Posted: 8 May 2016, 12:26:08 UTC - in response to Message 1785810.  

Without increasing single buffer size it's no surprise to me.

Maybe it would be a good idea to increase the default params for NV builds.
At least single buffer size and period_iterations_num.

If a single value can be found which is suitable for all cards [*], then absolutely, yes. If it has to be variable, then we need some sort of automatic tuning to pick the most appropriate values to suit the card the apps find themselves running on.

[*] all cards the SoG app is appropriate for, that is. Do I remember it needs plenty of VRAM? We need an automatic 'minimum card' rule for stock deployment, and manual minimum card advice for the installer.
ID: 1785815
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34348
Credit: 79,922,639
RAC: 80
Germany
Message 1785821 - Posted: 8 May 2016, 12:55:55 UTC - in response to Message 1785815.  
Last modified: 8 May 2016, 12:58:52 UTC

Without increasing single buffer size it's no surprise to me.

Maybe it would be a good idea to increase the default params for NV builds.
At least single buffer size and period_iterations_num.

If a single value can be found which is suitable for all cards [*], then absolutely, yes. If it has to be variable, then we need some sort of automatic tuning to pick the most appropriate values to suit the card the apps find themselves running on.

[*] all cards the SoG app is appropriate for, that is. Do I remember it needs plenty of VRAM? We need an automatic 'minimum card' rule for stock deployment, and manual minimum card advice for the installer.


That's what I will try to find out next week.
I will run some tests on a 640 and 730.
I have some values in mind; if the cards can cope with them, we are another step in the right direction.

Most cards now have at least 1GB VRAM.
This would still be enough to run 2 instances.
So this shouldn't be a problem.
I don't think one would run 2 instances on a 720.


With each crime and every kindness we birth our future.
ID: 1785821
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785949 - Posted: 8 May 2016, 21:31:00 UTC - in response to Message 1785815.  
Last modified: 8 May 2016, 21:31:39 UTC

Without increasing single buffer size it's no surprise to me.

Maybe it would be a good idea to increase the default params for NV builds.
At least single buffer size and period_iterations_num.

If a single value can be found which is suitable for all cards [*], then absolutely, yes. If it has to be variable, then we need some sort of automatic tuning to pick the most appropriate values to suit the card the apps find themselves running on.

[*] all cards the SoG app is appropriate for, that is. Do I remember it needs plenty of VRAM? We need an automatic 'minimum card' rule for stock deployment, and manual minimum card advice for the installer.


The app checks available GPU RAM. If it is low, a lower-memory path is used. If possible, the app increases RAM usage for a speedup.

Also, I don't see how this is connected with the computation error from the BOINC API. GPU computations were finished at that point, whatever the buffer size was.
ID: 1785949
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785954 - Posted: 8 May 2016, 21:47:29 UTC - in response to Message 1785949.  

Without increasing single buffer size it's no surprise to me.

Also, I don't see how this is connected with the computation error from the BOINC API. GPU computations were finished at that point, whatever the buffer size was.

That part of Mike's reply was probably addressed to my opening remark:

As a general observation, if my car was mis-firing this badly, I'd pull over at the next garage and ask them to check it out before I completed my journey. The screen just "feels wrong" when running VLARs, in a way which would make me worry about my safety in a car.

The apps and screen are running on

OpenCL: NVIDIA GPU 0: GeForce GTX 970 (driver version 350.12, device version OpenCL 1.2 CUDA, 4096MB, 3903MB available, 4087 GFLOPS peak)
ID: 1785954
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785957 - Posted: 8 May 2016, 21:59:23 UTC - in response to Message 1785954.  


OpenCL: NVIDIA GPU 0: GeForce GTX 970 (driver version 350.12, device version OpenCL 1.2 CUDA, 4096MB, 3903MB available, 4087 GFLOPS peak)


WU true angle range is :  0.429234
Used GPU device parameters are:
	Number of compute units: 13
	Single buffer allocation size: 128MB
	Total device global memory: 4096MB
	max WG size: 1024
	local mem type: Real
	FERMI path used: yes
	LotOfMem path: yes
	LowPerformanceGPU path: no
period_iterations_num=50


I suppose the VLAR ran with the same defaults?
Well, as I wrote a few posts earlier:
Instead of turning this thread into another place to rant, I would propose that owners of high-end GPU cards explore the quite large parameter space of the current OpenCL app more deeply and report back options that could speed up VLAR processing.

Such a set of params can be made the new defaults for high-end devices. If we could decrease the performance drop on VLAR, this would be the most appropriate solution to the issue in the thread topic.


And:
(and very first attempt should be to add -sbs 512 to tuning line).
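
When reporting back, it helps to confirm which values a task really ran with. A minimal Python sketch for that, assuming the parameter block in a task's stderr looks like the one pasted above: `parse_params` and `diff_params` are hypothetical helper names, and the "after" block is made up for illustration on the assumption that -sbs 512 would show up as a 512MB single buffer allocation size.

```python
def parse_params(stderr_text: str) -> dict:
    """Collect 'name: value' and 'name=value' lines into a dict."""
    params = {}
    for raw in stderr_text.splitlines():
        line = raw.strip()
        for sep in (":", "="):
            if sep in line:
                key, value = (part.strip() for part in line.split(sep, 1))
                if value:  # skip headings such as 'Used GPU device parameters are:'
                    params[key] = value
                break
    return params


def diff_params(before: dict, after: dict) -> dict:
    """Return {name: (before, after)} for every value that differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


# Parameter block as posted above (defaults on the GTX 970).
DEFAULTS = """
Used GPU device parameters are:
    Number of compute units: 13
    Single buffer allocation size: 128MB
    Total device global memory: 4096MB
    max WG size: 1024
period_iterations_num=50
"""

# Hypothetical 'after' block, NOT real output: assumes -sbs 512 raises the buffer size.
TUNED = DEFAULTS.replace("128MB", "512MB")

changes = diff_params(parse_params(DEFAULTS), parse_params(TUNED))
print(changes)
```

Posting the diff alongside run times makes it easier to see which option produced which speedup.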


It's still a way to go.
ID: 1785957
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785958 - Posted: 8 May 2016, 22:00:50 UTC - in response to Message 1785957.  

It's still a way to go.

Before the current app can be considered for release to run as stock.
ID: 1785958
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1785960 - Posted: 8 May 2016, 22:11:48 UTC - in response to Message 1785958.  

It's still a way to go.

Before the current app can be considered for release to run as stock.


Before VLAR could be enabled on main, perhaps? Don't mix things up.
ID: 1785960
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1785962 - Posted: 8 May 2016, 22:27:55 UTC - in response to Message 1785960.  

It's still a way to go.

Before the current app can be considered for release to run as stock.

Before VLAR could be enabled on main, perhaps? Don't mix things up.

Yes indeed, and that keeps us on topic for this thread. I don't think that GBT VLARs should be sent to (NVidia) GPUs until we have an application ready to handle them, and I don't think we have such an application - either CUDA or OpenCL - yet.
ID: 1785962

 