OpenCL NV MultiBeam v8 SoG edition for Windows



Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779314 - Posted: 15 Apr 2016, 15:10:15 UTC
Last modified: 15 Apr 2016, 15:13:51 UTC

I was observing the SoG tasks and noticed they start off fast, but when they hit 70% complete they start to crawl, dragging out the completion.

example...

6 minutes into computation they're 70% done; then they slow down and take another 6 minutes to finish the last 30%.

I'm wondering if this is the fix Raistmer came up with to prevent 100% CPU usage when nearing the completion of the work unit.
ID: 1779314 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14474
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1779318 - Posted: 15 Apr 2016, 15:27:20 UTC - in response to Message 1779314.  

Which precise SoG version was this, please? If it was for an NV card, please cite the r number and I'll check the actual <frac_done> values with my modded client - it was a progress reporting problem last time.
ID: 1779318 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779319 - Posted: 15 Apr 2016, 15:33:29 UTC - in response to Message 1779318.  

I'm using the last one Raistmer had on his dropbox

Mb8_win_x86_SSE3_OpenCl_NV_r3430_SoG.7z
ID: 1779319 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14474
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1779323 - Posted: 15 Apr 2016, 15:59:46 UTC - in response to Message 1779319.  
Last modified: 15 Apr 2016, 16:53:14 UTC

OK, got it, thanks. Did you happen to notice any task details - what AR, etc.? I hooked it up to Beta and got guppi_VLAR as the first fetch.

It does seem to be an issue related to the one we were discussing at Beta - but at VLAR, perhaps not so drastic.

wu_name: blc3_2bit_guppi_57451_20612_HIP62472_0007.12525.831.18.21.146.vlar
WU true angle range is :  0.008175

            <prog>        <fraction_done>
17:13:25
17:14:25    0.01748931    0.043302
17:15:25    0.04244621    0.098808
17:16:26    0.06566018    0.159925
17:17:27    0.09050095    0.215225
17:18:32    0.11596886    0.277116
17:19:36    0.14290232    0.341525
17:20:36    0.16774924    0.396144
17:21:43    0.18374324    0.442169
17:22:50    0.20764196    0.497730
17:23:56    0.23405194    0.559920
17:24:59    0.26072407    0.622799
17:26:03    0.28700744    0.684736
17:27:04    0.32917878    0.704677
17:28:05    0.37018072    0.726249
17:29:06    0.41237318    0.745490
17:30:07    0.45388002    0.766894
17:31:08    0.49561945    0.785576
17:32:08    0.53885943    0.807092
17:33:08    0.58042432    0.826390
17:34:10    0.62295983    0.845603
17:35:10    0.66387519    0.866467
17:36:11    0.70599017    0.885643
17:37:12    0.74688920    0.906469

The SoG application keeps two different counters for how far it's got: "Progress" and "Fraction done". We're seeing fraction done in the progress column in BOINC Manager. You'd expect them to be the same, but clearly either fraction done is counting up too fast in the early stages, or progress is counting more slowly (which is what we're used to from the CUDA apps, which - in reporting terms - start slow and speed up). I don't think there's any definite answer for which is right - perhaps we need to arrange another developer brainstorming party, if that desert island is still free.
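
The divergence is easy to quantify from the log. A quick sketch using four of the sampled values transcribed from the table above (minutes are counted from the first reading):

```python
# Per-minute rates of the two SoG progress counters, using samples
# transcribed from the logged table above (minute 1 = 17:14:25).
samples = [
    # (minute, prog, fraction_done)
    (1,  0.01748931, 0.043302),
    (12, 0.28700744, 0.684736),   # last sample before the ~70% knee
    (13, 0.32917878, 0.704677),   # first sample after the knee
    (24, 0.74688920, 0.906469),
]

def rate(a, b, col):
    """Average per-minute increase of column `col` between samples a and b."""
    return (b[col] - a[col]) / (b[0] - a[0])

early_fd, late_fd = rate(samples[0], samples[1], 2), rate(samples[2], samples[3], 2)
early_pg, late_pg = rate(samples[0], samples[1], 1), rate(samples[2], samples[3], 1)

# fraction_done rises roughly 3x slower after the knee, while prog actually
# speeds up - consistent with the two counters disagreeing, not with the
# computation itself stalling.
print(f"fraction_done: {early_fd:.4f}/min -> {late_fd:.4f}/min")
print(f"prog:          {early_pg:.4f}/min -> {late_pg:.4f}/min")
```

So by these numbers the task isn't slowing down at all; fraction_done was simply optimistic in the first half.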
ID: 1779323 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779338 - Posted: 15 Apr 2016, 17:08:26 UTC - in response to Message 1779323.  
Last modified: 15 Apr 2016, 17:23:25 UTC

Looking at all the SoG tasks I've run this morning, they are all 0.008175.

I'll look further back to last night.

Edit..

If it is just the progress we see versus the fraction done, then I guess there is nothing we can do.

I do like that they don't suddenly jump up to 100% of CPU like they were doing.

Still, 13 minutes on the GPU is much better than 55 minutes on the CPU.
ID: 1779338 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14474
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1779343 - Posted: 15 Apr 2016, 17:34:10 UTC

Now I'm back upstairs, I can see that SoG is still using a heck of a lot of CPU: current display of "CPU efficiency" in BoincView is 0.9788, or 97.88% of a core - pretty good for a CPU application, but compare with ~22% for GPUGrid cuda65, ~11% for SETI cuda50, or <2% for Einstein intel_gpu tasks.

Note that CPU efficiency is (some form of) direct measurement: I think BoincTasks has something similar, though I don't know exactly how either of them work. But they're certainly more realistic than BOINC Manager's echo back of whatever value is written in app_info.xml or estimated by the server.
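
A plausible definition of such a "CPU efficiency" figure - this is a guess at the metric, not BoincView's actual code - is simply CPU time consumed divided by wall-clock time. Using the percentages quoted above as if they came from a 1000-second window:

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Fraction of one core consumed per wall-clock second."""
    return cpu_seconds / elapsed_seconds

# The figures quoted above, treated as CPU-seconds over a hypothetical
# 1000-second wall-clock window (illustrative numbers only):
apps = {
    "SoG r3430 (NV)":     cpu_efficiency(978.8, 1000.0),
    "GPUGrid cuda65":     cpu_efficiency(220.0, 1000.0),
    "SETI cuda50":        cpu_efficiency(110.0, 1000.0),
    "Einstein intel_gpu": cpu_efficiency(15.0,  1000.0),
}
for name, eff in apps.items():
    print(f"{name}: {eff:.2%} of a core")
```

Measured this way, the number reflects what the task actually consumed, unlike the static estimate echoed back by BOINC Manager.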
ID: 1779343 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779352 - Posted: 15 Apr 2016, 18:02:18 UTC - in response to Message 1779343.  
Last modified: 15 Apr 2016, 18:23:19 UTC

Yes, I use BoincTasks as well, and they all use 97% of a core each.

It used to be that running more than 1 per card reduced that amount, but I just finished testing 2 at a time on the cards and there is no difference.

About to try 3 at a time and will see if that reduces any time to complete or % of CPU.

If neither of those happen then I may run r3366 again to see how it compares.
ID: 1779352 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779371 - Posted: 15 Apr 2016, 19:43:17 UTC - in response to Message 1779352.  

So 3 per Card generates 36-37 minutes, about 1 minute faster than running a single Guppi by itself on the GPU

CPU utilization is down anywhere from 80-92% of a CPU rather than 97%

Only problem is, this requires a large number of CPU cores to make it happen.
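
Whether running 3 at a time is actually a win comes down to throughput rather than per-task time. A quick sanity check on the numbers above (taking ~38 minutes for a single Guppi, since 36-37 is described as about a minute faster - that figure is inferred, not quoted):

```python
def effective_minutes_per_task(concurrent, minutes_each):
    """With `concurrent` tasks in flight, each taking `minutes_each`
    wall-clock minutes, one task completes every this-many minutes."""
    return minutes_each / concurrent

single = effective_minutes_per_task(1, 38)  # one Guppi alone (assumed ~38 min)
triple = effective_minutes_per_task(3, 37)  # three at once, ~37 min each

print(f"1 at a time: {single:.1f} min/task effective")
print(f"3 at a time: {triple:.1f} min/task effective")
```

So even though each task barely speeds up, the card finishes roughly three times as many per hour - provided the CPU cores are there to feed it.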
ID: 1779371 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51445
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1779374 - Posted: 15 Apr 2016, 19:46:40 UTC - in response to Message 1779371.  

So 3 per Card generates 36-37 minutes, about 1 minute faster than running a single Guppi by itself on the GPU

CPU utilization is down anywhere from 80-92% of a CPU rather than 97%

Only problem is, this requires a large number of CPU cores to make it happen.

Not looking very good, is it?
I gotz GPU power, but CPU power is a limitation.
This may be a hard nut to crack for the optimization crew.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1779374 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779380 - Posted: 15 Apr 2016, 20:05:41 UTC - in response to Message 1779374.  
Last modified: 15 Apr 2016, 20:06:05 UTC

Kinda, but in the long run, it is still faster than the CPU version.

Regarding Guppi...

1 work unit CPU is 55 minutes vs 14 minutes GPU

Where issues arise is when comparing how fast non-VLARs are crunched, especially when running multiple instances.

I would think a separate plan class would be needed in the app_config if one planned on running the Guppis along with non-VLARs on a low-core system, and the configurations would have to be worked out. A rough idea would be along these lines:

1 GPU on a dual core, single instance

1 GPU in a 4 core CPU probably not a problem

2 GPU in a 4 core, ok as long as not running multiple instance

2 GPU in a 8 core, probably ok

3 GPU in a 8 core, might be manageable but limited instances

3 GPU on a 12 core, manageable

4 GPU on a 12 core, wouldn't recommend it other than single instance or limited

4 GPU on a 16 core, might be possible but limited instances

I didn't throw in 6 cores but they would be between the 2 and 3 GPU set ups.
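
The per-setup limits sketched above map onto BOINC's app_config.xml mechanism. A minimal illustration follows; the app name and plan class here are assumptions and would need to be checked against the names in client_state.xml (a sketch, not a tested configuration):

```xml
<!-- app_config.xml: illustrative only; verify app/plan-class names locally -->
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>         <!-- assumed app name -->
    <plan_class>opencl_nvidia_SoG</plan_class> <!-- assumed plan class -->
    <avg_ncpus>1.0</avg_ncpus>   <!-- reserve a full core per SoG task -->
    <ngpus>0.33</ngpus>          <!-- ~3 tasks per GPU -->
    <cmdline>-use_sleep -sbs 512</cmdline>
  </app_version>
</app_config>
```

With avg_ncpus at 1.0, BOINC budgets a whole core for each GPU task, matching the ~97%-of-a-core behaviour reported above; lowering ngpus raises the number of concurrent tasks per card.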

But here's the good news, with each year we get better equipment that allows us to build upon these things. So who is to say what we can do in 1-2 years time.
ID: 1779380 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51445
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1779383 - Posted: 15 Apr 2016, 20:09:03 UTC - in response to Message 1779380.  
Last modified: 15 Apr 2016, 20:11:12 UTC

I am running vintage equipment. And I cannot afford to upgrade.
The kitty farm is what it is.
9 old rigs getting older by the day.
Every day I wake up and don't find one crashed is a good day.

If some think that it is OK to spend CPU cycles to support a weak GPU app, I am afraid I cannot agree.
The GPU apps should be better able to stand on their own with minimal CPU support.
That is why they are GPU apps.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1779383 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779394 - Posted: 15 Apr 2016, 20:35:16 UTC - in response to Message 1779383.  

It's the GBT data that is requiring such a large amount of CPU usage.

On normal MB, the SoG uses very little CPU time.

Actually, it's faster than CUDA on non-VLAR MB.

But its main purpose is those VLARs.

So to crunch Guppi on GPU or not to crunch Guppi on GPU, that is the question, lol....
ID: 1779394 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1779405 - Posted: 15 Apr 2016, 21:08:03 UTC - in response to Message 1779394.  

Try with -use_sleep and an increased PulseFind kernel size (-sbs 512, for example)
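
For what it's worth, the usual idea behind a switch like -use_sleep (an assumption about the mechanism, not a statement about the actual SoG source) is to replace a busy-wait on GPU completion with short sleeps. A toy illustration of why that cuts CPU usage:

```python
import time

def busy_wait(done):
    """Spin until done() is true - burns a full core while waiting."""
    while not done():
        pass

def sleep_wait(done, interval=0.001):
    """Poll done(), but yield the CPU between checks - near-zero CPU use."""
    while not done():
        time.sleep(interval)

def make_done(seconds):
    """Toy stand-in for 'has the GPU kernel finished yet?'."""
    deadline = time.perf_counter() + seconds
    return lambda: time.perf_counter() >= deadline

for wait in (busy_wait, sleep_wait):
    cpu0, wall0 = time.process_time(), time.perf_counter()
    wait(make_done(0.2))
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    print(f"{wait.__name__}: {cpu / wall:.0%} of a core while waiting")
```

The trade-off is latency: sleeping between polls can leave the GPU idle for up to one interval after each kernel finishes, which is presumably why it can lengthen run times.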
ID: 1779405 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779418 - Posted: 15 Apr 2016, 21:59:13 UTC - in response to Message 1779371.  
Last modified: 15 Apr 2016, 21:59:30 UTC

So 3 per Card generates 36-37 minutes, about 1 minute faster than running a single Guppi by itself on the GPU

CPU utilization is down anywhere from 80-92% of a CPU rather than 97%

Only problem is, this requires a large number of CPU cores to make it happen.


I need to correct this... I was using command lines supplied by Mike at this point.

When I ran the same experiment without command lines, the times to complete were 46-47 minutes.

Going to try Raistmer's recommendations now...
ID: 1779418 · Report as offensive
Bruce
Volunteer tester

Send message
Joined: 15 Mar 02
Posts: 123
Credit: 124,955,234
RAC: 11
United States
Message 1779433 - Posted: 15 Apr 2016, 22:53:46 UTC

I'm not seeing the same results that you are, but then I am using AMD CPUs and not Intel.

The r3430_SoG seems to be running just like the r3401_SoG.
The shorties run pretty quick and use very little CPU resources. Once you get past those and into the mid and lower ARs, it takes a full core per WU.

This seems to be the same problem that I have always had running APs.

I am still using the same command lines that I used for r3401, so I may need to retune, but I don't expect any reduction in CPU usage.

Just tried a quick test using the sleep switch and it does not seem to work for me; it still used a full core per WU.
Bruce
ID: 1779433 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779448 - Posted: 15 Apr 2016, 23:56:14 UTC - in response to Message 1779433.  

Hey Bruce,

Sorry I should have specified that I was talking about SoG for Nvidia.

I don't know how they do for ATI..

I'm restarting my test with -use_sleep and nothing else, and will check progress over the evening to see how it does.
ID: 1779448 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1779543 - Posted: 16 Apr 2016, 6:04:55 UTC - in response to Message 1779448.  


I'm restarting my test with -use_sleep and nothing else, and will check progress over the evening to see how it does.

-use_sleep can be used along with full tuning line.
ID: 1779543 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779545 - Posted: 16 Apr 2016, 6:22:47 UTC - in response to Message 1779543.  
Last modified: 16 Apr 2016, 6:28:45 UTC

I was just testing them to see how they all combine.

I've found that -use_sleep with -sbs 512, along with the command line Mike gave me, works the best.

16 minutes with 3-5% CPU usage running 1 work unit per card

at 3 work units per card

38 minutes average with 3% CPU along with -use_sleep -sbs 512 and command line

I'm going to try 1, 2, and 3 at a time per card again, but without -use_sleep.

Tomorrow I will post result in a better format.

I've run into a problem this evening with over 12 errors; not sure why it occurred, as I was only using -use_sleep and nothing else. Maybe a bad batch of work, not sure. Will have to wait and see if the wingmen also error out or if they complete the work units.

Edit...

Just checked those that errored; some of the wingmen also errored out. Only ARM wingmen seemed to have completed them, but in very, very short times, so I have to doubt those results.
ID: 1779545 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1779548 - Posted: 16 Apr 2016, 6:44:08 UTC
Last modified: 16 Apr 2016, 6:44:59 UTC

Looks like a bad batch of GBT on beta...

They start off normal, then CPU usage quickly goes all the way down to below 1%... and I'm not using -use_sleep on these, so they should be using close to 97% of a core each...

I've seen this happen with all the ones that have errored out tonight.

At first I thought it was the -use_sleep but I removed it and restarted the machine.

But the errors continue to happen, plus I can see some of my wingmen are erroring out too.

Anyone else seeing these?

I'm not currently crunching any GBT CPU work units on Main so don't know if the same thing is happening here...

Edit...

Going to run out of work soon on Beta due to the high number of errors and the restricted number of work units downloaded.
ID: 1779548 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1779559 - Posted: 16 Apr 2016, 7:28:05 UTC - in response to Message 1779548.  

Looks like a bad batch of GBT on beta...


Anyone else seeing these?

I'm not currently crunching any GBT CPU work units on Main so don't know if the same thing is happening here...


On Beta, 10 of ~150 GPU WUs (both SAH & SOG) errored out with
"ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance"
the rest validated OK. Duplicated this on at least 2 of 4 machines crunching beta.
Also saw some errors like that on GBT work done on main (CPU). Guess it's a GUPPI issue, not just GPU or CPU.
Vanilla setup here, no special command line info going on.
ID: 1779559 · Report as offensive



 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.