Posts by Tutankhamon


1) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1764155)
Posted 12 hours ago by Tutankhamon
OK, SoG beats CUDA50 on my 980 at least. No doubt about that any longer.
https://setiathome.berkeley.edu/results.php?hostid=7585453&offset=0&show_names=0&state=4&appid=29

No need to continue running MB only. I will now allow AP too.
2) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1763485)
Posted 3 days ago by Tutankhamon
Bam, bam, bam.

I'm back on main to continue punishing my 980 Strix with more SoG work.
Keep 'em coming....
https://setiathome.berkeley.edu/results.php?hostid=7585453&offset=0&show_names=0&state=4&appid=29
3) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1763258)
Posted 4 days ago by Tutankhamon
OK, looks good so far. I will now spend 24 hours or so on Beta with the same settings (although stock, so I may get a few of the other apps too).

Results as of now:
https://setiathome.berkeley.edu/results.php?hostid=7585453&offset=0&show_names=0&state=4&appid=29
4) Message boards : Number crunching : Panic Mode On (102) Server Problems? (Message 1763052)
Posted 5 days ago by Tutankhamon
Yes, there's an enormous amount of VLARs out there now, and the ARs we get for our GPUs are mostly low ARs, relatively close to being classified as VLARs.

Remind me again ... what is the cutoff AR range that gets classified as VLAR? I saw some tasks in the ~0.09 AR range on the GPUs that awarded ~160 or so credits. They were not tagged as VLAR. They took about twice as long to run as the typical ~0.40 AR tasks ... about 24 minutes on my 970s doing 0.5 tasks each. I don't think I had ever seen tasks with that low an AR on the GPUs before.


From post: https://setiathome.berkeley.edu/forum_thread.php?id=77990&postid=1715144#1715144

"Work Units fall into 3 Angle Rate (AR) ranges - Very Low (VLAR, <0.12), Mid-Range (0.12 - 0.99) and Very High (VHAR, aka "Shorties", >1.0). Those numbers are approximate."
5) Message boards : Number crunching : Panic Mode On (102) Server Problems? (Message 1762996)
Posted 5 days ago by Tutankhamon
I'm not getting any GPU work, either for nVidia or ATI...

I am, but it's taking anywhere between 3-7 requests to get it.
There's been a lot of VLAR work around for a while now, but it looks like the percentage of it versus shorties/normal WUs has increased even further over the last couple of days.

Yes, there's an enormous amount of VLARs out there now, and the ARs we get for our GPUs are mostly low ARs, relatively close to being classified as VLARs.

The mix of ARs we now get for GPU is certainly not what we are used to getting. That complicates my testing of the new OpenCL SoG app against CUDA50. I'm almost out of hair to pull from my head by now :-)
6) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762963)
Posted 5 days ago by Tutankhamon

I can live with 50% of a full core per task. Should I raise or lower the -use_sleep_ex 5 to achieve higher CPU usage?


Well, this number is passed to the Sleep() call and means the number of milliseconds the thread goes to sleep for.

I would not recommend using -use_sleep_ex N in production without varying N first to find the sweet spot.

On the other hand, -use_sleep uses a Sleep(1) call, so it should be equivalent to -use_sleep_ex 1. Both do as many iterations as required to actually complete the kernel.

Why these 2 options and not just one:
1) Suppose the real time to complete processing is 6 ms. With -use_sleep the app will make 6 sleep iterations of 1 ms each. With -use_sleep_ex 5, on the other hand, it will make 2 iterations of 5 ms each, so it spends 10 ms (!) asleep.
2) Suppose the real time is 600 ms. A properly tuned "ex" value could reduce the number of iterations (and hence the CPU overhead) considerably.
3) Unfortunately, that's a simplified picture, because under Windows the app will not sleep 5 ms or 1 ms even if told to do so. The real times will be very different; I spent a lot of time studying that. So, although the app can show (-v 6) how many iterations a particular wait took, it's not possible to simply do a run with -use_sleep -v 6 and then set -use_sleep_ex N to the number of iterations reported in stderr. Experimentation with N is required.

And one last remark: it would be interesting to see how -use_sleep_ex 0 behaves with a high-performance GPU. Quite possibly just yielding control without any sleep time will be enough to reduce CPU usage without too much GPU slowdown.
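To make the above concrete, here is a minimal sketch of that kind of sleep-polling wait loop, assuming a dummy stand-in for the app's real completion check (illustrative only, not the actual SoG code):

```cpp
#include <windows.h>
#include <cstdio>

// Dummy stand-in for the app's real completion check (polling an OpenCL
// event in the real app); here it just pretends the kernel takes ~6 ms.
static bool is_kernel_done()
{
    static const DWORD start = GetTickCount();
    return GetTickCount() - start >= 6;
}

// Poll for completion, sleeping sleep_ms per iteration.
// -use_sleep behaves like sleep_ms = 1, -use_sleep_ex N like sleep_ms = N,
// and -use_sleep_ex 0 would just yield the time slice without sleeping.
static int wait_with_sleep(int sleep_ms)
{
    int iterations = 0;
    while (!is_kernel_done()) {
        Sleep(sleep_ms);   // note: Windows may sleep longer than requested
        ++iterations;
    }
    return iterations;     // roughly what -v 6 reports per wait
}

int main()
{
    // With a ~6 ms "kernel": sleep_ms = 1 needs ~6 iterations (~6 ms asleep),
    // while sleep_ms = 5 needs 2 iterations, i.e. ~10 ms asleep.
    std::printf("iterations: %d\n", wait_with_sleep(5));
    return 0;
}
```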

Oiiii!!!!

Too complicated for an old man, Raistmer... Geeze, that is geek talk, something not understandable for someone who is 60. I feel as if I have Alzheimer's Light every time I see such explanations. My jaw just dropped down to the level of my keyboard. :-)

Anyhow, I have tried -use_sleep_ex with several settings from 1 to 10, and there seems to be no way to get to 50% CPU usage per core/task. They all drop it down to around 30%, same as the simpler -use_sleep does. I see no difference at all between the simpler -use_sleep and -use_sleep_ex X.

I also tried -use_sleep_ex 0, as you were interested in, and that does nothing at all for the CPU usage. It's 98-99% of a full core per task, same as without any sleep settings at all.

I'm now running 4 tasks at a time, to make a valid comparison to CUDA50, where I also ran 4 tasks at a time. Three tasks at a time didn't load the 980 to 100% on CUDA, and it doesn't on this app either, despite using the command-line settings for high-end cards. The 980 Strix seems to need lots of punishment to be loaded to almost 100%.

All in all, 4 tasks at a time was faster on CUDA than 3 at a time, which is why I ended up running 4 at a time. Let's see where this app ends up in comparison. If it is faster than CUDA, then I could live with giving it a full core per task. I now run only 2 CPU tasks as well, so the CPU doesn't get too strained either, and it's water cooled too.

Also, this almost-100% CPU usage per task only happens with lower ARs, and when running 4 at a time there's usually at least one with a higher AR in the mix.
7) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762936)
Posted 5 days ago by Tutankhamon
Question:

Using -use_sleep_ex 5 on low or normal AR WUs brings the CPU usage down from almost a full core per task to approx. 30%. However, the run times become not so good compared to CUDA50.

I can live with 50% of a full core per task. Should I raise or lower the -use_sleep_ex 5 to achieve higher CPU usage?
8) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762778)
Posted 6 days ago by Tutankhamon
Further comments:

It's quite clear now that the ARs of the WUs we use to test new apps on Beta are not representative of the mix of ARs we will meet here on main. There's a need for a better mix of ARs on Beta, that's for sure.

The results we get on Beta for the new apps are in no way the results we will get for those apps here on main. That much is totally clear by now.

Anyhow, I'll continue with SoG until I can say whether it crunches v8 MBs faster or slower than CUDA50 on my GTX 980 Strix in the long run. So far it's unclear, mostly because SoG doesn't react as well to "normal" and lower ARs as CUDA50 does. ARs from 2.xx up to (so far unknown) values are where SoG really shines.

And that ends the comment from the Swedish jury, for now. :-)
9) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762747)
Posted 6 days ago by Tutankhamon
Well, I had a chance to look at some of these processed. They are now slower than CUDA here on main. Also seeing unusually high kernel usage. Within the last 20% of the analysis, kernel activity spikes and all CPUs go to 100%. I had been using a command line but removed it when it appeared to actually be hampering the work, so now it's just running stock, 3 at a time.

Well, YMMV of course. I cannot say that this app is slower than CUDA; on the contrary, for me it is much faster than CUDA50.

But we'll see. I'll let it run as it is for now. I know what my production rate on CUDA looked like (per day), so I will be able to compare with this app pretty quickly.
10) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762732)
Posted 6 days ago by Tutankhamon
Definitely using a lot more CPU than on Beta; it also seems to be taking longer to process.

Yeah, but the WUs we got on Beta were consistently over 2 in AR. Not one was a "normal" AR.

Here we see all kinds of ARs. Just now I'm crunching a bunch with an extreme AR of over 51, yes 51 :-) Those behave strangely in the app: no high CPU usage, but the progress is iffy to say the least (in the BOINC Manager the progress % doesn't even move until it suddenly jumps to 100% for these extreme WUs, though the progress indicator in BoincTasks works for them too), yet they're done in 9-10 minutes running 3 at a time. Too high to be fast, or something.

Example of one of those "crazy" WU's:
WU true angle range is : 51.249186
11) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762696)
Posted 6 days ago by Tutankhamon
It would also be interesting to check how it responds to -cpu_lock.
OpenCL NV is quite uncharted territory, and what we know from the ATi side is not always directly applicable here.

You want me to add -cpu_lock to the command line?


Just as part of exploring the app's parameter space, later, once you have established some baseline impression of how it behaves at different ARs. A baseline is required so there is something to compare with. Then things like -use_sleep and/or -cpu_lock and -sbs N variations can be tested.

OK, I'll keep it running as it is now.
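As an aside, that kind of baseline-then-variations sweep could be automated offline with something like the sketch below; the binary name, task file, and flag list are purely hypothetical placeholders, not the real SoG invocation:

```cpp
// Hypothetical offline benchmark driver (not part of the real SoG app):
// run the same task with different command-line variations and time each
// run, so there is a baseline to compare the tuning flags against.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

int main()
{
    // Placeholder binary and task names -- substitute the real ones.
    const std::string app  = "MB8_SoG.exe";
    const std::string task = "testtask.wu";

    const std::vector<std::string> variants = {
        "",                 // baseline: defaults only
        "-use_sleep",
        "-use_sleep_ex 5",
        "-cpu_lock",
        "-sbs 192",
        "-sbs 256",
        "-sbs 384",
    };

    for (const auto& flags : variants) {
        const std::string cmd = app + " " + task + " " + flags;
        const auto t0 = std::chrono::steady_clock::now();
        std::system(cmd.c_str());                      // run one variant
        const auto t1 = std::chrono::steady_clock::now();
        const double secs = std::chrono::duration<double>(t1 - t0).count();
        std::printf("%-20s %.1f s\n",
                    flags.empty() ? "(baseline)" : flags.c_str(), secs);
    }
    return 0;
}
```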
12) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762693)
Posted 6 days ago by Tutankhamon
It would also be interesting to check how it responds to -cpu_lock.
OpenCL NV is quite uncharted territory, and what we know from the ATi side is not always directly applicable here.

You want me to add -cpu_lock to the command line?
13) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762688)
Posted 6 days ago by Tutankhamon

That's why I suggested -use_sleep_ex 5.
It shouldn't be much slower running 3 instances, but it reduces CPU usage at least a little bit.

Running benches atm.

Thanks for the suggestions, Mike, always appreciated. I'll keep them in mind if I get really bothered by the CPU usage at some ARs.
14) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762685)
Posted 6 days ago by Tutankhamon
Very high CPU usage for WUs other than high-AR ones. Almost a full core for ARs other than VHARs, where the CPU usage is only 8-10%.

Since the WUs I tested this with on Beta were all above 2-something in AR, the low CPU usage was what surprised me most. However, here on main, with mostly lower ARs, the high CPU usage really shows.

Thank Dog that we do not get VLARs for the GPU here, or even this GTX 980 would come to a screeching halt :-)

EDIT: But SoG is fast, scarily fast. So I can live with the high CPU usage, just dropping one core from CPU crunching.

Geeze....


You can try -use_sleep or -use_sleep_ex 5 to reduce CPU usage.
But I suggest using this only when running multiple instances.

Well, running 3 at a time is indeed multiple instances. However, I'll wait and see if I can live with this, because with -use_sleep this app will not be any faster than CUDA50.
15) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762681)
Posted 6 days ago by Tutankhamon
Very high CPU usage for WUs other than high-AR ones. Almost a full core for ARs other than VHARs, where the CPU usage is only 8-10%.

Since the WUs I tested this with on Beta were all above 2-something in AR, the low CPU usage was what surprised me most. However, here on main, with mostly lower ARs, the high CPU usage really shows.

Thank Dog that we do not get VLARs for the GPU here, or even this GTX 980 would come to a screeching halt :-)

EDIT: But SoG is fast, scarily fast. So I can live with the high CPU usage, just dropping one core from CPU crunching.

Geeze....
16) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762678)
Posted 6 days ago by Tutankhamon
SoG up and running with the <plan_class>opencl_nvidia_SoG</plan_class>

Let's burn a GTX980 :-)
17) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762668)
Posted 6 days ago by Tutankhamon
Tut, are you getting work for that OpenCL SoG?

Haven't switched yet. Will finish a few more CUDAs in the cache first. Will switch in about 20 minutes.
18) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762666)
Posted 6 days ago by Tutankhamon

-sbs 256

Did you reconsider omitting this one?

Well, I will try it here at first, then maybe delete it. I think I need to do many more WUs with -sbs 256 before I can say for sure whether it is faster or slower than the default setting.


Also try -sbs 192 and 384.

Should be faster on your GPU.

Thanks, Mike. I'll do it like this: I will start with -sbs 192 and let that run for a while, then work my way up to 256, and then to 384.

(That is, if my GTX 980 Strix hasn't gone up in smoke before I reach 384, LOL)
19) Message boards : Number crunching : Panic Mode On (102) Server Problems? (Message 1762659)
Posted 6 days ago by Tutankhamon
Still getting no tasks available from Beta.

Beta splitters are not running.

https://setiweb.ssl.berkeley.edu/beta/server_status.php
20) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1762653)
Posted 6 days ago by Tutankhamon

-sbs 256

Did you reconsider omitting this one?

Well, I will try it here at first, then maybe delete it. I think I need to do many more WUs with -sbs 256 before I can say for sure whether it is faster or slower than the default setting.

