Posts by Raistmer


1) Message boards : Number crunching : Better sleep on Windows - new round (Message 1812196)
Posted 1 hour ago by Profile Raistmer
It seems we have quite different use cases, hence the different outcomes.
As I understand it, you have a multithreaded app.
What differences that implies:
1) To make use of idle CPU cycles you don't need to switch context.
So you always treat context switching as "absolute evil" and as pure overhead.
In the current SETI case we have 2 different processes: a GPU-driving process and another, CPU-only process.
Hence, to make use of idle CPU cycles I inevitably have to switch to another process, that is, to switch context. This diminishes the advantage of STT over Sleep.
2) In the current build there are places a few hundred us long where the CPU could be idle, but the NV OpenCL runtime uses it for polling and keeps it busy. I am looking for a way to sleep for less than 1 ms in these areas.
Unfortunately, STT can't be used for this task.
As the previous test showed, on a fully loaded CPU (that is, when the GPU-driving process and a CPU-only process share the same CPU) STT yields to the lower-priority CPU-only process (but Sleep(0) and Sleep(1) do too). But it yields the whole remaining time slice. Because the time slice is ~10 ms, the average remaining time slice will be ~5 ms (and indeed I see a ~4 ms average sleep quantum in this case). Obviously, that's too much for a ~300 us GPU kernel: GPU performance will drop a lot.
On the other hand, Sleep(1) plus increased mm timer precision allows a stable 1 ms-only (much better!) yield to another process. Of course, that is too much for a 300 us kernel too, hence I use such a sleep only where kernels or kernel sequences of a few ms are possible.
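
A minimal sketch of the "Sleep(1) plus increased mm timer precision" idea, not the app's actual code. On Windows it uses timeBeginPeriod/timeEndPeriod and Sleep; the class name TimerResolutionGuard and the helper sleep_one_ms are hypothetical names chosen for illustration, and the non-Windows branch is only a stand-in so the sketch compiles elsewhere.

```cpp
#include <chrono>
#include <thread>

#ifdef _WIN32
#include <windows.h>   // Sleep; timeBeginPeriod/timeEndPeriod (link with winmm.lib)
#endif

// Hypothetical helper: raises the multimedia timer resolution while it is alive.
class TimerResolutionGuard {
public:
    explicit TimerResolutionGuard(unsigned period_ms) : period_ms_(period_ms) {
#ifdef _WIN32
        active_ = (timeBeginPeriod(period_ms_) == TIMERR_NOERROR);
#endif
    }
    ~TimerResolutionGuard() {
#ifdef _WIN32
        if (active_) timeEndPeriod(period_ms_);  // always undo the request
#endif
    }
    TimerResolutionGuard(const TimerResolutionGuard&) = delete;
    TimerResolutionGuard& operator=(const TimerResolutionGuard&) = delete;

    unsigned period_ms() const { return period_ms_; }
    bool active() const { return active_; }

private:
    unsigned period_ms_;
    bool active_ = false;
};

// Sleeps for a nominal 1 ms and returns the measured duration in milliseconds.
inline double sleep_one_ms() {
    const auto start = std::chrono::steady_clock::now();
#ifdef _WIN32
    Sleep(1);
#else
    // Assumption: stand-in for platforms without Sleep().
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
#endif
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Creating one TimerResolutionGuard(1) at startup and calling sleep_one_ms() in the few-ms kernel sections would give the kind of ~1 ms quantum described above; the measured value is what a SleepQuantum-style counter would accumulate.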
All this leaves STT, just like Sleep(0), without any usable application in the current app.

I will try to implement mm_pause in those places where a sleep of less than 1 ms would be required. That will be the topic of the next experiment.
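
A hedged sketch of what such a sub-millisecond wait could look like. It assumes the x86 `_mm_pause` intrinsic; the names cpu_relax and spin_wait_us are hypothetical. Note that this is a busy-wait: the pause hint makes the spin cheaper, but it does not hand the core to another process the way Sleep does.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
inline void cpu_relax() { _mm_pause(); }  // x86 PAUSE hint for spin loops
#else
inline void cpu_relax() {}                // assumption: no hint on other CPUs
#endif

// Spins for at least `micros` microseconds and returns the number of
// pause iterations that were executed.
inline std::uint64_t spin_wait_us(long micros) {
    assert(micros >= 0);
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::microseconds(micros);
    std::uint64_t spins = 0;
    while (clock::now() < deadline) {
        cpu_relax();
        ++spins;
    }
    return spins;
}
```

For the ~300 us kernels mentioned above, the call would be spin_wait_us(300) placed before the runtime starts polling; whether that actually lowers CPU usage is exactly what the next experiment would have to show.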
2) Message boards : Number crunching : Better sleep on Windows - new round (Message 1812171)
Posted 4 hours ago by Profile Raistmer
LoL, and while re-checking for typos it seems I found the answer: STT doesn't switch context if there is no other active thread on that particular CPU. Mike had an idle core, so no other active threads, no slice given up and so on... Uh...

EDIT: if so, these results can provide an estimate of the context switching overhead!

0.0017884213 - 0.001445154 ~ 0.00034 (ms) = 0.34 us - approx. cost of a single context switch for Mike's host :)
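
The same estimate as a small sketch, assuming (as the post does) that the difference between the two mean SleepQuantum values is the cost of one switch. The counters are in milliseconds, and 1 ms = 1000 us, so the difference of about 0.00034 ms comes out at about 0.34 us. The function name switch_cost_us is hypothetical.

```cpp
#include <cassert>

// Difference of two mean sleep quanta, given in milliseconds,
// returned in microseconds.
inline double switch_cost_us(double mean_with_switch_ms,
                             double mean_without_switch_ms) {
    assert(mean_with_switch_ms >= mean_without_switch_ms);
    return (mean_with_switch_ms - mean_without_switch_ms) * 1000.0;
}
```

Called with the Sleep0 and STT means from the log, switch_cost_us(0.0017884213, 0.001445154), it gives the figure quoted above.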

@Mike - could you now repeat with ALL 8 cores busy, please?
3) Message boards : Number crunching : GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU (Message 1812168)
Posted 4 hours ago by Profile Raistmer

I can read it -- it just seems wrong to talk about windows threading model in "GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU"

http://setiathome.berkeley.edu/forum_thread.php?id=80173
4) Message boards : Number crunching : Better sleep on Windows - new round (Message 1812167)
Posted 4 hours ago by Profile Raistmer
To make it possible for non-Lunatics members to join the discussion.

Here are the tests done so far:
http://lunatics.kwsn.info/index.php/topic,1812.msg61015.html#msg61015

What is strange about Sleep(0) vs SwitchToThread (STT) behavior:

r3500:class SleepQuantum: total=2.8579862, N=3, <>=0.95266207, min=0.93661302 max=0.97626472
Sleep0: class SleepQuantum: total=4.8358912, N=2704, <>=0.0017884213, min=0.00054984231 max=0.4228799
Sleep1: class SleepQuantum: total=2148.8459, N=1791, <>=1.1998023, min=0.86739361 max=3.0483601
STT: class SleepQuantum: total=3.9076965, N=2704, <>=0.001445154, min=0.0004952898 max=0.0027276319

Yep 7 cores were in use.


That shows the need for a fixed-length sleep in the case of an underloaded CPU.
The GPU app has higher priority, so if some free CPU resource is available, it will be scheduled for execution there.
What is strange is that there is no difference between STT and Sleep(0) behavior. From what I read on the main forums, Sleep(0) should return to the same process immediately, so it just spins with the CPU fully busy, while STT should always give up the CPU slice. So STT should have the bigger mean value in the SleepQuantum counter (it is hard to imagine that in the vast majority of the 2704 occurrences the process was exactly at the end of its current time slice). Nevertheless, one can see VERY close mean times (<>) for Sleep(0) and STT. Strange. If so, I don't see any advantage of STT at all :-\
[NB: the Windows time slice is ~10-15 ms and the STT mean is 0.0014 ms]
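
For readers who want to work with the SleepQuantum lines quoted above, here is a small parsing sketch. It assumes the exact field layout shown in the log ("total=..., N=..., <>=..., min=... max=...") and treats "<>" as the mean, i.e. total/N; the struct and function names are hypothetical.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

struct SleepQuantumStats {
    double total = 0.0;  // summed sleep time, ms
    long n = 0;          // number of sleeps
    double mean = 0.0;   // the "<>" field, ms
    double min = 0.0;
    double max = 0.0;
};

// Parses one counter line; any prefix before "total=" is ignored.
inline bool parse_sleep_quantum(const std::string& line, SleepQuantumStats& out) {
    const auto pos = line.find("total=");
    if (pos == std::string::npos) return false;
    const int fields = std::sscanf(line.c_str() + pos,
                                   "total=%lf, N=%ld, <>=%lf, min=%lf max=%lf",
                                   &out.total, &out.n, &out.mean,
                                   &out.min, &out.max);
    return fields == 5;
}

// Checks that the printed mean agrees with total/N within a relative tolerance.
inline bool mean_matches_total(const SleepQuantumStats& s, double rel_tol = 1e-6) {
    if (s.n <= 0) return false;
    return std::fabs(s.total / s.n - s.mean) <= rel_tol * std::fabs(s.mean);
}
```

Feeding it the Sleep0 and STT lines lets the two means be compared directly against the 10-15 ms time slice mentioned in the note.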

5) Message boards : Number crunching : GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU (Message 1812165)
Posted 4 hours ago by Profile Raistmer
@Raistmer - I tried to create an account on that forum but got "An Error Has Occurred! Sorry, registration is currently disabled."

Might be better to talk code over there than this thread.

Ok, I'll create a separate thread.
6) Message boards : Number crunching : GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU (Message 1812162)
Posted 4 hours ago by Profile Raistmer
I tried to create an account on that forum but got "An Error Has Occurred! Sorry, registration is currently disabled."

It's not possible to create an account there AFAIK. But that thread should be visible to anonymous guests too, no? If not, please report back and I'll re-post elsewhere.


This is in the dev area so he couldn't read it anyways.
He needs at least alpha tester status.

Nope. It's a discussion thread; I just checked in private (anonymous) mode under Chrome.
7) Message boards : Number crunching : GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU (Message 1812158)
Posted 4 hours ago by Profile Raistmer
I tried to create an account on that forum but got "An Error Has Occurred! Sorry, registration is currently disabled."

It's not possible to create an account there AFAIK. But that thread should be visible to anonymous guests too, no? If not, please report back and I'll re-post elsewhere.
8) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1812153)
Posted 4 hours ago by Profile Raistmer
In my understanding the interesting part in this case (overflow) is -tt 60, not the revision number.
Because of the longer kernel time, signal logging is different: the kernel is now 60 ms instead of 15 ms.
This is one of the reasons we think the validator needs adjustment for overflows.


And, of course, "ms" is a unit of time measurement from the real world. The amount of work different GPU models can do during similar time intervals is different.
As I said, there is no sense in comparing subsets!
Acquire the full set first, then do the comparison. Otherwise it is just a waste of time.
9) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1812144)
Posted 4 hours ago by Profile Raistmer
How to acquire meaningful data from a suspicious overflow:
1) download the corresponding task for offline testing
2) edit the task to remove or increase (big increase!) the signal restriction
3) re-run the edited task with the reference CPU app
4) compare all results from that run with the subset reported by the app under investigation
5) report if there are false positives with considerable excess power over threshold in that subset (that is, a subset result that has no match in the full list of signals reported for the task).
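
Steps 4 and 5 can be sketched as a simple "subset versus full list" check. This is only an illustration: the Signal fields, the function names and the 1% relative tolerance are assumptions, not the format or rules the real apps and validator use.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical, simplified signal record.
struct Signal {
    std::string kind;   // e.g. "spike", "pulse"
    double chirp_rate;
    long fft_len;
    double power;
};

inline bool close_enough(double a, double b, double rel_tol) {
    return std::fabs(a - b) <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

inline bool same_signal(const Signal& a, const Signal& b, double rel_tol) {
    return a.kind == b.kind && a.fft_len == b.fft_len &&
           close_enough(a.chirp_rate, b.chirp_rate, rel_tol) &&
           close_enough(a.power, b.power, rel_tol);
}

// Returns the signals from the overflowed subset that have no counterpart
// in the full list produced by the reference CPU run.
inline std::vector<Signal> unmatched_signals(const std::vector<Signal>& subset,
                                             const std::vector<Signal>& full_list,
                                             double rel_tol = 0.01) {
    std::vector<Signal> missing;
    for (const Signal& s : subset) {
        const bool found = std::any_of(
            full_list.begin(), full_list.end(),
            [&](const Signal& f) { return same_signal(s, f, rel_tol); });
        if (!found) missing.push_back(s);
    }
    return missing;
}
```

An empty result means every reported signal also exists in the full run, so the overflow only reflects processing order; a non-empty result is the kind of candidate false positive worth reporting.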

Now details:
3*, 5* :
<analysis_cfg>
<spike_thresh>24</spike_thresh>
<spikes_per_spectrum>1</spikes_per_spectrum>
<autocorr_thresh>17.8</autocorr_thresh>
<autocorr_per_spectrum>1</autocorr_per_spectrum>
<autocorr_fftlen>131072</autocorr_fftlen>
<gauss_null_chi_sq_thresh>2.43685937</gauss_null_chi_sq_thresh>
<gauss_chi_sq_thresh>1.41999996</gauss_chi_sq_thresh>
<gauss_power_thresh>3</gauss_power_thresh>
<gauss_peak_power_thresh>3.20000005</gauss_peak_power_thresh>
<gauss_pot_length>64</gauss_pot_length>
<pulse_thresh>19.7340908</pulse_thresh>
<pulse_display_thresh>0.5</pulse_display_thresh>
<pulse_max>40960</pulse_max>
<pulse_min>16</pulse_min>
<pulse_fft_max>8192</pulse_fft_max>
<pulse_pot_length>256</pulse_pot_length>
<triplet_thresh>9.73841</triplet_thresh>
<triplet_max>131072</triplet_max>
<triplet_min>16</triplet_min>
<triplet_pot_length>256</triplet_pot_length>
<pot_overlap_factor>0.5</pot_overlap_factor>
<pot_t_offset>1</pot_t_offset>
<pot_min_slew>0.00209999993</pot_min_slew>
<pot_max_slew>0.0104999999</pot_max_slew>
<chirp_resolution>0.333</chirp_resolution>
<analysis_fft_lengths>262136</analysis_fft_lengths>
<bsmooth_boxcar_length>8192</bsmooth_boxcar_length>
<bsmooth_chunk_size>32768</bsmooth_chunk_size>
<chirps>
<chirp_parameter_t>
<chirp_limit>3</chirp_limit>
<fft_len_flags>262136</fft_len_flags>
</chirp_parameter_t>
<chirp_parameter_t>
<chirp_limit>10</chirp_limit>
<fft_len_flags>65528</fft_len_flags>
</chirp_parameter_t>
</chirps>
<pulse_beams>1</pulse_beams>
<max_signals>30</max_signals>
<max_spikes>8</max_spikes>
<max_gaussians>0</max_gaussians>
<max_pulses>0</max_pulses>
<max_triplets>0</max_triplets>
<keyuniq>-7344129</keyuniq>
<credit_rate>2.8499999</credit_rate>
</analysis_cfg>

Points of interest in bold.

Also, my builds have an extended ability to report signal info in stderr.
Per ReadMe:
Levels from 2 to 5 reserved for increasing verbosity, higher levels reserved for specific usage.
-v 2 enables all signals output.

So -v 2 makes it possible to follow the formation of the "bests" through the whole task processing.
10) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1812142)
Posted 4 hours ago by Profile Raistmer

The interesting thing about this one is that NV SoG r3500 was invalid, while ATi SoG r3430 was valid.

What exactly is interesting here? Evidence that a GPU is indeed a multiprocessor device without strong ordering? This fact is covered in any GPGPU review.
11) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1812140)
Posted 4 hours ago by Profile Raistmer
Please keep overflowed results out of this thread until solid evidence of a false positive or an omitted real signal is acquired.
I strongly refuse to spend time on any discussion of reported partial subsets.

FYI for those enthusiasts who still don't know what an "overflow" is:
The line "SETI@Home Informational message -9 result_overflow" in stderr means overflow.
12) Message boards : Number crunching : GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU (Message 1812038)
Posted 10 hours ago by Profile Raistmer
Hm...
http://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1
In general, Sleep(0) will be much more likely to yield a timeslice, and will ALWAYS yield to the OS, even if there are no other threads waiting. This is why adding a Sleep(0) in a loop will take the processor usage from 100% (per core) to near 0% in many cases. SwitchToThread will not, unless another thread is waiting for a time slice.


Sounds just diametrically opposite... definitely worth trying and seeing for yourself.


My experience has been that Sleep(0) won't yield to lower-priority threads but STT will; we had priority-inversion bugs as a result of using Sleep(0).

I'd try for(;;) { if(!SwitchToThread()) { CallMMPauseForAWhile(); } PollCUDA(); }

Hi, Shaggie76. Could you look at that thread, please: http://lunatics.kwsn.info/index.php/topic,1812.msg61053.html#msg61053 - do you have any explanation for those results regarding Sleep(0) and STT behavior?
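
The loop Shaggie76 suggests in the quote can be written out as a compilable sketch. SwitchToThread() returns nonzero only when the OS actually switched to another thread, so the pause runs only when nothing else was ready. The wrapper yield_to_ready_thread, the callbacks and the iteration limit are assumptions added for illustration (the quoted loop is infinite and calls PollCUDA() directly); the non-Windows branch is a stand-in.

```cpp
#include <functional>
#include <thread>

#ifdef _WIN32
#include <windows.h>
// True if the OS switched execution to another ready thread.
inline bool yield_to_ready_thread() { return SwitchToThread() != 0; }
#else
// Assumption: std::this_thread::yield() gives no feedback, so report
// "nothing else ran" and let the caller pause.
inline bool yield_to_ready_thread() {
    std::this_thread::yield();
    return false;
}
#endif

// Yield, pause if nobody took the CPU, then poll; repeat until poll() reports
// completion. Returns the number of iterations used, or -1 if the limit is hit.
inline int poll_until_done(const std::function<bool()>& poll,
                           const std::function<void()>& pause_for_a_while,
                           int max_iterations) {
    for (int i = 1; i <= max_iterations; ++i) {
        if (!yield_to_ready_thread()) {
            pause_for_a_while();
        }
        if (poll()) return i;
    }
    return -1;
}
```

Here poll would wrap whatever "is the GPU done yet" query the app uses, and pause_for_a_while would be a short run of pause instructions.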
13) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1811723)
Posted 1 day ago by Profile Raistmer

Does anything else need testing more than just a random choice among the other graphics boards?



http://setiathome.berkeley.edu/forum_thread.php?id=79765&postid=1809598
14) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1811309)
Posted 2 days ago by Profile Raistmer
So I asked to exclude 15.6 too.
This will make the plan class easier - only versions lower than 15.4 should be allowed.
15) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1811003)
Posted 3 days ago by Profile Raistmer
Unfortunately, we are stuck with the OS X plan class modifications for now:

It'll necessitate dividing the plan class, as was done with ATI. As such it will take a bit of work.


Monitoring of OpenCL MB app performance on main shows that Darwin versions 15.4 and 15.5 (OS X 10.11.4 and 10.11.5) should be excluded from distribution of the OpenCL NV application. Under these versions the app generates an excess of inconclusives. Older versions of OS X work OK in this sense. There is no consensus about the upcoming Darwin 15.6 yet.

Please make these corrections.

wbr


BTW, should I change restricted range for OS X versions or apps for the next round?
16) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1810735)
Posted 4 days ago by Profile Raistmer
Isn't that the sanity check? I might be wrong but I thought that is what they determined it was.


. . I am pretty sure that is what Raistmer attributed it to.


There are problems with sanity checks in general, and IIRC with this one specifically, because the author has disappeared, and the rationale isn't documented.

What issues do you know of with the Spike sanity check?


I misspoke and withdraw that comment - Now the ones I recall involved autocorrelation power rather than spikes, and probably removed or changed already.

The problem with sanity checks in general is that assumptions need to be made about telescopes and signal character, on top of what stock CPU assumes, and these can change a lot now that multiple telescopes come in. That's fine if the checks are documented somewhere and not likely to need change. An example is that many of the searches are now targeted, so certain signals are more likely.

Starting to notice particular issues with overclocked Cuda GPUs, and usually leave Boinc validation to sort it out, however that seems to be cracking under the strain in some situations (mostly broken hosts). Something where it's tempting to put such safeties in place, though probably will attempt some kind of CPU spot check instead.


There are maximum power restrictions for Spike that come from the number of data points the array has. This allowed range is much smaller than what a float number can represent.
Hence, if power has a value bigger than the theoretical maximum, it works just like a CRC failure: that power can't be produced by correct computations and indicates a random assignment to a particular memory address.
That's how the sanity check works.
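
A rough sketch of such a check. The post does not give the exact bound the app uses; the limit below (power no larger than the number of data points, as for a spectrum normalized to a mean power of 1) is an assumption, and the function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Returns false for values that a correct computation could not have produced:
// NaN/inf, negative power, or power above the assumed theoretical maximum.
inline bool spike_power_is_plausible(float power, std::size_t num_points) {
    if (!std::isfinite(power) || power < 0.0f) return false;
    // Assumed bound: with mean power normalized to 1, one bin can hold
    // at most num_points.
    return static_cast<double>(power) <= static_cast<double>(num_points);
}
```

A value such as 1e30 is perfectly representable as a float but far outside the bound, which is what makes the check behave like a CRC failure for corrupted memory.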
17) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1810729)
Posted 4 days ago by Profile Raistmer
SETI@home: Notice from BOINC
Task postponed: Suspicious spike results, host needs reboot or maintenance
8/19/2016 4:20:55 PM

If it's what I think it is, it relates to the SoG application- it was a bit aggressive in its settings for determining if something was noisy or not.
I don't know when the stock application was last updated, but the current SoG application available through the Lunatics installer doesn't have that issue.

r3500 has the autocorr sanity check disabled. And it was not just SoG but ALL Lunatics apps except the CUDA ones, which still lack such features.
The issue with the autocorr sanity check was that there is no reasonable theoretical limit for signal power. Hence the value was chosen arbitrarily and failed when GBT data was introduced.
AFAIK autocorr is the only search with such properties.
18) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1810724)
Posted 4 days ago by Profile Raistmer
I got a message twice on one of my desktops, the one with the GTX 560:

SETI@home: Notice from BOINC
Task postponed: Suspicious spike results, host needs reboot or maintenance
8/19/2016 4:20:55 PM

I restarted Windows after the first one; no difference.

Note that there's no indication of which task, so I'm unable to check if it finished properly.

This GPU appears to be working properly for all other BOINC projects sending it GPU tasks.


Your config is broken.
Check GPU hardware or drivers.
19) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1810723)
Posted 4 days ago by Profile Raistmer
Isn't that the sanity check? I might be wrong but I thought that is what they determined it was.


. . I am pretty sure that is what Raistmer attributed it to.


There are problems with sanity checks in general, and IIRC with this one specifically, because the author has disappeared, and the rationale isn't documented.

What issues do you know of with the Spike sanity check?
20) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1810384)
Posted 5 days ago by Profile Raistmer
Raistmer, check Lunatics.

Thanks!



Copyright © 2016 University of California