Monitoring inconclusive GBT validations and harvesting data for testing

Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1816151 - Posted: 10 Sep 2016, 15:18:28 UTC - in response to Message 1815886.  

Any suggestions?


Crashes *after* finish are purely a symptom of the standard boincapi shutdown procedure not being thread-safe (Cuda uses internal helper threads that should not be killed). If you're running into this with current boincapi (or similar symptoms, such as the finished file being present too long), despite the multiple workarounds added recently, you'll probably see variable symptoms depending on the driver and Cuda versions in use, until such time as I can generalise a custom boincapi patch.

[Edit:] won't know about the stalls for a bit. Would have to poke at Petri's stream event chains, perhaps install some timeouts/restarts if some conditions can cause events to go AWOL.
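
(For illustration, a minimal sketch of the thread-safe shutdown idea: signal and join an app-owned worker before calling boinc_finish(), rather than letting process teardown kill it. The CUDA driver's own internal helper threads can't be joined like this, which is why a boincapi-side patch is needed; all names here are hypothetical.)

    #include <atomic>
    #include <thread>
    #include "boinc_api.h"   // boinc_finish()

    std::atomic<bool> quit_requested{false};
    std::thread helper;      // hypothetical app-owned worker thread

    void finish_cleanly(int status)
    {
        // Ask the worker to stop and wait for it, instead of letting
        // process teardown kill threads mid-operation (the
        // crash-after-finish case).
        quit_requested = true;
        if (helper.joinable())
            helper.join();
        boinc_finish(status);   // writes the 'finished' file and exits
    }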

FYI - Charlie Fenton made some minor checkins to the Mac BOINC Client code today. Just in case he's back for a while after his long absence, now might be a good time to contact him with any questions you might have about the stock application.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1816279 - Posted: 11 Sep 2016, 2:56:53 UTC - in response to Message 1816127.  

Task blc5_2bit_guppi_57449_46749_HIP83043_0021.14243.831.18.27.242.vlar_2 exited with zero status but no 'finished' file

Yes, that is the task running on the 750Ti. It usually happens sometime before the 750Ti Stalls.
Oh well, I was able to apply the Pulsefind fix and the Blocking Sync change to zi3c, and it did pass the benchmark.
I suppose I could boot back to Darwin 15.4 and try it with Driver 8.0.29, 'cause it looks like it's going to Stall in 14.5 with driver 7.5.27.

Hi TBar,

I had to make a small change in cudaAcceleration.cu to make the autocorr work with 8.0
      //cu_errf = cufftPlan1d(&cudaAutoCorr_plan, ac_fftlen*2, CUFFT_R2C, 8); // RFFT method, batch of 8
      int size = ac_fftlen*2;
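      // NULL inembed/onembed (with zero strides) selects the default tightly packed layout; rank-1 R2C plan, batch of 8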
      cu_errf =  cufftPlanMany(&cudaAutoCorr_plan, 1, &size, NULL, 0, 0, 0, 0, 0, CUFFT_R2C, 8);

The plan1d call with a batch count is deprecated and does not work correctly in 8.0. The PlanMany call works OK.
EDIT: no, it does not. I have found one WU that still gives an AC error. That is good: now I can debug.

I made the changes to the modified zi3c in Darwin 14.5, updated to cufftPlanMany and added the print lines. After running for a little while the 750Ti Stalled once again, this time without the "exited with zero status but no 'finished' file" line. So far that line hasn't appeared. I checked the stderr_txt file for the stalled task and it appeared normal. It had found and printed two Spikes and then just stopped. I suspended the task and it was finished by a different GPU. Now, guess what the other GPU found after those two Spikes; it's here:
http://setiathome.berkeley.edu/result.php?resultid=5147954190
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
Using unroll = 4 from command line args

setiathome v8 enhanced x41p_zi3x, Cuda 7.50 special
Modifications done by petri33. Compiled by TBar

Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.415722
Sigma 3
Thread call stack limit is: 1k
Spike: peak=24.44856, time=73.82, d_freq=1420127661.26, chirp=4.6573, fft_len=128k
Spike: peak=24.40519, time=73.82, d_freq=1420127661.26, chirp=4.6583, fft_len=128k
bad arg: -bs
setiathome_CUDA: Found 3 CUDA device(s):
  Device 1: Graphics Device, 2047 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 6 
     pciBusID = 1, pciSlotID = 0
  Device 2: GeForce GTX 750 Ti, 2047 MiB, regsPerBlock 65536
     computeCap 5.0, multiProcs 5 
     pciBusID = 2, pciSlotID = 0
  Device 3: Graphics Device, 2047 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 6 
     pciBusID = 5, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: Graphics Device is okay
SETI@home using CUDA accelerated device Graphics Device
Using unroll = 4 from command line args
Restarted at 22.62 percent, with setiathome enhanced x41p_zi3x, Cuda 7.50 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 3
Thread call stack limit is: 1k
Autocorr: peak=18.53877, time=33.55, delay=2.9907, d_freq=1420126056.01, chirp=-26.736, fft_len=128k
Pulse: peak=9.092787, time=100.9, period=3.303, d_freq=1420131536.88, score=1.021, chirp=59.024, fft_len=1024 
Pulse: peak=9.06137, time=100.9, period=3.303, d_freq=1420131538.82, score=1.018, chirp=59.516, fft_len=1024 
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
1,2,3,4,5,6,7,8,9,10,10,11,12,cudaAcc_free() DONE.

Yes, it found Autocorr: peak=18.53877

So it would appear the 750Ti is Still Stalling on the Autocorr, even with cufftPlanMany and CUFFT_COMPATIBILITY_FFTW_PADDING. Since switching to cufftPlanMany, I haven't noticed any Extreme AC Peaks though. I suppose I could try compiling zi3h in Toolkit 7.5/Yosemite and see how that does. Just comparing the two, it seems the modified zi3c (aka zi3x) is a little faster than zi3h, at least on the BLC4 tasks it's run so far.
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1816287 - Posted: 11 Sep 2016, 4:03:47 UTC - in response to Message 1816151.  

Any suggestions?


Crashes *after* finish are purely a symptom of the standard boincapi shutdown procedure not being thread-safe (Cuda uses internal helper threads that should not be killed). If you're running into this with current boincapi (or similar symptoms, such as the finished file being present too long), despite the multiple workarounds added recently, you'll probably see variable symptoms depending on the driver and Cuda versions in use, until such time as I can generalise a custom boincapi patch.

[Edit:] won't know about the stalls for a bit. Would have to poke at Petri's stream event chains, perhaps install some timeouts/restarts if some conditions can cause events to go AWOL.

FYI - Charlie Fenton made some minor checkins to the Mac BOINC Client code today. Just in case he's back for a while after his long absence, now might be a good time to contact him with any questions you might have about the stock application.


Will be looking further into the Mac situation from a general development perspective a little later today, with a view to getting the Mac portion of my new build system operational. From some digging yesterday, for CPU apps it looks as though the safest bet might be building against Apple's ~10.4-10.5 SDKs. Will trial something along those lines with stock cpu (as Tbar's been doing by building on older OS versions directly), and bounce Charlie an email if I can't figure out what SDK version he built against from the makefile or stock binary.

I'd expect whatever comes out of that, as far as widest compatibility for deployment, would apply to the client and libraries similarly. The manager may or may not run into issues depending on wxWidgets and other library dependencies.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1816344 - Posted: 11 Sep 2016, 11:02:09 UTC

I suppose it's time to go back to cuda driver 8 for a while. The new build of zi3h isn't any better than zi3x with driver 7.5.x; it seems a little worse, actually. The zi3x build started out with just a few 750Ti Stalls, then progressively became unusable. Some tasks didn't care whether an Autocorr was present or not; they would just stall. The zi3h build Stalled on the third task it ran, also indiscriminately:
http://setiathome.berkeley.edu/result.php?resultid=5148764525
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
Using unroll = 4 from command line args

setiathome v8 enhanced x41p_zi3h, Cuda 7.50 special
Modifications done by petri33. Compiled by TBar

Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  5.951709
Sigma 0
Thread call stack limit is: 1k
Spike: peak=24.48883, time=60.4, d_freq=1419598842.49, chirp=5.5511, fft_len=128k
Spike: peak=25.90003, time=60.4, d_freq=1419598842.49, chirp=5.5548, fft_len=128k
Spike: peak=26.03821, time=60.4, d_freq=1419598842.49, chirp=5.5585, fft_len=128k
Spike: peak=24.89074, time=60.4, d_freq=1419598842.49, chirp=5.5622, fft_len=128k
setiathome_CUDA: Found 3 CUDA device(s):
  Device 1: Graphics Device, 2047 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 6 
     pciBusID = 5, pciSlotID = 0
  Device 2: GeForce GTX 750 Ti, 2047 MiB, regsPerBlock 65536
     computeCap 5.0, multiProcs 5 
     pciBusID = 2, pciSlotID = 0
  Device 3: Graphics Device, 2047 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 6 
     pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: Graphics Device is okay
SETI@home using CUDA accelerated device Graphics Device
Using unroll = 4 from command line args
Restarted at 70.68 percent, with setiathome enhanced x41p_zi3h, Cuda 7.50 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 0
Thread call stack limit is: 1k
Spike: peak=25.31298, time=100.7, d_freq=1419600009.31, chirp=28.357, fft_len=128k
Spike: peak=24.55535, time=100.7, d_freq=1419600009.33, chirp=28.358, fft_len=128k
...

Interesting that the 950s don't have this problem with the zi3 Apps.
So, I'll go back to Darwin 15.4 with driver 8 and run the zi3x App for a while; zi3x did very well in Yosemite, except for all the 750Ti Stalling.
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1816388 - Posted: 11 Sep 2016, 13:23:51 UTC
Last modified: 11 Sep 2016, 13:32:55 UTC

Tried compiling the Windows version.
Attempt 1: nvcc complains that it needs Visual Studio 2010, 2012 or 2013. I have VS 2015 Community; why do I need an older version?
Attempt 2: nvcc still complains even though I have installed VS 2012 Pro and done a restart.
Attempt 3: nvcc now complains that there are missing header files. *Sigh*

I give up on trying to compile a Windows version.


EDIT: Got no idea what IDE petri is using.
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1816407 - Posted: 11 Sep 2016, 14:25:50 UTC - in response to Message 1816388.  
Last modified: 11 Sep 2016, 14:26:07 UTC


EDIT: Got no idea what IDE petri is using.

He seems to develop under Linux, so hardly any VS IDE project will be quite up to date in that source tree. You need to do some porting work.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1816414 - Posted: 11 Sep 2016, 14:51:58 UTC - in response to Message 1816388.  
Last modified: 11 Sep 2016, 15:02:31 UTC

Tried compiling the Windows version.
Attempt 1: nvcc complains that it needs Visual Studio 2010, 2012 or 2013. I have VS 2015 Community; why do I need an older version?
Attempt 2: nvcc still complains even though I have installed VS 2012 Pro and done a restart.
Attempt 3: nvcc now complains that there are missing header files. *Sigh*

I give up on trying to compile a Windows version.


EDIT: Got no idea what IDE petri is using.


Which Visual Studio versions a given Cuda toolkit supports depends on when that toolkit was made. NVidia tends to maintain support only so far back, because of the elaborate rules files used in Visual Studio projects to build the Cuda files (with nvcc), and that VS integration is changed or broken by m$ more or less every couple of versions.

Part of x42's design (in progress) is to move to a common buildsystem for Windows, Linux and Mac. That's mostly in light of the juggling of Visual Studio versions on Windows to suit multiple toolkits, and the assorted frustrations cropping up on Mac and Linux due to deprecations there.

In general, the current Windows setup for making the full set of builds is pretty involved, probably the most complex of the 3 main platforms. The Gnu-tools Linux+Mac situation is finicky for other reasons (e.g. for stock, the project would like to support older Linux versions, but the Linux C libraries have changed too much over time, so overrides to old versions are needed; similar issues are cropping up on Mac).

So no, at this stage Windows builds are not simple. Fortunately, Petri's input on optimisation for new hardware has freed me up a lot to think about modernising and streamlining the build infrastructure. Most of that will happen slowly while the test 'special' alpha builds are in circulation (Petri & TBar look hard at work figuring out some of the issues there, so a viable alpha is probably not far off).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1816586 - Posted: 12 Sep 2016, 4:22:46 UTC

Those blasted AC Peaks. They are still there, even after going back to zi3c and applying minimal updates.
Best autocorr: peak=66513.18
Best autocorr: peak=130987.5
Best autocorr: peak=69153.47
etc...

Kinda makes you wonder where they came from.
Oh well, time to try out zi3h.
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1816590 - Posted: 12 Sep 2016, 4:35:33 UTC - in response to Message 1816586.  
Last modified: 12 Sep 2016, 4:36:25 UTC

Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1817075 - Posted: 14 Sep 2016, 10:29:55 UTC
Last modified: 14 Sep 2016, 10:31:30 UTC

Hi,

I found and fixed a bug concerning autocorrelations giving 'big' results.

One of the GPU memory buffers (dev_ac_partials) was used both as an input and an output, and depending on the order in which the parallel calculations completed, part of the input to one thread could already have been overwritten by another thread that finished a bit earlier.

It was hard to find, even though I know that parallel implementations are prone to this kind of error and I try to avoid using the same memory area for multiple purposes.
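
(For illustration, a minimal sketch of this class of bug and the usual fix; the kernels and names below are hypothetical, not the actual dev_ac_partials code.)

    // Hazard: a multi-pass max-reduction that reads and writes the same
    // buffer inside one launch. __syncthreads() only synchronises within
    // a block, so with several blocks a fast block can start the next
    // pass and overwrite buf[i] while a slower block still needs that
    // value as its input buf[i + half].
    __global__ void reduce_in_place(float *buf, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        for (int half = n / 2; half > 0; half /= 2) {
            if (i < half)
                buf[i] = fmaxf(buf[i], buf[i + half]);
            __syncthreads();   // NOT a grid-wide barrier
        }
    }

    // Fix: one launch per pass, with separate input and output buffers
    // ping-ponged by the host. Launch boundaries provide grid-wide
    // ordering, so no thread's input can be clobbered by another's output.
    __global__ void reduce_pass(const float *in, float *out, int half)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < half)
            out[i] = fmaxf(in[i], in[i + half]);
    }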

I'm testing zi3i now in beta and in main.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1817084 - Posted: 14 Sep 2016, 10:53:28 UTC - in response to Message 1817075.  

I do hope you can find a solution to the cache problem as well. Pascal has 24-48 KB of L1 cache per SM (depending on GP100 or GP104), whilst Maxwell has 48 KB per SM...
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1817086 - Posted: 14 Sep 2016, 11:06:20 UTC - in response to Message 1817084.  
Last modified: 14 Sep 2016, 11:07:56 UTC

Yeah, that's where the limits of the current app design, combined with Petri's hand optimisations, get tricky. At this stage I'm expecting the new code to be 'fast enough' that for wider consumption it may need some mild brakes on by default. Where architecture-specific considerations like that cache geometry come into play is likely where a line will have to be drawn, dispatching internally as opposed to maintaining multiple dedicated builds. Naturally that's the longer, more boring part of the work, but it will end up better come x42.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1817088 - Posted: 14 Sep 2016, 11:24:58 UTC - in response to Message 1817084.  
Last modified: 14 Sep 2016, 11:25:46 UTC

I do hope you can find a solution to the cache problem as well. Pascal has 24-48 KB of L1 cache per SM (depending on GP100 or GP104), whilst Maxwell has 48 KB per SM...


Cache management is taking place already; the CUDA API allows different kinds of reads and writes to be specified. We have 3 major platforms that can use the current special version: Kepler, Maxwell and Pascal. They all may benefit a few percent from different kinds of memory optimisation, but they all already receive a 50% (or more) reduction in run time from what is implemented now.

The cache modes include: use L1 and L2, use L2 only, use none, and use streaming (mark the data ready to be discarded after use). They are well documented in the CUDA specification manuals that come with the devkit.
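
(For illustration, a minimal sketch of those per-access cache policies using the cache-hint load/store intrinsics documented in the CUDA 8.0 programming guide for compute capability 3.2 and up; the kernel itself is hypothetical.)

    // Hypothetical kernel showing one load per cache mode listed above.
    __global__ void cache_hint_demo(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float a = __ldca(&in[i]);   // use L1 and L2: cache at all levels
        float b = __ldcg(&in[i]);   // use L2 only: bypass L1
        float c = __ldcs(&in[i]);   // streaming: evict-first, used once
        float d = __ldcv(&in[i]);   // use none: don't cache, fetch fresh

        __stcs(&out[i], a + b + c + d);   // streaming store, discard after use
    }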
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1818384 - Posted: 20 Sep 2016, 6:02:29 UTC
Last modified: 20 Sep 2016, 6:14:11 UTC

I've got a lot of those overflow units, and at first they go invalid on my big host.
Petri, do you think your code differs in error management, or in the priority of signals found before the buffer overflows and the result is sent to the validator? If so, I think there is much to be gained just by getting the numbers sorted equally in the output of those quick overflows.
Otherwise the app seems very solid when crunching real "non-overflowed" work: a few invalids here and there, but mostly the numbers (Triplet, Spike, Pulse, Gaussian, etc.) are matching.
Just a hint to grab a few quickies and compare, because they error out against the regular code and the SoG code, so it seems there is an anomaly there.

If this issue with quickies gets fixed, I seriously think most of the irregularities would be solved. Perhaps it's just subroutine juggling that needs to be done in the correct order for the output to match what the validator expects.

Shaggie76: Could you write a script that grabs the internal data and matches it to a host number when a workunit falls into the inconclusive column?
What I want to accomplish is a database of what the inconclusives are matched against (cuda, SoG, IntelX86) and, when clicked further, a summary of pulses, spikes, etc. compared against the other host. Call it alpha statistics gathering, to see whether the main numbers differ anywhere.
You could monitor my two Linux hosts and others using the new code. Perhaps a percentage between validated and inconclusive also! Is this asking too much? :)

My 2 cents!

Example:

5165706276 8055485 19 Sep 2016, 13:13:35 UTC 19 Sep 2016, 14:13:36 UTC Completed, validation inconclusive 11.56 11.53 pending SETI@home v8 v8.00 (cuda32)
windows_intelx86
5165706277 8053171 19 Sep 2016, 13:13:44 UTC 20 Sep 2016, 5:06:44 UTC Completed, validation inconclusive 4.11 1.75 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5167724779 --- --- --- Unsent --- --- --- ---

5165608260 8053171 19 Sep 2016, 12:14:52 UTC 20 Sep 2016, 4:15:01 UTC Completed, validation inconclusive 4.22 1.79 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165608261 7737824 19 Sep 2016, 12:14:52 UTC 19 Sep 2016, 12:19:59 UTC Completed, validation inconclusive 21.14 12.95 pending SETI@home v8 v8.12 (opencl_nvidia_SoG)
windows_intelx86
5167629550 --- --- --- Unsent --- --- --- ---

5165582012 8053171 19 Sep 2016, 11:59:22 UTC 20 Sep 2016, 4:15:01 UTC Completed, validation inconclusive 4.21 1.86 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165582013 7740995 19 Sep 2016, 11:59:21 UTC 19 Sep 2016, 12:04:31 UTC Completed, validation inconclusive 13.76 10.52 pending SETI@home v8 v8.12 (opencl_ati5_cat132)
windows_intelx86
5167629376 --- --- --- Unsent --- --- --- ---

5150302343 7923287 11 Sep 2016, 13:31:35 UTC 19 Sep 2016, 8:33:13 UTC Completed, validation inconclusive 123.85 118.97 pending SETI@home v8 v8.12 (opencl_nvidia_SoG)
windows_intelx86
5150302344 7814899 11 Sep 2016, 13:31:37 UTC 12 Sep 2016, 7:26:21 UTC Completed, validation inconclusive 641.23 517.47 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165739708 8053171 19 Sep 2016, 13:34:25 UTC 20 Sep 2016, 5:27:26 UTC Completed, validation inconclusive 41.50 16.48 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5167766155 --- --- --- Unsent --- --- --- ---

5165287979 8053171 19 Sep 2016, 8:59:20 UTC 20 Sep 2016, 2:16:02 UTC Completed, validation inconclusive 4.22 1.76 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165287980 8096298 19 Sep 2016, 8:59:20 UTC 19 Sep 2016, 18:33:42 UTC Completed, validation inconclusive 22.57 21.66 pending SETI@home v8 v8.00
windows_intelx86
5167459854 --- --- --- Unsent --- --- --- ---

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1818399 - Posted: 20 Sep 2016, 7:39:53 UTC - in response to Message 1818384.  

1) The order of processing is different. The checks for triplets, pulses, spikes, gaussians and autocorrelations are not done in the same order as in the main version. Pulses tend to take longest on the GPU, so I check them last. I see no problem in sending a 4-second task for rechecking by another host; the data is invalid anyway. I could store the findings and report them in the same order as main, but that is not my priority right now.

There may be over 30 pulses, over 30 triplets, over 30 autocorrelations, or over 30 spikes in the same packet. Any of them can cause an overflow, and some of them may not have been processed yet. Parallel execution is different from sequential.
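
(For illustration, a minimal sketch of the store-and-reorder idea mentioned above: buffer the findings as the GPU delivers them, then emit them in a canonical order before reporting. The Finding record and all names are hypothetical.)

    #include <algorithm>
    #include <vector>

    // Hypothetical unified record; the real app keeps separate signal types.
    struct Finding {
        int    kind;   // e.g. 0=spike, 1=gaussian, 2=pulse, 3=triplet, 4=autocorr
        double time;   // seconds into the WU
        double freq;   // detection frequency, Hz
    };

    // Sort buffered findings into a canonical order (here: chronological,
    // then by kind) so a parallel run reports the same first-30 signals
    // on overflow as a serial run would.
    void emit_in_canonical_order(std::vector<Finding> &found)
    {
        std::sort(found.begin(), found.end(),
                  [](const Finding &a, const Finding &b) {
                      if (a.time != b.time) return a.time < b.time;
                      return a.kind < b.kind;
                  });
        // for (const Finding &f : found) report_signal(f);  // hypothetical reporter
    }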
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1818403 - Posted: 20 Sep 2016, 8:17:08 UTC - in response to Message 1818399.  


There may be over 30 pulses, over 30 triplets, over 30 autocorrelations, or over 30 spikes in the same packet. Any of them can cause an overflow, and some of them may not have been processed yet. Parallel execution is different from sequential.

Also, there can be many more than 30 spikes in an overflow, for example. And because many arrays are processed at once per kernel call, some ordering is required not only between kernel calls but inside a single kernel call too.

I made an attempt to emulate the serial order as much as possible without real sacrifices in performance. For example, with SoG the first 50 icffts are done with a sync on each iteration. This catches most of the early overflows, but late overflows remain and give inconclusives from time to time.
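
(A minimal sketch of the shape of that early-sync scheme; SoG itself is OpenCL, so this CUDA-style host loop and the helper names are stand-ins, not the real code.)

    const int kSyncedIters   = 50;   // iterations to run in near-serial order
    const int kOverflowLimit = 30;   // per-signal reporting cap

    for (int icfft = 0; icfft < num_cffts; ++icfft) {
        launch_icfft_chain(icfft, stream);      // hypothetical: enqueue this iteration
        if (icfft < kSyncedIters) {
            cudaStreamSynchronize(stream);      // drain: results land in serial order
            if (signal_count() >= kOverflowLimit)
                break;                          // early overflow caught, stop like the CPU app
        }
    }
    cudaStreamSynchronize(stream);              // drain the asynchronous tail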
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1818405 - Posted: 20 Sep 2016, 8:44:12 UTC - in response to Message 1818399.  

I see no problem in sending a 4 second task for rechecking with another host.

Provided the new host's owner has a fast and free (no marginal cost) internet connection. Not all parts of the world have that.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1818406 - Posted: 20 Sep 2016, 8:51:59 UTC - in response to Message 1818399.  
Last modified: 20 Sep 2016, 8:57:37 UTC

1) The order of processing is different. The checks for triplets, pulses, spikes, gaussians and autocorrelations are not done in the same order as in the main version. Pulses tend to take longest on the GPU, so I check them last. I see no problem in sending a 4-second task for rechecking by another host; the data is invalid anyway. I could store the findings and report them in the same order as main, but that is not my priority right now.

There may be over 30 pulses, over 30 triplets, over 30 autocorrelations, or over 30 spikes in the same packet. Any of them can cause an overflow, and some of them may not have been processed yet. Parallel execution is different from sequential.

I hate to keep bringing up the same subject ;) but the older p_zi doesn't seem to have that problem with all these quick overflows. This machine is running the same p_zi that is posted at Crunchers Anonymous, Computer 7942417. Not only are they validating on the first try, the overall number of Inconclusives is very low for a 'Special' build. My two machines running the current 'Special' versions are listing most of these Overflows as Inconclusive. The biggest problem with the App at C.A. is that it fails on most resumed tasks.
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1818410 - Posted: 20 Sep 2016, 9:47:46 UTC - in response to Message 1818399.  
Last modified: 20 Sep 2016, 9:49:56 UTC

1) The order of processing is different. The checks for triplets, pulses, spikes, gaussians and autocorrelations are not done in the same order as in the main version. Pulses tend to take longest on the GPU, so I check them last. I see no problem in sending a 4-second task for rechecking by another host; the data is invalid anyway. I could store the findings and report them in the same order as main, but that is not my priority right now.

There may be over 30 pulses, over 30 triplets, over 30 autocorrelations, or over 30 spikes in the same packet. Any of them can cause an overflow, and some of them may not have been processed yet. Parallel execution is different from sequential.


Hmm, I don't question the parallel work you've been doing; that's great and awesome! Progress is the key.
I'm thinking more that the validator server software is written "stupidly" and doesn't sort the incoming data by pulse strength, spike strength, etc., so that it can be aligned and compared row-for-row.

If the main (CPU) software reports, for instance, a result with a strength of 23.45 on row 3, and your application sends another of 24.32 there and only reports the 23.45 later on, I really think the validator gets confused and your application gets an "inconclusive" mark, even though, when we dig through all the results, the result is in fact real once sorted and matched.

In my world that's called a "false positive": the result is actually legit, yet it gets tagged "inconclusive" in huge numbers. The priority is of course to iron out real miscalculations, but those tend to drown in a 12% inconclusive list; if the output were sorted the way the validator expects, it might go down to 1-2% instead.

Much easier to spot the real problem WUs instead of drowning in "false positives".

What do the rest of you think? Is that the way the validator server code actually works? Or does the validator itself do the sorting and rechecking when two machines get a mismatch, so that when the third comes along and sends its data, suddenly all machines get a "valid result" and are awarded the credit?!

Just trying to sort out the right thing to do to ease the load and resources for all the developers and code wizards. If it is a fairly easy fix, then I actually suggest sorting and shuffling the calculated data to match the stock application's output before sending the result back to S@H for validation :)

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1818422 - Posted: 20 Sep 2016, 10:29:17 UTC - in response to Message 1818410.  


I'm thinking more that the validator server software is written "stupidly" and doesn't sort the incoming data by pulse strength, spike strength, etc., so that it can be aligned and compared row-for-row.

No, the validator is not that stupid.
If the same results are reported in a different order, it will handle them correctly.
In the overflow case only a subset of the results is reported, so genuinely different signals arrive at the validator.
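
(Illustrative only, not the actual validator code: a sketch of why order-independent matching passes reordered results but still flags overflow truncations; the Signal record and tolerances are assumptions.)

    #include <cmath>
    #include <vector>

    struct Signal { int kind; double time, freq, power; };

    // Two signals "match" if they are the same kind and agree within
    // tolerance on time, frequency and power.
    static bool matches(const Signal &x, const Signal &y)
    {
        return x.kind == y.kind &&
               std::fabs(x.time  - y.time)  < 1e-2 &&
               std::fabs(x.freq  - y.freq)  < 1e-2 &&
               std::fabs(x.power - y.power) < 1e-2;
    }

    // Every signal in 'from' must have a counterpart in 'in'.
    static bool covered(const std::vector<Signal> &from,
                        const std::vector<Signal> &in)
    {
        for (const Signal &s : from) {
            bool found = false;
            for (const Signal &t : in)
                if (matches(s, t)) { found = true; break; }
            if (!found) return false;
        }
        return true;
    }

    // Order-independent equivalence: reordered but identical result sets
    // pass, while two overflowed results that each truncated at a
    // different subset of 30 signals fail, hence the inconclusives.
    bool roughly_equivalent(const std::vector<Signal> &a,
                            const std::vector<Signal> &b)
    {
        return covered(a, b) && covered(b, a);
    }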
SETI apps news
We're not gonna fight them. We're gonna transcend them.