Invalid, due to difference of opinion.

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1673276 - Posted: 3 May 2015, 4:45:51 UTC

WU
result marked as Invalid for me (CUDA50), validated for the others (OpenCL ATI).
I came up with 4 triplets, they came up with 30 Autocorrelation counts.

Luckily 0.28 isn't a great loss, even with the poor pay rate of Credit New.
Grant
Darwin NT
ID: 1673276
Rasputin42
Volunteer tester

Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1673284 - Posted: 3 May 2015, 6:00:14 UTC

Hi,
I had a WU marked "invalid".
When I checked it, one wingman had finished it and a third was still calculating.
If a WU gets two different results, why is it marked invalid BEFORE the third one confirms which is right?

http://setiathome.berkeley.edu/result.php?resultid=4115671844
ID: 1673284
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1673304 - Posted: 3 May 2015, 7:54:57 UTC - in response to Message 1673284.  

If a WU gets two different results ...

http://setiathome.berkeley.edu/workunit.php?wuid=1773860844

They look the same according to stderr_txt (Spike count: 3)

I can only guess that your result file (which is a different file from stderr.txt) contained some garbage or was truncated (missing some lines/bytes at the end).

Result files have names like:
26no12ag.13645.20272.438086664198.12.68_1_0
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1673304
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673309 - Posted: 3 May 2015, 8:06:57 UTC - in response to Message 1673276.  
Last modified: 3 May 2015, 8:15:21 UTC

WU
result marked as Invalid for me (CUDA50), validated for the others (OpenCL ATI).
I came up with 4 triplets, they came up with 30 Autocorrelation counts.

Luckily 0.28 isn't a great loss, even with the poor pay rate of Credit New.

http://setiathome.berkeley.edu/workunit.php?wuid=1777150544
It would be worth getting the task to check it with a CPU build:
http://boinc2.ssl.berkeley.edu/sah/download_fanout/291/21jn12aa.27468.9985.438086664200.12.14
(unfortunately, the task has already been removed)
ID: 1673309
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1673313 - Posted: 3 May 2015, 8:37:26 UTC - in response to Message 1673309.  

Since both 'other' computers:
http://setiathome.berkeley.edu/results.php?hostid=6158143&offset=0&show_names=0&state=5&appid=
http://setiathome.berkeley.edu/results.php?hostid=6775319&offset=0&show_names=0&state=5&appid=

... have many 'Invalid' tasks, and their 'Valid' GPU tasks all show the same short 'Run time' of under 1 minute, you can guess which result was OK ;)
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1673313
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673318 - Posted: 3 May 2015, 10:41:20 UTC - in response to Message 1673313.  

Since both 'other' computers:
http://setiathome.berkeley.edu/results.php?hostid=6158143&offset=0&show_names=0&state=5&appid=
http://setiathome.berkeley.edu/results.php?hostid=6775319&offset=0&show_names=0&state=5&appid=

... have many 'Invalid' tasks, and their 'Valid' GPU tasks all show the same short 'Run time' of under 1 minute, you can guess which result was OK ;)

Thanks for the observation.

Also, in the original result both ATi devices are the same model: BeaverCreek.

It would be worth checking whether those devices tend to produce invalids, or whether it was just an unfortunate coincidence.

The situation where false positives validate against each other is not new. If I recall correctly, we already saw the same with NV GPUs under CUDA for Spikes. The same now seems to have been detected with ATi GPUs and Autocorrelations. It would be worth spotting what could act as a sign of the issue, so a sanity check could stop such false positives from ever reaching the validator.
ID: 1673318
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673319 - Posted: 3 May 2015, 10:51:20 UTC
Last modified: 3 May 2015, 10:54:03 UTC

Also, this example is a good illustration of how the current quota management system acts. IMHO, it acts incorrectly. A host accumulates many invalids while being checked against different devices. But every so often an unfortunate event happens: it is matched against a similarly broken device. And what happens? The false result validates, which immediately gives a great boost to the host's daily quota. That, in turn, raises the chances of matching a similarly broken host again. Hence the quota system allows a positive feedback loop that acts as an invalid-results amplifier.

IMO it would be worth approaching the BOINC devs with the idea of additional measures to ensure device variety. Currently we have such variety at the host ID level only: the same host can't get two tasks from the same WU. But with a broken device model, such devices can reside in different hosts. Maybe it's worth adding variety at the device level too (as usual, we come to the conclusion that the basic entity BOINC should manage is the device, not the host; a conclusion reached over 10 years ago...).
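
A minimal sketch of that amplifier effect, assuming (illustratively, not from actual BOINC source) that one validated result doubles a host's daily quota up to a cap while each invalid result only decrements it:

    #include <algorithm>
    #include <cstdio>

    int main() {
        int quota = 100;               // illustrative starting daily quota
        const int kQuotaCap = 100;
        // 1 = validated (e.g. matched another broken device), 0 = invalid
        const int outcomes[] = {0, 0, 0, 0, 1, 0, 0, 0, 1, 0};
        for (int ok : outcomes) {
            if (ok) quota = std::min(kQuotaCap, quota * 2);  // big boost
            else    quota = std::max(1, quota - 1);          // slow decay
            std::printf("quota now %d\n", quota);
        }
        // A rare false validation undoes many invalid-result penalties,
        // so a broken host keeps a high quota and keeps getting work.
        return 0;
    }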
ID: 1673319
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1673323 - Posted: 3 May 2015, 11:08:37 UTC - in response to Message 1673319.  

Also, this example is a good illustration of how the current quota management system acts. IMHO, it acts incorrectly. A host accumulates many invalids while being checked against different devices. But every so often an unfortunate event happens: it is matched against a similarly broken device. And what happens? The false result validates, which immediately gives a great boost to the host's daily quota. That, in turn, raises the chances of matching a similarly broken host again. Hence the quota system allows a positive feedback loop that acts as an invalid-results amplifier.

IMO it would be worth approaching the BOINC devs with the idea of additional measures to ensure device variety. Currently we have such variety at the host ID level only: the same host can't get two tasks from the same WU. But with a broken device model, such devices can reside in different hosts. Maybe it's worth adding variety at the device level too (as usual, we come to the conclusion that the basic entity BOINC should manage is the device, not the host; a conclusion reached over 10 years ago...).

BOINC (at the server/administration level) already has two features available: Homogeneous Redundancy and Homogeneous App Version. What you're suggesting is effectively the inverse of those - a sort of anti-homogeneous redundancy, or enforced divergence: that might well work for this project, which does indeed have a wide range of divergent hardware and software available for processing. But just as with Einstein's Locality Scheduling, I'd be worried that the extra decision-making in the scheduler would add to the server workload, and possibly delay work allocation until an allowable host requested work if, say, a malformed workunit caused a succession of errors.

It's an interesting idea, and by all means float it past Eric and the rest of the BOINC development team, but be prepared to consider the response that it might cause more problems than it solves.
ID: 1673323
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673329 - Posted: 3 May 2015, 12:09:39 UTC - in response to Message 1673323.  
Last modified: 3 May 2015, 12:21:34 UTC


BOINC (at the server/administration level) already has two features available: Homogeneous Redundancy and


0 - No homogeneous redundancy (all hosts are numerically equivalent)
1 - A fine-grained classification with 80 classes (4 OS and 20 CPU types)
2 - A coarse-grained classification in which there are 4 classes: Windows, Linux, Mac-PPC and Mac-Intel


As one can see, there is no mention of GPUs (!), and GPU diversity is much bigger than any CPU diversity. That makes this feature quite inefficient, at least in its current form.
What I propose is indeed quite the opposite direction, but I don't see how it would increase scheduler logic and overhead much beyond this particular option.
For this option to work, one needs to classify hosts by device, then choose hosts with the proper devices (in this case, similar ones).
My proposal requires the same classification by device; up to that point the two are identical. Then, instead of similar devices, one needs dissimilar ones. Same expense.
Moreover, to take the best of both approaches, one could keep the coarse-grained similarity approach (to ensure numerical similarity) but require different devices, to reduce the chance of hitting model-specific issues. That would require finer graining, though, which could indeed increase overhead.


Homogeneous App Version.

Again, this could work with the addition of device diversity inside the same app class (inside the same GPU vendor) to reduce numerical divergence.

But AFAIK the SETI project has enabled neither of these options, so we have all that numerical diversity (and from time to time suffer from it via an increased inconclusive rate).
Hence, as a first approach, we could implement something similar to one of those options but enabled for GPUs (and with the opposite sign), so perhaps it should be based on the second one.
Also, there is no need to think in absolute terms. If the server can't find an appropriate combination, it could just send to any host, but preferentially send by these criteria (see the sketch below). This would not worsen task allocation or queues.
And of course, like any policy, it should be made switchable: a project that needs it enables it. It's not meant as a hardwired default, of course.
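
A minimal sketch of that soft preference, assuming the scheduler can see a GPU model string for each candidate host; Host, pick_wingman and the field names are illustrative, not actual BOINC scheduler code:

    #include <string>
    #include <vector>

    struct Host {
        int id;
        std::string gpu_model;   // e.g. "BeaverCreek"
    };

    // Prefer a host whose GPU model differs from every device already
    // holding a replica of this WU; fall back to any host rather than
    // delay the workunit.
    const Host* pick_wingman(const std::vector<Host>& candidates,
                             const std::vector<std::string>& models_in_use) {
        for (const Host& h : candidates) {
            bool same_model = false;
            for (const std::string& m : models_in_use)
                if (h.gpu_model == m) { same_model = true; break; }
            if (!same_model) return &h;    // preferred: a dissimilar device
        }
        return candidates.empty() ? nullptr : &candidates[0];  // any host
    }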



It's an interesting idea, and by all means float it past Eric and the rest of the BOINC development team, but be prepared to consider the response that it might cause more problems than it solves.

Any volunteers to present it and promote it for BOINC team review?
ID: 1673329
Cavalary
Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1673676 - Posted: 3 May 2015, 23:57:49 UTC
Last modified: 3 May 2015, 23:58:47 UTC

I've actually been wondering about this whenever I checked my results and saw an inconclusive caused by a wingman's 30 spikes on a GPU that outputs loads of invalids of that type: what if the third result were the same? I don't recall seeing 30 autocorrs, but lately I do see a fair number of inconclusives due to 1-2 autocorrs (or 1-2 extra) on a GPU, though in those cases everything validates after the third result, since that seems to be the only difference.

So yes, Raistmer's idea seems very good. I'd say try to pair a GPU with a CPU if possible; if not (since GPUs can crunch so much more), at least pair different chip makers, and possibly different generations too. Oh, and probably different driver versions even before generations.
ID: 1673676
betreger
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1673678 - Posted: 4 May 2015, 0:22:24 UTC - in response to Message 1673676.  

Good ideas, but given the cost of the manpower and machinery I don't see it happening. Keep in mind this project is big science on a small budget.
ID: 1673678
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1673684 - Posted: 4 May 2015, 1:02:45 UTC - in response to Message 1673276.  

WU
result marked as Invalid for me (CUDA50), validated for the others (OpenCL ATI).
I came up with 4 triplets, they came up with 30 Autocorrelation counts.

Luckily 0.28 isn't a great loss, even with the poor pay rate of Credit New.

This might be the same problem I brought up over 15 months ago in Two wrongs make a right. At one point I had a list of about a dozen of these ATI rigs that were validating against each other with Autocorr counts of 30 that were clearly bogus, driving out legitimate results. Since there didn't seem to be any interest in fixing the problem, I gave up trying to track them. I think most of those have either been cleaned up or gone away by now, but new ones like these two have probably come along.
ID: 1673684
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1673686 - Posted: 4 May 2015, 1:28:29 UTC - in response to Message 1673684.  
Last modified: 4 May 2015, 1:28:41 UTC

This might be the same problem I brought up over 15 months ago in Two wrongs make a right.

I reckon it is; I remembered seeing a post about something similar but couldn't remember just what or when it was. And I didn't realise it was made so long ago; I thought it was only a few months ago.
Grant
Darwin NT
ID: 1673686
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673762 - Posted: 4 May 2015, 8:27:24 UTC - in response to Message 1673684.  
Last modified: 4 May 2015, 8:31:39 UTC

WU
result marked as Invalid for me (CUDA50), validated for the others (OpenCL ATI).
I came up with 4 triplets, they came up with 30 Autocorrelation counts.

Luckily 0.28 isn't a great loss, even with the poor pay rate of Credit New.

This might be the same problem I brought up over 15 months ago in Two wrongs make a right. At one point I had a list of about a dozen of these ATI rigs that were validating against each other with Autocorr counts of 30 that were clearly bogus, driving out legitimate results. Since there didn't seem to be any interest in fixing the problem, I gave up trying to track them. I think most of those have either been cleaned up or gone away by now, but new ones like these two have probably come along.


Actually, your highlighting of that issue, along with similar reports for iGPUs, ultimately led to the development of additional checks inside the app itself to prevent false results from ever reaching the validator. Internally we call this bunch of checks "sanity checks". Some sanity checks were implemented for both MultiBeam and AstroPulse, with the main constraints developed by Josef. So it was not useless.

The other side of the issue is that not all such results can be easily spotted and marked as invalid programmatically. A human can look at other hosts' results and decide from the whole set of data what is right and what is not; the application has to decide during the task's computation. Hence I asked people to spot any signs that could distinguish false positives of this kind from a regular overflow (the overflow itself can't be enough).
ID: 1673762
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1673825 - Posted: 4 May 2015, 15:27:20 UTC - in response to Message 1673762.  

WU
result marked as Invalid for me (CUDA50), validated for the others (OpenCL ATI).
I came up with 4 triplets, they came up with 30 Autocorrelation counts.

Luckily 0.28 isn't a great loss, even with the poor pay rate of Credit New.

This might be the same problem I brought up over 15 months ago in Two wrongs make a right. At one point I had a list of about a dozen of these ATI rigs that were validating against each other with Autocorr counts of 30 that were clearly bogus, driving out legitimate results. Since there didn't seem to be any interest in fixing the problem, I gave up trying to track them. I think most of those have either been cleaned up or gone away by now, but new ones like these two have probably come along.


Actually, your highlighting of that issue, along with similar reports for iGPUs, ultimately led to the development of additional checks inside the app itself to prevent false results from ever reaching the validator. Internally we call this bunch of checks "sanity checks". Some sanity checks were implemented for both MultiBeam and AstroPulse, with the main constraints developed by Josef. So it was not useless.

The other side of the issue is that not all such results can be easily spotted and marked as invalid programmatically. A human can look at other hosts' results and decide from the whole set of data what is right and what is not; the application has to decide during the task's computation. Hence I asked people to spot any signs that could distinguish false positives of this kind from a regular overflow (the overflow itself can't be enough).

Those sanity checks were completed at about rev 2421 last June, so in line with the Official Seti v7 binary vs. Optimized one thread it would be good to get the release builds updated past their current rev 1831. First step will have to be getting the Beta splitter configuration fixed so the pfb_splitters used there consistently produce WUs with the intended analysis_cfg parameters.

When I first saw this thread yesterday the task details had already been purged. If any one managed to capture those I'd be very interested in the signals as shown in the stderr sections of the OpenCL ATi tasks. Those would reveal whether the Autocorr sanity check would have been effective for this specific case. Assuming nobody saved those details, consider this a request to do so in future similar cases.
                                                                  Joe
ID: 1673825
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1673853 - Posted: 4 May 2015, 16:40:15 UTC - in response to Message 1673825.  

When I first saw this thread yesterday the task details had already been purged. If any one managed to capture those I'd be very interested in the signals as shown in the stderr sections of the OpenCL ATi tasks. Those would reveal whether the Autocorr sanity check would have been effective for this specific case. Assuming nobody saved those details, consider this a request to do so in future similar cases.
                                                                  Joe

Okay, I found my old list and quickly found a host, 6772486, that still appears to be doing its damage. The first example I see is WU 1776725339, which has two overflow tasks (29 Autocorr, 1 triplet) from consistently bad ATI rigs causing a non-overflow task (3 spikes, 4 pulses, 3 triplets) from a normally clean rig to be marked Invalid. If I take the time, I can probably find many more.
ID: 1673853
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1674017 - Posted: 5 May 2015, 3:10:38 UTC - in response to Message 1673853.  

When I first saw this thread yesterday the task details had already been purged. If any one managed to capture those I'd be very interested in the signals as shown in the stderr sections of the OpenCL ATi tasks. Those would reveal whether the Autocorr sanity check would have been effective for this specific case. Assuming nobody saved those details, consider this a request to do so in future similar cases.
                                                                  Joe

Okay, I found my old list and quickly found a host, 6772486, that still appears to be doing its damage. The first example I see is WU 1776725339, which has two overflow tasks (29 Autocorr, 1 triplet) from consistently bad ATI rigs causing a non-overflow task (3 spikes, 4 pulses, 3 triplets) from a normally clean rig to be marked Invalid. If I take the time, I can probably find many more.

Yes, hosts 6772486 and 7278254 are definitely producing false overflows on Autocorrs which would be caught by the added sanity check. Also host 6320677 which I found while looking at results from the other two.

The Autocorr sanity check for OpenCL builds after rev 2421 triggers on an overflow with one or more Autocorr peaks above 100; given that pair of conditions, the task will be errored out. It's done that way because I did spot one case where a single Autocorr peak well above 100 was found by reliable hosts on a non-overflow task, one running a CPU build IIRC.
                                                                  Joe
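
In code form, that rule reduces to something like the following sketch (the function name and signature are illustrative, not the actual post-rev-2421 source):

    // Returns false when an overflow result carries any Autocorr peak
    // above 100; the task would then be errored out instead of being
    // reported to the validator.
    bool autocorr_result_sane(bool overflow, const float* peaks, int n) {
        if (!overflow) return true;    // check applies to overflows only
        for (int i = 0; i < n; i++)
            if (peaks[i] > 100.0f)     // false-overflow signature
                return false;
        return true;
    }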
ID: 1674017
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1674018 - Posted: 5 May 2015, 3:21:25 UTC - in response to Message 1673676.  

Hi Cavalary,


So yes, Raistmer's idea seems very good, and I say try to pair GPU and CPU if possible, then if not and since GPUs can crunch so much more, at least different chipset makers, possibly different generations too. Oh, and likely different driver versions even before generations.


Bear in mind that in the case of Nvidia GPUs, if a user has a recent-model GPU and a much older one in their rig, they can only install the drivers for the new model, and cannot revert to an older driver.

So it may well be that NV users pretty much all have 3xx.xx series drivers installed.

What the score is for ATI GPUs I have no idea; I've never used one.
So for NV at least, there probably isn't much driver diversity.

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1674018
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1690897 - Posted: 13 Jun 2015, 16:57:08 UTC - in response to Message 1673825.  
Last modified: 13 Jun 2015, 16:58:21 UTC

Those sanity checks were completed at about rev 2421 last June, so in line with the Official Seti v7 binary vs. Optimized one thread it would be good to get the release builds updated past their current rev 1831. First step will have to be getting the Beta splitter configuration fixed so the pfb_splitters used there consistently produce WUs with the intended analysis_cfg parameters.
                                                                  Joe

Any progress in getting those sanity checks implemented?

The reason I ask is that I found this morning that I was on the losing end of yet another WU where two ATI rigs with Autocorr counts of 30 invalidated my task with 2 Spikes and 6 Triplets. In following the trail of those two rigs, I turned up 11 more (before I quit looking) which are happily validating against each other, often trashing what are likely good science results. At least 3 of those 11 are new machines that have been signed up in the last couple of months, and none of them are ones that were on the list I made early last year.

For what it's worth, the IDs are: 7456351, 7537898, 6889285, 6641343, 7453260, 7114470, 7010985, 7556063, 7504170, 7084572, 6759774, 7553275, 6722861.

Of course, while the sanity checks would be a big help, the real solution might be to figure out what is causing all these ATI rigs, and only ATI rigs, to produce these wacko (yet consistent) Autocorr results in the first place.
ID: 1690897
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1690947 - Posted: 13 Jun 2015, 18:58:13 UTC - in response to Message 1690897.  

Those sanity checks were completed at about rev 2421 last June, so in line with the Official Seti v7 binary vs. Optimized one thread it would be good to get the release builds updated past their current rev 1831. First step will have to be getting the Beta splitter configuration fixed so the pfb_splitters used there consistently produce WUs with the intended analysis_cfg parameters.
                                                                  Joe

Any progress in getting those sanity checks implemented?

The reason I ask is that I found this morning that I was on the losing end of yet another WU where two ATI rigs with Autocorr counts of 30 invalidated my task with 2 Spikes and 6 Triplets. In following the trail of those two rigs, I turned up 11 more (before I quit looking) which are happily validating against each other, often trashing what are likely good science results. At least 3 of those 11 are new machines that have been signed up in the last couple of months, and none of them are ones that were on the list I made early last year.

For what it's worth, the IDs are: 7456351, 7537898, 6889285, 6641343, 7453260, 7114470, 7010985, 7556063, 7504170, 7084572, 6759774, 7553275, 6722861.

Of course, while the sanity checks would be a big help, the real solution might be to figure out what could be causing all these ATI rigs, and only ATI rigs, to be producing these wacko (yet consistent) Autocorr results in the first place?

All 13 of the ATI hosts you found are using Catalyst 11.10 or 11.11 drivers which have AMD APP SDK 2.5 support. AMD APP SDK 2.6 or better (Catalyst 11.12+) is needed for the Windows SaHv7 OpenCL 7.03 builds to do proper Autocorr processing.

Those 7.03 builds are from rev 1831; builds from rev 1870 or later will error out all SaHv7 tasks on too-old drivers. That will drive the "Max tasks per day" for their SaHv7 OpenCL app versions down to 1, so even if the user never updates the drivers, the only remaining issue would be relatively few tasks being sent to the host and returned as errors. The current Windows 7.07 OpenCL ATI versions under test at Beta are from rev 2929.
                                                                  Joe
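
For illustration only, a sketch of how a build could query a device's driver version through the standard OpenCL API before deciding to error out; the minimum-version policy (and its parsing) is an assumption, not the actual rev 1870+ code:

    #include <CL/cl.h>
    #include <cstdio>

    // Fetch the vendor driver version string for one OpenCL device.
    // A real build would parse this against the minimum known-good
    // driver (Catalyst 11.12+/APP SDK 2.6 in the case discussed here)
    // and error its tasks out when the driver is too old.
    bool query_driver_version(cl_device_id dev, char* buf, size_t len) {
        if (clGetDeviceInfo(dev, CL_DRIVER_VERSION, len, buf, nullptr)
                != CL_SUCCESS)
            return false;
        std::printf("OpenCL driver version: %s\n", buf);
        return true;
    }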
ID: 1690947