concerns about CUDA and seti.....

nick
Volunteer tester
Joined: 22 Jul 05
Posts: 284
Credit: 3,902,174
RAC: 0
United States
Message 1064180 - Posted: 7 Jan 2011, 8:26:35 UTC

Hi, I have noticed that several of my pending work units look likely to error out and lose SETI data. My machine has not had an error before, but when its results are compared against those from a CUDA GPU, they do not match. In this case, http://setiathome.berkeley.edu/workunit.php?wuid=677997049, I think the work unit will error out and be tossed out. In this other case, from the same machine, the third run of the WU is being done by a CPU, http://setiathome.berkeley.edu/workunit.php?wuid=677765004, and I think it should match my machine's result.

Anyway, just some thoughts on how we are crunching...

Nick


ID: 1064180 · Report as offensive
Profile Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1064183 - Posted: 7 Jan 2011, 8:40:48 UTC

Yup. I have seen that too (because my RAC has dropped each day) on all of my rigs since the last outage.
Several tasks are waiting with the status "Completed, validation inconclusive", with 2, 3 or 4 results from different client software...

Helli
A loooong time ago: First Credits after SETI@home Restart
ID: 1064183 · Report as offensive
Profile Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1064188 - Posted: 7 Jan 2011, 9:36:28 UTC - in response to Message 1064180.  
Last modified: 7 Jan 2011, 9:47:38 UTC

Hi, I have noticed that several of my pending work units look likely to error out and lose SETI data. My machine has not had an error before, but when its results are compared against those from a CUDA GPU, they do not match. In this case, http://setiathome.berkeley.edu/workunit.php?wuid=677997049, I think the work unit will error out and be tossed out. In this other case, from the same machine, the third run of the WU is being done by a CPU, http://setiathome.berkeley.edu/workunit.php?wuid=677765004, and I think it should match my machine's result.

Anyway, just some thoughts on how we are crunching...

Nick


http://setiathome.berkeley.edu/workunit.php?wuid=677997049
http://setiathome.berkeley.edu/workunit.php?wuid=677765004

Looks like one idiot running old V12 optimized on a Fermi and 6.09 got false -9 overflows. It would need to go to a CPU, or maybe to somebody running x32f, which produces far fewer false -9s.

I've no idea if the correct result will be thrown out when it clashes with two -9's.

Bad luck - the errors are not on your side.

Edit: correction. two idiots with V12 on a Fermi
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1064188 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14682
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1064193 - Posted: 7 Jan 2011, 9:56:24 UTC - in response to Message 1064188.  

Edit: correction. two idiots with V12 on a Fermi

Both hosts (5305178, 5257703) are known offenders already on Joe Segur's list.
ID: 1064193 · Report as offensive
nick
Volunteer tester
Joined: 22 Jul 05
Posts: 284
Credit: 3,902,174
RAC: 0
United States
Message 1064196 - Posted: 7 Jan 2011, 10:16:53 UTC - in response to Message 1064188.  

So I just happened to get matched with a pair of misconfigured GPUs, and not all, or even most, do this?

Nick


ID: 1064196 · Report as offensive
Profile Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1064197 - Posted: 7 Jan 2011, 10:27:01 UTC - in response to Message 1064188.  

Looks like one idiot running old V12 optimized on a Fermi ...


Too much money - too little brains ;)

Statistics like this make me laugh - or cry. Not decided yet :/

Number of tasks completed 3665
Max tasks per day 100
Number of tasks today 224
Consecutive valid tasks 0

(Possibly this has been discussed a million times, but why is "Max tasks per day" not reduced to ZERO for such hosts?)
Petition against 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993
ID: 1064197 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1064198 - Posted: 7 Jan 2011, 10:31:05 UTC - in response to Message 1064197.  


(Possibly this has been discussed a million times, but why is "Max tasks per day" not reduced to ZERO for such hosts?)


Yes it was discussed.
It looks like the V12 binary, which is not Fermi-compatible, can still process tasks correctly at some ARs (angle ranges). That allows a small number of validations, so the quota system is ineffective at inhibiting such misconfigured hosts.
ID: 1064198 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14682
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1064202 - Posted: 7 Jan 2011, 10:44:25 UTC - in response to Message 1064196.  

So I just happened to get matched with a pair of misconfigured GPUs, and not all, or even most, do this?

Nick

Exactly so. The tasks affected will be re-checked by another computer, and in the vast majority of cases, the result from the mis-configured host will be thrown out.

Unfortunately, the bad hosts get through (and waste) a huge number of tasks every day. 5257703 has downloaded about 2,000 tasks (over 700 MB of data) in the last 24 hours. So you come across their droppings more often than you might expect.

Given that they are consuming a non-trivial proportion of a scarce resource (bandwidth), I wonder whether the time has come to suggest that the hosts Joe identified should be blacklisted by the project (until/unless the software installation is corrected, of course).
ID: 1064202 · Report as offensive
Profile Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1064203 - Posted: 7 Jan 2011, 10:45:17 UTC

We need a Server based Blacklist. LOL

hehe

Helli
A loooong time ago: First Credits after SETI@home Restart
ID: 1064203 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1064205 - Posted: 7 Jan 2011, 10:49:10 UTC - in response to Message 1064202.  


Given that they are consuming a non-trivial proportion of a scarce resource (bandwidth), I wonder whether the time has come to suggest that the hosts Joe identified should be blacklisted by the project (until/unless the software installation is corrected, of course).


Or to improve BOINC's quota system. It should protect from such cases too....
ID: 1064205 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1064206 - Posted: 7 Jan 2011, 10:49:56 UTC - in response to Message 1064197.  

Looks like one idiot running old V12 optimized on a Fermi ...


Too much money - too little brains ;)

Statistics like this make me laugh - or cry. Not decided yet :/

Number of tasks completed 3665
Max tasks per day 100
Number of tasks today 224
Consecutive valid tasks 0




The problem with looking at "Consecutive valid tasks" is that anyone who has an invalid task, for whatever reason, will have it set back to zero. A more accurate assessment would be to look at the percentage of valid tasks over a set time period or a specified quantity of work units. Unfortunately, this would probably be too server-intensive.


Kevin




ID: 1064206 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1064208 - Posted: 7 Jan 2011, 10:52:50 UTC - in response to Message 1064206.  

Looks like one idiot running old V12 optimized on a Fermi ...


Too much money - too little brains ;)

Statistics like this make me laugh - or cry. Not decided yet :/

Number of tasks completed 3665
Max tasks per day 100
Number of tasks today 224
Consecutive valid tasks 0




The problem with looking at "Consecutive valid tasks" is that anyone who has an invalid task, for whatever reason, will have it set back to zero. A more accurate assessment would be to look at the percentage of valid tasks over a set time period or a specified quantity of work units. Unfortunately, this would probably be too server-intensive.


Kevin


Not too server-intensive. To calculate such a proportion, only the number of failed tasks and the total number of tasks are needed. The number of tasks today is already kept; nothing prevents keeping the number of invalid tasks today too.
The big problem with a "daily" % is that task validation can happen much later than task reception.
So it is better to keep a "total" % for the host.
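
To make the bookkeeping concrete, here is a minimal sketch of a cumulative per-host validity percentage. The `HostStats` class and its field names are hypothetical and are not actual BOINC database columns; the only assumption is that the server can bump two counters at the moment a task is validated.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Hypothetical per-host counters; not actual BOINC database fields."""
    total_validated: int = 0  # tasks that have been through validation
    total_invalid: int = 0    # of those, how many were judged invalid

    def record(self, valid: bool) -> None:
        # Count the task when it is validated, not when it is sent out,
        # so a long delay before validation cannot skew the ratio.
        self.total_validated += 1
        if not valid:
            self.total_invalid += 1

    def valid_percent(self) -> float:
        if self.total_validated == 0:
            return 100.0  # assumption: a new host gets the benefit of the doubt
        good = self.total_validated - self.total_invalid
        return 100.0 * good / self.total_validated


host = HostStats()
for outcome in [True, False, False, False]:
    host.record(outcome)
print(f"{host.valid_percent():.1f}% valid")  # 25.0% valid
```

Two integer counters per host are all this needs, which is the point about server load: no per-day history has to be stored or scanned.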
ID: 1064208 · Report as offensive
Profile Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 1064209 - Posted: 7 Jan 2011, 10:56:13 UTC
Last modified: 7 Jan 2011, 10:58:02 UTC

Call me an extremist but...
Why not have: Consecutive valid tasks 0 = quota 1/day ?

If you get an occasional invalid it will build back up fast..
also It will encourage running with an eye to accuracy.

Just a thought...be gentle.. ;) *ducks*
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 1064209 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14682
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1064211 - Posted: 7 Jan 2011, 10:59:59 UTC - in response to Message 1064203.  

We need a Server based Blacklist. LOL

hehe

Helli

The facility exists: http://boinc.berkeley.edu/trac/wiki/BlackList (though there is some suggestion it may have been broken by the app_version breakdown). We'll never know unless the project decide it's enough of a problem to intervene.
ID: 1064211 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1064213 - Posted: 7 Jan 2011, 11:01:22 UTC - in response to Message 1064209.  
Last modified: 7 Jan 2011, 11:03:46 UTC

Call me an extremist but...
Why not have: Consecutive valid tasks 0 = quota 1/day ?

If you get an occasional invalid it will build back up fast..
also It will encourage running with an eye to accuracy.

Just a thought...be gentle.. ;) *ducks*


Well, it will harm project performance but will not solve the problem.
Occasional valids will build the quota up fast too.
Something that takes host history into account is needed...

EDIT: history is, for example, the total % of validations for the host. Hosts with a low % should be penalized for a long time even if the last few results are good ones.
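
A rough sketch of the difference between the two ideas follows. Both rules are assumptions made up for illustration, not BOINC's actual quota code: `streak_quota` drops to 1 on any invalid and doubles on each valid, while `history_quota` scales the quota by the host's overall valid fraction. `MAX_QUOTA` mirrors the "Max tasks per day 100" figure quoted earlier in the thread.

```python
MAX_QUOTA = 100  # mirrors the "Max tasks per day 100" figure quoted in the thread


def streak_quota(outcomes):
    """Assumed rule: drop to 1 on an invalid, double on each valid."""
    quota = MAX_QUOTA
    for valid in outcomes:
        quota = min(MAX_QUOTA, quota * 2) if valid else 1
    return quota


def history_quota(outcomes):
    """Assumed rule: scale the quota by the host's overall valid fraction."""
    if not outcomes:
        return MAX_QUOTA
    # Integer arithmetic avoids floating-point rounding surprises.
    return max(1, MAX_QUOTA * sum(outcomes) // len(outcomes))


# A host that fails 95 tasks, then happens to get 5 valid in a row.
bad_host = [False] * 95 + [True] * 5
# A healthy host whose most recent task happened to be invalid.
good_host = [True] * 99 + [False]

print(streak_quota(bad_host), history_quota(bad_host))    # 32 5
print(streak_quota(good_host), history_quota(good_host))  # 1 99
```

Under these assumed rules the streak-based quota recovers quickly for the bad host and punishes the good host for one slip, while the history-based quota does the opposite.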
ID: 1064213 · Report as offensive
Profile Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1064217 - Posted: 7 Jan 2011, 11:05:54 UTC - in response to Message 1064208.  

Not too server-intensive. To calculate such a proportion, only the number of failed tasks and the total number of tasks are needed. The number of tasks today is already kept; nothing prevents keeping the number of invalid tasks today too.
The big problem with a "daily" % is that task validation can happen much later than task reception.
So it is better to keep a "total" % for the host.


+1
Petition against 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993
ID: 1064217 · Report as offensive
Profile Dr Grey

Joined: 27 May 99
Posts: 154
Credit: 104,147,344
RAC: 21
United Kingdom
Message 1064299 - Posted: 7 Jan 2011, 16:40:08 UTC
Last modified: 7 Jan 2011, 16:41:30 UTC

I've got one here. Because my setup is new I keep checking for invalids. Today I found this one:

http://setiathome.berkeley.edu/workunit.php?wuid=677758481

The trouble is if you look at the record of the two other 'valid' computers they don't look so good. Who would you trust? My Mum used to tell me two wrongs don't make a right.
Not saying my machine is right but it doesn't look as wrong as the others.
ID: 1064299 · Report as offensive
Profile Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1064304 - Posted: 7 Jan 2011, 16:49:11 UTC - in response to Message 1064299.  

I've got one here. Because my setup is new I keep checking for invalids. Today I found this one:

http://setiathome.berkeley.edu/workunit.php?wuid=677758481

The trouble is if you look at the record of the two other 'valid' computers they don't look so good. Who would you trust? My Mum used to tell me two wrongs don't make a right.
Not saying my machine is right but it doesn't look as wrong as the others.


made link clickable.

Yours is the right result that sadly (and fatally for the science database) was thrown out because it paired with two black sheep running a faulty app (or rather an app not usable on Fermi).

bottom line - we are still getting false results inserted into the database because of Fermis set up the wrong way.
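
For anyone wondering how a correct result can lose, here is a simplified sketch of quorum-style validation. It is not the real SETI@home validator: `pick_canonical` is a hypothetical helper, the host labels and signal summaries are made up, and exact equality stands in for the tolerance-based comparison a real validator would use.

```python
from collections import Counter


def pick_canonical(results, quorum=2):
    """Return the summary reported by at least `quorum` hosts, else None.

    `results` maps a host label to a hashable summary of its reported
    signals. Exact equality is a simplification of real validation.
    """
    if not results:
        return None
    summary, votes = Counter(results.values()).most_common(1)[0]
    return summary if votes >= quorum else None


# Hypothetical work unit: two hosts report the same false overflow.
results = {
    "cpu_host": ("spikes=3", "pulses=1"),
    "fermi_v12_a": ("overflow", "spikes=30"),
    "fermi_v12_b": ("overflow", "spikes=30"),
}
canonical = pick_canonical(results)
rejected = [host for host, r in results.items() if r != canonical]
print(canonical)  # ('overflow', 'spikes=30')
print(rejected)   # ['cpu_host']
```

Agreement is the only thing being measured here, so two hosts that are wrong in the same way outvote one host that is right.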
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1064304 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1064308 - Posted: 7 Jan 2011, 16:59:46 UTC

I have an inconclusive. I have 30 spikes, he has 31 pulses. It's from my CPU and his Fermi with V12 apps. Both are -9 overflows. Wonder who will validate it? It's our buddy 5472266. Here is the link: http://setiathome.berkeley.edu/workunit.php?wuid=679452515

Old James
ID: 1064308 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14682
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1064315 - Posted: 7 Jan 2011, 17:06:13 UTC - in response to Message 1064308.  

I have an inconclusive. I have 30 spikes, he has 31 pulses. It's from my CPU and his Fermi with V12 apps. Both are -9 overflows. Wonder who will validate it? It's our buddy 5472266. Here is the link: http://setiathome.berkeley.edu/workunit.php?wuid=679452515

Stock Linux CPU app with a 1.9 day turnround - plenty of time to place your bets while we wait. I'll go for a second inconclusive and another resend ;-)
ID: 1064315 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.