Do not get credits etc

Message boards : Number crunching : Do not get credits etc
JAF
Joined: 9 Aug 00
Posts: 289
Credit: 168,721
RAC: 0
United States
Message 41776 - Posted: 31 Oct 2004, 21:14:07 UTC

I believe Ned is on the right track with his explanation. Since the project needs (and should support) the many processors and operating systems we have today, loosening up the validation might not be the best solution.

However, I wonder if the project could write and distribute a "seti stress test program" that one could run to check their computer. The program could cycle on "Seti like calculations" using canned data so the expected results would be consistent. A simple pass/fail message would suffice.

If a person starts getting "invalid results" they could run this program to see if their system is having problems.

Of course this wouldn't be 100 percent conclusive, since there could still be some intermittent problems that would be hard to catch.
ID: 41776 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 41787 - Posted: 31 Oct 2004, 21:55:44 UTC - in response to Message 41776.  

I think the project itself is the stress test.

The cool bit would be adding an "index of accuracy" of some sort, maybe as simple as saying "on average, 0.1% of all results get 0 credit; for your machine, 20% of all results get 0 credit."

> However, I wonder if the project could write and distribute a "seti stress
> test program" that one could run to check their computer. The program could
> cycle on "Seti like calculations" using canned data so the expected results
> would be consistent. A simple pass/fail message would suffice.

ID: 41787 · Report as offensive
Profile Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 41788 - Posted: 31 Oct 2004, 21:59:33 UTC - in response to Message 41776.  

> However, I wonder if the project could write and distribute a "seti stress
> test program" that one could run to check their computer. The program could
> cycle on "Seti like calculations" using canned data so the expected results
> would be consistent. A simple pass/fail message would suffice.


I believe there was a test unit back in beta.

ID: 41788 · Report as offensive
Profile The worm that turned
Volunteer tester
Joined: 15 May 99
Posts: 100
Credit: 4,872,533
RAC: 0
Australia
Message 41805 - Posted: 31 Oct 2004, 23:23:31 UTC
Last modified: 1 Nov 2004, 1:36:15 UTC

AAAAGGGGHH

This thread is driving me crazy
It's a Berkeley Hardware or software problem I tell you.





visit boinc@Australia

ID: 41805 · Report as offensive
JAF
Joined: 9 Aug 00
Posts: 289
Credit: 168,721
RAC: 0
United States
Message 41807 - Posted: 31 Oct 2004, 23:33:33 UTC - in response to Message 41787.  

> I think the project itself is the stress test.
>
Yes, the project is the stress test. But there are so many variables with different work units, processors, operating systems, and Internet connections, that it's hard for one to narrow down a potential problem with their system or the project.

What I'm thinking of is this: if one starts getting invalid results, one could run a floating-point test that performs "Seti like calculations" on a fixed data package (internal to the program and the same for all platforms). The program would crunch the data and compare the results against the expected values. Run it long enough to get the system temperatures up and see whether errors occur. If they do, that's an indication that you can't process Seti data on your system without error, and there is no need for project "detaches or resets" just to see if the WU's are the problem. If you get errors, post a message about your system and you will probably get a response from someone with the same setup who doesn't get errors running the test program.

I know there are a lot of system diagnostic/performance programs out there, like SiSoft Sandra, PassMark PerformanceTest, etc. But they are not consistent across processors and operating systems. I'm thinking of a much simpler test that duplicates Seti calculations, so that each and every system gives a pass/fail indication. That rules out the uncontrollable factors like WU's and Internet connections.
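
To make the idea concrete, here is a minimal sketch of such a pass/fail self-test. Everything in it is an assumption for illustration: the canned signal, the naive DFT standing in for "Seti like calculations", and the function names are all hypothetical, and none of it is the real science application. It only checks that one machine reproduces its own result bit for bit on every pass; a cross-platform version would more likely ship a reference result and compare within a tolerance.

```python
import math
import struct


def canned_signal(n=128):
    """Fixed input data, generated the same way on every run (hypothetical)."""
    return [math.sin(2 * math.pi * (5 * i + 0.01 * i * i) / n) for i in range(n)]


def power_spectrum(samples):
    """Naive DFT power spectrum: deliberately floating-point heavy."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        spectrum.append(re * re + im * im)
    return spectrum


def fingerprint(values):
    """Pack the doubles into bytes so the comparison is bit for bit."""
    return struct.pack(f"<{len(values)}d", *values)


def self_test(passes=5, n=128):
    """Repeat the same calculation and report the first pass that differs."""
    data = canned_signal(n)
    reference = fingerprint(power_spectrum(data))
    for run in range(1, passes + 1):
        if fingerprint(power_spectrum(data)) != reference:
            return f"FAIL on pass {run}"
    return "PASS"


print(self_test())
```

Raising `passes` keeps the CPU busy for longer, which is the "run it long enough to get the temperatures up" part; a marginal system would be expected to fail only occasionally, so a single PASS proves little.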
ID: 41807 · Report as offensive
Profile Sir Ulli
Volunteer tester
Joined: 21 Oct 99
Posts: 2246
Credit: 6,136,250
RAC: 0
Germany
Message 41809 - Posted: 31 Oct 2004, 23:39:47 UTC

I have no Probs with 3 Hosts here...

Greetings from Germany NRW
Ulli S@h Berkeley's Staff Friends Club m7 ©
ID: 41809 · Report as offensive
texasfit
Joined: 11 May 03
Posts: 223
Credit: 500,626
RAC: 0
United States
Message 41827 - Posted: 1 Nov 2004, 2:44:58 UTC - in response to Message 41787.  

> I think the project itself is the stress test.
>
> The cool bit would be adding an "index of accuracy" of some sort, maybe as
> simple as saying "on average, 0.1% of all results get 0 credit, for your
> machine, 20% of all results get 0 credit."
>
>

Ned
I agree with your analysis about certain unstable overclocked systems.

However, in the failed wu's that I have researched, there has always been a connection with a hardware or software problem at or close to that time period. Many of the people that have been getting these 0 credit errors do not have their systems overclocked.

I have personally only had 7 wu's show this 0 credit with an Invalid result, and that is across all three of my systems that are currently running Seti Boinc. All 7 of these I traced to a Berkeley hardware or software issue. I am sure that I could very easily have had many more, as some others did, but I usually caught the news and disabled network access or did whatever I felt necessary at that time.

Rachel is the perfect example of this scenario. If I recall, she is running a Dell 2.8 GHz system that her hubby bought for her. The wu's she was having problems with were from the last upload/download directory hard lock. Berkeley had to reboot that server and sync the directory. Rachel did not have any of these problems before this or after she reset the project.

I think that most of us have the science in mind as the main objective, but if some downloaded wu's cannot be returned with valid results, then they are of no value to the project or the seti participant.

I think the Boinc Team is doing an outstanding job with correcting the bugs and dealing with the hardware they have on hand at this time. Keep up the good work Boinc Team and we'll just keep on crunching those units for you.

ID: 41827 · Report as offensive
texasfit
Joined: 11 May 03
Posts: 223
Credit: 500,626
RAC: 0
United States
Message 41829 - Posted: 1 Nov 2004, 2:50:50 UTC - in response to Message 41729.  

> texasfit,
>
> Thanks for your summary of the old tech news briefs from that period. I
> suspect that the wu's in question fall somewhere within the server problems
> back then. The missing/lost credits are not a big deal. Out of curiosity,
> where did you pull those old tech notices from?
>
>

vvinyyc

The link for the news is on the Main Home Page below current news. Just hit the link at the bottom of current news called '....more'
ID: 41829 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 41834 - Posted: 1 Nov 2004, 4:03:11 UTC - in response to Message 41827.  
Last modified: 1 Nov 2004, 4:33:01 UTC

I've been reading Rachel's posts for a while and I can't really spot a correlation between system wide problems and her complaints over the past weeks.

Not picking on Rachel, just looking at the data.

A problem like this is either going to affect everyone equally, or it's going to be specific to certain machines (or users).

So, let's take something like a 30 day window. If the project receives a million work units back and 5000 of those get zero credit, then project-wide we're losing 0.5%.

If we do the same calculation for each host separately, that should come out somewhere around 0.5% as well. If we instead calculate a 20% loss, then it's time to take a good hard look at that computer.

That doesn't require test work units (that may not show the error), or a test long enough to spot one error in a trillion.
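
As a rough sketch of that arithmetic (not anything the project actually runs), the comparison could look like the following. The project-wide figures are the ones from the example above; the host names, the per-host counts, the factor of 10 and the minimum sample size are all made-up assumptions.

```python
def zero_credit_rate(zero_credit, total):
    """Fraction of returned results that received zero credit."""
    return zero_credit / total if total else 0.0


def flag_hosts(host_counts, project_rate, factor=10.0, min_results=20):
    """Return hosts whose zero-credit rate is far above the project-wide rate.

    host_counts maps host -> (zero_credit_results, total_results).
    Hosts with fewer than min_results results are skipped: too little data.
    """
    flagged = []
    for host, (zero, total) in sorted(host_counts.items()):
        if total >= min_results and zero_credit_rate(zero, total) > factor * project_rate:
            flagged.append(host)
    return flagged


# Project-wide figures from the example: 5000 of a million get zero credit.
project_rate = zero_credit_rate(5000, 1_000_000)

# Hypothetical per-host counts over the same 30-day window.
hosts = {"host_a": (1, 200), "host_b": (40, 200), "host_c": (2, 5)}

print(f"project-wide: {project_rate:.1%}")
print("take a hard look at:", flag_hosts(hosts, project_rate))
```

Here host_a sits right at the project average, host_b is at the 20% level described above, and host_c has too few results to say anything either way.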

> I agree with your analysis about certain unstable overclocked systems.
>
> From the failed wu's that I have researched however, there has always been a
> connection with either a hardware / software related problem at or close to
> that time period. Many of the people that have been getting these 0 credit
> errors do not have their systems overclocked.

ID: 41834 · Report as offensive
texasfit
Joined: 11 May 03
Posts: 223
Credit: 500,626
RAC: 0
United States
Message 41840 - Posted: 1 Nov 2004, 4:48:32 UTC
Last modified: 1 Nov 2004, 4:57:30 UTC

I am still not convinced it is a computer specific issue. Even though I agree with your statement:

>
>If we do the same calculation for each host seperately, that should come out
>somewhere around 0.5% as well. If we instead calculate a 20% loss, then it's
>time to take a good hard look at that computer.
>

Every computer downloads a different quantity of wu's on different days, which would make it necessary to base your per-host calculation on a time-specific variable. My P4-HT computers will download about 20 new wu's every day or two vs. only 3 every day or two for your systems. This would mean that if I downloaded or uploaded during a problem period, I would have almost seven times as many units exposed (put the other way, you would have 85% fewer).

I checked some of your wu's that showed errors, and all the units I checked with errors on your computers were related to hardware/software problems on those days. An example of yours would be the invalid results returned on 10/12-14, which was during the time of these news items:
October 9, 2004
A problem with file uploads has existed for the last few days.
October 12, 2004 - 21:00 UTC
We just had an unexpected crash of the new replica database while we were trying to format it. Bad disks? Bad cables? Who knows? But we're going to abandon the replica database project for now.
October 12, 2004
We just had (about 21:00 UTC) an unexpected outage due to a disk crash. Everything is back up now.
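
One way to work the host calculation on a time-specific basis is to split a host's results into those returned near a known outage and those returned at other times. The sketch below is only an illustration: the two outage dates come from the news items above, while the two-day slack window, the function names and the sample result log are assumptions.

```python
from datetime import date, timedelta

# Outage dates taken from the news items quoted above.
OUTAGE_DATES = [date(2004, 10, 9), date(2004, 10, 12)]


def near_outage(day, slack_days=2):
    """True if day falls on, or up to slack_days after, a known outage date."""
    return any(
        timedelta(0) <= day - outage <= timedelta(days=slack_days)
        for outage in OUTAGE_DATES
    )


def split_invalids(results):
    """results: iterable of (date, is_invalid) pairs for one host.

    Returns [invalid_count, total_count] for each of the two buckets.
    """
    counts = {"outage": [0, 0], "normal": [0, 0]}
    for day, invalid in results:
        bucket = counts["outage" if near_outage(day) else "normal"]
        bucket[0] += int(invalid)
        bucket[1] += 1
    return counts


# Hypothetical result log for a single host.
sample = [
    (date(2004, 10, 5), False),
    (date(2004, 10, 12), True),
    (date(2004, 10, 13), True),
    (date(2004, 10, 14), True),
    (date(2004, 10, 20), False),
    (date(2004, 10, 21), True),
]

print(split_invalids(sample))
```

If nearly all of a host's invalid results land in the "outage" bucket, that points at the server side; invalid results spread evenly across both buckets would point back at the host.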


Your turn :-)

ID: 41840 · Report as offensive
Profile Rachel
Joined: 13 Apr 02
Posts: 978
Credit: 449,704
RAC: 0
United Kingdom
Message 41858 - Posted: 1 Nov 2004, 6:20:54 UTC - in response to Message 41840.  
Last modified: 1 Nov 2004, 6:35:29 UTC

I think it was a bad batch of wu's. Everything is working great now, thanks.

Rach
......In Space No One Can Hear You Scream......



ID: 41858 · Report as offensive
Tony Martin

Joined: 5 Dec 99
Posts: 91
Credit: 69,723
RAC: 0
United States
Message 41869 - Posted: 1 Nov 2004, 7:48:33 UTC

I don't agree. Look at this WU: http://setiweb.ssl.berkeley.edu/workunit.php?wuid=2998766
Now tell me why my machine got 0 credit and was invalid while the other 2 were valid and got credit? My computer worked 16 hours to complete it and report it back to seti, and all I got was 0 credit. The WU completed with no errors that I can see, yet it was marked invalid. Why??????

ID: 41869 · Report as offensive
Profile Rachel
Joined: 13 Apr 02
Posts: 978
Credit: 449,704
RAC: 0
United Kingdom
Message 41905 - Posted: 1 Nov 2004, 12:36:48 UTC - in response to Message 41869.  

> I don't agree look at this WU
> http://setiweb.ssl.berkeley.edu/workunit.php?wuid=2998766
> now tell me why my machine got 0 credit and was invalid and the other 2 were
> valid and got credit? My computer worked 16 hours to complete it and report it
> back to seti and all I got was 0 credit. The Wu completed with no errors that
> I can see yet it was marked invalid. Why??????
>
>
Because it did not match the other 2 results? I had many like that too.
......In Space No One Can Hear You Scream......



ID: 41905 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 41973 - Posted: 1 Nov 2004, 18:12:26 UTC - in response to Message 41869.  

> I don't agree look at this WU
> http://setiweb.ssl.berkeley.edu/workunit.php?wuid=2998766
> now tell me why my machine got 0 credit and was invalid and the other 2 were
> valid and got credit? My computer worked 16 hours to complete it and report it
> back to seti and all I got was 0 credit. The Wu completed with no errors that
> I can see yet it was marked invalid. Why??????

Tony,

Your Result was marked as "Invalid": it failed to match the Canonical Result (the "ideal" or "example" result for this WU) and therefore earned no credit. Since there is no specific error for this WU in the error text, all we can do is guess at the exact reason for the failure.

Worm,

> AAAAGGGGHH
>
> This thread is driving me crazy
> It's a Berkeley Hardware or software problem I tell you.

And it may very well be a problem on the back end. On the other hand, it may not be an error anywhere. One of the nicer parts of SETI@Home on BOINC is that we have visibility like never before. In the old days you had no idea if what you were doing was of any value at all. Yes, you did get a "bean" for each WU processed, but you had no idea what your contribution to the science really was. We have that now, but the freedom and information do come with their own price...
ID: 41973 · Report as offensive
Profile The worm that turned
Volunteer tester
Joined: 15 May 99
Posts: 100
Credit: 4,872,533
RAC: 0
Australia
Message 42011 - Posted: 1 Nov 2004, 20:20:55 UTC

Paul,
All I'm saying is don't get everyone worried that their computer systems are faulty when all the evidence is pointing to a Berkeley hardware failure of some kind.
Some of the earlier posts in this thread were plainly going along that track.


visit boinc@Australia

ID: 42011 · Report as offensive
Profile Mr Slick
Joined: 18 May 01
Posts: 7
Credit: 113,202
RAC: 0
United States
Message 42019 - Posted: 1 Nov 2004, 20:45:53 UTC

I seem to lose THOUSANDS of credits
look here
http://setiweb.ssl.berkeley.edu/results.php?hostid=205192&offset=60

I get this message on one computer..

SETI@home - 2004-11-01 13:59:18 - Scheduler RPC to http://setiboincdata.ssl.berkeley.edu/sah_cgi/cgi failed

But on another computer I see...

SETI@home - 2004-11-01 13:59:26 - Scheduler RPC to http://setiboincdata.ssl.berkeley.edu/sah_cgi/cgi succeeded

I have many data units completed between 3 computers, but for a week now, no connections. All 3 computers have version 4.13; one works fine, two do not connect right now. I have tried to swap the program folder to the computer that I know connects, and the same problem exists; now the one that did connect won't upload, so I know it is not the individual computer.

I had the same problem with version 4.09. When the computers finally upload the files (sometimes after several weeks), I get no credit for the 'old' files.
I have no firewall issues or connectivity problems, and usually this only seems to happen to one computer at a time.

I know there is nothing I have done wrong here, but it is frustrating when I lose thousands of credits.
ID: 42019 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 42173 - Posted: 2 Nov 2004, 13:37:47 UTC - in response to Message 42011.  

> Paul,
> All i'm saying is don't get everyone worried that their computer systems
> are faulty, when all the evidence is pointing to a Berkeley hardware failure
> of some kind.
> Some of the earlier posts in this thread were plainly going along that track.

I did not think that I was doing that. But I concede the point that I may have given that impression. All I am saying is that there are many reasons for the 0 credit WU/Result, so pointing to only one possible cause does not cover the possibilities. Possible causes include (but are not limited to):

1) Errors in the back-end software (validator, assimilator, etc.)
2) Errors in the BOINC Software
3) Errors in the SETI@Home science application
4) Errors caused by differences in compilers used for the various OS
5) Errors caused by client side hardware problems (flaky memory, overclocking, etc.)
6) Errors caused by differences in the Floating Point hardware (FPU)
7) Errors caused by faulty uploads and downloads
8) Errors caused by back end hardware failures
9) Errors caused by magic and karma
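
Items 4 and 6 are easy to underestimate. As a small, self-contained illustration (not taken from the science application), the same three numbers added in a different order give different bits, which is the kind of difference a compiler or FPU can introduce without any fault being present:

```python
import math

values = [0.1, 0.2, 0.3]

# Same numbers, two evaluation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# A bit-for-bit comparison sees two different answers...
print(left_to_right == right_to_left)

# ...while a comparison with a tolerance treats them as the same result.
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-9))
```

The first line prints False and the second prints True. Any scheme that compares results from different platforms presumably has to allow for this sort of tolerance, and the choice of tolerance decides which borderline results count as a match.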

To seize upon only one of these possibilities as the source of all problems is not, um, good.

Many of the Results that I have looked at are marked as "Invalid", and that could be caused by any one of the possibilities I listed above. All we know is that the result is marked as invalid; we do not know why. The simple explanation is that this result (when so marked) did not match the others and, in the voting scheme used, it loses. Note that this does not mean that the work unit is bad, or even that the singleton result is not the correct one. This is why a project may do a "World Series" type of compare by running the same WU more than thrice and demanding a much higher correlation between Results before they will consider that they have the proper answer.
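
The voting idea can be sketched in a few lines. This is a simplified illustration and not the actual BOINC validator: the function names, the tolerance, the quorum of 2 and the sample numbers are all assumptions.

```python
import math


def agrees(a, b, rel_tol=1e-5):
    """Two results agree if every value matches within the tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def validate(results, quorum=2):
    """results maps host -> list of floats returned for one WU.

    Returns (canonical_host, {host: is_valid}); (None, {}) if no quorum yet.
    """
    hosts = list(results)
    for candidate in hosts:
        supporters = [h for h in hosts if agrees(results[candidate], results[h])]
        if len(supporters) >= quorum:
            # The candidate becomes canonical; the odd ones out are invalid.
            return candidate, {h: h in supporters for h in hosts}
    return None, {}


# Hypothetical results for one WU from three hosts.
wu_results = {
    "host_1": [1.0, 2.5],
    "host_2": [1.0000001, 2.5],
    "host_3": [1.0, 9.9],
}

canonical, verdicts = validate(wu_results)
print(canonical, verdicts)
```

Note that host_3 is marked invalid only because it was outvoted. Nothing in the vote says which machine was actually right, and raising `quorum` is the "World Series" option: more copies must agree before an answer is accepted.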

As you have noted, one of the reasons for errors is the hardware failures we have seen on the back end systems. But the highest probability of problems occurring lies in the domain of the Participant's computer, because it is not as controlled as the other aspects. Many participants have low-cost machines, others wanting higher performance may overclock them, and so forth ...

ID: 42173 · Report as offensive
Profile The worm that turned
Volunteer tester
Joined: 15 May 99
Posts: 100
Credit: 4,872,533
RAC: 0
Australia
Message 42315 - Posted: 2 Nov 2004, 23:41:28 UTC

Paul, I wasn't referring to you in regard to getting people worried about their computer systems failing. I said some of the earlier posts.
I do find it hard to understand, though, why you can't accept that in most cases the current problem with zero credits is a Berkeley hardware or software error.
This problem started to appear quickly on crunchers using different systems all around the world. I know we will always find the odd instance of zero credit happening to any of us, but this was something else.
Logic tells me it can't be individual computer errors and thus we should look to Berkeley first.
The problem (whatever it was/is) seems to be feeding out of the system now and I expect people to see instances of zero credit falling to normal levels.

PS By the way Paul, you're a legend down here in Oz and a credit to the forum
don't ever stop


visit boinc@Australia

ID: 42315 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.