You might want to check this one Again...


Message boards : Number crunching : You might want to check this one Again...

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1198
Credit: 44,389,616
RAC: 114,731
United States
Message 1353938 - Posted: 6 Apr 2013, 1:13:42 UTC

I have serious doubts about the results of this task...

Workunit 1199573476

Task        Computer  Sent                       Time reported              Status                        Run time (sec)  CPU time (sec)  Credit  Application
2899037611  4202271   30 Mar 2013, 10:58:09 UTC  30 Mar 2013, 13:12:55 UTC  Completed and validated       15.30           11.98           2.39    SETI@home Enhanced Anonymous platform (NVIDIA GPU)
2899037612  6864181   30 Mar 2013, 10:58:15 UTC  31 Mar 2013, 2:05:58 UTC   Completed, marked as invalid  1,415.40        100.99          0.00    SETI@home Enhanced Anonymous platform (NVIDIA GPU)
2900744232  6829067   31 Mar 2013, 5:49:03 UTC   4 Apr 2013, 14:03:02 UTC   Completed, marked as invalid  10,004.00       9,842.60        0.00    SETI@home Enhanced v6.03
2904766778  6680152   4 Apr 2013, 19:45:26 UTC   5 Apr 2013, 5:25:27 UTC    Completed and validated       49.10           12.54           2.39    SETI@home Enhanced Anonymous platform (NVIDIA GPU)


The two validating computers don't have a very good history.

betreger
Project donor
Joined: 29 Jun 99
Posts: 2210
Credit: 4,635,663
RAC: 9,778
United States
Message 1353942 - Posted: 6 Apr 2013, 1:28:51 UTC - in response to Message 1353938.

I wonder why people are so sloppy as to run machines like that.
That of course does not say that your specific result was not bad, but they are producing a lot of garbage.
____________

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1198
Credit: 44,389,616
RAC: 114,731
United States
Message 1353943 - Posted: 6 Apr 2013, 1:32:31 UTC - in response to Message 1353942.
Last modified: 6 Apr 2013, 1:36:02 UTC

I wonder why people are so sloppy as to run machines like that.
That of course does not say that your specific result was not bad, but they are producing a lot of garbage.

My results match the CPU results. Even the host with the CPU results has problems with his GPU, but I kinda trust his CPU.

Wiggo
Joined: 24 Jan 00
Posts: 6712
Credit: 92,349,438
RAC: 73,663
Australia
Message 1353944 - Posted: 6 Apr 2013, 1:33:22 UTC - in response to Message 1353938.

Ah yes, that pair. This was bound to happen sooner or later (just like the old v12 days all over again).

I've sent these 2 guys so many PMs over the last year about their unstable rigs that it's far from a joke now. :-(

Cheers.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,575,272
RAC: 48,408
Australia
Message 1353948 - Posted: 6 Apr 2013, 1:40:29 UTC - in response to Message 1353944.

I've sent these 2 guys so many PMs over the last year about their unstable rigs that it's far from a joke now. :-(

The problem is most people don't come to the forums, and many of those that do probably aren't even aware of PMs.


The community preferences have an option to be notified of PMs at the user's email address (either for each PM or in one daily email).

It might be worth the team considering changing the default to Notify by email, and even changing all the current settings to that. Send a PM to each person (over several days) advising them of the change, and how to change it back.
It would then allow people to send PMs to those with problem systems, and they would at least be notified they have a message even if they choose not to look at it.

The only other option is for a Mod or Admin that is able to view email addresses to send or forward messages to each problem user individually.
____________
Grant
Darwin NT.

Zapped Sparky
Project donor
Volunteer moderator
Volunteer tester
Joined: 30 Aug 08
Posts: 7139
Credit: 1,225,942
RAC: 1,220
United Kingdom
Message 1353956 - Posted: 6 Apr 2013, 1:54:56 UTC - in response to Message 1353948.
Last modified: 6 Apr 2013, 1:59:30 UTC

I've sent these 2 guys so many PMs over the last year about their unstable rigs that it's far from a joke now. :-(

The problem is most people don't come to the forums, and many of those that do probably aren't even aware of PMs.


The community preferences have an option to be notified of PMs at the user's email address (either for each PM or in one daily email).

It might be worth the team considering changing the default to Notify by email, and even changing all the current settings to that. Send a PM to each person (over several days) advising them of the change, and how to change it back.
It would then allow people to send PMs to those with problem systems, and they would at least be notified they have a message even if they choose not to look at it.

The only other option is for a Mod or Admin that is able to view email addresses to send or forward messages to each problem user individually.

No one here can see any e-mail address. You'll have to send a PM and hope it's someone who watches such things, takes your advice and then corrects it. Although experience says at least nine times out of ten it'll be ignored, sadly.

[EDIT]Discussion of Invalid Host Messaging thread bumped[/EDIT]
____________
In an alternate universe, it was a ZX81 that asked for clothes, boots and motorcycle.

Client error 418: I'm a teapot

Tropical Goldfish Fish 15: Squeaky bras 'R us

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,575,272
RAC: 48,408
Australia
Message 1353969 - Posted: 6 Apr 2013, 3:06:21 UTC - in response to Message 1353956.

The only other option is for a Mod or Admin that is able to view email addresses to send or forward messages to each problem user individually.

No one here can see any e-mail address. You'll have to send a PM and hope it's someone who watches such things, takes your advice and then corrects it.

So only the forum admin can access that information.
They could set it so all mods are able to, or they could set it so one particular mod can do so, or they'd be the one to pass on such messages.
Otherwise things stay as they are: systems pumping out rubbish continuously because their owners never check on them.
____________
Grant
Darwin NT.

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 71,189,217
RAC: 82,203
Argentina
Message 1353975 - Posted: 6 Apr 2013, 3:35:01 UTC - in response to Message 1353969.

But even if someone were able to send them an email or any other kind of message, there is no guarantee that they will fix it...
It has been discussed many times that the way BOINC handles errors is too permissive, and that the real fix is to change that so it can effectively cut down the number of tasks sent to those hosts...
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,575,272
RAC: 48,408
Australia
Message 1353979 - Posted: 6 Apr 2013, 3:49:48 UTC - in response to Message 1353975.
Last modified: 6 Apr 2013, 3:50:05 UTC

But even if someone were able to send them an email or any other kind of message, there is no guarantee that they will fix it...

Nope.
But no one will fix something if they don't know it's broken. If they know, then there's a chance.
____________
Grant
Darwin NT.

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 71,189,217
RAC: 82,203
Argentina
Message 1353981 - Posted: 6 Apr 2013, 4:07:25 UTC - in response to Message 1353979.

But no one will fix something if they don't know it's broken. If they know, then there's a chance.

That's true, but I think that they (BOINC/Projects) don't want people freaking out with paranoia about their personal data being made available...
____________

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1198
Credit: 44,389,616
RAC: 114,731
United States
Message 1353983 - Posted: 6 Apr 2013, 4:12:23 UTC
Last modified: 6 Apr 2013, 4:18:31 UTC

Something that would work in this instance would be a simple script that precludes sending a tie-breaker to Hosts with over a set number of invalids. That might prevent the broken clock from being correct twice a day...
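
A filter along those lines could be sketched roughly as below. This is only an illustration in Python, not anything taken from the BOINC scheduler: the `Host` record, its `invalid_count` field, the function names and the threshold of 50 are all made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical threshold: hosts with more invalids than this get no tie-breakers.
MAX_INVALIDS = 50


@dataclass
class Host:
    host_id: int        # illustrative ID, not a real host from this thread
    invalid_count: int  # assumed per-host count of invalid results


def eligible_for_tiebreaker(host: Host, max_invalids: int = MAX_INVALIDS) -> bool:
    """A host is eligible unless it is over the set number of invalids."""
    return host.invalid_count <= max_invalids


def pick_tiebreaker_host(candidates: Iterable[Host],
                         max_invalids: int = MAX_INVALIDS) -> Optional[Host]:
    """Return the first eligible candidate, or None if every candidate is excluded."""
    for host in candidates:
        if eligible_for_tiebreaker(host, max_invalids):
            return host
    return None
```

With candidates `[Host(1, 900), Host(2, 3)]` the first host is skipped and the second one gets the tie-breaker. If every candidate is over the limit, the function returns `None` and the task would simply have to wait for another host.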

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2245
Credit: 8,555,906
RAC: 4,290
United States
Message 1353984 - Posted: 6 Apr 2013, 4:45:44 UTC

I haven't had many "error" WUs myself over the years (I think I've had maybe 10 in total?). I have noticed that the "maximum per day" does reset back to 100 if you were anywhere over 100, and it is supposed to cut in half for every consecutive error, down to 1.

By those rules.. if you were at say.. 1500/day and one became an error, you are down to 100. The next one is valid, so you are at 101. Next one is an error, and you're at 100 again, etc.

I'm thinking that should be reduced to something smaller to keep runaway machines from going rampant. Something like 10 or 25 should do. If your machine does good work and just had one bad WU, then you won't have a problem rebuilding back up to a decent number again. If your machine is a runaway, then it won't have a detrimental effect on the overall science.
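
To make the arithmetic concrete, here is a small Python sketch of the rules as described in this post, which may not match what the server really does. The function names and the `reset_cap` parameter are invented for the example; `reset_cap=100` stands for the behaviour described above and 25 is one of the smaller values suggested.

```python
from typing import List


def after_error(quota: int, reset_cap: int = 100) -> int:
    """Quota after an error: reset to the cap if above it, otherwise halve (minimum 1)."""
    if quota > reset_cap:
        return reset_cap
    return max(quota // 2, 1)


def after_valid(quota: int) -> int:
    """Quota after a valid result: one more task per day."""
    return quota + 1


def simulate(start: int, events: List[str], reset_cap: int = 100) -> List[int]:
    """Apply a sequence of "error"/"valid" events and return the quota after each one."""
    quota = start
    history = []
    for event in events:
        if event == "error":
            quota = after_error(quota, reset_cap)
        elif event == "valid":
            quota = after_valid(quota)
        else:
            raise ValueError(f"unknown event: {event}")
        history.append(quota)
    return history
```

`simulate(1500, ["error", "valid", "error"])` gives `[100, 101, 100]`, which is the sequence in the example above. With `reset_cap=25` the same events give `[25, 26, 25]`, so a host that keeps alternating never climbs far above the smaller cap.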
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1682
Credit: 204,141,248
RAC: 24,888
Australia
Message 1353987 - Posted: 6 Apr 2013, 5:23:39 UTC

This problem runs all the way up to the top rigs. 6656656 which is currently number 8 on the RAC scoreboard has a crook GTX580 that has been producing bad results for months. I've PM'ed the owner a couple of times but nothing has been done about it.

T.A.

Wiggo
Joined: 24 Jan 00
Posts: 6712
Credit: 92,349,438
RAC: 73,663
Australia
Message 1353991 - Posted: 6 Apr 2013, 5:48:03 UTC - in response to Message 1353987.

This problem runs all the way up to the top rigs. 6656656 which is currently number 8 on the RAC scoreboard has a crook GTX580 that has been producing bad results for months. I've PM'ed the owner a couple of times but nothing has been done about it.

T.A.

Yes, that is another one, but at least it's not as bad as it used to be.

Cheers.

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 71,189,217
RAC: 82,203
Argentina
Message 1353992 - Posted: 6 Apr 2013, 5:48:24 UTC - in response to Message 1353984.

The way it works is more complex than that. It takes into account the basic quota, the success or error outcome (how the task was reported: finished normally or as a computation error) and the validation outcome (after comparing the different results from wingmen).

Given this project's setting of 100 for daily_result_quota:
- If an error is reported the "Max tasks per day" is reduced to less than the basic 100 quota, 99 if the host was previously OK or subtract one if it was already below.
- If a "success" is reported and the host was below the basic quota, "Max tasks per day" is doubled but capped at 100.
- A task judged valid increases "Max tasks per day" by one.
- A task judged invalid reduces "Max tasks per day" by one, but only if it was above the basic quota.
(quoted from a post by Josef Segur in this thread)
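
The four rules quoted above can be written down as a small Python function. This is only a sketch of the rules as stated in the quote, not the actual BOINC server code: the function name, the event labels and the floor of 1 task per day are assumptions made for the example.

```python
BASE_QUOTA = 100  # this project's daily_result_quota, per the quote above


def update_quota(quota: int, event: str, base: int = BASE_QUOTA) -> int:
    """Return the new "Max tasks per day" after one event.

    event is one of "error", "success", "valid" or "invalid"
    (labels chosen for this sketch).
    """
    if event == "error":
        # 99 if the host was at or above the basic quota, else subtract one.
        new_quota = base - 1 if quota >= base else quota - 1
        return max(new_quota, 1)  # assumed floor of 1; not stated in the quote
    if event == "success":
        # Doubling only applies below the basic quota, and is capped at it.
        return min(quota * 2, base) if quota < base else quota
    if event == "valid":
        return quota + 1
    if event == "invalid":
        # Only hosts above the basic quota lose one.
        return quota - 1 if quota > base else quota
    raise ValueError(f"unknown event: {event}")
```

Under these rules a host sitting at 1,500 drops straight to 99 on a reported error, but only to 1,499 on a task judged invalid, and every valid task adds one back.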

Some of my hosts currently have a daily quota of more than a thousand. If one of them starts to fail only at validation, it will take a long time for the quota to come down, especially because it will have a lot of earlier tasks that are going to pass validation (raising the quota), while the invalids will need a third wingman and will take longer to get the invalid mark...
As it is, it works for catching serious hardware issues, but not for effectively throttling subtle errors...
____________

andybutt
Project donor
Volunteer tester
Joined: 18 Mar 03
Posts: 251
Credit: 112,695,401
RAC: 99,188
United Kingdom
Message 1353997 - Posted: 6 Apr 2013, 5:59:23 UTC - in response to Message 1353987.

TA
Sorry, but I have only received the one PM this morning. I know the card is playing up a little and have been playing around with it to see if it was something I'd changed, but to no avail. Just off to the computer store to get a new card, so it should be replaced today.

Andy
____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1682
Credit: 204,141,248
RAC: 24,888
Australia
Message 1354005 - Posted: 6 Apr 2013, 6:51:08 UTC - in response to Message 1353997.
Last modified: 6 Apr 2013, 6:52:04 UTC

TA
Sorry, but I have only received the one PM this morning. I know the card is playing up a little and have been playing around with it to see if it was something I'd changed, but to no avail. Just off to the computer store to get a new card, so it should be replaced today.

Andy

Thanks Andy. I've seen that rig with over 1000 invalid tasks and 1500 inconclusives which was a worry.

The only gripe I have is that now you're going to get further ahead of me :)

T.A.

andybutt
Project donor
Volunteer tester
Joined: 18 Mar 03
Posts: 251
Credit: 112,695,401
RAC: 99,188
United Kingdom
Message 1354039 - Posted: 6 Apr 2013, 9:01:34 UTC - in response to Message 1354005.

TA
I don't remember being anywhere near that high! Just ordered two more 690s, should be here Monday.

Andy
____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1682
Credit: 204,141,248
RAC: 24,888
Australia
Message 1354049 - Posted: 6 Apr 2013, 9:24:06 UTC - in response to Message 1354039.

It was a few months ago, before the last couple of extended outages.

T.A.

Floyd
Joined: 19 May 11
Posts: 524
Credit: 1,870,625
RAC: 0
United States
Message 1354142 - Posted: 6 Apr 2013, 17:57:52 UTC - in response to Message 1353984.

I haven't had many "error" WUs myself over the years (I think I've had maybe 10 in total?). I have noticed that the "maximum per day" does reset back to 100 if you were anywhere over 100, and it is supposed to cut in half for every consecutive error, down to 1.

By those rules.. if you were at say.. 1500/day and one became an error, you are down to 100. The next one is valid, so you are at 101. Next one is an error, and you're at 100 again, etc.

I'm thinking that should be reduced to something smaller to keep runaway machines from going rampant. Something like 10 or 25 should do. If your machine does good work and just had one bad WU, then you won't have a problem rebuilding back up to a decent number again. If your machine is a runaway, then it won't have a detrimental effect on the overall science.


Well, there has been an unexplained issue with tasks being abandoned, by BOINC or the SETI servers, that show up as errors. That shouldn't affect our max-per-day quota, as it doesn't seem to be caused by anything our machines have done.
As shown in the abandoned tasks thread:

http://setiathome.berkeley.edu/forum_thread.php?id=70946
____________


Copyright © 2014 University of California