This is not fair

Message boards : Number crunching : This is not fair

Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1382939 - Posted: 20 Jun 2013, 6:09:22 UTC

Today I had 3 invalid AP tasks on my top rig (ID: 6716400).

All 3 tasks show as "Completed, can't validate" for my rig.

As far as I could see, all the wingmates were ATI GPUs.

Why didn't the server send a copy to a different GPU type, such as Nvidia, instead of trashing the WU with "Too many errors (may have bug)"?

Tim

ID: 1382939 · Report as offensive
Profile Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 10534
Credit: 135,466,190
RAC: 41,163
Australia
Message 1382944 - Posted: 20 Jun 2013, 6:35:49 UTC - in response to Message 1382939.  

Doesn't it just P&%# ya off when that happens?

Sadly I've been there far too many times, but what can 1 person do?

Cheers.
ID: 1382944 · Report as offensive
Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1382946 - Posted: 20 Jun 2013, 6:45:16 UTC - in response to Message 1382944.  

We are 2 now :-)
ID: 1382946 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 665
Credit: 351,287,103
RAC: 139,506
Australia
Message 1382948 - Posted: 20 Jun 2013, 6:54:02 UTC - in response to Message 1382946.  


It's poorly thought out code/operation. They should have thought more than they did.
ID: 1382948 · Report as offensive
Sakletare
Joined: 18 May 99
Posts: 132
Credit: 22,922,427
RAC: 3,426
Sweden
Message 1382950 - Posted: 20 Jun 2013, 6:57:13 UTC

Sometimes I wish that the scheduler would send the workunit to different types of applications to safeguard against bugs, especially when there's an error.
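
A minimal sketch of that idea, purely illustrative and not how the BOINC scheduler is actually written: when picking a host for a resend, skip any host whose application type has already returned an error for that workunit. The `Host` record and the `pick_resend_host` function are hypothetical names made up for this example, and the host IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; not a BOINC data structure."""
    host_id: int
    app_type: str  # e.g. "cal_ati", "ati_opencl_100", "cpu"


def pick_resend_host(candidates: Iterable[Host],
                     failed_app_types: Set[str]) -> Optional[Host]:
    """Return the first candidate whose app type has not yet errored
    on this workunit, or None if every candidate uses a failed type."""
    for host in candidates:
        if host.app_type not in failed_app_types:
            return host
    return None


# Example: both ATI app types have already errored on this workunit.
failed = {"cal_ati", "ati_opencl_100"}
queue = [Host(1, "cal_ati"), Host(2, "ati_opencl_100"), Host(3, "cpu")]
chosen = pick_resend_host(queue, failed)  # the CPU host
```

Returning `None` when nothing suitable is queued leaves it to the caller to decide whether to wait for a different kind of host or fall back to the normal behaviour.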
ID: 1382950 · Report as offensive
Profile Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 10534
Credit: 135,466,190
RAC: 41,163
Australia
Message 1382953 - Posted: 20 Jun 2013, 7:03:33 UTC - in response to Message 1382946.  


I bet that there are a lot more than just us around here. ;-)

Cheers.
ID: 1382953 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 3070
Credit: 122,723,190
RAC: 91,860
United States
Message 1382961 - Posted: 20 Jun 2013, 7:28:29 UTC

What do you think about this:

ap_01mr09ad_B1_P1_00224_20130619_24332.wu
3044311467 	7008627 	20 Jun 2013, 1:05:10 UTC 	20 Jun 2013, 1:10:18 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044311468 	6797524 	20 Jun 2013, 1:05:12 UTC 	15 Jul 2013, 1:05:12 UTC 	In progress 	--- 	--- 	--- 	AstroPulse v6 Anonymous platform (ATI GPU)
3044318853 	7016051 	20 Jun 2013, 1:10:24 UTC 	20 Jun 2013, 1:33:16 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044348714 	6958381 	20 Jun 2013, 1:33:24 UTC 	20 Jun 2013, 1:38:32 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044354760 	6743006 	20 Jun 2013, 1:38:43 UTC 	20 Jun 2013, 1:44:29 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.01
3044363043 	5944441 	20 Jun 2013, 1:44:35 UTC 	20 Jun 2013, 1:49:43 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (ati_opencl_100)
3044369811 	5856725 	20 Jun 2013, 1:49:48 UTC 	20 Jun 2013, 1:54:57 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (ati_opencl_100)

Why should I even bother? This thing is gonna die. I'm going to run it and then receive an Invalid for my trouble. Whut?
ID: 1382961 · Report as offensive
Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1382965 - Posted: 20 Jun 2013, 8:03:56 UTC - in response to Message 1382961.  

Same thing. The server prefers to send WUs to ATI and CPU hosts.

I wonder how many of my 500 pending AP WUs are in the same state.

Tim

ID: 1382965 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 3070
Credit: 122,723,190
RAC: 91,860
United States
Message 1382966 - Posted: 20 Jun 2013, 8:21:14 UTC - in response to Message 1382965.  

It appears a large number of the older machines are having a problem with the new cal_ati app. I also had a problem with the cal_ati app with the 13.1 Legacy driver. There are a few others, but a large number are using that one driver. The app seems to work fine with the older driver 11.12. Interesting...

Workunit 1266285264
3044305625 	5095320 	20 Jun 2013, 1:00:31 UTC 	20 Jun 2013, 1:05:37 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044305626 	7024445 	20 Jun 2013, 1:00:30 UTC 	20 Jun 2013, 1:05:38 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044312144 	6909960 	20 Jun 2013, 1:05:43 UTC 	20 Jun 2013, 1:21:58 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044312145 	5462673 	20 Jun 2013, 1:05:46 UTC 	20 Jun 2013, 1:10:53 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044318942 	6991546 	20 Jun 2013, 1:11:05 UTC 	15 Jul 2013, 1:11:05 UTC 	In progress 	--- 	--- 	--- 	AstroPulse v6 Anonymous platform (CPU)
3044334081 	5215447 	20 Jun 2013, 1:22:10 UTC 	20 Jun 2013, 1:27:22 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (ati_opencl_100)
3044341692 	6797524 	20 Jun 2013, 1:27:41 UTC 	15 Jul 2013, 1:27:41 UTC 	In progress 	--- 	--- 	--- 	AstroPulse v6 Anonymous platform (ATI GPU)


ID: 1382966 · Report as offensive
Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1382978 - Posted: 20 Jun 2013, 10:09:15 UTC

2 more invalids added, again from ATI hosts.

Someone must kick something.

This is a waste of resources.


Tim

ID: 1382978 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 3070
Credit: 122,723,190
RAC: 91,860
United States
Message 1382982 - Posted: 20 Jun 2013, 10:20:55 UTC - in response to Message 1382978.  
Last modified: 20 Jun 2013, 10:27:18 UTC

I'm seeing a lot of these...

3044237945 	6946917 	19 Jun 2013, 23:55:20 UTC 	14 Jul 2013, 23:55:20 UTC 	In progress 	--- 	--- 	--- 	AstroPulse v6 v6.02
3044237946 	6863602 	19 Jun 2013, 23:55:21 UTC 	20 Jun 2013, 0:00:28 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044244036 	5877996 	20 Jun 2013, 0:00:31 UTC 	20 Jun 2013, 3:39:29 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044507440 	6940010 	20 Jun 2013, 3:48:58 UTC 	20 Jun 2013, 3:55:02 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044529297 	6908180 	20 Jun 2013, 4:09:19 UTC 	20 Jun 2013, 4:14:47 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044554208 	6991375 	20 Jun 2013, 4:29:18 UTC 	20 Jun 2013, 5:05:26 UTC 	Error while computing 	0.00 	0.00 	--- 	AstroPulse v6 v6.06 (cal_ati)
3044622244 	6797524 	20 Jun 2013, 5:28:09 UTC 	15 Jul 2013, 5:28:09 UTC 	In progress 	--- 	--- 	--- 	AstroPulse v6 Anonymous platform (ATI GPU)

Nasty...
All AstroPulse v6 tasks

Here they come... http://setiathome.berkeley.edu/results.php?hostid=6645126
ID: 1382982 · Report as offensive
Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1382985 - Posted: 20 Jun 2013, 11:00:20 UTC

The list is growing.

Way to go...

Tim

ID: 1382985 · Report as offensive
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 14,572,833
RAC: 9,939
Message 1382997 - Posted: 20 Jun 2013, 11:58:43 UTC

Oh dear, that looks like the new Brook app is having problems. One for Raistmer.

I fear that changing the scheduler so that it spreads problematic units across different platforms requires a fair bit of coding on David's part. Not something easily set in motion.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1382997 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6108
Credit: 155,326,877
RAC: 50,082
United States
Message 1383023 - Posted: 20 Jun 2013, 13:42:38 UTC

This would be the same as the issue where a CPU task is processed and uploaded, and the wingmate is an Nvidia GPU that trashes the workunit, recording 30 spikes and flagging it with a -9 overflow. The workunit then gets sent to a third host with an Nvidia GPU that proceeds to do the same thing. The two Nvidia results match, and the one good CPU result is flagged as invalid.
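
To make the failure mode concrete, here is a toy quorum check. It assumes a validator that simply accepts whichever result the most hosts agree on; the real validator compares the reported signals within tolerances, so `validate` and the result strings below are only stand-ins.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple


def validate(results: Dict[str, Hashable],
             quorum: int = 2) -> Tuple[Set[str], Set[str]]:
    """Split hosts into (valid, invalid) by simple agreement.

    The most common result wins once at least `quorum` hosts agree;
    until then nothing is marked valid or invalid.
    """
    if not results:
        return set(), set()
    winner, votes = Counter(results.values()).most_common(1)[0]
    if votes < quorum:
        return set(), set()
    valid = {host for host, result in results.items() if result == winner}
    return valid, set(results) - valid


# One good CPU result, two GPU hosts that both overflow the same way.
results = {
    "cpu_host": "real signals",
    "gpu_host_a": "30 spikes, -9 overflow",
    "gpu_host_b": "30 spikes, -9 overflow",
}
valid, invalid = validate(results)
```

Here the two matching overflow results form the quorum, so `cpu_host` ends up in `invalid` even though it was the only host that did the work properly.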

When this was first noticed, a few years ago iirc, there was a suggestion to implement something so that specific hardware/software would get flagged and the task sent to something different, so that valid science data could be collected instead of tossed into the bin.
However, that would add a lot of complexity to the server backend, which is already rather complex.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1383023 · Report as offensive
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 307
Credit: 336,953
RAC: 114
Finland
Message 1383041 - Posted: 20 Jun 2013, 14:40:20 UTC

It might be worth considering using the reliable hosts mechanism.

Even though the advertising says it's for accelerating retries, that doesn't mean it has to be used only for that. Setting the avg turnaround time to something high and the delay bound multiplier to 1.0 wouldn't exclude any good hosts from getting work, but it would prevent bad hosts from trashing workunits.

I don't think it would increase server load much (no promises!), so the only question is whether we have enough reliable hosts.
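
As a rough sketch of the filter this would amount to: a host counts as reliable if its average turnaround and its error rate are both under configured limits. Everything named below is made up for illustration and is not BOINC's actual configuration; the error-rate criterion is my assumption about what "reliable" covers, and the threshold values are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Hypothetical per-host statistics."""
    avg_turnaround_days: float
    error_rate: float  # fraction of recent tasks that errored, 0.0 to 1.0


def is_reliable(host: HostStats,
                max_avg_turnaround_days: float = 30.0,
                max_error_rate: float = 0.1) -> bool:
    """A generous turnaround limit keeps slow-but-good hosts eligible;
    the error-rate limit is what filters out hosts that trash work."""
    return (host.avg_turnaround_days <= max_avg_turnaround_days
            and host.error_rate <= max_error_rate)


def resend_deadline_days(normal_deadline_days: float,
                         delay_bound_multiplier: float = 1.0) -> float:
    """With a multiplier of 1.0 the resend keeps the normal deadline."""
    return normal_deadline_days * delay_bound_multiplier


slow_but_good = HostStats(avg_turnaround_days=20.0, error_rate=0.01)
fast_but_broken = HostStats(avg_turnaround_days=0.5, error_rate=0.95)
```

With these placeholder limits `slow_but_good` stays eligible and `fast_but_broken` does not, which is the whole point of setting the turnaround limit high and leaving the deadline alone.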
ID: 1383041 · Report as offensive
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45949
Credit: 815,434,558
RAC: 124,405
United States
Message 1383125 - Posted: 20 Jun 2013, 18:10:13 UTC
Last modified: 20 Jun 2013, 18:15:26 UTC

I just contacted Eric and he says that the cal_ati app has been deprecated and is not currently active or being distributed.

Which means the hosts that have been using it will crunch up whatever work they have cached, but the servers will no longer send any new work for that application.

I assume that it may be brought back after bugfix and further testing, but Eric did not specifically say that.
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1383125 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6108
Credit: 155,326,877
RAC: 50,082
United States
Message 1383153 - Posted: 20 Jun 2013, 19:00:59 UTC - in response to Message 1383125.  

Eric seemed to be rather frustrated with driver version detection issues in BOINC over on beta, so it could be a while before we see this all released again, if it was due to that kind of issue anyway.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1383153 · Report as offensive
terencewee*

Joined: 10 Oct 09
Posts: 53
Credit: 7,022,510
RAC: 0
Malaysia
Message 1383155 - Posted: 20 Jun 2013, 19:06:08 UTC
Last modified: 20 Jun 2013, 19:09:58 UTC

I'm encountering a similar problem: so far 2 tasks completed but can't validate.

This host has processed thousands of valid AP WUs, and for a moment I thought something was wrong with it.

Affected WUs:
1266433106
1266480341

Could a script be run to sweep through and resubmit affected WUs to a different platform?

EDIT: Specifically not to (cal_ati) or (ati_opencl_100), as both platforms are encountering computing errors.
terencewee*
Sic itur ad astra.
ID: 1383155 · Report as offensive
Claggy
Project Donor
Volunteer tester

Joined: 5 Jul 99
Posts: 4623
Credit: 46,353,416
RAC: 2,918
United Kingdom
Message 1383186 - Posted: 20 Jun 2013, 20:34:47 UTC - in response to Message 1383155.  

The problem with the (ati_opencl_100) plan_class (for Boinc 6 hosts) is that the app is going out to hosts with really old CAL drivers, from before OpenCL support was ever included:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5421155

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5798321

The two hosts listed above are running Cat 10.5 (CAL 1.4.636) and Cat 9.7 (CAL 1.4.344). They need at least Cat 11.1 (CAL 1.4.900) for OpenCL support to be included. But since you can't tell that apart from Cat 10.12, where OpenCL support was only available with the APP edition and not the Normal edition, the minimum needs to be Cat 11.2 (CAL 1.4.1016), and possibly later than that.
Even that doesn't guarantee that it'll work on every host, since at the time you could download the bare driver without Catalyst Control Centre and without the OpenCL driver, which was a smallish additional download.
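
For illustration only, a version gate along those lines could look like the sketch below. It assumes the host's CAL version is available as a dotted string; `parse_cal_version` and `may_support_opencl` are hypothetical helper names, not part of BOINC or the plan_class code.

```python
from typing import Tuple

# Minimum taken from the text above: Cat 11.2 ships CAL 1.4.1016.
MIN_CAL_FOR_OPENCL = (1, 4, 1016)


def parse_cal_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.4.636' into (1, 4, 636)."""
    return tuple(int(part) for part in text.strip().split("."))


def may_support_opencl(cal_version: str) -> bool:
    """True if the CAL version meets the minimum. This is necessary,
    not sufficient: the OpenCL driver itself may still be missing."""
    return parse_cal_version(cal_version) >= MIN_CAL_FOR_OPENCL


# The two hosts from the post, plus the proposed minimum.
checks = {v: may_support_opencl(v) for v in ("1.4.636", "1.4.344", "1.4.1016")}
```

Comparing the versions as integer tuples rather than as strings matters here, because as plain text "1.4.900" would sort after "1.4.1016".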

Claggy
ID: 1383186 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 5431
Credit: 66,415,646
RAC: 12,774
Russia
Message 1383211 - Posted: 20 Jun 2013, 21:31:12 UTC - in response to Message 1382997.  
Last modified: 20 Jun 2013, 21:33:14 UTC

http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2031&postid=46399
http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2031&postid=46400
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1383211 · Report as offensive


 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.