Number crunching: This is not fair
Author | Message |
---|---|
Tim Joined: 19 May 99 Posts: 211 Credit: 278,575,259 RAC: 0 |
Today I had 3 invalid AP tasks on my top rig (ID: 6716400). All 3 tasks were "Completed, can't validate" on my rig. As far as I could see, all the wingmen were running ATI GPUs. Why didn't the server send anything to a different, Nvidia GPU instead of trashing the WU with "Too many errors (may have bug)"? Tim |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
Doesn't it just P&%# ya off when that happens? Sadly I've been there far too many times, but what can 1 person do? Cheers. |
Tim Joined: 19 May 99 Posts: 211 Credit: 278,575,259 RAC: 0 |
Wiggo wrote: "Doesn't it just P&%# ya off when that happens?" That makes two of us now :-) |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
It's poorly thought-out code/operation. They should have thought it through more than they did. |
Sakletare Joined: 18 May 99 Posts: 132 Credit: 23,423,829 RAC: 0 |
Sometimes I wish that the scheduler would send the workunit to different types of applications to safeguard against bugs, especially when there's an error. |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
I bet that there are a lot more than just us around here. ;-) Cheers. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
What do you think about this? Workunit ap_01mr09ad_B1_P1_00224_20130619_24332.wu:

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application |
---|---|---|---|---|---|---|---|---|
3044311467 | 7008627 | 20 Jun 2013, 1:05:10 UTC | 20 Jun 2013, 1:10:18 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044311468 | 6797524 | 20 Jun 2013, 1:05:12 UTC | 15 Jul 2013, 1:05:12 UTC | In progress | --- | --- | --- | AstroPulse v6 Anonymous platform (ATI GPU) |
3044318853 | 7016051 | 20 Jun 2013, 1:10:24 UTC | 20 Jun 2013, 1:33:16 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044348714 | 6958381 | 20 Jun 2013, 1:33:24 UTC | 20 Jun 2013, 1:38:32 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044354760 | 6743006 | 20 Jun 2013, 1:38:43 UTC | 20 Jun 2013, 1:44:29 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.01 |
3044363043 | 5944441 | 20 Jun 2013, 1:44:35 UTC | 20 Jun 2013, 1:49:43 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (ati_opencl_100) |
3044369811 | 5856725 | 20 Jun 2013, 1:49:48 UTC | 20 Jun 2013, 1:54:57 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (ati_opencl_100) |

Why should I even bother? This thing is gonna die. I'm going to run it and then receive an Invalid for my trouble. Whut? |
Tim Joined: 19 May 99 Posts: 211 Credit: 278,575,259 RAC: 0 |
TBar wrote: "What do you think about this?" Same thing here. The server seems to prefer sending WUs to ATI and CPU hosts. I wonder how many of my 500 pending AP WUs are the same. Tim |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It appears a large number of the older machines are having a problem with the new cal_ati app. I also had a problem running the cal_ati app with the 13.1 Legacy driver. A few of the failing hosts use other drivers, but a large number are using that one driver. The app seems to work fine with the older 11.12 driver. Interesting... Workunit 1266285264:

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application |
---|---|---|---|---|---|---|---|---|
3044305625 | 5095320 | 20 Jun 2013, 1:00:31 UTC | 20 Jun 2013, 1:05:37 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044305626 | 7024445 | 20 Jun 2013, 1:00:30 UTC | 20 Jun 2013, 1:05:38 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044312144 | 6909960 | 20 Jun 2013, 1:05:43 UTC | 20 Jun 2013, 1:21:58 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044312145 | 5462673 | 20 Jun 2013, 1:05:46 UTC | 20 Jun 2013, 1:10:53 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044318942 | 6991546 | 20 Jun 2013, 1:11:05 UTC | 15 Jul 2013, 1:11:05 UTC | In progress | --- | --- | --- | AstroPulse v6 Anonymous platform (CPU) |
3044334081 | 5215447 | 20 Jun 2013, 1:22:10 UTC | 20 Jun 2013, 1:27:22 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (ati_opencl_100) |
3044341692 | 6797524 | 20 Jun 2013, 1:27:41 UTC | 15 Jul 2013, 1:27:41 UTC | In progress | --- | --- | --- | AstroPulse v6 Anonymous platform (ATI GPU) |
Tim Joined: 19 May 99 Posts: 211 Credit: 278,575,259 RAC: 0 |
Two more added again from ATI hosts. Someone needs to kick something. This is a waste of resources. Tim |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I'm seeing a lot of these...

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application |
---|---|---|---|---|---|---|---|---|
3044237945 | 6946917 | 19 Jun 2013, 23:55:20 UTC | 14 Jul 2013, 23:55:20 UTC | In progress | --- | --- | --- | AstroPulse v6 v6.02 |
3044237946 | 6863602 | 19 Jun 2013, 23:55:21 UTC | 20 Jun 2013, 0:00:28 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044244036 | 5877996 | 20 Jun 2013, 0:00:31 UTC | 20 Jun 2013, 3:39:29 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044507440 | 6940010 | 20 Jun 2013, 3:48:58 UTC | 20 Jun 2013, 3:55:02 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044529297 | 6908180 | 20 Jun 2013, 4:09:19 UTC | 20 Jun 2013, 4:14:47 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044554208 | 6991375 | 20 Jun 2013, 4:29:18 UTC | 20 Jun 2013, 5:05:26 UTC | Error while computing | 0.00 | 0.00 | --- | AstroPulse v6 v6.06 (cal_ati) |
3044622244 | 6797524 | 20 Jun 2013, 5:28:09 UTC | 15 Jul 2013, 5:28:09 UTC | In progress | --- | --- | --- | AstroPulse v6 Anonymous platform (ATI GPU) |

Nasty... Here they come (all AstroPulse v6 tasks): http://setiathome.berkeley.edu/results.php?hostid=6645126 |
Tim Joined: 19 May 99 Posts: 211 Credit: 278,575,259 RAC: 0 |
The list is growing. Way to go... Tim |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Oh dear, that looks like the new Brook app is having problems. One for Raistmer. I fear that changing the scheduler so that it spreads problematic units across different platforms requires a fair bit of coding on David's part. Not something easily set in motion. A person who won't read has no advantage over one who can't read. (Mark Twain) |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
This would be the same as the issue where a CPU task is processed and uploaded, and the wingmate is an Nvidia GPU that trashes the workunit, recording 30 spikes and flagging it with a -9 overflow. Then it gets sent to a third host with an Nvidia GPU, which proceeds to do the same thing. So the two Nvidia results match up and the one good CPU result is flagged as invalid. When this was first noticed, a few years ago IIRC, there was a suggestion to implement something so that specific hardware/software combinations would get flagged and the task sent to something different, so that valid science data could be collected instead of being tossed into the bin. However, that would add a lot of complexity to the server backend, which is already rather complex. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
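The failure mode HAL9000 describes can be illustrated with a toy validator. This is a hypothetical simplification, not the real SETI@home validator: it assumes a quorum of two agreeing results, treats two results as matching when their signal count and overflow flag agree, and uses made-up names (`Result`, `validate`) and a made-up signal count for the CPU result.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Result:
    host: str
    platform: str
    spikes: int     # number of reported signals (hypothetical field)
    overflow: bool  # True if the task ended with a -9 result overflow


def signature(r: Result) -> Tuple[int, bool]:
    # Hypothetical comparison key: results "match" when they report the
    # same signal count and the same overflow flag.
    return (r.spikes, r.overflow)


def validate(results: List[Result], quorum: int = 2) -> Optional[Tuple[List[Result], List[Result]]]:
    """Return (valid, invalid) once `quorum` results agree, else None."""
    for candidate in results:
        agreeing = [r for r in results if signature(r) == signature(candidate)]
        if len(agreeing) >= quorum:
            invalid = [r for r in results if r not in agreeing]
            return agreeing, invalid
    return None  # no agreement yet, so another task would be sent out


cpu = Result("host_a", "CPU", spikes=3, overflow=False)        # the good result
gpu1 = Result("host_b", "nvidia_gpu", spikes=30, overflow=True)
gpu2 = Result("host_c", "nvidia_gpu", spikes=30, overflow=True)

print(validate([cpu, gpu1]))  # None
valid, invalid = validate([cpu, gpu1, gpu2])
print([r.host for r in valid], [r.host for r in invalid])  # ['host_b', 'host_c'] ['host_a']
```

Once the two overflowed GPU results agree with each other, the lone CPU result is the odd one out, which is why the suggestion was to send the third task to different hardware.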
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
It might be worth considering the reliable hosts mechanism. Even though the advertising says it's for accelerating retries, that doesn't mean it has to be used only for that. Setting the average turnaround time limit to something high and the delay bound multiplier to 1.0 wouldn't exclude any good hosts from getting work, but it would prevent bad hosts from trashing workunits. I don't think it would increase server load much (no promises!), so the only question is whether we have enough reliable hosts. |
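A rough sketch of Juha's idea, assuming a host counts as reliable when both its average turnaround and its recent error rate fall under configurable limits, and that retries only go to reliable hosts. The names (`Host`, `is_reliable`, `eligible_for_retry`, `retry_deadline_days`), the error-rate criterion and all threshold values are hypothetical; they are not BOINC's actual configuration options. The 25-day base deadline matches the "In progress" deadlines in the task lists in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    host_id: int
    avg_turnaround_days: float
    error_rate: float  # fraction of recent tasks that ended in an error (hypothetical)


def is_reliable(host: Host, max_avg_turnaround_days: float, max_error_rate: float) -> bool:
    # A generous turnaround limit lets slow-but-good hosts qualify;
    # the error-rate limit is what keeps error-prone hosts away from retries.
    return (host.avg_turnaround_days <= max_avg_turnaround_days
            and host.error_rate <= max_error_rate)


def retry_deadline_days(base_delay_bound_days: float, delay_bound_multiplier: float) -> float:
    # A multiplier of 1.0 leaves the normal deadline unchanged.
    return base_delay_bound_days * delay_bound_multiplier


def eligible_for_retry(hosts: List[Host]) -> List[int]:
    # Hypothetical thresholds: 30 days turnaround, 5% error rate.
    return [h.host_id for h in hosts
            if is_reliable(h, max_avg_turnaround_days=30.0, max_error_rate=0.05)]


hosts = [
    Host(1, avg_turnaround_days=2.0, error_rate=0.01),   # fast and healthy
    Host(2, avg_turnaround_days=20.0, error_rate=0.00),  # slow but healthy
    Host(3, avg_turnaround_days=0.1, error_rate=0.95),   # errors out within minutes
]
print(eligible_for_retry(hosts))       # [1, 2]
print(retry_deadline_days(25.0, 1.0))  # 25.0
```

With a high turnaround limit the slow but healthy host 2 still gets work; only host 3, which returns errors within minutes, is kept away from retries.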
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I just contacted Eric and he says that the cal_ati app has been deprecated and is not currently active or being distributed. This means the hosts that have been using it will crunch whatever work they have cached, but the servers will no longer send any new work for that application. I assume it may be brought back after a bugfix and further testing, but Eric did not specifically say that. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
kittyman wrote: "I just contacted Eric and he says that the cal_ati app has been deprecated and is not currently active or being distributed." He seemed rather frustrated with driver version detection issues in BOINC over on Beta, so it could be a while before we see this all released again, if it was due to that kind of issue anyway. |
terencewee* Joined: 10 Oct 09 Posts: 53 Credit: 7,022,510 RAC: 0 |
Encountering similar problem, so far 2 completed but can't validate. This host has processed thousands of valid AP WUs, and for a moment I thought something was wrong with it. Affected WUs: 1266433106, 1266480341. Could someone run a script to sweep through the affected WUs and resubmit them to a different platform? EDIT: Specifically, not to (cal_ati) or (ati_opencl_100), as both platforms are producing computing errors. terencewee* Sicituradastra. |
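terencewee*'s suggestion (resubmit, but not to (cal_ati) or (ati_opencl_100)) could look roughly like this. It is a hypothetical sketch: the function names and the idea of choosing from a list of results-page application labels are assumptions; the labels themselves are taken from the task lists in this thread.

```python
from typing import Iterable, Optional

# Plan classes terencewee* asked to avoid, since both were erroring out.
EXCLUDED_PLAN_CLASSES = frozenset({"cal_ati", "ati_opencl_100"})


def plan_class(app_version: str) -> Optional[str]:
    """Extract the parenthesised suffix from a results-page label,
    e.g. "AstroPulse v6 v6.06 (cal_ati)" -> "cal_ati".
    Labels without one (such as "AstroPulse v6 v6.02") return None."""
    if app_version.endswith(")") and "(" in app_version:
        return app_version[app_version.rindex("(") + 1:-1]
    return None


def pick_retry_version(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate outside the excluded plan classes, or None
    if every candidate is excluded (the caller could then hold the WU back)."""
    for version in candidates:
        if plan_class(version) not in EXCLUDED_PLAN_CLASSES:
            return version
    return None


candidates = [
    "AstroPulse v6 v6.06 (cal_ati)",
    "AstroPulse v6 v6.06 (ati_opencl_100)",
    "AstroPulse v6 v6.02",
]
print(pick_retry_version(candidates))  # AstroPulse v6 v6.02
```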
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
terencewee* wrote: "Encountering similar problem, so far 2 completed but can't validate." The problem with the (ati_opencl_100) plan_class (for BOINC 6 hosts) is that the app is going out to hosts with really old CAL drivers, from before OpenCL support was included: http://setiathome.berkeley.edu/show_host_detail.php?hostid=5421155 http://setiathome.berkeley.edu/show_host_detail.php?hostid=5798321 These two hosts are running Cat 10.5 (CAL 1.4.636) and Cat 9.7 (CAL 1.4.344). They need at least Cat 11.1 (CAL 1.4.900) for OpenCL support to be included, but since you can't tell that apart from Cat 10.12, where OpenCL support was only available with the APP edition and not the normal edition, the minimum needs to be Cat 11.2 (CAL 1.4.1016), possibly later than that. Even that doesn't guarantee it'll work on every host it's sent to, since at the time you could download the bare driver without Catalyst Control Centre and without the OpenCL driver (the OpenCL driver being a smallish additional download). Claggy |
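Claggy's minimum-driver argument boils down to a version comparison. The sketch below is hypothetical: it is not BOINC's plan-class code, and `may_receive_opencl_app` is a made-up name. It only encodes the CAL versions quoted in the post, with Cat 11.2 (CAL 1.4.1016) as the floor. As Claggy points out, passing this check still doesn't prove the separate OpenCL driver was installed.

```python
from typing import Tuple

# Floor suggested in the post: Cat 11.2 = CAL 1.4.1016. CAL 1.4.900 (Cat 11.1)
# is not enough because it can't be told apart from Cat 10.12, where OpenCL
# came only with the APP edition.
MIN_CAL_FOR_OPENCL: Tuple[int, ...] = (1, 4, 1016)


def parse_cal_version(version: str) -> Tuple[int, ...]:
    """Turn "1.4.636" into (1, 4, 636) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def may_receive_opencl_app(cal_version: str) -> bool:
    # Tuple comparison is element-wise, so (1, 4, 900) < (1, 4, 1016),
    # whereas the plain string comparison "1.4.900" > "1.4.1016" is True.
    return parse_cal_version(cal_version) >= MIN_CAL_FOR_OPENCL


for cat, cal in [("Cat 9.7", "1.4.344"), ("Cat 10.5", "1.4.636"),
                 ("Cat 11.1", "1.4.900"), ("Cat 11.2", "1.4.1016")]:
    print(cat, cal, may_receive_opencl_app(cal))
# Cat 9.7 1.4.344 False
# Cat 10.5 1.4.636 False
# Cat 11.1 1.4.900 False
# Cat 11.2 1.4.1016 True
```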
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
William wrote: "Oh dear, that looks like the new Brook app is having problems. One for Raistmer." http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2031&postid=46399 http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2031&postid=46400 SETI apps news We're not gonna fight them. We're gonna transcend them. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.