This is not fair

Message boards : Number crunching : This is not fair
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1383233 - Posted: 20 Jun 2013, 23:41:18 UTC - in response to Message 1383211.  

Oh dear, that looks like the new Brook app is having problems. One for Raistmer.

I fear that changing the scheduler so that it spreads problematic units across different platforms requires a fair bit of coding on David's part. Not something easily set in motion.

http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2031&postid=46399
http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2031&postid=46400

Yes, dear, I saw Eric's posts. You may wish to note that I posted that before Eric found the files were damaged. :P
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1383233
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383235 - Posted: 20 Jun 2013, 23:55:26 UTC - in response to Message 1383233.  
Last modified: 21 Jun 2013, 0:51:20 UTC

Is it safe to allow new tasks yet? When I saw all those 'problem' work units headed my way I just hit No New Tasks to wait out the storm. No sense in running a task when there were already 6 errors in the unit.

BTW, I'm seeing a couple of these cal_ati tasks return 'repetitive pulses: 30' when the wingperson reports far fewer. The GPU has proven itself with many completed OpenCL tasks, so I don't think it's the card. The only thing I can think of is that there might be a problem running more than 1 cal task at a time.
single pulses: 0
repetitive pulses: 30
percent blanked: 80.95
ID: 1383235
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1383273 - Posted: 21 Jun 2013, 5:53:41 UTC

Eric sent me a message about 9 hours ago, whilst I was away at work.

At that time, he said new versions were being sent out.
He asked that we monitor the error situation and I should let him know how it goes.
If there are still too many errors, he will pull the new versions.

I will monitor this thread for your reports.


Thanks and meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1383273
terencewee*

Joined: 10 Oct 09
Posts: 53
Credit: 7,022,510
RAC: 0
Malaysia
Message 1383275 - Posted: 21 Jun 2013, 6:03:36 UTC - in response to Message 1383186.  
Last modified: 21 Jun 2013, 6:25:07 UTC

I'm encountering a similar problem: so far 2 tasks have completed but can't validate.

This host has processed thousands of valid AP WUs, and for a moment I thought something was wrong with it.

Affected WUs:
1266433106
1266480341

Could a script sweep through and resubmit the affected WUs to a different platform?

EDIT: Specifically not to (cal_ati) or (ati_opencl_100), as both platforms are encountering computing errors.

The problem with the (ati_opencl_100) plan_class (for BOINC 6 hosts) is that the app is going out to hosts with really old CAL drivers in which OpenCL support was never included:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5421155

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5798321

The two hosts listed above are running Cat 10.5 (CAL 1.4.636) and Cat 9.7 (CAL 1.4.344). They need at least Cat 11.1 (CAL 1.4.900) for OpenCL support to be included,
but since you can't tell that apart from Cat 10.12, where OpenCL support was only available with the APP edition and not the Normal edition,
the minimum needs to be Cat 11.2 (CAL 1.4.1016), and possibly later than that.
(Even that doesn't guarantee that it will work on every host, since at the time you could download the bare driver without Catalyst Control Centre
and without the OpenCL driver, which was a smallish additional download.)

Claggy
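
A minimal sketch of the version gate described in the quote above, assuming the host reports its CAL version as a dotted string such as 1.4.1385. The function names and the constant are hypothetical and are not taken from the BOINC scheduler; only the version numbers come from the post. Comparing the parts as integers matters here, because a plain string comparison would rank 1.4.900 above 1.4.1016.

```python
# Hypothetical sketch: these names are illustrative, not BOINC scheduler code.

# Cat 11.2 (CAL 1.4.1016), the minimum suggested in the quoted post.
MIN_CAL_FOR_OPENCL = (1, 4, 1016)


def parse_cal_version(text):
    """Turn a CAL version string such as '1.4.1385' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def may_support_opencl(cal_version, minimum=MIN_CAL_FOR_OPENCL):
    """Return True if the reported CAL version is at least the minimum.

    This is only a necessary condition: as the post notes, a host can
    report a new enough CAL version and still lack the OpenCL driver.
    """
    return parse_cal_version(cal_version) >= minimum


if __name__ == "__main__":
    # Versions mentioned in the thread.
    for version in ("1.4.344", "1.4.636", "1.4.900", "1.4.1016", "1.4.1385"):
        print(version, may_support_opencl(version))
```

Host 5335631 (driver 1.4.1385) passes this check and still fails, which is why a version gate alone cannot rule out every error.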



You are correct.

Most (ati_opencl_100) errors are caused by old drivers, and we know the (cal_ati) errors are due to a corrupt Brook DLL (being fixed).

Please take a look at this WU: 1266476872

Specifically Host:5335631.

Coprocessors AMD ATI Radeon HD5800 series (Cypress) (1024MB) driver: 1.4.1385

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
 - exit code -1073741515 (0xc0000135)
</message>
]]>


EDIT: This host is an outlier: it is having computing errors even though its driver is newer than 1.4.1016.


Maybe it is time someone more knowledgeable put together minimum system requirements for running AP-Opencl-ATI (and other apps), e.g.
minimum BOINC version, minimum device spec and minimum driver version?

A sticky post in News/Number Crunching would help answer most "Why don't I get <insert-plan-class> tasks?" questions.

It would also ease troubleshooting in most help-request cases.

Clearly defined scheduler rules should help tremendously at the server end, the user end and the support end.
terencewee*
Sicituradastra.
ID: 1383275
Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,575,259
RAC: 0
Greece
Message 1383277 - Posted: 21 Jun 2013, 6:09:34 UTC

I think Host:5335631 is a "set and forget" machine.

That's why he is using BOINC version 6.10.18.

But yes, a minimum system requirement is a good idea.

Tim

ID: 1383277
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1383283 - Posted: 21 Jun 2013, 6:39:23 UTC - in response to Message 1383277.  

I think Host:5335631 is a "set and forget" machine.

That's why he is using BOINC version 6.10.18.

A lot of people chose to stay with v6 because of some of the more annoying things about v7. I only recently upgraded from v6 to v7. Not sure I'll ever get used to the layout of v7.
Grant
Darwin NT
ID: 1383283
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1383288 - Posted: 21 Jun 2013, 7:09:37 UTC - in response to Message 1383275.  

exit code -1073741515 (0xc0000135)

is a missing DLL problem - but it doesn't tell you which DLL.
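
A short sketch of how the two forms of that exit code relate, assuming the negative number in the task output is the Windows status code read as a signed 32-bit integer. The helper names and the one-entry lookup table are hypothetical; 0xC0000135 is the Windows status STATUS_DLL_NOT_FOUND, which matches the missing-DLL diagnosis.

```python
# Hypothetical helpers for illustration; not part of BOINC.

# Only the code discussed in this thread is listed.
KNOWN_STATUS = {0xC0000135: "STATUS_DLL_NOT_FOUND"}


def exit_code_to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code):
    """Format an exit code as hex plus a name, if the name is known."""
    status = exit_code_to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return f"0x{status:08x} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(-1073741515))
```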
ID: 1383288
Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,575,259
RAC: 0
Greece
Message 1383289 - Posted: 21 Jun 2013, 7:15:14 UTC

It is like that at the beginning, but when you move to v7 you will see that it is very good.

Yes, there are some minor things, but you will get used to it.

I have been using it almost from the beginning, and if you told me now to go back to v6, I would say a big NO.

Tim

ID: 1383289
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1383327 - Posted: 21 Jun 2013, 9:23:45 UTC

I am still using BOINC 6.10.58 on my Linux boxes. SETI@home V7 and AP 6.01 by Lunatics run flawlessly. The same goes for 5 other BOINC projects.
Tullio
ID: 1383327
Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,575,259
RAC: 0
Greece
Message 1383328 - Posted: 21 Jun 2013, 9:47:36 UTC - in response to Message 1383327.  

I am still using BOINC 6.10.58 on my Linux boxes. SETI@home V7 and AP 6.01 by Lunatics run flawlessly. The same goes for 5 other BOINC projects.
Tullio


How can you say that?

You have 1 task in progress and 2 pending on your machines.

But when you have a large number of WUs like me, the problem will eventually appear.

Tim

ID: 1383328
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1383420 - Posted: 21 Jun 2013, 17:11:36 UTC

Well....
Just got word from Eric that the revised edition of the cal_ati apps did not fare too much better than the first.
So they have been disabled as well.

In addition he adds..."The fix may get complicated."

So, I would not look for those apps to be reactivated any time real soon.

Back to the drawing board, I guess.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1383420
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383453 - Posted: 21 Jun 2013, 18:50:27 UTC - in response to Message 1383420.  

Well....
Just got word from Eric that the revised edition of the cal_ati apps did not fare too much better than the first.
So they have been disabled as well.

In addition he adds..."The fix may get complicated."

So, I would not look for those apps to be reactivated any time real soon.

Back to the drawing board, I guess.

This may not be a bad thing. After my initial success with Brook+, all my completions while running 2 at a time are giving 'repetitive pulses: 30'. Too bad, really. I was getting run times of around 35k with r557, and around 30k with Brook+. Some are much less than 30k; that would be a nice speedup... if it worked.

Run time: 25,554.97
CPU time: 25,809.73
single pulses: 0
repetitive pulses: 30
percent blanked: 2.98
ID: 1383453
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1383547 - Posted: 22 Jun 2013, 2:22:02 UTC

Guess I'll add my 2 cents (or 13,428.89 seconds of wasted run time) to this thread, just for the sake of piling on the evidence:

http://setiathome.berkeley.edu/workunit.php?wuid=1266538088

This is my very first invalid. Does that count as a milestone? :-(
ID: 1383547
bluestar

Joined: 5 Sep 12
Posts: 6995
Credit: 2,084,789
RAC: 3
Message 1383549 - Posted: 22 Jun 2013, 2:44:52 UTC

Supposedly nVidia is doing this thing a little better...
ID: 1383549
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1383557 - Posted: 22 Jun 2013, 3:38:58 UTC - in response to Message 1383547.  

Guess I'll add my 2 cents (or 13,428.89 seconds of wasted run time) to this thread, just for the sake of piling on the evidence:

http://setiathome.berkeley.edu/workunit.php?wuid=1266538088

This is my very first invalid. Does that count as a milestone? :-(

And if it even matters for you, that particular error does not reset your "consecutive valid tasks" count over on your application details for that machine. I had an issue similar to that over a year ago where I just got a bad luck WU where my CPU app was the only one that actually ran all the way through it, and I thought for sure my streak of 1700+ consecutive was going to get reset, but it didn't.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1383557
S@NL - John van Gorsel
Volunteer tester
Joined: 5 Jul 99
Posts: 193
Credit: 139,673,078
RAC: 0
Netherlands
Message 1383632 - Posted: 22 Jun 2013, 12:14:00 UTC

And here's another one: 1266412592
14 hours of CPU time against 6 ATi's that failed to run this Astropulse task


Seti@Netherlands website
ID: 1383632
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1383634 - Posted: 22 Jun 2013, 12:29:14 UTC - in response to Message 1383632.  
Last modified: 22 Jun 2013, 12:29:39 UTC

And here's another one: 1266412592
14 hours of CPU time against 6 ATi's that failed to run this Astropulse task

I'd like to see this WU reprocessed by a CPU or an NV GPU and see what happens...
That could be interesting.
ID: 1383634
Sakletare
Joined: 18 May 99
Posts: 132
Credit: 23,423,829
RAC: 0
Sweden
Message 1383677 - Posted: 22 Jun 2013, 14:50:04 UTC - in response to Message 1382997.  

I fear that changing the scheduler so that it spreads problematic units across different platforms requires a fair bit of coding on David's part. Not something easily set in motion.

They really should get it fixed. Science/workunits are wasted. ET could be hiding in one of them.
ID: 1383677
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1383684 - Posted: 22 Jun 2013, 15:49:37 UTC

I have one, wonder if I'll get credits ;)

http://setiathome.berkeley.edu/workunit.php?wuid=1266352994
ID: 1383684
BetelgeuseFive Project Donor
Volunteer tester

Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1383691 - Posted: 22 Jun 2013, 16:41:53 UTC - in response to Message 1383677.  

I fear that changing the scheduler so that it spreads problematic units across different platforms requires a fair bit of coding on David's part. Not something easily set in motion.

They really should get it fixed. Science/workunits are wasted. ET could be hiding in one of them.


There may be an easier solution. Most of the hosts involved here consistently report errors on GPU tasks. I think the number of tasks sent to these hosts should be limited to 1 (or only a couple) per day, so people have a chance to fix the problem and get tasks again. Right now they get a (single) task, it fails immediately, and 5 minutes later they report the failed task and get a new one. This means that a single host can still send hundreds of invalid/error tasks every day. Would it be difficult to check whether a host (once it has processed a number of tasks) has at least a certain percentage of valid results (over, say, the last 50 reported tasks) and, if not, limit new work to a single task (or a couple of tasks) per day?
Most of the information required is already available (I can find it in my host details).
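
The rule proposed above could look roughly like the sketch below. This is only an illustration of the idea, not how the BOINC scheduler actually works: the class, the constants and the 50% threshold are all assumptions, since the post only says "a certain percentage". Only the 50-task window and the one-task-per-day limit come from the post.

```python
from collections import deque

# All values except WINDOW and THROTTLED_QUOTA are hypothetical.
WINDOW = 50            # "the last say 50 reported tasks"
MIN_TASKS = 10         # assumed: results needed before a host is judged
MIN_VALID_RATIO = 0.5  # assumed: the post gives no specific percentage
THROTTLED_QUOTA = 1    # "a single task ... per day"
NORMAL_QUOTA = 100     # assumed normal daily limit


class HostHistory:
    """Tracks the most recent reported results for one host."""

    def __init__(self):
        # deque(maxlen=...) silently drops the oldest result when full.
        self.results = deque(maxlen=WINDOW)

    def report(self, valid):
        """Record one reported task: True if valid, False if error/invalid."""
        self.results.append(bool(valid))

    def daily_quota(self):
        """Return how many tasks this host may receive per day."""
        if len(self.results) < MIN_TASKS:
            return NORMAL_QUOTA  # too little history to judge the host
        valid_ratio = sum(self.results) / len(self.results)
        if valid_ratio < MIN_VALID_RATIO:
            return THROTTLED_QUOTA
        return NORMAL_QUOTA


if __name__ == "__main__":
    host = HostHistory()
    for _ in range(50):
        host.report(False)  # a host whose GPU tasks always fail
    print(host.daily_quota())
```

Because the window slides, a host that fixes its drivers and starts returning valid results would get its normal quota back without manual intervention.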

Tom

ID: 1383691
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.