Perhaps my 7th wingman will be the charm! (or maybe the 8th)

Message boards : Number crunching : Perhaps my 7th wingman will be the charm! (or maybe the 8th)

Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1578599 - Posted: 27 Sep 2014, 3:01:34 UTC

I'm posting this just for fun. WU #1561509069 seems to have been a real hot potato for about 7 weeks now, with a whole bunch of different reasons for getting dropped. My host is the first one, patiently waiting for another reliable host to come along. Even the one that finally "finished", to trigger the inconclusive, is a runaway machine that got a 30/30 overflow! The real irony is that, after all is said and done, since my host found 0 single pulses and 0 repetitive pulses, all this churning will be for naught, anyway. ;^)

ID: 1578599
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1578605 - Posted: 27 Sep 2014, 3:22:21 UTC - in response to Message 1578599.  
Last modified: 27 Sep 2014, 3:33:45 UTC

I was sent one of those, http://setiathome.berkeley.edu/workunit.php?wuid=1594867559
Which led me to find TWO other 'Old' ATI cards running the ATI CAL App. Similar to others, the Driver installed doesn't even have OpenCL, driver: 1.4.1417 & driver: 1.4.1385
So, why is the server sending OpenCL tasks to Hosts that don't have OpenCL which then have to Abort them? Along the same line, Why is the Server sending ATI_CAL tasks to Hosts with ATI Driver 1.4.1734 when it has been known 'forever' that Driver 1734 (Legacy Driver) doesn't work with the ATI_CAL App?
It all results in Aborted/Error results which clog the System...
ID: 1578605
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1578617 - Posted: 27 Sep 2014, 4:06:31 UTC - in response to Message 1578605.  

I was sent one of those, http://setiathome.berkeley.edu/workunit.php?wuid=1594867559

And that looks like another WU with no pulses to be found, despite all the back and forth it'll end up with.

I seem to recall that in June of last year, there was a major fiasco with one of the AP apps for ATI that was causing Computation errors just as fast as the scheduler could send them out. A lot of WUs were failing with too many errors after 6 wingmen crapped out. I think that's the most wingmen I've ever run across, until now. Also, the only time I ever got a "Completed, can't validate" status for one of my tasks.
ID: 1578617
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1578621 - Posted: 27 Sep 2014, 4:24:07 UTC - in response to Message 1578617.  
Last modified: 27 Sep 2014, 4:57:07 UTC

I was sent one of those, http://setiathome.berkeley.edu/workunit.php?wuid=1594867559

And that looks like another WU with no pulses to be found, despite all the back and forth it'll end up with.

I seem to recall that in June of last year, there was a major fiasco with one of the AP apps for ATI that was causing Computation errors just as fast as the scheduler could send them out. A lot of WUs were failing with too many errors after 6 wingmen crapped out. I think that's the most wingmen I've ever run across, until now. Also, the only time I ever got a "Completed, can't validate" status for one of my tasks.

Yes, it was the same App. As I recall, the problem was caused by a corrupted copy being placed on the server due to a failing Flash Drive...I could be wrong. It was a while ago.
I'm Not wrong about those Old Drivers Not having OpenCL, or that the ATI_CAL App WILL NOT WORK with the ATI Legacy Driver 1734 (Legacy 13.1 & Legacy 13.9). The App works fine with the Intended drivers. One has to question why a task titled "ati_nocal" is being sent to an App titled "cal_ati". opencl_ati_nocal_100 was meant for the New ATI cards that DON'T have CAL, NOT the Old ATI cards that ONLY have CAL.

These 2 Don't work either, GeForce 9800 GT (511MB) driver: 340.32.
ID: 1578621
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1578839 - Posted: 27 Sep 2014, 18:41:57 UTC - in response to Message 1578621.  

Yep, another Invalid for the 9800 GT running driver 340.xx;
@Pre-FERMI nVidia GPU users: Important warning
So, if you want to use your pre-FERMI nVidia hardware for AstroPulse crunching stay with pre-340.xx drivers.
ID: 1578839
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1588574 - Posted: 18 Oct 2014, 3:53:47 UTC - in response to Message 1578599.  

I'm posting this just for fun. WU #1561509069 seems to have been a real hot potato for about 7 weeks now, with a whole bunch of different reasons for getting dropped. My host is the first one, patiently waiting for another reliable host to come along. Even the one that finally "finished", to trigger the inconclusive, is a runaway machine that got a 30/30 overflow! The real irony is that, after all is said and done, since my host found 0 single pulses and 0 repetitive pulses, all this churning will be for naught, anyway. ;^)

4 days and counting until the 7th wingman times out, as well (no contact from that host since 8 Oct). I think that's the last chance this WU will have. It's definitely jinxed!

This is just the sort of WU that could conceivably drag out the demise of AP v6 for months, too.
ID: 1588574
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1590190 - Posted: 22 Oct 2014, 15:43:51 UTC
Last modified: 22 Oct 2014, 16:21:02 UTC

And now I have my 8th wingman, after the 7th one timed out. I thought 7 would be the limit, but I guess we'll soon find out if it stops at 8, since number 8's task summary doesn't indicate a particularly successful host.

State: All (70) · In progress (8) · Validation pending (0) · Validation inconclusive (0) · Valid (1) · Invalid (0) · Error (61) 


This is downright comical!

Edit: Although now that I take a second look at his task list, the one and only Valid task he has is an AP v6. Maybe there's still hope!
ID: 1590190
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1590222 - Posted: 22 Oct 2014, 16:48:06 UTC - in response to Message 1590190.  

And now I have my 8th wingman, after the 7th one timed out. I thought 7 would be the limit, but I guess we'll soon find out if it stops at 8, since number 8's task summary doesn't indicate a particularly successful host.

State: All (70) · In progress (8) · Validation pending (0) · Validation inconclusive (0) · Valid (1) · Invalid (0) · Error (61) 


This is downright comical!

Edit: Although now that I take a second look at his task list, the one and only Valid task he has is an AP v6. Maybe there's still hope!

The BOINC temporary exits on the AP v7 tasks, which happen because the drivers are too old, should ensure the host actually makes some progress on that AP v6 task each time the GPU gets some crunching time. Still, the on-again, off-again processing probably means at least a couple of days to finish, even though the run time is likely to be around half a day.
                                                                   Joe
ID: 1590222
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1590291 - Posted: 22 Oct 2014, 19:18:37 UTC

Okay, you can quit worrying now. You got validated 20 minutes ago, less than 13 hours after the last host was assigned the task.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1590291
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1590386 - Posted: 22 Oct 2014, 21:21:41 UTC - in response to Message 1590291.  

LOL

Well, it was certainly entertaining while it lasted! I see one last bit of mystery in that last host's Stderr, which looks to be truncated, after multiple restarts, with no pulse counts included. A fitting finish.
ID: 1590386
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590658 - Posted: 23 Oct 2014, 11:42:01 UTC

And you are lucky, because the last valid result could easily have been invalid:

OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)

It's an unsupported SDK version.
ID: 1590658
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1590743 - Posted: 23 Oct 2014, 14:47:11 UTC - in response to Message 1590658.  

And you are lucky, because the last valid result could easily have been invalid:

OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)

It's an unsupported SDK version.

Unsupported for "Windows x86 rev 1832, V6 match, by Raistmer" build? I've been assuming your switch to the newer SDK took place after that build.

The host is of course erroring on all AP v7 tasks, and the user will soon need to update it with a later Catalyst version to have it remain productive. I did send a PM.
                                                                   Joe
ID: 1590743
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1590829 - Posted: 23 Oct 2014, 17:26:15 UTC - in response to Message 1590743.  

I did send a PM.
                                                                   Joe

If he takes notice of your PM and updates his host, at least something positive will come out of this little episode. I think at least a few of the other wingmen on this WU could use a similar nudge. From time to time, I've tried sending PMs to other users when I saw a wingman's host that had just recently appeared to go off the rails, but only one of them ever responded, and I think even that took about a month.

It's a shame that the project doesn't have some functionality in one of the servers that would automatically generate an email to a user when a host crosses some defined threshold of Invalids and Errors, perhaps when those results exceed 50% of Valid results. The email wouldn't have to diagnose the problem, just point out to the user that a problem appears to exist and direct them to the Message Boards if they need assistance. I can't help but feel that such a process could enable a whole lot of hosts to regain lost productivity, which would surely be a good thing for the project.

The current "system", which relies on individual users to occasionally PM other users, with what I suspect are widely varying degrees of diplomacy, doesn't seem like it accomplishes much. Then, too, quite a few of the wayward rigs belong to Anonymous users who can't be PM'd in the first place. Only the project administrators, or an automated system they implement, can reach the Anonymous ones.

Automatically generating some emails, perhaps once every couple of weeks or once a month, can't be that big a deal, can it?
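
The threshold check suggested above could be sketched roughly as follows. This is only an illustration, not anything the project's servers actually do: the `HostTaskSummary` type, the `needs_notification` function, and the `ratio` parameter are all hypothetical names, and the 50% figure is simply the one floated in this post.

```python
from dataclasses import dataclass


@dataclass
class HostTaskSummary:
    """Per-host task counts, like those shown on a host's task list page."""
    valid: int
    invalid: int
    error: int


def needs_notification(summary: HostTaskSummary, ratio: float = 0.5) -> bool:
    """Return True when Invalid + Error results exceed `ratio` times Valid results.

    Hypothetical rule: a host with no bad results is never flagged.
    """
    bad = summary.invalid + summary.error
    if bad == 0:
        return False
    return bad > ratio * summary.valid


# Counts quoted earlier in this thread for the 8th wingman's host.
eighth_wingman = HostTaskSummary(valid=1, invalid=0, error=61)
print(needs_notification(eighth_wingman))
```

With those counts, 61 bad results against 1 Valid is far over the threshold, so that host would be flagged. A periodic job could run a check like this over all hosts and queue one email per flagged owner, Anonymous accounts included.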
ID: 1590829
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590887 - Posted: 23 Oct 2014, 18:45:14 UTC - in response to Message 1590743.  
Last modified: 23 Oct 2014, 18:46:11 UTC

And you are lucky, because the last valid result could easily have been invalid:

OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)

It's an unsupported SDK version.

Unsupported for "Windows x86 rev 1832, V6 match, by Raistmer" build? I've been assuming your switch to the newer SDK took place after that build.

The host is of course erroring on all AP v7 tasks, and the user will soon need to update it with a later Catalyst version to have it remain productive. I did send a PM.
                                                                   Joe


Hm, good question. There were some v6 builds with SDK 2.6 but without the code that blocks execution in case of SDK 2.4 (as in v7). I can't recall which range 1832 falls in, though.
ID: 1590887

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.