To Whomever this concerns: Backoffs and error -6

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65839
Credit: 55,293,173
RAC: 49
United States
Message 887000 - Posted: 21 Apr 2009, 21:31:24 UTC

Please discontinue the backoffs on error -6. A certain number of VLARs (WUs) are currently being wrongly assigned to the GPU and should be reassigned to the CPU. But since there is currently no mechanism in the SETI apps or BOINC itself to do this (although I've read that it has been suggested), the only alternative is, yep, error -6, so that the WU can be reassigned to some other cruncher who hopefully doesn't have a GPU (an Nvidia video card, or maybe later on ATI video cards). If you can't do this, then return the $20 donation to me, as I had one WU on the GPU here take over 5 hours.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887000
Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 887002 - Posted: 21 Apr 2009, 21:33:50 UTC

SJ,

I have sent your concerns directly to Eric and Matt for a response from them on this issue.


ID: 887002
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65839
Credit: 55,293,173
RAC: 49
United States
Message 887003 - Posted: 21 Apr 2009, 21:34:46 UTC - in response to Message 887002.  

SJ,

I have sent your concerns directly to Eric and Matt for a response from them on this issue.

Thanks Blurf.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887003
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 887004 - Posted: 21 Apr 2009, 21:34:53 UTC - in response to Message 887000.  

Sorry, but if you choose to use an unofficial App that kills VLAR's with an error condition, then, as far as I am concerned, that is your choice. Don't blame the project for the consequences.

F.
ID: 887004
Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 887007 - Posted: 21 Apr 2009, 21:38:40 UTC - in response to Message 887000.  

I had one WU on the gpu here take over 5 hours.


I just aborted three. One was over 11 1/2 hours, two were over ten hours. Nowhere near done.
--
Classic 82353 WU / 400979 h
ID: 887007
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 887010 - Posted: 21 Apr 2009, 21:42:30 UTC
Last modified: 21 Apr 2009, 21:43:30 UTC

First, you are using a development version of BOINC. Second, running SETI and making donations are voluntary. I guess that's 2 and 3. Fourth, the way it works is the way it works. SETI apps don't look at the angle range before assigning WUs to a CPU or GPU. It just assigns them. If you have hanging WUs on your GPU you should try Raistmer's v.11 app. This app is supposed to eliminate the hanging GPU WUs. As I recall BOINC 6.6.20 already has the VLAR killer in it. I'm not sure how anyone would work around the -6 error, but VLARs are still a problem for a few folks.
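
To make the angle-range point concrete, here is a minimal Python sketch of a dispatcher that does check the angle range before choosing a device. It is purely illustrative: the `WorkUnit` type, the `pick_device` function, and the cutoff value are assumptions made up for this example, and per this thread no such mechanism existed in BOINC or the SETI apps at the time.

```python
from dataclasses import dataclass

# Placeholder cutoff below which a WU is treated as a VLAR.
# The thread gives no number; this value is an assumption.
VLAR_ANGLE_RANGE_CUTOFF = 0.13


@dataclass
class WorkUnit:
    name: str            # hypothetical identifier
    angle_range: float   # hypothetical field holding the WU's angle range


def is_vlar(wu: WorkUnit) -> bool:
    """Return True if the work unit counts as a very low angle range WU."""
    return wu.angle_range < VLAR_ANGLE_RANGE_CUTOFF


def pick_device(wu: WorkUnit, has_gpu: bool) -> str:
    """Send VLARs to the CPU; send everything else to the GPU if one exists."""
    if has_gpu and not is_vlar(wu):
        return "gpu"
    return "cpu"
```

With a check like this, a VLAR would never reach the GPU app, so there would be nothing to kill with error -6 in the first place.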


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 887010
Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 887014 - Posted: 21 Apr 2009, 21:51:45 UTC - in response to Message 887010.  
Last modified: 21 Apr 2009, 21:52:01 UTC

First, you are using a development version of BOINC.


I'm not sure who this is addressed to, but I personally am running a development version on one non-CUDA machine. 6.6.20 on all others. If the VLAR killer is in 6.6.20 (first time I've heard that), it clearly is buggy. I am already using Raistmer's apps on all 6.6.20 machines.
--
Classic 82353 WU / 400979 h
ID: 887014
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 887015 - Posted: 21 Apr 2009, 21:55:57 UTC - in response to Message 887010.  

As I recall BOINC 6.6.20 already has the VLAR killer in it. I'm not sure how anyone would work around the -6 error, but VLARs are still a problem for a few folks.

Nope. VLAR killer is only in Raistmer's apps. 6.6.20 will feed you VLAR's if they are next in the queue when you ask.

F.
ID: 887015
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65839
Credit: 55,293,173
RAC: 49
United States
Message 887016 - Posted: 21 Apr 2009, 21:56:24 UTC - in response to Message 887014.  
Last modified: 21 Apr 2009, 21:57:06 UTC

First, you are using a development version of BOINC.


I'm not sure who this is addressed to, but I personally am running a development version on one non-CUDA machine. 6.6.20 on all others. If the VLAR killer is in 6.6.20 (first time I've heard that), it clearly is buggy. I am already using Raistmer's apps on all 6.6.20 machines.

Actually It's in the apps Raistmer makes(but not all), It's not in any version of Boinc.

Don't bother Fred W, Yer blocked.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887016
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 887020 - Posted: 21 Apr 2009, 22:04:58 UTC - in response to Message 887016.  


Actually It's in the apps Raistmer makes(but not all), It's not in any version of Boinc.

Don't bother Fred W, Yer blocked.

He does have a point. The application is intentionally throwing errors, and what BOINC does on an error was known beforehand.

The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough.

I understand Nvidia did the port, and they'd likely be the best ones to fix it.
ID: 887020
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65839
Credit: 55,293,173
RAC: 49
United States
Message 887022 - Posted: 21 Apr 2009, 22:09:54 UTC - in response to Message 887020.  
Last modified: 21 Apr 2009, 22:11:05 UTC


Actually It's in the apps Raistmer makes(but not all), It's not in any version of Boinc.

Don't bother Fred W, Yer blocked.

He does have a point. The application is intentionally throwing errors, and what BOINC does on an error was known beforehand.

The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough.

I understand Nvidia did the port, and they'd likely be the best ones to fix it.

Pay, schmay. It just takes too damn long. Sure, it should be done in the app, but so far no one knows how to fix this, from what I've read. Not a clue. So reassign the dumb things and allow the WU to be aborted by the app in question. SETI has already had a lot of users depart for various reasons. Do they need more to depart forever before something positive is done on the subject?
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887022
Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 887024 - Posted: 21 Apr 2009, 22:10:02 UTC - in response to Message 887020.  
Last modified: 21 Apr 2009, 22:11:33 UTC

The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough.


The point is not the payment or credits. It's about using the proper tool for the job. Wasting 24 hours, or worse hanging the video card completely, on a VLAR is an enormous waste of computational resources.
--
Classic 82353 WU / 400979 h
ID: 887024
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65839
Credit: 55,293,173
RAC: 49
United States
Message 887025 - Posted: 21 Apr 2009, 22:12:09 UTC - in response to Message 887024.  

The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough.


The point is not the payment or credits. It's about using the proper tool for the job. Wasting 24 hours, or worse hanging the video card completely, on a VLAR is an enormous waste of computational resources.

I hear Ya, Mine hung yesterday and I had to reboot the PC. Another nail.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887025
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65839
Credit: 55,293,173
RAC: 49
United States
Message 887032 - Posted: 21 Apr 2009, 22:17:24 UTC

Backoffs as in "Communication deferred" on the Projects tab in BOINC, and it can be as much as 24 hours. You want to punish somebody? What's next, banishment for using the VLAR killer? Until the project can come up with a better idea that works, stop punishing the users for this.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887032
Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 887040 - Posted: 21 Apr 2009, 22:39:56 UTC - in response to Message 887032.  

Backoffs as in "Communication deferred" on the Projects tab in BOINC, and it can be as much as 24 hours. You want to punish somebody? What's next, banishment for using the VLAR killer? Until the project can come up with a better idea that works, stop punishing the users for this.


SJ--please relax. Your concern has been escalated to the Admins and I am sure they will address it when they can.


ID: 887040
Moabiter
Joined: 9 Dec 02
Posts: 79
Credit: 215,029
RAC: 0
Germany
Message 887042 - Posted: 21 Apr 2009, 22:44:43 UTC

Hi there!
I'm not sure how many VLARs are around, but won't this problem fade away once they are all crunched?
How long will it take to deal with 2 months (Dec/Jan) of problem WUs?

In my opinion, yes, it is a waste of resources if the WU could be done 5 times faster.
Non-optimized SSEx Astropulse apps are 10 times slower, which is also a waste of resources (for today's CPUs).

I think that if the coders are working on future apps, they should not waste time 'optimizing' 'old' apps.
You need a new mainboard if you want more CPU power (mostly).

Being able to contribute to SETI more effectively as a small cruncher would be a big motivation boost.

Btw, OT: Can someone point me to the Raistmer apps + .xml and any needed .dlls to run optimized AP + S@Home Enhanced but without killing the VLARs, please?
Browsing the forums, I only come across VLAR killers.

Thanks =)
While browsing Berkeley's homepage I came across this statement that makes me say:
"ME!, ME!, ME!, wants to know, what patch was used and is it available for windows?"
ID: 887042
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14655
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887044 - Posted: 21 Apr 2009, 22:47:05 UTC - in response to Message 887020.  

He does have a point. The application is intentionally throwing errors, and what BOINC does on an error was known beforehand.

The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough.

I understand Nvidia did the port, and they'd likely be the best ones to fix it.

I agree with you and Fred. The VLAR kill is merely a workround - and you might say a clumsy workround - for a problem which is proving surprisingly hard to fix. If you use a workround, you should be aware of the consequences, and be prepared to live with them.

As it happens, I had an email today from the nVidia developer who did a major part of the port. He said:

I am aware of the VLAR sluggishness and have spent some time trying different way to resolve it given the current pulse detection algorithm with not much luck. The bottom line is that when using CUDA on the same GPU that also used as the graphical display, it is difficult to prioritize one type of GPU client over another on the current HW architecture. SETI@home is not alone, this issue can affect any CUDA application that demands high amounts of time from CUDA.

And BTW, BOINC does not impose a communications backoff as a result of a single reported error. If you report 100 in a row, with no valid work at all, then quota limits will kick in to protect the project servers (which heaven knows are in a fragile enough state). But if you have valid work to return, backoff can always be overcome.
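
The quota behaviour described above can be sketched roughly as follows. This is a simplified model, not BOINC's actual server code: the starting quota of 100, the decrement-on-error and double-on-success rules, and the function names are all assumptions chosen to match the description here (a long run of errors shrinks the quota, and a single valid result starts restoring it).

```python
# Assumed daily quota ceiling per host; the real value is project-configured.
MAX_DAILY_QUOTA = 100


def update_quota(quota: int, result_ok: bool) -> int:
    """Adjust a host's daily quota after one reported result.

    Assumed rules: an error costs one unit (never dropping below 1),
    a valid result doubles the quota up to the ceiling.
    """
    if result_ok:
        return min(MAX_DAILY_QUOTA, quota * 2)
    return max(1, quota - 1)


def replay(results: list[bool], quota: int = MAX_DAILY_QUOTA) -> int:
    """Apply update_quota to a sequence of results (True = valid)."""
    for ok in results:
        quota = update_quota(quota, ok)
    return quota
```

Under these assumptions a single error barely moves the quota, 100 errors in a row drive it to the floor, and valid results bring it back quickly, which is the shape of the behaviour described in the post.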
ID: 887044
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65839
Credit: 55,293,173
RAC: 49
United States
Message 887051 - Posted: 21 Apr 2009, 22:58:27 UTC - in response to Message 887040.  

Backoffs as in "Communication deferred" on the Projects tab in BOINC, and it can be as much as 24 hours. You want to punish somebody? What's next, banishment for using the VLAR killer? Until the project can come up with a better idea that works, stop punishing the users for this.


SJ--please relax. Your concern has been escalated to the Admins and I am sure they will address it when they can.

I was only elaborating on what I meant by backoffs, Blurf (so relax), as there are other backoffs due to traffic problems, which are a different subject for some other thread. :D
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887051
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 887055 - Posted: 21 Apr 2009, 23:05:10 UTC

The other point that occurs to me is: has SJ actually run out of work because of the 24-hour back-off? (He obviously can't answer this himself, since I am blacklisted.) As Richard says, if the VLAR killer returns 100+ consecutive VLARs then the back-off will be 24 hours, but only until a non-error is returned, and there should be plenty of those well within that 24-hour period. So all it needs is a click on the "Update" button to report them, isn't it?

F.
ID: 887055
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 887056 - Posted: 21 Apr 2009, 23:08:06 UTC - in response to Message 887022.  


Actually It's in the apps Raistmer makes(but not all), It's not in any version of Boinc.

Don't bother Fred W, Yer blocked.

He does have a point. The application is intentionally throwing errors, and what BOINC does on an error was known beforehand.

The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough.

I understand Nvidia did the port, and they'd likely be the best ones to fix it.

Pay, schmay. It just takes too damn long. Sure, it should be done in the app, but so far no one knows how to fix this, from what I've read. Not a clue. So reassign the dumb things and allow the WU to be aborted by the app in question. SETI has already had a lot of users depart for various reasons. Do they need more to depart forever before something positive is done on the subject?

I've got an AP work unit which has 266 hours on it so far. I think it'll finish sometime tomorrow.

It's not a problem, it is the perception of a problem.

My question is: is there a problem, or are people leaving because of the constant complaining?
ID: 887056


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.