To Whomever this concerns: Backoffs and error -6
Author | Message |
---|---|
Joined: 2 Sep 06 Posts: 8964 Credit: 12,678,685 RAC: 0 |
SJ, I have sent your concerns directly to Eric and Matt for a response from them on this issue. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Sorry, but if you choose to use an unofficial app that kills VLARs with an error condition, then, as far as I am concerned, that is your choice. Don't blame the project for the consequences. F. |
Andy Williams Joined: 11 May 01 Posts: 187 Credit: 112,464,820 RAC: 0 |
I had one WU on the GPU here take over 5 hours. I just aborted three. One was over 11 1/2 hours, two were over ten hours. Nowhere near done. -- Classic 82353 WU / 400979 h |
Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
First, you are using a development version of BOINC. Second, running SETI and making donations are voluntary. I guess that's 2 and 3. Fourth, the way it works is the way it works. SETI apps don't look at the angle range before work is assigned to a CPU or GPU. It just gets assigned. If you have hanging WUs on your GPU you should try Raistmer's v.11 app. This app is supposed to eliminate the hanging GPU WUs. As I recall BOINC 6.6.20 already has the VLAR killer in it. I'm not sure how anyone would work around the -6 error, but VLARs are still a problem for a few folks. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Andy Williams Joined: 11 May 01 Posts: 187 Credit: 112,464,820 RAC: 0 |
"First, you are using a development version of BOINC." I'm not sure who this is addressed to, but I personally am running a development version on one non-CUDA machine. 6.6.20 on all others. If the VLAR killer is in 6.6.20 (first time I've heard that), it clearly is buggy. I am already using Raistmer's apps on all 6.6.20 machines. -- Classic 82353 WU / 400979 h |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
"As I recall BOINC 6.6.20 already has the VLAR killer in it. I'm not sure how anyone would work around the -6 error but VLARs are still a problem for a few folks" Nope. The VLAR killer is only in Raistmer's apps. 6.6.20 will feed you VLARs if they are next in the queue when you ask. F. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
He does have a point. The application is intentionally throwing errors, and what BOINC does on an error was known beforehand. The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough. I understand Nvidia did the port, and they'd likely be the best ones to fix it. |
Andy Williams Joined: 11 May 01 Posts: 187 Credit: 112,464,820 RAC: 0 |
"The effort to "fix" this could/should go into the application itself to find out why the CUDA card does these slowly and address it there, not by returning work because it doesn't pay well enough." The point is not the payment or credits. It's about using the proper tool for the job. Wasting 24 hours, or worse hanging the video card completely, on a VLAR is an enormous waste of computational resources. -- Classic 82353 WU / 400979 h |
Joined: 2 Sep 06 Posts: 8964 Credit: 12,678,685 RAC: 0 |
"Backoffs, as in Communication Deferred on the Projects tab in BOINC, and it can be as much as 24 hours. You want to punish somebody? What's next, banishment for using the VLAR killer? Until the project can come up with a better idea that works, stop punishing the users for this." SJ, please relax. Your concern has been escalated to the admins and I am sure they will address it when they can. |
Moabiter Joined: 9 Dec 02 Posts: 79 Credit: 215,029 RAC: 0 |
Hi there! I'm not sure how many VLARs are around, but won't this problem fade away once they are all crunched? How long will it take to deal with 2 months (Dec/Jan) of problem WUs? In my opinion, yes, it is a waste of resources if the WU could be done 5 times faster. Non-optimized SSEx Astropulse apps are 10 times slower, which is also a waste of resources (for today's CPUs). I think that if the coders are working on future apps, they should not waste time 'optimizing' 'old' apps. You need a new mainboard if you want more CPU power (mostly). Being able to contribute to SETI more effectively as a small cruncher would be a big motivation boost. Btw. OT: Can someone point me to the Raistmer apps + .xml and any needed .dlls to run optimized AP + S@Home Enhanced but without killing the VLARs, please? Browsing the forums I only come across VLAR killers. Thanks =) While browsing Berkeley's homepage I came across this statement that makes me say: "ME!, ME!, ME!, wants to know, what patch was used and is it available for windows?" |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"He does have a point. The application is intentionally throwing errors, and what BOINC does on an error was known beforehand." I agree with you and Fred. The VLAR kill is merely a workround - and you might say a clumsy workround - for a problem which is proving surprisingly hard to fix. If you use a workround, you should be aware of the consequences, and be prepared to live with them. As it happens, I had an email today from the nVidia developer who did a major part of the port. He said: "I am aware of the VLAR sluggishness and have spent some time trying different ways to resolve it given the current pulse detection algorithm, with not much luck. The bottom line is that when using CUDA on the same GPU that is also used as the graphical display, it is difficult to prioritize one type of GPU client over another on the current HW architecture. SETI@home is not alone, this issue can affect any CUDA application that demands high amounts of time from CUDA." And BTW, BOINC does not impose a communications backoff as a result of a single reported error. If you report 100 in a row, with no valid work at all, then quota limits will kick in to protect the project servers (which heaven knows are in a fragile enough state). But if you have valid work to return, backoff can always be overcome. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
The other point that occurs to me is "Has SJ actually run out of work because of the 24 hour back-off?" (He obviously can't answer this himself since I am blacklisted). As Richard says, if the VLAR killer returns 100+ consecutive VLARs then the back-off will be 24 hours, but only until a non-error is returned. There should be plenty of those well within that 24 hour period, so all it needs is a click on the "Update" button to report them, isn't it? F. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I've got an AP work unit which has 266 hours on it so far. I think it'll finish sometime tomorrow. It's not a problem, it is the perception of a problem. My question is: is there a problem, or are people leaving because of the constant complaining? |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
That adds an interesting question: is there a way to reduce the "graphics" demand on the card if the owner is mostly interested in crunching? Setting the display to the minimum resolution and the minimum color depth might make a difference -- and would be an interesting experiment. I wonder if some of the same kind of optimizations (rearranging execution order) that help the CPU apps would help the GPU apps. |
Moabiter Joined: 9 Dec 02 Posts: 79 Credit: 215,029 RAC: 0 |
Reminds me of Hybrid SLI. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
No, I'm comparing minutes to minutes. As far as I can tell, your complaint is that some CUDA tasks take longer than others. If you don't care about credit, then it's just runtime. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"My question is: is there a problem, or are people leaving because of the constant complaining?" I doubt people are leaving just because of the complaining. I would have sympathy with people who leave because they are confused by the plethora of threads, acronyms, in-jokes, technical jargon and conflicting advice - such as skildude's absurd suggestion earlier in this thread that "BOINC 6.6.20 already has the VLAR killer in it". No it doesn't, and never will do: BOINC versions don't have project-specific workrounds in them. I would also have sympathy with a new user who leaves after receiving the stock AP application and an AP task as their first experience of SETI. Without calm, clear advice that the initial estimate is overstated by roughly 2.5 times - AP on a Core2 class processor is deliberately designed for a target TDCF of 0.4 [there, I'm doing the technical jargon already] - and that the new user should at least watch the 'To completion' estimate for an hour or so before clicking the 'abort' button, my reaction might be the same. My understanding is that the next version of Astropulse - currently in testing at Beta - will be deployed here in a way which ensures that AP will not be issued as the first task to a newly-joined host or user. That should help. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"My question is: is there a problem, or are people leaving because of the constant complaining?" The part that bothers me most is the fact that people seem to have such wild expectations. The project discloses a number of things right off the bat, like the fact that they will be out of work at times. We then see people complaining loudly that the project is out of work. They build machines that crunch incredibly fast (and spend real money on them) and then complain that the project can't keep their cache full. I understand the complaints, but the project is not to blame for the project doing what it said it would do. I think it's all about expectations. If people see what they expect, they'll stay. If people come in with unreasonable expectations, they'll be disappointed and they'll leave. I crunch beta (AP & MB) using their stock apps. AP 5.05 is a lot faster. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"As it happens, I had an email today from the nVidia developer who did a major part of the port. He said:" There are two problems with the VLAR tasks. 1) They take absolutely ages. 2) If they are running on the GUI GPU, they cause sluggishness and lagging on the display and other applications using that display. SJ's problem is with (1). I have every sympathy. I don't do VLAR on CUDA either - it's not worth it. But I use a more sophisticated workround - one developed, as it happens, by Fred W, who is too modest by half. SJ, you should learn who your friends are, and reconsider your decision not to listen to Fred W. You might learn something to your advantage. Problem (1) is the hard one, and nobody seems to have a solution. Jason Gee is working on it - he found a reference today which may help: "WooHoo, Found it in Bailey's paper. "FFTs in External or Hierarchical Memory", David H. Bailey, December 30, 1989. Ref: Journal of Supercomputing, vol. 4, no. 1 (March 1990), pp. 23-35". But don't hold your breath. The nVidia developer's concern - today - was with problem (2), the screen lag. He had a useful suggestion - make the new "don't use GPU while computer is in use" option in BOINC v6.6.20 apply only to GPU cards driving the user interface display - which I have forwarded to Eric and David. Watch this space, as they say. |
Moabiter Joined: 9 Dec 02 Posts: 79 Credit: 215,029 RAC: 0 |
"From what I've read Hybrid SLI has been discontinued by Nvidia." Really? Bummer.. I was planning to get one when it's more developed. Anyway, might an additional $20 non-CUDA GPU help? Sadly, someone who has a system set up like that without performance problems won't post here to verify my theory, right? =) But buying another card would not be the best workaround. And I don't know why people leave. What I know is that people seem to join when the movie "Contact" plays on TV. ^^ (maybe this is the case only in Germany) While browsing Berkeley's homepage I came across this statement that makes me say: "ME!, ME!, ME!, wants to know, what patch was used and is it available for windows?" |
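
For anyone trying to picture the back-off behaviour Richard Haselgrove and Fred W describe above, here is a toy model in Python. It is only a sketch of what the thread says (no back-off for a single error, a 24 hour limit after 100 errors in a row, lifted as soon as a valid result is reported). It is not BOINC's actual server code, and `HostState`, `report`, `ERROR_THRESHOLD` and `BACKOFF_HOURS` are made-up names for illustration.

```python
from dataclasses import dataclass

# Both numbers are taken from the posts in this thread, not from BOINC source.
ERROR_THRESHOLD = 100   # "100 in a row, with no valid work at all"
BACKOFF_HOURS = 24      # "the back-off will be 24 hours"


@dataclass
class HostState:
    """Hypothetical per-host record: counts errors since the last valid result."""
    consecutive_errors: int = 0

    def report(self, valid: bool) -> int:
        """Record one reported result and return the back-off in hours (0 = none)."""
        if valid:
            # A single valid result clears the run of errors.
            self.consecutive_errors = 0
        else:
            self.consecutive_errors += 1
        if self.consecutive_errors >= ERROR_THRESHOLD:
            return BACKOFF_HOURS
        return 0


if __name__ == "__main__":
    host = HostState()
    print(host.report(valid=False))   # one VLAR kill: no back-off
    for _ in range(99):
        hours = host.report(valid=False)
    print(hours)                      # 100th error in a row: back-off applies
    print(host.report(valid=True))    # one valid result: back-off cleared
```

In this simplified picture a host that mixes VLAR kills with ordinary completed tasks never reaches the threshold, which matches Fred W's point that reporting finished work with "Update" should be enough.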
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.