holy cow! 20 timeouts
Author | Message |
---|---|
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
I know the explanation for short deadline timeouts, but I have a question about the explanation. The explanation is that the host asks for work for, say, GPU, and the Scheduler responds by assigning a bunch of tasks for GPU, but the message never reaches the host so it can start downloading them. A few minutes later, the host again asks for work, but this time only for CPU, and it sends in the list of what it has on hand. The Scheduler looks at the list and says, "hey, I assigned this other bunch of work to you, but you don't have it, and I can't assign it again because you're not asking for GPU this time, so I have no choice but to time it out on you." Okay, fine. BUT..... if the host asked for GPU and (as far as it knows) didn't get any, why wouldn't it ask for GPU again the next time? While typing the above, I began to wonder something... Does the list the host sends the Scheduler include everything that it knows to have been assigned, even if it hasn't been downloaded yet, or is it only what's been downloaded? If the latter, then download slowness could be at the root of many of the short timeouts we all seem to experience at one time or another. It could probably also be fixed with a fairly minor tweak of the code somewhere (says the guy who knows nothing about coding, but who is smart enough to know such a tweak might have consequences I don't see). David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Last time I checked, the ghost task VLAR timeout was closer to the opposite of your description. 1) Host requests work for CPU or CPU & GPU. 2) Server assigns VLAR tasks to the CPU, but the host doesn't receive the list. 3) Host requests work for just GPU. 4) Server wants to send the tasks the host doesn't have. 5) The check for not allowing VLAR tasks on the GPU is tripped. 6) The tasks are marked as "Timed out - no response". I think there was some talk about trying to prevent this, but I don't know if that was just talk or not. IIRC: When a host makes a request it sends the list of all tasks in its client_state.xml. So if they are being processed, waiting, or in a transfer state, it will tell the server "Hey I have these tasks". SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
It's more likely to be the other way round. Your computer asks for CPU tasks, and is assigned VLAR workunits (somebody was saying yesterday that there were a lot around at the moment). If that message gets lost, and the next request is for GPU work, that's when the deadlines get fudged (because of the "don't send VLAR to nVidia" rule). Apart from the 'no VLAR to NV' situation, there's nothing (except possibly your preferences) to stop a "lost" result being issued to a different computing resource the second time round - when VLARs were being issued to GPUs because of a bug recently, I published instructions for a technique of deliberately losing them and getting them resent to CPU instead. That's the situation you suggest might cause problems, but it worked well for those who tried it. As regards the client telling the server about work it knows has been allocated, but not yet downloaded - I don't know. I'll take a look next time I have some downloads stuck. Murphy has decreed that I'm all green, just at the moment... Edit - got some. Yes, confirming Hal's post - allocated but not yet downloaded do get reported to the server. |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 |
"if the host asked for GPU and (as far as it knows) didn't get any, why wouldn't it ask for GPU again the next time?" As said before, it asks for CPU tasks, they are lost, and then, at least 5 minutes later, it asks again for more tasks... Well, in those 5 minutes it may happen that the GPU cache reaches the "request more work" trigger, so it asks for CPU and GPU. As the scheduler will try to fill the GPU cache first, it will try to assign the lost tasks (originally intended for the CPU) to the GPU... There is something else: if you are also attached to other projects, after the request fails to receive the needed CPU tasks, it might ask another project for CPU tasks while waiting out the 5 minute delay, and if it gets tasks then it may not need CPU tasks anymore. |
Akio Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0 |
Check out all my timeouts. I've never experienced this before. Can someone explain to me what might be causing this all of a sudden, and what steps I can take to prevent it from happening in the future? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Check out all my timeouts. I've never experienced this before. Can someone explain to me what might be causing this all of a sudden, and what steps I can take to prevent it from happening in the future?" Reposting with a HostID link - other volunteers aren't allowed to follow a UserID link. Error tasks for computer 5967851. Look at all those VLARs, as previously discussed. |
Akio Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0 |
"Look at all those VLARs, as previously discussed." I don't understand any of that jargon. Riddle me this simply: Is it something that I did or that shows that there is something wrong with my computer that I can fix/change? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Look at all those VLARs, as previously discussed." Neither. It's a slightly quirky part of the SETI system. It wasn't caused by anything you did, there's nothing wrong with your computer, and there's nothing you can do to fix it. Just hang on tight and enjoy the ride. |
Akio Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0 |
"Just hang on tight and enjoy the ride." Thanks for helping me again, Richard :) |
bj Joined: 11 Oct 00 Posts: 163 Credit: 50,429,507 RAC: 0 |
Did have almost 300. Went down to 152 and now back to 226 timeouts. Like they say: hang in for the ride. bj |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
Yep, I have a sharp increase in errors due to VLARs misdirected to GPUs as well. Somewhat-related question: With the recent changes to the scheduler, are VLARs no longer sent to ATI GPUs now? I'm certainly not complaining - while VLARs don't suffer with ATI GPUs nearly as badly as NV GPUs, I do often get a drop in the responsiveness of the OS GUI. I just haven't noticed any VLARs assigned to the ATI GPUs over the past several weeks and I'm enjoying this change, if it was indeed a deliberate change in the scheduler code. Soli Deo Gloria |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Somewhat-related question: With the recent changes to the scheduler, are VLARs no longer sent to ATI GPUs now? I'm certainly not complaining - while VLARs don't suffer with ATI GPUs nearly as badly as NV GPUs, I do often get a drop in the responsiveness of the OS GUI. I just haven't noticed any VLARs assigned to the ATI GPUs over the past several weeks and I'm enjoying this change, if it was indeed a deliberate change in the scheduler code." I noticed a couple of weeks ago, while looking through someone's ATI cache, that he didn't have any VLARs on the ATI, so I expect it's true. I don't know whether it was a deliberate change or not. Claggy |
disco_nnected Joined: 19 Dec 06 Posts: 16 Credit: 13,654,017 RAC: 66 |
I'm also getting a bunch of timeouts last few days.... |
chromespringer Joined: 3 Dec 05 Posts: 296 Credit: 55,183,482 RAC: 0 |
"Check out all my timeouts. I've never experienced this before. Can someone explain to me what might be causing this all of a sudden, and what steps I can take to prevent it from happening in the future?" Samten, if you look at your task manager under application, all errored tasks appear to be CPU applications, as do mine. They appear to time out shortly after they download. |
chromespringer Joined: 3 Dec 05 Posts: 296 Credit: 55,183,482 RAC: 0 |
"Check out all my timeouts. I've never experienced this before. Can someone explain to me what might be causing this all of a sudden, and what steps I can take to prevent it from happening in the future?" All are "recent lost task"... I expect they hung around in limbo too long. |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
"if the host asked for GPU and (as far as it knows) didn't get any, why wouldn't it ask for GPU again the next time?" Ah hah. That was the answer I was looking for (instead of just correcting me on the details). I forgot that on a dual request, it tries to fill GPU first. "There is something else: if you are also attached to other projects, after the request fails to receive the needed CPU tasks, it might ask another project for CPU tasks while waiting out the 5 minute delay, and if it gets tasks then it may not need CPU tasks anymore." And I didn't think of that. Although in my case, I have only one other project, Einstein, and all of my computers have been shying away from it lately (one of the three has a couple of tasks due in three days and the others haven't contacted it since they reported their last ones about a week ago). As to the list including undownloaded tasks, thanks to the guys who answered. It was just an idea. BTW, I'm up to 53 timeouts now. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
James Sotherden Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
"if the host asked for GPU and (as far as it knows) didn't get any, why wouldn't it ask for GPU again the next time?" I now have 81 timeouts. I can't even buy a download, let alone report, right now. Old James |
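
For anyone who finds the jargon hard to follow, the sequence HAL9000 and Richard Haselgrove describe in this thread can be sketched in a few lines of Python. This is a toy model only, not BOINC's actual scheduler code: `Task`, `handle_request` and the `resource` argument are hypothetical names made up for illustration, and the sketch assumes the only rule that blocks a resend is "no VLAR to GPU".

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    is_vlar: bool
    state: str = "in progress"


def handle_request(assigned, reported, resource):
    """Toy model of how lost ("ghost") tasks are handled on a work request.

    assigned -- tasks the server believes this host holds (hypothetical list)
    reported -- names the host listed in its request, including tasks that
                are allocated but still downloading
    resource -- "cpu" or "gpu": what the host is asking work for
    Returns the lost tasks that get resent to the host.
    """
    resent = []
    for task in assigned:
        if task.state != "in progress" or task.name in reported:
            continue  # the host knows about this task, nothing to do
        # The server assigned this task but the host never heard about it.
        if resource == "gpu" and task.is_vlar:
            # Assumed rule: VLAR work may not go to the GPU, so it cannot
            # be resent on this request and is written off instead.
            task.state = "Timed out - no response"
        else:
            resent.append(task)
    return resent


tasks = [
    Task("wu_1", is_vlar=True),   # lost, VLAR
    Task("wu_2", is_vlar=False),  # lost, not VLAR
    Task("wu_3", is_vlar=True),   # host reports it, so it is not lost
]
resent = handle_request(tasks, reported={"wu_3"}, resource="gpu")
print([t.name for t in resent])
print([(t.name, t.state) for t in tasks])
```

In this model, calling `handle_request` with `resource="cpu"` resends every lost task, which is in line with Richard's report that deliberately "losing" VLARs and having them resent to the CPU worked for those who tried it.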
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.