Holy cow! 20 timeouts

David S (Volunteer tester) Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12
Message 1283568 - Posted: 14 Sep 2012, 13:18:14 UTC

I know the explanation for short-deadline timeouts, but I have a question about that explanation.

The explanation is that the host asks for work for, say, the GPU, and the Scheduler responds by assigning a bunch of tasks for the GPU, but the reply never reaches the host, so it never starts downloading them. A few minutes later, the host asks for work again, but this time only for the CPU, and it sends in the list of what it has on hand. The Scheduler looks at the list and says, "Hey, I assigned this other bunch of work to you, but you don't have it, and I can't assign it again because you're not asking for GPU work this time, so I have no choice but to time it out on you."

Okay, fine. BUT... if the host asked for GPU work and (as far as it knows) didn't get any, why wouldn't it ask for GPU work again the next time?

While typing the above, I began to wonder something... Does the list the host sends the Scheduler include everything it knows has been assigned to it, even if it hasn't been downloaded yet, or only what has been downloaded? If the latter, then download slowness could be at the root of many of the short timeouts we all seem to experience at one time or another. It could probably also be fixed with a fairly minor tweak to the code somewhere (says the guy who knows nothing about coding, but who is smart enough to know such a tweak might have consequences I don't see).

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

HAL9000 (Volunteer tester) Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57
Message 1283592 - Posted: 14 Sep 2012, 13:48:37 UTC

Last time I checked, the ghost-task VLAR (very low angle range) timeout was more the opposite of your description:
1) Host requests work for CPU, or for CPU & GPU.
2) Server assigns VLAR tasks to the CPU, but the host doesn't receive the list.
3) Host requests work for just the GPU.
4) Server wants to resend the tasks the host doesn't have.
5) The check that keeps VLAR tasks off the GPU is tripped.
6) The tasks are marked as "Timed out - no response".

I think there was some talk about trying to prevent this, but I don't know whether that was just talk or not.

IIRC: when a host makes a request, it sends the list of all the tasks in its client_state.xml. So whether they are being processed, waiting, or in a transfer state, it tells the server "Hey, I have these tasks".

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
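
HAL9000's six steps can be modelled as a small sketch. This is not the BOINC scheduler's code: the names `Task` and `resend_lost_tasks`, the "CPU"/"GPU" labels, and the rule that a lost task may only be resent to a resource the host is asking for right now are assumptions made to mirror the description above, with "GPU" standing for an NVIDIA card covered by the no-VLAR rule.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    is_vlar: bool  # VLAR = very low angle range workunit


def resend_lost_tasks(assigned, reported, requested):
    """Hypothetical model of steps 4-6: what happens to tasks the host never received.

    assigned:  tasks the server believes it sent to this host
    reported:  task names the host listed in its current request
    requested: resources asked for in this request, e.g. {"GPU"}
    """
    outcomes = {}
    for task in assigned:
        if task.name in reported:
            continue  # the host has it, so nothing needs resending
        # Assumed rule: resend only to a resource in this request,
        # and never send a VLAR to the (NVIDIA) GPU.
        allowed = [
            r for r in ("GPU", "CPU")
            if r in requested and not (r == "GPU" and task.is_vlar)
        ]
        outcomes[task.name] = f"resent to {allowed[0]}" if allowed else "Timed out - no response"
    return outcomes


# Step 2: two CPU tasks were assigned, but the reply never reached the host.
lost = [Task("vlar_task", is_vlar=True), Task("normal_task", is_vlar=False)]
# Step 3: the next request is for GPU work only and lists neither task.
print(resend_lost_tasks(lost, reported=set(), requested={"GPU"}))
# -> {'vlar_task': 'Timed out - no response', 'normal_task': 'resent to GPU'}
```

In this toy model only the VLAR trips the check in step 5; the ordinary lost task simply moves to the GPU.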

Richard Haselgrove (Volunteer tester) Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1283593 - Posted: 14 Sep 2012, 13:48:43 UTC - in response to Message 1283568. Last modified: 14 Sep 2012, 14:07:10 UTC

It's more likely to be the other way round. Your computer asks for CPU tasks and is assigned VLAR workunits (somebody was saying yesterday that there were a lot around at the moment). If that message gets lost, and the next request is for GPU work, that's when the deadlines get fudged (because of the "don't send VLAR to nVidia" rule).

Apart from the 'no VLAR to NV' situation, there's nothing (except possibly your preferences) to stop a "lost" result being issued to a different computing resource the second time round. When VLARs were being issued to GPUs recently because of a bug, I published instructions for a technique of deliberately losing them and getting them resent to the CPU instead. That's the situation you suggest might cause problems, but it worked well for those who tried it.

As regards the client telling the server about work it knows has been allocated but not yet downloaded, I don't know. I'll take a look next time I have some downloads stuck. Murphy has decreed that I'm all green just at the moment...

Edit - got some. Yes, confirming Hal's post: tasks that are allocated but not yet downloaded do get reported to the server.
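
HAL's recollection is that a work request lists every task in client_state.xml whatever its state, and Richard's edit confirms this for tasks still waiting to download. A minimal sketch of that idea using Python's standard `xml.etree.ElementTree`; the XML fragment is a made-up, heavily simplified stand-in for a real client_state.xml, and the readable state labels are illustrative, not BOINC's actual encoding.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified fragment; a real client_state.xml holds many more fields per result,
# and the <state> labels here are illustrative only.
CLIENT_STATE = """
<client_state>
  <result><name>task_a</name><state>downloading</state></result>
  <result><name>task_b</name><state>ready to run</state></result>
  <result><name>task_c</name><state>running</state></result>
</client_state>
""".strip()


def tasks_to_report(xml_text):
    """Return every task name the host knows about, regardless of its state."""
    root = ET.fromstring(xml_text)
    return [result.findtext("name") for result in root.iter("result")]


print(tasks_to_report(CLIENT_STATE))  # -> ['task_a', 'task_b', 'task_c']
```

Under this model the task that is still downloading is reported like any other, so slow downloads alone should not make the server treat it as lost.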

Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0
Message 1283623 - Posted: 14 Sep 2012, 14:41:45 UTC - in response to Message 1283568.

[quote]if the host asked for GPU and (as far as it knows) didn't get any, why wouldn't it ask for GPU again the next time?[/quote]
As said before, it asks for CPU tasks, they are lost, and then, at least 5 minutes later, it asks again for more tasks... In those 5 minutes the GPU cache may well reach the "request more work" trigger, so the host asks for both CPU and GPU work. As the scheduler tries to fill the GPU cache first, it will try to assign the lost tasks (originally intended for the CPU) to the GPU...

There is something else: if you are also attached to other projects, after the request fails to receive the needed CPU tasks, the host might ask another project for CPU tasks while waiting out the 5-minute delay, and if it gets tasks there, it may not need CPU tasks anymore.
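
Horacio's point about fill order can be illustrated with a toy placement function. It is a sketch, not the scheduler's code: `place_lost_tasks` and its parameters are hypothetical, and real work requests ask for an amount of work per resource rather than a count of tasks.

```python
def place_lost_tasks(lost_names, gpu_wanted, cpu_wanted):
    """Hypothetical GPU-first placement of tasks the host never received.

    gpu_wanted / cpu_wanted: how many tasks each resource is asking for in this
    request (a simplification of a real work request).
    """
    placement = {}
    for name in lost_names:
        if gpu_wanted > 0:  # assumed: the GPU part of the request is filled first
            placement[name] = "GPU"
            gpu_wanted -= 1
        elif cpu_wanted > 0:
            placement[name] = "CPU"
            cpu_wanted -= 1
        else:
            placement[name] = "not placed in this request"
    return placement


# Three tasks originally sent for the CPU were lost; at least 5 minutes later
# the host asks for both GPU and CPU work.
print(place_lost_tasks(["t1", "t2", "t3"], gpu_wanted=2, cpu_wanted=5))
# -> {'t1': 'GPU', 't2': 'GPU', 't3': 'CPU'}
```

Two of the three CPU tasks end up offered to the GPU first; if they were VLARs, that is the point where the no-VLAR-to-NVIDIA rule would come into play.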

Akio Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0
Message 1283808 - Posted: 14 Sep 2012, 22:37:53 UTC

Check out all my timeouts. I've never experienced this before. Can someone explain to me what might be causing this all of a sudden, and what steps I can take to prevent it from happening in the future?

Richard Haselgrove (Volunteer tester) Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1283811 - Posted: 14 Sep 2012, 22:44:56 UTC - in response to Message 1283808.

[quote]Check out all my timeouts. I've never experienced this before. Can someone explain to me what might be causing this all of a sudden, and what steps I can take to prevent it from happening in the future?[/quote]
Reposting with a HostID link, because other volunteers aren't allowed to follow a UserID link:
Error tasks for computer 5967851

Look at all those VLARs, as previously discussed.

Akio Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0
Message 1283813 - Posted: 14 Sep 2012, 22:54:23 UTC - in response to Message 1283811.

[quote]Look at all those VLARs, as previously discussed.[/quote]
I don't understand any of that jargon. Riddle me this simply: is it something that I did, or does it show that there is something wrong with my computer that I can fix/change?

Richard Haselgrove (Volunteer tester) Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1283818 - Posted: 14 Sep 2012, 23:05:05 UTC - in response to Message 1283813.

[quote]I don't understand any of that jargon. Riddle me this simply: is it something that I did, or does it show that there is something wrong with my computer that I can fix/change?[/quote]
Neither. It's a slightly quirky part of the SETI system. It wasn't caused by anything you did; there's nothing wrong with your computer; and there's nothing you can do to fix it. Just hang on tight and enjoy the ride.

Akio Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0
Message 1283826 - Posted: 14 Sep 2012, 23:23:28 UTC - in response to Message 1283818.

[quote]Just hang on tight and enjoy the ride.[/quote]
Thanks for helping me again, Richard :)

bj Joined: 11 Oct 00 Posts: 163 Credit: 50,429,507 RAC: 0
Message 1283871 - Posted: 15 Sep 2012, 2:19:23 UTC

I did have almost 300. Went down to 152 and now back up to 226 timeouts. Like they say: hang on for the ride.

bj

Wedge009 (Volunteer tester) Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553
Message 1283895 - Posted: 15 Sep 2012, 3:22:53 UTC Last modified: 15 Sep 2012, 3:23:46 UTC

Yep, I have a sharp increase in errors due to VLARs misdirected to GPUs as well.

Somewhat-related question: with the recent changes to the scheduler, are VLARs no longer sent to ATI GPUs now? I'm certainly not complaining: while ATI GPUs don't struggle with VLARs nearly as badly as NV GPUs do, I often get a drop in the responsiveness of the OS GUI. I just haven't noticed any VLARs assigned to the ATI GPUs over the past several weeks, and I'm enjoying this change, if it was indeed a deliberate change in the scheduler code.

Soli Deo Gloria

Claggy (Volunteer tester) Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4
Message 1283987 - Posted: 15 Sep 2012, 9:30:29 UTC - in response to Message 1283895.

[quote]Somewhat-related question: with the recent changes to the scheduler, are VLARs no longer sent to ATI GPUs now?[/quote]
I noticed a couple of weeks ago, while looking through someone's ATI cache, that he didn't have any VLARs on the ATI, so I expect it's true. I don't know whether it was a deliberate change or not.

Claggy

disco_nnected (Volunteer tester) Joined: 19 Dec 06 Posts: 16 Credit: 13,654,017 RAC: 66
Message 1284029 - Posted: 15 Sep 2012, 11:40:43 UTC

I'm also getting a bunch of timeouts these last few days...

chromespringer Joined: 3 Dec 05 Posts: 296 Credit: 55,183,482 RAC: 0
Message 1284142 - Posted: 15 Sep 2012, 17:02:47 UTC - in response to Message 1283808.

[quote]Check out all my timeouts. I've never experienced this before. Can someone explain to me what might be causing this all of a sudden, and what steps I can take to prevent it from happening in the future?[/quote]
Samten, if you look at your task manager under Application, all the errored tasks appear to be CPU applications, as do mine. They appear to time out shortly after they download.

chromespringer Joined: 3 Dec 05 Posts: 296 Credit: 55,183,482 RAC: 0
Message 1284153 - Posted: 15 Sep 2012, 17:18:08 UTC - in response to Message 1284142.

[quote]Samten, if you look at your task manager under Application, all the errored tasks appear to be CPU applications, as do mine. They appear to time out shortly after they download.[/quote]
All are "recent lost task"... I expect they hung around in limbo too long.

David S (Volunteer tester) Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12
Message 1284781 - Posted: 17 Sep 2012, 13:39:05 UTC - in response to Message 1283623.

[quote]As said before, it asks for CPU tasks, they are lost, and then, at least 5 minutes later, it asks again for more tasks... In those 5 minutes the GPU cache may well reach the "request more work" trigger, so the host asks for both CPU and GPU work. As the scheduler tries to fill the GPU cache first, it will try to assign the lost tasks (originally intended for the CPU) to the GPU...[/quote]
Ah hah. That was the answer I was looking for (instead of just correcting me on the details). I forgot that on a dual request, it tries to fill the GPU first.

[quote]There is something else: if you are also attached to other projects, after the request fails to receive the needed CPU tasks, the host might ask another project for CPU tasks while waiting out the 5-minute delay, and if it gets tasks there, it may not need CPU tasks anymore.[/quote]
And I didn't think of that. Although in my case, I have only one other project, Einstein, and all of my computers have been shying away from it lately (one of the three has a couple of tasks due in three days, and the others haven't contacted it since they reported their last ones about a week ago).

As to the list including undownloaded tasks, thanks to the guys who answered. It was just an idea.

BTW, I'm up to 53 timeouts now.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

James Sotherden Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54
Message 1285128 - Posted: 18 Sep 2012, 12:20:25 UTC - in response to Message 1284781.

[quote]BTW, I'm up to 53 timeouts now.[/quote]
I now have 81 timeouts. I can't even buy a download, let alone report, right now.

Old James

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.