WU's timing out........

Author	Message
Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1771876 - Posted: 16 Mar 2016, 11:41:15 UTC - in response to Message 1771874.
I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1771878 - Posted: 16 Mar 2016, 11:54:56 UTC - in response to Message 1771876.
[quote]I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.[/quote]
Spotted that. I would like to see file operation timeouts changed from hardcoded magic numbers to a single default value (10 seconds or whatever), then exposed as a cc_config.xml option. I would likely wind mine out to substantially longer, to cover RAID rebuilds and the more ordinary system pressure that stresses Windows' lazy file operation optimisations.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
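The idea of one default timeout that a config option can override might look roughly like the sketch below. It is only an illustration in Python, not BOINC client code: the `<file_op_timeout>` element is a hypothetical option name (no such cc_config.xml option is confirmed in this thread), and the 10-second default is simply the "10 seconds or whatever" figure from the post.

```python
import xml.etree.ElementTree as ET

# Single default instead of scattered magic numbers (value taken from the post).
DEFAULT_FILE_OP_TIMEOUT = 10.0  # seconds


def file_op_timeout(config_xml: str, default: float = DEFAULT_FILE_OP_TIMEOUT) -> float:
    """Return the timeout from a cc_config-style document, or the default.

    <file_op_timeout> is a HYPOTHETICAL option name used for illustration only.
    """
    try:
        root = ET.fromstring(config_xml)
    except ET.ParseError:
        return default  # unreadable config: keep the default
    node = root.find("./options/file_op_timeout")
    if node is None or node.text is None:
        return default  # option not set: keep the default
    try:
        value = float(node.text)
    except ValueError:
        return default  # non-numeric value: keep the default
    return value if value > 0 else default


if __name__ == "__main__":
    cfg = "<cc_config><options><file_op_timeout>120</file_op_timeout></options></cc_config>"
    print(file_op_timeout(cfg))
```

Falling back to the default on any bad input means a broken config file can never leave the client with a shorter timeout than it has today.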

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1771879 - Posted: 16 Mar 2016, 11:58:10 UTC - in response to Message 1771878.
OK, got to take a time-out from this thread until (probably) tomorrow evening - real life intrudes.

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1771880 - Posted: 16 Mar 2016, 11:58:35 UTC - in response to Message 1771879.
[quote]OK, got to take a time-out from this thread until (probably) tomorrow evening - real life intrudes.[/quote]
Beer time here too :D

HAL9000 Volunteer tester Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1771918 - Posted: 16 Mar 2016, 16:26:11 UTC - in response to Message 1771851.
[quote][quote]With the limits of 100 tasks per CPU/GPU & a deadline of about 6 weeks, how many hours a day do you let your machines run tasks? The only things I can think of are:
1) Adjust your queue size.
2) Stop fiddling with things & let BOINC handle running tasks instead of trying to micromanage what it does. BOINC will run tasks that are near the deadline sooner if it thinks there is a chance of missing them.
EDIT: A single core CPU machine that takes around 10 hours to complete a task would be able to complete a queue of 100 tasks before the deadline if it ran 24/7.[/quote]
I run 24/7 on all of my machines and I didn't start "playing around" until Boinc was unable to finish all of the tasks it had at its command. As I said, never had this problem before the change of versions. Thanks for the input though. Allen[/quote]
Are you looking at the tasks in BOINC Manager and seeing that they are close to their due date, or are you looking at the error tasks list on the website and seeing "Timed out - no response"? If you are seeing "Timed out - no response" in your errors, those tasks are most likely ghosts that were never on your system. BOINC runs tasks FIFO (First In First Out) unless it thinks there will be a problem. Then it will run tasks in order by due date. Your cache settings can affect how BOINC determines this.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
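The "FIFO unless there will be a problem, then by due date" behaviour and the 100-tasks-at-10-hours arithmetic can be illustrated with a toy model. This is a deliberately simplified sketch, not BOINC's actual scheduler: it assumes one core running tasks back to back, known runtimes, and hypothetical names (`Task`, `run_order`, `queue_fits`) invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    received: float  # hours since a common reference point
    deadline: float  # hours since the same reference point
    runtime: float   # estimated hours of computation


def misses_deadline(order, now=0.0):
    """True if running the tasks back to back in this order misses any deadline."""
    clock = now
    for task in order:
        clock += task.runtime
        if clock > task.deadline:
            return True
    return False


def run_order(tasks, now=0.0):
    """FIFO by default; switch to earliest-deadline-first if FIFO would be late."""
    fifo = sorted(tasks, key=lambda t: t.received)
    if not misses_deadline(fifo, now):
        return fifo
    return sorted(tasks, key=lambda t: t.deadline)


def queue_fits(n_tasks, hours_per_task, deadline_days):
    """True if a single core running 24/7 can finish the queue before the deadline."""
    return n_tasks * hours_per_task <= deadline_days * 24


if __name__ == "__main__":
    tasks = [Task("a", received=0, deadline=100, runtime=10),
             Task("b", received=1, deadline=15, runtime=10)]
    print([t.name for t in run_order(tasks)])
    print(queue_fits(100, 10, 42))
```

For the figures in the post, 100 tasks at 10 hours each is 1000 hours, and 6 weeks is 1008 hours, so the queue only just fits when the machine runs around the clock.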

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1771942 - Posted: 16 Mar 2016, 18:20:43 UTC - in response to Message 1771876.
[quote]I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.[/quote]
Over 10k V8 tasks done now (CPU and GPU), no invalids, 3 errors, all of the above :-(
Kevin

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 1771975 - Posted: 16 Mar 2016, 19:58:33 UTC - in response to Message 1771855.
[quote][quote]The only real change that I have made since v7, is that I upgraded all of my machines to the latest Boinc version. Maybe that's the problem.[/quote]
"The latest"? Could we have an actual number on that, please? (Yes, I know I can look it up - v7.6.22 - but you might have meant you're testing v7.6.29). In general, it's better to use absolute data than relative terms which might get overtaken by events. Speaking of which, could you post the actual numbers you're using for cache size settings - both of them, please. I personally didn't have any problem with BOINC v7.6.22 maintaining the cache sizes I requested, but that tends to be an absolute maximum of 1 day, more often 0.4 or 0.5 days.[/quote]
Hi Richard. Sorry, I am running v7.6.22 on all of my computers except the tablet. I have quads with GPUs on 3 of them, one dual with a GPU and one without. (It's a laptop.) I have the cache set for 10 and 10 since I hate to run out when Seti goes down for a while. I've never had trouble with it running newer WUs and skipping over the ones that are due soon, and that's why I asked if anyone else was having this problem. I thought that Boinc or Seti would decide how many WUs of whichever kind I should get to be sure of finishing them on time. Usually I never mess with them, but when I saw so many of them being timed out, I thought I would try to fix it. It did get better for a day, but then it would grab units that weren't due for weeks instead of one that was due in a couple of days, and then that one would eventually be late. Thanks for helping!

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 1771979 - Posted: 16 Mar 2016, 20:17:36 UTC - in response to Message 1771918.
[quote]Are you looking at the tasks in BOINC Manager and seeing that they are close to their due date, or are you looking at the error tasks list on the website and seeing "Timed out - no response"? If you are seeing "Timed out - no response" in your errors, those tasks are most likely ghosts that were never on your system. BOINC runs tasks FIFO (First In First Out) unless it thinks there will be a problem. Then it will run tasks in order by due date. Your cache settings can affect how BOINC determines this.[/quote]
Wow! Now that is interesting. I did see the time outs that got me interested in doing something about it, but I never checked to see if they were actually ever on my system or at least present at some time. I just assumed (stupidly) that they were there and they were skipped for other WUs. They could have been ghosts as you said, since I didn't know anything about 'ghosts'.
I guess at one time or another I could have lost some WUs during the changeover of versions, but I didn't think it would affect all of the machines in relatively the same way, since some of them didn't lose any WUs. After reading some of the give and take from Richard and Mr. Gee, I was wondering if changing the number of GPU units running at one time could affect the time it would take to complete a batch of WUs and maybe throw off the timing a bit. BUT....it seems to be a very large number of timed out units for that to be the case. Thanks for all the info.... didn't know about ghosts....hmmmm

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 1771980 - Posted: 16 Mar 2016, 20:19:28 UTC - in response to Message 1771942.
[quote][quote]I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.[/quote]
Over 10k V8 tasks done now (CPU and GPU), no invalids, 3 errors, all of the above :-([/quote]
As Indiana Jones said, "Don't get cocky kid."....grin I was in the same boat a few weeks back and then everything changed..grrrr!

Juha Volunteer tester Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0	Message 1771994 - Posted: 16 Mar 2016, 21:17:22 UTC - in response to Message 1771979.
Take host 6335328 for example. It's got 200 tasks in progress that were sent 13 March or later and another 25 tasks sent between 21 and 24 January. The January tasks are most likely ghosts, but you could check if any of them are on board, say task 4676230915 aka 01mr11ad.8091.17654.11.38.109_0. Its deadline is set to 20 March, so you should have enough time to check the host.
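Checking whether tasks are "on board" amounts to comparing two lists of task names: what the website shows as in progress for the host, and what the client actually holds. A minimal sketch of that comparison follows. It assumes both lists have already been collected by hand (for instance copied from the host's task page and from BOINC Manager), and `find_ghosts` is a hypothetical helper name; only the first task name below comes from the post, the others are made up.

```python
def find_ghosts(server_in_progress, local_tasks):
    """Return task names the server lists as in progress that the host does not hold."""
    local = set(local_tasks)
    return sorted(name for name in set(server_in_progress) if name not in local)


if __name__ == "__main__":
    # First name is from the post; the "example_task_*" names are made up.
    server = ["01mr11ad.8091.17654.11.38.109_0", "example_task_a_0", "example_task_b_1"]
    local = ["example_task_a_0", "example_task_b_1"]
    print(find_ghosts(server, local))
```

Anything the function returns is listed by the server but absent locally, which is exactly what the thread calls a ghost.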

HAL9000 Volunteer tester Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1772004 - Posted: 16 Mar 2016, 22:04:33 UTC - in response to Message 1771979. Last modified: 16 Mar 2016, 22:06:59 UTC
[quote]Wow! Now that is interesting. I did see the time outs that got me interested in doing something about it, but I never checked to see if they were actually ever on my system or at least present at some time. I just assumed (stupidly) that they were there and they were skipped for other WUs. They could have been ghosts as you said, since I didn't know anything about 'ghosts'.
I guess at one time or another I could have lost some WUs during the changeover of versions, but I didn't think it would affect all of the machines in relatively the same way, since some of them didn't lose any WUs. After reading some of the give and take from Richard and Mr. Gee, I was wondering if changing the number of GPU units running at one time could affect the time it would take to complete a batch of WUs and maybe throw off the timing a bit. BUT....it seems to be a very large number of timed out units for that to be the case. Thanks for all the info.... didn't know about ghosts....hmmmm[/quote]
Sometimes when a host requests work, the server selects tasks and assigns them to the host, but the tasks never reach the host. So we call them "ghosts". The BOINC server does have an option "resend lost tasks". However, SETI@home often disables that option because it can add extra strain to the servers. The only option at that point is to let them time out, which isn't a problem as the server reassigns them.
As others mentioned, some of your machines show more than 200 tasks in progress. SETI@home has a limit of 100 CPU tasks and then 100 per GPU. This can make it easy to tell if you have a system with ghost tasks. Looking at your host 6335328, it shows In progress (224). If we select Next until we get past the first 200, we see that the sent date is January 21st to the 24th for the last 24 tasks. If you don't see those tasks on your system then they are ghosts & will time out in the next few days. Which means everything is basically running fine on your end.
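The counting argument above can be written down as a short check. The sketch assumes the limits quoted in the post (100 CPU tasks plus 100 per GPU) and a host with one GPU; `task_limit` and `suspected_ghosts` are hypothetical helper names. It only flags the oldest excess tasks as suspects. The count shows that some tasks cannot really be on the host, but the ghosts need not be the oldest ones, so this is a hint rather than proof.

```python
from datetime import date

CPU_LIMIT = 100      # per-host CPU task limit quoted in the post
PER_GPU_LIMIT = 100  # additional limit per GPU quoted in the post


def task_limit(n_gpus):
    """Maximum number of tasks a host can genuinely hold under the quoted limits."""
    return CPU_LIMIT + PER_GPU_LIMIT * n_gpus


def suspected_ghosts(sent_dates, n_gpus):
    """Return the oldest sent dates in excess of the limit (empty if within it)."""
    excess = len(sent_dates) - task_limit(n_gpus)
    if excess <= 0:
        return []
    return sorted(sent_dates)[:excess]


if __name__ == "__main__":
    # Roughly the situation described for host 6335328: 224 tasks in progress.
    dates = [date(2016, 3, 13)] * 200 + [date(2016, 1, 21)] * 24
    suspects = suspected_ghosts(dates, n_gpus=1)
    print(len(suspects), suspects[0])
```

With 224 tasks in progress against a limit of 200, the 24 oldest tasks come out as suspects, matching the January batch described in the post.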

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 1772030 - Posted: 17 Mar 2016, 0:27:03 UTC - in response to Message 1772004.
[quote]Sometimes when a host requests work, the server selects tasks and assigns them to the host, but the tasks never reach the host. So we call them "ghosts". The BOINC server does have an option "resend lost tasks". However, SETI@home often disables that option because it can add extra strain to the servers. The only option at that point is to let them time out, which isn't a problem as the server reassigns them.
As others mentioned, some of your machines show more than 200 tasks in progress. SETI@home has a limit of 100 CPU tasks and then 100 per GPU. This can make it easy to tell if you have a system with ghost tasks. Looking at your host 6335328, it shows In progress (224). If we select Next until we get past the first 200, we see that the sent date is January 21st to the 24th for the last 24 tasks. If you don't see those tasks on your system then they are ghosts & will time out in the next few days. Which means everything is basically running fine on your end.[/quote]
Okay, got it. I will check it out and let you know what I find, but right now I'm guessing that you are right. Bed time for me. Thanks for the help.

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1772074 - Posted: 17 Mar 2016, 5:22:28 UTC - in response to Message 1771980.
[quote]As Indiana Jones said, "Don't get cocky kid."....grin I was in the same boat a few weeks back and then everything changed..grrrr![/quote]
It would be a boring life if everything worked perfectly all the time :-)
Kevin

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22184 Credit: 416,307,556 RAC: 380	Message 1772080 - Posted: 17 Mar 2016, 6:03:12 UTC
A word of warning about using the "count back two hundred tasks" approach to find ghosts: the ghosts may be recent arrivals. The best way is to do a detach/re-attach cycle, which gets rid of ghosts very effectively.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 1772261 - Posted: 18 Mar 2016, 0:13:48 UTC Last modified: 18 Mar 2016, 0:15:13 UTC
Hal and all you helpers,
Yep, that was it....a bunch of ghosts. I guess I'll just turn my head and let Boinc/Seti figure it out for me. BTW, is detaching and reattaching a good idea? Thanks all!

HAL9000 Volunteer tester Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1772333 - Posted: 18 Mar 2016, 4:54:13 UTC - in response to Message 1772261. Last modified: 18 Mar 2016, 4:54:34 UTC
[quote]Hal and all you helpers,
Yep, that was it....a bunch of ghosts. I guess I'll just turn my head and let Boinc/Seti figure it out for me. BTW, is detaching and reattaching a good idea? Thanks all![/quote]
Doing that will clear all of the current work you have downloaded, so you may wish to set No New Tasks first in order to run down your cache.
