WU's timing out........

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1771876 - Posted: 16 Mar 2016, 11:41:15 UTC - in response to Message 1771874.  

I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.
ID: 1771876
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771878 - Posted: 16 Mar 2016, 11:54:56 UTC - in response to Message 1771876.  

I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.


Spotted that. I would like to see the file operation timeouts changed from hardcoded magic numbers to a single default value (10 seconds or whatever), then exposed as a cc_config.xml option. I would likely wind mine out substantially longer to allow for RAID rebuilds and other, more ordinary system pressure that strains Windows' lazy file operation optimisations.
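
To make the suggestion concrete, here is a minimal sketch in Python (not BOINC's own C++) of one named default that a config file can override. The `<finish_file_timeout>` element and both function names are hypothetical, invented for this example; nothing in this thread says such a cc_config.xml option exists.

```python
import xml.etree.ElementTree as ET

# One named default instead of scattered magic numbers (the value comes
# from the "10 seconds or whatever" suggestion above).
DEFAULT_FILE_OP_TIMEOUT = 10.0


def load_file_op_timeout(config_xml: str) -> float:
    """Return the timeout in seconds from a cc_config-style document.

    <finish_file_timeout> is a HYPOTHETICAL option name used only for
    illustration; fall back to the default if it is absent or invalid.
    """
    root = ET.fromstring(config_xml)
    node = root.find("./options/finish_file_timeout")
    if node is None or node.text is None:
        return DEFAULT_FILE_OP_TIMEOUT
    try:
        value = float(node.text)
    except ValueError:
        return DEFAULT_FILE_OP_TIMEOUT
    return value if value > 0 else DEFAULT_FILE_OP_TIMEOUT


def finish_file_overdue(seconds_present: float, timeout: float) -> bool:
    """True once the finish file has been present longer than the timeout."""
    return seconds_present > timeout
```

With something like this, a host doing a RAID rebuild could set the value to 60 or more and a finish file that lingered for 15 seconds would no longer count as an error.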
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771878
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1771879 - Posted: 16 Mar 2016, 11:58:10 UTC - in response to Message 1771878.  

OK, got to take a time-out from this thread until (probably) tomorrow evening - real life intrudes.
ID: 1771879
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771880 - Posted: 16 Mar 2016, 11:58:35 UTC - in response to Message 1771879.  

OK, got to take a time-out from this thread until (probably) tomorrow evening - real life intrudes.


Beer time here too :D
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771880
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1771918 - Posted: 16 Mar 2016, 16:26:11 UTC - in response to Message 1771851.  

With the limits of 100 tasks per CPU/GPU & a deadline of about 6 weeks how many hours a day do you let your machines run tasks?

The only things I can think of are:
1) Adjust your queue size.
2) Stop fiddling with things & let BOINC handle running tasks instead of trying to micromanage what it does. BOINC will run tasks that are near the deadline sooner if it thinks there is a chance of missing them.

EDIT: A single core CPU machine that takes around 10 hours to complete a task would be able to complete a queue of 100 tasks before the deadline if it ran 24/7.


I run 24/7 on all of my machines and I didn't start "playing around" until Boinc was unable to finish all of the tasks it had at its command. As I said, never had this problem before the change of versions.
Thanks for the input though.

Allen

Are you looking at the tasks in BOINC Manager and seeing they are close to their due date or are you looking at the error tasks list on the website and seeing "Timed out - no response"?
If you are seeing "Timed out - no response" in your errors those tasks are most likely from ghosts that were never on your system.

BOINC runs tasks FIFO (First In First Out) unless it thinks there will be a problem. Then it will run tasks in order by due date. How your cache settings are set can affect how BOINC determines this.
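
The rule described above can be sketched roughly as follows. This is an illustration only, not BOINC's actual scheduler: the `Task` fields, the function names, and the simple one-core deadline check are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    received: float   # hours since some reference point
    deadline: float   # hours since the same reference point
    est_hours: float  # estimated run time


def misses_deadline(tasks: List[Task], now: float = 0.0) -> bool:
    """Run tasks back to back in the given order on one core and report
    whether any of them would finish after its deadline."""
    clock = now
    for t in tasks:
        clock += t.est_hours
        if clock > t.deadline:
            return True
    return False


def run_order(tasks: List[Task], now: float = 0.0) -> List[Task]:
    """FIFO by default; switch to earliest-deadline-first if FIFO would
    miss a deadline (simplified model: one core, no preemption)."""
    fifo = sorted(tasks, key=lambda t: t.received)
    if not misses_deadline(fifo, now):
        return fifo
    return sorted(tasks, key=lambda t: t.deadline)


# The queue from the EDIT above: 100 tasks of 10 hours each need 1000
# hours, just under the 6 * 7 * 24 = 1008 hours in a 6-week deadline.
queue = [Task(f"wu{i}", received=i, deadline=6 * 7 * 24, est_hours=10) for i in range(100)]
print(misses_deadline(queue))  # prints False
```

In this model a large cache matters because every extra queued task pushes the finish time of the ones behind it closer to their deadlines.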
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1771918
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1771942 - Posted: 16 Mar 2016, 18:20:43 UTC - in response to Message 1771876.  

I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.


Over 10k V8 tasks done now (CPU and GPU), no invalids, 3 errors all of the above:-(
Kevin


ID: 1771942
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1771975 - Posted: 16 Mar 2016, 19:58:33 UTC - in response to Message 1771855.  

The only real change that I have made since v7, is that I upgraded all of my machines to the latest Boinc version. Maybe that's the problem.

"The latest"? Could we have an actual number on that, please? (Yes, I know I can look it up - v7.6.22 - but you might have meant you're testing v7.6.29). In general, it's better to use absolute data than relative terms which might get overtaken by events.

Speaking of which, could you post the actual numbers you're using for cache size settings - both of them, please. I personally didn't have any problem with BOINC v7.6.22 maintaining the cache sizes I requested, but that tends to be an absolute maximum of 1 day, more often 0.4 or 0.5 days.


Hi Richard. Sorry, I am running v7.6.22 on all of my computers except the tablet. I have quads with gpu's on 3 of them and one dual with gpu and one without. (It's a laptop.)

I have it set for 10 and 10 since I hate to run out when Seti goes down for a while. I've never had trouble with it running newer wu's and skipping over the ones that are due soon and that's why I asked if anyone else was having this problem.

I thought that Boinc or Seti would decide how many wu's of whichever kind I should get to be sure to be able to accomplish finishing them on time. Usually I never mess with them, but when I saw so many of them being timed out, I thought I would try to fix it. It did get better for a day, but then it would grab units that weren't due for weeks instead of one that was due in a couple of days and then it would be late eventually.

Thanks for helping!
ID: 1771975
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1771979 - Posted: 16 Mar 2016, 20:17:36 UTC - in response to Message 1771918.  

With the limits of 100 tasks per CPU/GPU & a deadline of about 6 weeks how many hours a day do you let your machines run tasks?

The only things I can think of are:
1) Adjust your queue size.
2) Stop fiddling with things & let BOINC handle running tasks instead of trying to micromanage what it does. BOINC will run tasks that are near the deadline sooner if it thinks there is a chance of missing them.

EDIT: A single core CPU machine that takes around 10 hours to complete a task would be able to complete a queue of 100 tasks before the deadline if it ran 24/7.


I run 24/7 on all of my machines and I didn't start "playing around" until Boinc was unable to finish all of the tasks it had at its command. As I said, never had this problem before the change of versions.
Thanks for the input though.

Allen

Are you looking at the tasks in BOINC Manager and seeing they are close to their due date or are you looking at the error tasks list on the website and seeing "Timed out - no response"?
If you are seeing "Timed out - no response" in your errors those tasks are most likely from ghosts that were never on your system.

BOINC runs tasks FIFO (First In First Out) unless it thinks there will be a problem. Then it will run tasks in order by due date. How your cache settings are set can affect how BOINC determines this.


Wow! Now that is interesting. I did see the time outs that got me interested in doing something about it, but I never checked to see if they were actually ever on my system or at least present at some time. I just assumed (stupidly) that they were there and they were skipped for other wu's. They could have been ghosts as you said, since I didn't know anything about 'ghosts'.

I guess at one time or another I could have lost some wu's during the change over of versions, but I didn't think it would affect all of the machines in relatively the same way since some of them didn't lose any wu's.

After reading some of the give and take from Richard and Mr. Gee, I was wondering if changing the number of gpu units running at one time could affect the time it would take to complete a batch of wu's and maybe throw off the timing a bit. BUT....it seems to be a very large number of timed out units for that to be the case.

Thanks for all the info.... didn't know about ghosts....hmmmm
ID: 1771979
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1771980 - Posted: 16 Mar 2016, 20:19:28 UTC - in response to Message 1771942.  

I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.


Over 10k V8 tasks done now (CPU and GPU), no invalids, 3 errors all of the above:-(



As Indiana Jones said, "Don't get cocky kid."....grin

I was in the same boat a few weeks back and then everything changed..grrrr!
ID: 1771980
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1771994 - Posted: 16 Mar 2016, 21:17:22 UTC - in response to Message 1771979.  

Take host 6335328 for example. It's got 200 tasks In progress that were sent 13 March or later and another 25 tasks sent between 21 and 24 January.

The January tasks are most likely ghosts, but you could check if any of them are on board, say task 4676230915 aka 01mr11ad.8091.17654.11.38.109_0. It has its deadline set to 20 March, so you should have enough time to check the host.
ID: 1771994
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1772004 - Posted: 16 Mar 2016, 22:04:33 UTC - in response to Message 1771979.  
Last modified: 16 Mar 2016, 22:06:59 UTC

With the limits of 100 tasks per CPU/GPU & a deadline of about 6 weeks how many hours a day do you let your machines run tasks?

The only things I can think of are:
1) Adjust your queue size.
2) Stop fiddling with things & let BOINC handle running tasks instead of trying to micromanage what it does. BOINC will run tasks that are near the deadline sooner if it thinks there is a chance of missing them.

EDIT: A single core CPU machine that takes around 10 hours to complete a task would be able to complete a queue of 100 tasks before the deadline if it ran 24/7.


I run 24/7 on all of my machines and I didn't start "playing around" until Boinc was unable to finish all of the tasks it had at its command. As I said, never had this problem before the change of versions.
Thanks for the input though.

Allen

Are you looking at the tasks in BOINC Manager and seeing they are close to their due date or are you looking at the error tasks list on the website and seeing "Timed out - no response"?
If you are seeing "Timed out - no response" in your errors those tasks are most likely from ghosts that were never on your system.

BOINC runs tasks FIFO (First In First Out) unless it thinks there will be a problem. Then it will run tasks in order by due date. How your cache settings are set can affect how BOINC determines this.


Wow! Now that is interesting. I did see the time outs that got me interested in doing something about it, but I never checked to see if they were actually ever on my system or at least present at some time. I just assumed (stupidly) that they were there and they were skipped for other wu's. They could have been ghosts as you said, since I didn't know anything about 'ghosts'.

I guess at one time or another I could have lost some wu's during the change over of versions, but I didn't think it would affect all of the machines in relatively the same way since some of them didn't lose any wu's.

After reading some of the give and take from Richard and Mr. Gee, I was wondering if changing the number of gpu units running at one time could affect the time it would take to complete a batch of wu's and maybe throw off the timing a bit. BUT....it seems to be a very large number of timed out units for that to be the case.

Thanks for all the info.... didn't know about ghosts....hmmmm

Sometimes when a host requests work, the server selects tasks and assigns them to the host, but the tasks never reach the host. So we call them "ghosts". The BOINC server does have an option, "resend lost tasks". However, SETI@home often disables that option because it can add extra strain to the servers. The only option at that point is to let them time out, which isn't a problem as the server reassigns them.

As others mentioned, some of your machines show more than 200 tasks in progress. SETI@home has a limit of 100 CPU tasks and then 100 per GPU. This can make it easy to tell if you have a system with ghost tasks. Take your host 6335328: it shows In progress (224). If we select Next until we get past the first 200, we see that the sent date is January 21st to the 24th for the last 24 tasks. If you don't see those tasks on your system, then they are ghosts and will time out in the next few days, which means everything is basically running fine on your end.
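
One way to make that check less manual is to compare the task names the website lists as in progress with the names the local client shows. A minimal sketch, assuming both lists have already been copied into Python by hand; the function names are made up for illustration, nothing here talks to the server or to BOINC, and the single-GPU figure for the host is an assumption.

```python
from typing import Iterable, List


def find_ghosts(server_in_progress: Iterable[str], local_tasks: Iterable[str]) -> List[str]:
    """Return tasks the server believes are in progress but that are not
    present on the host, preserving the server's listing order."""
    local = set(local_tasks)
    return [name for name in server_in_progress if name not in local]


def over_limit(in_progress_count: int, gpus: int, per_device_limit: int = 100) -> int:
    """How many tasks exceed the limit of 100 for the CPU plus 100 per GPU
    described above; a positive result hints at ghosts."""
    allowed = per_device_limit * (1 + gpus)
    return max(0, in_progress_count - allowed)


# Host from the post above: 224 in progress, ASSUMING a single GPU.
print(over_limit(224, gpus=1))  # prints 24
```

Comparing names directly does not depend on where the ghosts sit in the list, so it still works if some of them were sent recently.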
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1772004
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1772030 - Posted: 17 Mar 2016, 0:27:03 UTC - in response to Message 1772004.  

With the limits of 100 tasks per CPU/GPU & a deadline of about 6 weeks how many hours a day do you let your machines run tasks?

The only things I can think of are:
1) Adjust your queue size.
2) Stop fiddling with things & let BOINC handle running tasks instead of trying to micromanage what it does. BOINC will run tasks that are near the deadline sooner if it thinks there is a chance of missing them.

EDIT: A single core CPU machine that takes around 10 hours to complete a task would be able to complete a queue of 100 tasks before the deadline if it ran 24/7.


I run 24/7 on all of my machines and I didn't start "playing around" until Boinc was unable to finish all of the tasks it had at its command. As I said, never had this problem before the change of versions.
Thanks for the input though.

Allen

Are you looking at the tasks in BOINC Manager and seeing they are close to their due date or are you looking at the error tasks list on the website and seeing "Timed out - no response"?
If you are seeing "Timed out - no response" in your errors those tasks are most likely from ghosts that were never on your system.

BOINC runs tasks FIFO (First In First Out) unless it thinks there will be a problem. Then it will run tasks in order by due date. How your cache settings are set can affect how BOINC determines this.


Wow! Now that is interesting. I did see the time outs that got me interested in doing something about it, but I never checked to see if they were actually ever on my system or at least present at some time. I just assumed (stupidly) that they were there and they were skipped for other wu's. They could have been ghosts as you said, since I didn't know anything about 'ghosts'.

I guess at one time or another I could have lost some wu's during the change over of versions, but I didn't think it would affect all of the machines in relatively the same way since some of them didn't lose any wu's.

After reading some of the give and take from Richard and Mr. Gee, I was wondering if changing the number of gpu units running at one time could affect the time it would take to complete a batch of wu's and maybe throw off the timing a bit. BUT....it seems to be a very large number of timed out units for that to be the case.

Thanks for all the info.... didn't know about ghosts....hmmmm

Sometimes when a host requests work, the server selects tasks and assigns them to the host, but the tasks never reach the host. So we call them "ghosts". The BOINC server does have an option, "resend lost tasks". However, SETI@home often disables that option because it can add extra strain to the servers. The only option at that point is to let them time out, which isn't a problem as the server reassigns them.

As others mentioned, some of your machines show more than 200 tasks in progress. SETI@home has a limit of 100 CPU tasks and then 100 per GPU. This can make it easy to tell if you have a system with ghost tasks. Take your host 6335328: it shows In progress (224). If we select Next until we get past the first 200, we see that the sent date is January 21st to the 24th for the last 24 tasks. If you don't see those tasks on your system, then they are ghosts and will time out in the next few days, which means everything is basically running fine on your end.


Okay, got it. I will check it out and let you know what I find, but right now I'm guessing that you are right. Bed time for me. Thanks for the help.
ID: 1772030
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1772074 - Posted: 17 Mar 2016, 5:22:28 UTC - in response to Message 1771980.  

I see the infamous "finish file present too long" error. We really must all gang up on David to get that one fixed sometime.


Over 10k V8 tasks done now (CPU and GPU), no invalids, 3 errors all of the above:-(



As Indiana Jones said, "Don't get cocky kid."....grin

I was in the same boat a few weeks back and then everything changed..grrrr!


It would be a boring life if everything worked perfectly all the time:-)
Kevin


ID: 1772074
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22184
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1772080 - Posted: 17 Mar 2016, 6:03:12 UTC

A word of warning about the "count back two hundred tasks" method of finding ghosts: the ghosts may be recent arrivals. The best way is to do a detach/re-attach cycle, which gets rid of ghosts very effectively.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1772080
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1772261 - Posted: 18 Mar 2016, 0:13:48 UTC
Last modified: 18 Mar 2016, 0:15:13 UTC

Hal and all you helpers,

Yep, that was it....a bunch of ghosts. I guess I'll just turn my head and let Boinc/Seti figure it out for me.

BTW, is detaching and reattaching a good idea?

Thanks all!
ID: 1772261
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1772333 - Posted: 18 Mar 2016, 4:54:13 UTC - in response to Message 1772261.  
Last modified: 18 Mar 2016, 4:54:34 UTC

Hal and all you helpers,

Yep, that was it....a bunch of ghosts. I guess I'll just turn my head and let Boinc/Seti figure it out for me.

BTW, is detaching and reattaching a good idea?

Thanks all!

Doing that will clear all of the work you currently have downloaded, so you may wish to set No New Tasks first and run down your cache.
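
For anyone who prefers the command line, the same sequence can be driven through `boinccmd`. The sketch below only builds the command lines and prints them; it does not run anything. It assumes `boinccmd` is on the PATH and that the project URL shown is the one your client is attached to, and `YOUR_ACCOUNT_KEY` is a placeholder for your own account key.

```python
from typing import List

# Assumed project URL; check the one shown in your own BOINC Manager.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def no_new_tasks_cmd(url: str = PROJECT_URL) -> List[str]:
    """Step 1: stop fetching work so the cache can run down."""
    return ["boinccmd", "--project", url, "nomorework"]


def detach_cmd(url: str = PROJECT_URL) -> List[str]:
    """Step 2: detach, only after the cache is empty and results are reported."""
    return ["boinccmd", "--project", url, "detach"]


def attach_cmd(account_key: str, url: str = PROJECT_URL) -> List[str]:
    """Step 3: re-attach with your account key."""
    return ["boinccmd", "--project_attach", url, account_key]


for cmd in (no_new_tasks_cmd(), detach_cmd(), attach_cmd("YOUR_ACCOUNT_KEY")):
    print(" ".join(cmd))
```

Keeping the steps as separate functions makes it harder to detach by accident before the cache has actually drained.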
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1772333


 