Most files must be aborted
Lee Wilkerson (Joined: 6 Jul 99, Posts: 7, Credit: 891,075, RAC: 0)
Can someone tell me whether this situation is normal, or whether it can be corrected? I have a PC running Linux Mint (Intel quad-core Q6600 CPU @ 2.40GHz, Linux 4.4.0-89-generic) that processes four files at a time and does not use GPU RAM. After four days or so of processing, the system aborts a file with the message "Timed out - no response". I have begun manually aborting files that take close to one day to reach 98% or so, because those always hang. Rebooting the PC sometimes fixes the issue, but usually not. About 60% of the files have to be terminated, so a lot of processing time is wasted. I cannot find any hardware problems, or any Linux or BOINC/SETI@home configuration issues. Resetting the project did not help. Thanks. Lee Wilkerson
Jord (Joined: 9 Jun 99, Posts: 15184, Credit: 4,362,181, RAC: 3)
From your list of errored tasks, I see that the "Timed out - no response" task was past its deadline. Tasks come with a deadline, a time limit before which they have to be calculated and reported. The one task that didn't meet it simply timed out. That task may not even have been on your system; it may have been a so-called lost task (lost in transit between the server and your computer). Among all the aborted tasks, there's one with "exceeded elapsed time limit 431161.36 (3826907.57G/8.88G)". Every task comes with an upper bound on the amount of work it should take, measured in floating-point operations (here 3826907.57G). BOINC divides that bound by its estimate of your computer's speed in FLOPS (floating-point operations per second, here 8.88G) to get a maximum elapsed time, and it keeps track of how long the task has been running. When the task runs longer than that limit, it is auto-aborted with the above error. Possible causes: the system was busy with other things while running the task, the benchmarks are too low, or it was simply a bad task for your computer. It may also indicate hardware trouble, but until we see many more tasks do the same thing, there's no telling. One task with this error doesn't mean all tasks will have it. You shouldn't abort tasks just because they run for longer than a day. My Android phone regularly spends 270,000 to 300,000 seconds on a task; I leave them alone and they finish fine. Sorry to say this, but you say you must terminate the files. Why must you do that? Is someone threatening you with great bodily harm if you don't? Why can't you leave them alone? Why don't you just let BOINC try to finish them? Because you keep aborting them, there's hardly any evidence to show whether something is wrong with your system or whether it's a bad batch of tasks. Just leave them be. BOINC will try to run every task on your system before its deadline.
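As a worked example of the limit described above, using the figures from the quoted abort message (the symbols $B$ for the task's operation bound, $S$ for BOINC's speed estimate and $t_{\max}$ for the elapsed-time limit are introduced here for illustration only):

$$
t_{\max} = \frac{B}{S} \approx \frac{3826907.57 \times 10^{9}\ \text{operations}}{8.88 \times 10^{9}\ \text{operations/s}} \approx 4.31 \times 10^{5}\ \text{s} \approx 5\ \text{days}
$$

The client's reported 431161.36 s is consistent with a speed of about 8.876G, which the message rounds to 8.88G.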
Lee Wilkerson (Joined: 6 Jul 99, Posts: 7, Credit: 891,075, RAC: 0)
Thanks for the reply. I don't actually have to abort the tasks; no, there are no threats. I can let them sit at 100.000% until the deadline comes, and then the system will abort them. In the meantime, good potential calculation time is lost, and this PC is doing virtually nothing else. I don't like to abort files, but I have never seen one reach 100.000%, sit there for several days, and then finish properly. I have no problem with files that take 4, 9, 20 or more days to complete. It seems that either this system has a hardware problem, or the time-to-calculate needs to be reset on the server. Lee
Lee Wilkerson (Joined: 6 Jul 99, Posts: 7, Credit: 891,075, RAC: 0)
This is why I manually abort files. The date and time on the processing system are correct. Files either process correctly in less than ten hours (elapsed), or they take 1-2 days to reach 100.000% processed, and then after a few more days the system aborts them, apparently for running too long. It has nothing to do with deadlines:

Mon 06 Nov 2017 03:08:08 AM EST | SETI@home | Aborting task 20fe07af.27239.13569.12.39.33_0: exceeded elapsed time limit 152520.65 (1414546.15G/9.35G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Mon 06 Nov 2017 04:56:45 PM EST | SETI@home | Aborting task 09mr07ab.13652.1299.7.34.23_0: exceeded elapsed time limit 151733.96 (1414546.15G/9.36G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Tue 07 Nov 2017 09:09:24 PM EST | SETI@home | Aborting task 09mr07ab.13652.1299.7.34.17_1: exceeded elapsed time limit 151273.31 (1414546.15G/9.36G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Wed 08 Nov 2017 10:54:29 AM EST | SETI@home | Aborting task 20fe07af.27239.13569.12.39.54_1: exceeded elapsed time limit 151061.69 (1414546.15G/9.40G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Thu 09 Nov 2017 08:31:09 AM EST | SETI@home | Aborting task 02ja07aa.24089.281010.4.31.103.vlar_0: exceeded elapsed time limit 396248.37 (3677746.15G/9.40G)
Note: Deadline was Sun 24 Dec 2017 04:14:15 AM EST

Before I shut down this PC forever, is there something that can be done to fix the problem?
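To read abort messages like these more quickly, here is a minimal Python sketch. It is not part of BOINC: the regular expression, `AbortInfo` and `parse_abort` are hypothetical names, and the pattern assumes the exact message wording shown in the log excerpts. It extracts the reported elapsed-time limit (seconds), the task's operation bound (billions of floating-point operations) and the speed estimate (billions of operations per second), then recomputes bound divided by speed. For these entries the recomputed value is close to, but not identical with, the reported limit, so treat it as an estimate.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the figures in an abort message such as
# "exceeded elapsed time limit 152520.65 (1414546.15G/9.35G)".
# Assumes the wording shown in the log excerpts; other versions may differ.
ABORT_RE = re.compile(
    r"exceeded elapsed time limit (?P<limit>\d+(?:\.\d+)?) "
    r"\((?P<bound>\d+(?:\.\d+)?)G/(?P<speed>\d+(?:\.\d+)?)G\)"
)


@dataclass
class AbortInfo:
    limit_s: float       # elapsed-time limit reported by the client, in seconds
    bound_gflop: float   # operation bound for the task, in billions of operations
    speed_gflops: float  # estimated speed, in billions of operations per second

    def implied_limit_s(self) -> float:
        """Recompute the limit as bound / speed; may differ somewhat from limit_s."""
        return self.bound_gflop / self.speed_gflops

    def limit_hours(self) -> float:
        """Reported limit converted from seconds to hours."""
        return self.limit_s / 3600.0


def parse_abort(line: str) -> Optional[AbortInfo]:
    """Return the figures from an abort message, or None if the line has none."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    return AbortInfo(
        limit_s=float(m.group("limit")),
        bound_gflop=float(m.group("bound")),
        speed_gflops=float(m.group("speed")),
    )


if __name__ == "__main__":
    # Two entries copied from the log excerpts.
    lines = [
        "Mon 06 Nov 2017 03:08:08 AM EST | SETI@home | Aborting task "
        "20fe07af.27239.13569.12.39.33_0: exceeded elapsed time limit "
        "152520.65 (1414546.15G/9.35G)",
        "Thu 09 Nov 2017 08:31:09 AM EST | SETI@home | Aborting task "
        "02ja07aa.24089.281010.4.31.103.vlar_0: exceeded elapsed time limit "
        "396248.37 (3677746.15G/9.40G)",
    ]
    for line in lines:
        info = parse_abort(line)
        if info is None:
            continue
        print(
            f"limit {info.limit_s:.0f} s ({info.limit_hours():.1f} h), "
            f"bound/speed {info.implied_limit_s():.0f} s"
        )
```

Feeding it the whole event log and looking at which tasks hit the limit is one way to gather the kind of evidence discussed in the thread, rather than aborting tasks by hand.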
Jord (Joined: 9 Jun 99, Posts: 15184, Credit: 4,362,181, RAC: 3)
I see you're using BOINC 7.6.31. Could you upgrade to a newer version or downgrade to an earlier one, to rule out the BOINC version as the cause? It sounds as if boinc_finish() isn't being called in time at the end of the task.
Lee Wilkerson (Joined: 6 Jul 99, Posts: 7, Credit: 891,075, RAC: 0)
Thanks, I’ll definitely look into that. Lee
Lee Wilkerson (Joined: 6 Jul 99, Posts: 7, Credit: 891,075, RAC: 0)
I installed a different OS on the PC, Ubuntu 16.04 LTS. So far, processing seems much improved. Lee
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.