Postponed: Waiting to acquire lock
Author | Message |
---|---|
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Quote: "A BOINC application in memory can sleep for as long as it likes. But it should keep just enough awareness to listen for further instructions, whether they be 'wake up and crunch', or 'exit completely, we're quitting'." I have quit BOINC several times since yesterday, so why did the "exit completely" not work? I can't say for sure, but I don't believe it's good programming technique for a program to leave "sleeping grinches" holding its data for days after it was shut down. It may not waste CPU time (though the 13 hours shown on the SSE4.1 process is not a good sign), but it wastes memory and whatever else it holds. And don't forget that if anything locks a slot file for any reason (cosmic ray?), it could trigger the issue. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: "@Jeff" First, you should probably fix your link; it threw me for a bit. Those two show the task files as '(deleted)', which probably means that the tasks have actually been completed and reported. Did you try to verify that from your task pages? If they have in fact completed successfully, you could probably just go ahead and kill the processes now from System Monitor, if you like, although it doesn't sound like they're causing your system any problems. I don't know if there's anything there that could tell you what caused the processes to go to sleep. (Although you did say it's pretty hot there today, as I recall. ;^)) I notice that each one shows a different "Waiting Channel", but I have no idea what that means. |
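For anyone who would rather hunt for these leftovers from a terminal than click through System Monitor, here is a minimal Python sketch for Linux. It assumes that the '(deleted)' marker is the suffix the kernel appends to a /proc/<pid>/fd link target once the file has been removed, and it uses the slots path from the data directory in juan's log (/home/juan/BOINC). It also reads /proc/<pid>/wchan, the kernel function a sleeping process is blocked in, which as far as I know is what System Monitor labels "Waiting Channel" (some kernels just report 0 there). The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Tuple

# Suffix Linux appends to a /proc/<pid>/fd/<n> link target once the
# file it points to has been unlinked.
DELETED_SUFFIX = " (deleted)"


def is_orphaned_slot_target(target: str, slots_root: str) -> bool:
    """True if a link target is a deleted file under the BOINC slots dir."""
    if not target.endswith(DELETED_SUFFIX):
        return False
    path = target[: -len(DELETED_SUFFIX)]
    return path.startswith(slots_root.rstrip("/") + "/")


def read_wchan(pid: int) -> str:
    """Kernel function the process is blocked in ("Waiting Channel")."""
    try:
        return Path(f"/proc/{pid}/wchan").read_text().strip() or "?"
    except OSError:
        return "?"


def find_orphaned_processes(slots_root: str) -> List[Tuple[int, str, str]]:
    """Return (pid, wchan, target) for each deleted slot file still held open."""
    hits: List[Tuple[int, str, str]] = []
    if not os.path.isdir("/proc"):
        return hits  # no procfs (not Linux): nothing to scan
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or belongs to another user
        for fd in fds:
            try:
                target = os.readlink(f"{fd_dir}/{fd}")
            except OSError:
                continue
            if is_orphaned_slot_target(target, slots_root):
                pid = int(entry)
                hits.append((pid, read_wchan(pid), target))
    return hits


if __name__ == "__main__":
    # Hypothetical path, taken from the data directory in juan's log.
    for pid, wchan, target in find_orphaned_processes("/home/juan/BOINC/slots"):
        print(pid, wchan, target)
```

A process showing up here is a hint, not proof of a zombie; checking the task pages first, as suggested above, is still the safer route before killing anything.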
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Quote: "Juan, your OneDrive URL link is malformed. Can't get where you intended." Thanks, fixed. I missed the h at the beginning. https://1drv.ms/u/s!Asjkc9Jyluh3zxFNnAHQAuFT-19J |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
One of the reasons I moved on from repository 7.6.33 and the 7.2.42/7 variants was because I didn't like having to make two mouse clicks to cleanly exit tasks in the Manager. BOINC 7.8.3 cleanly exits with just the one click to Quit. -- Seti@Home classic workunits: 20,676; CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Quote: "One of the reasons I moved on from repository 7.6.33 and the 7.2.42/7 variants was because I didn't like having to make two mouse clicks to cleanly exit tasks in the Manager. BOINC 7.8.3 cleanly exits with just the one click to Quit." Apparently that's not working for some people, even though it works for most others. It works for me, and I guess that's all that matters. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: "BOINC doesn't always stop the running tasks by just exiting the Manager. That should be pretty clear by now." Clear as mud. "Stop running tasks" is really a misnomer, as what it really means is "Stop the client". The client then needs to handle stopping the tasks. If what you're saying is that the client stopped but the tasks didn't, then yes, it would be relevant to this discussion, because that would be leaving Zombie tasks behind and possibly unwanted slot contents. On the other hand, if it's simply that the Manager doesn't stop the client, then having the client and its subordinate tasks continue to run really doesn't lead to either residual lockfiles or Zombie tasks, does it? Once the second step is taken to stop the client, the tasks should shut down then as well. Only if there's a hitch in that process would it be relevant here. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Juan, you most definitely have the SSE4.1 tasks as zombies. They are still in memory, based on your screenshots. One thing to check is whether you have the "Leave non-GPU tasks in memory while suspended" option set, either in the website computing preferences or in your local preferences. |
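That setting can also be checked from the files in the BOINC data directory. A minimal Python sketch follows, assuming the "Leave non-GPU tasks in memory while suspended" preference is stored as the leave_apps_in_memory tag in global_prefs_override.xml (local preferences) or global_prefs.xml (website preferences). Only top-level tags are read, so venue-specific blocks are ignored, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional, Tuple

TAG = "leave_apps_in_memory"  # assumed XML name of the preference


def leave_in_memory(xml_text: str) -> Optional[bool]:
    """Value of the top-level leave_apps_in_memory tag, or None if absent."""
    node = ET.fromstring(xml_text).find(TAG)
    if node is None:
        return None
    text = (node.text or "").strip()
    if not text:
        return None
    return text != "0"


def effective_setting(data_dir: str) -> Tuple[Optional[str], Optional[bool]]:
    """Check the local override file first, then the website preferences."""
    for name in ("global_prefs_override.xml", "global_prefs.xml"):
        path = Path(data_dir) / name
        if not path.exists():
            continue
        value = leave_in_memory(path.read_text())
        if value is not None:
            return name, value
    return None, None  # not set in either file: the client default applies


if __name__ == "__main__":
    # Hypothetical path, taken from the data directory in juan's log.
    print(effective_setting("/home/juan/BOINC"))
```

The override file is checked first because local preferences take precedence over the website ones; if the tag is missing there, the website value is used.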
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Quote: "Juan, you most definitely have the SSE4.1 tasks as zombies. They are still in memory, based on your screenshots. One thing to check is whether you have the "Leave non-GPU tasks in memory while suspended" option set, either in the website computing preferences or in your local preferences." Yes, it was checked, but I had set my options to never suspend the GPU/CPU. I've changed that now. I'm going to kill all the sleeping grinches and keep an eye out in case they return. Time for a beer. Tomorrow is a holiday here. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
If I read them right from your screen photos, the two tasks shown in the sleeping processes are 06mr07ab.22000.9479.12.39.150_2 and blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1. Both seem to be gone from the server, so it appears that those processes got left behind after the tasks were finished. You might be able to go back to your earlier Event Log entries and see whether those tasks show up there, and whether anything odd occurred around the time they finished processing. |
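Going back through the Event Log for specific tasks can be scripted. A minimal Python sketch, assuming the default log file names stdoutdae.txt and stdoutdae.old in the BOINC data directory; the function names are hypothetical. The plain substring match also catches upload lines, since (going by the log excerpt later in this thread) upload file names start with the task name.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def find_task_mentions(lines: Iterable[str], task_names: List[str]) -> Dict[str, List[str]]:
    """Map each task name to every log line that contains it."""
    hits: Dict[str, List[str]] = {name: [] for name in task_names}
    for line in lines:
        for name in task_names:
            if name in line:
                hits[name].append(line.rstrip("\n"))
    return hits


def search_event_logs(data_dir: str, task_names: List[str]) -> Dict[str, List[str]]:
    """Search stdoutdae.old, then stdoutdae.txt, so hits come out oldest first."""
    lines: List[str] = []
    for fname in ("stdoutdae.old", "stdoutdae.txt"):
        path = Path(data_dir) / fname
        if path.exists():
            lines.extend(path.read_text(errors="replace").splitlines())
    return find_task_mentions(lines, task_names)


if __name__ == "__main__":
    names = [
        "06mr07ab.22000.9479.12.39.150_2",
        "blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1",
    ]
    for name, found in search_event_logs("/home/juan/BOINC", names).items():
        print(name, "->", len(found), "line(s)")
        for line in found:
            print("   ", line)
```

An empty result for both names means the tasks finished before the oldest entry the two files still hold.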
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
TBar, we are looking for Zombies, where one or more tasks are occasionally left running, not for every task remaining running on every exit, which is a completely different topic. With the decline in RTS, I loaded up with an extra 2500 tasks across my Ubuntu 14/16 and Mint 17 systems with v7.2 and v7.6 clients while watching running tasks, and didn't see any hang around for more than a few seconds when shutting down from the command line. I only did one shutdown on each system via the Manager, with the same result: no problems found. The 'other' search still continues ... |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Tried but can't find them on the host task list anymore: No such task: 06mr07ab.22000.9479.12.39.150_2 No such task: blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1 They R.I.P. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: "Tried but can't find them on the host task list anymore:" Did you look in your "stdoutdae.txt" and "stdoutdae.old" Event Log files, or do they not hold enough records? You can increase the size of those by using the "<max_event_log_lines>nnnn</max_event_log_lines>" option in cc_config. I usually keep 10000 lines on my bigger boxes, but your Linux one is way busier than mine, so perhaps a larger number would be good for yours. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I believe I do. I uploaded them to OneDrive if you want to look. Maybe I missed something. https://1drv.ms/t/s!Asjkc9Jyluh3zxVjRRK7ZN2j2nls https://1drv.ms/u/s!Asjkc9Jyluh3zxSNZoorSTLnHX1M I'm changing the max event log lines to 15K now; it was only 2K. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: "I believe I do. I uploaded them to OneDrive if you want to look. Maybe I missed something." Yeah, even your ".old" file only goes back about 37 hours, so it appears that any reference to those two tasks must go back further than that, which would indicate that those sleepy processes have been napping for quite a while. I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post. Anyway, 15K lines should definitely help in future diagnosis. Keep an eye on those files, though, and see how many days that ends up covering. Ultimately, you may want to increase it even further, but just wait and see for now. EDIT: Oh, I just realized something. For some reason, I thought we were only looking at the three SSE41 processes, but I now see that you also had several AVX2 processes sleeping and that your earlier screen photos just showed one of each. Did you already kill all of those, or can you still extract the task names from them? |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Quote: "I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post." No, and I already killed all the sleepy processes. I'll keep an eye on that too. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: "I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post." Okay, you just answered the question in my belated edit. Sorry, I apparently wasn't paying close enough attention earlier. :^( |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I have my max event lines set to 3000, thinking that was big enough. Then I got whammied by the download servers having a brain-fart when I asked for 37 tasks on a work request with http_debug set. The server responded with a 65-line dump for every task sent my way, which quickly overwrote everything in my log file. I had to look in the old file for the beginning of the request. http_debug is most definitely not working well, or like it should, with the servers right now. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I need to make a correction to the Event Log size information I posted earlier. The "<max_event_log_lines>N</max_event_log_lines>" option in cc_config only controls the number of lines displayed in the Event Log window. To change the maximum size of the "stdoutdae.txt" and "stdoutdae.old" files, and thus the amount of Event Log data retained over time, use the <max_stdout_file_size>N</max_stdout_file_size> option. The value is entered in bytes, as in this example from the BOINC manual: "<max_stdout_file_size>N</max_stdout_file_size> Specify the maximum size of the standard out log file (stdoutdae.txt); default is 2 MB. Sample: <max_stdout_file_size>3145728</max_stdout_file_size> equals 3 MB. NB: A Client restart may be needed to have changes take effect!" I apologize if I caused any confusion. |
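The same change can be made from a script. This is a minimal Python sketch, assuming max_stdout_file_size belongs inside the <options> section of cc_config.xml like the other client options; set_option is a hypothetical helper, not part of BOINC. It keeps the other sections already in the file, but ElementTree's default parser drops any XML comments. The size is given in bytes, so the manual's 3 MB sample works out to 3 × 1024 × 1024 = 3145728.

```python
import xml.etree.ElementTree as ET

MIB = 1024 * 1024  # the manual's sample: 3 * MIB == 3145728 bytes == "3 MB"


def set_option(config_xml: str, tag: str, value: int) -> str:
    """Return cc_config XML with <options><tag>value</tag></options> set.

    Other sections (log_flags, other options) are kept; XML comments in
    the input are dropped by ElementTree's default parser.
    """
    if config_xml.strip():
        root = ET.fromstring(config_xml)
    else:
        root = ET.Element("cc_config")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    node = options.find(tag)
    if node is None:
        node = ET.SubElement(options, tag)
    node.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_option("", "max_stdout_file_size", 10 * MIB))
```

The output would then be written back to cc_config.xml in the data directory, followed by the client restart the manual note mentions.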
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Changed. I'm using 10485760 (10 MB, 5x the original 2 MB size). |
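To get a rough idea of how many days the larger files might cover: if the log keeps growing at about the same rate, retention scales linearly with the size limit. This minimal sketch assumes the roughly 37 hours seen earlier in the thread came from the 2 MB default. That is only an estimate, because a burst of extra logging (debug flags, for example) would shorten the coverage. The function name is hypothetical.

```python
MIB = 1024 * 1024


def estimate_retention_hours(observed_hours: float, old_limit: int, new_limit: int) -> float:
    """Scale the observed log coverage by the change in size limit.

    Assumes the Event Log grows at a constant rate, which is only roughly true.
    """
    return observed_hours * new_limit / old_limit


hours = estimate_retention_hours(37, 2 * MIB, 10 * MIB)
print(f"about {hours:.0f} hours, or {hours / 24:.1f} days")  # prints: about 185 hours, or 7.7 days
```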
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
@Jeff This is interesting:
lun 08 ene 2018 17:29:15 EST: Task Rescheduling Begins
Arecibo non-VLAR CPU Tasks Moved to GPU = 0
Arecibo non-VLAR GPU Tasks Moved to CPU = 0
Arecibo VLAR CPU Tasks Moved to GPU = 0
Arecibo VLAR GPU Tasks Moved to CPU = 0
BLC non-VLAR CPU Tasks Moved to GPU = 0
BLC non-VLAR GPU Tasks Moved to CPU = 0
BLC VLAR CPU Tasks Moved to GPU = 0
BLC VLAR GPU Tasks Moved to CPU = 370
Astropulse CPU Tasks Moved to GPU = 0
Astropulse GPU Tasks Moved to CPU = 0
Deleted File >> /home/juan/BOINC/slots/2/boinc_finish_called <<<<<< ----------- look here!!!
lun 08 ene 2018 17:29:15 EST: Task Rescheduling Complete
08-Jan-2018 17:27:33 [SETI@home] Starting task blc05_2bit_guppi_57976_14749_HIP91145_0047.25682.409.21.44.171.vlar_0
08-Jan-2018 17:27:35 [SETI@home] Started upload of blc05_2bit_guppi_57976_16213_HIP91234_0051.25235.818.22.45.46.vlar_0_r9107781_0
08-Jan-2018 17:27:37 [SETI@home] Finished upload of blc05_2bit_guppi_57976_16213_HIP91234_0051.25235.818.22.45.46.vlar_0_r9107781_0
08-Jan-2018 17:28:49 [---] Exiting
08-Jan-2018 17:29:19 [---] Starting BOINC client version 7.8.3 for x86_64-pc-linux-gnu
08-Jan-2018 17:29:19 [---] log flags: file_xfer, sched_ops, task
08-Jan-2018 17:29:19 [---] Libraries: libcurl/7.47.0 OpenSSL/1.0.2g zlib/1.2.8 libidn/1.32 librtmp/2.3
08-Jan-2018 17:29:19 [---] Data directory: /home/juan/BOINC
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0, compute capability 6.1, 4096MB, 3984MB available, 6852 GFLOPS peak)
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0, compute capability 6.1, 4096MB, 3984MB available, 6852 GFLOPS peak)
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 2: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0, compute capability 6.1, 4096MB, 3984MB available, 6852 GFLOPS peak)
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 3: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0,
As you can see, the bypass for the issue works and the host continues to crunch normally, but the issue still appears even with BLC work only and with the WU cache full (that's new). |
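The rescheduler output shows it deleting a leftover boinc_finish_called file from slot 2 while the client was down. Below is a minimal, report-only Python sketch for spotting such leftovers with the client stopped. The slots path comes from juan's log. Also checking for boinc_lockfile is an assumption about which file the "Waiting to acquire lock" message refers to, and the function name is hypothetical. The sketch deliberately deletes nothing, because both files can legitimately be present while a task is running or finishing.

```python
from pathlib import Path
from typing import Dict, List

# boinc_finish_called appears in juan's log; boinc_lockfile is an assumption.
MARKER_FILES = ("boinc_finish_called", "boinc_lockfile")


def report_slot_markers(slots_root: str) -> Dict[str, List[str]]:
    """Map each slot directory name to the marker files found in it.

    Report only: nothing is deleted. Run it with the client stopped,
    since these files are expected while a task is running or finishing.
    """
    report: Dict[str, List[str]] = {}
    root = Path(slots_root)
    if not root.is_dir():
        return report
    for slot in sorted(root.iterdir()):
        if not slot.is_dir():
            continue
        present = [name for name in MARKER_FILES if (slot / name).exists()]
        if present:
            report[slot.name] = present
    return report


if __name__ == "__main__":
    for slot, files in report_slot_markers("/home/juan/BOINC/slots").items():
        print(f"slot {slot}: {', '.join(files)}")
```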
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.