Postponed: Waiting to acquire lock

Message boards : Number crunching : Postponed: Waiting to acquire lock
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911885 - Posted: 8 Jan 2018, 22:38:57 UTC - in response to Message 1911881.  

A BOINC application in memory can sleep for as long as it likes. But it should keep just enough awareness to listen for further instructions, whether they be 'wake up and crunch', or 'exit completely, we're quitting'.

I have already quit BOINC several times since yesterday, so why did the "exit completely" not work?
I can't say for sure, but I don't believe it's good programming technique for a program to leave a "sleeping grinch" holding its data for days after it was shut down. It may not be wasting CPU time (though the 13 hrs on the SSE4.1 one is not a good sign), but it wastes memory and whatever else it could be using.
And don't forget that if anything locks a slot file for any reason (cosmic ray?), it could trigger the issue.
ID: 1911885 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911888 - Posted: 8 Jan 2018, 22:40:27 UTC - in response to Message 1911879.  

@Jeff

I did what you asked and posted 4 examples at the link. All the AVX2 ones show the same parameters as the one posted. Each SSE4.1 one shows a different CPU time. In all of them the status is Sleep. So they are not phantoms, they are sleeping grinches! Sometimes I really hate computers!

What I can't understand is why they are still in memory, some of them after 24 hrs.

http://ttps://1drv.ms/u/s!Asjkc9Jyluh3zxFNnAHQAuFT-19J

<edit> One question... What program is supposed to delete the process? Or at least tell the OS to kill the process?
First, you should probably fix your link. It threw me for a bit.

Those two show the task files as '(deleted)', which probably means that the tasks have actually been completed and reported. Did you try to verify that from your task pages? If they actually have successfully completed, you could probably just go ahead and kill the processes now from System Monitor, if you like, although it doesn't sound like they're causing your system any problems.

I don't know if there's anything there that could tell you what caused the processes to go to sleep. (Although you did say it's pretty hot there today, as I recall. ;^)) I notice that each one shows a different "Waiting Channel", but I have no idea what that means.
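
For anyone who wants to repeat this check from a terminal rather than from System Monitor, here is a minimal Python sketch. It assumes a Linux `/proc` filesystem, and the helper names `parse_stat` and `deleted_open_files` are made up for illustration. State `S` in `/proc/<pid>/stat` corresponds to the "Sleep" status, and on Linux the "Waiting Channel" is normally the name of the kernel function the process is currently blocked in (readable from `/proc/<pid>/wchan`).

```python
import os


def parse_stat(stat_text):
    """Extract pid, command name, state and parent pid from /proc/<pid>/stat text.

    The command name is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the LAST ')' instead of on whitespace.
    """
    left, right = stat_text.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    fields = right.split()
    return {
        "pid": int(pid_text),
        "comm": comm,
        "state": fields[0],   # 'S' = sleeping, 'R' = running, 'Z' = zombie
        "ppid": int(fields[1]),
    }


def deleted_open_files(fd_dir):
    """Return the targets of open descriptors whose file has been unlinked.

    fd_dir is normally /proc/<pid>/fd. Linux appends ' (deleted)' to the
    symlink target once the underlying file no longer exists on disk.
    """
    found = []
    for name in sorted(os.listdir(fd_dir)):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor went away while we were looking
        if target.endswith(" (deleted)"):
            found.append(target)
    return found
```

To use it, read `/proc/<pid>/stat` for the suspect PID, pass the text to `parse_stat`, and call `deleted_open_files("/proc/<pid>/fd")`. Reading another user's `fd` directory may need root.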
ID: 1911888 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911889 - Posted: 8 Jan 2018, 22:41:16 UTC - in response to Message 1911884.  
Last modified: 8 Jan 2018, 22:43:58 UTC

Juan, your OneDrive URL link is malformed. Can't get where you intended.

Thanks. Fixed. Missed the h at the beginning.

https://1drv.ms/u/s!Asjkc9Jyluh3zxFNnAHQAuFT-19J
ID: 1911889 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911890 - Posted: 8 Jan 2018, 22:42:05 UTC

One of the reasons I moved on from repository 7.6.33 and the 7.2.42/7 variants was because I didn't like having to make two mouse clicks to cleanly exit tasks in the Manager. BOINC 7.8.3 cleanly exits with just the one click to Quit.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1911890 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1911893 - Posted: 8 Jan 2018, 22:48:41 UTC - in response to Message 1911890.  

One of the reasons I moved on from repository 7.6.33 and the 7.2.42/7 variants was because I didn't like having to make two mouse clicks to cleanly exit tasks in the Manager. BOINC 7.8.3 cleanly exits with just the one click to Quit.
Apparently that's not working for some people even though it works for most others. It works for me, and I guess that's all that matters.
ID: 1911893 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911895 - Posted: 8 Jan 2018, 22:53:16 UTC - in response to Message 1911883.  

BOINC doesn't always stop the running tasks by just exiting the Manager. That should be pretty clear by now.
Clear as mud.

"Stop running tasks" is really a misnomer, as what it really means is "Stop the client". The client then needs to handle stopping the tasks. If what you're saying is that the client stopped, but the tasks didn't, then yes, it would be relevant to this discussion, because that would be leaving Zombie tasks behind and possibly unwanted slot contents. On the other hand, if it's simply that the Manager doesn't stop the client, then having the client and its subordinate tasks continue to run really doesn't lead to either residual lockfiles or Zombie tasks, does it? Once the second step is taken to stop the client, the tasks should shut down then, as well. Only if there's a hitch in that process would it be relevant here.
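
The distinction above (client still running with its tasks, versus client gone and tasks left behind) can be checked mechanically. A rough Python sketch, under two assumptions that are mine and not from this thread: that the client process is named `boinc`, and that each task runs with its working directory inside the data directory's `slots` folder. The `Proc` record and `orphaned_tasks` are hypothetical names; filling the snapshot from `/proc` or `ps` is left out.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str
    cwd: str


def orphaned_tasks(procs, slots_dir, client_name="boinc"):
    """Return processes working inside slots_dir whose parent is not the client.

    When the client exits but a task does not, the task is re-parented
    (typically to PID 1), so its parent no longer carries the client's name.
    """
    names = {p.pid: p.name for p in procs}
    prefix = slots_dir.rstrip("/") + "/"
    return [
        p for p in procs
        if p.cwd.startswith(prefix) and names.get(p.ppid) != client_name
    ]
```

A task that shows up here while no client is running would be one of the leftovers being discussed; a task whose parent is still the client is just the "Manager didn't stop the client" case.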
ID: 1911895 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911897 - Posted: 8 Jan 2018, 22:59:07 UTC - in response to Message 1911889.  
Last modified: 8 Jan 2018, 22:59:27 UTC

Juan, you most definitely have the SSE4.1 tasks as zombies. They are still in memory, based on your screenshots. One thing to try would be to check whether you have the "Leave non-GPU tasks in memory while suspended" option set, either in the website computing preferences or in your local preferences.
ID: 1911897 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911899 - Posted: 8 Jan 2018, 23:05:10 UTC - in response to Message 1911897.  

Juan, you most definitely have the SSE4.1 tasks as zombies. They are still in memory, based on your screenshots. One thing to try would be to check whether you have the "Leave non-GPU tasks in memory while suspended" option set, either in the website computing preferences or in your local preferences.

Yes, it was checked, but I had set my options to never suspend the GPU / CPU. I have changed that now.
I'm going to kill all the sleeping grinches and keep an eye out in case they return.
Time for a beer. Tomorrow is a holiday here.
ID: 1911899 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911900 - Posted: 8 Jan 2018, 23:19:15 UTC - in response to Message 1911899.  

If I read them right from your screen photos, the two tasks shown in the sleeping processes are:

06mr07ab.22000.9479.12.39.150_2
blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1

Both seem to be gone from the server, so it appears that those processes got left behind after the tasks were finished. You might be able to go back to your earlier Event Log entries and see if those tasks show up there, and whether or not anything odd occurred about the time they finished processing.
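
A quick way to do that search without scrolling through the files by hand, as a minimal Python sketch. The function name `find_task_lines` is made up; the file names are the `stdoutdae.old` and `stdoutdae.txt` files in the BOINC data directory.

```python
from pathlib import Path


def find_task_lines(log_paths, task_names):
    """Collect every log line that mentions one of the given task names.

    Returns a dict mapping each task name to the matching lines, in the
    order the files were given. Missing files are skipped silently.
    """
    hits = {name: [] for name in task_names}
    for path in log_paths:
        path = Path(path)
        if not path.exists():
            continue
        with path.open(errors="replace") as fh:
            for line in fh:
                for name in task_names:
                    if name in line:
                        hits[name].append(line.rstrip("\n"))
    return hits
```

Pass the `.old` file first and the `.txt` file second so the hits come out oldest first. An empty list for a task means the logs no longer reach back far enough.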
ID: 1911900 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1911903 - Posted: 8 Jan 2018, 23:40:51 UTC

TBar, we are looking for Zombies where 1 or more tasks are occasionally left running, not every task remaining running on every exit, which is a completely different topic.

With the decline in RTS, I loaded up with an extra 2500 tasks across my Ubuntu 14/16 and Mint 17 systems with v7.2 and v7.6 clients while watching running tasks and didn't see any hang around for more than a few seconds when shutting down by command line. I only did 1 shutdown on each system via the manager, and the same, no problems found.

The 'other' search still continues ...
ID: 1911903 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911904 - Posted: 8 Jan 2018, 23:43:11 UTC - in response to Message 1911900.  

Tried but can't find them on the host task list anymore:

No such task: 06mr07ab.22000.9479.12.39.150_2
No such task: blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1

They R.I.P.
ID: 1911904 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911913 - Posted: 8 Jan 2018, 23:59:18 UTC - in response to Message 1911904.  

Tried but can't find them on the host task list anymore:

No such task: 06mr07ab.22000.9479.12.39.150_2
No such task: blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1

They R.I.P.
Did you look in your "stdoutdae.txt" and "stdoutdae.old" Event Log files, or do they not hold enough records? You can increase the size of those by using the "<max_event_log_lines>nnnn</max_event_log_lines>" option in cc_config. I usually keep 10000 lines in my bigger boxes, but your Linux one is way busier than mine, so perhaps a larger number would be good for yours.
ID: 1911913 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911917 - Posted: 9 Jan 2018, 0:11:50 UTC - in response to Message 1911913.  
Last modified: 9 Jan 2018, 0:16:51 UTC

I believe I did. Uploaded to OneDrive if you want to look. Maybe I missed something.

https://1drv.ms/t/s!Asjkc9Jyluh3zxVjRRK7ZN2j2nls

https://1drv.ms/u/s!Asjkc9Jyluh3zxSNZoorSTLnHX1M


Changing the max event log lines to 15K now; it was only 2K.
ID: 1911917 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911919 - Posted: 9 Jan 2018, 0:31:42 UTC - in response to Message 1911917.  
Last modified: 9 Jan 2018, 0:38:05 UTC

I believe I did. Uploaded to OneDrive if you want to look. Maybe I missed something.
......
Changing the max event log lines to 15K now; it was only 2K.
Yeah, even your ".old" file only goes back about 37 hours, so it appears that any reference to those two tasks must go back further than that. Which would indicate that those sleepy processes have been napping for quite a while. I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post.

Anyway, 15K lines should definitely help in future diagnosis. Keep an eye on those files, though, and see how many days that ends up covering. Ultimately, you may want to increase it even further, but just wait and see for now.

EDIT: Oh, I just realized something. For some reason, I thought we were only looking at the three SSE41 processes, but I just realized that you also had several AVX2 processes sleeping and that your earlier screen photos just showed one of each. Did you already kill all of those, or can you still extract the task names from them?
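
To see how many hours or days a log file actually covers, something like the following Python sketch would do. It assumes every client log line starts with a timestamp in the form seen in this thread (`08-Jan-2018 17:27:33`) and that month names are the English abbreviations; `parse_stamp` and `log_span_hours` are illustrative names.

```python
from datetime import datetime

STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"  # e.g. 08-Jan-2018 17:27:33
STAMP_LENGTH = 20                   # number of characters in that timestamp


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def log_span_hours(lines):
    """Hours between the earliest and latest timestamp found in the lines."""
    stamps = [s for s in map(parse_stamp, lines) if s is not None]
    if not stamps:
        return None
    return (max(stamps) - min(stamps)).total_seconds() / 3600
```

Feeding it the lines of "stdoutdae.old" gives the span that file covers; dividing the file size by that span gives a rough idea of how large the log needs to be for a week of history.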
ID: 1911919 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911920 - Posted: 9 Jan 2018, 0:38:00 UTC - in response to Message 1911919.  

I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post.

No, and I already killed all the sleepy processes. Will keep an eye on that too.
ID: 1911920 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911921 - Posted: 9 Jan 2018, 0:39:16 UTC - in response to Message 1911920.  

I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post.

No, and I already killed all the sleepy processes. Will keep an eye on that too.
Okay, you just answered the question in my belated edit. Sorry, I apparently wasn't paying close enough attention earlier. :^(
ID: 1911921 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911933 - Posted: 9 Jan 2018, 1:37:29 UTC

I have my max event lines set to 3000 thinking that was big enough. Then I got whammied by the download servers having a brain-fart when I asked for 37 tasks on a work request with http_debug set. And the server responded with a 65 line dump for every task sent my way. Quickly overwrote everything in my logfile. Had to look in the old file for the beginning of the request.

http_debug is most definitely not working well or like it should right now with the servers.
ID: 1911933 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911947 - Posted: 9 Jan 2018, 3:20:39 UTC

I need to make a correction to the Event Log size information I posted earlier. The "<max_event_log_lines>N</max_event_log_lines>" option in cc_config only controls the number of displayable lines in the Event Log window. To change the maximum size of the "stdoutdae.txt" and "stdoutdae.old" files, and thus the amount of Event Log data retained over time, use the <max_stdout_file_size>N</max_stdout_file_size> option. This is entered in bytes, following this example from the BOINC manual:

 <max_stdout_file_size>N</max_stdout_file_size>
    Specify the maximum size of the standard out log file (stdoutdae.txt); default is 2 MB.
    Sample: <max_stdout_file_size>3145728</max_stdout_file_size> equals 3 MB.
    NB: A Client restart may be needed to have changes take effect!
I apologize if I caused any confusion.
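
Since the file size has to be entered in bytes, here is a small Python sketch that does the arithmetic and prints a fragment with both options. The `<cc_config><options>` wrapper reflects the usual cc_config.xml layout as I understand it, so treat that as an assumption and merge the two lines into your existing file rather than overwriting it. The function names are illustrative.

```python
def mb_to_bytes(megabytes):
    """Convert megabytes (1 MB = 1024 * 1024 bytes) to a whole number of bytes."""
    return int(megabytes * 1024 * 1024)


# Assumed layout: both options live inside <options> in cc_config.xml.
TEMPLATE = """<cc_config>
  <options>
    <max_event_log_lines>{lines}</max_event_log_lines>
    <max_stdout_file_size>{size}</max_stdout_file_size>
  </options>
</cc_config>
"""


def build_fragment(event_log_lines, stdout_file_mb):
    """Return a cc_config-style fragment with both log-related options set."""
    return TEMPLATE.format(lines=int(event_log_lines), size=mb_to_bytes(stdout_file_mb))


if __name__ == "__main__":
    # 3 MB matches the sample value from the BOINC manual quoted above.
    print(build_fragment(10000, 3))
```

The first option only affects how many lines the Event Log window shows; the second is the one that decides how far back the files on disk reach.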
ID: 1911947 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911948 - Posted: 9 Jan 2018, 3:35:26 UTC - in response to Message 1911947.  

Changed. I'm using 10485760 (10 MB, 5X the original 2 MB size).
ID: 1911948 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911977 - Posted: 9 Jan 2018, 5:08:49 UTC
Last modified: 9 Jan 2018, 5:33:19 UTC

@Jeff This is interesting:

lun 08 ene 2018 17:29:15 EST: Task Rescheduling Begins
Arecibo non-VLAR CPU Tasks Moved to GPU = 0
Arecibo non-VLAR GPU Tasks Moved to CPU = 0
Arecibo VLAR CPU Tasks Moved to GPU = 0
Arecibo VLAR GPU Tasks Moved to CPU = 0
BLC non-VLAR CPU Tasks Moved to GPU = 0
BLC non-VLAR GPU Tasks Moved to CPU = 0
BLC VLAR CPU Tasks Moved to GPU = 0
BLC VLAR GPU Tasks Moved to CPU = 370
Astropulse CPU Tasks Moved to GPU = 0
Astropulse GPU Tasks Moved to CPU = 0
Deleted File >> /home/juan/BOINC/slots/2/boinc_finish_called    <<<<<< -----------  look here!!!
lun 08 ene 2018 17:29:15 EST: Task Rescheduling Complete

08-Jan-2018 17:27:33 [SETI@home] Starting task blc05_2bit_guppi_57976_14749_HIP91145_0047.25682.409.21.44.171.vlar_0
08-Jan-2018 17:27:35 [SETI@home] Started upload of blc05_2bit_guppi_57976_16213_HIP91234_0051.25235.818.22.45.46.vlar_0_r9107781_0
08-Jan-2018 17:27:37 [SETI@home] Finished upload of blc05_2bit_guppi_57976_16213_HIP91234_0051.25235.818.22.45.46.vlar_0_r9107781_0
08-Jan-2018 17:28:49 [---] Exiting
08-Jan-2018 17:29:19 [---] Starting BOINC client version 7.8.3 for x86_64-pc-linux-gnu
08-Jan-2018 17:29:19 [---] log flags: file_xfer, sched_ops, task
08-Jan-2018 17:29:19 [---] Libraries: libcurl/7.47.0 OpenSSL/1.0.2g zlib/1.2.8 libidn/1.32 librtmp/2.3
08-Jan-2018 17:29:19 [---] Data directory: /home/juan/BOINC
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0, compute capability 6.1, 4096MB, 3984MB available, 6852 GFLOPS peak)
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0, compute capability 6.1, 4096MB, 3984MB available, 6852 GFLOPS peak)
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 2: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0, compute capability 6.1, 4096MB, 3984MB available, 6852 GFLOPS peak)
08-Jan-2018 17:29:20 [---] CUDA: NVIDIA GPU 3: GeForce GTX 1070 (driver version 384.98, CUDA version 9.0, 


As you can see, the bypass for the issue works and the host continues to crunch normally, but the issue still appears even with BLC work only and with the WU cache full (that's new).
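
For anyone who wants to look for the same leftover file without running a rescheduler, a minimal Python sketch follows. It only reports; it does not delete anything. It assumes the layout visible in the log above, a `slots` folder under the data directory with one numbered subfolder per slot, and `leftover_finish_files` is a made-up name.

```python
from pathlib import Path


def leftover_finish_files(data_dir, marker="boinc_finish_called"):
    """List marker files found directly inside any slot directory.

    data_dir is the BOINC data directory (for example /home/juan/BOINC in
    the log above). Returns a sorted list of Path objects, empty if the
    slots folder does not exist or no marker file is present.
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []
    return sorted(slots.glob(f"*/{marker}"))
```

Running it while the client is stopped is the interesting case: any file it reports then belongs to a task that is no longer supposed to be active in that slot.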
ID: 1911977 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.