Posts by Jeff Buck

21) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1912586)
Posted 12 Jan 2018 by Jeff Buck
Post:
Okay, the x41p_zi3v, Cuda 9.00 result agreed with SoG, so that should confirm that this particular Pulse issue has been addressed. The x41p_zi3t1f should be retired and replaced with a newer version.

In a way, this highlights one of the problems facing the Special App when, hopefully, the remaining issues one day get resolved and a stable version gets released for general use. Over the many months that some version of the Special App has appeared in my Inconclusives list, I've identified no fewer than 31 different versions, all essentially being Beta-tested in the production environment. I have no way of knowing how many are currently active but, as the previous example shows, there are certainly some that should be upgraded.

Unfortunately, there's really no way to force the retirement of those earlier versions. I don't think the project has any way to do it, so that would put it in the hands of the developer. But that isn't really practical, either, so... the bottom line seems to be that even if a completely clean version of the Special App got released tomorrow, some of those earlier test versions are likely to be hanging around for a good long while. That's perfectly normal if it happens on Beta, but it shouldn't happen that way on Main.
22) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1912455)
Posted 12 Jan 2018 by Jeff Buck
Post:
I don't normally post Inconclusives involving a significantly older version of the Special App, but I think it will be useful to verify whether the Pulse difference that caused this Inconclusive has been fixed in a later version, or whether the older result will cross-validate with the tiebreaker on my host, which will run with x41p_zi3v, Cuda 9.00.

Workunit 2812663093 (blc05_2bit_guppi_57976_75003_HIP46005_0031.23430.818.22.45.80.vlar)
Task 6304795497 (S=1, A=0, P=5, T=1, G=0, BS=24.12749, BG=0, BP=3.887372) v8.22 (opencl_nvidia_SoG) windows_intelx86
Task 6305023330 (S=1, A=0, P=5, T=1, G=0, BS=24.12747, BG=0, BP=3.887368) x41p_zi3t1f, Cuda 8.00 special

The difference is in the second Pulse, which SoG reports as....
Pulse: peak=6.384412, time=45.82, period=14.39, d_freq=7495055652.29, score=1.044, chirp=24.211, fft_len=64

....while the x41p_zi3t1f, Cuda 8.00 special reports....
Pulse: peak=6.210592, time=45.82, period=14.39, d_freq=7495055652.29, score=1.016, chirp=24.211, fft_len=64

If my host agrees with SoG, then it should confirm that upgrading to a newer version of the Special App is advisable. However, if it cross-validates, I imagine some offline runs with other apps would be appropriate to see if SoG's value is truly the correct one (in which case the ball would probably be back in Petri's court).
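To see at a glance which fields of the two Pulse lines above actually differ, a short Python sketch can parse the `key=value` pairs and compare them field by field. The 1% relative tolerance is an arbitrary assumption chosen for illustration; it is not the SETI@home validator's actual rule.

```python
import math
import re

# Matches "key=value" pairs in a Pulse line as printed in the task's stderr.
PAIR = re.compile(r"(\w+)=([-+0-9.eE]+)")

def parse_pulse(line):
    """Turn 'Pulse: peak=..., time=..., ...' into {'peak': ..., 'time': ..., ...}."""
    return {k: float(v) for k, v in PAIR.findall(line)}

def differing_fields(a, b, rel_tol=0.01):
    """Names of fields whose values differ by more than rel_tol (hypothetical tolerance)."""
    pa, pb = parse_pulse(a), parse_pulse(b)
    return [k for k in pa if k in pb and not math.isclose(pa[k], pb[k], rel_tol=rel_tol)]

sog = ("Pulse: peak=6.384412, time=45.82, period=14.39, d_freq=7495055652.29, "
       "score=1.044, chirp=24.211, fft_len=64")
special = ("Pulse: peak=6.210592, time=45.82, period=14.39, d_freq=7495055652.29, "
           "score=1.016, chirp=24.211, fft_len=64")
print(differing_fields(sog, special))  # ['peak', 'score']
```

Only peak and score disagree (each by roughly 2.7%), while time, period, d_freq, chirp and fft_len match exactly, which is consistent with the same signal being found but measured slightly differently.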
23) Message boards : Number crunching : Panic Mode On (109) Server Problems? (Message 1912431)
Posted 12 Jan 2018 by Jeff Buck
Post:
Thanks for that confirmation, Richard. I was pretty sure it wasn't just the credit calculation that needed the APR, but would have been hard-pressed to verify that assumption myself. ;^)
24) Message boards : Number crunching : Panic Mode On (109) Server Problems? (Message 1912410)
Posted 11 Jan 2018 by Jeff Buck
Post:
APR is an "artifact" that is only required by CreditScrew - if one goes to a data-based credit system it becomes redundant.
I was under the impression that APR was a determining factor in the task allotment calculation, in order to determine how many tasks it takes to fill the "Store at least nn days of work" requirement. (At least for those who aren't already maxing out the 100 + 100 limits.) Is that not correct?
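The question above rests on the idea that a speed figure such as APR feeds the runtime estimate, and the runtime estimate sets how many tasks it takes to fill the buffer. A deliberately simplified Python model of that reasoning follows. It is not BOINC's actual work-fetch code (which weighs other factors); the formula, parameter names and the flat per-device cap are assumptions for illustration only.

```python
import math

def tasks_to_fill_buffer(days, fpops_est, apr_gflops, concurrent, cap=100):
    """Simplified, hypothetical model of how many tasks a work buffer needs.

    Estimated runtime per task = fpops_est / (APR in FLOPS). The buffer holds
    days * 86400 seconds of work for each of `concurrent` task slots, and the
    result is capped at `cap` tasks (like the per-device limit in the post).
    """
    runtime_s = fpops_est / (apr_gflops * 1e9)
    needed = math.ceil(days * 86400 * concurrent / runtime_s)
    return min(needed, cap)

# A device with APR 30 GFLOPS on tasks estimated at 5.4e13 FLOPs (1800 s each):
print(tasks_to_fill_buffer(1, 5.4e13, 30, concurrent=1))  # 48
print(tasks_to_fill_buffer(1, 5.4e13, 30, concurrent=4))  # 100 (capped)
```

In this model a host already hitting the cap is unaffected by APR, which matches the parenthetical about hosts maxing out the 100 + 100 limits.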
25) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1912044)
Posted 10 Jan 2018 by Jeff Buck
Post:
So, even the simplest level of Analysis 101 is still too complex for you. Then please stick to putting your efforts into what you seem to be good at, which appears to be compiling programs for use in non-standard operating environments. We'll all get more benefit from that kind of effort than from your ridiculous diatribes here.
26) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1912036)
Posted 10 Jan 2018 by Jeff Buck
Post:
How is it that you consistently ignore every piece of solid evidence presented in this thread and repeatedly spout the same asinine conclusions that have already been pretty thoroughly dismissed? You must have a really special talent.

At its simplest level, two people have reported the problem. One happens to be doing some rescheduling, one is not. Both users reporting the problem happen to be running Ubuntu 16.04 with either an AVX or AVX2 CPU app. Nobody has "blamed" any individual app, OS, or anything else, because not enough evidence has been developed yet to reach any conclusions. So, unless you care to actually contribute useful analysis and investigative effort to this issue, you should really just BUTT OUT.
27) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1912023)
Posted 10 Jan 2018 by Jeff Buck
Post:
@Jeff This is interesting:

Deleted File >> /home/juan/BOINC/slots/2/boinc_finish_called    <<<<<< -----------  look here!!!
Unfortunately, that's a "finish" file rather than a lockfile, so a totally different and longstanding issue. This just prevents the task from getting a "finish file not present" error when BOINC restarts. That's an issue I hope the BOINC developers will address someday.
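When the client is fully stopped, a quick way to tell leftover "finish" files from leftover lockfiles is to list what remains in each slot directory. A minimal Python sketch: the `boinc_finish_called` name comes from the quoted log, while `boinc_lockfile` is assumed to be the BOINC API's usual slot lockfile name. Both files are normal while a task is running, so the check only means something with the client shut down.

```python
from pathlib import Path

# Marker names: one from the quoted log, one assumed (the usual slot lockfile).
MARKERS = ("boinc_finish_called", "boinc_lockfile")

def leftover_markers(slots_dir):
    """Map each slot directory name to the marker files it still contains."""
    found = {}
    for slot in sorted(Path(slots_dir).iterdir()):
        if slot.is_dir():
            present = [m for m in MARKERS if (slot / m).exists()]
            if present:
                found[slot.name] = present
    return found

# Example (path taken from the quoted post; adjust for your own install):
# print(leftover_markers("/home/juan/BOINC/slots"))
```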
28) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911947)
Posted 9 Jan 2018 by Jeff Buck
Post:
I need to make a correction to the Event Log size information I posted earlier. The "<max_event_log_lines>N</max_event_log_lines>" option in cc_config only controls the number of displayable lines in the Event Log window. To change the maximum size of the "stdoutdae.txt" and "stdoutdae.old" files, and thus the amount of Event Log data retained over time, use the <max_stdout_file_size>N</max_stdout_file_size> option. This is entered in bytes, following this example from the BOINC manual:

 <max_stdout_file_size>N</max_stdout_file_size>
    Specify the maximum size of the standard out log file (stdoutdae.txt); default is 2 MB.
    Sample: <max_stdout_file_size>3145728</max_stdout_file_size> equals 3 MB.
    NB: A Client restart may be needed to have changes take effect!
I apologize if I caused any confusion.
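Since the value is entered in bytes, the manual's 3 MB sample works out to 3 × 1024 × 1024 = 3145728. A minimal Python sketch of setting it programmatically, assuming (as the BOINC manual documents) that the option lives under `<options>` in cc_config.xml. The function name is hypothetical, and ElementTree's default parser drops XML comments, so keep a backup of your original file.

```python
import xml.etree.ElementTree as ET

def set_max_stdout_size(cc_config_xml, megabytes):
    """Return cc_config XML with <options><max_stdout_file_size> set (in bytes)."""
    root = ET.fromstring(cc_config_xml)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    elem = options.find("max_stdout_file_size")
    if elem is None:
        elem = ET.SubElement(options, "max_stdout_file_size")
    elem.text = str(megabytes * 1024 * 1024)  # 3 MB -> 3145728 bytes
    return ET.tostring(root, encoding="unicode")

print(set_max_stdout_size("<cc_config><options/></cc_config>", 3))
# <cc_config><options><max_stdout_file_size>3145728</max_stdout_file_size></options></cc_config>
```

After writing the result back to cc_config.xml, a client restart may still be needed, as the manual excerpt notes.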
29) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911921)
Posted 9 Jan 2018 by Jeff Buck
Post:
I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post.

No, and I already killed all the sleepy processes. Will keep an eye on that too.
Okay, you just answered the question in my belated edit. Sorry, I apparently wasn't paying close enough attention earlier. :^(
30) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911919)
Posted 9 Jan 2018 by Jeff Buck
Post:
I believe I do. Uploaded to OneDrive if you want to look. Maybe I missed something.
......
Changing the max event log lines to 15K now; it was only 2K.
Yeah, even your ".old" file only goes back about 37 hours, so it appears that any reference to those two tasks must go back further than that. Which would indicate that those sleepy processes have been napping for quite a while. I don't suppose you happened to catch the task name of the third one. I didn't see it in your earlier post.

Anyway, 15K lines should definitely help in future diagnosis. Keep an eye on those files, though, and see how many days that ends up covering. Ultimately, you may want to increase it even further, but just wait and see for now.

EDIT: Oh, I just realized something. For some reason, I thought we were only looking at the three SSE41 processes, but I just realized that you also had several AVX2 processes sleeping and that your earlier screen photos just showed one of each. Did you already kill all of those, or can you still extract the task names from them?
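To check how many hours or days a given log size actually covers, one approach is to compare the first and last timestamps in "stdoutdae.txt" and "stdoutdae.old". A minimal Python sketch, assuming each Event Log line starts with a timestamp in the form `08-Jan-2018 10:15:22`; lines without one are skipped.

```python
from datetime import datetime

TS_FORMAT = "%d-%b-%Y %H:%M:%S"  # assumed Event Log timestamp layout (20 chars)

def hours_covered(lines):
    """Hours between the earliest and latest timestamped lines of an Event Log."""
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(line[:20], TS_FORMAT))
        except ValueError:
            continue  # skip lines without a leading timestamp
    if len(stamps) < 2:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds() / 3600

# Usage: combine both files so the span includes the rotated ".old" log.
# lines = []
# for name in ("stdoutdae.old", "stdoutdae.txt"):
#     with open(name, errors="replace") as f:
#         lines += f.read().splitlines()
# print(hours_covered(lines))
```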
31) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911913)
Posted 8 Jan 2018 by Jeff Buck
Post:
Tried but can't find them on the host task list anymore:

No such task: 06mr07ab.22000.9479.12.39.150_2
No such task: blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1

They R.I.P.
Did you look in your "stdoutdae.txt" and "stdoutdae.old" Event Log files, or do they not hold enough records? You can increase the size of those by using the "<max_event_log_lines>nnnn</max_event_log_lines>" option in cc_config. I usually keep 10000 lines in my bigger boxes, but your Linux one is way busier than mine, so perhaps a larger number would be good for yours.
32) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911900)
Posted 8 Jan 2018 by Jeff Buck
Post:
If I read them right from your screen photos, the two tasks shown in the sleeping processes are:

06mr07ab.22000.9479.12.39.150_2
blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1

Both seem to be gone from the server, so it appears that those processes got left behind after the tasks were finished. You might be able to go back to your earlier Event Log entries and see if those tasks show up there, and whether or not anything odd occurred about the time they finished processing.
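Searching the Event Log files by hand for those task names is tedious, so a small Python sketch can pull every matching line out of "stdoutdae.old" and "stdoutdae.txt" (listed oldest first so hits come out roughly in time order). The function name is hypothetical; missing files are simply skipped.

```python
from pathlib import Path

def find_task_mentions(log_paths, task_names):
    """Collect every Event Log line that mentions each task name."""
    hits = {name: [] for name in task_names}
    for path in log_paths:
        p = Path(path)
        if not p.exists():
            continue
        for line in p.read_text(errors="replace").splitlines():
            for name in task_names:
                if name in line:
                    hits[name].append(line)
    return hits

# Usage with the two task names from the screen photos:
# hits = find_task_mentions(
#     ["stdoutdae.old", "stdoutdae.txt"],
#     ["06mr07ab.22000.9479.12.39.150_2",
#      "blc05_2bit_guppi_57976_07262_HIP74926_0026.27310.409.22.45.114.vlar_1"])
# for name, lines in hits.items():
#     print(name, len(lines))
```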
33) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911895)
Posted 8 Jan 2018 by Jeff Buck
Post:
BOINC doesn't always stop the running tasks by just exiting the Manager. That should be pretty clear by now.
Clear as mud.

"Stop running tasks" is really a misnomer, as what it really means is "Stop the client". The client then needs to handle stopping the tasks. If what you're saying is that the client stopped, but the tasks didn't, then yes, it would be relevant to this discussion, because that would be leaving Zombie tasks behind and possibly unwanted slot contents. On the other hand, if it's simply that the Manager doesn't stop the client, then having the client and its subordinate tasks continue to run really doesn't lead to either residual lockfiles or Zombie tasks, does it? Once the second step is taken to stop the client, the tasks should shut down then, as well. Only if there's a hitch in that process would it be relevant here.
34) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911888)
Posted 8 Jan 2018 by Jeff Buck
Post:
@Jeff

I did what you asked and posted 4 examples at the link. All the AVX2 ones show the same parameters as the one posted. Each SSE4.1 one shows a different CPU time. In all of them the status is Sleep. So they are not phantoms, they are sleeping grinches! Sometimes I really hate computers!

What I can't understand is why some of them are still in memory after 24 hrs.

http://ttps://1drv.ms/u/s!Asjkc9Jyluh3zxFNnAHQAuFT-19J

<edit> One question... What program is supposed to delete the process? Or at least tell the OS to kill the process?
First, you should probably fix your link. It threw me for a bit.

Those two show the task files as '(deleted)', which probably means that the tasks have actually been completed and reported. Did you try to verify that from your task pages? If they actually have successfully completed, you could probably just go ahead and kill the processes now from System Monitor, if you like, although it doesn't sound like they're causing your system any problems.

I don't know if there's anything there that could tell you what caused the processes to go to sleep. (Although you did say it's pretty hot there today, as I recall. ;^)) I notice that each one shows a different "Waiting Channel", but I have no idea what that means.
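The "Status" and "Waiting Channel" columns in System Monitor appear to correspond to what the Linux kernel exposes under /proc. A minimal Python sketch (Linux only) reads the process state from /proc/<pid>/status and the kernel function the process is blocked in from /proc/<pid>/wchan. The function name is hypothetical, and wchan may be unreadable or just "0" depending on kernel settings.

```python
from pathlib import Path

def proc_state(pid="self"):
    """Return (state, wait channel) for a Linux process, read from /proc."""
    state = ""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("State:"):
            state = line.split(":", 1)[1].strip()  # e.g. "S (sleeping)"
            break
    try:
        wchan = Path(f"/proc/{pid}/wchan").read_text().strip()
    except OSError:
        wchan = ""  # may be unreadable depending on kernel settings
    return state, wchan

# Usage with a PID taken from System Monitor (hypothetical number):
# print(proc_state(12345))
```

A state of "S (sleeping)" with the same wait channel every time you check would suggest the process is blocked on something in the kernel rather than crunching.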
35) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911874)
Posted 8 Jan 2018 by Jeff Buck
Post:
If you can prove that the problem only exists in a specific OS version, then post your evidence. Otherwise, put a sock in it. We aren't just trying to help users running 16.04, we are trying to identify the conditions under which the problem may exist. Having the first two reports of the problem occur in 16.04 does not automatically prove that 16.04 is the only OS version in which it might live.
36) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911859)
Posted 8 Jan 2018 by Jeff Buck
Post:
I'm totally lost too.

I'm at home now, so if something comes to light, please let me know.

BTW, the host continues to crunch with normal times, so it doesn't affect its performance.
In System Monitor, try right-clicking on each of those processes and choose Properties to see what kind of CPU% and other resources each is consuming. That should tell you if some of them are actually dead or, at least, what's going on with each.

EDIT: Also, the "Open Files" selection in the right-click menu will tell you which files that process has open. Then you can compare each WU file with your active and completed WUs.
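The "Open Files" view can also be reproduced from a terminal. A minimal Python sketch (Linux only) follows the symlinks in /proc/<pid>/fd; the kernel appends " (deleted)" to files that were removed while still open, which is how task files of already-reported work would show up. Function names are hypothetical.

```python
import os

def open_files(pid="self"):
    """List targets of /proc/<pid>/fd links (Linux)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # descriptor closed while we were looking
    return sorted(targets)

def deleted_files(pid="self"):
    """Only the open files the kernel marks as already deleted."""
    return [t for t in open_files(pid) if t.endswith(" (deleted)")]

# Usage with a PID from System Monitor (hypothetical number):
# for path in open_files(12345):
#     print(path)
```

Comparing the WU file names in that list against your active and completed tasks is then a simple text match.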
37) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911854)
Posted 8 Jan 2018 by Jeff Buck
Post:
You are running an older version of Ubuntu. What works in 14.04 doesn't work the same as with 16.04, so, your data points are useless to those people in this thread running 16.04.
That could only be true if everything in 14.04 worked differently in 16.04, top to bottom. You don't seem to be able to grasp the concept that isolating a problem requires looking at as many different variables as possible until the focus starts to narrow. Right now, that can include OS, BOINC version, application type and version, hardware, task count, etc., etc., and possible combinations of any of those.

So far, we've only recorded observations that the problem occurs with a few applications running under a single BOINC version on very fast hardware. That doesn't really narrow the focus that much. Other observations, and tests run in different environments, may either expand or narrow the scope, by either confirming that the problem exists with different factors in play, or by eliminating certain factors, as long as those tests are repeatable. Simple problem identification and debugging 101. Until specific factors are eliminated, no data points are useless.
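The narrowing process described above can be made mechanical once observations are written down consistently. A minimal Python sketch: it keeps only factor values shared by every report with the problem and seen in no report without it. The report fields and values used in the usage example are hypothetical placeholders, not data from this thread, and a surviving factor only narrows the search; it doesn't prove cause.

```python
def candidate_factors(reports):
    """Factor values shared by every failing report and absent from all passing ones."""
    failing = [r for r in reports if r["problem"] == "yes"]
    passing = [r for r in reports if r["problem"] == "no"]
    if not failing:
        return {}
    shared = {k: v for k, v in failing[0].items()
              if k != "problem" and all(r.get(k) == v for r in failing)}
    return {k: v for k, v in shared.items()
            if not any(r.get(k) == v for r in passing)}

# Hypothetical placeholder reports, for illustration only:
reports = [
    {"os": "A", "boinc": "1", "app": "x", "problem": "yes"},
    {"os": "A", "boinc": "2", "app": "x", "problem": "yes"},
    {"os": "B", "boinc": "1", "app": "x", "problem": "no"},
]
print(candidate_factors(reports))  # {'os': 'A'}
```

Each new repeatable observation, from either a problem host or a clean one, can only shrink or confirm that set, which is why no data point is useless until a factor is eliminated.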
38) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911836)
Posted 8 Jan 2018 by Jeff Buck
Post:
I'm not having any issues on my machines and have no need at present to change my OS. I simply posted my observations to provide more data points to assist in problem resolution. As to number of running CPU tasks, I have one machine running 8, one 7, and one 4. Again, all cleanly terminated within 4 seconds.
39) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911804)
Posted 8 Jan 2018 by Jeff Buck
Post:
Actually, I was noting that my own tasks, all running with "Very Low" priority, were the ones that took no more than 4 seconds to kill, whereas the Zombie examples that Ruelke posted were ones running in "Normal" priority.

EDIT: It may also be worth noting that my machines all have much slower CPUs than yours and Ruelke's.
40) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911794)
Posted 8 Jan 2018 by Jeff Buck
Post:
Is it possible that BOINC version enters into these Zombie task situations? I just tried a normal BOINC Manager Exit on each of my 3 Linux systems. All of mine are set to automatically shut down the client and running tasks. On all 3, System Monitor showed the longest shutdown delay for the last of the running tasks was no more than 4 seconds. All three of my Linux boxes are running BOINC 7.2.42.
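Instead of watching System Monitor, the shutdown delay can be timed by polling the science-app PIDs after exiting the Manager. A minimal Python sketch (Linux only; function names hypothetical): a process counts as gone once /proc/<pid> disappears or its state is "Z". Note that a kernel zombie ("Z") is a finished process waiting to be reaped, which is not the same thing as the forum's "Zombie tasks" that keep running after the client stops.

```python
import time
from pathlib import Path

def is_running(pid):
    """True while /proc/<pid> exists and the process is not a kernel zombie."""
    try:
        status = Path(f"/proc/{pid}/status").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return False
    for line in status.splitlines():
        if line.startswith("State:"):
            return not line.split()[1].startswith("Z")
    return True

def seconds_until_exit(pids, timeout=30.0, poll=0.1):
    """Seconds until every PID has exited, or None if any is still running at timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if not any(is_running(p) for p in pids):
            return time.monotonic() - start
        time.sleep(poll)
    return None

# Usage: collect the task PIDs first, exit the Manager, then:
# print(seconds_until_exit([12345, 12346]))  # hypothetical PIDs
```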

One other observation. I notice from Ruelke's screenshot that his tasks are running with "Normal" priority, while on all 3 of my boxes the tasks were set to "Very Low" priority. (boinc and boincmgr show as Normal priority.) Could that be a factor?
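To compare priorities across boxes without screenshots, the underlying nice value can be read directly. A minimal Python sketch using `os.getpriority` (Unix only). The mapping from nice value to System Monitor's "Very Low"/"Normal" labels uses assumed thresholds, roughly "Normal" around 0 and "Very Low" for high nice values such as 19; check it against your own System Monitor before relying on it.

```python
import os

def nice_label(nice):
    """Map a nice value to a System Monitor-style label (thresholds are assumed)."""
    if nice < -7:
        return "Very High"
    if nice < -2:
        return "High"
    if nice < 3:
        return "Normal"
    if nice < 7:
        return "Low"
    return "Very Low"

def priority_of(pid):
    """Label for a running process; pid 0 means the calling process."""
    return nice_label(os.getpriority(os.PRIO_PROCESS, pid))

# Usage with a task PID from System Monitor (hypothetical number):
# print(priority_of(12345))
```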


©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.