Postponed: Waiting to acquire lock

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1910926 - Posted: 5 Jan 2018, 21:57:06 UTC - in response to Message 1910920.  

Crunching always uses a physical core, so never utilize more than the physical cores available.
It will slow down significantly.

Maybe on Bulldozer CPUs, but on my i7 I've always found that running with HyperThreading on results in more work being done than with it off, even when using all cores (physical & virtual).
Grant
Darwin NT
ID: 1910926
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1910927 - Posted: 5 Jan 2018, 21:59:25 UTC - in response to Message 1910921.  

Yup, hard to keep them straight, lol...
ID: 1910927
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1910930 - Posted: 5 Jan 2018, 22:05:42 UTC - in response to Message 1910926.  

Crunching always uses a physical core, so never utilize more than the physical cores available.
It will slow down significantly.

Maybe on Bulldozer CPUs, but on my i7 I've always found that running with HyperThreading on results in more work being done than with it off, even when using all cores (physical & virtual).


Certainly not.


With each crime and every kindness we birth our future.
ID: 1910930
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1910932 - Posted: 5 Jan 2018, 22:07:22 UTC - in response to Message 1910920.  

Crunching always uses a physical core, so never utilize more than the physical cores available.
It will slow down significantly.
Maybe it's possible to feed the GPUs with the available threads on modern CPUs.

So on my 6-core / 12-thread CPU, what is the right thing to do, since I already run 4 GPU WUs? Run 2 more CPU WUs, or keep running 4 as I do today?
I already know 8 CPU WUs slow everything down. 6 CPU WUs work fine, but I never actually checked the times.
ID: 1910932
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1910934 - Posted: 5 Jan 2018, 22:13:57 UTC - in response to Message 1910932.  

Crunching always uses a physical core, so never utilize more than the physical cores available.
It will slow down significantly.
Maybe it's possible to feed the GPUs with the available threads on modern CPUs.

So on my 6-core / 12-thread CPU, what is the right thing to do, since I already run 4 GPU WUs? Run 2 more CPU WUs, or keep running 4 as I do today?
I already know 8 CPU WUs slow everything down. 6 CPU WUs work fine, but I never actually checked the times.


It might be worth trying 5 CPU tasks.
Let me know and I will check your times again.
Right now it looks perfect.


With each crime and every kindness we birth our future.
ID: 1910934
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1910936 - Posted: 5 Jan 2018, 22:16:37 UTC - in response to Message 1910932.  
Last modified: 5 Jan 2018, 22:18:04 UTC

I already know 8 CPU WUs slow everything down. 6 CPU WUs work fine, but I never actually checked the times.

On my i7 system (4 cores / 8 threads), running on all available threads does result in longer CPU run times, but it also results in more work being done per hour (much like back in the CUDA days on the GPU, when running more than 1 WU gave longer run times per WU, but more WUs done per hour).
However, as I mentioned, I have reserved 1 CPU core for every GPU WU being run, so no CPU core is trying to process CPU work and support a GPU WU at the same time. I can see that causing significant slowdowns in processing & system responsiveness.

My app_config.xml
<app_config>
 <app>
  <name>setiathome_v8</name>
  <gpu_versions>
   <!-- 1 task per GPU, with 1 full CPU core reserved to support it -->
   <gpu_usage>1.00</gpu_usage>
   <cpu_usage>1.00</cpu_usage>
  </gpu_versions>
 </app>
 <app>
  <name>astropulse_v7</name>
  <gpu_versions>
   <!-- 0.5 GPU per task = 2 AP tasks per GPU, each reserving a full CPU core -->
   <gpu_usage>0.5</gpu_usage>
   <cpu_usage>1.0</cpu_usage>
  </gpu_versions>
 </app>
</app_config>


When running AP, 2 WUs at a time gives the most work per hour, so the config takes a core away from CPU work to support that extra GPU WU. When the AP WUs are all done, that core goes back to crunching CPU WUs.

I also run Seti only.
Running multiple projects, particularly if your system has the resources to run 2 of them at the same time, will result in very different system behaviour from mine with just the one project.
Grant
Darwin NT
ID: 1910936
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1910938 - Posted: 5 Jan 2018, 22:21:45 UTC - in response to Message 1910934.  
Last modified: 5 Jan 2018, 22:24:05 UTC

It might be worth trying 5 CPU tasks.

Changed to 5 now. It takes about 1 hr to crunch each CPU WU, but it seems the new blc5-type WUs crunch in less time.
ID: 1910938
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1910941 - Posted: 5 Jan 2018, 22:25:41 UTC - in response to Message 1910938.  
Last modified: 5 Jan 2018, 22:26:08 UTC

It might be worth trying 5 CPU tasks.

Changed to 5 now. It takes about 1 hr to crunch each WU.

The best way to check is to compare the Run time to the CPU time for a given CPU WU.
At present, the difference for your CPU WUs is around 3 seconds. Mine is up to 3 minutes. If you've got more than 10 min of difference on many WUs, the system is showing signs of being overcommitted. If it's 15 min or more, it is overcommitted.
Reserve cores for the GPUs, or just use fewer cores. My personal preference: reserve cores and use them all.
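
To put rough numbers on that check, here's a minimal sketch (the task names and times below are made up for illustration, not from any real host):

# Sketch of the Run time vs. CPU time check described above.
def check_overcommit(tasks):
    # tasks: list of (name, run_time_s, cpu_time_s) for CPU work units
    for name, run_s, cpu_s in tasks:
        gap_min = (run_s - cpu_s) / 60.0
        if gap_min >= 15:
            status = "overcommitted"
        elif gap_min > 10:
            status = "showing signs of overcommitment"
        else:
            status = "OK"
        print(f"{name}: run-cpu gap {gap_min:.1f} min -> {status}")

check_overcommit([
    ("wu_a", 3600, 3597),   # 3 s gap: healthy
    ("wu_b", 3600, 2520),   # 18 min gap: overcommitted
])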
Grant
Darwin NT
ID: 1910941
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1910944 - Posted: 5 Jan 2018, 22:30:16 UTC - in response to Message 1910941.  
Last modified: 5 Jan 2018, 22:31:32 UTC

It might be worth trying 5 CPU tasks.

Changed to 5 now. It takes about 1 hr to crunch each WU.

The best way to check is to compare the Run time to the CPU time for a given CPU WU.
At present, the difference for your CPU WUs is around 3 seconds. Mine is up to 3 minutes. If you've got more than 10 min of difference on many WUs, the system is showing signs of being overcommitted. If it's 15 min or more, it is overcommitted.
Reserve cores for the GPUs, or just use fewer cores. My personal preference: reserve cores and use them all.


Please don't give wrong advice.
Your CPU times on the i7 are slower than they were on my FX.
If you are happy with the way you are running the host, that's fine, but you should know by now that I know what I'm talking about.


With each crime and every kindness we birth our future.
ID: 1910944
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1910946 - Posted: 5 Jan 2018, 22:30:43 UTC
Last modified: 5 Jan 2018, 22:32:49 UTC

This is my app_config file. I believe this already reserves the cores, no?

<app_config>
 <project_max_concurrent>9</project_max_concurrent>
 <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <!-- budget one full CPU thread to support each GPU task -->
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
    <cmdline>-unroll 15 -pfb 16 -pfp 600 -nobs</cmdline>
 </app_version>
 <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
    <cmdline>-use_sleep -unroll 15 -sbs 256 -ffa_block 12288 -ffa_block_fetch 6144</cmdline>
 </app_version>
</app_config>


<edit> Just remembering: my host has 4 GPUs.
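(For what it's worth, assuming stock BOINC client accounting: <avg_ncpus>1.0</avg_ncpus> budgets one full CPU thread per GPU task, so 4 GPU tasks reserve 4 of the 12 threads, and <project_max_concurrent>9</project_max_concurrent> then caps the mix at 4 GPU + 5 CPU tasks.)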
ID: 1910946
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1910948 - Posted: 5 Jan 2018, 22:32:16 UTC

Don't change anything for now.


With each crime and every kindness we birth our future.
ID: 1910948
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1910951 - Posted: 5 Jan 2018, 22:35:10 UTC
Last modified: 5 Jan 2018, 22:35:50 UTC

Don't change anything for now.


By your command. LOL

Running 4 GPU + 5 CPU WUs.

Off topic: blc5-type WUs are crunched in 2:21 min on the GPU and in 45 min on the CPU. I like that.
ID: 1910951
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1910952 - Posted: 5 Jan 2018, 22:36:59 UTC - in response to Message 1910944.  

Please don't give wrong advice.
Your CPU times on the i7 are slower than they were on my FX.
If you are happy with the way you are running the host, that's fine, but you should know by now that I know what I'm talking about.

I'm aware of the work you have done; I'm also aware of the number of big crunchers that run more than 1 WU at a time on their GPUs.
Yet my personal experience on my system is that running more than 1 GPU WU at a time doesn't result in more work per hour, and with HyperThreading on and all cores in use, I get more work done per hour than with HyperThreading off, despite the faster per-WU run times.
True, I haven't tried HyperThreading off with more than 1 GPU WU, but the current setup works & is stable, so I'm not going to fiddle with it further.
Grant
Darwin NT
ID: 1910952
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1910963 - Posted: 5 Jan 2018, 22:50:13 UTC - in response to Message 1910952.  
Last modified: 5 Jan 2018, 22:52:00 UTC

Some of the confusion comes from how the AMD Bulldozer/Piledriver CPU was arranged. It was arguably a 4-core / 8-thread processor. The two threads in each (2-core) module had to share time accessing the module's single FPU, so trying to run a CPU task on a virtual core slowed CPU processing down for both threads in the module. For FX processors it's best to limit CPU task usage to the 4 physical cores only. You can let the virtual threads support any GPU tasks, since a GPU task does not need any math functions from the CPU; the math work is done on the GPU.

With Ryzen, everything has changed. The current AMD architecture bears more resemblance to Intel's with respect to hyperthreading: each thread has access to its own FPU, so you don't see the slowdown with CPU tasks on a virtual core like you did with FX.

Still, as has been pointed out by Mike and Grant, ideally you should keep cpu_time and run_time equal or very close. If the times diverge by a large amount, it shows the processor is overcommitted and is not running the tasks as efficiently as possible.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1910963
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1910964 - Posted: 5 Jan 2018, 22:50:39 UTC - in response to Message 1910952.  

Please don't give wrong advice.
Your CPU times on the i7 are slower than they were on my FX.
If you are happy with the way you are running the host, that's fine, but you should know by now that I know what I'm talking about.

I'm aware of the work you have done; I'm also aware of the number of big crunchers that run more than 1 WU at a time on their GPUs.
Yet my personal experience on my system is that running more than 1 GPU WU at a time doesn't result in more work per hour, and with HyperThreading on and all cores in use, I get more work done per hour than with HyperThreading off, despite the faster per-WU run times.
True, I haven't tried HyperThreading off with more than 1 GPU WU, but the current setup works & is stable, so I'm not going to fiddle with it further.


Nobody said you should.


With each crime and every kindness we birth our future.
ID: 1910964
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1910966 - Posted: 5 Jan 2018, 22:52:01 UTC

Juan, it looks like yesterday afternoon you also had a period of tasks with the "Can't acquire lockfile" messages. Take a look at these 6 tasks while they're still on the server:

https://setiathome.berkeley.edu/result.php?resultid=6285905907
https://setiathome.berkeley.edu/result.php?resultid=6285905928
https://setiathome.berkeley.edu/result.php?resultid=6286130580
https://setiathome.berkeley.edu/result.php?resultid=6286157029
https://setiathome.berkeley.edu/result.php?resultid=6286176312
https://setiathome.berkeley.edu/result.php?resultid=6286204543

The problem started at about 16:15:55 local time and cleared shortly after 17:27:51 local time. I suspect that if you'd let your other 4 tasks continue this morning, they might eventually have cleared, too.

What I would suggest is that you first review your BOINC Event Log for that time period and see if you can determine what might have occurred around 16:15 to trigger the issue. If there's nothing unusual there, then perhaps a system log might tell you something. Or perhaps you can recall running some sort of resource-hogging application on that machine around that time. If you can pinpoint a cause, then look for the same sort of thing from when the problem cropped up again this morning. Good luck!
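
One way to pull out just that window, as a rough sketch (the log file name and the "DD-MMM-YYYY HH:MM:SS" timestamp format below are assumptions; adjust them to wherever and however your client actually logs):

# Filter BOINC event-log lines to the time window of interest.
from datetime import datetime

LOG = "stdoutdae.txt"   # assumed file name; adjust for your install
START = datetime(2018, 1, 5, 16, 0, 0)
END = datetime(2018, 1, 5, 17, 30, 0)

with open(LOG, errors="replace") as f:
    for line in f:
        try:
            # assumes lines start with e.g. "05-Jan-2018 16:15:55"
            stamp = datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        if START <= stamp <= END:
            print(line.rstrip())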
ID: 1910966
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1910971 - Posted: 5 Jan 2018, 22:57:15 UTC - in response to Message 1910964.  

True, I haven't tried HyperThreading off with more than 1 GPU WU, but the current setup works & is stable, so I'm not going to fiddle with it further.

Nobody said you should.

I know that, but I was just pointing out that it was the one thing I haven't tried (that, and limiting the number of cores used, so that makes it 2 things I haven't tried) to get the same results that others say they are getting.
Grant
Darwin NT
ID: 1910971
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1910974 - Posted: 5 Jan 2018, 22:59:01 UTC

Wow, you've been busy while I've been out supping a pint or two. Anyone actually matching symptoms to solutions?
ID: 1910974
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1910982 - Posted: 5 Jan 2018, 23:09:02 UTC - in response to Message 1910966.  
Last modified: 5 Jan 2018, 23:09:53 UTC

What I would suggest is that you first review your BOINC Event Log for that time period

Slowly... what is the file name of this?

Or, perhaps you can recall running some sort of resource hogging application on that machine about that time. If you can pinpoint a cause, then look for the same sort of thing when the problem cropped up again this morning.

Almost surely not, unless something is running without my knowledge, of course. When I use this host it's just to browse; besides these forums and the S@H site itself, only some very light sites, like reading a newspaper. I don't run anything else on the host, mainly because I haven't learned how to run more things on this Linux host.
ID: 1910982
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1910985 - Posted: 5 Jan 2018, 23:12:12 UTC - in response to Message 1910974.  

Wow, you've been busy while I've been out supping a pint or two. Anyone actually matching symptoms to solutions?
Didn't bring back any to share, huh? ;^)

Well, after discovering that Juan's machine experienced an earlier episode with the same symptoms, I'm thinking that maybe my "overcommitted resources" theory might be back in play. When I was looking at Bill G's earlier problem a couple of months ago, I discovered from an old Process Monitor log that the BOINC client polls all the slots every 5 minutes, checking for lockfiles. What those logs don't tell me is how long the client waits for a response from the OS before timing out and deciding that it can't acquire a lockfile in a particular slot. Unless somebody knows that off the top of their head, I'd say it would probably take an experienced code walker to ferret out that info. Know anybody like that?
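
For anyone inclined to poke at it, the mechanism in question looks roughly like this (a minimal non-blocking lockfile sketch; BOINC's client is C++, the slot path and lockfile name here are assumptions, and the exact wait/timeout behavior is precisely the open question):

# Sketch of non-blocking lockfile acquisition on a slot directory.
import fcntl, os

def try_acquire_lock(slot_dir):
    # Returns an open fd holding the lock, or None if another process has it.
    path = os.path.join(slot_dir, "boinc_lockfile")  # assumed name
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fail fast, don't block
        return fd
    except OSError:
        os.close(fd)
        return None  # the "Can't acquire lockfile" case

if try_acquire_lock("slots/0") is None:
    print("Postponed: waiting to acquire lock")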
ID: 1910985