OpenCL MB v8.12 issues thread attempt 2

Message boards : Number crunching : OpenCL MB v8.12 issues thread attempt 2

Previous · 1 · 2 · 3 · 4 · 5 · 6 · Next

TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1789307 - Posted: 21 May 2016, 17:32:24 UTC - in response to Message 1789202.  

Btw, why did the 8.10 app get pushed to main

Perhaps he decided that only a quite small fraction is affected.

Actually, it affects every Mac Pro running Darwin 15.4 or higher, which is just about all of them. From what I can see, there was very little difference between the older app and the newer app; the main difference is that the older app actually reported Gaussians. Judging from Chris's Mac, I'd say the Mac Pros are going to have about a 50% inconclusive rate with the newer app... hundreds per machine.
Even my current "Special" CUDA App is much better than that. Maybe I should post it.
ID: 1789307 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1789314 - Posted: 21 May 2016, 18:02:41 UTC - in response to Message 1789307.  
Last modified: 21 May 2016, 18:10:10 UTC

Btw, why did the 8.10 app get pushed to main

Perhaps he decided that only a quite small fraction is affected.

Actually, it affects every Mac Pro running Darwin 15.4 or higher, which is just about all of them. From what I can see, there was very little difference between the older app and the newer app; the main difference is that the older app actually reported Gaussians. Judging from Chris's Mac, I'd say the Mac Pros are going to have about a 50% inconclusive rate with the newer app... hundreds per machine.
Even my current "Special" CUDA App is much better than that. Maybe I should post it.


I passed this info to Eric.

EDIT: Also, I await logs from benchmarks in a controlled environment - same task, different revisions.
ID: 1789314 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1789325 - Posted: 21 May 2016, 19:18:36 UTC - in response to Message 1789314.  

Btw, why did the 8.10 app get pushed to main

Perhaps he decided that only a quite small fraction is affected.

Actually, it affects every Mac Pro running Darwin 15.4 or higher, which is just about all of them. From what I can see, there was very little difference between the older app and the newer app; the main difference is that the older app actually reported Gaussians. Judging from Chris's Mac, I'd say the Mac Pros are going to have about a 50% inconclusive rate with the newer app... hundreds per machine.
Even my current "Special" CUDA App is much better than that. Maybe I should post it.


I passed this info to Eric.

EDIT: Also, I await logs from benchmarks in a controlled environment - same task, different revisions.


I'm going to have some time tonight after the kiddo's bedtime. I'm single-parenting it this week while my better half is in Japan. I'll post over in beta when I get some successful runs in.

Chris
ID: 1789325 · Report as offensive
Profile Rune Bjørge

Send message
Joined: 5 Feb 00
Posts: 45
Credit: 30,508,204
RAC: 5
Norway
Message 1789450 - Posted: 22 May 2016, 10:22:09 UTC

Had some TDR (Timeout Detection and Recovery) related crashes with the MB v8.12 WUs.
The cruncher with the problem was http://setiathome.berkeley.edu/show_host_detail.php?hostid=7968069

Solved the crash issue by setting a longer timeout in the registry.
Link to information about TDR: https://msdn.microsoft.com/en-us/library/windows/hardware/ff570087(v=vs.85).aspx

This might help out if you experience driver crashes.
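
For reference, the timeout the linked MSDN page describes is controlled by a couple of DWORD values under the GraphicsDrivers key. A minimal sketch of a .reg file (the 10-second values here are only an example - pick your own, and a reboot is needed before they take effect):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; TdrDelay: seconds the GPU may stay unresponsive before Windows triggers a TDR reset (default 2)
"TdrDelay"=dword:0000000a
; TdrDdiDelay: seconds the OS allows threads to leave the driver before a TDR is declared (default 5)
"TdrDdiDelay"=dword:0000000a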
ID: 1789450 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1789451 - Posted: 22 May 2016, 11:05:16 UTC - in response to Message 1789450.  

Also, one can use the -period_iterations_num 100 parameter in the tuning line.
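
In case it helps anyone reading along: that switch goes in the OpenCL MultiBeam app's command-line .txt file in the project directory (the MB "commandline" file mentioned elsewhere in this thread; the exact filename depends on the installed build, so treat it as a placeholder). As I understand it, the parameter splits the long PulseFind kernel calls into more, shorter launches, which is what relieves driver timeouts and screen lag. A line such as

-period_iterations_num 100

is all that is needed; larger values split the work further at a small cost in throughput.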
ID: 1789451 · Report as offensive
Profile Questor Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1789991 - Posted: 24 May 2016, 9:25:16 UTC
Last modified: 24 May 2016, 9:38:32 UTC

I have been setting -instances_per_device 3 -cpu_lock -hp in the MB command line.

When I received some AP tasks yesterday I noticed one running task was stalled with zero CPU activity. (Pausing all other GPU tasks allowed the AP to start processing again)

The AP command line also has -instances_per_device 3 -cpu_lock -hp

The AP readme file says: "-instances_per_device N: Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances)."

However, the AP and MB apps are setting affinity independently of each other.

MB tasks were on CPU0 and CPU1
and the AP task was on CPU0 instead of CPU2

I hadn't previously had the command-line parameters set, so I don't know whether this is correct behaviour or not.
GPU Users Group



ID: 1789991 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1790032 - Posted: 24 May 2016, 13:06:05 UTC

Add -total_GPU_instances_num 3 to your MB commandline.txt file and remove -hp from both.


With each crime and every kindness we birth our future.
ID: 1790032 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1790037 - Posted: 24 May 2016, 13:43:53 UTC - in response to Message 1789991.  


However, the AP and MB apps are setting affinity independently of each other.

MB tasks were on CPU0 and CPU1
and AP task was on CPU0 instead of CPU2

I hadn't previously had the command-line parameters set, so I don't know whether this is correct behaviour or not.


If, after adding -total_GPU_instances_num 3, AP and MB still share the same CPU (both having -cpu_lock and -instances_per_device 3), please report again.
ID: 1790037 · Report as offensive
Profile Questor Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1790039 - Posted: 24 May 2016, 13:56:58 UTC - in response to Message 1790037.  
Last modified: 24 May 2016, 14:06:02 UTC


However, the AP and MB apps are setting affinity independently of each other.

MB tasks were on CPU0 and CPU1
and AP task was on CPU0 instead of CPU2

I hadn't previously had the command-line parameters set, so I don't know whether this is correct behaviour or not.


If, after adding -total_GPU_instances_num 3, AP and MB still share the same CPU (both having -cpu_lock and -instances_per_device 3), please report again.


I have that set in both the AP and MB command-line files now (although I could not see that the AP README referred to it).

It is still :

MB tasks are on CPU0 and CPU1
and AP task is on CPU0 instead of CPU2


Edit: also tried with it just in the MB file and without -hp: no difference.

I only have one GPU running three tasks.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7973566
GPU Users Group



ID: 1790039 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1790042 - Posted: 24 May 2016, 14:41:02 UTC - in response to Message 1790039.  
Last modified: 24 May 2016, 14:43:13 UTC

Thanks for the report. The AP binary needs a rebuild to include that option.

P.S. Also, could you run a tool like Process Explorer and list the named mutexes created for both the AP and MB processes in your case?
ID: 1790042 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1790070 - Posted: 24 May 2016, 21:37:28 UTC - in response to Message 1790039.  


However, the AP and MB apps are setting affinity independently of each other.

MB tasks were on CPU0 and CPU1
and AP task was on CPU0 instead of CPU2

I hadn't previously had the command-line parameters set, so I don't know whether this is correct behaviour or not.


If, after adding -total_GPU_instances_num 3, AP and MB still share the same CPU (both having -cpu_lock and -instances_per_device 3), please report again.


I have that set in both the AP and MB command-line files now (although I could not see that the AP README referred to it).

It is still :

MB tasks are on CPU0 and CPU1
and AP task is on CPU0 instead of CPU2


Edit: also tried with it just in the MB file and without -hp: no difference.

I only have one GPU running three tasks.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7973566


What happens if you remove -cpu_lock?
ID: 1790070 · Report as offensive
Profile Questor Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1790089 - Posted: 24 May 2016, 22:35:07 UTC - in response to Message 1790042.  

Thanks for report. AP binary needs rebuild to include that option.

P.S. Also, could you run tool like ProcessExplorer and list named MutExes that created both for AP and MB processes in your case?


Hope this is what you're after :-
SOG1, SOG2 and AP

For SOG1
Mutant \BaseNamedObjects\SETI_GPU_App_Slot8_Mutex
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(005692)
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(005692)
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot0_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot1_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot1_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot0_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot2_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot3_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot4_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot5_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot6_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot7_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot2_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot9_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot10_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot11_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot12_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot13_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot14_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot15_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot16_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot17_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot18_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot19_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot20_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot21_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot22_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot23_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot24_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot25_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot26_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot27_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot28_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot29_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot30_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot31_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot32_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot33_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot34_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot35_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot36_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot37_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot38_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot39_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot40_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot41_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot42_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot43_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot44_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot45_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot46_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot47_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot48_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot49_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot50_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot51_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot52_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot53_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot54_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot55_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot56_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot57_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot58_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot59_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot60_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot61_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot62_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot63_Mutex



For SOG2
Mutant \BaseNamedObjects\SETI_GPU_App_Slot0_Mutex
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(003624)
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(003624)
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot0_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot1_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot8_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot1_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot2_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot2_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot3_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot4_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot5_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot6_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot7_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot9_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot10_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot11_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot12_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot13_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot14_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot15_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot16_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot17_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot18_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot19_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot20_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot21_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot22_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot23_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot24_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot25_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot26_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot27_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot28_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot29_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot30_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot31_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot32_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot33_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot34_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot35_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot36_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot37_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot38_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot39_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot40_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot41_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot42_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot43_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot44_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot45_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot46_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot47_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot48_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot49_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot50_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot51_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot52_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot53_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot54_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot55_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot56_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot57_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot58_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot59_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot60_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot61_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot62_Mutex
Mutant \BaseNamedObjects\SETI_GPU_App_Slot63_Mutex


For AP
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot0_Mutex
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(011408)
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(011408)
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot2_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot1_Mutex
GPU Users Group



ID: 1790089 · Report as offensive
Profile Questor Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1790096 - Posted: 24 May 2016, 22:52:59 UTC

Just after I posted this, the system went down - I didn't know my post had got through.

While the system was down I ran some further tests and I think I've got to the bottom of it now.

I removed the -total_GPU_instances_num 3 from the MB command line file and the affinity assignment is now working OK.

I forced switching of tasks to include mixes of AP/MB and affinity works every time without -total_GPU_instances_num N in there.

For me putting that back in there reintroduces the problem on two different machines. Apologies, I didn't make it clear that I had -total_GPU_instances_num N in my command line to begin with.

I see there is a completely different set of mutexes when -total_GPU_instances_num is present. I assume the extras are used when -total_GPU_instances_num is present, and that this is therefore incompatible with the AP version, which currently doesn't support it.

Current mutex entries :-

SOG1
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(010856)
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(010856)
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot1_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot0_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot2_Mutex

SOG2
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(010224)
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(010224)
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot0_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot1_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot2_Mutex

AP
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(005952)
Mutant \Sessions\3\BaseNamedObjects\LoggerMutex(005952)
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot1_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot0_Mutex
Mutant \BaseNamedObjects\SETI_NV_GPU_App_Slot2_Mutex

So there is no problem for me with only one GPU, but anyone using multiple GPUs should avoid using -cpu_lock if running AP and MB, especially if using -hp, as two or more tasks will be bound to the same CPU thread depending on the number of GPUs in use and the mix of AP and MB.

Zalster: If I remove -cpu_lock and -hp then task assignment is as normal, just at the whim of the Windows multi-tasking management, and affinity switches around across all threads. With my setup, however, -cpu_lock works better.
GPU Users Group



ID: 1790096 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1790124 - Posted: 25 May 2016, 0:04:20 UTC - in response to Message 1790096.  


So there is no problem for me with only one GPU, but anyone using multiple GPUs should avoid using -cpu_lock if running AP and MB, especially if using -hp, as two or more tasks will be bound to the same CPU thread depending on the number of GPUs in use and the mix of AP and MB.

Thanks for investigating the issue. But from your report I don't see where your conclusion about -cpu_lock comes from.
It seems -total_GPU_instances_num caused the incompatibility, not -cpu_lock.
Please make this clearer.
ID: 1790124 · Report as offensive
Profile Questor Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1790225 - Posted: 25 May 2016, 9:11:56 UTC - in response to Message 1790124.  
Last modified: 25 May 2016, 9:22:11 UTC


So there is no problem for me with only one GPU, but anyone using multiple GPUs should avoid using -cpu_lock if running AP and MB, especially if using -hp, as two or more tasks will be bound to the same CPU thread depending on the number of GPUs in use and the mix of AP and MB.

Thanks for investigating the issue. But from your report I don't see where your conclusion about -cpu_lock comes from.
It seems -total_GPU_instances_num caused the incompatibility, not -cpu_lock.
Please make this clearer.


The problem only occurs under a specific set of circumstances.

What I mean is that -total_GPU_instances_num caused the issue, but it would not have been an issue if I had not been using -cpu_lock, because then the apps would not have tried to set the affinity; even with -total_GPU_instances_num in the command line but no -cpu_lock, the affinity would have been random across any of the 8 possible threads.

In other words, with just -total_GPU_instances_num in the MB command line it would not have been used, because -cpu_lock was not there. Both apps would then simply be assigned a thread at random.

However, with -cpu_lock they are forced onto the same thread by the AP and MB apps, because they are not using the same mutexes. They cannot move, because they only have one thread assigned. With -hp this may be worse, because each app then has heightened priority. When running AP and MB, for some reason the effect is that the AP task ends up with 0% and the MB task with 100% of the thread. (I assume each app starts at slot 0 and works up as it finds a thread is already in use via the mutex.)

However, if I replicate this by manually setting the affinity of two simultaneous MB GPU tasks in Task Manager, they both sit on the same thread with 50% each (i.e. what you would expect), not 0% and 100%.

i.e.

AP:  thread 0 only affinity - 0% util of thread
MB:  thread 0 only affinity - 100% util of thread

MB1: thread 0 only affinity - 50% util of thread
MB2: thread 0 only affinity - 50% util of thread

As the AP and MB apps use pretty much 100% of a thread, it seems unlikely that Windows would assign them to the same thread, so without -cpu_lock in place this would never normally occur.

However, if you remove -total_GPU_instances_num from the MB command line, both apps use the same mutexes and the conflict never occurs; I assume that if the AP app supported -total_GPU_instances_num then the conflict would not occur either.

This is with an Nvidia GTX 970 - I don't know if it applies to ATI, or whether the SoG app behaves differently there.
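
To make the mechanism above a bit more concrete, here is a rough sketch (not the actual app source - the slot scan, the one-slot-per-logical-CPU mapping and the error handling are my assumptions; only the mutex names come from the Process Explorer dumps above) of how a per-slot named-mutex CPU lock of this kind could work. The point is that two builds scanning different name prefixes (SETI_GPU_App_Slot... vs SETI_NV_GPU_App_Slot...) never see each other's claims, so both can end up pinned to CPU 0:

#include <windows.h>
#include <stdio.h>

#define MAX_SLOTS 64   /* matches SETI_GPU_App_Slot0..Slot63 in the dumps above */

/* Scan slots from 0 upward, claim the first free one via a named mutex,
   and pin the whole process to the matching logical CPU. */
static int claim_cpu_slot(const char *prefix)   /* e.g. "SETI_NV_GPU_App" (assumed) */
{
    char name[128];
    for (int slot = 0; slot < MAX_SLOTS; slot++) {
        snprintf(name, sizeof(name), "%s_Slot%d_Mutex", prefix, slot);
        HANDLE h = CreateMutexA(NULL, TRUE, name);        /* try to own it right away */
        if (h != NULL && GetLastError() != ERROR_ALREADY_EXISTS) {
            /* Slot is free: bind this process to the matching logical CPU. */
            SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)1 << slot);
            return slot;    /* keep the handle open so the slot stays claimed */
        }
        if (h != NULL) CloseHandle(h);                    /* slot taken, try the next */
    }
    return -1;    /* nothing free: leave affinity to Windows */
}

With both AP and MB scanning the same namespace (the situation without -total_GPU_instances_num), the second app to start finds Slot0 taken and moves on to Slot1 or Slot2, which matches the working behaviour described above.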
GPU Users Group



ID: 1790225 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1790238 - Posted: 25 May 2016, 10:47:16 UTC - in response to Message 1790225.  

Thank you for the report.
Indeed, AP is a little outdated.
For correct operation with GPUs from the same vendor, the -total_GPU_instances_num N key should be omitted when an AP + MB mix is used and -cpu_lock is enabled.

For a multi-vendor GPU host with a mix of AP/MB, correct CPU-lock operation is unsupported. This will be fixed in the next AP rebuild.
ID: 1790238 · Report as offensive
Profile Questor Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1790316 - Posted: 25 May 2016, 18:03:19 UTC - in response to Message 1790238.  
Last modified: 25 May 2016, 18:16:22 UTC

Thank you for the report.
Indeed, AP is a little outdated.
For correct operation with GPUs from the same vendor, the -total_GPU_instances_num N key should be omitted when an AP + MB mix is used and -cpu_lock is enabled.

For a multi-vendor GPU host with a mix of AP/MB, correct CPU-lock operation is unsupported. This will be fixed in the next AP rebuild.


Two cases.

Yes, there is a problem anyway for multiple GPUs, where you need to use -total_GPU_instances_num, because the AP app has no support for -total_GPU_instances_num.

However, there is also my case:

Where a single GPU is in use, do not put -total_GPU_instances_num N in the command line when processing MB & AP (it is not needed anyway for a single GPU), otherwise it breaks what does work; just specify -instances_per_device N with -hp -cpu_lock.
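
As a concrete example of that working single-GPU setup (the exact command-line filenames depend on the installed build, so treat them as placeholders), both the MB and the AP command-line files would then contain only:

-instances_per_device 3 -cpu_lock -hp

with -total_GPU_instances_num N left out of both.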
GPU Users Group



ID: 1790316 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1791640 - Posted: 29 May 2016, 6:41:05 UTC - in response to Message 1789201.  

no difference whether 1 or 4 monitors...

Then it seems only the driver difference from the system that works without lags remains relevant. It would be worth trying a downgrade from a 36x.xx to a 35x.xx driver.

Did it help with the lags?
ID: 1791640 · Report as offensive
Profile Questor Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1791792 - Posted: 29 May 2016, 19:40:58 UTC
Last modified: 29 May 2016, 19:51:31 UTC

What are the units for -use_sleep_ex N and the default value of -use_sleep, please?

I have one machine which suffers from very bad screen lag which I have been largely ignoring up to now while sorting out my other machines.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=6985897

The following is with a GTX570 running one task only on GPU.

With no command line the screen lag is very bad.

However, -use_sleep cannot be used with non-GUPPI tasks.


With non-GUPPI tasks.
--------------------

With cmdline = "-use_sleep" only, intially CPU activity on the SoG app is at 100% of a thread and the GPU RAM rises (current task is 302MB) but CPU quickly reduces to zero.
MSI Afterburner shows GPU Usage is 0%, BUS usage is 0% but GPU memory does not decrease.

Elapsed time continually increases but time remaing stays the same i.e. the task will never end.

I have tried -use_sleep_ex with various test values without much luck.

Suspending the task in this state or stopping BOINC and there is a long delay before the app closes down.



With GUPPI tasks.
-----------------

With cmdline = "-use_sleep" works OK.

GPU usage oscillates between 0 and 30% and FB usage between 0 and 20%. GPU memory at about 300MB

Elapsed and remaining work as normal.

With -use_sleep removed, screen lag is bad, but GPU usage is at 98% and FB usage 37-70%.

I was previously using the 353.62 driver and upgraded to 365.19 to see if it would help - it didn't.



Are there any other command line settings which might help to avoid using -use_sleep? Just using -period_iterations_num 300 does not help screen lag.
GPU Users Group



ID: 1791792 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1791822 - Posted: 29 May 2016, 20:58:54 UTC - in response to Message 1791792.  
Last modified: 29 May 2016, 21:16:13 UTC

Please post links to results that used -period_iterations_num 300 and those that used -use_sleep.

Also try -period_iterations_num 500 -sbs 512.

P.S. As a debugging measure, please try to run -use_sleep -v 6 on a non-GUPPI task and send me the stderr.txt from the slot directory.

The default value for -use_sleep's Sleep is 1. -use_sleep_ex values can range from 0 to any reasonable positive integer (sleep time in ms).
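
For anyone wondering what that Sleep granularity actually changes, here is an illustration only (not the app's real source; the polling scheme is my assumption, and only the millisecond Sleep values come from the description above) of the difference between busy-waiting on an OpenCL kernel and sleeping between status polls:

#include <windows.h>
#include <CL/cl.h>

/* Wait for a queued kernel to finish. With sleep_ms == 0 the thread only
   yields its timeslice; with sleep_ms >= 1 (the -use_sleep default of 1 ms,
   or the -use_sleep_ex N value) the CPU core sits essentially idle while
   the GPU works, at the price of some extra latency per kernel. */
static void wait_for_kernel(cl_event ev, DWORD sleep_ms)
{
    cl_int status = CL_QUEUED;
    while (status > CL_COMPLETE) {        /* CL_COMPLETE is 0; errors are negative */
        clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
                       sizeof(status), &status, NULL);
        if (status > CL_COMPLETE)
            Sleep(sleep_ms);              /* instead of spinning at 100% CPU */
    }
}

Without the sleep (a blocking wait or a tight polling loop), the host thread burns a full core while waiting, which would match the 100%-of-a-thread behaviour reported without -use_sleep.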
ID: 1791822 · Report as offensive