Issues with 1 GPU on Penta Nano System

Message boards : Number crunching : Issues with 1 GPU on Penta Nano System
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826461 - Posted: 24 Oct 2016, 11:55:13 UTC
Last modified: 24 Oct 2016, 12:32:13 UTC

My main cruncher 7953787 has been running with 5 R9 Nano GPUs for a while with no issues, but today I discovered that one of the GPUs is taking much longer to process tasks than the others. Typical GUPPI processing time has been ~5 min, but this card has recently been taking up to 25 minutes. If I look at GPU loading, this one is mostly at zero with sporadic spikes of load. The other 4 cards are fine. It is still producing valid results. Any ideas?
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1826461 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826463 - Posted: 24 Oct 2016, 12:12:08 UTC - in response to Message 1826461.  
Last modified: 24 Oct 2016, 12:16:18 UTC

It seems all the ATi results that took >1 ks (over 1000 s) had:
2 slot of 64 used for this instance; total_GPU_instances_num=5
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 10; system mask is ff

Check what else runs on that CPU.
EDIT: In light of recent findings, I would treat that 8-thread FX as having only 4 cores, so it can sustain only 4 GPU devices without resources overlapping.
EDIT2: That doesn't explain why only a single affinity mask is affected, though (there should be 2 of them).
Check all the longer-than-usual results: do they all have an affinity mask of 10?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826463 · Report as offensive
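For readers following the log lines quoted in the post above, here is a minimal sketch of how such a mask can be read. It assumes the app prints the mask in hexadecimal (the system mask `ff` on an 8-thread CPU suggests this, but the thread does not confirm it) and that bit 0 stands for logical CPU 0. The helper name `cpus_in_mask` is made up for illustration.

```python
def cpus_in_mask(mask_text: str) -> list[int]:
    """Return the zero-based logical CPU indices whose bits are set.

    mask_text is the mask as printed in the task log, ASSUMED to be hexadecimal.
    """
    mask = int(mask_text, 16)
    # Walk every bit position up to the highest set bit and keep the set ones.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# The three mask values that appear in this thread's log excerpts.
for text in ("10", "ff", "0"):
    print(f"mask {text}: logical CPUs {cpus_in_mask(text)}")
```

Under that assumption, mask `10` means the task is pinned to the single logical CPU 4 rather than to a CPU numbered 10, and mask `0` selects no CPU at all.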
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1826464 - Posted: 24 Oct 2016, 12:13:09 UTC

I posted this a few times recently, Rick.
Your FX CPU only has 4 FPUs, so when you're running 5 GPU tasks your CPU is running low on resources.
As soon as the CPU has no resources left, the app stalls until resources are available again.

The only solution is either to reduce the number of GPU tasks or to use -use_sleep, but this will reduce output as well.


With each crime and every kindness we birth our future.
ID: 1826464 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826465 - Posted: 24 Oct 2016, 12:19:35 UTC - in response to Message 1826461.  

And please re-enable normal output (verbose=1). The lack of counter output reduces the chances of debugging this.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826465 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826467 - Posted: 24 Oct 2016, 12:29:04 UTC

And here is the bug:
BOINC assigns device 2
4 slot of 64 used for this instance; total_GPU_instances_num=5
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 0; system mask is ff
With such a mask no CPUs are allowed at all.
Perhaps the Windows API treats such a mask as if it were 0xff instead, allowing ALL CPUs.
Please check the affinity of all active tasks on that host: is each of them pinned to a single CPU, or do some have several CPUs available?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826467 · Report as offensive
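One way a mask of 0 could arise, shown as a sketch only. The rule below is hypothetical and is not taken from the app's source: it puts one bit per slot on every second logical CPU and trims the result to the system mask. It is used here only because it happens to reproduce both logged values (slot 2 gives `10`, slot 4 gives `0` on an 8-thread CPU). `hypothetical_mask` and `mask_with_fallback` are illustrative names.

```python
def hypothetical_mask(slot: int, logical_cpus: int = 8, stride: int = 2) -> int:
    """HYPOTHETICAL pinning rule: slot N gets logical CPU N * stride.

    This is an assumption for illustration, not the app's actual code.
    """
    system_mask = (1 << logical_cpus) - 1        # 0xff for 8 logical CPUs
    return (1 << (slot * stride)) & system_mask  # bits past the last CPU are lost


def mask_with_fallback(slot: int, logical_cpus: int = 8, stride: int = 2) -> int:
    """One possible guard: never hand out an empty mask, use all CPUs instead."""
    system_mask = (1 << logical_cpus) - 1
    return hypothetical_mask(slot, logical_cpus, stride) or system_mask


for slot in range(5):
    print(f"slot {slot}: mask {hypothetical_mask(slot):x}, "
          f"with fallback {mask_with_fallback(slot):x}")
```

Under this rule only slots 0 to 3 fit on 8 logical CPUs, so a fifth GPU task overflows the mask. The fallback lets an overflowing slot run on all CPUs rather than none.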
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826468 - Posted: 24 Oct 2016, 12:29:12 UTC - in response to Message 1826464.  

I posted this a few times recently, Rick.
Your FX CPU only has 4 FPUs, so when you're running 5 GPU tasks your CPU is running low on resources.
As soon as the CPU has no resources left, the app stalls until resources are available again.

The only solution is either to reduce the number of GPU tasks or to use -use_sleep, but this will reduce output as well.


Hi Mike, after seeing your posts on it, that was my first thought. I typically run with 1.4 cores per task, leaving 2 cores to run CPU tasks, so I increased it to 1.6 so that no CPU tasks were running, and it made no difference. Also, I looked through my valid tasks and this seems to have been happening for the last couple of days. But that is only from manually viewing valid tasks, so it might not be accurate. Can't wait to upgrade this to Zen to eliminate that concern.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1826468 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826469 - Posted: 24 Oct 2016, 12:31:04 UTC - in response to Message 1826465.  

And please re-enable normal output (verbose=1). The lack of counter output reduces the chances of debugging this.

I just made the change.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1826469 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826471 - Posted: 24 Oct 2016, 12:35:41 UTC - in response to Message 1826467.  

And here is the bug:
BOINC assigns device 2
4 slot of 64 used for this instance; total_GPU_instances_num=5
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 0; system mask is ff
With such a mask no CPUs are allowed at all.
Perhaps the Windows API treats such a mask as if it were 0xff instead, allowing ALL CPUs.
Please check the affinity of all active tasks on that host: is each of them pinned to a single CPU, or do some have several CPUs available?


Each MB process appears to switch between multiple CPU cores. Core 10 is only being used by MB processes and switches between the processes. Sometimes multiple MB processes are using core 10.
ID: 1826471 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826473 - Posted: 24 Oct 2016, 12:45:41 UTC - in response to Message 1826471.  

Better post screenshots of affinity tab for active GPU tasks.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826473 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1826474 - Posted: 24 Oct 2016, 12:47:45 UTC - in response to Message 1826467.  
Last modified: 24 Oct 2016, 12:49:17 UTC

And here is the bug:
BOINC assigns device 2
4 slot of 64 used for this instance; total_GPU_instances_num=5
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 0; system mask is ff
With such a mask no CPUs are allowed at all.
Perhaps the Windows API treats such a mask as if it were 0xff instead, allowing ALL CPUs.
Please check the affinity of all active tasks on that host: is each of them pinned to a single CPU, or do some have several CPUs available?


I found that running with -hp enabled isn't a good idea with SOG builds on FX CPUs.
I have checked some tasks and a few have more CPU time than run time.
That's a clear indication of a lack of CPU resources.
I'm not sure cc_config is a good way to reduce CPU usage on FXs.
I reduce it via BOINC Manager; this works perfectly.


With each crime and every kindness we birth our future.
ID: 1826474 · Report as offensive
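The comparison described in the post above (CPU time greater than run time) can be sketched as a simple filter. The `TaskTimes` record, the function name and the sample numbers are all made up for illustration; they are not taken from the host's real task list.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    name: str
    run_time_s: float   # elapsed (wall-clock) time reported for the task
    cpu_time_s: float   # CPU time reported for the task


def cpu_exceeds_run(tasks: list[TaskTimes]) -> list[TaskTimes]:
    """Return the tasks whose reported CPU time is larger than their run time."""
    return [t for t in tasks if t.cpu_time_s > t.run_time_s]


# Illustrative values only, NOT real results from this host.
sample = [
    TaskTimes("task_a", run_time_s=300.0, cpu_time_s=280.0),
    TaskTimes("task_b", run_time_s=1500.0, cpu_time_s=1620.0),
]

for t in cpu_exceeds_run(sample):
    print(f"{t.name}: CPU {t.cpu_time_s:.0f} s > run {t.run_time_s:.0f} s")
```

The run time and CPU time values would have to be copied from the task pages by hand or by whatever export is available; the sketch only covers the comparison itself.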
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826479 - Posted: 24 Oct 2016, 13:02:26 UTC - in response to Message 1826473.  

Better post screenshots of affinity tab for active GPU tasks.

I took a screen recording of it running, but need to wait for another video I am processing to finish. In the meantime, I will try running with -no_cpu_lock and see what happens.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1826479 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826484 - Posted: 24 Oct 2016, 13:38:06 UTC - in response to Message 1826473.  

Better post screenshots of affinity tab for active GPU tasks.


Here is the system running an Arecibo task: https://flic.kr/p/MyuFKB

Here it is running a GUPPI: https://flic.kr/p/NoJDLi
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1826484 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826488 - Posted: 24 Oct 2016, 14:02:50 UTC - in response to Message 1826484.  

No. What I need is:
http://clip2net.com/s/3DCFYVb
for all 5 tasks.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826488 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826493 - Posted: 24 Oct 2016, 14:29:04 UTC - in response to Message 1826488.  

No. What I need is:
http://clip2net.com/s/3DCFYVb
for all 5 tasks.


The background looks like task manager, but I am not sure what the dialog box in the foreground is. How do I display that information?
ID: 1826493 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1826494 - Posted: 24 Oct 2016, 14:30:11 UTC - in response to Message 1826479.  

Better post screenshots of affinity tab for active GPU tasks.

I took a screen recording of it running, but need to wait for another video I am processing to finish. In the meantime, I will try running with -no_cpu_lock and see what happens.
Looks like -no_cpu_lock made no difference.
ID: 1826494 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826518 - Posted: 24 Oct 2016, 16:31:52 UTC - in response to Message 1826493.  

No. What I need is:
http://clip2net.com/s/3DCFYVb
for all 5 tasks.


The background looks like task manager, but I am not sure what the dialog box in the foreground is. How do I display that information?

Task manager -> Right click on process line-> Set affinity...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826518 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1826523 - Posted: 24 Oct 2016, 16:50:34 UTC - in response to Message 1826493.  

No. What I need is:
http://clip2net.com/s/3DCFYVb
for all 5 tasks.


The background looks like task manager, but I am not sure what the dialog box in the foreground is. How do I display that information?

He needs "Processor Affinity" (some programs show it as "CPU Affinity" or just "Affinity")
Some show cores as 0 1 2 3 ... and some 1 2 3 4 ...

My screenshots from:

Process Lasso
https://bitsum.com/

System Explorer
http://systemexplorer.net/








I used this site for the above pictures (which are saved as PNG to avoid any quality loss):
https://postimage.org/
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1826523 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826528 - Posted: 24 Oct 2016, 17:16:15 UTC - in response to Message 1826523.  

Thanks, BilBg. These tools are indeed better suited for showing what I wanted to see.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826528 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1826533 - Posted: 24 Oct 2016, 17:32:21 UTC

It works with taskmanager also.




With each crime and every kindness we birth our future.
ID: 1826533 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1826544 - Posted: 24 Oct 2016, 18:18:50 UTC - in response to Message 1826533.  

It works with taskmanager also.

Sure, but only a single task per screenshot. And we need to see all 5 of them.
I'm afraid the 5th task will not be pinned. We will see. Unfortunately, such mighty hosts ignore beta testing :/ and the change in affinity handling was made between v8.12 and v8.19...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1826544 · Report as offensive
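As an alternative to one screenshot per task, the affinity of all matching processes could be listed in one go. This is a rough sketch, assuming Python with the third-party `psutil` package is installed on the host; the name fragment passed to `list_affinities` is a placeholder, since the thread does not give the app's executable name.

```python
def format_affinity(name: str, pid: int, cpus: list[int], logical_cpus: int) -> str:
    """Render one process as a row of X (allowed CPU) and . (not allowed)."""
    row = "".join("X" if i in cpus else "." for i in range(logical_cpus))
    return f"{name} (pid {pid}): {row}"


def list_affinities(name_fragment: str) -> list[str]:
    """Return one formatted line per running process whose name contains name_fragment."""
    import psutil  # third-party package, imported here so the formatter works without it

    total = psutil.cpu_count(logical=True) or 0
    lines = []
    for proc in psutil.process_iter(attrs=["pid", "name"]):
        name = proc.info["name"] or ""
        if name_fragment.lower() not in name.lower():
            continue
        try:
            cpus = proc.cpu_affinity()  # list of logical CPU indices the process may use
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue  # skip processes that vanished or cannot be inspected
        lines.append(format_affinity(name, proc.info["pid"], cpus, total))
    return lines
```

Printing the result of `list_affinities("MB")` (with "MB" replaced by part of the real executable name) would give one line per GPU task, so all 5 can be compared at a glance. A task pinned to a single CPU shows exactly one X.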


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.