BOINC assigns device X - Problem

Message boards : Number crunching : BOINC assigns device X - Problem

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1814579 - Posted: 2 Sep 2016, 8:16:44 UTC - in response to Message 1814578.  

Unfortunately, in the current implementation this requires -total_GPU_instances N to be set. So it is not for the default/stock config.

Is there any way to query BOINC for how many GPU-based tasks are in flight?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1814579
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1814583 - Posted: 2 Sep 2016, 8:31:42 UTC - in response to Message 1814579.  

Unfortunately, in the current implementation this requires -total_GPU_instances N to be set. So it is not for the default/stock config.

Is there any way to query BOINC for how many GPU-based tasks are in flight?

My initial answer would be "probably not", but I think that's a good question we could usefully brainstorm around the other developers - both for SETI, and for other projects.

What would you like to know?

1) Other SETI tasks running on this GPU
2) Other SETI tasks running on different GPUs
3) Other project tasks running on different GPUs
4) Other project tasks running on the same GPU as SETI

(I'm usually in mode (3) in that list)

Does it make any difference if the SETI tasks are MB or Astropulse?
Does it make any difference if the other project tasks are CUDA or OpenCL?
ID: 1814583
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1814584 - Posted: 2 Sep 2016, 8:38:59 UTC - in response to Message 1814583.  
Last modified: 2 Sep 2016, 8:40:52 UTC

Well, for the particular task of managing CPU affinity I would need to know the number of SETI OpenCL GPU tasks running on a particular host.
Because other projects don't implement the same affinity management, their number of GPU tasks in flight will not help. But even knowing the total number of GPU tasks in flight across all projects would help enough, because most often BOINC schedules tasks from the same project.
ID: 1814584
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1814585 - Posted: 2 Sep 2016, 8:40:36 UTC - in response to Message 1814583.  

The nearest I know is this, but it doesn't specifically identify GPU tasks. I think we could reasonably ask for device information to be added.

D:\BOINC>boinccmd --get_simple_gui_info

[big snip]
======== Tasks ========
1) -----------
   name: e9s19_e6s9p0f497-GIANNI_A2A-0-1-RND9600_0
   WU name: e9s19_e6s9p0f497-GIANNI_A2A-0-1-RND9600
   project URL: http://www.gpugrid.net/
   report deadline: Tue Sep 06 20:58:31 2016
   ready to report: no
   got server ack: no
   final CPU time: 0.000000
   state: downloaded
   scheduler state: scheduled
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: EXECUTING
   app version num: 848
   checkpoint CPU time: 6217.638000
   current CPU time: 6219.962000
   fraction done: 0.649856
   swap size: 370 MB
   working set size: 331 MB
   estimated CPU time remaining: 35662.333506
2) -----------
   name: wu_sf3_DS-11x271_Grp350195of614400_0
   WU name: wu_sf3_DS-11x271_Grp350195of614400
   project URL: http://numberfields.asu.edu/NumberFields/
   report deadline: Thu Sep 08 23:46:12 2016
   ready to report: no
   got server ack: no
   final CPU time: 0.000000
   state: downloaded
   scheduler state: scheduled
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: EXECUTING
   app version num: 212
   checkpoint CPU time: 19524.690000
   current CPU time: 19555.110000
   fraction done: 0.550761
   swap size: 290 MB
   working set size: 8 MB
   estimated CPU time remaining: 14101.334770
3) -----------
   name: PM0155_03481_14_0
   WU name: PM0155_03481_14
   project URL: http://einstein.phys.uwm.edu/
   report deadline: Thu Sep 15 23:44:11 2016
   ready to report: no
   got server ack: no
   final CPU time: 0.000000
   state: downloaded
   scheduler state: scheduled
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: EXECUTING
   app version num: 152
   checkpoint CPU time: 119.996000
   current CPU time: 120.183200
   fraction done: 0.189805
   swap size: 350 MB
   working set size: 256 MB
   estimated CPU time remaining: 24732.876829
4) -----------
   name: wu_sf3_DS-11x271_Grp362860of614400_0
   WU name: wu_sf3_DS-11x271_Grp362860of614400
   project URL: http://numberfields.asu.edu/NumberFields/
   report deadline: Fri Sep 09 02:37:39 2016
   ready to report: no
   got server ack: no
   final CPU time: 0.000000
   state: downloaded
   scheduler state: scheduled
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: EXECUTING
   app version num: 212
   checkpoint CPU time: 4546.540000
   current CPU time: 4601.671000
   fraction done: 0.512654
   swap size: 290 MB
   working set size: 8 MB
   estimated CPU time remaining: 15297.463789
5) -----------
   name: 11ja09ab.19590.25021.11.38.104_0
   WU name: 11ja09ab.19590.25021.11.38.104
   project URL: http://setiathome.berkeley.edu/
   report deadline: Mon Oct 24 05:21:30 2016
   ready to report: no
   got server ack: no
   final CPU time: 0.000000
   state: downloaded
   scheduler state: scheduled
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: EXECUTING
   app version num: 800
   checkpoint CPU time: 531.339400
   current CPU time: 550.683500
   fraction done: 0.806055
   swap size: 123 MB
   working set size: 106 MB
   estimated CPU time remaining: 214.210100
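
For anyone who does end up scraping this output, a minimal Python sketch of counting the running tasks per project could look like the following. It relies only on the `project URL:` and `active_task_state:` lines shown above. The function name `count_executing` and the trimmed `SAMPLE` text are made up for illustration, and, as noted, nothing in this output says whether a task is using a GPU.

```python
from collections import Counter

# Trimmed copy of the listing above: only the two fields the parser reads.
SAMPLE = """\
======== Tasks ========
1) -----------
   project URL: http://www.gpugrid.net/
   active_task_state: EXECUTING
2) -----------
   project URL: http://numberfields.asu.edu/NumberFields/
   active_task_state: EXECUTING
3) -----------
   project URL: http://einstein.phys.uwm.edu/
   active_task_state: EXECUTING
4) -----------
   project URL: http://numberfields.asu.edu/NumberFields/
   active_task_state: EXECUTING
5) -----------
   project URL: http://setiathome.berkeley.edu/
   active_task_state: EXECUTING
"""


def count_executing(report: str) -> Counter:
    """Count EXECUTING tasks per project URL in boinccmd task text."""
    counts = Counter()
    project = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.endswith("-----------"):
            project = None  # a new task record starts here
        elif line.startswith("project URL:"):
            project = line.split(":", 1)[1].strip()
        elif line.startswith("active_task_state:"):
            state = line.split(":", 1)[1].strip()
            if state == "EXECUTING" and project is not None:
                counts[project] += 1
    return counts


if __name__ == "__main__":
    for url, n in count_executing(SAMPLE).items():
        print(n, url)
```

On a live host the text could be captured by running `boinccmd --get_simple_gui_info` through `subprocess.run(..., capture_output=True, text=True)` and passing its `stdout` to the function.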
ID: 1814585
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1814586 - Posted: 2 Sep 2016, 8:42:46 UTC - in response to Message 1814585.  
Last modified: 2 Sep 2016, 8:45:42 UTC

And is there any API-based access to the same info? Parsing yet another text output is not exactly what I would like to do :)

LoL, only a single SETI task in the whole list? ;)
ID: 1814586
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1814588 - Posted: 2 Sep 2016, 8:51:13 UTC - in response to Message 1814586.  

And is there any API-based access to the same info? Parsing yet another text output is not exactly what I would like to do :)

I'll have to poke around for that. There will certainly be a GUI RPC version - that's how the manager displays running tasks, and that version includes GPU data. But you don't want to add a TCP/IP stack to your apps, do you? That was the whole point of moving the comms from the science application in SETI Classic, to middleware under BOINC.

Can't spend much time on this today - got to prepare for a long weekend journey, and I'm busy Monday and Tuesday next week as well. Back to normal by Wednesday, if all goes well.
ID: 1814588
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1814590 - Posted: 2 Sep 2016, 8:57:54 UTC - in response to Message 1814586.  

LoL, only a single SETI task in the whole list? ;)

There would be two, if running SoG two-up on the GTX 750 Ti didn't steal another CPU core from Numberfields :P

You would have seen five if I'd shown that during WOW! last week - single SoG on GTX 970, dual cuda50 on GTX 750 Ti, two AVX CPU. One CPU core reserved for daily-driver use (15 browser tabs and remote monitoring of the other six machines), and one CPU core surrendered to SoG. Einstein intel_gpu was running throughout.
ID: 1814590
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1814597 - Posted: 2 Sep 2016, 9:45:49 UTC - in response to Message 1814590.  
Last modified: 2 Sep 2016, 9:48:25 UTC

Check -use_sleep -high_prec_timer options with r3500+ builds. Some reports show improvement in this area.
ID: 1814597
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1814604 - Posted: 2 Sep 2016, 10:07:46 UTC - in response to Message 1814597.  

Check -use_sleep -high_prec_timer options with r3500+ builds. Some reports show improvement in this area.

I'll give that a try when I'm back in full circulation after Tuesday.
ID: 1814604
Jeff Buck Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1814667 - Posted: 2 Sep 2016, 16:47:44 UTC

This is not good. I just started to check my overnight SoG testing results, and found that all my Stderr output is almost completely filled with:
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
PU: icfft==144015
GPU: icfft=144015

CPU: icfft==144016
GPU: icfft=144016

CPU: icfft==144017
GPU: icfft=144017

CPU: icfft==144018
GPU: icfft=144018

CPU: icfft==144019
GPU: icfft=144019

CPU: icfft==144020
GPU: icfft=144020

... and on and on and on.

All the useful information, at least for me, has apparently been chopped off at the beginning due to the Stderr size limitation. Is there any way to shut this output off? I had hoped to try some other tuning parameters today, but will now have to revert to the earlier version of r3500 and wait for readable results with the current parameters.
ID: 1814667
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1814668 - Posted: 2 Sep 2016, 16:49:07 UTC - in response to Message 1814667.  

Yeah, I ran it for about an hour and got the same result, so I went back to the original SoG r3500.
ID: 1814668
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1814796 - Posted: 3 Sep 2016, 10:05:55 UTC

Is it r3525? Await the next one...
ID: 1814796
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1814797 - Posted: 3 Sep 2016, 10:12:11 UTC - in response to Message 1814796.  

Is it r3525? Await the next one...

Or is it just about the temporary version made to prove the concept of the fix? Should I rebuild 3525 or not?
ID: 1814797
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1814812 - Posted: 3 Sep 2016, 13:21:09 UTC - in response to Message 1814796.  

Is it r3525? Await the next one...


No, it was the revised r3500 to correct which device was showing in the stderr.

The last post here was before you released r3522.

Since then you have pulled it and r3522.

Have not yet tried v3525.
ID: 1814812
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1815154 - Posted: 5 Sep 2016, 11:24:14 UTC - in response to Message 1814584.  

Well, for the particular task of managing CPU affinity I would need to know the number of SETI OpenCL GPU tasks running on a particular host.

I think your apps already "talk" to each other (?) (which can be used to count them)
Otherwise, how does one instance know which CPUs are currently "in use" (per the CPU affinity masks of the other running instances) to select the "next free" CPU for its own CPU affinity mask?
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1815154
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815200 - Posted: 5 Sep 2016, 16:29:04 UTC - in response to Message 1815154.  

Well, for the particular task of managing CPU affinity I would need to know the number of SETI OpenCL GPU tasks running on a particular host.

I think your apps already "talk" to each other (?) (which can be used to count them)
Otherwise, how does one instance know which CPUs are currently "in use" (per the CPU affinity masks of the other running instances) to select the "next free" CPU for its own CPU affinity mask?

They share mutexes. I would like to know the maximum possible number of app instances in flight, not how many are running currently or which sequence ID the current one has (that is what the mutexes provide).

I could count the number of instances in flight (with additional code, but it should be possible, because some "copylefted" apps refuse to start if N instances are already running).
But this would make the logic harder.
Consider a host with 4 cores and 1 instance already running: where should the next one go?
Currently, with a total of 2 instances, it will go to CPU 2. But if the total were 3, 4 or more, it should go to CPU 1 (CPU 0 is busy with the first instance).

With counting I would need to fill the even CPUs, then switch to the odd ones, then switch to the even ones again. Doable, but not as easy as the current solution.
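
The four-core example can be reproduced with a small Python sketch. The spacing rule used here (stride = cores // total instances, with a minimum of 1) is only a guess that happens to match the CPU numbers quoted above; it is not the app's actual code, and `cpu_for_instance` and `affinity_mask` are hypothetical names.

```python
def cpu_for_instance(index: int, total_instances: int, ncpus: int) -> int:
    """Pick a CPU for the instance with 0-based sequence number `index`.

    Assumed rule: spread `total_instances` evenly over `ncpus` logical CPUs.
    """
    if ncpus < 1 or total_instances < 1 or index < 0:
        raise ValueError("need ncpus >= 1, total_instances >= 1, index >= 0")
    stride = max(1, ncpus // total_instances)
    return (index * stride) % ncpus


def affinity_mask(cpu: int) -> int:
    """Single-CPU affinity mask with only bit `cpu` set."""
    return 1 << cpu


if __name__ == "__main__":
    for total in (2, 3, 4):
        cpu = cpu_for_instance(1, total, 4)
        print(f"total={total}: second instance -> CPU {cpu}, mask {affinity_mask(cpu):#06b}")
```

The point of the sketch is that the placement of the second instance depends on the total, which is why the maximum number of instances has to be known up front.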
ID: 1815200
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1815220 - Posted: 5 Sep 2016, 18:21:15 UTC - in response to Message 1815200.  

I would like to know the maximum possible number of app instances in flight


Does APP_INIT_DATA::gpu_usage work?
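
If `gpu_usage` carries the fraction of one GPU that each task is scheduled to use (an assumption here; this thread does not confirm it), the maximum number of tasks per GPU follows by simple arithmetic. A Python sketch of just that arithmetic, with `max_tasks_per_gpu` as a hypothetical helper name; a real app would read the value through the BOINC API in C++:

```python
import math


def max_tasks_per_gpu(gpu_usage: float) -> int:
    """How many tasks fit on one GPU if each reserves `gpu_usage` of it."""
    if gpu_usage <= 0:
        raise ValueError("gpu_usage must be positive")
    # Small epsilon guards against 1/x landing just below a whole number.
    return max(1, math.floor(1.0 / gpu_usage + 1e-9))


if __name__ == "__main__":
    for usage in (1.0, 0.5, 0.33, 0.25):
        print(usage, "->", max_tasks_per_gpu(usage))
```

On a multi-GPU host the total would also depend on how many GPUs BOINC is using, which this value alone does not say.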
ID: 1815220
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815232 - Posted: 5 Sep 2016, 19:17:20 UTC - in response to Message 1815220.  

I would like to know the maximum possible number of app instances in flight


Does APP_INIT_DATA::gpu_usage work?

And what should it represent?
ID: 1815232
BilBg
Volunteer tester
Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1815329 - Posted: 6 Sep 2016, 7:26:45 UTC - in response to Message 1815200.  

fill even CPUs then switch to odd ones then switch to even again

If the current instance knows the combined mask of the instances already running, e.g. (for 8 "cores"):
10001001 -> 10 00 10 01
00110100 -> 00 11 01 00
10100010 -> 10 10 00 10

- group like above by 2 bits [ maybe by (mask & (3 << groupN*2)) ]

Select first group which is 00
Else select first group which is 01 or 10
Else all groups are 11 - select random?
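
The selection rule above can be sketched in Python as follows. Two things are assumptions made for the example: group 0 is taken to be the lowest two bits of the mask (matching `3 << groupN*2`), and the all-full case returns `None` so the caller decides what to do. `pick_cpu` is a hypothetical name.

```python
from typing import Optional


def pick_cpu(mask: int, ncpus: int) -> Optional[int]:
    """Choose a free CPU from `mask` (bit i set = CPU i already taken).

    CPUs are examined in pairs (2*g, 2*g + 1); group 0 is the lowest two bits.
    Returns None when every pair is fully occupied.
    """
    groups = ncpus // 2
    # First pass: a completely empty pair (00).
    for g in range(groups):
        if (mask & (3 << (g * 2))) == 0:
            return g * 2
    # Second pass: a half-used pair (01 or 10); take its free CPU.
    for g in range(groups):
        pair = (mask >> (g * 2)) & 3
        if pair == 1:
            return g * 2 + 1
        if pair == 2:
            return g * 2
    return None  # all pairs are 11; the caller decides what to do


if __name__ == "__main__":
    for m in (0b10001001, 0b00110100, 0b10100010):
        print(f"{m:08b} -> CPU {pick_cpu(m, 8)}")
```

With an odd `ncpus` the last CPU is never considered by this sketch, since it has no partner to form a pair with.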
 


ID: 1815329
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815349 - Posted: 6 Sep 2016, 9:21:25 UTC - in response to Message 1815329.  
Last modified: 6 Sep 2016, 9:26:01 UTC

Nope, random selection will result in random hangs on ATi hosts under CPU load; that stage has already been passed. The app should be pinned to just one CPU to avoid such hangs on ATi.

And as I understand it (please correct me if I'm wrong), hyperthreaded logical CPUs are grouped under adjacent ID numbers, that is, 0+1 is the first real core, 2+3 is the second real core, and so on.
If so, the scheme I wrote before (first fill all even, then all odd, then repeat) will provide the most even distribution across the CPU cores.
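
A short Python sketch of the even-then-odd order, checking how many instances land on each physical core. It takes the pairing described here (logical CPUs 2k and 2k+1 share core k) as given; some systems number hyperthread siblings differently, in which case the order would need adjusting. The function names are hypothetical.

```python
from collections import Counter


def even_then_odd_cpu(index: int, ncpus: int) -> int:
    """CPU for the instance with 0-based sequence number `index`.

    Order: 0, 2, 4, ... then 1, 3, 5, ... then start over.
    """
    order = list(range(0, ncpus, 2)) + list(range(1, ncpus, 2))
    return order[index % ncpus]


def instances_per_core(n_instances: int, ncpus: int) -> Counter:
    """Instances per physical core, assuming CPUs 2k and 2k+1 share core k."""
    return Counter(even_then_odd_cpu(i, ncpus) // 2 for i in range(n_instances))


if __name__ == "__main__":
    print([even_then_odd_cpu(i, 8) for i in range(9)])
    print(instances_per_core(6, 8))
```

Under that pairing no physical core receives a second instance until every core already has one.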
ID: 1815349


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.