Message boards :
Number crunching :
BOINC assigns device X - Problem
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Unfortunately, in current implementation this required -total_GPU_instances N to be set. So, not for default/stock config. Is there any way to query BOINC for how many GPU-based tasks are in flight? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Unfortunately, in current implementation this required -total_GPU_instances N to be set. So, not for default/stock config. My initial answer would be "probably not", but I think that's a good question we could usefully brainstorm around the other developers - both for SETI, and for other projects. What would you like to know? 1) Other SETI tasks running on this GPU 2) Other SETI tasks running on different GPUs 3) Other project tasks running on different GPUs 4) Other project tasks running on the same GPU as SETI (I'm usually in mode (3) in that list) Does it make any difference if the SETI tasks are MB or Astropulse? Does it make any difference if the other project tasks are CUDA or OpenCL? |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, for particular task of managing CPU affinity I would need to know number of SETI OpenCL GPU tasks running on particular host. Because other projects don't implement the same affinity management, their number of GPU tasks in flight will not help. But even knowing the total number of GPU tasks in flight across all projects would help enough, because BOINC most often schedules tasks from the same project. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
The nearest I know is this, but it doesn't specifically identify GPU tasks. I think we could reasonably ask for device information to be added. D:\BOINC>boinccmd --get_simple_gui_info [big snip] ======== Tasks ======== 1) ----------- name: e9s19_e6s9p0f497-GIANNI_A2A-0-1-RND9600_0 WU name: e9s19_e6s9p0f497-GIANNI_A2A-0-1-RND9600 project URL: http://www.gpugrid.net/ report deadline: Tue Sep 06 20:58:31 2016 ready to report: no got server ack: no final CPU time: 0.000000 state: downloaded scheduler state: scheduled exit_status: 0 signal: 0 suspended via GUI: no active_task_state: EXECUTING app version num: 848 checkpoint CPU time: 6217.638000 current CPU time: 6219.962000 fraction done: 0.649856 swap size: 370 MB working set size: 331 MB estimated CPU time remaining: 35662.333506 2) ----------- name: wu_sf3_DS-11x271_Grp350195of614400_0 WU name: wu_sf3_DS-11x271_Grp350195of614400 project URL: http://numberfields.asu.edu/NumberFields/ report deadline: Thu Sep 08 23:46:12 2016 ready to report: no got server ack: no final CPU time: 0.000000 state: downloaded scheduler state: scheduled exit_status: 0 signal: 0 suspended via GUI: no active_task_state: EXECUTING app version num: 212 checkpoint CPU time: 19524.690000 current CPU time: 19555.110000 fraction done: 0.550761 swap size: 290 MB working set size: 8 MB estimated CPU time remaining: 14101.334770 3) ----------- name: PM0155_03481_14_0 WU name: PM0155_03481_14 project URL: http://einstein.phys.uwm.edu/ report deadline: Thu Sep 15 23:44:11 2016 ready to report: no got server ack: no final CPU time: 0.000000 state: downloaded scheduler state: scheduled exit_status: 0 signal: 0 suspended via GUI: no active_task_state: EXECUTING app version num: 152 checkpoint CPU time: 119.996000 current CPU time: 120.183200 fraction done: 0.189805 swap size: 350 MB working set size: 256 MB estimated CPU time remaining: 24732.876829 4) ----------- name: wu_sf3_DS-11x271_Grp362860of614400_0 WU name: wu_sf3_DS-11x271_Grp362860of614400 
project URL: http://numberfields.asu.edu/NumberFields/ report deadline: Fri Sep 09 02:37:39 2016 ready to report: no got server ack: no final CPU time: 0.000000 state: downloaded scheduler state: scheduled exit_status: 0 signal: 0 suspended via GUI: no active_task_state: EXECUTING app version num: 212 checkpoint CPU time: 4546.540000 current CPU time: 4601.671000 fraction done: 0.512654 swap size: 290 MB working set size: 8 MB estimated CPU time remaining: 15297.463789 5) ----------- name: 11ja09ab.19590.25021.11.38.104_0 WU name: 11ja09ab.19590.25021.11.38.104 project URL: http://setiathome.berkeley.edu/ report deadline: Mon Oct 24 05:21:30 2016 ready to report: no got server ack: no final CPU time: 0.000000 state: downloaded scheduler state: scheduled exit_status: 0 signal: 0 suspended via GUI: no active_task_state: EXECUTING app version num: 800 checkpoint CPU time: 531.339400 current CPU time: 550.683500 fraction done: 0.806055 swap size: 123 MB working set size: 106 MB estimated CPU time remaining: 214.210100 |
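The task list above can be reduced to a per-project count of executing tasks with a little text parsing. The following is a minimal Python sketch, not part of BOINC: the function name `count_executing_tasks` and the abbreviated `SAMPLE_OUTPUT` (including its line layout) are illustrative assumptions, and only the field labels visible in the output above (`project URL`, `active_task_state`) are relied on. As noted in the post, this output does not identify GPU tasks, so the count covers CPU and GPU tasks alike.

```python
import re
from collections import Counter

# Abbreviated, hand-made sample using field labels from the output above.
SAMPLE_OUTPUT = """======== Tasks ========
1) -----------
   name: e9s19_e6s9p0f497-GIANNI_A2A-0-1-RND9600_0
   project URL: http://www.gpugrid.net/
   active_task_state: EXECUTING
2) -----------
   name: 11ja09ab.19590.25021.11.38.104_0
   project URL: http://setiathome.berkeley.edu/
   active_task_state: EXECUTING
"""


def count_executing_tasks(boinccmd_output: str) -> Counter:
    """Count tasks whose active_task_state is EXECUTING, keyed by project URL."""
    # Keep only the text after the "Tasks" banner; if the banner is
    # missing, rpartition leaves the whole text in the last element.
    _, _, tasks_text = boinccmd_output.rpartition("======== Tasks ========")
    counts = Counter()
    # Each task entry starts with a marker such as "1) -----------".
    for block in re.split(r"\d+\) -{3,}", tasks_text):
        url = re.search(r"project URL:\s*(\S+)", block)
        state = re.search(r"active_task_state:\s*(\S+)", block)
        if url and state and state.group(1) == "EXECUTING":
            counts[url.group(1)] += 1
    return counts


print(count_executing_tasks(SAMPLE_OUTPUT))
```

Because the regular expressions do not depend on line breaks, the same function should also cope with output that has been flattened onto one line, as it appears in this thread.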
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And any API-based access to the same info? To parse another text output not exactly I would like to do :) LoL, only single SETI task in all list? ;) SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
And any API-based access to the same info? To parse another text output not exactly I would like to do :) I'll have to poke around for that. There will certainly be a GUI RPC version - that's how the manager displays running tasks, and that version includes GPU data. But you don't want to add a TCP/IP stack to your apps, do you? That was the whole point of moving the comms from the science application in SETI Classic, to middleware under BOINC. Can't spend much time on this today - got to prepare for a long weekend journey, and I'm busy Monday and Tuesday next week as well. Back to normal by Wednesday, if all goes well. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
LoL, only single SETI task in all list? ;) There would be two, if running SoG two-up on the GTX 750 Ti didn't steal another CPU core from Numberfields :P You would have seen five if I'd shown that during WOW! last week - single SoG on GTX 970, dual cuda50 on GTX 750 Ti, two AVX CPU. One CPU core reserved for daily-driver use (15 browser tabs and remote monitoring of the other six machines), and one CPU core surrendered to SoG. Einstein intel_gpu was running throughout. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Check -use_sleep -high_prec_timer options with r3500+ builds. Some reports show improvement in this area. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Check -use_sleep -high_prec_timer options with r3500+ builds. Some reports show improvement in this area. I'll give that a try when I'm back in full circulation after Tuesday. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
This is not good. I just started to check my overnight SoG testing results, and found that all my Stderr output is almost completely filled with: <core_client_version>7.6.22</core_client_version> <![CDATA[ <stderr_txt> PU: icfft==144015 GPU: icfft=144015 CPU: icfft==144016 GPU: icfft=144016 CPU: icfft==144017 GPU: icfft=144017 CPU: icfft==144018 GPU: icfft=144018 CPU: icfft==144019 GPU: icfft=144019 CPU: icfft==144020 GPU: icfft=144020 ... and on and on and on. All the useful information, at least for me, has apparently been chopped off at the beginning due to the Stderr size limitation. Is there any way to shut this output off? I had hoped to try some other tuning parameters today, but will now have to revert to the earlier version of r3500 and wait for readable results with the current parameters. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Yea, I ran it for about an hour and got the same result, so I went back to the original SoG r3500 |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It's r3525 ? Await next one... SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It's r3525 ? Await next one... Or is it just the temp version to prove the concept of the fix? Should I rebuild 3525 or not? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
It's r3525 ? Await next one... No, it was the revised r3500 to correct which device was showing in the stderr. The last post here was before you released r3522. Since then you have pulled it and r3522. Have not yet tried v3525. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Well, for particular task of managing CPU affinity I would need to know number of SETI OpenCL GPU tasks running on particular host. I think your apps already "talk" to each other (?) (which can be used to count them). Otherwise, how does one instance know which CPUs are currently "in use" (per the CPU affinity masks of the other running instances) in order to select the "next free" CPU for its own CPU affinity mask? - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, for particular task of managing CPU affinity I would need to know number of SETI OpenCL GPU tasks running on particular host. They share MutExes. I would like to know max possible number of app in fly, not how many run currently or what ID in the sequence the current one has (the MutEx provides that). I could count the number of instances in flight (with additional code, but it should be possible, because some "copylefted" apps refuse to start if N instances are already running). But this would make the logic harder. Consider a host that has 4 cores and 1 instance already running: where should the next one go? Currently, with a total of 2 instances, it will go to CPU 2. But if the total were 3, 4 or more, it should go to CPU 1 (CPU 0 is busy with the first instance). With counting I would need to fill even CPUs then switch to odd ones then switch to even again. Doable, but not as easy as the current solution. SETI apps news We're not gonna fight them. We're gonna transcend them. |
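One possible reading of the placement rule in the post above is a fixed stride between instances, derived from the configured total (the value passed via -total_GPU_instances N). The Python sketch below is an assumption that merely reproduces the numbers given in the post (on 4 cores, a total of 2 puts the second instance on CPU 2, while a total of 3 or 4 puts it on CPU 1). The function name and the stride formula are hypothetical and are not taken from the app's source.

```python
def cpu_for_instance(instance_index: int, total_instances: int, logical_cpus: int) -> int:
    """Pick a CPU for the instance with the given 0-based index.

    Hypothetical rule: space instances evenly using a stride of
    logical_cpus // total_instances, never smaller than 1.
    """
    if instance_index < 0 or total_instances < 1 or logical_cpus < 1:
        raise ValueError("index must be >= 0, totals must be >= 1")
    stride = max(1, logical_cpus // total_instances)
    # Wrap around when there are more instances than CPUs.
    return (instance_index * stride) % logical_cpus


# Second instance (index 1) on a 4-core host, for totals of 2, 3 and 4.
print([cpu_for_instance(1, total, 4) for total in (2, 3, 4)])
```

The sketch shows why the configured total matters: the same second instance lands on a different CPU depending on N, which a running count alone would not tell it.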
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
I would like to know max possible number of app in fly Does APP_INIT_DATA::gpu_usage work? |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I would like to know max possible number of app in fly And what should it represent? SETI apps news We're not gonna fight them. We're gonna transcend them. |
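On what `APP_INIT_DATA::gpu_usage` might represent, the thread gives no answer. A hedged reading, assuming the field carries the fraction of one GPU that BOINC assigns to the task (in the way a `gpu_usage` of 0.5 in app_config.xml lets two tasks share a GPU), is that its reciprocal bounds the number of instances per GPU. The Python sketch below only illustrates that arithmetic; the function name is hypothetical and the interpretation of the field is an assumption to be checked against the BOINC API.

```python
import math


def max_instances_per_gpu(gpu_usage: float) -> int:
    """Upper bound on concurrent tasks per GPU, assuming gpu_usage is
    the fraction of one GPU assigned to each task (an assumption)."""
    if gpu_usage <= 0:
        raise ValueError("gpu_usage must be positive")
    # Small epsilon guards against floating-point results such as 2.9999999.
    return max(1, math.floor(1.0 / gpu_usage + 1e-9))


for usage in (1.0, 0.5, 0.33, 0.25):
    print(usage, max_instances_per_gpu(usage))
```

Even under this assumption the value is per GPU, so on a multi-GPU host the total would still need the number of GPUs in use, which this sketch does not supply.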
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
fill even CPUs then switch to odd ones then switch to even again If the current instance knows the mask of the already running instances, for example (for 8 "cores"): 10001001 -> 10 00 10 01; 00110100 -> 00 11 01 00; 10100010 -> 10 10 00 10, then group the bits by 2 as above [maybe by (mask & (3 << groupN*2))]. Select the first group which is 00. Else select the first group which is 01 or 10. Else all groups are 11: select random? - ALF - "Find out what you don't do well ..... then don't do it!" :) |
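The grouping idea in the post above can be sketched in Python as follows. Two things here are assumptions rather than part of the post: group 0 is taken to be the two lowest bits of the mask (matching the `3 << groupN*2` expression), and the "all groups are 11" case returns `None` instead of choosing at random, leaving that decision to the caller. The function name is illustrative.

```python
from typing import Optional


def pick_core_group(busy_mask: int, logical_cpus: int) -> Optional[int]:
    """Return the index of the 2-bit group to use, or None if every group is 11.

    busy_mask has one bit per logical CPU, set when that CPU is already
    claimed by another instance. Group g covers bits 2*g and 2*g + 1.
    """
    groups = [(busy_mask >> (2 * g)) & 0b11 for g in range(logical_cpus // 2)]
    # First choice: a group with both CPUs free.
    for g, bits in enumerate(groups):
        if bits == 0b00:
            return g
    # Second choice: a group with exactly one CPU free.
    for g, bits in enumerate(groups):
        if bits in (0b01, 0b10):
            return g
    return None


# The three example masks from the post, for 8 logical CPUs.
for mask in (0b10001001, 0b00110100, 0b10100010):
    print(format(mask, "08b"), pick_core_group(mask, 8))
```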
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Nope, random will result in random hangs on ATi hosts under CPU load - we have already been through that stage. The app should be fixed to just one CPU to avoid such hangs on ATi. And as I understand it (please correct me if I'm wrong), Hyperthreaded CPUs are grouped in adjacent ID numbers, that is, 0+1 is the first real core, 2+3 is the second real core and so on. If so, the scheme I wrote before (first fill all even, then all odd, then repeat) will provide the most even distribution across the CPU device. SETI apps news We're not gonna fight them. We're gonna transcend them. |
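The even-then-odd scheme described above can be written down in a few lines. This Python sketch assumes, as the post does, that Hyperthreaded siblings have adjacent IDs (0+1, 2+3, ...); the function name and the 0-based instance index are illustrative choices, not the app's code.

```python
def even_then_odd_cpu(instance_index: int, logical_cpus: int) -> int:
    """Map a 0-based instance index to a CPU ID: fill all even IDs first,
    then all odd IDs, then repeat from the start."""
    if instance_index < 0 or logical_cpus < 1:
        raise ValueError("index must be >= 0 and logical_cpus >= 1")
    # Even IDs come first so each real core gets one instance
    # before any core receives a second one.
    order = list(range(0, logical_cpus, 2)) + list(range(1, logical_cpus, 2))
    return order[instance_index % logical_cpus]


# Placement of the first five instances on a host with 4 logical CPUs.
print([even_then_odd_cpu(i, 4) for i in range(5)])
```

Each instance stays pinned to exactly one CPU, which matches the requirement stated in the post for avoiding the hangs on ATi hosts.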