BOINC assigns device X - Problem

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1815394 - Posted: 6 Sep 2016, 19:47:54 UTC - in response to Message 1815349.  
Last modified: 6 Sep 2016, 19:53:52 UTC

Nope, random will result in random hangs on ATi hosts under CPU load

By "random" I mean - when all the "cores" are already used (i.e. 5th app starts when 4 already run on a 4 "core" CPU) - set affinity to "random"/any core.


And as I understand it (please correct me if I'm wrong), Hyperthreaded CPUs are grouped in adjacent ID numbers, that is, 0+1 - first real core, 2+3 - second real core and so on.

I think the same (yes). Applies also to AMD Modules in Bulldozer+
And even if the CPU in fact has only real cores - this will probably "distribute" temperature/heating over different (non-adjacent) parts of the chip.


If so, the scheme I wrote before (first fill all even, then all odd, then repeat) will provide the most even distribution across the CPU device.

But apps end at unpredictable moments (e.g. some tasks end in 10-30 seconds (overflow)) and another app starts.
Maybe (as a test) run a simulation - feed the algorithm with random times (ending/starting apps) to see how it "distributes" over time (on different # of cores and different # of apps - e.g. 5, 6, 7 apps over 8 cores)

Also the algorithm should work on CPUs with an odd # of cores (like my AMD Athlon II X3 455)
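
A minimal sketch of the "first fill all even, then all odd, then repeat" scheme quoted above, written so that it also handles an odd number of cores. The function names are hypothetical and this is not the app's actual code, only an illustration of the ordering being discussed.

```python
def even_odd_order(n_cores):
    """Return core IDs with all even IDs first, then all odd IDs."""
    evens = list(range(0, n_cores, 2))
    odds = list(range(1, n_cores, 2))
    return evens + odds


def core_for_slot(slot, n_cores):
    """Map the slot-th running instance to a core, wrapping around when all cores are used."""
    order = even_odd_order(n_cores)
    return order[slot % n_cores]


if __name__ == "__main__":
    # Try an odd core count (3) as well as 4 and 8 cores.
    for n_cores in (3, 4, 8):
        pins = [core_for_slot(slot, n_cores) for slot in range(n_cores + 2)]
        print(n_cores, "cores:", pins)
```

For 6 instances on 4 cores this gives 0,2,1,3,0,2. Whether adjacent IDs really share a physical core is an assumption taken from the discussion above, not something the sketch checks.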
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1815394
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815397 - Posted: 6 Sep 2016, 20:02:02 UTC - in response to Message 1815394.  
Last modified: 6 Sep 2016, 20:09:01 UTC


But apps end at unpredictable moments (e.g. some tasks end in 10-30 seconds (overflow)) and another app starts.

This doesn't matter. When an app process finishes, its mutex is freed, so the next task's process will take it and hence take exactly the same place in the affinity map.
That is, if a CPU1-pinned process finishes, the next one will be pinned to exactly CPU1, not to CPU0 or CPU2.
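
The slot reuse described here can be sketched with a toy model. `SlotPool` is a hypothetical stand-in for the per-slot mutexes, and taking the lowest free slot is an assumption of this sketch, not a confirmed detail of the app.

```python
class SlotPool:
    """Toy stand-in for a set of named mutexes, one per instance slot."""

    def __init__(self, n_slots):
        self.busy = [False] * n_slots

    def acquire(self):
        """Take the lowest free slot and return its index, or None if all are busy."""
        for slot, taken in enumerate(self.busy):
            if not taken:
                self.busy[slot] = True
                return slot
        return None

    def release(self, slot):
        """Free a slot, as happens when the process holding it exits."""
        self.busy[slot] = False


if __name__ == "__main__":
    pool = SlotPool(3)
    running = [pool.acquire() for _ in range(3)]  # three instances start
    pool.release(1)                               # the instance in slot 1 finishes
    print(running, "-> next instance gets slot", pool.acquire())
```

Because a new process can only take a slot that is currently free, the moment at which a task ends does not change which slots (and hence which cores) are in use.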


Also the algorithm should work on CPUs with an odd # of cores (like my AMD Athlon II X3 455)

And currently correct operation is not guaranteed for such devices.
I do separate checks for 2, 4, 8 and 16 cores only.
Your device will perhaps have an affinity mask of 11, that is, it fails the 100 check (for 4 cores) and passes the duo test (10).

Please try launching different numbers of app instances (-total_GPU_instances_num N is currently mandatory for this test) and post which CPU# is used for which config.

BTW, I suppose CPUs with an odd number of cores are of the "all cores are real" kind?

P.S. I still don't get why random would be needed. Random could produce, for example, 0,1,2,3,0,0, that is, 3 processes pinned to the same CPU 0, instead of 0,2,1,3,0,2, which is a more even distribution.
For the currently implemented version it will be 0,1,2,3,0,1 (if the app knows that the total number of instances is 6).
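
To make the comparison concrete, here is a small sketch that counts how many processes land on each core for the three pin sequences mentioned above. The helper names are hypothetical; only the sequences themselves come from the post.

```python
from collections import Counter


def per_core_load(pins, n_cores):
    """Return a list with the number of pinned processes on each core."""
    counts = Counter(pins)
    return [counts.get(core, 0) for core in range(n_cores)]


def worst_core(pins, n_cores):
    """Return the highest number of processes sharing a single core."""
    return max(per_core_load(pins, n_cores))


if __name__ == "__main__":
    sequences = {
        "random example": [0, 1, 2, 3, 0, 0],
        "even then odd": [0, 2, 1, 3, 0, 2],
        "current version": [0, 1, 2, 3, 0, 1],
    }
    for name, pins in sequences.items():
        print(name, per_core_load(pins, 4), "worst:", worst_core(pins, 4))
```

The random example puts 3 processes on CPU 0, while the other two sequences never exceed 2 per core. They differ only in which cores carry the doubled load.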
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1815397
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1815551 - Posted: 7 Sep 2016, 13:45:01 UTC - in response to Message 1815232.  

I would like to know the max possible number of apps in flight


Does APP_INIT_DATA::gpu_usage work?

And what should it represent?


I don't have a GPU but I would expect it to be the same as in app_config or app_info. That is, reciprocal of the maximum number of tasks BOINC will run on one GPU of that particular type. In case of multi-GPU host your app already enumerates all the GPUs in the host and therefore knows how many there are.

In case of multi-vendor multi-GPU host with possibly different gpu_usage it gets a bit harder...
ID: 1815551
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1815563 - Posted: 7 Sep 2016, 14:43:34 UTC - in response to Message 1815551.  

Does APP_INIT_DATA::gpu_usage work?

And what should it represent?

I don't have a GPU but I would expect it to be the same as in app_config or app_info. That is, reciprocal of the maximum number of tasks BOINC will run on one GPU of that particular type. In case of multi-GPU host your app already enumerates all the GPUs in the host and therefore knows how many there are.

In case of multi-vendor multi-GPU host with possibly different gpu_usage it gets a bit harder...

Not quite. gpu_usage is an application-specific (strictly, app_version-specific) value. It should be available through the app_init_data::struct [*], but it will only tell you how many of "this application" are supposed to share each GPU. If I'm using my other GPU to run a different app, or a different project, that will be invisible to this test. But [total number of GPUs]/[gpu_usage] should give a host maximum limit for the application.
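
As a rough sketch of the [total number of GPUs]/[gpu_usage] estimate, assuming gpu_usage is the fraction of one GPU that each task of this app version reserves. The function name is hypothetical, and a real app would read gpu_usage from APP_INIT_DATA rather than take it as an argument.

```python
import math


def max_instances(total_gpus, gpu_usage):
    """Upper bound on concurrent tasks of this app version on the host."""
    if gpu_usage <= 0:
        raise ValueError("gpu_usage must be positive")
    # Small epsilon guards against floating-point results such as 3.9999999.
    return math.floor(total_gpus / gpu_usage + 1e-9)


if __name__ == "__main__":
    print(max_instances(2, 0.5))   # two GPUs, two tasks per GPU
    print(max_instances(1, 0.33))  # one GPU, three tasks per GPU
```

As noted above, this only bounds instances of this one application. Tasks from other apps or projects sharing the same GPUs are invisible to it.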

* yes, it's in the struct. And the actual init_data.xml file contains the value derived from the active combination of app_info and app_config.
ID: 1815563
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815570 - Posted: 7 Sep 2016, 15:16:33 UTC

Thanks, this sounds like a plausible alternative to the inner counter (better or not, I'm not sure so far). On the other hand, currently only ATi builds have CPUlock enabled by default. It is worth checking whether NV would run better (or worse?) when pinned to a particular core (on a CPU-busy system, which is what the usual default config represents).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1815570
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1815720 - Posted: 8 Sep 2016, 11:47:59 UTC - in response to Message 1815397.  

Also the algorithm should work on CPUs with an odd # of cores (like my AMD Athlon II X3 455)

And currently correct operation is not guaranteed for such devices.
I do separate checks for 2, 4, 8 and 16 cores only.
Your device will perhaps have an affinity mask of 11, that is, it fails the 100 check (for 4 cores) and passes the duo test (10).

Please try launching different numbers of app instances (-total_GPU_instances_num N is currently mandatory for this test) and post which CPU# is used for which config.

In 5 identical directories I put:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe
MultiBeam_Kernels_r3500.cl
libfftw3f-3-3-4_x86.dll

r3500_AMDAthlontmIIX3455Processor_x86.wisdom
*.bin*

work_unit.sah (= PG1327_v7.wu)
mb_cmdline.txt


Before every round of tests I do:
Del /S result.sah state.sah boinc_*.

Start the .exe files manually at ~30 second intervals.
Monitoring in Process Lasso (I select/mark the already running processes to easily see the next app started)
I also see in SIV the initial CPU load (for ~20 s) on the allocated CPU core, or on all cores if the app failed to choose a core.


AMD Athlon(tm) II X3 455 Processor [Family 16 Model 5 Stepping 3]
Number of processors 3
AMD ATI Radeon HD 6570 (NI TURKS) {ASUS EAH6570/DI/1GD3(LP)} (1024MB) driver: 1.4.1646
Windows XP
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4832843


First results:
A) Not possible for me to run >3 tasks on "default": vRAM 1 GB, and also too much lag:
(Empty mb_cmdline.txt)
0 1 2 (first started app uses CPU0, second - CPU1, ...)

Kill test (-: kill process by Process Lasso,  +: start new app):
-0 -1 = 2
+     = 0 2
-2    = 0
+     = 0 1

_________________________


B) To reduce vRAM usage & lag and allow for 5 apps:
 -period_iterations_num 500 -sbs 8 
(I see -sbs 8 acts the same as -sbs 32)

("0-2": "All cores" as shown by Process Lasso, no "CPU-pin")

0 1 2 0-2 0

Kill 0-2 = 0 1 2 0 
+        = 0 1 2 0 0-2


0 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 1; system mask is 7

1 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 2; system mask is 7

2 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 4; system mask is 7

3 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 0; system mask is 7

4 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 1; system mask is 7
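
For reading the log above: each mask is a bit field where bit n stands for CPU n, so the masks can be decoded as in the sketch below. The second function is purely a guess and is not taken from the app's source. It only shows that a 4-wide rotation ANDed with the system mask would happen to reproduce the logged values 1, 2, 4, 0, 1.

```python
def cpus_in_mask(mask):
    """Return the list of CPU numbers whose bits are set in an affinity mask."""
    cpus = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            cpus.append(bit)
        bit += 1
    return cpus


def guessed_mask(slot, system_mask, width=4):
    """HYPOTHETICAL: rotate over `width` CPUs, then drop bits the system lacks."""
    return (1 << (slot % width)) & system_mask


if __name__ == "__main__":
    system_mask = 7  # three CPUs: bits 0, 1 and 2
    for slot in range(5):
        mask = guessed_mask(slot, system_mask)
        print("slot", slot, "mask", mask, "cpus", cpus_in_mask(mask))
```

An empty CPU list (mask 0) would match the instance that Process Lasso showed as unpinned ("0-2"), but the real cause can only be confirmed from the app's code.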


Now, which N do you want to see for -total_GPU_instances_num N? Maybe 2, 3, 4, 5
 
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1815720

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.