BOINC assigns device X - Problem
Author | Message |
---|---|
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Nope, random will result in random hangs on ATi hosts under CPU load. By "random" I mean: when all the "cores" are already in use (i.e. a 5th app starts when 4 already run on a 4-"core" CPU), set affinity to "random"/any core. And as I understand (please correct if wrong) Hyperthreaded CPUs are grouped in adjacent ID numbers, that is, 0+1 - first real core, 2+3 - second real core and so on. I think the same (yes). This also applies to AMD modules in Bulldozer+. And even if the CPU in fact has real cores, this will probably "distribute" temperature/heating over different (non-adjacent) parts of the chip. If so, the scheme I wrote before (first fill all even, then all odd, then repeat) will provide the most even distribution across the CPU device. But apps end at unpredictable moments (e.g. some tasks end in 10-30 seconds (overflow)) and another app starts. Maybe (as a test) make a simulation: feed the algorithm with random times (ending/starting apps) to see how it "distributes" over time (on different # of cores and different # of apps, e.g. 5, 6, 7 apps over 8 cores). Also the algorithm should work on a CPU with an odd # of cores (like my AMD Athlon II X3 455). - ALF - "Find out what you don't do well ..... then don't do it!" :) |
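The simulation suggested in this post could look roughly like the sketch below. It is not taken from the app: the function name `simulate`, the two policy names, and the rule that a new app takes the lowest free slot are all assumptions made for illustration. It replays random end/start events and reports the worst per-core app count seen, for 5, 6 and 7 apps over 8 cores.

```python
import random

def simulate(n_cores, n_apps, steps, policy, seed=0):
    """Return the worst per-core app count seen over `steps` end/start events.

    policy "random": a new app is pinned to a randomly chosen core.
    policy "slot":   a new app takes the lowest free slot and is pinned to
                     core (slot % n_cores). This rule is an assumption.
    """
    rng = random.Random(seed)
    running = {}  # slot number -> core the app in that slot is pinned to

    def start():
        slot = min(s for s in range(n_apps) if s not in running)
        if policy == "random":
            running[slot] = rng.randrange(n_cores)
        else:
            running[slot] = slot % n_cores

    def worst():
        cores = list(running.values())
        return max(cores.count(c) for c in range(n_cores))

    for _ in range(n_apps):
        start()
    worst_seen = worst()
    for _ in range(steps):
        del running[rng.choice(sorted(running))]  # one app ends at a random moment
        start()                                   # and the next one starts
        worst_seen = max(worst_seen, worst())
    return worst_seen

for apps in (5, 6, 7):
    print(apps, "apps on 8 cores:",
          "slot =", simulate(8, apps, 1000, "slot"),
          "random =", simulate(8, apps, 1000, "random"))
```

With the slot rule, the worst case stays at the minimum possible value because a replacement app always lands on the core that was just vacated. The random policy can stack several apps on one core.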
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
This doesn't matter. When an app process finishes, its mutex is freed, so the next task's process will take it and hence take exactly the same place in the affinity map. That is, if a CPU1-pinned process finishes, then the next one will be pinned exactly to CPU1, not to CPU0 or CPU2.
And currently correct work is not guaranteed for such devices. I do separate checks for 2, 4, 8 and 16 cores only. Your device will perhaps have an 11 affinity mask, that is, it misses the 100 check (for 4 cores) and passes the duo test (10). Please try launching different numbers of app instances (-total_GPU_instances_num N is mandatory for this test currently) and post which CPU# is used for which config. BTW, I suppose such CPU devices with an odd number of CPUs should be of the "all cores are real" kind? P.S. I still don't get why random would be needed. Random would imply a config such as 0,1,2,3,0,0, that is, 3 processes pinned to the same CPU 0, instead of 0,2,1,3,0,2, which is a more even distribution. For the currently implemented version it will be 0,1,2,3,0,1 (if the app knows that the total instances num is 6). SETI apps news We're not gonna fight them. We're gonna transcend them. |
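The two deterministic orderings compared in this post can be written down in a few lines. This is only an illustrative sketch: the function names are made up here, and `physical_core_load` relies on the assumption raised earlier in the thread that logical CPUs 2k and 2k+1 share one physical core.

```python
from collections import Counter

def sequential(n_cpus, n_instances):
    """Slot i goes to CPU i, wrapping around: 0, 1, 2, 3, 0, 1, ..."""
    return [slot % n_cpus for slot in range(n_instances)]

def even_then_odd(n_cpus, n_instances):
    """Fill all even CPU IDs first, then all odd ones, then repeat."""
    order = list(range(0, n_cpus, 2)) + list(range(1, n_cpus, 2))
    return [order[slot % n_cpus] for slot in range(n_instances)]

def physical_core_load(cpu_ids):
    """Apps per physical core, ASSUMING logical CPUs 2k and 2k+1 share core k."""
    return dict(Counter(cpu // 2 for cpu in cpu_ids))

print(sequential(4, 6))     # [0, 1, 2, 3, 0, 1]
print(even_then_odd(4, 6))  # [0, 2, 1, 3, 0, 2]

# Two instances on a 2-core/4-thread CPU: the orderings differ in which
# physical cores end up busy.
print(physical_core_load(sequential(4, 2)))     # {0: 2}
print(physical_core_load(even_then_odd(4, 2)))  # {0: 1, 1: 1}
```

On a CPU where every logical ID is a real core, the two orderings load the cores equally and differ only in which cores are used first.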
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
I would like to know the max possible number of apps in flight. I don't have a GPU, but I would expect it to be the same as in app_config or app_info. That is, the reciprocal of the maximum number of tasks BOINC will run on one GPU of that particular type. In the case of a multi-GPU host, your app already enumerates all the GPUs in the host and therefore knows how many there are. In the case of a multi-vendor multi-GPU host with possibly different gpu_usage it gets a bit harder... |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Does APP_INIT_DATA::gpu_usage work? Not quite. gpu_usage is an application-specific (strictly, app_version-specific) value. It should be available through the app_init_data::struct [*], but it will only tell you how many of "this application" are supposed to share each GPU. If I'm using my other GPU to run a different app, or a different project, that will be invisible to this test. But [total number of GPUs]/[gpu_usage] should give a host maximum limit for the application. * yes, it's in the struct. And the actual init_data.xml file contains the value derived from the active combination of app_info and app_config. |
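As a rough illustration of the [total number of GPUs]/[gpu_usage] estimate above, here is a minimal sketch. The function name `max_instances` is hypothetical, and the inputs are assumed to have already been read by the app (the GPU count from its own enumeration, `gpu_usage` from the init data). Rounding down is also an assumption, made so that a value like 0.33 counts as three tasks per GPU.

```python
import math

def max_instances(n_gpus, gpu_usage):
    """Upper bound on concurrent instances of this app on the host.

    n_gpus:    number of GPUs the app enumerated (assumed input)
    gpu_usage: fraction of one GPU each task of this app version claims
    """
    if n_gpus < 0 or gpu_usage <= 0:
        raise ValueError("n_gpus must be >= 0 and gpu_usage must be > 0")
    # The small epsilon guards against float results like 3.9999999 for 4.
    return math.floor(n_gpus / gpu_usage + 1e-9)

print(max_instances(1, 0.5))   # 2
print(max_instances(2, 0.5))   # 4
print(max_instances(2, 0.33))  # 6
```

As the post notes, this only bounds instances of this one application. Tasks from another app or project on the same host are not visible to it.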
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks, this sounds like a plausible alternative to the inner counter (better or not, I'm not sure so far). On the other hand, currently only ATi builds have CPUlock enabled by default. It's worth checking whether NV would run better (or worse?) when pinned to a particular core (on a CPU-busy system, which is what the usual default config represents). SETI apps news We're not gonna fight them. We're gonna transcend them. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Also the algorithm should work on a CPU with an odd # of cores (like my AMD Athlon II X3 455)
In 5 identical directories I put:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe
MultiBeam_Kernels_r3500.cl
libfftw3f-3-3-4_x86.dll
r3500_AMDAthlontmIIX3455Processor_x86.wisdom
*.bin*
work_unit.sah (= PG1327_v7.wu)
mb_cmdline.txt
Before every round of tests I do: Del /S result.sah state.sah boinc_*.
I start the .exe files manually at ~30 second intervals.
Monitoring is done in Process Lasso (I select/mark the already running apps to easily see the next app started). I also see in SIV the initial CPU load (for ~20 s) on the allocated CPU core, or on all cores if the app failed to choose a core.
AMD Athlon(tm) II X3 455 Processor [Family 16 Model 5 Stepping 3]
Number of processors 3
AMD ATI Radeon HD 6570 (NI TURKS) {ASUS EAH6570/DI/1GD3(LP)} (1024MB) driver: 1.4.1646
Windows XP
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4832843
First results:
A) Not possible for me to run >3 tasks on "default": vRAM 1 GB, and also too much lag: (Empty mb_cmdline.txt)
0 1 2 (first started app uses CPU0, second - CPU1, ...)
Kill test (-: kill process by Process Lasso, +: start new app):
-0 -1 = 2
+ = 0 2
-2 = 0
+ = 0 1
_________________________
B) To reduce vRAM usage & lag and allow for 5 apps: -period_iterations_num 500 -sbs 8 (I see -sbs 8 acts the same as -sbs 32)
("0-2": "All cores" as shown by Process Lasso, no "CPU-pin")
0 1 2 0-2 0
Kill 0-2 = 0 1 2 0
+ = 0 1 2 0 0-2
0 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 1; system mask is 7
1 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 2; system mask is 7
2 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 4; system mask is 7
3 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 0; system mask is 7
4 slot of 64 used for this instance; total_GPU_instances_num=64
Info: CPU affinity mask used: 1; system mask is 7
Now, which N do you want to see for -total_GPU_instances_num N? Maybe 2, 3, 4, 5
- ALF - "Find out what you don't do well ..... then don't do it!" :) |
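The masks in this log (1, 2, 4, 0, 1 for slots 0 to 4 with system mask 7) can be reproduced by one simple rule, sketched below. This is a guess that happens to fit the five log lines, not the app's actual code, which is not shown in the thread: it assumes the 3-core host is treated as having 4 CPUs, so slot 3 selects a CPU bit that does not exist and the resulting mask is 0, which Process Lasso showed as "0-2" (no pinning). The second function, also hypothetical, cycles over the CPUs actually present in the system mask and would cover odd core counts.

```python
def four_cpu_style_mask(slot, system_mask, assumed_cpus=4):
    """Guess at the observed behaviour: pick bit (slot % 4), then clip to the
    CPUs that exist. Yields 0 when the chosen bit is not in system_mask."""
    return (1 << (slot % assumed_cpus)) & system_mask

def present_cpu_mask(slot, system_mask):
    """Alternative: cycle only over the CPU bits set in system_mask."""
    cpus = [bit for bit in range(system_mask.bit_length())
            if (system_mask >> bit) & 1]
    return 1 << cpus[slot % len(cpus)]

SYSTEM_MASK = 7  # three CPUs, as reported in the log above

print([four_cpu_style_mask(s, SYSTEM_MASK) for s in range(5)])  # [1, 2, 4, 0, 1]
print([present_cpu_mask(s, SYSTEM_MASK) for s in range(5)])     # [1, 2, 4, 1, 2]
```

Working from the set bits of the system mask, rather than from a fixed list of core counts, also handles hosts where the process is only allowed on a subset of CPUs.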