BOINC assigns device X - Problem
Author | Message |
---|---|
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I remember that originally I could tell which graphics card a task ran on by looking at this entry in each task's summary: "BOINC assigns device 0". I did some of my early troubleshooting with this information, but now every task says device 0. Maybe I messed something up when I tried to manually update to a new app version from Lunatics. Any recommendations on what could be wrong? It is this computer: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7953787 Thanks! GitHub: Ricks-Lab Instagram: ricks_labs |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Most probably - new BOINC version that dropped command line --device N parameter support. SETI apps news We're not gonna fight them. We're gonna transcend them. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Most probably - new BOINC version that dropped command line --device N parameter support. I have not changed BOINC. The only major changes were installing Lunatics 0.44 with the installer and, later, trying to manually install MB app 3401. I messed things up and finally got it running again with the 0.44 apps. I looked at stderr from another Fury user: Stderr output <core_client_version>7.6.32</core_client_version> <![CDATA[ <stderr_txt> Running on device number: 3 Defaults scaling is disabled, basic defaults will be used. Tuning on user's discretion. Number of app instances per device set to:1 The difference is that mine is missing the line "Running on device number: x". Also, I am running client 7.6.22, whereas the sample above is from 7.6.32. The output above is from rev 3430, whereas mine is from rev 3330. Could it be an issue with the app? GitHub: Ricks-Lab Instagram: ricks_labs |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
Difference is that mine is missing the line: I'd say the application, as it is responsible for what is written in Stderr output; however, I don't know where the application gets the GPU information from: the system, BOINC, or bits of both? Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Difference is that mine is missing the line: It's the role of BOINC to 'schedule' the GPU applications - give instructions for which task/application pairing should be run on which GPU. BOINC gave that instruction via a command line in the early versions (before OpenCL was fully supported), but we should have switched to reading init_data.xml several years ago - around December 2012, with v7.0.40: there hasn't been any change in that procedure in recent versions, so far as I can tell. The only other possibility is that RueiKe has introduced a --device 0 directive into one or other of the many places where manual configuration is possible via a manual edit. |
Harri Liljeroos Send message Joined: 29 May 99 Posts: 3988 Credit: 85,281,665 RAC: 126 |
I have the same problem. Here is a task that was run today: http://setiathome.berkeley.edu/result.php?resultid=4994650280 It has the line "BOINC assigns device 0" but is missing the line "Running on device number: 0". It was actually running on device 1, judging by the reported compute units (4). This machine has two GPUs: device 0 is a GTX970 and device 1 is a GTX650Ti. The application used was SoG r3472 from Lunatics 0.45b3. BOINC is 7.6.22 on Win7 64 bit. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
The only other possibility is that RueiKe has introduced a --device 0 directive into one or other of the many places where manual configuration is possible via a manual edit. I have checked the following places. mb_cmdline_win_x86_SSE2_OpenCL_ATi_HD5.txt and mb_cmdline_win_x86_SSE2_OpenCL_ATi.txt both have this entry: -instances_per_device 1 -no_cpu_lock -sbs 975 -period_iterations_num 4 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp app_config.xml: <app_config> <app> <name>setiathome_v8</name> <gpu_versions> <gpu_usage>1</gpu_usage> <cpu_usage>1.6666</cpu_usage> </gpu_versions> </app> <app> <name>astropulse_v7</name> <gpu_versions> <gpu_usage>1</gpu_usage> <cpu_usage>1.6666</cpu_usage> </gpu_versions> </app> </app_config> cc_config.xml: <cc_config> <options> <use_all_gpus>1</use_all_gpus> </options> </cc_config> I have used grep in almost all directories for \-device and found no suspicious occurrences. GitHub: Ricks-Lab Instagram: ricks_labs |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
remove -no_cpu_lock - will it change anything? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
A quick look at your Inconclusives from back on May 7 shows what looks like the same thing. On May 5 you have an AP task that does show it running on device 0. Maybe you're just thinking of different reporting with different apps. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
remove -no_cpu_lock - will it change anything? I have removed it and do see a difference, though the result is still not correct. With -no_cpu_lock: "BOINC assigns device 0", "Info: BOINC provided OpenCL device ID used". Without: "BOINC assigns device 0", "1 slot of 64 used for this instance", "Info: BOINC provided OpenCL device ID used". I see the slot number differing between tasks, but the device is still always 0. Tasks are taking much longer now, so I need to add -no_cpu_lock back. I think 3430 is supposed to fix the performance issue seen without -no_cpu_lock, so maybe I should try to upgrade to those apps. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
A quick look at your Inconclusives back May 7 it looks like the same thing. This has been an issue for a while. I think I caused it when I attempted to install 3401 apps, but it could have happened when I installed Lunatics and just didn't notice. GitHub: Ricks-Lab Instagram: ricks_labs |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
May 7 you were using r3330 and it has the same output as now. EDIT: Scratch that comment, you still have r3330. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have not added any option that says anything about which device number should be used for which device, but despite that, the command line, which can be seen in the Task Manager/Processes tab, always has "--device 0" after the filename. Thanks for this! I used Task Manager on two systems. My main desktop, which just runs standard apps, has --device 0 shown in the command line. The system in question doesn't have anything specified in the command line after the executable. GitHub: Ricks-Lab Instagram: ricks_labs |
[DPC] hansR Send message Joined: 14 Jul 00 Posts: 47 Credit: 235,829,569 RAC: 8 |
I think you should read this as: iGPU - device 0 being the first device of type iGPU, and NVIDIA - device 0 being the first device of type NVIDIA. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have upgraded to the r3430 MB app which doesn't require -no_cpu_lock to perform well, so I have now removed the option. I am getting slot numbers in the output but device is still always 0. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I just verified that AP tasks show the proper device number: <core_client_version>7.6.22</core_client_version> <![CDATA[ <stderr_txt> Running on device number: 2 CPU affinity adjustment enabled, fixed CPU 6 will be used Number of app instances per device set to:1 Maximum single buffer size set to:768MB Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: Advanced Micro Devices, Inc. BOINC assigns device 2 Info: BOINC provided OpenCL device ID used Does this give a clue to the source of the problem? GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I finally figured this out. I did a nearly clean install of BOINC by uninstalling BOINC and Lunatics, and removing the project directory. I then installed BOINC. It came up working fine, but my command line options were missing. I added them back and the problem returned. I found that putting a space after the last item in the command line fixed the problem. GitHub: Ricks-Lab Instagram: ricks_labs |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
I finally figured this out. I did a nearly clean install of BOINC by uninstalling BOINC and Lunatics, and removing the project directory. I then installed BOINC. It came up working fine, but my command line options were missing. I added them back and the problem returned. I found that putting a space after the last item in the command line fixed the problem. I added the command line settings not long after using the Lunatics Installer Beta v4 to install the current SoG application. I don't remember if I checked the Stderr output before adding the command line settings, but when I did check, I noticed that it always reports "BOINC assigns device 0" regardless of which device it is. I've just put a space at the end of my command line settings in mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt. We'll see if that lets the correct device number be reported. Grant Darwin NT |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I just noticed this happening today with SoG r3500. I've normally been running Cuda50 on my new host 8064262 which has 4 GTX 960s. However, I took some time today to switch a handful of tasks to SoG, intermixed with the Cuda tasks so I could get some comparisons between the two apps for tasks that were split from the same files and had matching ARs. I realized later, when I was pulling the results to put into my spreadsheet, that while the matching Cuda tasks showed as being distributed across all 4 GPUs, every one of the SoG tasks (all 11 of them) showed "BOINC assigns device 0". Pretty much a statistical impossibility, I think, although I didn't actually watch the tasks run, so I didn't see the true device numbers that each task really ran on. Just to be sure though, once I saw this thread resurface this evening, I went and swapped another group of four tasks over to SoG and this time recorded which GPU got which task, as follows: Task 5123149033 - Device 0 Task 5123193201 - Device 1 Task 5123193448 - Device 1 Task 5123193447 - Device 3 Here's a screenshot of the properties for that last one: The Stderr for every one shows "BOINC assigns device 0", making it impossible to know which GPU the task actually ran on. That makes it difficult to accurately match test results across app, GPU, and AR. Perhaps not such a big deal on this box, although even with 4 GTX 960s, there are 3 different clock speeds involved. However, if I wanted to do something similar on my host with a GTX 670, GTX 780, and GTX 960, I'd have to pay attention to the tasks at some point while they're actually running, and not just pick up the results later. This would also be a long-term problem if it ever became necessary to identify a specific GPU when one starts to cough up hairballs. Today's testing was run plain vanilla, with no command line parameters for the SoG tasks. The host is on BOINC 7.6.22. Has anybody done anything to look into this problem? |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Put some option into the command line and add a space after it, or just put some spaces into the cmd line - will it help? SETI apps news We're not gonna fight them. We're gonna transcend them. |
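
For anyone who wants to apply the workaround reported in this thread (a space after the last item in the command line file) without opening each file by hand, here is a minimal Python sketch. It is not part of BOINC or Lunatics. The function names are made up for illustration, the `mb_cmdline*.txt` pattern is only inferred from the file names quoted above, and the project directory is something you have to supply yourself. Empty files are left alone, because the thread does not confirm whether a file containing only spaces helps.

```python
import tempfile
from pathlib import Path


def ensure_trailing_space(path: Path) -> bool:
    """Append one space unless the file is empty or already ends with a space.

    Returns True if the file was changed. (Hypothetical helper, not a BOINC API.)
    """
    data = path.read_bytes()
    if not data or data.endswith(b" "):
        return False
    with path.open("ab") as handle:  # append only; existing bytes are untouched
        handle.write(b" ")
    return True


def fix_cmdline_files(project_dir: Path, pattern: str = "mb_cmdline*.txt") -> list[str]:
    """Run ensure_trailing_space on every matching file; return the changed names."""
    return [p.name for p in sorted(project_dir.glob(pattern)) if ensure_trailing_space(p)]


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not on a real BOINC install.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "mb_cmdline_demo.txt"  # hypothetical file name
        demo.write_bytes(b"-instances_per_device 1 -sbs 975 -hp")
        print(fix_cmdline_files(Path(tmp)))  # first run reports the file it changed
        print(fix_cmdline_files(Path(tmp)))  # second run has nothing left to change
```

To try it on a real host, call `fix_cmdline_files(Path(...))` with the SETI@home project directory while BOINC is not running, then check whether new tasks report the expected device number in their stderr.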