Message boards : Number crunching : SNB-E not using all threads as it should
Eric B | Joined: 9 Mar 00 | Posts: 88 | Credit: 168,875,085 | RAC: 762
I have 2 OpenSUSE 12.1 x64 Linux systems: a 4-core HT Sandy Bridge system with an Nvidia GTX 460 and 8 GB DRAM - on that system the total is 10 BOINC tasks, 8 CPU and 2 GPU. OK, that's great and what I would expect. I also have an SNB-E system which is 6-core HT (12 threads); it also has an Nvidia GTX 460, but 16 GB DRAM. On that system I get only 11 CPU and 2 CUDA tasks running at a time. Both systems are running SETI on BOINC version 6.10.58. The app_info.xml is virtually identical on both systems, and both are using Alex's AK_V* optimized Linux Fermi apps. E.g. on the SNB-E system:

cat ~/BOINC/projects/setiathome.berkeley.edu/app_info.xml

<app_info>
  <app>
    <name>setiathome_enhanced</name>
  </app>
  <file_info>
    <name>AK_V8_linux64_ssse3</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <file_ref>
      <file_name>AK_V8_linux64_ssse3</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_enhanced</name>
  </app>
  <file_info>
    <name>setiathome-6.11.x86_64-pc-linux-gnu__cuda32</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>611</version_num>
    <plan_class>cuda_fermi</plan_class>
    <avg_ncpus>0.250</avg_ncpus>
    <max_ncpus>0.50</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>0.50</count>
    </coproc>
    <file_ref>
      <file_name>setiathome-6.11.x86_64-pc-linux-gnu__cuda32</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

ldd AK_V8_linux64_ssse3
  linux-vdso.so.1 => (0x00007fff563aa000)
  libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa3a5570000)
  libc.so.6 => /lib64/libc.so.6 (0x00007fa3a51e0000)
  /lib64/ld-linux-x86-64.so.2 (0x00007fa3a578d000)
  libm.so.6 => /lib64/libm.so.6 (0x00007fa3a4f89000)
  libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa3a4d73000)
  libdl.so.2 => /lib64/libdl.so.2 (0x00007fa3a4b6f000)

locate libcuda:
  BOINC/projects/setiathome.berkeley.edu/libcudart.so.3
  /usr/lib/libcuda.so
  /usr/lib/libcuda.so.1
  /usr/lib/libcuda.so.304.43
  /usr/lib64/libcuda.so
  /usr/lib64/libcuda.so.1
  /usr/lib64/libcuda.so.304.43
  /usr/local/cuda/lib/libcudart.so
  /usr/local/cuda/lib/libcudart.so.4
  /usr/local/cuda/lib/libcudart.so.4.1.28
  /usr/local/cuda/lib64/libcudart.so
  /usr/local/cuda/lib64/libcudart.so.4
  /usr/local/cuda/lib64/libcudart.so.4.1.28

On the 8-thread SNB system, cat ~/BOINC/projects/setiathome.berkeley.edu/app_info.xml shows a file identical to the one above, ldd AK_V8_linux64_ssse3 lists the same libraries (only the load addresses differ), and locate lists the same libcuda/libcudart files. libcudart seems to be 4.1.28.
HAL9000 | Joined: 11 Sep 99 | Posts: 6534 | Credit: 196,805,888 | RAC: 57
If the machine has downloaded more than 11 tasks for the CPU, then you might want to check the following BOINC settings, both in your profile and locally:

  On multiprocessors, use at most N processors
  On multiprocessors, use at most N% of the processors

Also, you would want to verify that BOINC is reporting 12 CPUs at startup. Sort of like this:

  12-Oct-2012 16:34:35 [---] Processor: 24 GenuineIntel Intel(R) Xeon(R) CPU E5645 @ 2.40GHz

SETI@home classic workunits: 93,865 | CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
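A rough sketch of how those two preferences could combine to cap usable CPUs (an assumed formula for illustration, not BOINC's actual code) - note that a percentage just below 100% on a 12-thread host already rounds down to 11:

```python
import math

def usable_cpus(detected_cpus: int, max_ncpus: int, max_pct: float) -> int:
    """Approximate how the two multiprocessor preferences cap CPU usage.

    detected_cpus: CPUs the client reports at startup
    max_ncpus:     "use at most N processors" preference
    max_pct:       "use at most N% of the processors" preference
    """
    # Both limits apply; the percentage is rounded down to whole CPUs.
    return min(max_ncpus, math.floor(detected_cpus * max_pct / 100.0))

# A 12-thread host with the percentage at 95% loses one task slot:
print(usable_cpus(12, 192, 95))   # -> 11
print(usable_cpus(12, 192, 100))  # -> 12
```

So checking that the percentage preference really reads 100% on both the website and locally is worth a moment even when the N-processors limit (192 here) is clearly not the binding one.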
Eric B | Joined: 9 Mar 00 | Posts: 88 | Credit: 168,875,085 | RAC: 762
I did find that the SNB-E has this cc_config.xml file in the BOINC/projects directory (but it's named cc_config.xml.off, which I assume means it won't be read), and there is no corresponding file on the other system. Other than that, both are set to use 100% of the processors, and "use at most N processors" is set to 192 because on a few rare occasions I get to play with a very big server, and 192 more than covers the number of threads that thing has. I checked these settings on the website and also in the manager's preferences menu. Could this file be causing the problem even though its name is cc_config.xml.off? Is there any config file in ~/BOINC I can examine to help determine why it only runs 11 CPU tasks? All threads seem fully occupied, if I go by the gkrellm display.

cat ~/BOINC/cc_config.xml.off

<cc_config>
  <log_flags>
    <cpu_sched>1</cpu_sched>
    <debt_debug>1</debt_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
    <coproc_debug>1</coproc_debug>
    <cpu_sched>1</cpu_sched>
    <file_xfer>0</file_xfer>
    <file_xfer_debug>0</file_xfer_debug>
    <app_msg_send>1</app_msg_send>
    <app_msg_receive>1</app_msg_receive>
    <unparsed_xml>1</unparsed_xml>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
Eric B | Joined: 9 Mar 00 | Posts: 88 | Credit: 168,875,085 | RAC: 762
Oh, I missed answering one of your questions - yeah, I have plenty of CPU and GPU tasks on both machines, according to the BOINC manager anyway (I hand-counted well over 25 each of CPU and GPU before I stopped counting). I wrote a script to track some things, and while I don't claim it's 100% accurate (the estimates of "available" work are only estimates based on how the average work progresses, so they could be off a bit, but it's darn close), here is its output. My stats come from analyzing the client_state.xml file and deducing what things meant by looking at the BOINC manager for clues, e.g. find WU xx_yy, see what its state was in the manager, then go find it in client_state.xml and see what I could learn. I think I have the IDs of most of the states pretty well nailed down. There are actually 2 other states I haven't worked into my script yet, called "active_task_states": state 0, "started but currently suspended", and state 1, "actually executing". I'm always on the hunt for more info I can ferret out of that file and add to my script. I do network uploads/downloads once a day and run this script via cron about 5 minutes beforehand. I'm watching for errors and so forth, because I find that if you try to do 3 CUDA tasks you start to see some errors, maybe 7 out of 50 completed Fermi tasks or so. It could just be the Fermi software, as it's the only Linux Fermi app out there that I know of anyway.
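For anyone wanting to do the same kind of digging, here is a minimal sketch of the approach: tally <result> elements in client_state.xml by their <state> value. The element names match client_state.xml as I understand it, but treat them (and the trimmed sample) as assumptions, not a spec:

```python
from collections import Counter
import xml.etree.ElementTree as ET

# A trimmed stand-in for ~/BOINC/client_state.xml (real files are much larger).
SAMPLE = """
<client_state>
  <result><name>wu_1</name><state>2</state></result>
  <result><name>wu_2</name><state>2</state></result>
  <result>
    <name>wu_3</name><state>4</state>
    <active_task><active_task_state>1</active_task_state></active_task>
  </result>
</client_state>
"""

def count_result_states(xml_text: str) -> Counter:
    """Tally <result> elements by the text of their <state> child."""
    root = ET.fromstring(xml_text)
    return Counter(r.findtext("state") for r in root.iter("result"))

print(count_result_states(SAMPLE))  # Counter({'2': 2, '4': 1})
```

The same iter/findtext pattern extends naturally to the <active_task_state> values (0 = suspended, 1 = executing) mentioned above.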
sys1 is the 12-thread SNB-E and sys2 is the quad-core HT system. Sorry for the formatting; there doesn't seem to be a way to get the script output to space out properly. You can try copying and pasting it into an editor with fixed spacing, and it should be more readable.
HAL9000 | Joined: 11 Sep 99 | Posts: 6534 | Credit: 196,805,888 | RAC: 57
> I did find that the SNB-E has this cc_config.xml file in the BOINC/projects directory (but it's named cc_config.xml.off), which I assume means it won't be read, and there is no corresponding file on the other system. Other than that, both are set to use 100% of the processors, and "use at most N processors" is set to 192, because on a few rare occasions I get to play with a very big server, and 192 more than covers the number of threads that thing has.

I would expect BOINC is ignoring your cc_config.xml.off file. You could modify your client config with <ncpus>N</ncpus> to force 12 CPU tasks to run, but that might just run 12 tasks on 11 cores. I am guessing the machine in question is host 4520457, which does show BOINC reporting 12 cores. If you have something limiting the processor affinity of BOINC so that it doesn't use one of the processors/cores, that might explain what is happening. Are you seeing only 11 instances of AK_V8_linux64_ssse3 running, then?

EDIT: Also, the BBCode tags [ pre ] and [ /pre ] are for preformatted text.

System  Total  CUDA  MB   CUDA Rdy  MB Rdy    CUDA Rdy   CUDA Upl  MB Rdy     MB Upl   Dl       Avail CUDA  Avail MB   Avg
        WUs    WUs   WUs  to Start  to Start  to Report  Pending   to Report  Pending  Pending  Work        Work       Credit
-----------------------------------------------------------------------------------------------------------------------------
sys1    1966   677   668  507       489       1          97        1          138      0        3.7 days    5.0 days   27154
sys2    1962   1397  574  443       1261      10         13        42         84       0        4.1 days    10.1 days  15541
-----------------------------------------------------------------------------------------------------------------------------
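For reference, the <ncpus> override mentioned above goes in a cc_config.xml in the BOINC data directory (~/BOINC here, not BOINC/projects/), inside an <options> block. A sketch for this 12-thread host, best removed again once the real cause is found:

```xml
<cc_config>
  <options>
    <!-- Override the detected CPU count; 12 matches this SNB-E host. -->
    <ncpus>12</ncpus>
  </options>
</cc_config>
```

The client reads this file only under that exact name, which is why the renamed cc_config.xml.off copy is ignored.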
Eric B | Joined: 9 Mar 00 | Posts: 88 | Credit: 168,875,085 | RAC: 762
Thanks for the tags tip! The [] part kinda threw me for a minute and made me wonder why <tag></tag> didn't do anything; then I realized the format is non-standard, and I would assume that's to prevent attacks. Yeah, the post is better looking now. Also, I re-wrapped some stuff for easier reading.

> ps aux | grep AK | grep -v grep | wc -l
11

From the BOINC manager messages:

Sun 14 Oct 2012 03:15:10 AM PDT  Starting BOINC client version 6.10.58 for x86_64-pc-linux-gnu
Sun 14 Oct 2012 03:15:10 AM PDT  Config: GUI RPC allowed from:
Sun 14 Oct 2012 03:15:10 AM PDT  Config: 192.168.1.17
Sun 14 Oct 2012 03:15:10 AM PDT  Config: 192.168.1.103
Sun 14 Oct 2012 03:15:10 AM PDT  log flags: file_xfer, sched_ops, task
Sun 14 Oct 2012 03:15:10 AM PDT  Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.5 c-ares/1.5.1
Sun 14 Oct 2012 03:15:10 AM PDT  Data directory: /home/erbenton/BOINC
Sun 14 Oct 2012 03:15:10 AM PDT  Processor: 12 GenuineIntel Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz [Family 6 Model 45 Stepping 7]
Sun 14 Oct 2012 03:15:10 AM PDT  Processor: 15.00 MB cache
Sun 14 Oct 2012 03:15:10 AM PDT  Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni
Sun 14 Oct 2012 03:15:10 AM PDT  OS: Linux: 3.1.10cstm-1.16-cstm
Sun 14 Oct 2012 03:15:10 AM PDT  Memory: 15.63 GB physical, 512.00 MB virtual
Sun 14 Oct 2012 03:15:10 AM PDT  Disk: 52.87 GB total, 19.04 GB free
Sun 14 Oct 2012 03:15:10 AM PDT  Local time is UTC -7 hours
Sun 14 Oct 2012 03:15:10 AM PDT  NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 4020, compute capability 2.1, 1024MB, 641 GFLOPS peak)
Sun 14 Oct 2012 03:15:10 AM PDT  SETI@home Found app_info.xml; using anonymous platform
Sun 14 Oct 2012 03:15:10 AM PDT  SETI@home URL http://setiathome.berkeley.edu/; Computer ID 4520457; resource share 100
Sun 14 Oct 2012 03:15:10 AM PDT  General prefs: from http://milkyway.cs.rpi.edu/milkyway/ (last modified 29-May-2011 00:31:18)
Sun 14 Oct 2012 03:15:10 AM PDT  Host location: none
Sun 14 Oct 2012 03:15:10 AM PDT  General prefs: using your defaults
Sun 14 Oct 2012 03:15:10 AM PDT  Reading preferences override file
Sun 14 Oct 2012 03:15:10 AM PDT  Preferences:
Sun 14 Oct 2012 03:15:10 AM PDT  max memory usage when active: 8002.02MB
Sun 14 Oct 2012 03:15:10 AM PDT  max memory usage when idle: 12803.23MB
Sun 14 Oct 2012 03:15:10 AM PDT  max disk usage: 4.00GB
Sun 14 Oct 2012 03:15:10 AM PDT  (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)

Could it be that Milkyway@home entry? I have not been able to figure out how to fully get rid of it; it never shows in the BOINC manager, but it's in various files. How do I completely clean that thing out of there? It seems a likely candidate for trouble in my case, so it would be good to clean it out and see if that clears up the missing-instance problem. I wonder if it's safe to just go delete all those references?
> grep milkyway * | less
Sun 14 Oct 2012 03:15:10 AM PDT  General prefs: from http://milkyway.cs.rpi.edu/milkyway/ (last modified 29-May-2011 00:31:18)
Highlander | Joined: 5 Oct 99 | Posts: 167 | Credit: 37,987,668 | RAC: 16
About the Milkyway "problem":

  General prefs: from http://milkyway.cs.rpi.edu/milkyway/ (last modified 29-May-2011 00:31:18)

The computing preferences are BOINC-wide; only the most recently modified set is used. IMHO, no problem. If you'd like to change this, go to your preferences here at SETI (http://setiathome.berkeley.edu/prefs.php?subset=global), just do an edit + save, then update the BOINC client, and it's done. Judging from your BOINC startup log, there isn't a Milkyway directory in your projects directory at all.

For calculating on GPUs, my personal opinion is that a free thread/core for feeding the GPU is very welcome. OK, that isn't helping a lot... (If you had 2 GTX 460s in your 12-thread machine, then the missing thread could be explained, but try changing <avg_ncpus>0.250</avg_ncpus> to <avg_ncpus>0.2</avg_ncpus> and restarting BOINC.)

- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
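A quick sketch of the arithmetic behind that suggestion, under the assumption (not confirmed BOINC internals) that the client reserves <avg_ncpus> of a CPU for each running GPU task and won't commit more CPU than it has:

```python
import math

def cpu_task_slots(ncpus: int, gpu_tasks: int, avg_ncpus_per_gpu_task: float) -> int:
    """Whole CPU tasks that fit after budgeting CPU for GPU feeder threads.

    Assumes the client reserves avg_ncpus for each running GPU task and
    refuses to start a CPU task that would push the total above ncpus.
    """
    reserved = gpu_tasks * avg_ncpus_per_gpu_task
    return math.floor(ncpus - reserved)

print(cpu_task_slots(12, 2, 0.25))  # -> 11  (11 CPU + 2 CUDA tasks)
print(cpu_task_slots(12, 2, 0.20))  # -> 11  (0.2 alone still floors to 11)
```

If this model is right, the 12th slot is being held back for the two CUDA feeders, and shrinking avg_ncpus from 0.25 to 0.2 would not by itself free it; it would take a value small enough that the reserved total rounds away entirely.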
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.