I have a new system, expected runtimes?
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304 |
"Another update, with 1 module in either socket, the runtime has reduced from ~10 hrs to ~5 hrs using the AVX app" That sounds much better. Having at least dual-channel memory operation would, I suspect, still give a significant boost, but at least the present runtimes aren't nearly as ridiculous as they were before. It would be worth re-checking the CPU clock speed & temperatures. They should still be at maximum speed, but I'd expect the temperature to have picked up a bit (or compare the power usage figures). Going from the SSSE3 application (at the time) to the AVX application, I had to replace my i7 stock cooler with an aftermarket one as it made the CPU work that much harder. Grant Darwin NT |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"Another update, with 1 module in either socket, the runtime has reduced from ~10 hrs to ~5 hrs using the AVX app" With 1 DIMM per CPU socket, lower CPU times make sense to me, given that each CPU now has direct access to its own memory instead of one CPU having to access memory via the QPI link to the other CPU. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [BP6/VP6 User Group](http://tinyurl.com/8y46zvu) |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
"Another update, with 1 module in either socket, the runtime has reduced from ~10 hrs to ~5 hrs using the AVX app" The core clock is still 3005 MHz and the package temperature is 53 °C in a room that is about 12 °C. However, I am still noticing that once in a while a few tasks will still run for 9+ hrs. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304 |
"Except I am still noticing once in a while a few tasks will still go up and above 9+ hrs" A result of memory contention, combined with feeding the GPU, is my guess. One module in each bank should reduce overall runtimes and reduce that occurrence. Also, if you use an app_config to reserve 1 CPU core for each GPU WU, it should reduce that occasional extra-long runtime even with the limited memory. `<app_config> <app> <name>setiathome_v8</name> <gpu_versions> <gpu_usage>1.00</gpu_usage> <cpu_usage>1.00</cpu_usage> </gpu_versions> </app> </app_config>` And if you set sbs to 1024 and period_iterations to 1 for a dedicated cruncher (for a general-use system, try 5, 10 or 30, whichever has the least impact on usability), you should get a bit more out of your GPU. Grant Darwin NT |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
"Another update, with 1 module in either socket, the runtime has reduced from ~10 hrs to ~5 hrs using the AVX app" That is GREAT news! Tom A proud member of the OFA (Old Farts Association). |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
"Another update, with 1 module in either socket, the runtime has reduced from ~10 hrs to ~5 hrs using the AVX app" It is good news, but because memory is still heavily contended, some units, especially those with angle ranges of ~0.44 (mid-AR), run for up to 9 hrs. I have seen 2.x AR (VHAR) tasks running for 5 hrs, and I am seeing VLAR tasks take about 6 hrs with this new memory configuration. Plus, I have a GPU taking time doing work as well. |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
So another update: I have installed PCM (Processor Counter Monitor), and in there I can see memory bandwidth utilisation. And wow, is my memory getting hammered! 18 GB/s on single-channel DDR3-1333 (PC3-10600R) memory, and because my motherboard does weird things with memory, if I only populate 1 slot of a channel it halves the memory bandwidth! |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
"So another update: I have installed PCM - Processor Counter Monitor" That is my argument for using the preferences on the SETI/BOINC website (or locally in the BOINC Manager) to reduce the number of CPUs you are trying to use. It will reduce the memory contention until you are able to add more memory, and I think it will increase productivity. If the test goal were to reduce processing to, say, four (4) cores, you would set the percentage of CPUs to: (4/36 actual cores) = 11%. It is also very possible the paging file on your hard disk is being hit on a full-time basis (which slows life down). The best fix for that is more memory :) But there are two free things you can do that MIGHT help. 1) Set your paging file minimum and maximum to whatever Windows currently recommends (revisit it every time you add memory). Reboot as required. 2) Download the free version of "Defraggler" from Piriform. Under Settings -> Boot time defrag -> do once. What this will do is defrag your paging file so that it has the fastest possible access. This might help your memory contention issue some, and it will help system "responsiveness". It does take time, so don't freak out if it takes 5+ minutes to defrag the paging file (that is why I only do it "once"). The above trick will help, for instance, a PC/laptop/netbook that is pausing but otherwise has sufficient memory. It gets less "laggy". HTH, Tom A proud member of the OFA (Old Farts Association). |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304 |
"It is also very possible the paging file on your hard disk is being hit on a fulltime basis (which slows life down). The best fix for that is more memory :) But there are two free things you can do that MIGHT help. 1) Set your paging file minimum and maximum's to whatever the current recommendation is by windows (re-visit it every time you add memory). Reboot as required. 2) Download a free version of "DeFragler" from Piriform. Under Settings -> Boot time defrag -> do once. What this will do is defrag your paging file so that it has the fastest possible access." Or better yet, use an SSD. Grant Darwin NT |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
"So another update: I have installed PCM - Processor Counter Monitor" It's not hitting my paging file at all. Here is a snippet of what is happening from pcm: ---------------------------------------||---------------------------------------| |-- Socket 0 --||-- Socket 1 --| |---------------------------------------||---------------------------------------| |-- Memory Channel Monitoring --||-- Memory Channel Monitoring --| |---------------------------------------||---------------------------------------| |-- Mem Ch 1: Reads (MB/s): -1.00 --||-- Mem Ch 1: Reads (MB/s): 5795.37 --| |-- Writes(MB/s): -1.00 --||-- Writes(MB/s): 3507.86 --| |-- Mem Ch 3: Reads (MB/s): 5527.16 --||-- Mem Ch 3: Reads (MB/s): -1.00 --| |-- Writes(MB/s): 2983.31 --||-- Writes(MB/s): -1.00 --| |-- NODE 0 Mem Read (MB/s) : 5527.16 --||-- NODE 1 Mem Read (MB/s) : 5795.37 --| |-- NODE 0 Mem Write(MB/s) : 2983.31 --||-- NODE 1 Mem Write(MB/s) : 3507.86 --| |-- NODE 0 P. Write (T/s): 1665388 --||-- NODE 1 P. Write (T/s): 2343303 --| |-- NODE 0 Memory (MB/s): 8510.47 --||-- NODE 1 Memory (MB/s): 9303.23 --| |---------------------------------------||---------------------------------------| |---------------------------------------||---------------------------------------| |-- System Read Throughput(MB/s): 11322.53 --| |-- System Write Throughput(MB/s): 6491.17 --| |-- System Memory Throughput(MB/s): 17813.70 --| |---------------------------------------||---------------------------------------| That is with it running 27 CPU tasks + 1 GPU. So what is happening is that the app is waiting for data from memory, and it waits and waits, but it still uses 100% of the thread until it has that data. Say, for example, I reduce it to 4 CPU tasks and each of them runs for 1.5 hrs, versus leaving it at 27 tasks where each runs for 5 hrs. This PC would be more productive doing 27 tasks in a 5 hr period than the 12-14 tasks it would complete in the same 5 hr period if I reduced it. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"So another update: I have installed PCM - Processor Counter Monitor" I would probably start BOINC with the affinity command, set it to only use CPU 0, and then have the 2 DIMMs in the slots for CPU 0. Something along the lines of `start /affinity FFFF C:\BOINC\boinc.exe`. Alternatively, if you are using NUMA, you could simplify the command to something like `start /NODE 0 C:\BOINC\boinc.exe`. You would also need to make use of the cc_config.xml option for `<ncpus>`, or tell BOINC to only use 50% of the CPUs. Otherwise it would try to run 32 tasks across 16 CPUs. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [BP6/VP6 User Group](http://tinyurl.com/8y46zvu) |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
So I just installed 2 more sticks/modules into my system. I'll report on the runtimes when the tasks have had a chance to run, with 26 tasks active at one time. This time there is no GPU work, as the GPU is not being used. |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
Update: with 2 channels occupied, preliminary results are around about 9k seconds per task with 26 tasks running at once |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
"Update: with 2 channels occupied, preliminary results are around about 9k seconds per task with 26 tasks running at once" I read that as 2.5 hours / task. That is WAY better! I remember reading as high as 9+ hours / task. Tom A proud member of the OFA (Old Farts Association). |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"Update: with 2 channels occupied, preliminary results are around about 9k seconds per task with 26 tasks running at once" That is looking much better. That is around the upper limit for tasks on mine, running 32 tasks at once with 8 DIMMs (4 per CPU) in quad-channel mode. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [BP6/VP6 User Group](http://tinyurl.com/8y46zvu) |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
"Update: with 2 channels occupied, preliminary results are around about 9k seconds per task with 26 tasks running at once" So I am going to spend about another $120 on more DIMMs so I can occupy the last 2 channels. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304 |
"So I just installed 2 more sticks/modules into my system, I'll report on the runtimes when they have a chance to run, with 26 tasks active at one time, this time no GPU as that is not being used" So at present you've got 1 module in each bank? (Blue slots, I'm guessing, and the ones closest to the CPUs?) Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304 |
For those that are interested, Tom's Hardware posted an article looking at the new Mesh architecture Intel are using for their high core count/multi-socket CPUs to replace their long-standing Ring Bus architecture. "... on the Broadwell LCC (Low Core Count) die... for instance, moving data from one core to its closest neighbor requires one cycle. Moving data to more distant cores requires more cycles, thus increasing the latency associated with data transit. It can take up to 12 cycles to reach the most distant core..." So now I can see why, even though SETI work itself doesn't benefit from huge amounts of memory bandwidth, Kiska's runtimes were so high with the original setup of both DIMMs on the one CPU socket. Intel mesh architecture. Grant Darwin NT |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
"For those that are interested, Tom's Hardware posted an article looking at the new Mesh architecture Intel are using for their high core count/multi socket CPUs to replace their long standing Ring Bus architecture." Even with 1 DIMM in each socket. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"For those that are interested, Tom's Hardware posted an article looking at the new Mesh architecture Intel are using for their high core count/multi socket CPUs to replace their long standing Ring Bus architecture." Not populating all of the memory channels for a CPU reminds me of the saying "You can't put 10 lbs of 'stuff' into a 5 lb box". SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [BP6/VP6 User Group](http://tinyurl.com/8y46zvu) |
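
The arithmetic behind two points in this thread can be checked with a short Python sketch: how close the posted pcm figures are to the nominal limit of single-channel DDR3-1333, and how tasks completed per hour compare across the configurations mentioned. The function names are hypothetical, the measured figures are copied from the posts above, and the peak rate is the nominal 1333 MT/s times an 8-byte bus. It assumes one populated channel per socket, as the pcm snippet suggests, and ignores real-world overheads and any bandwidth halving caused by the motherboard.

```python
# Rough sketch only: figures come from the posts in this thread, and the
# peak bandwidth is the nominal DDR3-1333 rate (an assumption that ignores
# refresh, command overhead and motherboard quirks).

DDR3_1333_MT_PER_S = 1333   # mega-transfers per second
BUS_WIDTH_BYTES = 8         # one 64-bit memory channel


def peak_channel_mb_per_s(mega_transfers=DDR3_1333_MT_PER_S,
                          width_bytes=BUS_WIDTH_BYTES):
    """Nominal peak bandwidth of a single memory channel in MB/s."""
    return mega_transfers * width_bytes


def utilisation(measured_mb_per_s, channels):
    """Measured throughput as a fraction of the nominal peak of `channels`."""
    return measured_mb_per_s / (channels * peak_channel_mb_per_s())


def tasks_per_hour(concurrent_tasks, hours_per_task):
    """Steady-state number of tasks completed per hour."""
    return concurrent_tasks / hours_per_task


if __name__ == "__main__":
    # pcm figures quoted in the thread (MB/s), one channel per socket.
    print(f"Node 0 utilisation: {utilisation(8510.47, 1):.1%}")
    print(f"Node 1 utilisation: {utilisation(9303.23, 1):.1%}")
    print(f"System utilisation: {utilisation(17813.70, 2):.1%}")

    # Configurations discussed in the thread.
    print(f"27 tasks @ 5 h:    {tasks_per_hour(27, 5):.2f} tasks/h")
    print(f"4 tasks @ 1.5 h:   {tasks_per_hour(4, 1.5):.2f} tasks/h")
    print(f"26 tasks @ 9000 s: {tasks_per_hour(26, 9000 / 3600):.2f} tasks/h")
```

Under these assumptions both sockets sit at roughly 80 to 87% of the nominal single-channel limit, which fits the memory contention described above. The tasks-per-hour figures also line up with Kiska's point that 27 slow tasks still complete more work than 4 fast ones.
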
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.