The Saga Begins (LotsaCores 2.0)

Message boards : Number crunching : The Saga Begins (LotsaCores 2.0)
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1812739 - Posted: 26 Aug 2016, 7:27:36 UTC - in response to Message 1812731.  
Last modified: 26 Aug 2016, 7:28:56 UTC

You can change the layout of SIV via [SIV] > [Tools] > [Initial Display] > [Maximum Lines] > [Maximum Columns]. That should let you fit all the info from all your processors. You could also check the [Xeon PIR] feature, since you do have Xeons.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1812739
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1812741 - Posted: 26 Aug 2016, 7:30:32 UTC - in response to Message 1812739.  

Will do, thanks for the tips. I'll check it out this evening.

ID: 1812741
Jeff Buck · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1812792 - Posted: 26 Aug 2016, 15:57:20 UTC - in response to Message 1812717.  

Jeff, here is a screenshot of my resource monitor on that machine

http://i.imgur.com/qFHDBp1.jpg

Do you see anything out of the ordinary? I did try disabling things like extra LAN ports, both COM ports, the audio that is installed with each card, that kind of thing, but it didn't seem to help.

No, it doesn't appear that your Hardware Reserve is anywhere near being an issue, at least compared to what I've run into. The problem I had/have on my 6980751 host, with 2 GTX660s and 2 GTX750Tis, is that the Hardware Reserve is 1793 MB for some reason, and when I tried replacing one of the 660s with a GTX960, the Hardware Reserve jumped to 2049 MB. Since it's running 32-bit Win7, that reserve reduced the available memory to 2047 MB, not enough to keep S@H running full blast. All my other boxes, whether 32-bit or 64-bit, only show a Hardware Reserve in the 1 MB to 3 MB range, so the reserved memory on that one machine is very puzzling to me.
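As a rough check of the arithmetic in the paragraph above, here is a minimal Python sketch. It assumes the 32-bit Win7 host can address 4096 MB in total; the post does not say so explicitly, but that figure is consistent with the 2049 MB reserve leaving 2047 MB. The function name `usable_mb` is made up for illustration.

```python
# Assumption (not stated in the post): the 32-bit Windows 7 host can address
# 4096 MB in total, which is consistent with the figures quoted.
ADDRESSABLE_MB = 4096


def usable_mb(hardware_reserved_mb: int, addressable_mb: int = ADDRESSABLE_MB) -> int:
    """Memory left for the OS and applications after the hardware reserve."""
    return addressable_mb - hardware_reserved_mb


# Hardware Reserve figures quoted in the post, in MB.
configs = {
    "2x GTX660 + 2x GTX750Ti": 1793,
    "one GTX660 replaced by a GTX960": 2049,
}

for name, reserved in configs.items():
    print(f"{name}: {reserved} MB reserved, {usable_mb(reserved)} MB usable")
```

Under that assumption, swapping in the GTX960 costs another 256 MB of usable memory, which matches the drop described.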

You might also try that exercise that Keith mentioned in his post previous to mine:
"You would have to look at one 750's system properties and the Resources tab to see how much memory footprint one card takes."

I tried that also, last week, just for my own edification and to see if the total memory mapped for the cards came close to matching the Hardware Reserve, but found it only accounted for about half of it. However, I did notice that the memory that the cards were mapped to appears to be included in the memory allocated to the PCI Buses. It would be just a guess on my part, but if that PCI Bus memory allocation is fixed, perhaps your six cards are using it all up. On my box, the total memory used for the 4 cards was 948 MB out of 1445 MB allocated to the PCI Buses. Perhaps one of the more knowledgeable hardware or OS guys could chime in on how this device memory mapping all works.
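To make the guess above concrete, here is a small Python sketch of the headroom calculation. It is purely illustrative: it assumes every card maps the same average amount as the four cards on the 6980751 host, and that a six-card board has the same 1445 MB PCI Bus allocation. Neither assumption comes from the thread, and different card models will likely map different amounts.

```python
# Figures quoted in the post for the 4-card host, in MB.
PCI_BUS_ALLOCATION_MB = 1445
MAPPED_BY_CARDS_MB = 948
CARD_COUNT = 4


def average_per_card(mapped_mb: float, cards: int) -> float:
    """Average memory mapped per card."""
    return mapped_mb / cards


def projected_use(cards: int, per_card_mb: float) -> float:
    """Projected mapping if every card used the same average amount (assumption)."""
    return cards * per_card_mb


per_card = average_per_card(MAPPED_BY_CARDS_MB, CARD_COUNT)
six_cards = projected_use(6, per_card)
headroom = PCI_BUS_ALLOCATION_MB - six_cards

print(f"average per card: {per_card:.0f} MB")
print(f"six cards (hypothetical): {six_cards:.0f} MB")
print(f"headroom in a {PCI_BUS_ALLOCATION_MB} MB allocation: {headroom:.0f} MB")
```

If those assumptions held, six cards would nearly fill the allocation, which is the scenario the guess describes. The real numbers would have to be read from each card's Resources tab.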
ID: 1812792
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1812799 - Posted: 26 Aug 2016, 16:53:19 UTC

The better way to mix different GPU models and optimize all of them is to run one instance of BOINC for each model of GPU. Not an easy task, only for "advanced users".

Unless you do that, you will need to configure BOINC to match one family of GPU and run the others above or below their best optimizations.

My 0.02 cents.
ID: 1812799
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1813867 - Posted: 30 Aug 2016, 18:36:58 UTC - in response to Message 1812799.  

Al,

I'm not seeing any command lines for your GPUs in your results.

Have you not applied them yet?
ID: 1813867
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1813902 - Posted: 30 Aug 2016, 20:22:59 UTC - in response to Message 1813867.  

Zalster, I had planned on doing it, but a few things came up. Quick question: since I have all the 750s and the one 980Ti in there, is there anything I should watch out for before applying the command line that was recommended?

ID: 1813902
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1813904 - Posted: 30 Aug 2016, 20:26:55 UTC - in response to Message 1813902.  

I would try 1 work unit per card to start and see how it does. Run about 40-50 to make sure there are no errors.
ID: 1813904
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1813918 - Posted: 30 Aug 2016, 21:12:02 UTC - in response to Message 1813904.  

Ok, I've been running one per card since I fired it up, but other than that, would using the line

-sbs 256 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

work even if I have the 980 in there?

ID: 1813918
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1813941 - Posted: 30 Aug 2016, 22:11:49 UTC - in response to Message 1813918.  

Try increasing the value of -sbs to 512 and adding -hp.
ID: 1813941
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1813966 - Posted: 30 Aug 2016, 23:38:14 UTC - in response to Message 1813941.  

So it should be
-sbs 512 -hp -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

?
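For anyone who would rather apply this kind of edit mechanically than by hand, here is a minimal Python sketch of the change discussed: set the value after -sbs and add -hp if it is missing. The helper name `apply_suggestion` is made up for illustration, and placing -hp right after the -sbs value simply mirrors the line quoted above; the thread does not say whether the application cares about option order.

```python
# The command line as originally posted in the thread.
ORIGINAL = (
    "-sbs 256 -spike_fft_thresh 2048 -tune 1 64 1 4 "
    "-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
    "-oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32"
)


def apply_suggestion(cmdline: str, sbs: int = 512) -> str:
    """Return cmdline with the -sbs value replaced and -hp added if absent.

    Hypothetical helper: it only rearranges text and knows nothing about
    what the options do. Raises ValueError if -sbs is not present.
    """
    tokens = cmdline.split()
    i = tokens.index("-sbs")      # position of the option name
    tokens[i + 1] = str(sbs)      # the token after it is its value
    if "-hp" not in tokens:
        tokens.insert(i + 2, "-hp")
    return " ".join(tokens)


print(apply_suggestion(ORIGINAL))
```

Calling `apply_suggestion(ORIGINAL, sbs=768)` would produce the same line with the larger -sbs value.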

ID: 1813966
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1813977 - Posted: 31 Aug 2016, 0:31:09 UTC - in response to Message 1813966.  

Yes. Try that and let's see
ID: 1813977
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1814052 - Posted: 31 Aug 2016, 4:10:27 UTC - in response to Message 1813977.  

K, I will put it in there right now and see how things look in the morning. Thanks!

ID: 1814052
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1814069 - Posted: 31 Aug 2016, 6:01:50 UTC - in response to Message 1814052.  

I notice 3 errors so far, and all of them are the same:
<message>
finish file present too long
</message>
Grant
Darwin NT
ID: 1814069
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1814122 - Posted: 31 Aug 2016, 10:59:36 UTC - in response to Message 1814069.  
Last modified: 31 Aug 2016, 11:06:46 UTC

Ok, Grant, seeing that it's now tossing errors which look to have been caused by my changes, how should I adjust the command line I used to reduce and hopefully eliminate them? I don't want to be putting out any bad data in my quest to optimize my production of science results. Thanks!

*edit* I just took a quick look at my errors, and since I made the change at around 4:00:00 UTC, there were 2 errors, and they both appear to be on the CPU side. My thought was that the changes I made were related to the GPU side, or am I incorrect in this presumption?

ID: 1814122
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1814175 - Posted: 31 Aug 2016, 21:54:39 UTC - in response to Message 1814122.  

Al,

Agreed, the 4 most recent errors were all CPU related.

The 5th error listed is a GPU error, but it predates any changes to your command line, so it is not related to the recent addition.

I've looked at the times of some of your GPU work (which takes a lot of digging) and they are all running well. BLC tasks take anywhere from 15 minutes down to a shorty of 6 minutes. Hard to tell which GPU did the 6-minute one, since we still have that issue with it saying all tasks were done on GPU 0. But that belongs in another thread.

Zalster
ID: 1814175
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1814242 - Posted: 1 Sep 2016, 4:16:25 UTC - in response to Message 1814175.  

Thanks for taking the time to look into it, I'll just let it run for a bit and see how it goes, since it appears so far at least to be running properly.

ID: 1814242
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1814247 - Posted: 1 Sep 2016, 4:32:50 UTC - in response to Message 1814242.  

Looks like that machine has an average of 15-16 minutes for BLC and 12-13 minutes for non-VLAR work units.

If you are feeling brave, you can increase -sbs to 768.

If it starts to get sluggish, then decrease it back to 512.
ID: 1814247
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1814259 - Posted: 1 Sep 2016, 4:59:18 UTC - in response to Message 1814247.  

Surprised that with the mix of 750Tis and the one 980Ti in play, coming up with a compromise command line that works for such a disparity in compute capabilities is a real challenge.

With regard to the CPU errors, I've seen those myself, and I can always trace them to my shutting down BOINC right before a task finishes, when you think you've still got several percent to go and it is safe to do so. Unfortunately, BOINC isn't good at file housekeeping when tasks are close to finishing. It doesn't help that the percentage of completion reported in the Manager is often several percent behind reality. I've learned to always let tasks finish, and hopefully report, before shutting down the system for maintenance. I've never seen shutting down tasks right after they have started hurt anything: the task either restarts at the last written checkpoint or just starts over.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1814259
_heinz
Volunteer tester
Joined: 25 Feb 05
Posts: 744
Credit: 5,539,270
RAC: 0
France
Message 1814283 - Posted: 1 Sep 2016, 8:03:18 UTC

Hi Al,
To avoid all issues with mixed configurations, I decided to run only one type of NVIDIA hardware in any one machine. That is why I have three identical graphics adapters (Titan) in V8-Xeon.
It would be better to take out your 980 and put it into another, separate machine.

My experience :-)
ID: 1814283
Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1814914 - Posted: 4 Sep 2016, 0:49:43 UTC
Last modified: 4 Sep 2016, 0:59:07 UTC

Just wondering: on my machine with the two 1060s, the current command line is

-sbs 256 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

and as it's been running for a week or so now and seems to be fairly stable with its current settings, would it make sense to tweak the -sbs setting a little? I just bumped it up to 768 on the 56-core machine a few minutes ago, and it did make a difference in its 'lagginess', but as all it usually does is sit there and crunch, other than when I need to make the occasional post here in the forum from it, I find that lag an acceptable tradeoff as long as it increases performance.

Not sure with this CPU and these cards if bumping it up a notch or 2 is wise, but please let me know your thoughts. Thanks!

*Edit* I'm also only running one task per card on it. Would it make sense to move to 2, or does SoG like 1 at a time better? I also just took a closer look at the CPU tasks that are currently running: the average run time is about 3 1/2 to 4 hours per task, and it is running 12 tasks (all cores, with HT) currently. The GPUs are running the usual 0.04 CPUs.

ID: 1814914


 