Posts by Woodgie


1) Message boards : Number crunching : I need some help to find a software (Message 1657586)
Posted 5 days ago by Woodgie (Project donor)
I'll third the use of VNC (Virtual Network Computing) for remote access. If the computers are on the same network I actually found TightVNC a breeze to set up. If the computers are on different networks then I can imagine it involving VPN or port forwarding which might get a bit more complex, but still works well.

VNC works by opening what would be displayed on a monitor attached directly to the computer you're connecting to in a window on the computer you're sitting in front of. It won't allow you to play games - the 'screen' can't refresh anywhere near fast enough - but it's absolutely fine for things like using Word, Excel or... controlling BOINC.
2) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656776)
Posted 7 days ago by Woodgie (Project donor)
jason_gee. Wow, thanks for that! Proper technical insight into a proper technical subject. I find it fascinating.

May I ask a few questions on behalf of the class? I'm going to say now I understand it's a complex subject and generalisations are going to have to be made.

(1) When you say 'Windows display driver model' I take it you mean Microsoft have dictated "This is how you need to write a driver to interface between your hardware and the OS because this is how we've designed the OS".

(2) Can you tell us how this differs from Linux & Mac OS X, and does this make a difference as to how efficient the platform is as a number crunching entity? That is, does the latency introduced by the Windows 'double-buffering' affect how fast the same work would be crunched on Windows vs. Linux/OS X, all other things being equal? (Yes, I am aware I'm asking you to explain how long a piece of quantum superstring is :) )

(3) Would adding more RAM help the issue, i.e. reduce paging or is it "not as simple as that". I've got 8GB RAM across the 3 cards (4+2+2) so I'm assuming it's trying to reserve 8GB kernel space to call its own. (I don't know where to find the window you showed to check).
3) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656338)
Posted 8 days ago by Woodgie (Project donor)
.... the Titan can handle 4 tasks but the 750TI can’t.

It's not a case of can't, it's a case of it's not efficient.
And I suspect it's the same with the Titan: it may very well be able to crunch 5 or even 6 WUs at a time, but what good is that if you end up doing less work?
Even with the Titan, there's a good chance that 2 WUs at a time will give the most work per hour. 3 would most likely give slightly less, I'd suggest 4 at a time gives you much less return than just running 2 at a time.


That's really what I meant by "Can" and "Can't", I see your point entirely.

I'm going to give it a couple of days; wait for the weekly DB housekeeping to be done today and then see what setting gives me the best bang for my buck.

I ALWAYS learn something useful from these threads.

(I still want a couple of K80s and failing that a couple of Titan-X)

EDIT: Why not? 2 it is.
4) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656333)
Posted 8 days ago by Woodgie (Project donor)
Addendum:

Found where to set the CPU core count. In cc_config.xml the following option:

<ncpus>N</ncpus> Act as if there were N CPUs; e.g. to simulate 2 CPUs on a machine that has only 1. To use the number of available CPUs, set the value to -1 (was 0 which in newer clients really means zero).


So setting ncpus to 7 should, theoretically, free up a core to feed the trolls GPUs

EDIT: Yep, that's the ticket. And all is quiet again.
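
For anyone following along, here is a minimal sketch of that setting, written as a short Python script that builds the file contents. It assumes the usual cc_config.xml layout, with <ncpus> sitting inside an <options> block, and the value 7 is just this machine's case (8 logical cores minus one). The helper name build_cc_config is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ncpus: int) -> str:
    """Return cc_config.xml text telling BOINC to act as if it had `ncpus` CPUs.

    Hypothetical helper: the element names follow the <ncpus> option quoted
    above; the surrounding <cc_config>/<options> layout is an assumption.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    # Pretty-print where the standard library supports it (Python 3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 8 logical cores, keep one free for feeding the GPUs.
    print(build_cc_config(7))
```

The printed text would be saved as cc_config.xml in the BOINC data directory, after which the client needs to re-read its config files (or be restarted) for the change to take effect.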


Until next time
5) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656329)
Posted 8 days ago by Woodgie (Project donor)
As ever, I want to give my thanks to everyone involved with helping me troubleshoot things. You all chip in with the useful stuff and I just do the legwork :)

This is going to be a long one, sorry. I'm also going to take things out of order to try to apply them in a sensible troubleshooting order. Again, I’m being verbose on the off chance that this will help someone else.

- - - - -
As far as the "waiting for memory" bit, you could look in Task Manager and find out how much memory each CPU and GPU task are taking, and compare that to your actual RAM. This would give you a firm idea if the WFM is actual memory, or something else misleadingly labeled.

For example, in my case, CPU tasks take about 35MB each, and GPU about 125MB each, so 8*35 + 12*125 is < 2GB, so I don't think WFM is referring to real RAM if you have 4GB or more.

Perhaps 20 threads fighting over the CPU is causing excessive system overhead(?). Turn on (in Task Manager) View -> Show Kernel Times. If it is mostly red, then that is likely the problem - the system is thrashing trying to support all those compute-bound threads. Remember, you are running 8 + 12*.04 cores worth of threads, even by BOINC/SETI's estimate.

If you have only 8 cores, you are going to be switching tasks A LOT. (Hence more red in the graphs). I bet it would help a lot if you went to 7 CPU tasks, leaving one CPU for the 12 GPU tasks. And if they are HT cores, even worse, since they already share resources pair-wise.


OK, they ARE HT cores (4 physical cores) so that’s a consideration.

Please remember I’m not a Windows person, so if I’m reading this wrongly, apologies.

Firstly, I can’t see where to turn on Show Kernel Times, I certainly can’t see it in Task (or Resource) Manager’s ‘View’ menu. Still, it’s not essential as I think your simple equation has shown me something very important. That I need to reduce the number of CPU tasks. Which is what I thought.

With regards to RAM. It appears that:
The CUDA tasks are taking between 105MB and 130MB
The CPU tasks seem to be taking 36MB
The amount of physical memory being used (the number reported at the bottom of the window) is 22% to 25%, fluctuating. This, to me says I’m using about 4GB of the 16GB in the system, plenty of overhead there.

If I look under the ‘Performance’ tab I see
Physical Memory(MB)
Total: 16322
Cached: 3267
Available: 12316
Free: 9208

So I don’t think it’s actual RAM problems… probably. I’ll come back to the number of tasks thing in a minute.
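
The memory arithmetic above can be checked with a quick sketch. The per-task figures are the ones reported in this post (taking the upper 130MB value for CUDA tasks) and the total is the Performance tab number; the function name is hypothetical.

```python
def estimated_task_memory_mb(cpu_tasks, cpu_mb, gpu_tasks, gpu_mb):
    """Rough total RAM used by crunching tasks, in MB."""
    return cpu_tasks * cpu_mb + gpu_tasks * gpu_mb


TOTAL_PHYSICAL_MB = 16322  # "Total" from the Performance tab

# Figures from the quoted estimate: 8 CPU tasks at 35MB, 12 GPU tasks at 125MB.
quoted_estimate_mb = estimated_task_memory_mb(8, 35, 12, 125)

# Figures measured on this machine: 36MB per CPU task, up to 130MB per CUDA task.
measured_mb = estimated_task_memory_mb(8, 36, 12, 130)

# Fraction of physical RAM the crunching tasks account for.
measured_share = measured_mb / TOTAL_PHYSICAL_MB

if __name__ == "__main__":
    print(f"Quoted estimate: {quoted_estimate_mb} MB")
    print(f"Measured:        {measured_mb} MB ({measured_share:.0%} of RAM)")
```

Either way the tasks account for under 2GB, which supports the reading that the message is not about physical RAM running out.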
- - - - -


In your BOINC Computing preferences. What do you have for
    When computer is in use, use at most
    When computer is not in use, use at most
    Page/swap file: use at most

I think the default values are like 40 or 50%. Which if I'm doing my maths correctly should be fine for 12 GPU + 8 CPU tasks. However BOINC seems to think otherwise at the moment. So I would try bumping up those values if you have not already done so.
I pretty much maxed out the settings in the BOINC prefs for everything, so I wouldn't have weird resource issues. At least not ones caused by BOINC deciding it needed to do something about them.



OK, just done a check and experiment.
In Use was at 50% changed to 80%
Not In Use was at 80% changed to 90%
Page/Swap was at 20% changed to 90%

Forced an update and it doesn't seem to have affected things. My thinking is I won’t see any change as I wasn’t anywhere near using 50% RAM with the original settings, so upping its allocation won’t help.

I checked all the same info as above with the new settings and I was right, there was no change. (Changed it back for now as I can easily change it again should I need to.)
- - - - -


If previous suggestions don't help, I suggest setting the mem_usage_debug log flag in cc_config.xml. That will produce multiple lines in the event log each time BOINC decides what tasks should be running, so turning it off again after it captures the usage info would be sensible.

The "Waiting for memory" is based on the smoothed working set size of each active task. That is, BOINC begins with the available RAM and for each task it's going to start or leave running it subtracts that smoothed value. If available goes negative, the task is not started but remains in the active task list.


For the sake of completeness I’ll mention this essentially showed me what Task Manager showed me. It’s always good to remember to read the logs, people! :)
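
The check described in the quote above can be sketched roughly as follows. This is a simplified model under stated assumptions, not BOINC's actual code: the function name, the task names, and the 300MB budget are invented for illustration, and the real client works from its own smoothed working-set measurements and memory preferences.

```python
def schedule_by_memory(available_mb, tasks):
    """Split tasks into running and waiting according to a memory budget.

    tasks: list of (name, smoothed_working_set_mb), in the order they are
    considered. A task whose working set would push the remaining budget
    below zero is not started (it would show as "Waiting for memory").
    """
    running, waiting = [], []
    remaining = available_mb
    for name, working_set_mb in tasks:
        if remaining - working_set_mb < 0:
            waiting.append(name)
        else:
            remaining -= working_set_mb
            running.append(name)
    return running, waiting


if __name__ == "__main__":
    # Hypothetical budget and tasks, chosen so that the last task does not fit.
    example_tasks = [("gpu_1", 130), ("gpu_2", 130), ("cpu_1", 36), ("cpu_2", 36)]
    running, waiting = schedule_by_memory(300, example_tasks)
    print("Running:", running)
    print("Waiting for memory:", waiting)
```

Setting the mem_usage_debug flag shows the real numbers BOINC feeds into this kind of decision, which is why the log is worth reading.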
- - - - -

My GTX 750Tis produce more work per hour running only 2 at a time (I'm MB only). 3 at a time was very close, but not quite as good. 4 at a time would have resulted in significantly fewer WUs crunched per hour than running 2 at a time.


4 work units per 750? Very ambitious. I'm sure the Titan has no problem with that, but I think you're stressing those 750s. I'd take it down to 3 work units per 750. That might still be too much, but Jason seems to think under best conditions you could get three. I only run 2 on mine as I notice lock-ups, but that has to do with my AMD chip. Best option is teaming the Titan with a similar GPU that doesn't hamstring it. I guess with this new BOINC you might be able to direct how many work units per specific GPU. I haven't tried that yet. I'd first try reducing the total number per GPU, and if it relieves the problem you know which direction to go.


Well, I HAD teamed it with another Titan, but it died :(

I was wondering if it was possible to set the number of tasks per GPU and there’s something in the back of my brain nudging me saying it’s come up in a thread of mine before but I’m going to type this before researching it.
- - - - -


OK, so here’s what I’m going to do. I’m going to reduce the number of GPU tasks per card. This is a bit of a trade off, as has been pointed out, the Titan can handle 4 tasks but the 750TI can’t. So I’m going to split the difference and drop them to 3 tasks each. This should, by jravin’s equation, reduce the number of threads the CPU is trying to contend with to 8+(9*.04)

And… It worked.

BUT! That’s still higher than probably it should be, so I should drop the number of CPU tasks but here’s the question, how?
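
The thread arithmetic used in this post (jravin's equation) can be sketched like this. The 0.04 figure is the CPU fraction budgeted per CUDA task in my app_info.xml, not a measurement of what the tasks really use, and the function name is made up.

```python
LOGICAL_CORES = 8  # 4 physical cores with Hyper-Threading


def committed_cpus(cpu_tasks, gpu_tasks, cpu_per_gpu_task=0.04):
    """CPU cores' worth of work BOINC believes it has scheduled."""
    return cpu_tasks + gpu_tasks * cpu_per_gpu_task


# 8 CPU tasks plus 4 CUDA tasks on each of 3 GPUs.
original_load = committed_cpus(8, 12)

# After dropping to 3 CUDA tasks per GPU.
reduced_gpu_load = committed_cpus(8, 9)

# After also dropping to 7 CPU tasks, leaving a core to feed the GPUs.
reduced_both_load = committed_cpus(7, 9)

if __name__ == "__main__":
    for label, load in [
        ("8 CPU + 12 GPU tasks", original_load),
        ("8 CPU + 9 GPU tasks", reduced_gpu_load),
        ("7 CPU + 9 GPU tasks", reduced_both_load),
    ]:
        verdict = "over" if load > LOGICAL_CORES else "within"
        print(f"{label}: {load:.2f} cores ({verdict} {LOGICAL_CORES} logical cores)")
```

Only the last combination comes in under the 8 logical cores, which is why reducing the CPU task count is the next step.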
6) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656128)
Posted 9 days ago by Woodgie (Project donor)
So previously you were running OK with 4 instances per GPU now?


That's correct, with 2 x GPUs (either the 2 x Titans or 1 x Titan & 1 x 750Ti) all was good, I had 4 CUDA tasks running per GPU (8 GPU CUDA tasks total) and 1 task per CPU core (8 x CPU tasks total).

Now, with 3 x GPUs (1 x Titan & 2 x 750Ti), 4 or 5 of the CPU tasks (it flips a bit) sit there saying 'Waiting for Memory'. I do, however, still have 4 CUDA tasks running per GPU (12 GPU CUDA tasks total).

Makes sense?

As I said, I'm sure I'm not being efficient with the workload I'm assigning the GPUs/CPUs.
7) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656075)
Posted 9 days ago by Woodgie (Project donor)
The title says it all really but here goes.

I've just added another GPU (a second 750Ti to go with the Titan and other 750Ti) and now my CPU tasks halt occasionally saying 'Waiting for Memory'. The machine has 16GB RAM, I doubt it's that. I'm certain it's this whole concept of 'Leaving a core free to feed the GPU' which I've never understood and never encountered.

So, Oh Wise Ones, time to educate me and help me tune the app_info.xml file to work best!

Here's a brief overview of what I think are the important bits. Please ask for more info if you want, I'm more than happy to give it as I hope threads like this will help others further down the line.

I'm running Lunatics latest (0.43a) and probably have the worst setup imaginable in my app_info file. At the moment it's set to run 4 CUDA tasks per GPU; using 0.04 of a CPU and 0.25 of a GPU

Here's an example snippet, all the CUDA sections are set up like this (Astropulse differs only in that I've set it to 0.33 GPU, the rest is the same)

<app_version>
    <app_name>setiathome_v7</app_name>
    <version_num>700</version_num>
    <platform>windows_intelx86</platform>
    <plan_class>cuda50</plan_class>
    <avg_ncpus>0.040000</avg_ncpus>
    <max_ncpus>0.040000</max_ncpus>
    <coproc>
        <type>CUDA</type>
        <count>0.25</count>
    </coproc>


I have a feeling it's not as simple as setting the <avg_ncpus> or <max_ncpus> to 1.0, is it?
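
As a rough illustration of how the <count> value in that snippet maps to tasks per GPU, here is a minimal sketch. It assumes BOINC keeps starting tasks on a GPU until the fractions add up to one whole GPU; the function names are hypothetical.

```python
import math


def tasks_per_gpu(gpu_fraction):
    """How many tasks fit on one GPU if each claims `gpu_fraction` of it."""
    # The small epsilon guards against floating-point results such as 3.9999999.
    return math.floor(1.0 / gpu_fraction + 1e-9)


def gpu_fraction_for(tasks):
    """The <count> value to use for a given number of tasks per GPU."""
    return round(1.0 / tasks, 2)


if __name__ == "__main__":
    print("count 0.25 ->", tasks_per_gpu(0.25), "tasks per GPU")  # CUDA sections
    print("count 0.33 ->", tasks_per_gpu(0.33), "tasks per GPU")  # Astropulse
    print("2 tasks per GPU -> count", gpu_fraction_for(2))
```

So running fewer tasks per card is a matter of raising <count> (0.33 for three, 0.5 for two), while <avg_ncpus> and <max_ncpus> only change how much CPU BOINC budgets for each of those tasks.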
8) Message boards : Number crunching : Nvidia Titan X (Message 1655746)
Posted 10 days ago by Woodgie (Project donor)
You have very cool friends, Woodgie :D Also: Happy early birthday!


I do, and thank you.

To be fair, they don't quite understand why I'm looking for ET but I keep telling them there's more to be found in the radio signals than just ET's equivalent of The Archers.
9) Message boards : Number crunching : Nvidia Titan X (Message 1655711)
Posted 10 days ago by Woodgie (Project donor)
HA!

As I typed the previous reply I got a phone call from a friend. As it's my birthday in a week or so they've clubbed together to buy me "...another GPU to do that science thing you do. You know, the one with the T-Shirt..."

It's 'only' a(nother) 750Ti but apparently I "...don't know enough people who care about you to get you the 'X' one, sorry buddy. Anyway, it's not actually out yet, is it?.."

Who's complaining? Not me!
1 x Titan (original spec, I had 2 but one died :-( Well, it HAD been running 24/7 for a year, I'm just waiting for the other to give up the ghost)
2 x 750Ti

ON WITH THE SCIENCE!
10) Message boards : Number crunching : Nvidia Titan X (Message 1655710)
Posted 10 days ago by Woodgie (Project donor)
X it is then. Awesome, it's cheap!

(relatively speaking and for what it is...)

Cheap? In the US maybe ($999). Over here it's coming in around £900 ($US1,350).
http://www.ebuyer.com/store/Components/cat/Graphics-Cards-Nvidia/subcat/NVIDIA-GTX-TITAN

Here in Australia it's more like AU $2,000 (US $1,555) (pre order at this stage)


As I said; relatively! Compared to the Titan-Z or some variation of the K series (oh, for a K80 or two..!)
11) Message boards : Number crunching : Nvidia Titan X (Message 1654676)
Posted 13 days ago by Woodgie (Project donor)
X it is then. Awesome, it's cheap!

(relatively speaking and for what it is...)
12) Message boards : Number crunching : Nvidia Titan X (Message 1654670)
Posted 13 days ago by Woodgie (Project donor)
I watched the product release live and one thing that stood out was when Jen-Hsun Huang slipped in that it's the fastest SINGLE precision GPU ever, for the fastest DOUBLE precision you still need to look at the Titan-Z.

So, oh venerable people who know more than I do, which would SETI@Home benefit more from; the Titan-X with its single precision performance or the Titan-Z with its double precision performance?

Me, I just have the measly old Titan. I did have 2 but then one died. BOOHOO.

(Yes, I'm asking this question because I'm thinking of buying something insane)
13) Message boards : Number crunching : Work not being reported correctly or me not reading things correctly? (Message 1650935)
Posted 23 days ago by Woodgie (Project donor)
Resetting the project on the host certainly seems to have done the trick. Thanks Claggy.
14) Message boards : Number crunching : Work not being reported correctly or me not reading things correctly? (Message 1650667)
Posted 24 days ago by Woodgie (Project donor)
Well that's the weird thing, I didn't abandon any work on the host but suddenly a bunch turned up as being marked abandoned.

I thought if a work unit was marked as abandoned the host would report it as being abandoned and the back end would re-assign it to someone else but that the abandoning host wouldn't then crunch it

How might I be crunching abandoned workings then? Do you think abandoning all the workunits and requesting a complete new batch would help matters?

I don't quite want to go there (or re-installing BOINC) yet until I understand what might be causing it in the first place.

The server can mark work abandoned if it receives requests from the client in the wrong order.

Just Reset the project, No need to Detach/Reattach (or Remove/Add project), or Reinstall Boinc.

Claggy


Right y'are guv. Let's give it a whirl.
15) Message boards : Number crunching : Work not being reported correctly or me not reading things correctly? (Message 1650663)
Posted 24 days ago by Woodgie (Project donor)
For the record, only you can see your host names.

Claggy


Didn't think about that..


These ones
7511424
7511431
7511446
7511432
7511472
16) Message boards : Number crunching : Work not being reported correctly or me not reading things correctly? (Message 1650662)
Posted 24 days ago by Woodgie (Project donor)

Half its work has been set as Abandoned, so it's probably been crunching Abandoned work:

Error tasks for computer 7511430

Reset the project, you'll get resent work back that hasn't already been abandoned.

Claggy


Well that's the weird thing, I didn't abandon any work on the host but suddenly a bunch turned up as being marked abandoned.

I thought if a work unit was marked as abandoned the host would report it as being abandoned and the back end would re-assign it to someone else but that the abandoning host wouldn't then crunch it

How might I be crunching abandoned workings then? Do you think abandoning all the workunits and requesting a complete new batch would help matters?

I don't quite want to go there (or re-installing BOINC) yet until I understand what might be causing it in the first place.
17) Message boards : Number crunching : Work not being reported correctly or me not reading things correctly? (Message 1650656)
Posted 24 days ago by Woodgie (Project donor)
For the record, only you can see your host names.

Claggy


Didn't think about that..
18) Message boards : Number crunching : Work not being reported correctly or me not reading things correctly? (Message 1650648)
Posted 24 days ago by Woodgie (Project donor)
For the record, it was created at the same time as seti-mm-01 to 06 and therefore should have a similar credit of ±9,000.
19) Message boards : Number crunching : Work not being reported correctly or me not reading things correctly? (Message 1650646)
Posted 24 days ago by Woodgie (Project donor)
My computer seti-mm-05 is acting strangely in reporting work done, I think. I'm a bit confused here.

It's been stuck at 990 credit for over a week now but it's definitely been doing work and reporting.

For example for Task 4007635808, Workunit 1722118309 reported here:
Sun 8 Mar 13:30:49 2015 | SETI@home | Computation for task 14fe13aa.10710.8252.438086664203.12.186_1 finished
Sun 8 Mar 13:30:51 2015 | SETI@home | Started upload of 14fe13aa.10710.8252.438086664203.12.186_1_0
Sun 8 Mar 13:30:55 2015 | SETI@home | Finished upload of 14fe13aa.10710.8252.438086664203.12.186_1_0


It shows up in 'Validation pending' now, but soon will disappear from there and... nothing.

I'm not sure how to make sense of this as I don't really know if it's the back end database issues affecting things.

Can anyone see any smoking guns?
20) Message boards : Number crunching : Dual GPU, One Crunching, One Not. (Message 1639263)
Posted 9 Feb 2015 by Woodgie (Project donor)
Update:

The Good news: Installed the 7870 driver (14.501.1003.0), BOINC is now happily crunching.

The Bad news: Windows now doesn't like the 4870 and has stopped the device (Windows has stopped this device because it has reported problems. (Code 43))

Same error as before.

So the main issue seems to be installing drivers for both GPUs, as Catalyst 14.***** doesn't recognize the 4870.

Hmmmmmmm


(Standard caveat of "I'm IN NO WAY a Windows person")

Code 43 is what I got with mine and my research said it was either a driver issue or a hardware issue (helpful, huh?)

So I did a bit of slot swapping etc. and determined it was indeed a hardware issue. With you, though, I have to say it looks like a driver issue where the two different versions required for the two different cards trip over each other.

So far not helpful and nothing you didn't already know.

However maybe the VirtualBox version of BOINC might help here. I say MAYBE as I have no idea HOW it is set up to run. My thinking is along the lines that you might be able to set Windows up to use the card you want (4870?) and then 'pass' the 7870 to VirtualBox/BOINC.

This would depend on a great many things which I have no understanding of, like exactly how BOINC uses VirtualBox: is it a full OS (Linux?), a cut-down OS, is it even set up to understand hardware itself or does it just ask the host OS what's available and trust it?

All of which is an awful lot of 'maybe'.



Copyright © 2015 University of California