Boinc 6.6.36 behaviour (Pictures enclosed)

-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 917137 - Posted: 12 Jul 2009, 13:18:00 UTC
Last modified: 12 Jul 2009, 13:20:03 UTC

Hi there

I just want to share my experience with BOINC versions later than 6.4.7 when running CUDA with a large queue to keep up with demand.

BOINC's EDF (earliest deadline first) scheduling cannot keep track of that amount of work, so it starts pausing and resuming WUs, which borks system RAM and swap.

I enclosed some images, and they will tell you more than I need to write here.

If I uninstall and revert to the older build 6.4.7, it's fine.

This is actually old news, because it has been written about before, but I wanted to post what's happening so you get an idea of why CUDA WUs run out of memory and revert to CPU processing :)





Kind regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 917137
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 917138 - Posted: 12 Jul 2009, 13:27:12 UTC
Last modified: 12 Jul 2009, 13:49:52 UTC

Funnily enough, I just switched both my machines to the 6.6.37 variant in testing, which apparently has modifications designed to address this issue in particular. I only just upgraded, but there's been no apparent repeat of the issue (so far). 'Waiting to run' tasks are created, but there are no corresponding Task Manager processes for them.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 917138
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 917144 - Posted: 12 Jul 2009, 13:49:53 UTC

I've been running this version for over 5 days now, with no evidence of CPU fallback at all.
GPU tasks do go into EDF and sometimes complete fully, sometimes only partially,
which is a lot better than any CPU fallback (when your GPU isn't doing anything).

Claggy
ID: 917144
TCP JESUS
Joined: 19 Jan 03
Posts: 205
Credit: 1,248,845
RAC: 0
Canada
Message 917163 - Posted: 12 Jul 2009, 15:13:20 UTC

I have the same problem (although not as bad) on my main cruncher. I just restart BOINC at least once or twice a day and all is well (until the problem gets addressed in a future release of BOINC, that is).
I am TCP JESUS...The Carpenter Phenom Jesus....and HAMMERING is what I do best!
formerly known as...MC Hammer.
ID: 917163
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 917194 - Posted: 12 Jul 2009, 17:32:12 UTC
Last modified: 12 Jul 2009, 17:34:47 UTC


I have the same 'prob', Vyper.. :-(

BOINC V6.6.37 is out for testing.. it 'should' be resolved in this version..

But so far I haven't tested it.

BTW:
If some CUDA WUs go into EDF and keep stopping and starting more and more other CUDA WUs.. every time a new CUDA WU starts, is the crunch time of the stopped WU wasted?
The CUDA WUs don't stay in system RAM (that's nice), but is the half-calculated WU saved somewhere (checkpoint)?

ID: 917194
mimo
Volunteer tester
Joined: 7 Feb 03
Posts: 92
Credit: 14,957,404
RAC: 0
Slovakia
Message 917235 - Posted: 12 Jul 2009, 20:54:26 UTC

It is a known error that is fixed in 6.6.37:
- client: when suspending a GPU job, always remove it from memory, even if it hasn't checkpointed. Otherwise we'll typically run another GPU job right away, and it will bomb out or revert to CPU mode because it can't allocate video RAM
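
To make the changelog note above concrete, here is a minimal sketch of why keeping a suspended GPU job resident can starve the next one of video RAM. It is not BOINC's actual code: the names `GpuTask`, `suspend`, `try_start` and `simulate`, the 512 MB card and the 400 MB per-task footprint are all made up for illustration, and the "old" policy (a job that hasn't checkpointed stays in memory) is only inferred from the wording of the note.

```python
from dataclasses import dataclass


@dataclass
class GpuTask:
    """Hypothetical stand-in for a CUDA work unit."""
    name: str
    vram_mb: int
    in_memory: bool = False


def suspend(task: GpuTask, always_remove: bool, has_checkpointed: bool) -> None:
    """Suspend a task under one of two assumed policies.

    always_remove=False models the assumed older behaviour: a task that
    has not checkpointed stays resident and keeps its video RAM.
    always_remove=True models the 6.6.37 note: always remove it.
    """
    if always_remove or has_checkpointed:
        task.in_memory = False


def try_start(task: GpuTask, all_tasks: list, vram_total_mb: int) -> bool:
    """Start a task only if its video RAM still fits on the card."""
    used = sum(t.vram_mb for t in all_tasks if t.in_memory)
    if used + task.vram_mb > vram_total_mb:
        return False  # the real app would bomb out or revert to CPU here
    task.in_memory = True
    return True


def simulate(always_remove: bool) -> bool:
    """Run task A, suspend it before a checkpoint, then try to start task B."""
    a = GpuTask("wu_a", vram_mb=400)
    b = GpuTask("wu_b", vram_mb=400)
    tasks = [a, b]
    card_mb = 512  # hypothetical card size
    try_start(a, tasks, card_mb)
    suspend(a, always_remove, has_checkpointed=False)
    return try_start(b, tasks, card_mb)


if __name__ == "__main__":
    print("old policy, B starts:", simulate(always_remove=False))
    print("new policy, B starts:", simulate(always_remove=True))
```

Under these assumptions the second job only gets its video RAM when the suspended one is always removed. The cost is that a job removed before checkpointing has to redo whatever it computed since its last checkpoint when it resumes.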


ID: 917235
-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 917237 - Posted: 12 Jul 2009, 21:06:05 UTC - in response to Message 917235.  

it is known error that is removed in 6.6.37:
- client: when suspending a GPU job, always remove it from memory, even if it hasn't checkpointed. Otherwise we'll typically run another GPU job right away, and it will bomb out or revert to CPU mode because it can't allocate video RAM


Nice, but from what it seems the EDF shuffling would still occur.
Still, it's better than nothing, since it won't hog RAM when GPU tasks get reordered.

//Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 917237
