Optimum Cache Size?

Message boards : Number crunching : Optimum Cache Size?

Jim Wilkins
Volunteer tester

Joined: 11 Oct 99
Posts: 70
Credit: 1,658,376
RAC: 0
United States
Message 941200 - Posted: 18 Oct 2009, 21:21:12 UTC

I have 2 medium-fast machines. Each one runs around 7-8 projects. My cache size is set to zero and I always have work, albeit maybe not SETI work. Works for me.

Jim
ID: 941200 · Report as offensive
gizbar
Joined: 7 Jan 01
Posts: 586
Credit: 21,087,774
RAC: 0
United Kingdom
Message 941177 - Posted: 18 Oct 2009, 19:33:49 UTC

I am in awe of the programmers. I dabbled with Basic enough to get my Computer Sciences 'O' level (in UK) over 20 years ago. I know what I want to achieve, but normally can't make it do what I want.

Back on topic, I let Boinc do the managing. If I come across a problem, I try to let enough people know so that a workaround or fix can be engineered. Normally the problem has already been seen or dealt with.

My cache, I believe, is set to 7 days. It's been that long since I checked.

regards, Gizbar.



A proud GPU User Server Donor!
ID: 941177 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 941161 - Posted: 18 Oct 2009, 18:46:55 UTC - in response to Message 940610.  

Figure it out. You are the programmers, not I.......

People say this like programming is easy.

It isn't.

First one has to figure out the exact cause. It could be the RPCs, I haven't done the analysis. Makes sense.

Then the question is: do you need a new RPC, or do you need a smarter way to use the RPCs you have?

Then it's time to write the code.

Then there is testing. For each hour of coding, 5 to 10 hours of "bench testing" isn't unusual.

If the fix doesn't work, you have to play again.

ID: 941161 · Report as offensive
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 919
Credit: 537,293
RAC: 20
United Kingdom
Message 941152 - Posted: 18 Oct 2009, 18:20:21 UTC
Last modified: 18 Oct 2009, 18:22:01 UTC

My car speedo goes up to 160MPH, should I drive it at that speed?
BOINC allows 10 days cache, should I use it?

I usually drive within, or maybe 5 or 10 over the limit.
I usually run 2 or 3 days cache.
ID: 941152 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 2,276
United States
Message 940775 - Posted: 17 Oct 2009, 13:56:50 UTC - in response to Message 940773.  



But my point was.......

BOINC SHOULD BE ABLE TO HANDLE IT.

The simple fact that the sucker downloaded WAY too much work should not bring the show to its knees.

There is a problem with Boinc, and I simply found the flaw.


I guess the testers did not do a stress test on a machine with a Cuda card and the cache set on a high setting? Aren't problems like this usually caught in the testing phase of the project's life cycle? The programmers usually code the thing and do some basic testing. Unless they are experts in testing, problems do and will creep through...

It's a basic flaw in the way Boinc works.
Not a simple one-line cockup.
I am not the only one who has documented this.
And will not be the last.
BM has to be fixed. There are far too many RPCs to handle......
It should not have to check every WU every freakin' second to make sure all is well.

"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 940775 · Report as offensive
Nick

Joined: 16 Oct 09
Posts: 81
Credit: 112,909
RAC: 0
Canada
Message 940773 - Posted: 17 Oct 2009, 13:50:46 UTC - in response to Message 940771.  



But my point was.......

BOINC SHOULD BE ABLE TO HANDLE IT.

The simple fact that the sucker downloaded WAY too much work should not bring the show to its knees.

There is a problem with Boinc, and I simply found the flaw.


I guess the testers did not do a stress test on a machine with a Cuda card and the cache set on a high setting? Aren't problems like this usually caught in the testing phase of the project's life cycle? The programmers usually code the thing and do some basic testing. Unless they are experts in testing, problems do and will creep through...
ID: 940773 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 2,276
United States
Message 940771 - Posted: 17 Oct 2009, 13:44:20 UTC - in response to Message 940650.  

So anybody tell the Boinc devs about their 'task handling' problem?

Or is their only response to 'not carry such a large cache'?


Sorry excuse for such bad manager behaviour.

The sucker DOES crash itself trying to keep track of over 4-5k WUs...............................


It should be bulletproof, dudes.

And don't mind telling me what kind of cache I should carry.......that is NOT the freakin' issue........your manager is.

1000, 2000, 9000....
The manager should be able to handle it without pulling a rope around the nuts of the system it is being hosted on.......this is pure BS.
Figure it out. You are the programmers, not I.......

I only report reality.......and this just bites.


Mark.....

Why are you punishing your machines with an excessive cache?

It has been known for some time that excessive cache size results in Boinc.exe constantly using CPU time creating updates for the client_state.xml file. This draws CPU resources away from the science app and as you say makes the system unstable.

At this time my main machine has an uptime of 2 days and 2 hours. Boinc.exe has used 0:12:58 of CPU time in that same period, all according to the Task Manager. I run a 4 day cache and this computer has about 1100 work units in its cache for both the CPU and GPU.

I tried a 5 day cache and immediately saw CPU time for Boinc.exe go up so I reduced it back to a 4 day cache which I consider optimal. It's long enough to cover a holiday weekend, what else do you need?

Because I CAN........LOL.

No, I did not intend for it to get that out of hand.........at all.

The rescheduler simply tweaked something in Boinc that made it go rogue.
And without me asking it to........it downloaded many thousands of bits of work....which, I might add, it will probably turn around within the allotted timeframe.

But my point was.......

BOINC SHOULD BE ABLE TO HANDLE IT.

The simple fact that the sucker downloaded WAY too much work should not bring the show to its knees.

There is a problem with Boinc, and I simply found the flaw.

"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 940771 · Report as offensive
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 940737 - Posted: 17 Oct 2009, 10:31:25 UTC - in response to Message 940713.  
Last modified: 17 Oct 2009, 10:47:01 UTC

Hi, since the arrival of CUDA it seems logical, IMHO, to raise your cache by a few days.

For over half a year I have used: "Maintain enough work for an additional 5 days" (enforced by version 5.10+).


It strongly depends on your hardware, which is why I added 2 days to my 3-day cache. You just have to try, for instance by starting with 2 days and raising it slowly if you do run out of work.
Once you've set it, let BOINC adapt to it, which takes some time, about a week.
ID: 940737 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6242
Credit: 106,370,077
RAC: 275
Russia
Message 940713 - Posted: 17 Oct 2009, 7:22:24 UTC - in response to Message 940617.  

If ten days cause issues, why can you set for 10 days?

Because it does not cause issues for all hosts, only for high-performance ones.
And nobody has had time to devise an algorithm that accounts for this and adjusts the upper limit for each host individually.
Each user has their own head to foresee the consequences of their actions. 10 days is not the default value. If you are not sure, use the default; if you change the default, think and learn about the consequences.
ID: 940713 · Report as offensive
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 123
United States
Message 940656 - Posted: 17 Oct 2009, 2:09:54 UTC

I run a 3/4 day cache. I find I can get by most small outages. I run all 3 of my crunchers 24/7 also. If I run out of work there's always the need to blow the dust bunnies out, and then power the things down for a rest. Or crunch another project if you desire. My Mac shares SETI with Milkyway. That's just my two cents.

Old James
ID: 940656 · Report as offensive
Nick

Joined: 16 Oct 09
Posts: 81
Credit: 112,909
RAC: 0
Canada
Message 940653 - Posted: 17 Oct 2009, 1:57:06 UTC - in response to Message 940617.  
Last modified: 17 Oct 2009, 2:07:38 UTC

If ten days cause issues, why can you set for 10 days?


If you are running a modest machine without a Cuda gpu, a 10 day cache would give you, say in the case of my laptop with the GPU turned off, about 130 80-credit WUs (with an optimizer app). BOINC can handle this without breaking a sweat.

Now, my E8400 desktop with its mid-range Cuda GPU would load about 900 WUs. Still not too bad, but I'd hate to have that kind of responsibility sitting on my HD in case I have a hardware failure (it does happen).

A super cruncher like the one Sutaru Tsureku runs (http://setiathome.berkeley.edu/show_host_detail.php?hostid=4789793) would probably give BOINC issues if it were set to 10 days...
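
For anyone who wants to put numbers on this, here is a rough sketch in Python. It is my own illustration, not anything BOINC ships. It assumes a host gets through work at a steady rate, which real hosts only approximate; the function name is made up, and the per-day rates are back-calculated from the 130 and 900 WU figures above.

```python
def estimate_cached_tasks(tasks_per_day: float, cache_days: float) -> int:
    """Rough number of work units a host would hold for a given cache setting."""
    if tasks_per_day < 0 or cache_days < 0:
        raise ValueError("inputs must be non-negative")
    return round(tasks_per_day * cache_days)


# Assumed steady rates, derived from the 10-day figures quoted in the post.
laptop_rate = 130 / 10    # about 13 WUs per day
desktop_rate = 900 / 10   # about 90 WUs per day

for days in (3, 10):
    laptop = estimate_cached_tasks(laptop_rate, days)
    desktop = estimate_cached_tasks(desktop_rate, days)
    print(f"{days:>2} day cache: laptop ~{laptop} WUs, desktop ~{desktop} WUs")
```

Plug in your own daily throughput to see how many tasks a given cache setting would leave sitting on your disk.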
ID: 940653 · Report as offensive
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 940650 - Posted: 17 Oct 2009, 1:36:58 UTC - in response to Message 940610.  

So anybody tell the Boinc devs about their 'task handling' problem?

Or is their only response to 'not carry such a large cache'?


Sorry excuse for such bad manager behaviour.

The sucker DOES crash itself trying to keep track of over 4-5k WUs...............................


It should be bulletproof, dudes.

And don't mind telling me what kind of cache I should carry.......that is NOT the freakin' issue........your manager is.

1000, 2000, 9000....
The manager should be able to handle it without pulling a rope around the nuts of the system it is being hosted on.......this is pure BS.
Figure it out. You are the programmers, not I.......

I only report reality.......and this just bites.


Mark.....

Why are you punishing your machines with an excessive cache?

It has been known for some time that excessive cache size results in Boinc.exe constantly using CPU time creating updates for the client_state.xml file. This draws CPU resources away from the science app and as you say makes the system unstable.

At this time my main machine has an uptime of 2 days and 2 hours. Boinc.exe has used 0:12:58 of CPU time in that same period, all according to the Task Manager. I run a 4 day cache and this computer has about 1100 work units in its cache for both the CPU and GPU.

I tried a 5 day cache and immediately saw CPU time for Boinc.exe go up so I reduced it back to a 4 day cache which I consider optimal. It's long enough to cover a holiday weekend, what else do you need?
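
To show how small that overhead is, the Task Manager figures above can be turned into a percentage. This is only a sketch: the helper name is hypothetical, and it measures against a single core because the post does not say how many cores the machine has.

```python
def cpu_share(cpu_time: str, uptime_hours: float, cores: int = 1) -> float:
    """Fraction of available CPU time used, given CPU time as 'H:MM:SS'."""
    hours, minutes, seconds = (int(part) for part in cpu_time.split(":"))
    used_seconds = hours * 3600 + minutes * 60 + seconds
    available_seconds = uptime_hours * 3600 * cores
    return used_seconds / available_seconds


# 0:12:58 of boinc.exe CPU time over 2 days and 2 hours of uptime.
share = cpu_share("0:12:58", uptime_hours=2 * 24 + 2)
print(f"boinc.exe used {share:.2%} of one core")
```

Running the same calculation before and after a cache change is one way to check whether the client's bookkeeping cost has gone up.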

Boinc....Boinc....Boinc....Boinc....
ID: 940650 · Report as offensive
Nick

Joined: 16 Oct 09
Posts: 81
Credit: 112,909
RAC: 0
Canada
Message 940645 - Posted: 17 Oct 2009, 1:28:12 UTC
Last modified: 17 Oct 2009, 1:29:22 UTC

Holy ET!

Came back after Smallville and the thread has lots of great input. Decided to go with 3 days since my GTS250 is crunching away at over 4 times the speed of my E8400.

Thanks everyone for the input...
ID: 940645 · Report as offensive
Dena Wiltsie
Volunteer tester

Joined: 19 Apr 01
Posts: 1626
Credit: 24,230,968
RAC: 59
United States
Message 940623 - Posted: 17 Oct 2009, 0:28:30 UTC

Setting ten days would not cause me problems because I could never do more than 48 a day and most of the time it would be far less. It's the people with super crunchers that have the problems.
ID: 940623 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 12988
Credit: 208,696,464
RAC: 690
Australia
Message 940624 - Posted: 17 Oct 2009, 0:28:30 UTC - in response to Message 940610.  

So anybody tell the Boinc devs about their 'task handling' problem?

They know of it (along with many other issues).
The easiest fix would be to reduce the maximum cache size people can select. Problem solved. Is that what you want?

From posts in other threads, they are working on a workaround, but as it involves some significant changes in how things are done it's not an easy problem to resolve without causing other problems in the process.


The manager should be able to handle it without pulling a rope around the nuts of the system it is being hosted on.......this is pure BS.
Figure it out. You are the programmers, not I.......

Take a few deep breaths.
Back in the early days of computing, total memory was less than a few kB. Hundreds of kB just wasn't possible to imagine, let alone MBs or GBs of system RAM.
Likewise the thought that 10 days' worth of work would involve 1,000s of work units & not just a few hundred just didn't enter into any calculations until the GPU client came along.
Grant
Darwin NT
ID: 940624 · Report as offensive
j mercer
Joined: 3 Jun 99
Posts: 2408
Credit: 12,323,733
RAC: 1
United States
Message 940617 - Posted: 17 Oct 2009, 0:20:22 UTC

If ten days cause issues, why can you set for 10 days?
...
ID: 940617 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 2,276
United States
Message 940610 - Posted: 17 Oct 2009, 0:05:52 UTC

So anybody tell the Boinc devs about their 'task handling' problem?

Or is their only response to 'not carry such a large cache'?


Sorry excuse for such bad manager behaviour.

The sucker DOES crash itself trying to keep track of over 4-5k WUs...............................


It should be bulletproof, dudes.

And don't mind telling me what kind of cache I should carry.......that is NOT the freakin' issue........your manager is.

1000, 2000, 9000....
The manager should be able to handle it without pulling a rope around the nuts of the system it is being hosted on.......this is pure BS.
Figure it out. You are the programmers, not I.......

I only report reality.......and this just bites.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 940610 · Report as offensive
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 12
Germany
Message 940607 - Posted: 16 Oct 2009, 23:55:30 UTC


In my experience, with 1,500+ tasks one CPU core is busy with nothing but boinc.exe (the BOINC client), according to Windows Task Manager.

That is about a 3 day WU cache on my GPU cruncher.
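
Working backwards from a task count to a cache setting can be sketched like this. Everything here is an assumption for illustration: the function name is made up, the ceiling is whatever task count you find your own client copes with (1,500 here, other posters in this thread mention 4,000), and the 500 per day rate is simply 1,500 tasks divided by 3 days.

```python
import math


def max_cache_days(task_ceiling: int, tasks_per_day: float) -> float:
    """Largest cache, rounded down to 0.1 day, that stays under a task ceiling."""
    if tasks_per_day <= 0:
        raise ValueError("tasks_per_day must be positive")
    days = task_ceiling / tasks_per_day
    return math.floor(days * 10) / 10


# Hypothetical host that turns around about 500 tasks per day.
for ceiling in (1500, 4000):
    print(f"ceiling {ceiling}: at most {max_cache_days(ceiling, 500)} days of cache")
```

The ceiling itself is something to find by watching boinc.exe CPU time on your own machine, not a published limit.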

ID: 940607 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 2,276
United States
Message 940605 - Posted: 16 Oct 2009, 23:48:14 UTC
Last modified: 16 Oct 2009, 23:57:14 UTC

Oh Jesus.......let me tell you about this....
Boinc just sux at handling massive caches.
My top rig just went through a phase.....
Used the 'rescheduler' one too many times, and Boinc went Boinczerk......

It tried to download over 4000 tasks in a couple of days......and then hell struck.

There was so much boinc.exe traffic that it could hardly even manage freakin' comms........got errors on almost every try it made.....so it could hardly clear things.......90% of CPU time was just Boinc.exe.........no real crunching getting done.........nasty. Had thousands of WUs it was trying to upload and download........and couldn't get a word in edgewise against the boinc.exe that was taking full control of the CPU.

I finally set the master control (took me a loooooooong time to manage to even get there.......) to no new tasks........and it took another 2 days to download, 1 by 1, the tasks it had already been assigned by the scheduler.

Very nasty situation.......

If you go over 4k tasks, you risk meltdown............
BabyK had over 8000........it is now less than 6000, and things are starting to flow normally again.
So I would say anything more than 4k WUs.......you might be in trouble.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 940605 · Report as offensive
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 940599 - Posted: 16 Oct 2009, 23:40:44 UTC

Welcome

Before anyone tells you to set 10 days or your CUDA cards will starve, there are some things to know.

With the more recent versions of Boinc you can set a low "connect to" interval (how often the machine phones home, in your preferences) and in a file called "cc_config.xml" you can tell it to maintain work for xx.xx days. If you just set it for xx.xx days then it causes other problems (pending credits, when it actually requests work, and more). So a low connect time coupled with the caching effect gets past a few obstacles. Your computer asks for work when it needs it.

So with the "newer versions" of Boinc it is actually attempting to request what is needed for the hardware resource.
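
A hedged sketch of what such a pair of settings might look like when written out as XML. Big assumptions here: the tag names `work_buf_min_days` and `work_buf_additional_days` and the `global_preferences` root are the ones I know from BOINC's preference override file (global_prefs_override.xml), which may not be the file meant above, so check the documentation for your client version before copying anything. The function name and the 0.1 / 3.0 values are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(connect_days: float, extra_days: float) -> str:
    """Build a preferences fragment with a short connect interval plus extra cache.

    Tag names are assumed from BOINC's global_prefs_override.xml; verify them
    against your own client before use.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = f"{connect_days:.2f}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{extra_days:.2f}"
    return ET.tostring(root, encoding="unicode")


# Low connect interval (about 2.4 hours) with 3 additional days of work.
print(build_prefs_override(0.1, 3.0))
```

The point of the split is the one made above: the short interval keeps the host reporting often, while the additional days provide the buffer.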

Regards




Please consider a Donation to the Seti Project.

ID: 940599 · Report as offensive


 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.