More AP WU's please!



Message boards : Number crunching : More AP WU's please!

JAMC
Volunteer tester
Send message
Joined: 2 Jan 07
Posts: 71
Credit: 9,506,431
RAC: 0
Message 897261 - Posted: 20 May 2009, 15:32:28 UTC

Any idea on when we'll be getting more AP work to crunch?

Profile Pappa
Volunteer tester
Avatar
Send message
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 897280 - Posted: 20 May 2009, 16:08:14 UTC
Last modified: 20 May 2009, 16:09:07 UTC

This is interesting. When the Beta Group got AP ready for Seti Main, Seti could not get anyone to crunch AP. If I look at the "Server Status Page" we see AP all but complete (with trickles and resends) and tons of images (134 currently) waiting to be split into MultiBeam WU's.

Joe describes a part of the process in this post/thread: Astropulse Beam 3 Polarity 1 Errors

So we end up with a dilemma: some users have focused entirely on AP for various reasons, and MultiBeam has fallen behind, at least while tape images are being retrieved from off-site storage; see Matt's tech post Countdown (May 18 2009). With 134 images loaded locally and, as Joe states, ~220,000 MB WU's/image, MultiBeam is roughly ~29,480,000 WU's behind. My feeling is that users should focus on eating as many of the MultiBeam WU's as they can.
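The backlog figure above is easy to sanity-check. A minimal sketch, using only the numbers quoted in the post (134 images, ~220000 MB WU's per image):

```python
# Sanity check of the MultiBeam backlog estimate, using the
# figures quoted in the post above.
images_on_disk = 134          # 'tapes' currently loaded locally
mb_wus_per_image = 220_000    # Joe's rough estimate of MB WU's per image

backlog = images_on_disk * mb_wus_per_image
print(backlog)  # 29480000 -- the ~29480000 WU's quoted above
```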

Regards
____________
Please consider a Donation to the Seti Project.

Profile Pappa
Volunteer tester
Avatar
Send message
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 897310 - Posted: 20 May 2009, 16:48:45 UTC - in response to Message 897291.

If only it were that easy... :)

This is interesting. When the Beta Group got AP ready for Seti Main, Seti could not get anyone to crunch AP. If I look at the "Server Status Page" we see AP all but complete (with trickles and resends) and tons of images (134 currently) waiting to be split into MultiBeam WU's.

Joe describes a part of the process in this post/thread: Astropulse Beam 3 Polarity 1 Errors

So we end up with a dilemma: some users have focused entirely on AP for various reasons, and MultiBeam has fallen behind, at least while tape images are being retrieved from off-site storage; see Matt's tech post Countdown (May 18 2009). With 134 images loaded locally and, as Joe states, ~220,000 MB WU's/image, MultiBeam is roughly ~29,480,000 WU's behind. My feeling is that users should focus on eating as many of the MultiBeam WU's as they can.

Regards

Mebbe us AP folks figured the Cuda crowd was gonna gobble up all that MB work....LOL.


____________
Please consider a Donation to the Seti Project.

Profile Tw34k3d
Avatar
Send message
Joined: 18 May 99
Posts: 342
Credit: 85,846,621
RAC: 136
United States
Message 897312 - Posted: 20 May 2009, 16:49:31 UTC - in response to Message 897291.
Last modified: 20 May 2009, 16:50:03 UTC

I'm doing my part! :)

I crunch both. I prefer AP for the CPUs, but a quick survey shows a good deal of 6.03s waiting in addition to my 6.08s done by CUDA.

Rob
____________

JAMC
Volunteer tester
Send message
Joined: 2 Jan 07
Posts: 71
Credit: 9,506,431
RAC: 0
Message 897316 - Posted: 20 May 2009, 16:53:57 UTC - in response to Message 897280.
Last modified: 20 May 2009, 16:54:18 UTC

So we end up with a dilemma: some users have focused entirely on AP for various reasons, and MultiBeam has fallen behind, at least while tape images are being retrieved from off-site storage; see Matt's tech post Countdown (May 18 2009). With 134 images loaded locally and, as Joe states, ~220,000 MB WU's/image, MultiBeam is roughly ~29,480,000 WU's behind. My feeling is that users should focus on eating as many of the MultiBeam WU's as they can.

Regards


Hey up to a couple of weeks ago I had done nothing but MB... tried CUDA and dropped it just as fast...

Profile Pappa
Volunteer tester
Avatar
Send message
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 897318 - Posted: 20 May 2009, 17:14:10 UTC - in response to Message 897312.

I'm doing my part! :)

I crunch both. I prefer AP for the CPUs, but a quick survey shows a good deal of 6.03s waiting in addition to my 6.08s done by CUDA.

Rob

With MultiBeam roughly ~29,480,000 WU's behind, that takes us down to ~29,475,000 LOL

____________
Please consider a Donation to the Seti Project.

Profile perryjay
Volunteer tester
Avatar
Send message
Joined: 20 Aug 02
Posts: 3377
Credit: 15,483,865
RAC: 11,289
United States
Message 897319 - Posted: 20 May 2009, 17:16:55 UTC - in response to Message 897316.

Looks like I'll be helping out with the MBs here shortly. Right now I'm running two APs and a Cuda, and after these two I have one more AP. Today the server has been feeding me 6.08s for my Cuda and 6.03s for my CPUs. It's been a while since I've seen this; the server had been feeding me just APs along with my Cudas.
____________


PROUD MEMBER OF Team Starfire World BOINC

Profile Pappa
Volunteer tester
Avatar
Send message
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 897324 - Posted: 20 May 2009, 17:34:05 UTC - in response to Message 897316.

So we end up with a dilemma: some users have focused entirely on AP for various reasons, and MultiBeam has fallen behind, at least while tape images are being retrieved from off-site storage; see Matt's tech post Countdown (May 18 2009). With 134 images loaded locally and, as Joe states, ~220,000 MB WU's/image, MultiBeam is roughly ~29,480,000 WU's behind. My feeling is that users should focus on eating as many of the MultiBeam WU's as they can.

Regards


Hey up to a couple of weeks ago I had done nothing but MB... tried CUDA and dropped it just as fast...


Depending on the computer and video card, Cuda is not for everyone, not counting some of the issues getting it working properly.
The generally recommended version of Boinc is 6.6.20 (it has its own story) with the 182.5 drivers, and when the Server recognizes the card it "should" send Cuda work. Because of how the Server is configured, it usually tells Boinc that the Cuda Application should use x.xxx percent of the CPU and as much of the Video card as it can grab. This sometimes overpowers the Video Card, and what you end up with is something "odd" that may seem unworkable. The Optimized Cuda build tells Boinc and the Cuda Application to back off a bit in the use of the CPU and uses the video card more effectively. Then you end up with something that is "workable", or else a reason to disable Cuda. No, Cuda is not for everyone.

____________
Please consider a Donation to the Seti Project.

Chelski
Avatar
Send message
Joined: 3 Jan 00
Posts: 121
Credit: 8,824,102
RAC: 613
Malaysia
Message 897631 - Posted: 21 May 2009, 3:50:12 UTC
Last modified: 21 May 2009, 3:50:38 UTC

Once upon a time there was a flock of ugly ducklings called APs. And there were many calls from the masses that AP be dropped or disabled by default. Almost everyone wanted more of the ducks called MBs, which are especially delicious served with CUDA sauce.

Then someone found out that APs are prettier than swans: they are geese that lay golden eggs. And then they were hunted to near extinction and joined Spock as members of an endangered species.

Actually, MB splitting has always lagged behind AP, but never this acutely.

The file size difference between MB and AP is almost 22:1, so for each tape we get about 22x as many MB WUs as AP WUs.

I don't think the actual crunch rate is this high on average (of course, if we sample now, it could be higher because we are starving the project of AP WUs), and overall we need the ratio to be bigger than 22:1 to start correcting the backlog. By a back-of-envelope estimate, my cuda machine does 80 MBs per day on CUDA and 4 APs per day, a 20:1 ratio, so yes, I think I'm part of the problem. It will be interesting to see whether the crunchers with Dreadnought-class CUDA machines are doing much better.
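As a quick check of the back-of-envelope estimate above (the per-day rates are the ones stated in the post, so this is only illustrative):

```python
# Chelski's per-machine crunch rates, as stated in the post.
mb_per_day = 80   # MB tasks/day (CUDA)
ap_per_day = 4    # AP tasks/day (CPU)

ratio = mb_per_day / ap_per_day
print(ratio)  # 20.0 -- below the ~22:1 split ratio, so the MB backlog keeps growing
```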
____________

Cosmic_Ocean
Avatar
Send message
Joined: 23 Dec 00
Posts: 2264
Credit: 8,657,670
RAC: 4,243
United States
Message 897641 - Posted: 21 May 2009, 4:59:16 UTC - in response to Message 897280.
Last modified: 21 May 2009, 4:59:59 UTC

This is interesting. When the Beta Group got AP ready for Seti Main, Seti could not get anyone to crunch AP. If I look at the "Server Status Page" we see AP all but complete (with trickles and resends) and tons of images (134 currently) waiting to be split into MultiBeam WU's.

Joe describes a part of the process in this post/thread: Astropulse Beam 3 Polarity 1 Errors

So we end up with a dilemma: some users have focused entirely on AP for various reasons, and MultiBeam has fallen behind, at least while tape images are being retrieved from off-site storage; see Matt's tech post Countdown (May 18 2009). With 134 images loaded locally and, as Joe states, ~220,000 MB WU's/image, MultiBeam is roughly ~29,480,000 WU's behind. My feeling is that users should focus on eating as many of the MultiBeam WU's as they can.

Regards

As I have stated several times over the past few months, I would love to be able to crunch MB and "help the problem", however, when things are running normally, the scheduler will not send me MBs if I have AP selected as an allowed application. From the testing I have done, it appears that part of the scheduler logic is something like:

if (CUDA == yes)
    give MB;
else
    give AP;

I don't know the validity of that statement, but that's what it seems like. If I go to an MB-only venue, I obviously get nothing but MBs.

Before the APs ran out, there were times where the ready-to-send queue was running low and I requested work, and the messages tab said "no jobs available", but I switched to the MB venue before the next work fetch retry and got 50 MBs in short order.

So the bottom line is, there is still some scheduler logic that seems to only want to give MB out to the CUDA folks. This may need some deeper analysis, but I've done all I can do from the client-side, of course.

However, now that APs are pretty hard to come by, the scheduler does go ahead and give me some MBs, but I'm sure it really does not want to.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Profile Geek@PlayProject donor
Volunteer tester
Avatar
Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,098,804
RAC: 22,997
United States
Message 897650 - Posted: 21 May 2009, 5:18:31 UTC - in response to Message 897641.
Last modified: 21 May 2009, 5:19:24 UTC

The server logic you stated as............

if (CUDA == yes)
    give MB;
else
    give AP;

May be true when Boinc 6.2.19 is making the request.

I know for a fact that Boinc version 6.6.28, the current recommended version, will make separate work requests for MB and AP. Thus if MB is requested and available it will be issued to you. If AP is requested and available it will be issued to you.

Perhaps you should consider upgrading to the current version of Boinc. It should upgrade fine without destroying the work you have on hand.
____________
Boinc....Boinc....Boinc....Boinc....

Fred W
Volunteer tester
Send message
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 897661 - Posted: 21 May 2009, 5:53:02 UTC - in response to Message 897650.

...
I know for a fact that Boinc version 6.6.28, the current recommended version, will make separate work requests for MB and AP. ...

Not sure this is entirely accurate. 6.6.28 makes separate work requests for CUDA and non-CUDA. The non-CUDA work requests should be filled with a (20 / 1 ?) mix of MB / AP but always seem to be AP under "normal" running conditions. Not that I'm complaining - just clarifying the facts as I see them.

F.
____________

Profile Virtual Boss*
Volunteer tester
Avatar
Send message
Joined: 4 May 08
Posts: 417
Credit: 6,178,370
RAC: 33
Australia
Message 897708 - Posted: 21 May 2009, 10:31:53 UTC

Just wondering out loud.

Are all the older "tapes" that were split for MB before AP started still in existence, or were they deleted?

And if they are still around, can they be re-split for AP only?

Profile tullioProject donor
Send message
Joined: 9 Apr 04
Posts: 3708
Credit: 378,585
RAC: 571
Italy
Message 897710 - Posted: 21 May 2009, 10:47:56 UTC

I am using SuSE Linux 10.3 and its firewall, no antivirus. Recently I noticed attempts to connect to my box using ssh and I killed the sshd daemon.
Tullio
____________

Josef W. SegurProject donor
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4241
Credit: 1,046,990
RAC: 302
United States
Message 897878 - Posted: 21 May 2009, 20:44:27 UTC - in response to Message 897631.
Last modified: 21 May 2009, 20:52:36 UTC

Chelski wrote:
...
The file size difference between MB and AP is almost 22:1, so for each tape we get about 22x as many MB WUs as AP WUs.


Two factors you missed make the ratio very close to 40x. Although the mb_splitters make groups of WUs with 107.37 second duration they only advance about 85.7 seconds for the next group; the time overlap of about 20 seconds ensures we don't miss signals at the seams between WUs. AP WUs don't have any overlap, for short single pulses it wouldn't add any significant value. Second, data is packed 6 bits per byte in MB (plus there's a newline after each 64 data bytes), but full 8 bits per byte in AP.
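A rough sketch of how those two corrections stack on top of the ~22:1 file-size ratio (all inputs are the rounded figures from these posts, so the product is only approximate; it lands in the high 30s, consistent with Joe's "very close to 40x" given that the 22:1 was itself rounded):

```python
# Combine the rough file-size ratio with the two corrections Joe describes.
file_size_ratio = 22             # MB:AP WU count per tape by file size alone
overlap = 107.37 / 85.7          # each MB group covers 107.37 s but advances only 85.7 s
packing = (8 / 6) * (65 / 64)    # AP packs 8 bits/byte; MB packs 6, plus a newline per 64 bytes

print(round(file_size_ratio * overlap * packing, 1))  # ~37, in the ballpark of Joe's ~40x
```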

I don't think the actual crunch rate is this high on average (of course, if we sample now, it could be higher because we are starving the project of AP WUs), and overall we need the ratio to be bigger than 22:1 to start correcting the backlog. By a back-of-envelope estimate, my cuda machine does 80 MBs per day on CUDA and 4 APs per day, a 20:1 ratio, so yes, I think I'm part of the problem. It will be interesting to see whether the crunchers with Dreadnought-class CUDA machines are doing much better.

The only sensible thing for the project is to adjust work delivery so a given set of 'tapes' is handled at nearly the same rate by both sets of splitters. If everything were working as it should, they'd deliver 40 MB tasks for every AP task. That would take 3 slots for AP_v5 and 96 for MB in the 100 shared memory slots between the Feeder and Scheduler. (and 1 for any original AP reissues) But there are more tasks needed to get an AP_v5 WU finished, so 5 slots for AP_v5 and 94 for MB would allow for the needed reissues. I'd expect the AP_v5 reissue rate to drop, then its number of slots could be further reduced.
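The slot arithmetic Joe describes can be sketched as follows (a hedged illustration of the proportions, not the actual feeder code):

```python
# Proportional share of the 100 shared-memory feeder slots at a
# 40:1 MB:AP delivery ratio, as proposed in the post above.
total_slots = 100
mb_per_ap = 40

ap_share = total_slots / (mb_per_ap + 1)   # AP_v5's exact proportional share
print(round(ap_share, 1))                  # 2.4 -> rounds up to Joe's 3 slots

# Joe's adjusted proposal, padding AP_v5 to cover its higher reissue rate:
ap_slots, mb_slots, original_ap_reissues = 5, 94, 1
print(ap_slots + mb_slots + original_ap_reissues)  # 100
```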

Note that having 94 or 96 slots for MB should mean fairly rare cases where the MB slots get emptied between Feeder cycles. The delivery rate for MB should go up to near what it was before AP was released.

Edit: If it seems I'm advocating fewer AP WUs, that isn't really so. If the MB delivery had been keeping up, there wouldn't be 137 'tapes' waiting to be done by the mb_splitters and recalling data from NERSC HPSS would have started sooner. Work delivery should have remained reasonably stable for both MB and AP.
Joe

Chelski
Avatar
Send message
Joined: 3 Jan 00
Posts: 121
Credit: 8,824,102
RAC: 613
Malaysia
Message 898113 - Posted: 22 May 2009, 6:27:53 UTC - in response to Message 897878.
Last modified: 22 May 2009, 6:28:59 UTC

Joe wrote:
Two factors you missed make the ratio very close to 40x. Although the mb_splitters make groups of WUs with 107.37 second duration they only advance about 85.7 seconds for the next group; the time overlap of about 20 seconds ensures we don't miss signals at the seams between WUs. AP WUs don't have any overlap, for short single pulses it wouldn't add any significant value. Second, data is packed 6 bits per byte in MB (plus there's a newline after each 64 data bytes), but full 8 bits per byte in AP.

Thanks, I didn't know about the overlap in MB or the increased packing density of AP, so I used the direct file-size comparison, which obviously underestimated the severity of the problem.

Joe wrote:
Edit: If it seems I'm advocating fewer AP WUs, that isn't really so. If the MB delivery had been keeping up, there wouldn't be 137 'tapes' waiting to be done by the mb_splitters and recalling data from NERSC HPSS would have started sooner. Work delivery should have remained reasonably stable for both MB and AP.

Agreed, this will not reduce the AP WUs, since each tape still gives a fixed number of AP WUs and a fixed number of MB WUs. At least this workaround will correct the imbalance, and the effect will be that clients that don't restrict themselves to one kind of work will receive MBs and APs in the correct ratio.

A quick gander at the last hour's results returned showed only 30:1 MB:AP, so we still have a long way to go in bringing balance back to the Force.

Maybe controversially inflate the MB credits for a limited duration (e.g. putting CPU MB and CUDA MB at parity for a start), so that those people who bother to specify preferences for CPU crunching will help clear some of the MB backlog?
____________



Copyright © 2014 University of California