Status / Credit (Nov 03 2010)



Message boards : Technical News : Status / Credit (Nov 03 2010)

Author Message
Profile APCyberax
Volunteer tester
Send message
Joined: 6 Jun 01
Posts: 29
Credit: 2,000,348
RAC: 0
United Kingdom
Message 1046030 - Posted: 4 Nov 2010, 17:33:32 UTC - in response to Message 1046017.

Yes Pics

And let's see some tech porn too.

Maybe a nice advanced status page: CPU graphs and disk I/O / network usage.

Give us a picture of just how much work you get through those servers. It might make the people who are worried about downtime think twice :)


____________

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar
Send message
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1046035 - Posted: 4 Nov 2010, 17:56:39 UTC

Maybe there's some confusion about this credit granting process.

By cleanup I wasn't referring to any corruption. In fact, the data in the databases is as clean as ever; it's just that both the mysql (boinc) and informix (science) databases are too darn big. This mega-outage is to get new servers online for both that can handle the job. Before we can do that, we'd like to reduce the size of the databases to make the transitions easier. Of course there's some gunk in the databases pertaining to things like ghost WUs, orphaned results or signals, etc. Nothing new - but when the databases are quiet and vastly reduced in size, this is a golden opportunity to analyse and understand these various long-standing issues - that's the cleanup I'm talking about.

Meanwhile, we've been dealing with all kinds of long planned and unplanned outages, causing users much grief with timeouts. This credit granting procedure ensures that people are getting the credit they deserve for work done without having to wait for wingmen who never show up. And we're not throwing out any science.

By the way, download servers are starting up so we're starting to clear out *that* queue.

- Matt


____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile soft^spirit
Avatar
Send message
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 3
United States
Message 1046037 - Posted: 4 Nov 2010, 18:00:00 UTC - in response to Message 1046035.

Maybe there's some confusion about this credit granting process.

By cleanup I wasn't referring to any corruption. In fact, the data in the databases is as clean as ever; it's just that both the mysql (boinc) and informix (science) databases are too darn big. This mega-outage is to get new servers online for both that can handle the job. Before we can do that, we'd like to reduce the size of the databases to make the transitions easier. Of course there's some gunk in the databases pertaining to things like ghost WUs, orphaned results or signals, etc. Nothing new - but when the databases are quiet and vastly reduced in size, this is a golden opportunity to analyse and understand these various long-standing issues - that's the cleanup I'm talking about.

Meanwhile, we've been dealing with all kinds of long planned and unplanned outages, causing users much grief with timeouts. This credit granting procedure ensures that people are getting the credit they deserve for work done without having to wait for wingmen who never show up. And we're not throwing out any science.

By the way, download servers are starting up so we're starting to clear out *that* queue.

- Matt



YAY!! Tight limits should give fast turnaround.. and AWESOME!!
____________

Janice

Swibby Bear
Send message
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,504
RAC: 24
United States
Message 1046048 - Posted: 4 Nov 2010, 18:51:21 UTC

Matt --

If only one download server were running, wouldn't that make more efficient use of the internet connection and reduce or eliminate the dropped connections? Might overall throughput actually improve? And it might eliminate the ghost creations.

Whit

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8551
Credit: 50,446,850
RAC: 51,254
United Kingdom
Message 1046061 - Posted: 4 Nov 2010, 19:57:24 UTC

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

That's hardly the most restful convalescence for poor Jocelyn, Anakin, and Vader - but I suppose it just represents the tapes the splitters were working on just before the last shutdown.

There's probably AP work in the mix as well. Hardly surprising that the download link went straight to 93 Mbit/s and has stayed there ever since.

To avoid creating yet more ghosts, might it be a good idea to periodically pause the 'new work allocation' side of the scheduler - now you have the tool to do that - while still allowing reporting? That would allow the download rate for allocated work to decay for a while, thus both allowing uploads/reporting to flow freely, and minimise the risk of work being trapped in limbo should, heaven forfend, the sudden rush of work trigger another nervous breakdown server-side.

zoom314Project donor
Avatar
Send message
Joined: 30 Nov 03
Posts: 46397
Credit: 36,744,870
RAC: 4,808
United States
Message 1046062 - Posted: 4 Nov 2010, 20:11:18 UTC - in response to Message 1046017.

PS I do expect pics of the new servers though:)


What? I'm gonna insist upon it! :-))

Ditto, Asahp please.
____________
My Facebook, War Commander, 2015

Profile Ingrid Brouwer
Avatar
Send message
Joined: 8 Jan 00
Posts: 6
Credit: 928,583
RAC: 580
Netherlands
Message 1046083 - Posted: 4 Nov 2010, 21:20:50 UTC - in response to Message 1046005.

<<<SNIP>>> PS I do expect pics of the new servers though:)

I agree LOL Would love to see them, in action preferably ;-D
____________
Life is not measured by the number of breaths we take, but by the
moments that take our breath away.

Mike.Gibson
Send message
Joined: 13 Oct 07
Posts: 34
Credit: 192,696
RAC: 31
United Kingdom
Message 1046140 - Posted: 5 Nov 2010, 1:36:25 UTC

I think the priority of most crunchers is not the receipt of credits. It is to prevent the waste of crunching time that results from WUs being resent when the primary crunching has been held up by an outage.

There should be some sort of moratorium, say for 24 hours, after an outage to give enough time for completed work to be reported.

During that time only new work should be sent out, or secondaries where expiration was before the start of the latest outage.

Mike

Profile RottenMutt
Avatar
Send message
Joined: 15 Mar 01
Posts: 992
Credit: 207,654,737
RAC: 0
United States
Message 1046141 - Posted: 5 Nov 2010, 1:40:00 UTC - in response to Message 1046048.

Matt --

If only one download server were running, wouldn't that make more efficient use of the internet connection and reduce or eliminate the dropped connections? Might overall throughput actually improve? And it might eliminate the ghost creations.

Whit


Downloads seem to work better when both are up, from what I've observed.

____________

Profile RottenMutt
Avatar
Send message
Joined: 15 Mar 01
Posts: 992
Credit: 207,654,737
RAC: 0
United States
Message 1046145 - Posted: 5 Nov 2010, 1:43:24 UTC
Last modified: 5 Nov 2010, 1:45:11 UTC

Yes, much better with two.

Matt has been enabling and disabling one of the download servers, if you haven't noticed.
____________

Profile RottenMutt
Avatar
Send message
Joined: 15 Mar 01
Posts: 992
Credit: 207,654,737
RAC: 0
United States
Message 1046155 - Posted: 5 Nov 2010, 2:28:39 UTC

Downloads work better right after the second server is enabled, then come to a crawl.
____________

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5833
Credit: 59,482,932
RAC: 48,056
Australia
Message 1046203 - Posted: 5 Nov 2010, 9:39:22 UTC - in response to Message 1046061.
Last modified: 5 Nov 2010, 9:39:56 UTC

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

My first batch of new work was all initial-issue VLARs; after a couple of them had been done I got one new shortie.
Since then all of my new work has been VLAR re-issues, not a shortie in sight.
____________
Grant
Darwin NT.

Profile Raistmer
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 16 Jun 01
Posts: 3422
Credit: 46,798,097
RAC: 19,914
Russia
Message 1046204 - Posted: 5 Nov 2010, 9:55:03 UTC - in response to Message 1045930.
Last modified: 5 Nov 2010, 9:55:41 UTC

Does it make people feel good to get credit for work which probably now has no chance of contributing to the science? Has the project underestimated the quality of the participants?

To my mind, getting credit without a corresponding increase in consecutive valid is much like having a rotten tomato thrown at me versus having a fresh vine ripened tomato handed to me. It was probably inevitable that the project would need to cancel some work to make the transition to the new servers cleaner, but my vote if we had been asked would have favored no credit for unresolved work.

I bring this up to get others' opinions, in the hope that "I may have to do this granting again once this first round of cleanup is over." will be handled differently.
Joe

+1
With arbitrary credit granting (and giving credit for nothing is arbitrary), credits lose the last (already very small) meaning they had.
Better to leave credits in their automatic state and work on the database cleanup; I'm sure there is enough work to do :P

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8551
Credit: 50,446,850
RAC: 51,254
United Kingdom
Message 1046212 - Posted: 5 Nov 2010, 11:10:22 UTC - in response to Message 1046203.

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

My first batch of new work was all initial issue VLARs, after a couple of them had been done i got one new shortie.
Since then all of my new work has been VLAR re-issues, not a shortie in sight.

Yes, the newly-split work didn't last long, and it's been nothing but resent ghosts or other fallout ever since. Which is exactly what we needed to happen. I haven't seen any VLARs here, but I wouldn't expect to - I've selected not to request CPU work, so the CPUs work on other projects while I run CUDA.

Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. There's still a trickle, but effectively no new downloads will have been added to the download server queue since then.

The download pipe remained saturated at 93 Mbit/sec until about 10:50 UTC, the best part of an hour later. That gives us some idea of the relative speeds of the various project components: I think we need to use the remaining time before the new servers arrive to work out some way of helping the scheduler request/reply messages get through (and hence avoid creating a new set of ghosts), while Oscar and friend keep the pipe full.

Profile Fred J. Verster
Volunteer tester
Avatar
Send message
Joined: 21 Apr 04
Posts: 3248
Credit: 31,807,137
RAC: 3,335
Netherlands
Message 1046231 - Posted: 5 Nov 2010, 14:01:51 UTC - in response to Message 1046212.
Last modified: 5 Nov 2010, 14:53:36 UTC

In the past hours, starting probably late last night local time (UTC+1), all SETI Beta WUs were uploaded and new ones were downloaded. All 3 quads + CUDA are running SETI MB WUs.

Can't see the details; the Result Page option is turned off, diminishing server load.
I haven't changed any settings in BOINC for a few months. Most of the time it wasn't possible at all.....:)

But I do hope WUs are validated and granted credit once a canonical result is reached.
(I think giving credit for work not done isn't serving science, so why do it.)

And I agree with Raistmer as to the 'value' of credits; the differences between projects are out of balance, I think.

With both download servers turned on (as of 5 Nov 2010 14:30:05 UTC), all downloads go through almost immediately, quite a difference from the last days or weeks.
____________

Profile platium
Avatar
Send message
Joined: 5 Jul 10
Posts: 212
Credit: 262,426
RAC: 0
United Kingdom
Message 1046240 - Posted: 5 Nov 2010, 14:59:24 UTC

I thought we were here to donate processor time. Credits are interesting but not what we are here for, or am I alone in thinking the SETI programme is what counts?

Profile Fred J. Verster
Volunteer tester
Avatar
Send message
Joined: 21 Apr 04
Posts: 3248
Credit: 31,807,137
RAC: 3,335
Netherlands
Message 1046271 - Posted: 5 Nov 2010, 16:36:55 UTC - in response to Message 1046240.
Last modified: 5 Nov 2010, 16:55:24 UTC


I thought we were here to donate processor time. Credits are interesting but not what we are here for, or am I alone in thinking the SETI programme is what counts?


IMO you're right, and no, you're not alone, although exceptions prove the rule, so 'they say'. You could ask why not simply count the tasks done, but that would not have been fair; unlike 'most' projects, SETI@home Multibeam and Astropulse get their info from a piggyback on the Arecibo Observatory telescope.

Credits are a way to establish the amount of work done by your hosts. Instead of using raw FLOPS* or FLOPs per WU, Jeff Cobb has made a formula determining the work done: not only FLOPS but also memory bus, hard disk and network are taken into account when measuring the work done for a WU.
Besides giving an incomplete picture, counting only FLOPs would be hardly manageable, let alone handy; the numbers would easily have 18, 21, 23 and even more than 25 digits.

From the Account Page: personal, all-project results.

Now that new WUs, MB as well as AP, have been given out, is this the last task flow before the new servers are up and running?
(*Whetstone is a more reliable way of measuring floating point operations per second.)

____________

Josef W. SegurProject donor
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4252
Credit: 1,050,681
RAC: 250
United States
Message 1046275 - Posted: 5 Nov 2010, 16:44:32 UTC - in response to Message 1046212.

...
Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. There's still a trickle, but effectively no new downloads will have been added to the download server queue since then.

The download pipe remained saturated at 93 Mbit/sec until about 10:50 UTC, the best part of an hour later. That gives us some idea of the relative speeds of the various project components: I think we need to use the remaining time before the new servers arrive to work out some way of helping the scheduler request/reply messages get through (and hence avoid creating a new set of ghosts), while Oscar and friend keep the pipe full.

The 1,148,513 MB and 26,888 AP results in "ready to send" at the start, plus about 12,000 MB and 5,000 AP reissues created during the saturation period, would have taken nearly 19 hours to download at an actual throughput of 90 Mbps. I suspect ghost creation of 100,000 or more.
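Joe's 19-hour figure can be sanity-checked with some rough arithmetic. The per-result sizes below (~366 KB per MB task, ~8 MB per AP task) are assumptions for illustration, not figures from this thread:

```python
# Rough sanity check of the ~19 hour download estimate above.
# Per-result sizes are assumed: ~366 KB per MB task, ~8 MB per AP task.
mb_results = 1_148_513 + 12_000   # ready-to-send MB plus reissues
ap_results = 26_888 + 5_000       # ready-to-send AP plus reissues

total_bytes = mb_results * 366e3 + ap_results * 8e6
bytes_per_sec = 90e6 / 8          # 90 Mbps expressed in bytes/second

hours = total_bytes / bytes_per_sec / 3600
print(f"~{hours:.1f} hours at a sustained 90 Mbps")
```

With these assumed sizes the estimate lands in the high teens of hours, the same ballpark as the 19 hours quoted.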

With the queues drained as much as possible by other means, I hope the staff will try <resend_lost_results>1</resend_lost_results> in the project config.xml, at least for short periods. Bursts of activity caused by that should be tolerable, and it would go a long way toward having the server's "in progress" list actually match what is on participants' hosts. This being Friday, I expect whoever is in the lab has other plans, but maybe next week?
Joe
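For reference, the option Joe names is a standard BOINC scheduler setting; a minimal sketch of where it would sit in a project's config.xml (the surrounding elements are the generic BOINC layout, not SETI@home's actual file):

```xml
<boinc>
  <config>
    <!-- When enabled, the scheduler compares the task list a client reports
         against the database and resends any "lost" (ghost) tasks. -->
    <resend_lost_results>1</resend_lost_results>
  </config>
</boinc>
```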

Profile Chris SProject donor
Volunteer tester
Avatar
Send message
Joined: 19 Nov 00
Posts: 31774
Credit: 13,201,178
RAC: 35,582
United Kingdom
Message 1046299 - Posted: 5 Nov 2010, 18:50:21 UTC

it's just that both the mysql (boinc) and informix (science) databases are too darn big.


Matt, I did bring that very point up some while ago, and you replied that there were many other applications out there with much bigger Informix databases than SETI had. I'm assuming therefore that this is not a limitation of Informix, but of your old servers?

____________
Damsel Rescuer, Uli Devotee, Julie Supporter, Kitty sad,
ES99 Admirer, Raccoon Friend, Anniet fan, Hon Triumphvir


-BeNt-
Avatar
Send message
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1046311 - Posted: 5 Nov 2010, 20:10:24 UTC - in response to Message 1046299.
Last modified: 5 Nov 2010, 20:11:29 UTC

it's just that both the mysql (boinc) and informix (science) databases are too darn big.


Matt, I did bring that very point up some while ago, and you replied that there were many other applications out there with much bigger Informix databases than SETI had. I'm assuming therefore that this is not a limitation of Informix, but of your old servers?


http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp?topic=/com.ibm.adref.doc/ids_adr_0719.htm

Judging by what IBM says on its site about Informix databases, SETI had limitations with the hardware. IBM says the maximum number of databases an instance can hold is 21 million, with 477,102,080 tables in a dynamic system, 32K threads, and a maximum database size of 4 TB. (I wouldn't imagine Seti has anything larger than that going on, but you never know!)

So I suppose it would be safe to say it is a hardware limitation they are fighting. These new machines hopefully will be more than enough to handle things for a while; I don't see them ordering machines that wouldn't be.
____________
Traveling through space at ~67,000mph!


Copyright © 2014 University of California