Status / Credit (Nov 03 2010)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1046035 - Posted: 4 Nov 2010, 17:56:39 UTC

Maybe there's some confusion about this credit granting process.

By cleanup I wasn't referring to any corruption. In fact, the data in the databases is as clean as ever; it's just that both the mysql (boinc) and informix (science) databases are too darn big. This mega-outage is to get new servers online for both that can handle the job. Before we can do that, we'd like to reduce the size of the databases to make the transitions easier. Of course there's some gunk in the databases pertaining to things like ghost WUs, orphaned results or signals, etc. Nothing new - but when the databases are quiet and vastly reduced in size, this is a golden opportunity to analyse and understand these various long-standing issues - that's the cleanup I'm talking about.
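For illustration only - a minimal sketch, assuming the stock BOINC MySQL schema (result rows point at their parent via result.workunitid) and hypothetical connection settings, of the kind of consistency query such a cleanup involves:

```python
# Minimal sketch: count "orphaned" results, i.e. result rows whose
# parent workunit row no longer exists (stock BOINC schema assumed).
import MySQLdb  # assumes the MySQL-python driver is available

conn = MySQLdb.connect(db="boinc")  # hypothetical connection settings
cur = conn.cursor()
cur.execute("""
    SELECT COUNT(*)
    FROM result r
    LEFT JOIN workunit w ON w.id = r.workunitid
    WHERE w.id IS NULL
""")
print("orphaned results: %d" % cur.fetchone()[0])
```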

Meanwhile, we've been dealing with all kinds of long-planned and unplanned outages, causing users much grief with timeouts. This credit granting procedure ensures that people get the credit they deserve for work done, without having to wait for wingmen who never show up. And we're not throwing out any science.

By the way, download servers are starting up so we're starting to clear out *that* queue.

- Matt


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1046037 - Posted: 4 Nov 2010, 18:00:00 UTC - in response to Message 1046035.  

Maybe there's some confusion about this credit granting process. <<<SNIP>>>
- Matt



YAY!! Tight limits should give fast turnaround... and AWESOME!!
Janice
Swibby Bear

Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 1046048 - Posted: 4 Nov 2010, 18:51:21 UTC

Matt --

If only one download server were running, wouldn't that make more efficient use of the internet connection and reduce or eliminate the dropped connections? Might overall throughput actually improve? And it might eliminate the ghost creations.

Whit
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1046061 - Posted: 4 Nov 2010, 19:57:24 UTC

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

That's hardly the most restful convalescence for poor Jocelyn, Anakin, and Vader - but I suppose it just represents the tapes the splitters were working on just before the last shutdown.

There's probably AP work in the mix as well. Hardly surprising that the download link went straight to 93 Mbit/s and has stayed there ever since.

To avoid creating yet more ghosts, might it be a good idea to periodically pause the 'new work allocation' side of the scheduler - now you have the tool to do that - while still allowing reporting? That would allow the download rate for allocated work to decay for a while, thus both allowing uploads/reporting to flow freely, and minimise the risk of work being trapped in limbo should, heaven forfend, the sudden rush of work trigger another nervous breakdown server-side.
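One hedged way to do that on a stock BOINC server - assuming SETI's config.xml follows the standard layout - is to mark just the feeder daemon disabled: the feeder is what fills the shared-memory cache the scheduler hands work out from, so with it stopped the scheduler still answers RPCs (and thus accepts reports) but has no new work to send:

```xml
<!-- Sketch, assuming a standard BOINC config.xml daemon entry.
     With only the feeder disabled, the job cache drains and no new
     work goes out, while reporting and uploads continue as normal. -->
<daemon>
  <cmd>feeder -d 3</cmd>
  <disabled>1</disabled>
</daemon>
```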
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66354
Credit: 55,293,173
RAC: 49
United States
Message 1046062 - Posted: 4 Nov 2010, 20:11:18 UTC - in response to Message 1046017.  

PS I do expect pics of the new servers though:)


What? I'm gonna insist upon it! :-))

Ditto, Asahp please.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

Ingrid Brouwer
Joined: 8 Jan 00
Posts: 6
Credit: 1,562,268
RAC: 0
Netherlands
Message 1046083 - Posted: 4 Nov 2010, 21:20:50 UTC - in response to Message 1046005.  

<<<SNIP>>> PS I do expect pics of the new servers though:)

I agree LOL Would love to see them, in action preferably ;-D
Life is not measured by the number of breaths we take, but by the
moments that take our breath away.
Mike.Gibson

Joined: 13 Oct 07
Posts: 34
Credit: 198,038
RAC: 0
United Kingdom
Message 1046140 - Posted: 5 Nov 2010, 1:36:25 UTC

I think the priority of most crunchers is not the receipt of credits. It is to prevent the waste of crunching time that results from WUs being resent when the primary crunching has been held up by an outage.

There should be some sort of moratorium, say for 24 hours, after an outage to give enough time for completed work to be reported.

During that time only new work should be sent out, or secondaries where expiration was before the start of the latest outage.
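Formalised as a sketch (illustrative names only, not BOINC's actual fields), the proposed rule would be:

```python
from collections import namedtuple

# Illustrative stand-in for a timed-out task - not BOINC's schema.
Task = namedtuple("Task", "deadline")

def may_resend(task, outage_start, moratorium_end, now):
    """During the moratorium, resend a timed-out copy only if its
    deadline already expired before the latest outage began."""
    if now < moratorium_end:
        return task.deadline < outage_start
    return True  # after the moratorium, normal resend rules apply
```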

Mike
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1046141 - Posted: 5 Nov 2010, 1:40:00 UTC - in response to Message 1046048.  

Matt --

If only one download server were running, wouldn't that give a more efficient use of the internet connection and reduce or eliminate the dropped connections? Overall throughput might actually improve? And it might eliminate the ghost creations.

Whit


Downloads seem to work better when both are up, from what I've observed.

RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1046145 - Posted: 5 Nov 2010, 1:43:24 UTC
Last modified: 5 Nov 2010, 1:45:11 UTC

Yes, much better with two.

Matt has been enabling and disabling one of the dl servers, if ya haven't noticed.
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1046155 - Posted: 5 Nov 2010, 2:28:39 UTC

Downloads work better right after the second server is enabled, then come to a crawl.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1046203 - Posted: 5 Nov 2010, 9:39:22 UTC - in response to Message 1046061.  
Last modified: 5 Nov 2010, 9:39:56 UTC

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

My first batch of new work was all initial-issue VLARs; after a couple of them had been done I got one new shortie.
Since then all of my new work has been VLAR re-issues, not a shortie in sight.
Grant
Darwin NT
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1046204 - Posted: 5 Nov 2010, 9:55:03 UTC - in response to Message 1045930.  
Last modified: 5 Nov 2010, 9:55:41 UTC

Does it make people feel good to get credit for work which probably now has no chance of contributing to the science? Has the project underestimated the quality of the participants?

To my mind, getting credit without a corresponding increase in consecutive valid is much like having a rotten tomato thrown at me versus having a fresh vine ripened tomato handed to me. It was probably inevitable that the project would need to cancel some work to make the transition to the new servers cleaner, but my vote if we had been asked would have favored no credit for unresolved work.

I bring this up to get others' opinions, in the hope that "I may have to do this granting again once this first round of cleanup is over." will be handled differently.
                                                                  Joe

+1
With arbitrary credit granting (and granting credit for nothing is arbitrary), credits lose their last (already very small) meaning.
Better to leave credits in their automatic state and work on the database cleanup; I'm sure there is enough work to do :P
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1046212 - Posted: 5 Nov 2010, 11:10:22 UTC - in response to Message 1046203.  

<<<SNIP>>>

My first batch of new work was all initial-issue VLARs; after a couple of them had been done I got one new shortie.
Since then all of my new work has been VLAR re-issues, not a shortie in sight.

Yes, the newly-split work didn't last long, and it's been nothing but resent ghosts or other fallout ever since. Which is exactly what we needed to happen. I haven't seen any VLARs here, but I wouldn't expect to - I've selected not to request CPU work, so the CPUs work on other projects while I run CUDA.

Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. There's still a trickle, but effectively no new downloads will have been added to the download server queue since then.

The download pipe remained saturated at 93 Mbit/sec until about 10:50 UTC, the best part of an hour later. That gives us some idea of the relative speeds of the various project components: I think we need to use the remaining time before the new servers arrive to work out some way of helping the scheduler request/reply messages get through (and hence avoid creating a new set of ghosts), while Oscar and friend keep the pipe full.
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1046231 - Posted: 5 Nov 2010, 14:01:51 UTC - in response to Message 1046212.  
Last modified: 5 Nov 2010, 14:53:36 UTC

In the past hours, starting probably late last night local time (UTC+1), all SETI Beta WUs were uploaded and new ones were downloaded. All three quads + CUDA are running SETI MB WUs.

I can't check the details; the result page option is turned off, diminishing server load. I haven't changed any BOINC settings in a few months - most of the time it wasn't possible at all.....:)

But I do hope WUs are validated and granted credit if a canonical result is made. (I think granting credit for work not done isn't serving science, so why do it.)

And I agree with Raistmer as to the 'value' of credits; the differences between projects are out of balance, I think.

As for both download servers being turned on (as of 5 Nov 2010 14:30:05 UTC): all downloads get through almost immediately - quite a difference from the last days or weeks.
platium
Joined: 5 Jul 10
Posts: 212
Credit: 262,426
RAC: 0
United Kingdom
Message 1046240 - Posted: 5 Nov 2010, 14:59:24 UTC

I thought we were here to donate processor time. Credits are interesting, but not what we are here for - or am I alone in thinking the SETI programme is what counts?
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1046271 - Posted: 5 Nov 2010, 16:36:55 UTC - in response to Message 1046240.  
Last modified: 5 Nov 2010, 16:55:24 UTC


I thought we were here to donate processor time. Credits are interesting, but not what we are here for - or am I alone in thinking the SETI programme is what counts?


IMO you're right, and no, you're not alone - although exceptions prove the rule, so 'they say'. You could ask why we don't simply count the tasks done, but that would not have been fair; unlike 'most' projects, SETI@home Multibeam and Astropulse get their data by piggybacking on the Arecibo Observatory telescope.

Credits are a way to establish the amount of work done by your hosts. Instead of using raw FLOPS* or FLOPs per WU, Jeff Cobb made a formula for determining the work done: FLOPS, but also memory bus, hard disk and network are all taken into account when measuring the work done for a WU. Besides giving an incomplete picture, counting only FLOPs would be hardly manageable, let alone handy - the numbers would easily have 18, 21, 23 or even more than 25 digits.
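To put rough numbers on that - a sketch, not the project's actual code: the BOINC credit unit (the 'cobblestone', named for Jeff Cobb) is commonly defined as 200 credits per day of work on a 1 GFLOPS host, which gives both the old benchmark-based claim and a FLOP-count-based claim of the kind described above (constants and function names here are illustrative):

```python
COBBLESTONES_PER_DAY = 200.0   # credits/day on a 1 GFLOPS / 1 GIPS host
SECONDS_PER_DAY = 86400.0

def benchmark_claim(cpu_seconds, whetstone_flops_sec, dhrystone_iops_sec):
    # Old benchmark-based claim: average the two benchmark scores.
    avg_ops = (whetstone_flops_sec + dhrystone_iops_sec) / 2.0
    return cpu_seconds / SECONDS_PER_DAY * (avg_ops / 1e9) * COBBLESTONES_PER_DAY

def flop_count_claim(counted_flops):
    # FLOP-counting claim: credit proportional to the operations
    # counted by the application itself.
    return counted_flops * COBBLESTONES_PER_DAY / (SECONDS_PER_DAY * 1e9)

# A task counted at 3e13 floating point operations claims ~69 credits:
print(round(flop_count_claim(3e13), 1))
```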

(From the personal account page: results for all projects.)

Now that new WUs, MB as well as AP, have been given out, is this the last task flow before the new servers are up and running?
(*Whetstone is a more reliable way of measuring floating point operations per second.)

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1046275 - Posted: 5 Nov 2010, 16:44:32 UTC - in response to Message 1046212.  

...
Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. There's still a trickle, but effectively no new downloads will have been added to the download server queue since then.

The download pipe remained saturated at 93 Mbit/sec until about 10:50 UTC, the best part of an hour later. That gives us some idea of the relative speeds of the various project components: I think we need to use the remaining time before the new servers arrive to work out some way of helping the scheduler request/reply messages get through (and hence avoid creating a new set of ghosts), while Oscar and friend keep the pipe full.

The 1148513 MB and 26888 AP in "ready to send" at the start plus about 12000 MB and 5000 AP reissues created during the saturation period would have taken nearly 19 hours of download with an actual throughput of 90 Mbps. I suspect ghost creation of 100000 or more.
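(A rough back-of-envelope check of that figure; the per-task sizes are assumptions - roughly 375 KB per MB task and 8 MB per AP task - so the exact hour count moves with them:)

```python
# Back-of-envelope check of the ~19 hour estimate; task sizes assumed.
mb_tasks = 1148513 + 12000          # ready-to-send + reissues, from the post
ap_tasks = 26888 + 5000
total_bytes = mb_tasks * 375e3 + ap_tasks * 8e6
bytes_per_sec = 90e6 / 8            # 90 Mbps actual throughput
print("%.1f hours" % (total_bytes / bytes_per_sec / 3600))  # ~17-19 h
```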

With the queues drained as much as possible by other means, I hope the staff will try <resend_lost_results>1</resend_lost_results> in the project config.xml at least for short periods. Bursts of activity caused by that should be tolerable, and go a long way toward having the server "in progress" actually match what is on participants' hosts. This being Friday, I expect whoever is in the lab has other plans, but maybe next week?
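(For reference, a sketch of where that element sits - per the standard BOINC server config.xml layout, shown here as an assumption about SETI's setup:)

```xml
<boinc>
  <config>
    <resend_lost_results>1</resend_lost_results>
    <!-- ...other scheduler options... -->
  </config>
  <daemons>
    <!-- ...daemon entries... -->
  </daemons>
</boinc>
```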
                                                                  Joe
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1046311 - Posted: 5 Nov 2010, 20:10:24 UTC - in response to Message 1046299.  
Last modified: 5 Nov 2010, 20:11:29 UTC

it's just that both the mysql (boinc) and informix (science) databases are too darn big.


Matt, I did bring that very point up a while ago, and you replied that there were many other applications out there with much bigger Informix databases than SETI had. I'm assuming therefore that this is not a limitation of Informix, but of your old servers?


http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp?topic=/com.ibm.adref.doc/ids_adr_0719.htm

Judging by what IBM says on its site about Informix databases, SETI's limitations were with the hardware. IBM says the maximum number of databases they can hold is 21 million, with 477,102,080 tables in a dynamic system and 32K threads, with a maximum database size of 4 TB. (I wouldn't imagine SETI has anything larger than that going on, but you never know!)

So I suppose it would be safe to say it's a hardware limitation they are fighting. These new machines hopefully will be more than enough to handle things for a while; I don't see them ordering machines that wouldn't be.
Traveling through space at ~67,000mph!
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1046335 - Posted: 5 Nov 2010, 22:00:51 UTC - in response to Message 1046275.  

Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. <<<SNIP>>>

I was surprised that the query loading on Jocelyn showed no sign of abating when the backlog of work available for allocation ran dry - with downloads a mere 20% of what they were this morning, what are all those other queries?


[Graph of Jocelyn database query load - static copy taken from http://bluenorthernsoftware.com/scarecrow/sahstats/db.php?t=48]
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1046359 - Posted: 5 Nov 2010, 23:43:28 UTC - in response to Message 1046335.  

I was surprised that the query loading on Jocelyn showed no sign of abating when the backlog of work available for allocation ran dry - with downloads a mere 20% of what they were this morning, what are all those other queries?

Validations, assimilations, deletions?

Grant
Darwin NT