Status / Credit (Nov 03 2010)

Message boards : Technical News : Status / Credit (Nov 03 2010)

AuthorMessage
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1046035 - Posted: 4 Nov 2010, 17:56:39 UTC

Maybe there's some confusion about this credit granting process.

By cleanup I wasn't referring to any corruption. In fact, the data in the databases is/are as clean as ever; it's just that both the mysql (boinc) and informix (science) databases are too darn big. This mega-outage is to get new servers on line for both that can handle the job. Before we can do that, we'd like to reduce the size of the databases to make the transitions easier. Of course there's some gunk in the databases pertaining to things like ghost wu's, orphaned results or signals, etc. Nothing new - but when the databases are quiet and vastly reduced in size this is a golden opportunity to analyse and understand these various long-standing issues - that's what I mean by cleanup.

Meanwhile, we've been dealing with all kinds of long planned and unplanned outages, causing users much grief with timeouts. This credit granting procedure ensures that people are getting the credit they deserve for work done without having to wait for wingmen who never show up. And we're not throwing out any science.

By the way, download servers are starting up so we're starting to clear out *that* queue.

- Matt


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1046035 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1046037 - Posted: 4 Nov 2010, 18:00:00 UTC - in response to Message 1046035.  

Maybe there's some confusion about this credit granting process.

By cleanup I wasn't referring to any corruption. In fact, the data in the databases is/are as clean as ever; it's just that both the mysql (boinc) and informix (science) databases are too darn big. This mega-outage is to get new servers on line for both that can handle the job. Before we can do that, we'd like to reduce the size of the databases to make the transitions easier. Of course there's some gunk in the databases pertaining to things like ghost wu's, orphaned results or signals, etc. Nothing new - but when the databases are quiet and vastly reduced in size this is a golden opportunity to analyse and understand these various long-standing issues - that's what I mean by cleanup.

Meanwhile, we've been dealing with all kinds of long planned and unplanned outages, causing users much grief with timeouts. This credit granting procedure ensures that people are getting the credit they deserve for work done without having to wait for wingmen who never show up. And we're not throwing out any science.

By the way, download servers are starting up so we're starting to clear out *that* queue.

- Matt



YAY!! tight limits should give fast turn around.. and AWESOME!!
Janice
ID: 1046037 · Report as offensive
Swibby Bear

Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 1046048 - Posted: 4 Nov 2010, 18:51:21 UTC

Matt --

If only one download server were running, wouldn't that give a more efficient use of the internet connection and reduce or eliminate the dropped connections? Overall throughput might actually improve? And it might eliminate the ghost creations.

Whit
ID: 1046048 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1046061 - Posted: 4 Nov 2010, 19:57:24 UTC

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

That's hardly the most restful convalescence for poor Jocelyn, Anakin, and Vader - but I suppose it just represents the tapes the splitters were working on just before the last shutdown.

There's probably AP work in the mix as well. Hardly surprising that the download link went straight to 93 Mbit/s and has stayed there ever since.

To avoid creating yet more ghosts, might it be a good idea to periodically pause the 'new work allocation' side of the scheduler - now you have the tool to do that - while still allowing reporting? That would allow the download rate for allocated work to decay for a while, thus both allowing uploads/reporting to flow freely and minimising the risk of work being trapped in limbo should, heaven forfend, the sudden rush of work trigger another nervous breakdown server-side.
ID: 1046061 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66203
Credit: 55,293,173
RAC: 49
United States
Message 1046062 - Posted: 4 Nov 2010, 20:11:18 UTC - in response to Message 1046017.  

PS I do expect pics of the new servers though:)


What? I'm gonna insist upon it! :-))

Ditto, ASAP please.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 1046062 · Report as offensive
Profile Ingrid Brouwer
Joined: 8 Jan 00
Posts: 6
Credit: 1,562,268
RAC: 0
Netherlands
Message 1046083 - Posted: 4 Nov 2010, 21:20:50 UTC - in response to Message 1046005.  

<<<SNIP>>> PS I do expect pics of the new servers though:)

I agree LOL Would love to see them, in action preferably ;-D
Life is not measured by the number of breaths we take, but by the
moments that take our breath away.
ID: 1046083 · Report as offensive
Mike.Gibson

Joined: 13 Oct 07
Posts: 34
Credit: 198,038
RAC: 0
United Kingdom
Message 1046140 - Posted: 5 Nov 2010, 1:36:25 UTC

I think the priority of most crunchers is not the receipt of credits. It is to prevent the waste of crunching time that results from WUs being resent when the primary crunching has been held up by an outage.

There should be some sort of moratorium, say for 24 hours, after an outage to give enough time for completed work to be reported.

During that time only new work should be sent out, or secondaries where expiration was before the start of the latest outage.

Mike
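
The moratorium rule proposed above can be sketched as a small predicate. This is only an illustration of the suggestion, not how the BOINC scheduler actually decides anything: the function name `may_send`, its parameters, and the `MORATORIUM` constant are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical length of the post-outage moratorium (the post suggests 24 hours).
MORATORIUM = timedelta(hours=24)

def may_send(is_resend, original_deadline, outage_start, outage_end, now):
    """Return True if a task may be sent out under the proposed rule.

    Assumes it is called after the outage has ended. original_deadline is
    only looked at for resends, so it may be None for new work.
    """
    # Once the moratorium window has passed, everything is sent as usual.
    if now >= outage_end + MORATORIUM:
        return True
    # New work is always allowed during the moratorium.
    if not is_resend:
        return True
    # A resend ("secondary") is allowed only if the original task had
    # already expired before the outage began.
    return original_deadline < outage_start
```

The point of the last check is that a task whose deadline fell inside the outage may well be finished and just waiting to be reported, so resending it risks wasted crunching.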
ID: 1046140 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1046141 - Posted: 5 Nov 2010, 1:40:00 UTC - in response to Message 1046048.  

Matt --

If only one download server were running, wouldn't that give a more efficient use of the internet connection and reduce or eliminate the dropped connections? Overall throughput might actually improve? And it might eliminate the ghost creations.

Whit


From what I've observed, downloads seem to work better when both are up.

ID: 1046141 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1046145 - Posted: 5 Nov 2010, 1:43:24 UTC
Last modified: 5 Nov 2010, 1:45:11 UTC

yes much better with two.

Matt has been enabling and disabling one of the dl servers if ya haven't noticed.
ID: 1046145 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1046155 - Posted: 5 Nov 2010, 2:28:39 UTC

dl's work better right after the second server is enabled, then slow to a crawl.
ID: 1046155 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1046203 - Posted: 5 Nov 2010, 9:39:22 UTC - in response to Message 1046061.  
Last modified: 5 Nov 2010, 9:39:56 UTC

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

My first batch of new work was all initial issue VLARs; after a couple of them had been done I got one new shortie.
Since then all of my new work has been VLAR re-issues, not a shortie in sight.
Grant
Darwin NT
ID: 1046203 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1046204 - Posted: 5 Nov 2010, 9:55:03 UTC - in response to Message 1045930.  
Last modified: 5 Nov 2010, 9:55:41 UTC

Does it make people feel good to get credit for work which probably now has no chance of contributing to the science? Has the project underestimated the quality of the participants?

To my mind, getting credit without a corresponding increase in consecutive valid is much like having a rotten tomato thrown at me versus having a fresh vine ripened tomato handed to me. It was probably inevitable that the project would need to cancel some work to make the transition to the new servers cleaner, but my vote if we had been asked would have favored no credit for unresolved work.

I bring this up to get others' opinions, in the hope that "I may have to do this granting again once this first round of cleanup is over." will be handled differently.
                                                                  Joe

+1
With arbitrary credit granting (and giving credits for nothing is arbitrary), credits lose the last (already very small) meaning they had.
Better to leave credit granting in its automatic state and work on database cleanup, I'm sure there is enough work to do :P
ID: 1046204 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1046212 - Posted: 5 Nov 2010, 11:10:22 UTC - in response to Message 1046203.  

I've got my first allocations since the scheduler was switched back on, and I see:

1) The tasks I'm getting are all _0 and _1 - i.e. new work, we're not down to the resends yet.
2) A very high proportion of them are 'shorties'

My first batch of new work was all initial issue VLARs; after a couple of them had been done I got one new shortie.
Since then all of my new work has been VLAR re-issues, not a shortie in sight.

Yes, the newly-split work didn't last long, and it's been nothing but resent ghosts or other fallout ever since. Which is exactly what we needed to happen. I haven't seen any VLARs here, but I wouldn't expect to - I've selected not to request CPU work, so the CPUs work on other projects while I run CUDA.

Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. There's still a trickle, but effectively no new downloads will have been added to the download server queue since then.

The download pipe remained saturated at 93 Mbit/sec until about 10:50 UTC, the best part of an hour later. That gives us some idea of the relative speeds of the various project components: I think we need to use the remaining time before the new servers arrive to work out some way of helping the scheduler request/reply messages get through (and hence avoid creating a new set of ghosts), while Oscar and friend keep the pipe full.
ID: 1046212 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1046231 - Posted: 5 Nov 2010, 14:01:51 UTC - in response to Message 1046212.  
Last modified: 5 Nov 2010, 14:53:36 UTC

In the past hours, probably starting late last night local time (UTC+1), all SETI Beta WUs were uploaded and new ones were downloaded. All 3 quads + CUDA are running SETI MB WUs.

I can't see the results: the result page option is turned off, which reduces server load.
I haven't changed any settings in BOINC for a few months. Most of the time it wasn't possible at all.....:)

But I do hope WUs are validated and granted credit if a canonical result is made.
(I think giving credit for work not done isn't serving science, so why do it?)

And I agree with Raistmer as to the 'value' of credits; the differences between projects are out of balance, I think.

With both download servers turned on (as of 5 Nov 2010 14:30:05 UTC), all downloads go through almost immediately, quite a difference from the last days or weeks.
ID: 1046231 · Report as offensive
Profile platium
Joined: 5 Jul 10
Posts: 212
Credit: 262,426
RAC: 0
United Kingdom
Message 1046240 - Posted: 5 Nov 2010, 14:59:24 UTC

I thought we were here to donate processor time. Credits are interesting but not what we are here for, or am I alone in thinking the S.e.t.i programme is what counts?
ID: 1046240 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1046271 - Posted: 5 Nov 2010, 16:36:55 UTC - in response to Message 1046240.  
Last modified: 5 Nov 2010, 16:55:24 UTC


I thought we were here to donate processor time. Credits are interesting but not what we are here for, or am I alone in thinking the S.e.t.i programme is what counts?


IMO you're right, and no, you're not alone, although, as 'they say', exceptions prove the rule. You could ask why not just count the tasks done, but that would not have been fair. Unlike 'most' projects,
SETI@home Multibeam and Astropulse get their data by piggybacking on the Arecibo Observatory telescope.

Well, credits are a way to establish the amount of work done by your hosts. Instead of using FLOPS* or FLOPs per WU, Jeff Cobb made a formula for determining the work done: FLOPS, but also memory bus, hard disk, and network are all taken into account when measuring the work done for a WU.
Besides giving an incomplete picture, counting only FLOPs is hardly manageable, let alone handy: the numbers would easily have 18, 21, 23, or even more than 25 digits.

From the Account Page, personal, all Projects Results.

Now that new MB and AP WUs have been given out, is this the last task flow before the new servers are up and running?
(*Whetstone is a more reliable way of measuring floating-point operations per second.)

ID: 1046271 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1046275 - Posted: 5 Nov 2010, 16:44:32 UTC - in response to Message 1046212.  

...
Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. There's still a trickle, but effectively no new downloads will have been added to the download server queue since then.

The download pipe remained saturated at 93 Mbit/sec until about 10:50 UTC, the best part of an hour later. That gives us some idea of the relative speeds of the various project components: I think we need to use the remaining time before the new servers arrive to work out some way of helping the scheduler request/reply messages get through (and hence avoid creating a new set of ghosts), while Oscar and friend keep the pipe full.

The 1148513 MB and 26888 AP in "ready to send" at the start plus about 12000 MB and 5000 AP reissues created during the saturation period would have taken nearly 19 hours of download with an actual throughput of 90 Mbps. I suspect ghost creation of 100000 or more.
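
For anyone wanting to check that kind of estimate, here is a rough sketch of the arithmetic. The task counts and the 90 Mbps figure come from the paragraph above; the per-task file sizes are not given anywhere in this thread, so the values in `sizes` are placeholder assumptions to be swapped for real numbers, and the function name is made up for illustration.

```python
def download_hours(task_counts, task_bytes, throughput_mbps):
    """Hours needed to download all tasks at a sustained throughput.

    task_counts and task_bytes are dicts keyed by application name;
    throughput_mbps is in megabits per second (10**6 bits per second).
    """
    total_bits = sum(task_counts[app] * task_bytes[app] * 8 for app in task_counts)
    seconds = total_bits / (throughput_mbps * 1_000_000)
    return seconds / 3600

# Counts from the post: "ready to send" backlog plus reissues.
counts = {"MB": 1148513 + 12000, "AP": 26888 + 5000}

# ASSUMED file sizes in bytes; these are not stated in the thread.
sizes = {"MB": 370_000, "AP": 8_400_000}

print(f"{download_hours(counts, sizes, 90):.1f} hours")
```

The result is only as good as the assumed sizes, and it ignores protocol overhead and retries, so treat it as a lower bound on the real download time.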

With the queues drained as much as possible by other means, I hope the staff will try <resend_lost_results>1</resend_lost_results> in the project config.xml at least for short periods. Bursts of activity caused by that should be tolerable, and go a long way toward having the server "in progress" actually match what is on participants' hosts. This being Friday, I expect whoever is in the lab has other plans, but maybe next week?
                                                                  Joe
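
As a small sketch of what checking that option could look like, the snippet below parses a config.xml-style document and reports whether `resend_lost_results` is switched on. The sample layout (a `<config>` section inside a `<boinc>` root) and the helper name are assumptions for illustration; a real project file has many more entries.

```python
import xml.etree.ElementTree as ET

# Assumed, heavily trimmed example of a project config.xml.
SAMPLE_CONFIG = """
<boinc>
  <config>
    <resend_lost_results>1</resend_lost_results>
  </config>
</boinc>
"""

def resend_lost_results_enabled(xml_text):
    """Return True if the first <resend_lost_results> element is set to 1."""
    root = ET.fromstring(xml_text.strip())
    # iter() finds the element wherever it sits in the tree.
    for elem in root.iter("resend_lost_results"):
        return (elem.text or "").strip() == "1"
    # Option absent: treat it as disabled.
    return False

print(resend_lost_results_enabled(SAMPLE_CONFIG))
```

Turning the option on "for short periods", as suggested, would then just mean flipping the value between 1 and 0 in the file.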
ID: 1046275 · Report as offensive
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1046311 - Posted: 5 Nov 2010, 20:10:24 UTC - in response to Message 1046299.  
Last modified: 5 Nov 2010, 20:11:29 UTC

it's just that both the mysql (boinc) and informix (science) databases are too darn big.


Matt, I did bring that very point up some while ago, and you replied that there were many other applications out there with much bigger Informix databases than Seti had. I'm assuming therefore that this is not a limitation of Informix, but of your old servers?


http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp?topic=/com.ibm.adref.doc/ids_adr_0719.htm

Judging by what IBM says on its site about Informix databases, Seti had limitations with the hardware. IBM says the maximum number of databases that it can hold is 21 million, with 477,102,080 tables in a dynamic system, 32k threads, and a maximum database size of 4TB. (I wouldn't imagine Seti has larger than that going on, but you never know!)

So I suppose it would be safe to say it was a hardware limitation they are fighting. These new machines hopefully will be more than enough to handle things for a while, I don't see them ordering machines that wouldn't.
Traveling through space at ~67,000mph!
ID: 1046311 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1046335 - Posted: 5 Nov 2010, 22:00:51 UTC - in response to Message 1046275.  

...
Observation: the backlog of results 'ready to send' ran out, for both MB and AP, around 10:00 UTC, or a few minutes before. There's still a trickle, but effectively no new downloads will have been added to the download server queue since then.

The download pipe remained saturated at 93 Mbit/sec until about 10:50 UTC, the best part of an hour later. That gives us some idea of the relative speeds of the various project components: I think we need to use the remaining time before the new servers arrive to work out some way of helping the scheduler request/reply messages get through (and hence avoid creating a new set of ghosts), while Oscar and friend keep the pipe full.

The 1148513 MB and 26888 AP in "ready to send" at the start plus about 12000 MB and 5000 AP reissues created during the saturation period would have taken nearly 19 hours of download with an actual throughput of 90 Mbps. I suspect ghost creation of 100000 or more.

With the queues drained as much as possible by other means, I hope the staff will try <resend_lost_results>1</resend_lost_results> in the project config.xml at least for short periods. Bursts of activity caused by that should be tolerable, and go a long way toward having the server "in progress" actually match what is on participants' hosts. This being Friday, I expect whoever is in the lab has other plans, but maybe next week?
                                                                  Joe

I was surprised that the query loading on Jocelyn showed no sign of abating when the backlog of work available for allocation ran dry - with downloads a mere 20% of what they were this morning, what are all those other queries?


(static copy taken from http://bluenorthernsoftware.com/scarecrow/sahstats/db.php?t=48)
ID: 1046335 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1046359 - Posted: 5 Nov 2010, 23:43:28 UTC - in response to Message 1046335.  

I was surprised that the query loading on Jocelyn showed no sign of abating when the backlog of work available for allocation ran dry - with downloads a mere 20% of what they were this morning, what are all those other queries?

Validations, assimilations, deletions?

Grant
Darwin NT
ID: 1046359 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.