tech news

Message boards : Number crunching : tech news

1 · 2 · Next

SCREAMING EAGLE

Send message
Joined: 24 Jan 04
Posts: 28
Credit: 268,976
RAC: 0
United States
Message 60429 - Posted: 4 Jan 2005, 18:37:00 UTC

January 4, 2005 - 18:00 UTC
We determined the increased database load (mentioned in the previous note below) was due to two indexes we added last week. Yesterday at noon (20:00 UTC) we stopped the project to drop these indexes - a procedure we expected to take an hour, but ended up taking twelve! The servers were restarted at midnight (08:00 UTC) and everything is back to operating at top speed.


hmm. we'll see.
ID: 60429 · Report as offensive
Nuadormrac
Volunteer tester
Avatar

Send message
Joined: 7 Apr 00
Posts: 136
Credit: 1,703,351
RAC: 0
United States
Message 60432 - Posted: 4 Jan 2005, 18:38:28 UTC

thx for the update...

ID: 60432 · Report as offensive
Profile Scribe
Avatar

Send message
Joined: 4 Nov 00
Posts: 137
Credit: 35,235
RAC: 0
United Kingdom
Message 60439 - Posted: 4 Jan 2005, 18:48:53 UTC

Well, I cannot see any improvement in the granting of credit from pending; that hasn't moved for ages, and I am talking about the ones with 3 pending results.
ID: 60439 · Report as offensive
Dave(The Admiral)Nelson

Send message
Joined: 4 Jun 99
Posts: 415
Credit: 22,293,483
RAC: 1
United States
Message 60447 - Posted: 4 Jan 2005, 19:01:49 UTC - in response to Message 60439.  

> Well, I cannot see any improvement in the granting of credit from pending; that
> hasn't moved for ages, and I am talking about the ones with 3 pending results.
>

NO, I have about 400.

Dave Nelson
ID: 60447 · Report as offensive
Dave(The Admiral)Nelson

Send message
Joined: 4 Jun 99
Posts: 415
Credit: 22,293,483
RAC: 1
United States
Message 60448 - Posted: 4 Jan 2005, 19:03:24 UTC - in response to Message 60447.  

Is there a way to find the number of pending results without going through the list and adding them up?
Dave Nelson
ID: 60448 · Report as offensive
Nuadormrac
Volunteer tester
Avatar

Send message
Joined: 7 Apr 00
Posts: 136
Credit: 1,703,351
RAC: 0
United States
Message 60450 - Posted: 4 Jan 2005, 19:09:58 UTC

I would gather that's what "pending credit" is supposed to be about (well, perhaps not the number of units), but it's been disabled since I left classic. Perhaps someone else can elaborate.

I did notice a number of pending with 3 results of 3 sent (probably about 15 or 20 or so), although I did get some additional credit this morning.

ID: 60450 · Report as offensive
Profile Benher
Volunteer developer
Volunteer tester

Send message
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 60464 - Posted: 4 Jan 2005, 19:49:44 UTC - in response to Message 60450.  

> I would gather that's what "pending credit" is supposed to be about (well,
> perhaps not the number of units), but it's been disabled since I left classic.
> Perhaps someone else can elaborate.
>

The problem with the "pending" link for everyone's hosts is that if it were turned on, clicking it would make a rather nasty (read as: long, time-consuming) database request to the main DB server.

As the validator already has a backlog because the DB can't give it records fast enough (to grant credit), having people click their [pending] button would probably send the DB server into fits ;)

SELECT * FROM workunit AS wu LEFT JOIN result AS res ON wu.id = res.workunitid WHERE res.hostid = [SOMEBODY] AND res.server_state = PENDING

SELECT SUM(res.claimed_credit) FROM result AS res WHERE res.hostid = [SOMEBODY] AND res.server_state = PENDING

These are just off-the-cuff database queries; they probably wouldn't work as-is, or could be made more efficient.
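For anyone who wants to play with the idea, here is a runnable sketch of those queries against a toy SQLite database. The table and column names follow the SQL above; the PENDING encoding and the sample rows are invented for illustration.

```python
# Toy version of the "pending credit" queries, using an in-memory SQLite
# database. Column names follow the quoted SQL; the PENDING encoding and
# the sample data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE result (
    id INTEGER PRIMARY KEY,
    workunitid INTEGER,
    hostid INTEGER,
    server_state TEXT,
    claimed_credit REAL)""")
rows = [
    (1, 10, 42, "PENDING", 12.5),
    (2, 11, 42, "PENDING", 7.25),
    (3, 12, 42, "DONE",    9.0),
    (4, 13, 99, "PENDING", 3.0),
]
conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?)", rows)

host = 42  # stand-in for [SOMEBODY]
count, total = conn.execute(
    "SELECT COUNT(*), SUM(claimed_credit) FROM result "
    "WHERE hostid = ? AND server_state = 'PENDING'", (host,)).fetchone()
print(count, total)  # 2 pending results, 19.75 claimed credit
```

On the real schema the join against `workunit` would be there too; this sketch only shows the per-host filter and sum that make the query expensive at scale.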
ID: 60464 · Report as offensive
Profile mikey
Volunteer tester
Avatar

Send message
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 60491 - Posted: 4 Jan 2005, 20:38:20 UTC - in response to Message 60464.  

> > I would gather that's what "pending credit" is supposed to be about (well,
> > perhaps not the number of units), but it's been disabled since I left classic.
> > Perhaps someone else can elaborate.
> >
>
> Problem with the "pending" link for everyone's hosts, is that if it was turned
> on, clicking it would make a rather nasty (read as long, time consuming)
> database request of the main DB server.
>
> As the validator already has a backlog because the DB can't give it records
> fast enough (to grant credit), having people click their [pending] button
> would probably send the DB server into fits ;)
>
Actually, it would work IF you told people UP FRONT that the data was x hours old - say 24 hours. Then you could run the query and make the data available from a file instead of the actual database. You could even put the file on a remote computer that is not so busy but runs a small program periodically to update itself. While it is updating, just have it put up a message saying "database refreshing" or whatever.
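A minimal sketch of that scheme, assuming a JSON file and invented field names: the periodic job writes a timestamped snapshot, and readers can see up front how old the data is.

```python
# Sketch of the "stale file" idea: a periodic job dumps each host's
# pending totals to a flat file, stamped with when it was generated,
# and the website serves the file instead of hitting the live DB.
# The file layout and field names here are invented for illustration.
import json
import time

def snapshot_pending(pending_by_host, path):
    """Write a timestamped snapshot; readers see the data's age up front."""
    doc = {"generated_utc": time.time(), "pending": pending_by_host}
    with open(path, "w") as f:
        json.dump(doc, f)

def read_pending(path, host):
    """Return one host's pending totals plus the snapshot's age in hours."""
    with open(path) as f:
        doc = json.load(f)
    age_hours = (time.time() - doc["generated_utc"]) / 3600
    return doc["pending"].get(str(host)), age_hours

snapshot_pending({"42": {"count": 17, "credit": 210.5}}, "pending.json")
info, age = read_pending("pending.json", 42)
print(info, round(age, 1))
```

While the job regenerates the file, the site would show the "database refreshing" notice instead of serving a half-written snapshot.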

ID: 60491 · Report as offensive
Hans Dorn
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 60497 - Posted: 4 Jan 2005, 20:48:43 UTC - in response to Message 60491.  

> Actually it would work IF you told people UP FRONT that the data was x hours
> old. Say 24 hours, then you could run the query and make the data available
> from a file, instead of the actual database. You could even put the file on a
> remote computer that is not so busy but runs a small program periodically to
> update itself. When it is updating just have it put up a message saying
> "database refreshing" or whatever.
>

Hmmm... I'm not so sure this would work.
I guess only a small fraction of users check their stats regularly,
but you would have to generate those files for all users.

Regards Hans

ID: 60497 · Report as offensive
Walt Gribben
Volunteer tester

Send message
Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 60556 - Posted: 4 Jan 2005, 23:08:52 UTC
Last modified: 4 Jan 2005, 23:10:01 UTC

> > Actually it would work IF you told people UP FRONT that the data was x hours
> > old. Say 24 hours, then you could run the query and make the data available
> > from a file, instead of the actual database. You could even put the file on a
> > remote computer that is not so busy but runs a small program periodically to
> > update itself. When it is updating just have it put up a message saying
> > "database refreshing" or whatever.
> >
>
> Hmmm..., I'm not so sure this would work.
> I guess just a little fraction of users is checking their stats regularly,
> but you would have to generate those files for all users.
>
> Regards Hans

What about generating the list only for users that request it? If just a small fraction check their stats, then copy pending results only for them.

Use a separate database for these pending results, and keep a "queue" of the users that view them (or attempt to the first time). Track this using a new flag and a "date last viewed" field in the user's account statistics.

The new flag could be used when building the "pending results" item, to link either to a "queue this user" request or to a "view pending results" query against the new database. Keep track of the last date the pending results were actually viewed, so if the user stops looking they get dequeued after some interval passes - a month? a week? two weeks?

No need to keep track of more than the stuff shown on the current results page: result id, workunit id, date sent, date reported, server state, outcome, client state, CPU time. Claimed credit and granted credit aren't needed, as these are all pending. A new database gets built every day, first in a temporary location, and then moved to the real location after the old one is deleted.
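A minimal sketch of that nightly rebuild, assuming SQLite and illustrative file and column names: the new snapshot is built in a temporary location and then swapped into place, so readers never see a half-written database.

```python
# Sketch of the nightly rebuild described above: build the new pending
# snapshot in a temporary location, then atomically swap it into place.
# File names, the schema, and the sample row are all illustrative.
import os
import sqlite3

def rebuild_pending_db(final_path, pending_rows):
    tmp_path = final_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    db = sqlite3.connect(tmp_path)
    db.execute("""CREATE TABLE pending (
        result_id INTEGER, workunit_id INTEGER, hostid INTEGER,
        date_sent TEXT, date_reported TEXT,
        server_state TEXT, outcome TEXT, client_state TEXT,
        cpu_time REAL)""")
    db.executemany("INSERT INTO pending VALUES (?,?,?,?,?,?,?,?,?)",
                   pending_rows)
    db.commit()
    db.close()
    # os.replace is atomic on POSIX: the old snapshot vanishes in one step.
    os.replace(tmp_path, final_path)

rebuild_pending_db(
    "pending_snapshot.db",
    [(1, 10, 42, "2005-01-04", None, "PENDING", None, None, 3600.0)])
n = sqlite3.connect("pending_snapshot.db").execute(
    "SELECT COUNT(*) FROM pending").fetchone()[0]
print(n)  # 1
```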
ID: 60556 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Send message
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 60576 - Posted: 5 Jan 2005, 0:09:38 UTC - in response to Message 60556.  

> Use a separate database for these pending results, and keep a "queue" of these
> users that view them (or attempt to the first time). Track this using a new
> flag and "date last viewed" field in the users account statistics.

This has been suggested. In essence you are describing a different data life-cycle, using separate tables for the various stages. The difficulty now is that this is a live system, and it is very difficult to make changes that do not break the existing system yet give different performance. Case in point: the addition of two indexes caused a significant slow-down of the system.

Especially troubling to me personally is that this is a transaction-oriented database, and access speed is rarely improved by adding indexes, as each insert, update, or delete causes changes to the data record and to all of the indexes. On some transaction databases there are many tables but rarely any indexes beyond the primary key.

> The new flag could be used in building the "pending results" item to link to a
> "queue this user" request or "view pending results" query to the new database.
> Keep track of the last date the pending results were actually viewed so if
> the user stops looking, they will get dequeued after some interval passes - a
> month? a week? two weeks?

An easier, safer, and just as meaningful approach would be to make the pending credit an attribute of the participant rather than the result. As results are returned, the bean is incremented; as results get processed, the bean gets decremented. Since the updates to this table are isolated from the primary result table, queries will not interfere with the behavior of the result table. However, you still have the legacy problem. Also, this would not be perfectly accurate, but would in fact be an estimate that over time would likely drift ...
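A minimal sketch of that "bean counter" idea, with invented names: pending credit lives on the participant record, incremented when a result is reported and decremented when it is validated. As noted, any missed update would make the counter drift.

```python
# Sketch of per-participant pending credit as a running counter.
# Class and method names are illustrative, not from the real schema.
class Participant:
    def __init__(self):
        self.pending_credit = 0.0

    def result_returned(self, claimed):
        # Result reported, awaiting validation: add its claimed credit.
        self.pending_credit += claimed

    def result_validated(self, claimed):
        # Credit granted (or denied): remove it from the pending total.
        self.pending_credit -= claimed

p = Participant()
p.result_returned(12.5)
p.result_returned(7.25)
p.result_validated(12.5)
print(p.pending_credit)  # 7.25 still pending
```

The appeal is that displaying pending credit becomes a single-row read instead of a scan of the result table; the cost is that the number is an estimate, not a recount.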


Just some more food for thought ...
ID: 60576 · Report as offensive
Walt Gribben
Volunteer tester

Send message
Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 60592 - Posted: 5 Jan 2005, 0:49:29 UTC - in response to Message 60576.  


> THis has been suggested. In essence you are describing a different data
> life-cycle and use separate tables for the various stages. The difficulty now
> is that this is a live system and it is very difficult to make changes that do
> not break the existing system yet give different performance. Case in point,
> the addition of two indexes caused a significant slow-down of the system.
>

In essence not.

Not different stages, just one state - pending - and just a snapshot taken at a point in time, to be used for querying pending results until another snapshot is taken. The snapshot idea was taken from mikey's suggestion here and modified by Hans Dorn and me in subsequent messages.

The spun-off data is not for all users, just for users that requested pending data. This could be done with no changes to the current database, just an HTML change pointing the "pending results" link to a script that queried the new database, like the way stats work. If the query returns no records found, then the user gets added to the "get pending results" queue. A daily task would be needed to spin off pending results for "queued users".

Right now there's very little difference between a "pending results only" query (that got disabled) and one for all results that the end-user keeps paging through.

ID: 60592 · Report as offensive
Profile Benher
Volunteer developer
Volunteer tester

Send message
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 60705 - Posted: 5 Jan 2005, 4:18:51 UTC - in response to Message 60592.  


> In essence not.
>
> Not different stages, just one state - pending - and just a snapshot taken at
> a point in time, to be used for querying pending results until another snapshot
> is taken. The snapshot idea was taken from mikey's suggestion here
> (http://setiweb.ssl.berkeley.edu/forum_reply.php?thread=7764&post=60576#60464)
> and modified by Hans Dorn and me in subsequent messages.
>
> The spun off data is not for all users just for users that requested
> pending data. This could be done with no changes to the current
> database and html change pointing the "pending results" link to a script the
> queried the new database, like the way stats work. If the query returns no
> records found then the user gets added to the "get pending results" queue. A
> daily task would be needed to spin off pending results for "queued users".
>
> Right now theres very little difference between a "pending results only" query
> (that got disabled) and one for all results that the end-user keeps paging
> thru.

Walt,

Your idea would work... the amount of effort involved in implementing it is probably not small (from my experience programming some of the code). Effort involved vs. the number of users who would use/benefit from it is also... percentage-wise... probably not big.

So ideas like this get added to the taskbase, and if some volunteer programmer takes a fancy to the idea and wants to implement it, they go ahead and do so.
ID: 60705 · Report as offensive
Walt Gribben
Volunteer tester

Send message
Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 60714 - Posted: 5 Jan 2005, 4:35:14 UTC - in response to Message 60705.  


> Walt,
>
> Your idea would work... the amount of effort involved in implementing it is
> probably not small (from my experience programming some of the code). Effort
> involved vs. the number of users who would use/benefit from it is also...
> percentage-wise... probably not big.
>
> So for ideas like this they get added to the taskbase, and if some volunteer
> programmer takes a fancy to the idea and wants to implement it, they go ahead
> and do so.

I agree that it's not small, but it shouldn't take all that much either. How would I get it into the taskbase? I might even try to work on it later in the month if I get some free time; what's the best way of doing that?
ID: 60714 · Report as offensive
Profile mikey
Volunteer tester
Avatar

Send message
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 61267 - Posted: 6 Jan 2005, 14:28:23 UTC - in response to Message 60705.  

> > The spun off data is not for all users just for users that
> requested
> > pending data. This could be done with no changes to the current
> > database and html change pointing the "pending results" link to a script
> the
> > queried the new database, like the way stats work. If the query returns
> no
> > records found then the user gets added to the "get pending results"
> queue. A
> > daily task would be needed to spin off pending results for "queued
> users".
> >
I have another idea: why not just copy the database to another computer, whatever, and do the queries over there? I realize that this would involve another processor, but with PCs coming down in price, a PC with a big enough hard drive could do the query in a small amount of time. The caveat would be that the data would be as old as the data copied over - once a day, for example.
That way "new" people doing first-time queries wouldn't have to wait for their data to be included in their first query. You could even copy the WHOLE database once a week, and just those that request their data more often. That would make the "new" people's data up to a week (or whatever) old, but the next day, for example, it would be up to date.

ID: 61267 · Report as offensive
.
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 410
Credit: 16,559
RAC: 0
Message 61274 - Posted: 6 Jan 2005, 15:01:50 UTC - in response to Message 60576.  

Wow Paul!

I like your new avatar and board!! Timmy's excellent work, isn't it?
ID: 61274 · Report as offensive
Walt Gribben
Volunteer tester

Send message
Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 61290 - Posted: 6 Jan 2005, 16:53:35 UTC - in response to Message 61267.  

> I have another idea: why not just copy the database to another computer,
> whatever, and do the queries over there? I realize that this would involve
> another processor, but with PCs coming down in price, a PC with a big enough
> hard drive could do the query in a small amount of time. The caveat would be
> that the data would be as old as the data copied over - once a day, for
> example.
> That way "new" people doing first-time queries wouldn't have to wait for their
> data to be included in their first query. You could even copy the WHOLE
> database once a week, and just those that request their data more often. That
> would make the "new" people's data up to a week (or whatever) old, but the
> next day, for example, it would be up to date.

I thought one of the problems they have was no space, power, or cooling for additional hardware.

There are certainly a lot of possibilities, but I think old data is actually worse than having to wait a day. Admittedly that's a subjective opinion, but one based on all the complaints I keep reading in these forums. Just what's needed - something new to complain about: how old the "pending results" database copy is :)



ID: 61290 · Report as offensive
Profile Toby
Volunteer tester
Avatar

Send message
Joined: 26 Oct 00
Posts: 1005
Credit: 6,366,949
RAC: 0
United States
Message 61314 - Posted: 6 Jan 2005, 18:08:32 UTC - in response to Message 61267.  
Last modified: 6 Jan 2005, 18:09:39 UTC

> I have another idea: why not just copy the database to another computer,
> whatever, and do the queries over there? I realize that this would involve
> another processor, but with PCs coming down in price, a PC with a big enough
> hard drive could do the query in a small amount of time. The caveat would be
> that the data would be as old as the data copied over - once a day, for
> example.
> That way "new" people doing first-time queries wouldn't have to wait for their
> data to be included in their first query. You could even copy the WHOLE
> database once a week, and just those that request their data more often. That
> would make the "new" people's data up to a week (or whatever) old, but the
> next day, for example, it would be up to date.

It is called a replicated database, and seti@home IS using one, although I'm not sure for which parts of the system it is used. The replicated database reads the binary logs of the master database and updates itself constantly. Then it fields SELECT queries for the master database. The point you make about a "PC" with a big enough hard drive being able to do this is not entirely accurate, however. In the past around here it hasn't usually been CPU power or hard drive space that was lacking, but rather I/O bandwidth. Your typical PC can do anywhere from 10 to 60 MB/sec of hard drive I/O, depending on what you are doing (read vs. write, sequential vs. random access, etc). This is simply not enough for the kind of data seti@home is pushing around. They must use a RAID with multiple hard drives and a RAID controller, none of which is cheap (for the good stuff).
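To put rough numbers on the I/O argument, here is a back-of-the-envelope calculation using the 10-60 MB/sec figures above; the table size is a hypothetical, not the real one.

```python
# Even a full sequential scan of a modest result table swamps a single
# desktop disk. The table size here is invented; the throughput figures
# are the 10-60 MB/sec range quoted above.
table_gb = 40            # hypothetical size of the result table
seq_mb_per_s = 60        # optimistic single-disk sequential read
rand_mb_per_s = 10       # pessimistic random-access throughput

seq_minutes = table_gb * 1024 / seq_mb_per_s / 60
rand_minutes = table_gb * 1024 / rand_mb_per_s / 60
print(round(seq_minutes, 1), round(rand_minutes, 1))  # 11.4 vs 68.3 minutes
```

And that is for a single scan with nothing else running - the live server has to interleave this with constant inserts and updates, which is why RAID bandwidth matters more than raw disk capacity.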

I don't really care if my credit is granted today or tomorrow or next week :) I know they are working on it and that is good enough for me. I know they are working on bringing some new high power hardware online that should go a long way toward solving these problems.
A member of The Knights Who Say NI!
For rankings, history graphs and more, check out:
My BOINC stats site
ID: 61314 · Report as offensive
Profile Benher
Volunteer developer
Volunteer tester

Send message
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 61321 - Posted: 6 Jan 2005, 18:24:20 UTC - in response to Message 60714.  
Last modified: 6 Jan 2005, 18:28:40 UTC

> I agree that its not small, but shouldn't take all that much either. How
> would I get it into the taskbase? I might even try to work on it later in the
> month if I get some free time, whats the best way of doing that?
>

Walt,

Task base is here

You can sort by priority, task #, expected implementation milestone, etc.

To submit code changes you join the BOINC developers mailing list and post the DIFF of your changes (if long, use an attachment) as an email to the dev list.
David and team decide whether to implement, or a discussion thread might begin as to suggested changes, etc.

To get a good DIFF you would probably want to use CVS to get a working copy of the source (unless you want to change only one file). You have to choose which branch of the code you want... the "live" code, which is used by the alpha site, or the "public" code, which is 4.13.

ID: 61321 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 61364 - Posted: 6 Jan 2005, 20:50:10 UTC

Here's another one of my random posts to the message boards:

1. To clear things up, we *did* have a replica database for a few weeks there, and it was wonderful, but due to disk storage issues at the time we had to drop the replica, and now any machine we try to bring up as a replica can't keep up. However, we do have a honkin' new system which will (soon) become the master database, and the current system will then be the replica.

2. We're finding that the validation bottleneck is not in the database, so we're still trying to figure this all out. We did move the validate process to another machine which was less loaded, and the queue is draining at a perceivably faster rate, but not by much.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 61364 · Report as offensive

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.