Message boards :
Number crunching :
tech news
Author | Message |
---|---|
SCREAMING EAGLE Send message Joined: 24 Jan 04 Posts: 28 Credit: 268,976 RAC: 0 |
January 4, 2005 - 18:00 UTC We determined the increased database load (mentioned in the previous note below) was due to two indexes we added last week. Yesterday at noon (20:00 UTC) we stopped the project to drop these indexes - a procedure we expected to take an hour, but ended up taking twelve! The servers were restarted at midnight (08:00 UTC) and everything is back to operating at top speed. hmm. we'll see. |
Nuadormrac Send message Joined: 7 Apr 00 Posts: 136 Credit: 1,703,351 RAC: 0 |
|
Scribe Send message Joined: 4 Nov 00 Posts: 137 Credit: 35,235 RAC: 0 |
Well, I cannot see any improvement in the granting of credit from pending; that hasn't moved for ages, and I am talking about the ones with 3 pending results. |
Dave(The Admiral)Nelson Send message Joined: 4 Jun 99 Posts: 415 Credit: 22,293,483 RAC: 1 |
> Well, I cannot see any improvement in the granting of credit from pending; that > hasn't moved for ages, and I am talking about the ones with 3 pending results. > NO, I have about 400. Dave Nelson |
Dave(The Admiral)Nelson Send message Joined: 4 Jun 99 Posts: 415 Credit: 22,293,483 RAC: 1 |
Is there a way to find the number pending without going through the list and adding them up? Dave Nelson |
Nuadormrac Send message Joined: 7 Apr 00 Posts: 136 Credit: 1,703,351 RAC: 0 |
I would gather that's what "pending credit" is supposed to be about (well, perhaps not the number of units), but it's been disabled since I left classic. Perhaps someone else can elaborate. I did notice a number of pending with 3 results of 3 sent (probably about 15 or 20, or so), albeit I did get some additional credit this morning. |
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
> I would gather that's what "pending credit" is supposed to be about (well > perhaps not the number of units) but it's been disabled since I left classic. > Perhaps someone else can elaborate. > Problem with the "pending" link for everyone's hosts is that if it were turned on, clicking it would make a rather nasty (read as long, time-consuming) database request of the main DB server. As the validator already has a backlog because the DB can't give it records fast enough (to grant credit), having people click their [pending] button would probably send the DB server into fits ;) SELECT * FROM workunit AS wu LEFT JOIN result AS res ON wu.id = res.workunitid WHERE res.hostid=[SOMEBODY] AND res.server_state=PENDING SELECT sum(res.claimed_credit) FROM result AS res WHERE res.hostid=[SOMEBODY] AND res.server_state=PENDING These are just off-the-cuff database queries; they probably wouldn't work or could be more efficient. |
mikey Send message Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> > I would gather that's what "pending credit" is supposed to be about > (well > > perhaps not the number of units) but it's been disabled since I left > classic. > > Perhaps someone else can elaborate. > > > > Problem with the "pending" link for everyone's hosts is that if it was turned > on, clicking it would make a rather nasty (read as long, time-consuming) > database request of the main DB server. > > As the validator already has a backlog because the DB can't give it records > fast enough (to grant credit), having people click their [pending] button > would probably send the DB server into fits ;) > Actually it would work IF you told people UP FRONT that the data was x hours old. Say 24 hours; then you could run the query and make the data available from a file, instead of the actual database. You could even put the file on a remote computer that is not so busy but runs a small program periodically to update itself. When it is updating, just have it put up a message saying "database refreshing" or whatever. |
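mikey's periodic-snapshot idea can be sketched roughly as below. This is only an illustration: the `result` table and the `hostid`, `claimed_credit`, and `server_state` columns echo Benher's off-the-cuff queries earlier in the thread and are assumptions, not SETI@home's actual schema.

```python
import json
import time

def build_snapshot(conn, path):
    # One expensive query per refresh cycle, instead of one per user click
    # against the live master database.
    rows = conn.execute(
        "SELECT hostid, COUNT(*), SUM(claimed_credit) FROM result "
        "WHERE server_state = 'PENDING' GROUP BY hostid"
    ).fetchall()
    snapshot = {
        "generated_at": time.time(),  # lets the page say "data is N hours old"
        "pending": {str(h): {"count": n, "claimed": c} for h, n, c in rows},
    }
    with open(path, "w") as f:
        json.dump(snapshot, f)

def read_pending(path, hostid):
    # Every "pending" click is served from the flat file; the result
    # table is never queried interactively.
    with open(path) as f:
        snap = json.load(f)
    return snap["pending"].get(str(hostid), {"count": 0, "claimed": 0})
```

The snapshot file could live on any lightly loaded machine, exactly as mikey suggests; the only cost is that the numbers are up to one refresh interval stale.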
Hans Dorn Send message Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
> Actually it would work IF you told people UP FRONT that the data was x hours > old. Say 24 hours, then you could run the query and make the data available > from a file, instead of the actual database. You could even put the file on a > remote computer that is not so busy but runs a small program periodically to > update itself. When it is updating just have it put up a message saying > "database refreshing" or whatever. > Hmmm..., I'm not so sure this would work. I guess just a little fraction of users is checking their stats regularly, but you would have to generate those files for all users. Regards Hans |
Walt Gribben Send message Joined: 16 May 99 Posts: 353 Credit: 304,016 RAC: 0 |
> > Actually it would work IF you told people UP FRONT that the data was x hours > > old. Say 24 hours, then you could run the query and make the data available > > from a file, instead of the actual database. You could even put the file on a > > remote computer that is not so busy but runs a small program periodically to > > update itself. When it is updating just have it put up a message saying > > "database refreshing" or whatever. > > > > Hmmm..., I'm not so sure this would work. > I guess just a little fraction of users is checking their stats regularly, > but you would have to generate those files for all users. > > Regards Hans What about generating the list only for users that request it? If just a small fraction check their stats, then copy pending results only for them. Use a separate database for these pending results, and keep a "queue" of the users that view them (or attempt to the first time). Track this using a new flag and "date last viewed" field in the user's account statistics. The new flag could be used in building the "pending results" item to link to a "queue this user" request or "view pending results" query to the new database. Keep track of the last date the pending results were actually viewed, so if the user stops looking they will get dequeued after some interval passes - a month? a week? two weeks? No need to keep track of more than the stuff shown on the current results page: result id, workunit id, date sent, date reported, server state, outcome, client state, cpu time. Claimed credit and granted credit aren't needed as these are all pending. A new database gets built every day, first in a temporary location and then moved to the real location after the old one is deleted. |
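Walt's queue-on-first-request flow could be sketched as below. Everything here is invented for illustration (the real account tables look nothing like this); the two-week dequeue interval is one of the options Walt leaves open.

```python
from datetime import datetime, timedelta

# Interval after which an inactive user is dequeued; Walt suggests
# a week to a month, two weeks assumed here.
DEQUEUE_AFTER = timedelta(days=14)

class PendingQueue:
    def __init__(self):
        self.last_viewed = {}  # userid -> datetime of last pending-results view

    def view(self, userid, now):
        # The first request only enqueues the user; the nightly spin-off
        # job will include them in the next snapshot database.
        first_time = userid not in self.last_viewed
        self.last_viewed[userid] = now
        return "queued - check back tomorrow" if first_time else "pending results"

    def prune(self, now):
        # Users who stopped looking get dequeued once the interval passes.
        stale = [u for u, t in self.last_viewed.items() if now - t > DEQUEUE_AFTER]
        for u in stale:
            del self.last_viewed[u]
        return stale
```

The nightly job then only spins off pending rows for the handful of userids still in `last_viewed`, which is the whole point: the expensive query runs for the small fraction who actually look.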
Paul D. Buck Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
> Use a separate database for these pending results, and keep a "queue" of these > users that view them (or attempt to the first time). Track this using a new > flag and "date last viewed" field in the users account statistics. This has been suggested. In essence you are describing a different data life-cycle that uses separate tables for the various stages. The difficulty now is that this is a live system and it is very difficult to make changes that do not break the existing system yet give different performance. Case in point, the addition of two indexes caused a significant slow-down of the system. Especially troubling to me personally is that this is a transaction-oriented database, and access speed is rarely improved by adding indexes, as each insert, update, or delete will cause changes in the data record and all of the indexes. On some transaction databases there are many tables but rarely any indexes beyond the primary key. > The new flag could be used in building the "pending results" item to link to a > "queue this user" request or "view pending results" query to the new database. > Keep track of the last date the pending results were actually viewed so if > the user stops looking, they will get dequeued after some interval passes - a > month? a week? two weeks? An easier, safer, and just as meaningful approach would be to make the pending credit an attribute of the participant rather than the result. As results are returned, the bean is incremented; as results get processed, the bean gets decremented. Since the updates to this table are isolated from the primary result table, queries will not interfere with the behavior of the result table. However, you still have the legacy problem. Also, this would not be perfectly accurate, but would in fact be an estimate that over time would likely drift ... Just some more food for thought ... |
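Paul's per-participant "bean" could look something like this. The names are invented for illustration, and, as he notes, the total is an estimate: any missed event makes it drift.

```python
class PendingCounter:
    """Pending credit kept as an attribute of the participant, in its own
    table, away from the hot result table. Names are hypothetical."""

    def __init__(self):
        self.pending = {}  # userid -> claimed credit still awaiting validation

    def result_returned(self, userid, claimed):
        # Incremented as each result comes back from a host.
        self.pending[userid] = self.pending.get(userid, 0.0) + claimed

    def result_processed(self, userid, claimed):
        # Decremented when the validator grants (or discards) the result.
        # Clamped at zero so accumulated drift never shows negative credit.
        self.pending[userid] = max(0.0, self.pending.get(userid, 0.0) - claimed)
```

A user's "pending" page then reads one row from this side table instead of joining across the result table, which is exactly why it would not interfere with validation.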
Walt Gribben Send message Joined: 16 May 99 Posts: 353 Credit: 304,016 RAC: 0 |
> This has been suggested. In essence you are describing a different data > life-cycle that uses separate tables for the various stages. The difficulty now > is that this is a live system and it is very difficult to make changes that do > not break the existing system yet give different performance. Case in point, > the addition of two indexes caused a significant slow-down of the system. > In essence not. Not different stages, just one state - pending - and just a snapshot taken at a point in time to be used for querying pending results until another snapshot is taken. The snapshot idea was taken from mikey's suggestion here and modified by Hans Dorn and me in subsequent messages. The spun-off data is not for all users, just for users that requested pending data. This could be done with no changes to the current database, just an HTML change pointing the "pending results" link to a script that queried the new database, like the way stats work. If the query returns no records found, then the user gets added to the "get pending results" queue. A daily task would be needed to spin off pending results for "queued users". Right now there's very little difference between a "pending results only" query (that got disabled) and one for all results that the end-user keeps paging thru. |
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
> In essence not. > > Not different stages, just one state - pending - and just a snapshot taken at > a point in time to be used for querying pending results until another snapshot > is taken. The snapshot idea taken from mikey's suggestion here (http://setiweb.ssl.berkeley.edu/forum_reply.php?thread=7764&post=60576#60464) > and modified by Hans Dorn and me in subsequent messages. > > The spun-off data is not for all users, just for users that requested > pending data. This could be done with no changes to the current > database, just an HTML change pointing the "pending results" link to a script > that queried the new database, like the way stats work. If the query returns > no records found then the user gets added to the "get pending results" queue. A > daily task would be needed to spin off pending results for "queued users". > > Right now there's very little difference between a "pending results only" query > (that got disabled) and one for all results that the end-user keeps paging > thru. Walt, Your idea would work... the amount of effort involved in implementing it is probably not small (from my experience programming some of the code). Effort involved vs. the number of users who would use/benefit from it is also... percentage-wise... probably not big. So ideas like this get added to the taskbase, and if some volunteer programmer takes a fancy to the idea and wants to implement it, they go ahead and do so. |
Walt Gribben Send message Joined: 16 May 99 Posts: 353 Credit: 304,016 RAC: 0 |
> Walt, > > Your idea would work... the amount of effort involved in implementing it is > probably not small (from my experience programming some of the code). Effort involved > vs. the number of users who would use/benefit from it is > also... percentage-wise... probably not big. > > So ideas like this get added to the taskbase, and if some volunteer > programmer takes a fancy to the idea and wants to implement it, they go ahead > and do so. I agree that it's not small, but it shouldn't take all that much either. How would I get it into the taskbase? I might even try to work on it later in the month if I get some free time; what's the best way of doing that? |
mikey Send message Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> > The spun-off data is not for all users, just for users that > requested > > pending data. This could be done with no changes to the current > > database, just an HTML change pointing the "pending results" link to a script > that > > queried the new database, like the way stats work. If the query returns > no > > records found then the user gets added to the "get pending results" > queue. A > > daily task would be needed to spin off pending results for "queued > users". > > I have another idea: why not just copy the database to another computer, whatever, and do the queries over there? I realize that this would involve another processor, but with PCs coming down in price, a PC with a big enough hard drive could do the query in a small amount of time. The caveat would be that the data would be as old as the data copied over. Once a day, for example. That way "new" people doing first-time queries wouldn't have to wait for their data to be included on their first query. You could even copy the WHOLE database once a week, and just those that request their data more often. Making the "new" people's data up to a week, or whatever, old, but the next day, for example, it would be up to date. |
. Send message Joined: 3 Apr 99 Posts: 410 Credit: 16,559 RAC: 0 |
Wow Paul! I like your new avatar and board!! Timmy's excellent work, isn't it? |
Walt Gribben Send message Joined: 16 May 99 Posts: 353 Credit: 304,016 RAC: 0 |
> I have another idea: why not just copy the database to another computer, > whatever, and do the queries over there? I realize that this would involve > another processor, but with PCs coming down in price, a PC with a big enough > hard drive could do the query in a small amount of time. The caveat would be > that the data would be as old as the data copied over. Once a day, for > example. > That way "new" people doing first-time queries wouldn't have to wait for their > data to be included on their first query. You could even copy the WHOLE > database once a week, and just those that request their data more often. Making > the "new" people's data up to a week, or whatever, old, but the next day, for > example, it would be up to date. I thought one of the problems they have was a lack of space, power, and cooling for additional hardware. There are certainly a lot of possibilities, but I think old data is actually worse than having to wait a day. Admittedly that's a subjective opinion, but one based on all the complaints I keep reading in these forums. Just what's needed, something new to complain about - how old the "pending results" database copy is :) |
Toby Send message Joined: 26 Oct 00 Posts: 1005 Credit: 6,366,949 RAC: 0 |
> I have another idea: why not just copy the database to another computer, > whatever, and do the queries over there? I realize that this would involve > another processor, but with PCs coming down in price, a PC with a big enough > hard drive could do the query in a small amount of time. The caveat would be > that the data would be as old as the data copied over. Once a day, for > example. > That way "new" people doing first-time queries wouldn't have to wait for their > data to be included on their first query. You could even copy the WHOLE > database once a week, and just those that request their data more often. Making > the "new" people's data up to a week, or whatever, old, but the next day, for > example, it would be up to date. It is called a replicated database, and seti@home IS using one, although I'm not sure for which parts of the system it is used. The replicated database reads the binary logs of the master database and updates itself constantly. Then it fields SELECT queries for the master database. The point you make about a "PC" with a big enough hard drive being able to do this is not entirely accurate, however. In the past around here it hasn't usually been the CPU power or hard drive space that has been lacking, but rather I/O bandwidth. Your typical PC can do anywhere from 10 to 60 MB/sec of hard drive I/O, depending on what you are doing (read vs write, sequential vs random access, etc). This is simply not enough for the kind of data seti@home is pushing around. They must use a RAID with multiple hard drives and a RAID controller, none of which are cheap (for the good stuff). I don't really care if my credit is granted today or tomorrow or next week :) I know they are working on it and that is good enough for me. I know they are working on bringing some new high-power hardware online that should go a long way toward solving these problems. A member of The Knights Who Say NI! For rankings, history graphs and more, check out: My BOINC stats site |
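Toby's bandwidth point can be put in rough numbers. The 10-60 MB/sec figures are his; the 50 GB table size below is purely an assumed number for illustration, not anything from the project.

```python
# Back-of-envelope on single-disk I/O. Table size is an assumption.
TABLE_GB = 50

def scan_minutes(mb_per_sec, table_gb=TABLE_GB):
    # Best case: one purely sequential pass over the table, no seeks.
    return table_gb * 1024 / mb_per_sec / 60
```

Even in this best case a lone disk needs on the order of 14 minutes (at 60 MB/sec) to 85 minutes (at 10 MB/sec) per pass, and real per-user credit lookups are random access, which is far slower; hence the multi-drive RAID.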
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
> I agree that it's not small, but it shouldn't take all that much either. How > would I get it into the taskbase? I might even try to work on it later in the > month if I get some free time; what's the best way of doing that? > Walt, The task base is here. You can sort by priority, task #, expected implementation milestone, etc. To submit code changes, you join the BOINC developers mailing list and post the DIFF of your changes (if long, use an attachment) as an email to the dev list. David and team decide whether to implement, or a discussion thread might begin as to suggested changes, etc. To get a good DIFF you would probably want to use CVS to get a working copy of the source (unless you want to change only one file). You have to choose which branch of the code you want... the "live" code which is used by the alpha site, or the "public" code which is 4.13. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Here's another one of my random posts to the message boards: 1. To clear things up, we *did* have a replica database for a few weeks there, and it was wonderful, but due to disk storage issues at the time we had to drop the replica, and now any machine we try to bring up as a replica can't keep up. However, we do have a honkin' new system which will (soon) become the master database, and the current system will then be the replica. 2. We're finding that the validation bottleneck is not in the database, so we're still trying to figure this all out. We did move the validate process to another machine which was less loaded, and the queue is draining at a perceivably faster rate, but not by much. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.