Validator not caught up yet

Author	Message
rajausa Send message Joined: 19 Feb 01 Posts: 25 Credit: 797,337 RAC: 0	Message 64092 - Posted: 13 Jan 2005, 16:21:34 UTC Does anyone have a recently completed WU that 3 or 4 people have finished but that is still pending? I have many recent WUs with pending credit, and some others from weeks or longer ago that still have not received credit even though 3 or 4 people have finished them. ID: 64092 ·

Walt Gribben Volunteer tester Send message Joined: 16 May 99 Posts: 353 Credit: 304,016 RAC: 0	Message 64146 - Posted: 13 Jan 2005, 17:58:06 UTC - in response to Message 64092. > Does anyone have recent completed WU and 3 or 4 people have completed it and > have it still pending. I have many recent pending credit and still some others > from weeks or longer ago that have still not got credit and 3 or 4 people have > finished the WU. > Did you check the results? I have one pending where all four people completed it, but the validation state for them is: "Checked, but no consensus yet". WU is here (http://setiweb.ssl.berkeley.edu/workunit.php?wuid=7484599) if you want to take a look yourself. ID: 64146 ·

rajausa Send message Joined: 19 Feb 01 Posts: 25 Credit: 797,337 RAC: 0	Message 64217 - Posted: 13 Jan 2005, 21:32:48 UTC - in response to Message 64146. > Did you check the results? I have one pending where all four people completed > it, but the validation state for them is: "Checked, but no consensus yet". WU > is here (http://setiweb.ssl.berkeley.edu/workunit.php?wuid=7484599) if > you want to take a look yourself. > Yes, I have checked, and I have many, many WUs pending in this state. Guess all I can do is wait. ID: 64217 ·

Benher Volunteer developer Volunteer tester Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0	Message 64317 - Posted: 14 Jan 2005, 0:19:49 UTC I think the validator is falling behind...actually. The number on the status page is increasing. Not sure how the number is generated though, perhaps by schedulers adding up WUs that meet "validate" requirements...and the schedulers may have been restarted or something. ID: 64317 ·

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 64360 - Posted: 14 Jan 2005, 0:58:58 UTC The validator is currently falling behind again, due to three things off the top of my head:
0. The result/workunit tables are huge and growing more and more unwieldy. Everything in the server backend relies on these tables, so database reads are generally slowed down.
1. To correct #0, db_purge is running (the process that removes/archives finished workunit/results from the database). That's a lot of deletes which sloooows everything down.
2. I happen to also be running a backup from a snapshot of the database. So I'm reading from an active snapshot which is already dealing with significant numbers of copy-on-writes (competing with the deletes from #1).
When we get a replica database up, these dependencies will be a thing of the past.
- Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 64360 ·

Byron Leigh Hatch @ team Carl Sagan Volunteer tester Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4	Message 64570 - Posted: 14 Jan 2005, 4:52:29 UTC - in response to Message 64360. Last modified: 15 Jan 2005, 2:36:17 UTC Matt Lebofsky wrote ============================= > The validator is currently falling behind again, due to three things off the > top of my head: ============================== Don't worry Matt, you're doing a great job !!! hi Matt, Matt ---- please allow me to say hello and thank you for all your hard work and the long hours you have put in on: SETI@home 2, BOINC, SETI@home Classic and setting up all the hardware, servers and computers. Matt, good luck with your musical band and your music. Matt --- I wish you and your family: health _ happiness _ peace _ and _ prosperity _ in the new year, 2005. friendly and respectful byron ID: 64570 ·

STE\/E Volunteer tester Send message Joined: 29 Mar 03 Posts: 1137 Credit: 5,334,063 RAC: 0	Message 64730 - Posted: 14 Jan 2005, 9:04:52 UTC - in response to Message 64360. > The validator is currently falling behind again, due to three things off the > top of my head: > > 0. The result/workunit tables are huge and growing more and more unwieldy. > Everything in the server backend relies on these tables, so database reads are > generally slowed down. > > 1. To correct #0, db_purge is running (the process that removes/archives > finished workunit/results from the database). That's a lot of deletes which > sloooows everything down. > > 2. I happen to also be running a backup from a snapshot of the database. So > I'm reading from an active snapshot which is already dealing with significant > numbers of copy-on-writes (competing with the deletes from #1). > > When we get a replica database up, these dependencies will be a thing of the > past. > - Matt ============ No problem Matt, I love the Roller Coaster Ride ... hehe ID: 64730 ·

Alex Plantema Send message Joined: 23 Oct 99 Posts: 35 Credit: 247,181 RAC: 0	Message 64746 - Posted: 14 Jan 2005, 10:43:09 UTC - in response to Message 64360. Matt Lebofsky wrote: > 2. I happen to also be running a backup from a snapshot of the database. So > I'm reading from an active snapshot which is already dealing with significant > numbers of copy-on-writes (competing with the deletes from #1). Can you translate that? Again I found workunits deleted from the results list within hours after receiving credit. ID: 64746 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 64752 - Posted: 14 Jan 2005, 11:29:16 UTC - in response to Message 64746. > Matt Lebofsky wrote: > > > 2. I happen to also be running a backup from a snapshot of the database. > So > > I'm reading from an active snapshot which is already dealing with > significant > > numbers of copy-on-writes (competing with the deletes from #1). It sounds like he created a copy of the database, but the "copy" is refreshed with new data each time that the source database is updated with each change of the source database. Other databases call this a replication of the database. In many cases the replication is made to ensure a recovery is possible if the source database dies. ID: 64752 ·

Alex Plantema Send message Joined: 23 Oct 99 Posts: 35 Credit: 247,181 RAC: 0	Message 64758 - Posted: 14 Jan 2005, 11:59:34 UTC - in response to Message 64752. Thanks Paul! ID: 64758 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 65809 - Posted: 14 Jan 2005, 15:19:21 UTC - in response to Message 64758. > Thanks Paul! Sure! No problem ... Of course I cannot see how Matt can establish a good backup when the database keeps changing. We had this debate several months ago when I pointed out that I personally have never seen a "hot" backup that was able to be restored without errors. It got rather interesting there for a while ... :) In theory, you can make a snapshot of a database and the transaction logs and then back that up and restore it at some future time. My problem has always been that it looks like you have a backup of the database, but few people test the backup and test for data corruption. It is too hard. If you look around, for example at Oracle's documentation, you will see that they don't recommend doing a "hot" backup unless that is forced upon you by the situation. My experience confirmed this to *MY* satisfaction. Most of the DBAs that I worked with also did not like hot backups. We would make the database stable by removing users and ensuring that all of the pending transactions were committed, then we would make both a backup and an export (Oracle databases). Then we would bring the database back up and allow connections. With reasonable hardware this is not a hard thing to do, and does not take that much time. ID: 65809 ·

karthwyne Volunteer tester Send message Joined: 24 May 99 Posts: 218 Credit: 5,750,702 RAC: 0	Message 65837 - Posted: 14 Jan 2005, 15:22:33 UTC - in response to Message 64360. > The validator is currently falling behind again, due to three things off the > top of my head: > No worries matt, we know you all are doing the best! we'll be here regardless and thanks to byron for all the kind words that he always has Micah S@h Berkeley's Staff Friends Club ID: 65837 ·

Byron Leigh Hatch @ team Carl Sagan Volunteer tester Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4	Message 67219 - Posted: 14 Jan 2005, 16:40:29 UTC - in response to Message 65837. Last modified: 14 Jan 2005, 16:46:08 UTC ------- I am very sorry to be off the topic , I do sincerely apologize byron -------- _ :( Micah -- [An'nai] -- wrote: =============== > and thanks to byron for all the kind words that he always has > > Micah > > ============== hi Micah , thank you very much , for your very kind words I very much appreciate your kind words _ :) Micah , please allow me to wish you and your family: health _ happiness _ peace _ and _ prosperity _ in the , new Year , 2005 friendly and respectful byron ------- again , I am very sorry to be off the topic , I do sincerely apologize byron -------- _ :( ID: 67219 ·

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 68591 - Posted: 14 Jan 2005, 17:19:15 UTC Regarding the "hot" database backup / snapshot stuff:
When I do a backup of the database I do it off a snapshot. I stop all the projects, the web site, basically anything that touches the database. Then I flush the database and halt it. At this point it is in a perfectly stable state as it was cleanly shut down.
Then I snapshot the filesystem which contains all the data. This is completely outside of the realm of mysql's control - it's strictly part of the OS and volume manager. Then I restart the database, the projects, etc.
At this point, the snapshot is 0 bytes long, but every time a page of N bytes gets altered in the "live" database, the pre-altered N bytes get added to the snapshot to preserve the original state when the snapshot was made.
So while things like the data purge process are running and deleting huge chunks from the live database, the snapshot is busy just trying to maintain its original state by copying the unaltered database pages to itself. From the user perspective, the snapshot remains perfectly stable - but there's a lot of work going on under the hood to keep it this way.
That's what I mean about copy-on-writes. Every write to the database results in a copy to the snapshot. Double the I/O, slowing everything down.
There is no replica database server right now. If there were, then the backup would be completely painless. Instead of the above, I wouldn't have to stop the projects, web server, etc. at all. The replica database is simply a second instance of mysql running on another machine; every time the master updates the database, it tells the replica to do the same exact thing.
Yes, this is double the I/O but it is twice the machine, so that's not a problem.
When it comes time for backup, I simply tell the replica to halt getting updates from the master. It'll be in a perfectly stable state while the master continues chugging along. I then do the backup on the halted replica (which is fast because there is no other I/O) or make a snapshot of it like the above (so we could turn it back on again).
- Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 68591 ·
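
The copy-on-write behaviour Matt describes can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how the volume manager on the SETI@home servers works: the `Volume` class, its method names, and the page contents are all hypothetical, and this version copies a page only on the first write after the snapshot, which is one common way such snapshots are built.

```python
class Volume:
    """Toy block device with at most one copy-on-write snapshot (illustrative only)."""

    def __init__(self, pages):
        self.pages = list(pages)   # live data, one entry per page
        self.snapshot = None       # page index -> contents at snapshot time
        self.io_ops = 0            # number of page writes actually performed

    def take_snapshot(self):
        # A fresh snapshot stores nothing yet: it is "0 bytes long".
        self.snapshot = {}

    def write(self, index, data):
        if self.snapshot is not None and index not in self.snapshot:
            # First change to this page since the snapshot was taken:
            # save the pre-altered page before overwriting it.
            self.snapshot[index] = self.pages[index]
            self.io_ops += 1
        self.pages[index] = data
        self.io_ops += 1

    def read_snapshot(self, index):
        # Preserved copy if the page has changed, otherwise the live page.
        if self.snapshot is None:
            raise RuntimeError("no snapshot has been taken")
        return self.snapshot.get(index, self.pages[index])


vol = Volume(["a", "b", "c", "d"])
vol.take_snapshot()
vol.write(1, "B")    # two writes: save old page, then update live page
vol.write(1, "BB")   # one write: the original is already preserved
vol.write(3, "D")    # two writes again
print([vol.read_snapshot(i) for i in range(4)])  # ['a', 'b', 'c', 'd']
print(vol.pages)                                  # ['a', 'BB', 'c', 'D']
print(vol.io_ops)                                 # 5
```

The snapshot view never changes, but a purge that touches many distinct pages pays roughly two writes per page, which is the extra I/O the post refers to.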

Byron Leigh Hatch @ team Carl Sagan Volunteer tester Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4	Message 68811 - Posted: 14 Jan 2005, 17:44:09 UTC - in response to Message 68591. Last modified: 15 Jan 2005, 2:40:35 UTC Matt Lebofsky wrote ==================== > Regarding the "hot" database backup / snapshot stuff: > > When I do a backup of the database I do it off a snapshot. I stop all the > projects, the web site, basically anything that touches the database. Then I > flush the database and halt it. At this point it is in a perfectly stable > state as it was cleanly shut down. ==================== Don't worry Matt, you're doing a great job !!! hi Matt, Matt ---- please allow me to say hello and thank you for all your hard work and the long hours you have put in on: SETI@home 2, BOINC, SETI@home Classic and setting up all the hardware, servers and computers. Matt, good luck with your musical band and your music. Matt --- I wish you and your family: health _ happiness _ peace _ and _ prosperity _ in the new year, 2005. friendly and respectful byron ID: 68811 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 68815 - Posted: 14 Jan 2005, 17:48:02 UTC - in response to Message 68591. Matt, > Regarding the "hot" database backup / snapshot stuff: > > When I do a backup of the database I do it off a snapshot. I stop all the > projects, the web site, basically anything that touches the database. Then I > flush the database and halt it. At this point it is in a perfectly stable > state as it was cleanly shut down. > > Then I snapshot the filesystem which contains all the data. This is completely > outside of the realm of mysql's control - it's strictly part of the OS and > volume manager. Then I restart the database, the projects, etc. > > At this point, the snapshot is 0 bytes long, but every time a page of N bytes > gets altered in the "live" database, the pre-altered N bytes gets added to the > snapshot to preserve the original state when the snapshot was made. > > So while things like the data purge process is running and deleting huge > chunks from the live database, the snapshot is busy just trying to maintain > its original state by copying the unaltered database pages to itself. From the > user perspective, the snapshot remains perfectly stable - but there's a lot of > work going on under the hood to keep it this way. > > That's what I mean about copy-on-writes. Every write to the database results > in a copy to the snapshot. Double the I/O, slowing everything down. > > There is no replica database server right now. If there was, then the backup > would be completely painless. Instead of the above, I wouldn't have to stop > the projects, web server, etc. at all. The replica database is simply a second > instance of mysql running on another machine that, every time the master > updates the database, it tells the replica to do the same exact thing. > > Yes, this is double the I/O but it is twice the machine, so that's not a > problem. 
When it comes time for backup, I simply tell the replica to halt > getting updates from the master. It'll be in a perfectly stable state while > the master continues chugging along. I then do the backup on the halted > replica (which is fast because there is no other I/O) or make a snapshot of it > like the above (so we could turn it back on again). Way cool! :) Now I am happy again ... :) So, yes, you are doing what I was saying: identify a particular checkpoint, which is when you turn off replication, and then back up that stable data set. Once that is done, synchronize the database and the replicated database, allow them to track each other, and then do another checkpoint. ID: 68815 ·
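
The checkpoint cycle discussed here (pause replication, back up the halted replica, then let it catch up) can be sketched as below. This is a minimal sketch and not how MySQL replication is implemented: the `Master` and `Replica` classes, their methods, and the sample workunit keys are hypothetical names chosen for illustration, and the "log" is just an in-memory list of updates.

```python
class Master:
    """Toy primary database that records every update in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) updates for replicas to replay

    def update(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Toy replica that replays the master's log unless it is paused."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0     # how many log entries have been replayed so far
        self.paused = False

    def sync(self):
        if self.paused:
            return           # halted: ignore new updates for now
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def pause(self):
        self.paused = True   # this moment is the checkpoint

    def backup(self):
        if not self.paused:
            raise RuntimeError("pause the replica before backing it up")
        return dict(self.data)  # stable copy; nothing else is writing

    def resume(self):
        self.paused = False
        self.sync()          # catch up on everything missed while paused


master = Master()
replica = Replica(master)
master.update("wu1", "pending")
replica.sync()

replica.pause()
master.update("wu1", "validated")  # the master keeps chugging along
master.update("wu2", "pending")
replica.sync()                     # no effect while paused
saved = replica.backup()
replica.resume()

print(saved)                        # {'wu1': 'pending'}
print(replica.data == master.data)  # True
```

The backup reflects the state at the checkpoint only, while the master never stops accepting updates; after `resume()` the two copies track each other again until the next checkpoint.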

Byron Leigh Hatch @ team Carl Sagan Volunteer tester Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4	Message 68821 - Posted: 14 Jan 2005, 17:50:04 UTC - in response to Message 68815. Last modified: 17 Jan 2005, 17:20:03 UTC :) hi Paul , nice to see you , how are you thanks very much . Paul - with out your very good explanation , I would be lost _ :) friendly and respectful byron ID: 68821 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 68861 - Posted: 14 Jan 2005, 18:05:48 UTC Matt, Now you are famous... I quoted you! :) I hope you don't mind ... It is the Glossary, for some strange reason it is in the definition of "Back-Up" ... ID: 68861 ·

Tom Gutman Send message Joined: 20 Jul 00 Posts: 48 Credit: 219,500 RAC: 0	Message 69227 - Posted: 14 Jan 2005, 22:14:55 UTC - in response to Message 65809. While not trivial, backing up an active database is not that difficult. PARS was doing that 30 years ago. It does require logging as well as the image capture, and was never recommended for when the system was at maximum activity. But there was no need to shut down the system, except for a few minutes for closure. And every RAID 1 system is capable of doing it. That is what happens when you put in a new disk to replace one of the mirrors. Yes, it's a bit easier and faster if you can shut down the data base for the purpose. For some data bases that is no problem -- they are only needed for limited time periods, perhaps 9-5 on work days. Other data bases are required to be available 24/7 with only very short outages (on the order of minutes, if that much) tolerated. ------- Tom Gutman ID: 69227 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 69229 - Posted: 14 Jan 2005, 22:21:11 UTC - in response to Message 69227. > While not trivial, backing up an active database is not that difficult. PARS > was doing that 30 years ago. It does require logging as well as the image > capture, and was never recommended for when the system was at maximum > activity. But there was no need to shut down the system, except for a few > minutes for closure. I agree, not trivial ... I don't agree that it can be done safely. *Paul's* opinion. So, we won't reach agreement here. I just tell it like I saw it ... Your Mileage obviously varies from mine ... > And every RAID 1 system is capable of doing it. That is what happens when you > put in a new disk to replace one of the mirrors. RAID is a whole 'nother beast. All you are doing there is posting the same sector writes to two logical disk drives (usually the logical drives also map to physical drives on a 1 to 1 basis, but they don't have to). > Yes, it's a bit easier and faster if you can shut down the data base for the > purpose. For some data bases that is no problem -- they are only needed for > limited time periods, perhaps 9-5 on work days. Other data bases are required > to be available 24/7 with only very short outages (on the order of minutes, if > that much) tolerated. Agreed. ID: 69229 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.