Validator not caught up yet
Author | Message |
---|---|
rajausa Joined: 19 Feb 01 Posts: 25 Credit: 797,337 RAC: 0 |
Does anyone else have recently completed WUs that 3 or 4 people have finished but that are still pending? I have many recent results with pending credit, and some others from weeks ago or longer that still haven't been granted credit even though 3 or 4 people have finished the WU. |
Walt Gribben Joined: 16 May 99 Posts: 353 Credit: 304,016 RAC: 0 |
> Does anyone have recent completed WU and 3 or 4 people have completed it and > have it still pending. I have many recent pending credit and still some others > from weeks or longer ago that have still not got credit and 3 or 4 people have > finished the WU. > Did you check the results? I have one pending where all four people completed it, but the validation state for them is: "Checked, but no consensus yet". The WU is here (http://setiweb.ssl.berkeley.edu/workunit.php?wuid=7484599) if you want to take a look yourself. |
rajausa Joined: 19 Feb 01 Posts: 25 Credit: 797,337 RAC: 0 |
> Did you check the results? I have one pending where all four people completed > it, but the validation state for them is: "Checked, but no consensus yet". WU > is here (http://setiweb.ssl.berkeley.edu/workunit.php?wuid=7484599) if > you want to take a look yourself. > Yes, I have checked, and I have many, many WUs pending in this state. I guess all I can do is wait. |
Benher Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
I think the validator is actually falling behind. The number on the status page is increasing. I'm not sure how that number is generated, though; perhaps the schedulers add up the WUs that meet the "validate" requirements, and the schedulers may have been restarted or something. |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
The validator is currently falling behind again, due to three things off the top of my head:
0. The result/workunit tables are huge and growing more and more unwieldy. Everything in the server backend relies on these tables, so database reads are generally slowed down.
1. To correct #0, db_purge is running (the process that removes/archives finished workunit/results from the database). That's a lot of deletes, which sloooows everything down.
2. I happen to also be running a backup from a snapshot of the database. So I'm reading from an active snapshot which is already dealing with significant numbers of copy-on-writes (competing with the deletes from #1).
When we get a replica database up, these dependencies will be a thing of the past.
- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
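To show why a purge weighs on everything else, here is a minimal sketch of what a purge pass in the spirit of db_purge does, using Python's built-in sqlite3 module. The schema (tables `workunit`, `result`, `workunit_archive`, `result_archive` and the `finished` flag) is a simplified, hypothetical one, not the real BOINC schema, and the batching is an assumption of the sketch. It shows only that every purged row costs an archive write plus a delete on the same tables the rest of the backend is reading.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with a simplified, hypothetical workunit/result schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE workunit (id INTEGER PRIMARY KEY, name TEXT, finished INTEGER);
        CREATE TABLE result (id INTEGER PRIMARY KEY, workunitid INTEGER, host TEXT);
        CREATE TABLE workunit_archive (id INTEGER PRIMARY KEY, name TEXT, finished INTEGER);
        CREATE TABLE result_archive (id INTEGER PRIMARY KEY, workunitid INTEGER, host TEXT);
        """
    )
    return conn


def purge_finished(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Archive, then delete, finished workunits and their results.

    Each batch runs in its own transaction. Returns the number of workunits purged.
    """
    purged = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM workunit WHERE finished = 1 LIMIT ?", (batch_size,))]
        if not ids:
            return purged
        marks = ",".join("?" * len(ids))  # one placeholder per id, e.g. "?,?,?"
        with conn:  # commit the whole batch, or roll it back on error
            # Every purged row costs an archive write plus a delete.
            conn.execute(f"INSERT INTO result_archive SELECT * FROM result WHERE workunitid IN ({marks})", ids)
            conn.execute(f"DELETE FROM result WHERE workunitid IN ({marks})", ids)
            conn.execute(f"INSERT INTO workunit_archive SELECT * FROM workunit WHERE id IN ({marks})", ids)
            conn.execute(f"DELETE FROM workunit WHERE id IN ({marks})", ids)
        purged += len(ids)
```

Smaller batches keep each transaction short so other readers are not blocked for long, but the total number of writes stays the same.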
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Matt Lebofsky wrote ============================= > The validator is currently falling behind again, due to three things off the > top of my head: ============================== Don't worry, Matt, you're doing a great job!!! Hi Matt, please allow me to say hello and thank you for all your hard work and the long hours you have put in on SETI@home 2, BOINC, SETI@home Classic, and setting up all the hardware, servers, and computers. Matt, good luck with your musical band and your music. Matt, I wish you and your family health, happiness, peace, and prosperity in the new year, 2005. friendly and respectful byron |
STE\/E Joined: 29 Mar 03 Posts: 1137 Credit: 5,334,063 RAC: 0 |
> The validator is currently falling behind again, due to three things off the > top of my head: > > 0. The result/workunit tables are huge and growing more and more unwieldy. > Everything in the server backend relies on these tables, so database reads are > generally slowed down. > > 1. To correct #0, db_purge is running (the process that removes/archives > finished workunit/results from the database). That's a lot of deletes which > sloooows everything down. > > 2. I happen to also be running a backup from a snapshot of the database. So > I'm reading from an active snapshot which is already dealing with significant > numbers of copy-on-writes (competing with the deletes from #1). > > When we get a replica database up, these dependencies will be a thing of the > past. > - Matt ============ No problem Matt, I love the Roller Coaster Ride ... hehe |
Alex Plantema Joined: 23 Oct 99 Posts: 35 Credit: 247,181 RAC: 0 |
Matt Lebofsky wrote: > 2. I happen to also be running a backup from a snapshot of the database. So > I'm reading from an active snapshot which is already dealing with significant > numbers of copy-on-writes (competing with the deletes from #1). Can you translate that? Again I found workunits deleted from the results list within hours after receiving credit. |
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
> Matt Lebofsky wrote: > > > 2. I happen to also be running a backup from a snapshot of the database. So > > I'm reading from an active snapshot which is already dealing with significant > > numbers of copy-on-writes (competing with the deletes from #1). It sounds like he created a copy of the database, but the "copy" is refreshed with new data every time the source database changes. Other databases call this replication. In many cases the replica is made to ensure that recovery is possible if the source database dies. |
Alex Plantema Joined: 23 Oct 99 Posts: 35 Credit: 247,181 RAC: 0 |
Thanks Paul! |
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
> Thanks Paul! Sure! No problem ... Of course, I cannot see how Matt can establish a good backup when the database keeps changing. We had this debate several months ago when I pointed out that *I* personally have never seen a "hot" backup that could be restored without errors. It got rather interesting there for a while ... :) In theory, you can make a snapshot of a database and the transaction logs, then back that up and restore it at some future time. My problem has always been that it looks like you have a backup of the database, but few people test the backup and check for data corruption. It is too hard. If you look around, for example at Oracle's documentation, you will see that they don't recommend doing a "hot" backup unless the situation forces it upon you. My experience confirmed this to *MY* satisfaction. Most of the DBAs that I worked with also did not like hot backups. We would make the database stable by removing users and ensuring that all of the pending transactions were committed, then we would make both a backup and an export (Oracle databases). Then we would bring the database back up and allow connections. With reasonable hardware this is not a hard thing to do, and it does not take that much time. |
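The quiesce-then-copy procedure described above, plus the step Paul says few people do (actually testing the backup), can be sketched with Python's sqlite3 module standing in for Oracle. This is an illustration under stated assumptions: the caller has already disconnected every other client, the database uses SQLite's default rollback journal, `iterdump()` plays the role of the export, and the function names are hypothetical.

```python
import shutil
import sqlite3
from pathlib import Path


def export_text(conn: sqlite3.Connection) -> str:
    """SQL text dump of the database (roughly analogous to an Oracle export)."""
    return "".join(line + "\n" for line in conn.iterdump())


def cold_backup(db_path: Path, backup_path: Path, dump_path: Path) -> None:
    """Back up a quiesced SQLite database as both a file copy and an export.

    Assumes every other client has already been stopped, so nothing else is
    writing, and that the database uses SQLite's default rollback journal.
    """
    conn = sqlite3.connect(db_path)
    conn.commit()  # make sure nothing is left pending on our own connection
    dump_path.write_text(export_text(conn), encoding="utf-8")
    conn.close()
    # With no connection open, the database file is stable and can be copied.
    shutil.copyfile(db_path, backup_path)


def verify_backup(backup_path: Path, dump_path: Path) -> bool:
    """Actually test the backup: run an integrity check and compare with the export."""
    conn = sqlite3.connect(backup_path)
    try:
        intact = conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        matches_export = export_text(conn) == dump_path.read_text(encoding="utf-8")
        return intact and matches_export
    finally:
        conn.close()
```

The verification step is cheap here, which is the point: a backup that has never been opened and checked is only presumed good.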
karthwyne Joined: 24 May 99 Posts: 218 Credit: 5,750,702 RAC: 0 |
> The validator is currently falling behind again, due to three things off the > top of my head: > No worries, Matt, we know you are all doing your best! We'll be here regardless. And thanks to Byron for all the kind words he always has. Micah S@h Berkeley's Staff Friends Club |
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
------- I am very sorry to be off topic; I sincerely apologize. byron -------- _ :( Micah -- [An'nai] -- wrote: =============== > and thanks to byron for all the kind words that he always has > > Micah > > ============== Hi Micah, thank you very much for your very kind words; I very much appreciate them _ :) Micah, please allow me to wish you and your family health, happiness, peace, and prosperity in the new year, 2005. friendly and respectful byron ------- Again, I am very sorry to be off topic; I sincerely apologize. byron -------- _ :( |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Regarding the "hot" database backup / snapshot stuff:
When I do a backup of the database, I do it off a snapshot. I stop all the projects, the web site, basically anything that touches the database. Then I flush the database and halt it. At this point it is in a perfectly stable state, since it was cleanly shut down.
Then I snapshot the filesystem that contains all the data. This is completely outside of mysql's control; it's strictly part of the OS and volume manager. Then I restart the database, the projects, etc.
At this point, the snapshot is 0 bytes long, but every time a page of N bytes gets altered in the "live" database, the pre-altered N bytes get added to the snapshot to preserve the original state from when the snapshot was made.
So while things like the data purge process are running and deleting huge chunks from the live database, the snapshot is busy just trying to maintain its original state by copying the unaltered database pages to itself. From the user perspective, the snapshot remains perfectly stable, but there's a lot of work going on under the hood to keep it this way.
That's what I mean about copy-on-writes. Every write to the database results in a copy to the snapshot. Double the I/O, slowing everything down.
There is no replica database server right now. If there were, the backup would be completely painless: I wouldn't have to stop the projects, web server, etc. at all. The replica database is simply a second instance of mysql running on another machine; every time the master updates the database, it tells the replica to do the exact same thing.
Yes, this is double the I/O, but it is twice the machine, so that's not a problem. When it comes time for a backup, I simply tell the replica to halt getting updates from the master. It'll be in a perfectly stable state while the master continues chugging along.
I then do the backup on the halted replica (which is fast because there is no other I/O) or make a snapshot of it like the above (so we could turn it back on again). - Matt |
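Matt's description of a copy-on-write snapshot can be modeled in a few lines. This toy Python class is hypothetical, with the volume reduced to a list of fixed-size pages: the snapshot starts empty, and a page's original contents are preserved just before the live copy is first overwritten, so reads from the snapshot always see the state at snapshot time. In this sketch only the first write to a given page triggers a copy, which is how volume-manager snapshots typically behave; during a purge that touches huge numbers of distinct pages, that still works out to roughly one extra write per write, the doubled I/O described above.

```python
from typing import Dict, List


class CowSnapshot:
    """Toy copy-on-write snapshot over a volume of fixed-size pages (hypothetical model)."""

    def __init__(self, live: List[bytes]) -> None:
        self.live = live                   # the "live" volume, modified in place
        self.saved: Dict[int, bytes] = {}  # page number -> contents at snapshot time
        self.extra_writes = 0              # copies made to preserve the snapshot

    def write(self, page: int, data: bytes) -> None:
        """Write to the live volume, first preserving the original page if needed."""
        if page not in self.saved:
            self.saved[page] = self.live[page]  # the "copy" in copy-on-write
            self.extra_writes += 1
        self.live[page] = data

    def read_snapshot(self, page: int) -> bytes:
        """Read a page as it was when the snapshot was taken."""
        if page in self.saved:
            return self.saved[page]
        return self.live[page]  # never written since the snapshot, so still original

    def size(self) -> int:
        """Bytes held by the snapshot; 0 right after it is taken."""
        return sum(len(p) for p in self.saved.values())
```

A purge that rewrites many different pages makes `extra_writes` climb in step with the live writes, while `read_snapshot` keeps returning the original data.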
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Matt Lebofsky wrote ==================== > Regarding the "hot" database backup / snapshot stuff: > > When I do a backup of the database I do it off a snapshot. I stop all the > projects, the web site, basically anything that touches the database. Then I > flush the database and halt it. At this point it is in a perfectly stable > state as it was cleanly shut down. ==================== Don't worry, Matt, you're doing a great job!!! Hi Matt, please allow me to say hello and thank you for all your hard work and the long hours you have put in on SETI@home 2, BOINC, SETI@home Classic, and setting up all the hardware, servers, and computers. Matt, good luck with your musical band and your music. Matt, I wish you and your family health, happiness, peace, and prosperity in the new year, 2005. friendly and respectful byron |
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
Matt, > Regarding the "hot" database backup / snapshot stuff: > > When I do a backup of the database I do it off a snapshot. I stop all the > projects, the web site, basically anything that touches the database. Then I > flush the database and halt it. At this point it is in a perfectly stable > state as it was cleanly shut down. > > Then I snapshot the filesystem which contains all the data. This is completely > outside of the realm of mysql's control - it's strictly part of the OS and > volume manager. Then I restart the database, the projects, etc. > > At this point, the snapshot is 0 bytes long, but every time a page of N bytes > gets altered in the "live" database, the pre-altered N bytes gets added to the > snapshot to preserve the original state when the snapshot was made. > > So while things like the data purge process is running and deleting huge > chunks from the live database, the snapshot is busy just trying to maintain > its original state by copying the unaltered database pages to itself. From the > user perspective, the snapshot remains perfectly stable - but there's a lot of > work going on under the hood to keep it this way. > > That's what I mean about copy-on-writes. Every write to the database results > in a copy to the snapshot. Double the I/O, slowing everything down. > > There is no replica database server right now. If there was, then the backup > would be completely painless. Instead of the above, I wouldn't have to stop > the projects, web server, etc. at all. The replica database is simply a second > instance of mysql running on another machine that, every time the master > updates the database, it tells the replica to do the same exact thing. > > Yes, this is double the I/O but it is twice the machine, so that's not a > problem. When it comes time for backup, I simply tell the replica to halt > getting updates from the master. It'll be in a perfectly stable state while > the master continues chugging along. 
> I then do the backup on the halted > replica (which is fast because there is no other I/O) or make a snapshot of it > like the above (so we could turn it back on again). Way cool! :) Now I am happy again ... :) So, yes, you are doing what I was saying: identify a particular checkpoint, which is when you turn off replication, and then back up that stable data set. Once that is done, synchronize the database and the replicated database, allow them to track each other, and then do another checkpoint. |
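The replica scheme and the checkpoint cycle summarized above (stop replication, back up the stable copy, resynchronize, let the two track each other again) can be sketched as follows. This is a hypothetical Python model loosely patterned on log-based replication, where the replica remembers how far into the master's update log it has got; it is not MySQL's actual replication code, and an "update" is reduced to setting a key to a value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for a logged statement: set key to value.
Update = Tuple[str, str]


@dataclass
class Replica:
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0      # how far into the master's log this replica has got
    paused: bool = False

    def catch_up(self, log: List[Update]) -> None:
        """Replay every logged update not yet applied, unless paused."""
        if self.paused:
            return
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)

    def pause(self) -> None:
        """Checkpoint: stop applying updates, so the data stays stable for a backup."""
        self.paused = True

    def resume(self, log: List[Update]) -> None:
        """Resynchronize with the master, then keep tracking it."""
        self.paused = False
        self.catch_up(log)


@dataclass
class Master:
    data: Dict[str, str] = field(default_factory=dict)
    log: List[Update] = field(default_factory=list)
    replicas: List[Replica] = field(default_factory=list)

    def update(self, key: str, value: str) -> None:
        """Apply an update, log it, and tell every replica to do the same."""
        self.data[key] = value
        self.log.append((key, value))
        for replica in self.replicas:
            replica.catch_up(self.log)
```

While the replica is paused, copying `replica.data` gives a stable backup even though `Master.update` keeps running; `resume` then replays only the part of the log the replica missed.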
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
:) Hi Paul, nice to see you. How are you? Thanks very much. Paul, without your very good explanation, I would be lost _ :) friendly and respectful byron |
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
Matt, now you are famous... I quoted you! :) I hope you don't mind... It is in the Glossary; for some strange reason, it is in the definition of "Back-Up"... |
Tom Gutman Joined: 20 Jul 00 Posts: 48 Credit: 219,500 RAC: 0 |
While not trivial, backing up an active database is not that difficult. PARS was doing that 30 years ago. It does require logging as well as the image capture, and was never recommended for when the system was at maximum activity. But there was no need to shut down the system, except for a few minutes for closure. And every RAID 1 system is capable of doing it. That is what happens when you put in a new disk to replace one of the mirrors. Yes, it's a bit easier and faster if you can shut down the data base for the purpose. For some data bases that is no problem -- they are only needed for limited time periods, perhaps 9-5 on work days. Other data bases are required to be available 24/7 with only very short outages (on the order of minutes, if that much) tolerated. ------- Tom Gutman |
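Tom's point that a hot backup needs logging as well as the image capture can be illustrated with a toy model. It is a sketch, not how PARS or any particular database does it: pages are copied one at a time while writes keep arriving, so the raw image is a mix of old and new pages, but replaying the log of writes made during the copy rolls it forward to a consistent state. The page list and the `writes_after_step` schedule are hypothetical; real systems also log transaction boundaries and sequence numbers.

```python
from typing import Dict, List, Tuple

Write = Tuple[int, bytes]  # (page number, full new contents of that page)


def fuzzy_copy_with_log(
    live: List[bytes], writes_after_step: Dict[int, List[Write]]
) -> Tuple[List[bytes], List[Write]]:
    """Copy pages one at a time while the system keeps writing.

    writes_after_step[i] simulates the writes that land right after page i is
    copied. Returns the (possibly inconsistent) image and the log of those writes.
    """
    image: List[bytes] = []
    log: List[Write] = []
    for i in range(len(live)):
        image.append(live[i])                  # image capture, page by page
        for page, data in writes_after_step.get(i, []):
            live[page] = data                  # the database stays online...
            log.append((page, data))           # ...and every write is logged
    return image, log


def restore(image: List[bytes], log: List[Write]) -> List[bytes]:
    """Roll the image forward by replaying the log in order."""
    restored = list(image)
    for page, data in log:
        restored[page] = data  # full-page writes, so replaying one twice is harmless
    return restored
```

Because each log record carries the whole new page, replaying a write that the image already caught does no harm, which is what makes the inconsistent image recoverable.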
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
> While not trivial, backing up an active database is not that difficult. PARS > was doing that 30 years ago. It does require logging as well as the image > capture, and was never recommended for when the system was at maximum > activity. But there was no need to shut down the system, except for a few > minutes for closure. I agree, not trivial ... I don't agree that it can be done safely. *Paul's* opinion. So, we won't reach agreement here. I just tell it like I saw it ... Your mileage obviously varies from mine ... > And every RAID 1 system is capable of doing it. That is what happens when you > put in a new disk to replace one of the mirrors. RAID is a whole 'nother beast. All you are doing there is posting the same sector writes to two logical disk drives (usually the logical drives also map to physical drives on a one-to-one basis, but they don't have to). > Yes, it's a bit easier and faster if you can shut down the data base for the > purpose. For some data bases that is no problem -- they are only needed for > limited time periods, perhaps 9-5 on work days. Other data bases are required > to be available 24/7 with only very short outages (on the order of minutes, if > that much) tolerated. Agreed. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.