We have lost data for the 2nd time!
Author | Message |
---|---|
Purdy Joined: 3 Apr 99 Posts: 76 Credit: 42 RAC: 0 |
In the ‘real world’ RAID disks with databases crash all the time. This does not mean all data is lost. Normally database systems are recovered after 1-2 hours and only a few transactions are lost. BOINC is taking days and days to recover and is losing 8 days of data after a simple hardware failure. Databases normally use backups and data logs to roll forward and recover all transactions up to the last minute. This is one of the main purposes of using databases: to recover quickly and with minimum loss of data. We have two possibilities: 1) MySQL used by BOINC is a useless RDBMS and cannot guarantee data integrity and recovery. 2) BOINC does not have any DBAs and the developers do not have a clue about databases' restart and recovery procedures. Berkeley, you cannot just say "Oh it was a major hardware failure there was nothing we could have done" . . . We are not that stupid! |
Darth Dogbytes™ Joined: 30 Jul 03 Posts: 7512 Credit: 2,021,148 RAC: 0 |
And the answer is...? 1, or 2, or all the above, or none of the above? The Polls are open. Account frozen... |
EclipseHA Joined: 28 Jul 99 Posts: 1018 Credit: 530,719 RAC: 0 |
Basically, the "high end SNAP box" should not have caused this problem if it were running correctly, be it 1) or 2). The real question is, was it running correctly? If not, why was the project "opened up" again? Heck, the SNAP box is hot-swappable, and should recover from a disk failure! That's one of its basic functions! (it's HW RAID, after all!) I'll guess there's more to this failure than anyone outside the "inner circle" will ever hear! For example, why didn't they copy the DB off the SW RAID they were using before SNAP? Considering the timing of the copy to SNAP, and the system being back, they would only have lost one day's worth of data! (unless they had beta/alpha running during the time "production" was down!) During that day, Matt kept trying to figure out why the forums were getting screwed up! (alas, those messages from Matt are lost....) |
JAF Joined: 9 Aug 00 Posts: 289 Credit: 168,721 RAC: 0 |
Whatever method they use, 7 or 8 days of data loss is really (to me) unacceptable (if that's what it turns out to be). This looks like a project that is in "panic mode", making hardware and software changes on the fly. Hardware problems should be planned for, since they are going to happen. Major software problems should have been caught in beta testing. It's not like the number of potential participants was unknown; the classic Seti numbers were known. To me, running my three computers 24/7 for a week and finding out I wasted my time and money (energy) because the Boinc Seti team did not have an adequate data backup plan is very disappointing. And then not having the "balls" to acknowledge it is even worse. I hope I am wrong. |
EclipseHA Joined: 28 Jul 99 Posts: 1018 Credit: 530,719 RAC: 0 |
> Whatever method they use, 7 or 8 days of data loss is really (to me) > unacceptable (if that's what it turns out to be). > > This looks like a project that is in "panic mode"; making hardware and > software changes on the fly. > > Hardware problems should be planned for, since they are going to happen. Major > software problems should have been caught in beta testing. It's not like the > number of potential participants was unknown; the classic Seti numbers were > known. Remember, they have less than 1/10 of the active Classic user base right now... That's how far "out of the box" they were. There's no excuse (active is ~200,000, while there are ~5m registered!) > To me, for running my three computers, 24/7, for a week, and finding out I > wasted my time and money (energy) because the Boinc Seti team did not have an > adequate data backup plan, is very disappointing. And then not having the > "balls" to acknowledge it is even worse. Look at the specs on their new SNAP box.... If a disk gets toasted, throw in a new one, and the data's recovered. If they didn't have faith in the SNAP box, they could have brought the project down for a few hours (as if we'd notice with the amount of downtime!) and spun the DB off to tape or backup storage! Seti/Boinc has never had the "balls" to admit they had a problem - look back thru the news... It's almost never "them", but it's the HW, the phase of the moon, too many users, the SNAP box, etc, etc, etc! > > I hope I am wrong. > You're not....... |
PT Joined: 19 May 99 Posts: 231 Credit: 902,910 RAC: 0 |
It is very obvious that they’ve failed “Big Time” in backup procedures. It’s a shame that they spoil so much work – not to mention faith and credibility in this project. I’m starting to get very annoyed and am thinking about skipping the SETI project altogether. That’s the feeling I have today! I’ve been through many huge projects but never seen such a “screw-up” as this one! |
GlaBotKi Joined: 28 Aug 99 Posts: 1 Credit: 29,713 RAC: 0 |
> It is very obvious that they’ve failed “Big Time” in backup procedures. It’s a > shame that they spoil so much work – not talking about faith and credibility > in this project. > I’m starting to get very annoyed and thinking about to skip the SETI project > in total. That’s the feelings I have today! > I’ve been trough many huge projects but never seen such a “screwed up” as this > one! > I think you are absolutely right. It seems to be an adolescent joke, not a serious project. Michael |
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
> BOINC is taking days and days to > recover and is losing 8 days of data after a simple hardware failure. > Uhm, they lost 1 day of data. Remember, the move started 11. August, they weren't up and running again before 16. August, and they had crashed by 17. August. OK, the forums were up on the 13th, but losing some forum posts isn't really a problem. Not knowing much about databases or RAID, I can't really say whether the database is good or bad, but at least in my understanding the RAID shouldn't get corrupted unless at least 2 disks crashed... |
Mattewan Joined: 12 Jan 02 Posts: 14 Credit: 3,281,397 RAC: 0 |
I would personally say, give them a break. It's a new project; that's why normal SETI is still running. You have to expect problems when a new project like this starts. OK, maybe the data loss was excessive and could have been prevented, but it's at such an early stage in the project that I would personally say it doesn't really matter too much. If this continues to happen 2 years down the line, then fair enough. |
Ramón Bultó y Belén Perales Joined: 28 Feb 00 Posts: 1 Credit: 16,606 RAC: 0 |
> you have to expect problems when a new project like this starts That's what the beta phase is for. > ok maybe the dataloss was eccessive, and could have been prevented, but its > at such an early stage in the project that i would personally say it doesnt > really matter too much Then don't open it up to everyone! > if this continues to happen 2 years down the line then fair enough I don't think many users will keep their faith in the project if this continues for a month or so. We have had a lot of patience already. Ramón Bultó |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
> > you have to expect problems when a new project like this starts > That's what the beta phase is for. > > > ok maybe the dataloss was eccessive, and could have been prevented, but > its > at such an early stage in the project that i would personally say it > doesnt > > really matter too much > Then don't open it up to everyone! > > > if this continues to happen 2 years down the line then fair enough > I don't think many users will keep their faith in the project if this > continues for a month or so. We have had a lot of patience already. > Ramón Bultó > The servers were not being stressed until they did open it up to everyone. It is impossible to fix what is apparently working correctly. |
Bakareth Joined: 31 Aug 01 Posts: 44 Credit: 7,619,743 RAC: 0 |
Saucer of milk anyone??? It must be a great help to the Berkeley team to have many of their supporters (yeah, remember we are supporting a non-profit research project here) doing nothing but bitch about their work. Give them a break or get lost. I, for one, am fed-up reading your rants. Robert |
Matthew Baker Joined: 15 May 99 Posts: 15 Credit: 307,219 RAC: 0 |
Oh the humanity! |
Purdy Joined: 3 Apr 99 Posts: 76 Credit: 42 RAC: 0 |
News August 20, 2004 "We are currently working on getting the alpha/beta projects working again, as well as getting new workunits generated so that when we restart the public SETI@home project there will be work to send out to the clients. Another note about the database restoration: All user profile/preferences updates between August 13th and 18th were lost as well." Berkeley, why can't you be honest and say "We have also lost all the results and workunits you have been crunching for the last 9 days"? |