Should we start over from scratch?

Message boards : Number crunching : Should we start over from scratch?

Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 76594 - Posted: 4 Feb 2005, 23:03:09 UTC

It appears to me, after 3 attempts to migrate the DB to the new server hardware, that the DB is greatly compromised and that parts of it may not be recoverable.

Since admin has been unable to migrate the DB to the new server successfully, I propose the following. Wheel in the new hardware, hook it up and start it up creating a new DB. Users would be asked to reattach each client computer. Admin would then have all the time they need to unravel the old DB and credit those users with the credit they have earned from the old DB.

Anyone else agree with this?

Boinc....Boinc....Boinc....Boinc....
ID: 76594 · Report as offensive
Profile Speedy67 & Friends
Volunteer tester
Joined: 14 Jul 99
Posts: 335
Credit: 1,178,138
RAC: 0
Netherlands
Message 76596 - Posted: 4 Feb 2005, 23:06:30 UTC - in response to Message 76594.  
Last modified: 4 Feb 2005, 23:07:12 UTC

> Anyone else agree with this?

As long as all the pending credits are rewarded too.. it will be quite a job to reattach all of my client computers.. they're spread over a 75 mile radius.. :)




ID: 76596 · Report as offensive
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 76605 - Posted: 4 Feb 2005, 23:27:22 UTC - in response to Message 76594.  

> It appears to me, after 3 attempts to migrate the DB to the new server
> hardware, that the DB is greatly compromised and that parts of it may not be
> recoverable.
>
> Since admin has been unable to migrate the DB to the new server successfully,
> I propose the following. Wheel in the new hardware, hook it up and start it
> up creating a new DB. Users would be asked to reattach each client computer.
> Admin would then have all the time they need to unravel the old DB and credit
> those users with the credit they have earned from the old DB.
>
> Anyone else agree with this?
>
>

Yep. Just move over the user accounts and discard all work in progress.

Regards Hans

ID: 76605 · Report as offensive
Divide Overflow
Volunteer tester
Joined: 3 Apr 99
Posts: 365
Credit: 131,684
RAC: 0
United States
Message 76650 - Posted: 5 Feb 2005, 2:09:42 UTC - in response to Message 76594.  

> Anyone else agree with this?

Others might, but I certainly don't.


ID: 76650 · Report as offensive
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 76653 - Posted: 5 Feb 2005, 2:20:27 UTC - in response to Message 76650.  

> > Anyone else agree with this?
>
> Others might, but I certainly don't.
>
>

OK, I guess this suggestion may sound a bit harsh. But since the data on the old servers isn't lost, it may be possible to slowly move it over to the new hardware later.

Regards Hans

ID: 76653 · Report as offensive
EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 76655 - Posted: 5 Feb 2005, 2:24:50 UTC - in response to Message 76650.  

The real question is "what is the DB busy doing?"

- Per the news, it's too busy to add new WU's from the splitter
- It also seems that it's not busy validating, as that's not happening
- Is it busy sending out WU's? (guess not, as they are rare)
- How about doing transition? - well, I guess not
- does the file_deleter have anything to delete if things aren't validated or transitioned?
- did the Borg assimilator the DB?

It just seems the DB is all tied up doing something other than what the DB should be doing! (kind of like finding "important stuff" when the Honey-do list gets too long)

If the DB is spending all its time refusing connections and doing nothing more, there does seem to be a big bug in the DB code!



ID: 76655 · Report as offensive
Profile Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 76756 - Posted: 5 Feb 2005, 8:19:26 UTC

The DB is currently IO bound because a backup job is in progress.

I would guess it is part of the migration process.

The machine the DB currently runs on is pretty old. It's one of the Sun D220R machines (2 x 440MHz Sparc, 2 GB RAM). The machine it is being migrated to is a new dual-processor Opteron (2.4GHz, I think) with 6GB-8GB of RAM. It is expandable to four processors and 16GB of RAM if we need it.
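
For a rough sense of scale, the figures in the paragraph above can be compared directly. This is only back-of-the-envelope arithmetic: the 2.4GHz clock is the poster's own guess, the variable names are made up for illustration, and aggregate clock speed across different architectures (Sparc vs. Opteron) says little about real database throughput.

```python
# Figures quoted in the post; the new machine's clock is "2.4GHz, I think".
OLD_CPUS, OLD_MHZ, OLD_RAM_GB = 2, 440, 2
NEW_CPUS, NEW_MHZ = 2, 2400
NEW_RAM_GB_LOW, NEW_RAM_GB_HIGH = 6, 8

# Aggregate clock is a crude proxy only; it ignores architecture, cache and I/O.
clock_ratio = (NEW_CPUS * NEW_MHZ) / (OLD_CPUS * OLD_MHZ)
ram_ratio_low = NEW_RAM_GB_LOW / OLD_RAM_GB
ram_ratio_high = NEW_RAM_GB_HIGH / OLD_RAM_GB

print(f"aggregate clock: about {clock_ratio:.1f}x")
print(f"RAM: {ram_ratio_low:.0f}x to {ram_ratio_high:.0f}x")
```

Even on these crude numbers the new box has several times the headroom of the old one, before counting the option to expand it.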

I just think that we finally topped out on what the old machine was capable of handling. Heck, I just bought a PocketPC with a 600+MHz RISC processor that is probably able to keep up with it. I'm surprised it lasted this long.

----- Rom
BOINC Development Team, U.C. Berkeley
My Blog
ID: 76756 · Report as offensive
Profile Speedy67 & Friends
Volunteer tester
Joined: 14 Jul 99
Posts: 335
Credit: 1,178,138
RAC: 0
Netherlands
Message 76763 - Posted: 5 Feb 2005, 9:22:31 UTC - in response to Message 76655.  

> The real question is "what is the DB busy doing"
>
> - Per the news, it's too busy to add new WU's from the splitter
> - It also seems that it's not busy validating, as that's not happening
> - Is it busy sending out WU's? (guess not, as they are rare)
> - How about doing transition? - well, I guess not
> - does the file_deleter have anything to delete if things aren't validated or
> transitioned?
> - did the Borg assimilator the DB?
>
> It just seems the DB is all tied up doing something other than what the DB
> should be doing! (kind of like finding "important stuff" when the Honey-do
> list gets too long)
>
> If the DB is spending all its time refusing connections and doing nothing
> more, there does seem to be a big bug in the DB code!
>

This was in the Technical News section on January 26, and has been referred to a couple of times since, as it is still happening:

--quote--
Meanwhile the current database is being artificially slowed for reasons we have yet to determine. Basically, something internal to mysql caused it to suddenly read 5 megabytes/sec from the data disks. This started last Friday and hasn't stopped since. Even when there are no queries happening there are major amounts of disk I/O. Everything is working, just a little slower than it should.
--unquote--

So maybe the last one of your options is the right one. :)




ID: 76763 · Report as offensive
Profile FloridaBear
Joined: 28 Mar 02
Posts: 117
Credit: 6,480,773
RAC: 0
United States
Message 76824 - Posted: 5 Feb 2005, 17:09:30 UTC

--quote--
Basically, something internal to mysql caused it to suddenly read 5 megabytes/sec from the data disks.
--unquote--

I was actually kind of amused by this when I first read it. My striped RAID (2x120 GB SATA 7200 RPM) in my PC benchmarks at 99 MB/sec for sequential reads ("only" 53 MB/sec random) in Sandra, so 5 MB/sec is practically nothing these days. All from a 2-year-old $100 mainboard and two $100 hard drives. More evidence that their hardware is just not up to the task. I'm very happy to hear that they're moving away from Sparc...the bang/buck is so much better with Intel or AMD now. I've dealt with Sun for years, and just can't understand why they are still selling 0.5 GHz CPUs (64-bit or not). It sure seems to me that they have fallen behind in the hardware game...perhaps that's one reason why their stock chart looks so awful ;)
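
To put the comparison above in numbers, here is a small sketch using only the figures from this post. The constant and function names are invented for illustration, and a desktop benchmark is not a fair stand-in for the project's actual disks, so treat the result as a rough ratio rather than a measurement.

```python
# Throughput figures quoted in the post, in MB/sec.
MYSQL_BACKGROUND_READ = 5.0     # unexplained read rate from the Technical News item
DESKTOP_SEQUENTIAL_READ = 99.0  # poster's striped RAID, sequential reads (Sandra)
DESKTOP_RANDOM_READ = 53.0      # same array, random reads


def share_of(load: float, capacity: float) -> float:
    """Return load as a percentage of capacity."""
    return 100.0 * load / capacity


seq_share = share_of(MYSQL_BACKGROUND_READ, DESKTOP_SEQUENTIAL_READ)
rand_share = share_of(MYSQL_BACKGROUND_READ, DESKTOP_RANDOM_READ)

print(f"{seq_share:.1f}% of sequential, {rand_share:.1f}% of random")
```

On these figures the mystery read load would use roughly a twentieth of the desktop array's sequential bandwidth and under a tenth of its random-read bandwidth.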

ID: 76824 · Report as offensive
EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 76985 - Posted: 6 Feb 2005, 3:00:13 UTC - in response to Message 76824.  

> --quote--
> Basically, something internal to mysql caused it to suddenly read 5
> megabytes/sec from the data disks.
> --quote--
>
> I was actually kind of amused by this when I first read it. My striped RAID
> (2x120 GB SATA 7200 RPM) in my PC benchmarks at 99 MB/sec for sequential reads
> ("only" 53 random) in Sandra, so 5 MB/sec is practically nothing these days.
> All from a 2-year old $100 mainboard and two $100 hard drives. More evidence
> that their hardware is just not up to the task. I'm very happy to hear that
> they're moving away from Sparc..

I guess you weren't running the project when they tried to move to the Snap Appliance box for the DB... That was a hoot!

Something they can't ID hitting the DB at 5 MB/sec? It's either the Borg or there's a bug in some code they can't find! As MySQL is used by many, I doubt that, by itself, it would suddenly start slamming the DB!

Dollars to donuts, it's some BOINC code that is scanning the DB and ignoring (or not seeing) an error return, and like the Eveready Bunny, just keeps going, and going, and going.....
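
The kind of bug being guessed at here can be sketched as follows. This is purely a hypothetical illustration: `failing_fetch`, the status strings, and both scan functions are made up for the example and are not taken from BOINC or MySQL. Nothing in this thread confirms that such a loop is the actual cause.

```python
from typing import Callable, List, Tuple

# Hypothetical read interface: returns (rows, status),
# where status is "ok", "end" or "error".
FetchFn = Callable[[int], Tuple[List[int], str]]


def failing_fetch(offset: int) -> Tuple[List[int], str]:
    """Simulated read that always fails and returns no rows."""
    return [], "error"


def scan_ignoring_errors(fetch: FetchFn, max_reads: int) -> int:
    """Buggy pattern: only "end" stops the loop, so an error is treated
    like an empty success and the same offset is read again and again.
    max_reads exists only so that this demo terminates."""
    offset = reads = 0
    while reads < max_reads:
        rows, status = fetch(offset)
        reads += 1
        if status == "end":
            break
        offset += len(rows)  # rows is empty on error, so offset never moves
    return reads


def scan_checking_errors(fetch: FetchFn, max_reads: int) -> int:
    """Same scan, but it stops as soon as the read reports an error."""
    offset = reads = 0
    while reads < max_reads:
        rows, status = fetch(offset)
        reads += 1
        if status in ("end", "error"):
            break
        offset += len(rows)
    return reads


buggy_reads = scan_ignoring_errors(failing_fetch, max_reads=1000)
checked_reads = scan_checking_errors(failing_fetch, max_reads=1000)
print(buggy_reads, checked_reads)
```

The first loop burns through every allowed read without making progress, while the second gives up after one failed read.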
ID: 76985 · Report as offensive
EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 76987 - Posted: 6 Feb 2005, 3:06:20 UTC - in response to Message 76756.  

> The DB is currently IO bound because a backup job is in progress.
>
> I would guess it is part of the migration process.
>
> The machine the DB currently runs on is pretty old. It's one of the Sun D220R
> (2 x 440MHz Sparc, 2 GB RAM). The machine it is being migrated to is a new
> dual proc opteron ( 2.4GHz, I think ) with 6GB-8GB of ram. It is expandable
> to four processors and 16GB of ram if we need it.
>
> I just think that we finally topped out on what the old machine was capable of
> handling, heck I just bought a PocketPC with a 600+MHz RISC processor that is
> probably able to keep up with it. I'm surprised it lasted this long.
>
>

Rom.. It's interesting that you didn't note the unexplained 5 MB/sec hits that others on this thread are referencing! Maybe it's not that the HW is "topped out"! :)
ID: 76987 · Report as offensive
Profile ghstwolf
Volunteer tester
Joined: 14 Oct 04
Posts: 322
Credit: 55,806
RAC: 0
United States
Message 76992 - Posted: 6 Feb 2005, 3:37:08 UTC

Maybe I'm missing something important that would stop this from working, but why can't the RAID array from the current DB machine be pulled and its data cloned? The new DB machine is ready to run, so everything is in it now: add one drive, clone its data, remove it, and repeat. Or (if possible) install the array all at once, then clone it to the new DB's array.

The only issue I see is that it means shutting everything down, and not just for a couple of hours either, maybe a day or so. Feel free to tell me what is wrong with this plan, other than that it means shutting down (we are already at a crawl).


Still looking for something profound or inspirational to place here.
ID: 76992 · Report as offensive
Profile Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 77009 - Posted: 6 Feb 2005, 6:00:58 UTC - in response to Message 76987.  

> Rom.. It's interesting that you didn't note the unexplained 5 MB/sec hits that
> others on this thread are referencing! Maybe it's not that the HW is "topped
> out"! :)

I didn't try to explain it because I haven’t been part of the investigation team, therefore I don’t have any information on it.

David has had me focusing on the client-side part of the software stack. After we get the next release out the door he might re-task me, who knows.

I figure that if we went through another round of optimization with the server-side stack we could probably get another 100% out of the software, but it still wouldn’t be enough to handle the load of classic shutting down. So in any event, it was still logical to upgrade the database server. If we top out the ability of that server when it is completely decked out, then we’ll have to scale horizontally.

Personally, I would really love to redesign the database interface based on stored procedures, but until MySQL supports them it's not possible.

Guess we’ll see what the future holds.

----- Rom
BOINC Development Team, U.C. Berkeley
My Blog
ID: 77009 · Report as offensive
Profile Jim Baize
Volunteer tester

Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 77142 - Posted: 6 Feb 2005, 17:25:57 UTC - in response to Message 77009.  

Rom,

I think a lot of us look to you as being all-knowing about BOINC, partly because you come here and talk to us. We do appreciate your feedback on the projects and your work and dedication.

Thank you.

Jim

> I didn't try to explain it because I haven’t been part of the investigation
> team, therefore I don’t have any information on it.
>
> David has had me focusing on the client-side part of the software stack.
> After we get the next release out the door he might re-task me, who knows.
>
> I figure that if we went through another round of optimization with the
> server-side stack we could probably get another 100% out of the software, but
> it still wouldn’t be enough to handle the load of classic shutting down. So
> in any event, it was still logical to upgrade the database server. If we top
> out the ability of that server when it is completely decked out, then we’ll
> have to scale horizontally.
>
> Personally, I would really love to redesign the database interface based on
> stored procedures, but until MySQL supports them it's not possible.
>
> Guess we’ll see what the future holds.
>
>
ID: 77142 · Report as offensive
Profile FloridaBear
Joined: 28 Mar 02
Posts: 117
Credit: 6,480,773
RAC: 0
United States
Message 78082 - Posted: 10 Feb 2005, 15:05:20 UTC - in response to Message 77142.  

> Rom,
>
> I think a lot of us look to you as being all-knowing about BOINC, partly
> because you come here and talk to us. We do appreciate your feedback on the
> projects and your work and dedication.
>
> Thank you.
>
> Jim

Absolutely. I think feedback from the development team is vital to the health of a project like this, and you've certainly done your part. A big thank you from me too :)

ID: 78082 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 78096 - Posted: 10 Feb 2005, 16:22:26 UTC - in response to Message 76985.  


> Something they can't ID hitting the DB at 5 MB/sec? It's either the Borg or
> there's a bug in some code they can't find! As MySQL is used by many, I doubt
> that, by itself, it would suddenly start slamming the DB!

Just so I understand this:

In your distinguished career as a developer, you've never seen a system that behaved in strange and inexplicable ways right up to the moment that you and/or your team finally saw the reason?
ID: 78096 · Report as offensive

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.