Question: what does this mean? "The transitioners keep falling behind because the DB is I/O bound"

Profile Sir Ulli
Volunteer tester
Joined: 21 Oct 99
Posts: 2246
Credit: 6,136,250
RAC: 0
Germany
Message 12505 - Posted: 26 Jul 2004, 23:40:36 UTC

I think they are using SCSI hardware, so this should not be a problem, unless Sun has problems with the drivers. I have been out of touch with this since Win 95...

Problems with I/O output... I think...


Greetings from Germany, NRW
Ulli
Profile AthlonRob
Volunteer developer
Joined: 18 May 99
Posts: 378
Credit: 7,041
RAC: 0
United States
Message 12522 - Posted: 27 Jul 2004, 0:47:35 UTC - in response to Message 12505.  

> I think they are using SCSI hardware, so this should not be a problem, unless
> Sun has problems with the drivers. I have been out of touch with this since
> Win 95...
>
> Problems with I/O output... I think...

Just because it's SCSI doesn't mean it's really all that fast... a single disk can only go so fast. You're looking at what... around 30MB/sec sustained data transfer with a top-of-the-line single disk?

In a RAID array (as they're planning on setting up) you'll be able to write to more than one disk at a time, speeding up the process... hopefully.

I guess we'll see on Friday if this *does* actually solve the problem.

Rob
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 12567 - Posted: 27 Jul 2004, 2:58:30 UTC - in response to Message 12522.  

>
> Just because it's SCSI doesn't mean it's really all that fast... a single disk
> can only go so fast. You're looking at what... around 30MB/sec sustained data
> transfer with a top-of-the-line single disk?
>
> In a RAID array (as they're planning on setting up) you'll be able to write to
> more than one disk at a time, speeding up the process... hopefully.
>
> I guess we'll see on Friday if this *does* actually solve the problem.

Database systems are always I/O bound as the first limiting condition. The science applications, by contrast, are usually compute bound when you look at them from a system perspective (we will ignore the memory bandwidth issues for the moment - those can be a limiting condition within your computer).

This is complicated by several additional factors. One of them is that they are using MySQL, which is not a very mature product when contrasted with Oracle (for example) and in its current incarnation (version) does not have advanced features like:

1) Row level locking
2) Triggers and stored procedures
3) Binary/bitmapped indexes
4) Table partitioning

The choice of MySQL is logical in that licence costs for BOINC projects are (and properly should be) considered one of the most important design factors of the system as a whole. As a consequence, there are limits placed on the developers in what they can and cannot do from a design perspective.

In theory, you should be able to write standard SQL and run it against any database. In the real world, not a chance. I wrote a code generator for schemas that I developed with Oracle's Designer product, because it is one of the best tools for making a logical database design. Unfortunately, I never got the built-in code generators to work even for small test schemas, which is why I wrote my own; at any rate, the code the different generators produced was very different. And that is for nothing much, just the table creates. So a "portable" design is not likely.

Now, I am going to walk out on a limb here, but I feel that the developers did not do a very good design. This is not the first time I have said this. But if you look at the schema, there was no apparent consideration of I/O issues with the database. The schema is very small, and this is usually good. Except that, with little attention paid to the SQL, almost all of the queries were written as a "SELECT *", which means that the entire row of data is read even for queries that are going to use as few as one column.
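
To make the "SELECT *" point concrete, here is a minimal sketch of the difference; the table and column names are my own assumptions for illustration, not the project's actual schema:

-- Reads the entire row, wide varchars and all, even though only one value is needed:
SELECT * FROM result WHERE id = 12345;

-- Reads only the column this piece of code actually uses:
SELECT server_state FROM result WHERE id = 12345;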

Making matters worse is the design issue of the heavy traffic against the "Results" table. With only table locking, every update means that no other update can happen elsewhere in that table at the same time. This is the result, in my opinion, of a poor appreciation of the data life-cycle for the information.
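
As a rough illustration of what table-level locking means (a sketch only; the storage-engine assumption and the names are mine, not taken from the project): with an engine such as MyISAM, every write behaves as if the whole table were locked, so the effect is like this:

LOCK TABLES result WRITE;
UPDATE result SET server_state = 5 WHERE id = 12345;
UNLOCK TABLES;
-- Any other session updating a completely different row of "result" has to wait until the lock is released.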

What can be done?

Well, you can throw hardware at it, and it looks like that is what they are going to do. Using a RAID configuration with "striping" allows greater speeds in the "physical I/O" that occurs between the computer and the disk storage systems. Throwing more memory into the computer allows a larger cache, so that more of the "logical I/O" can occur against more tables at the same time. It also allows you to cache other things like indexes. Adding multiple controller cards for the disk I/O gives you more bandwidth to do reads and writes. Adding 15K RPM disk drives reduces rotational latency, and making them Ultra-320s gives you the fastest SCSI transfer speeds...

Looking at the code for the DB does not show any changes based on the July 26th upload.

In Paul's opinion, for what it is worth, the database design is very poor. Since the schema has not changed from the first time that I criticized it, well, either I am wrong, or...

Understand, I am just a fuzzy-eared participant like you, and I have no more data than the schema. But before I became disabled this was what I did for a living: creating a logical database design that the physical DBA would then tune for "best" performance on the target machine. But you cannot create a logical design without thinking through the issues of what happens when you go live and start putting data into the database.

With children in the room, I shan't give you my opinion on the process by which this project went "live". But I do admit that I also did not see any external evidence that any sort of "load testing" occurred during the Beta. Of course, from one perspective, this is still Beta software. And I will admit that I had not even considered that I would put all of my machines back on SETI@Home Classic, as I now have. Predictor is still in Alpha and I have little wish to do more testing there, since I cannot follow up on problems the way a tester should. When that project, or another, does go into production, well, I will consider going there ...

At any rate, does this answer your question? or make new ones?

I do cover some of these issues in the FAQ, Glossary (rotational latency is defined there :) ... ) ... and the other documents (including two of my old college lectures on SQL and PL/SQL).

If you have more questions or want more of my opinion, post and if I don't answer, e-mail me and I will give you a personal reply.
Pathogen

Joined: 17 May 99
Posts: 34
Credit: 13,549
RAC: 0
United States
Message 12570 - Posted: 27 Jul 2004, 3:10:11 UTC - in response to Message 12567.  

> Now, I am going to walk out on a limb here, but, I feel that the developers
> did not do a very good design.
***
I have to agree with you, Paul. When I read this quote on the main page, I was shocked:
"The good news is that the scheduler is now running as a fast cgi which means it maintains a persistent connection to the DB."

Uh, it just occurred to them that this app needed to have a persistent connection to the DB? And doing it with Fast CGI? What was it before? The scheduler should have been written using Java servlets and Tomcat or JBoss as the container.
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 12707 - Posted: 27 Jul 2004, 12:28:25 UTC - in response to Message 12570.  

> I have to agree with you, Paul. When I read this quote on the main page, I was
> shocked:
> "The good news is that the scheduler is now running as a fast cgi which means
> it maintains a persistent connection to the DB."
>
> Uh, it just occurred to them that this app needed to have a persistent
> connection to the DB? And doing it with Fast CGI? What was it before? The
> scheduler should have been written using Java servlets and Tomcat or JBoss as
> the container.

News archive:
"During the down time we also converted the scheduler into a fast-CGI application. Upon bringing the system back up the new fast-CGI scheduler decided to choke on something that it hadn't choked on in Alpha."

Not being a programmer, can't comment on good/bad database or Java better/worse than CGI.

But looking at the very slowly increasing number of beta-testers, with many of them more interested in their score and therefore only crunching a couple of WUs before going back to Classic, probably the only way to really load-test was to release it...
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 12774 - Posted: 27 Jul 2004, 15:15:43 UTC - in response to Message 12707.  

> > I have to agree with you, Paul. When I read this quote on the main page,
> I was
> > shocked:

Thank you :)


> News archive:
> "During the down time we also converted the scheduler into a fast-CGI
> application. Upon bringing the system back up the new fast-CGI scheduler
> decided to choke on something that it hadn't choked on in Alpha."
>
> Not being a programmer, can't comment on good/bad database or Java
> better/worse than CGI.

Java, depending on the flavor, is generally an interpreted language. There are flavors out there that can be compiled into native code and thus made to run faster.

> But looking on the very slowly increasing number of beta-testers, many more
> interested in their score and therefore only crunching a couple wu before
> going back to classic, probably the only way to really load-test was to
> release it...

Well, you can simulate loads artificially by taking the beta results as they are returned and "faking" additional load. In other words, if there were, say, 2,500 beta testers, and the current population of active SETI@Home Classic participants is 25,000, then each time one of the beta testers returned one result you would make 10 "clone" results and process those also.

Paranoid as I am, I probably would have used a much higher multiple. There are also other load tests that are more "static", where you just load up the tables with "dummy" data to make the indexes and disk content large enough that you can see how the system operates under a large static load.
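
A crude version of that kind of static load is easy to sketch in SQL; the table and column names below are assumptions for illustration only, and an auto-increment primary key is assumed:

-- Roughly doubles the number of rows (and the size of every index) each time it is run:
INSERT INTO result (workunitid, server_state, outcome, xml_doc_in)
SELECT workunitid, server_state, outcome, xml_doc_in
FROM result;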

Personally, I am more irate with the "simplistic" attitude of the coders that did not, as I said, do any of the "natural" paranoid programming efforts that would have reduced I/O.

The one I really cannot understand is the Results table: it is probably the most update-intensive table in the whole database, and it has 10 indexes. Now that is just plain, um, plain, ah, hmmm, well maybe I should just say that it does not make sense to me at all. The only table you put 10 indexes on is one that does not have a high ratio of inserts and updates compared to SELECTs. Indexes only speed up queries, with the trade-off that they slow down everything else. In general, transaction tables have one or two indexes, including the primary key. A data warehouse table, which is read intensive (SELECTs), is the only place that you start to add indexes, because the data is by then in a "static" state and is rarely (usually never) updated.
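
To sketch the trade-off (hypothetical index and column names; the real schema may differ), every secondary index below is one more structure that has to be maintained on every INSERT and on every UPDATE of an indexed column:

CREATE INDEX res_wuid   ON result (workunitid);
CREATE INDEX res_state  ON result (server_state);
CREATE INDEX res_hostid ON result (hostid);
-- ...and so on up to 10; writing one row now also means writing into each of these index trees.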

As soon as I was at the point of adding the third index, I would have been looking at my data life-cycle and making a change to the schema. I would also be looking hard at the tables, because they are very "wide", and that is also not very normal for a relational database. I am not saying that the columns are not necessary, but I would have been looking hard at them to see if there were ways of normalizing out stored values.

Though their combinations are potentially large, I would have been looking to see if there were ways to normalize out things like os_name/os_version etc. so that the data width of the row is smaller. There are also far too many varchar(254) values and BLOBs in the base records.

In relational database design the goal is to make the rows as "narrow" as possible. This means aggressive "pruning" of the number of columns and of each column's data size. For example, the "venue" in the Host table is a varchar(254) and the current venues are "home", "work", "school". Here you create a venue table as:

Venue
------
venue-id small-int
venue-name varchar(254)

This is a slowly changing table, and likely will not grow beyond 10-20 venues, but it allows us to move a varchar out of a base table (in this case Hosts) that has a very large "logical" width. Even though varchars hold a variable amount of data and are not padded out to the full width, many databases do not store the varchar as part of the base row, so you get an increase in logical I/O just to read the varchars.
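
In MySQL terms the idea might look roughly like this; it is a sketch of the technique, not the project's actual DDL, and the names are assumptions:

CREATE TABLE venue (
    venue_id   SMALLINT     NOT NULL PRIMARY KEY,
    venue_name VARCHAR(254) NOT NULL
);

-- The host row then carries only the narrow key instead of the wide varchar:
ALTER TABLE host ADD COLUMN venue_id SMALLINT;
-- (after migrating the data, the old varchar(254) venue column could be dropped)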

One of the "myths" (urban ledgens) of relational databases is that you "de-normalize for performance" when the opposite is usually true.

Again, this is only Paul's opinion, yours may differ...

Janus
Volunteer developer

Joined: 4 Dec 01
Posts: 376
Credit: 967,976
RAC: 0
Denmark
Message 12833 - Posted: 27 Jul 2004, 19:17:36 UTC - in response to Message 12774.  

Hopefully the java/Tomcat part was a joke?

The thing about SELECT * is well known. It is done this way to avoid creating problems that are difficult to debug. When the code cools off a bit it will be optimized to only contain the needed fields. (I know it sounds weird but I didn't decide it...)

@Paul: Could you email me the list with the problems you found in their db-layout? I would very much like to take a look at it when I get the time.
Pathogen

Joined: 17 May 99
Posts: 34
Credit: 13,549
RAC: 0
United States
Message 12850 - Posted: 27 Jul 2004, 21:18:49 UTC - in response to Message 12833.  

> Hopefully the java/Tomcat part was a joke?
***
Not at all. Java is used on the enterprise level by many companies and Tomcat is a very nice container. But if something more "buff" was needed, JBoss would do nicely.
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 12867 - Posted: 27 Jul 2004, 22:31:38 UTC - in response to Message 12774.  

> Java, depending on the flavor, is generally an interpreted language. There
> are flavors out there that can be compiled into native code and thus made to
> run faster.
>

Well, for me, having never learned C, the only immediately apparent difference between the current and previous scheduler is a compiler argument:

#ifdef _USING_FCGI_
#include "fcgi_stdio.h"
#endif

The rest seems to be the same more-or-less gibberish C language that, at my guess, is somehow jumbled together by a compiler into an exe or a dll, just like boinc_gui.exe....

Someone not yet understanding the difference between *.h & *.c files, and still looking for a Function or Procedure, or for that matter 10 GOTO 30. ;)
Washii

Joined: 8 Aug 02
Posts: 4
Credit: 111,624
RAC: 0
United States
Message 12872 - Posted: 27 Jul 2004, 22:40:14 UTC
Last modified: 27 Jul 2004, 22:41:54 UTC

Paul D. Buck, you said:
"Java, depending on the flavor, is generally an interpreted language. There are flavors out there that can be compiled into native code and thus made to run faster."

Isn't the whole point to Java the ability to work on any platform, as long as you have the platform's Java Virtual Machine? By 'native code,' do you mean using some type of JIT (Just-In-Time) compiler, which would speed it up? Or something along the lines of compiling the Java code to something fast like C or C++ for the particular system?

To me, it would seem that changing it to native code might not be good in the long run. In the case of hardware changes because something breaks or the fine sponsors of SETI@Home give up nifty new equipment, there could very well be incompatibility in the 'native' code because of a slight platform shift. Of course, my understanding could be rather skewed, as I'm still a budding new Java student.

-Alex
Pathogen

Joined: 17 May 99
Posts: 34
Credit: 13,549
RAC: 0
United States
Message 12895 - Posted: 27 Jul 2004, 23:39:15 UTC - in response to Message 12872.  

>Isn't the whole point to Java the ability to work on any platform, as long
> as you have the platform's Java Virtual Machine?
***
That is not necessarily the "point" of Java, but it is definitely one of its strengths.

> By 'native code,' do you mean
> using some type of JIT (Just-In-Time) compiler, which would speed it up?
***
No, he means compiling it to object code for the real machine/OS as opposed to bytecode for the JVM.

> To me, it would seem that changing it to native code might not be good in
> the long run. In the case of hardware changes because something breaks or the
> fine sponsors of SETI@Home give up nifty new equipment, there could very well
> be incompatibility in the 'native' code because of a slight platform shift.
***
I'm not a big fan of compiling Java to native object code and, frankly, don't think it is necessary in most cases. As you noted, doing so could break the app if the platform is later modified, requiring a recompilation. It should also be noted that by using C or C++, that is the current situation with Seti right now... Contrary to myth, Java is much faster than it used to be. Certainly it is not as fast as C, but then again, C is not as fast as FORTRAN for numerical apps, yet Seti is not written in FORTRAN, or is it? IMHO, the slight performance hit you take by using Java is far outweighed by code portability as well as the feature set that becomes available to you: servlets, JDBC, JSPs, etc.


EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 12897 - Posted: 27 Jul 2004, 23:47:03 UTC - in response to Message 12872.  

First off, Paul, you've done a GREAT job explaining your views! Seems the devs should have listened when you raised flags in the beta!

Only "table locks"! (on the result table especially) No wonder it's I/O bound and the transitioners can't keep up! It's a single threaded process to access the table!

Now to the post...

> Paul D. Buck, you said:

>
> Isn't the whole point to Java the ability to work on any platform, as long
> as you have the platform's Java Virtual Machine? By 'native code,' do you mean
> using some type of JIT (Just-In-Time) compiler, which would speed it up?

Being "interperative", the Java VM actually executes "byte codes" - kind of like a "generic" assembly language. Some of the "JIT" stuff says "ok I've executed this before, so I'll drop chunks of assembly (machine) code in it's place.

Truly compiled code does not need a "byte code" interpreter of any sort, as the "compile" generates native machine code for the functions needed - like C or C++ do!



>Or
> something along the lines of compiling the Java code to something fast like C
> or C++ for the particular system?

Like I said, really generating "machine code".
>
> To me, it would seem that changing it to native code might not be good in
> the long run. In the case of hardware changes because something breaks or the
> fine sponsors of SETI@Home give up nifty new equipment, there could very well
> be incompatibility in the 'native' code because of a slight platform shift.

You wouldn't re-write the Java to machine code - you'd use a compiler!
The Java itself is not tied to the HW - if there's a "compiler" for the new HW, compiled Java could be run there too. Worst case is that it's back to "interpreted", but if "compiled" Java is required, a compiler for the replacement HW probably should be a requirement anyway! The Java source code will be just as portable, be it interpreted or compiled.

BASIC started as an "interpreted" language, but vendors started to offer "BASIC compilers" to speed things up (in the days of the XT, when 4.77 MHz was fast - long before VB was a twinkle in MS's eye).

Before anybody jumps down my throat on this, google "IBM Visual Age" for Java and Smalltalk. I was up to my elbows in the VMs for this product for a few years (I mean coding - not just using!).
Profile Darth Dogbytes™
Volunteer tester

Send message
Joined: 30 Jul 03
Posts: 7512
Credit: 2,021,148
RAC: 0
United States
Message 12899 - Posted: 28 Jul 2004, 0:02:51 UTC
Last modified: 28 Jul 2004, 0:13:04 UTC

You sound like a PRO PER (Pro Se) jailhouse lawyer splitting hairs with a judge. The fact that you are sometimes correct doesn't mitigate the fact that, if you didn't have something or someone to complain about, you'd have the same social standing as a convict, i.e. a "zero." Your postings are akin to forcing everyone to watch you mentally masturbate.

Have a nice day...
EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 12902 - Posted: 28 Jul 2004, 0:25:56 UTC - in response to Message 12899.  
Last modified: 28 Jul 2004, 0:43:37 UTC

> You sound like a PRO PER (Pro Se) jail house lawyer splitting hairs with a
> judge. The fact that you are sometimes correct doesn't mitigate the fact,
> that if you didn't have something or someone to complain about, you'd have the
> social standing just like a convict; i.e. a "zero." Your postings are akin to
> forcing everyone to watch you mentally masterbate.
>


I take it you're talking to me.. What do I care what you say? You've proved to be a bad source of information by claiming that "all is happy in Boinc land, and if you don't think so, you're a zero"!

Isn't a big PART of getting this crap to work having people who WILL complain when it's broken? If no one complained, we'd have to deal with the current state for 5 years! If you think the "current state" is OK, you're doing a great deal of damage to the project! You must be a kid to post the gif you did... Time to buy school supplies!

You remind me of the folks during beta who felt it was a huge privilege to be part of the beta and wouldn't say a bad word about it!

I've been here since Jun 22, and am still here! It's folks like Dogfart, who just don't understand what others say and chastise those who don't agree with their world view, that slow down progress!

(BTW, C2 seems to have failed almost as bad as "New Coke"!)

Profile Darth Dogbytes™
Volunteer tester

Joined: 30 Jul 03
Posts: 7512
Credit: 2,021,148
RAC: 0
United States
Message 12907 - Posted: 28 Jul 2004, 0:46:46 UTC - in response to Message 12902.  
Last modified: 28 Jul 2004, 3:54:03 UTC


> I take it you're talking to me.. What do I care what you say? You've proved
> to be a bad source of information by claiming that "all is happy in Boinc
> land, and if you don't think so, you're a zero"!
>
>
> Isn't a big PART of getting this crap to work to have people that WILL
> complain when it's broken? If no one complained, we'd have to deal with the
> current state for 5 years! If you think the "current state" is OK, you're
> doing a great deal of damage to the project!
>
> You remind me of the folks during beta that felt it was a huge priviledge to
> be part of the beta and wouldn't say a bad word about it!
>
>
The development of every client and/or application is evolutionary, just as human beings started out as simians. But you and people like you don't really strive for excellence, you demand perfection. Even excellence takes time; it doesn't evolve immediately. Not only do you demand perfection, you want it NOW.

Over in Beta, you never discussed a problem or issue, you attacked. Every post you made was to belittle Devs and testers alike. I, on the other hand, reported problems and took issue with many things at many times, but unless the other person was otherwise, I was at least civil. I didn't consider it a privilege to be a Beta tester - it was at many times frustrating, but I was volunteering my energy for something I wanted to see accomplished. You missed the mark, however; I did consider it a privilege to be an Alpha tester.

I don't think that the current state of the project is "OK." There will be many changes to come. The project will grow and mature over time, and if anyone remembers "you" it will be as a fart in the winds of time. That too will be evolutionary.


Boinc ALPHA/BETA Tester
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 12909 - Posted: 28 Jul 2004, 0:54:07 UTC

Wow,

Good discussion. Let me take them in order...

Janus,

By always programming with explicit column calls, nothing breaks as the initial design changes because of movement of column order or, in some cases with some databases, the changing of data types. So you always code in the columns, because it costs nothing to do when you are writing the selects. Also, this gives you an automatic and intrinsic limit on the I/O. This means that later, like now, we would be looking at those bottlenecks that are still killing performance, but everything else would have been written so as not to be inefficient by design.

At the current time, in another thread, Rom stated that they are increasing the I/O channel by about a factor of three. Yet many are not getting work units, we still have many pages turned off on the web site, and there has been little evolution of the queries/schema. There was a note posted somewhere that there was a change to the transitioner to make it more efficient, but we were still seeing transitioner problems days after that change was made.

Last point: if I write a query that specifies exactly what data I want for the specific purpose I am coding, I fail to see how "SELECT *" instead of explicitly listing the columns makes the code easier to debug. In my experience, it is exactly the opposite. When we do "select star" we are announcing to the world that we have not a clue what data we need for the exact task at hand and don't want to think about it. This is especially puzzling in that, at the next level of code up, they explicitly have to read the record into the local routine variables. So, in the worst case I saw, dealing with pending credit, all of the record except one column was pulled from ... wait for it ... the Results table.

Like I said earlier, this is going to be the most active table, because for each work unit you will have at least one and possibly up to 10 results, including errors, rejects, do-overs, quorum fillers, etc.

Janus, I probably should just write something up and post it in my stuff ... Then anyone who has an interest can look at it. I will have to think on that a bit. The biggest problem is that (in my opinion) the database is so bad that it staggers even my worst nightmare. The other thing is that my analysis is probably not going to influence any change(s).

Ingleside,

The .h files are "headers", which are intended to allow the compiler to "know" about other parts of the code - the module/procedure/function calls and the data structures - so that the compilation can proceed. Ada used a similar mechanism called the Package Specification, so that modules can refer to other modules without having to have the other module compiled. Older programming languages had the problem that a module could not refer to another module until that module had been compiled. So if module B used module A, "A" had to be compiled before you could compile "B". Which raises the problem that "A" might need to call "B": with circular references you cannot complete the compilation.

Alex, Az and Pathogen

I agree with almost all of what you said with just a couple more thoughts thrown in.

The original idea of Java was that it be interpreted, just like the earlier p-code Pascal systems. The executable code of either system was not a native binary executable module but one written for a "virtual" machine (though there was a microprocessor that did use p-code as its native instruction stream). The portability vs. execution speed trade-off is much debated and there are no "best" solutions. If there were, we would be using Ada for a lot more of our critical systems. :)

In theory, using a native instruction stream that uses the instruction set architecture (ISA) of each native machine should give you the best and fastest executing code. The problem is that in the presence of flaws in the system, who cares if we get the answer faster if it is wrong. The problem is that we may or may not recognize that the answer is wrong. Early computers used a system of complex instruction sets which were very rich and capable. The problem was, very few of those instructions got used either because the compiler writers did not know about them or did not feel that they were worth the trouble to use.

So, RISC was born and a new day dawned. Yet few of the RISC processors actually had fewer instructions than the CISC computers they were going to replace, and just as few executed the entire instruction set within the one-cycle standard proposed by the inventors of the concept.

Even stranger is some HP research: they wrote an interpreter for one of their minicomputers' ISAs so that they could test some software concepts. The machine that they actually ran it on was ... wait for it ... the very machine that they were going to emulate. The most interesting thing that came out was that the interpreter would run many programs up to 20% faster than native execution, even with the additional overhead of the interpreter software. The HP web site as of a few months ago still had articles on it if you are interested; the project's name was "Dynamo".

Anyway, I thought most of the rest of the points were well discussed, such as the contrast between interpreted, JIT and the rest.

Engineering is always about compromise. MySQL was a rational choice because of licensing costs for the intended market of academic institutions doing research using BOINC. To be honest I don't like C; C++ is a nightmare bolted onto an obscenity. Java is not much better. These languages are designed by geeks (which I am, but only on SQL) and do not lend themselves to clarity (see my note in the definition of C++) or understanding. I am not sure that we would have been better off with any other language as the primary programming language for the system. The point was made that for a science application, historically, FORTRAN is the language of choice, and it would have been mine, except we come back to licensing costs and GNU C++ is free ... so again it was a rational choice.

I think that those were rational choices and almost certainly the right choices. And the choices I would have made.

Relational databases have a "friction" when mapped to many OO languages, which C++ is purported to be, and I think this was little appreciated. I think the decision to go "live", if that is what we truly are and not just in Gamma testing, was ill-advised. I also think that there was not enough utilization of the resources in the Beta test and too much resistance to reviews of the product. I don't mean some of the hysterical discussions. Heck, hang around a board long enough and you learn who knows what they are talking about and who does not. When JKeck or Jon McLeod the 23rd (I think that is what he is :) ) speaks, I listen very carefully. But there were discussions about the issues, and many of them I think would have benefited the project. Perhaps the developers did listen and I am just not aware of the instances where they did.

I think static loading of the database during beta was a reasonable and relatively straightforward thing to do, and it was not done. Heck, you have a live model with SETI@Home Classic. Just taking those numbers and adding a workstation that fed in 'x' new users a day and returned 'y' dummy results would have been easy to extrapolate from the old application.

Well, I am not sure I should post this, but what the heck: life is the pits, then you die, then you get dirt on your face and then the worms eat you. The only thing we can be grateful for is that it happens in that order.





Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 12936 - Posted: 28 Jul 2004, 2:00:47 UTC - in response to Message 12909.  

> At the current time, in another thread, Rom stated that they are increasing
> the I/O channel by about a factor of three. Yet, many are not getting work
> units, we still have many pages turned off in the web site, and little
> evolution of the queries/schema. There was a note posted some where that
> there was a change to the transitioner to make it more efficient, but we were
> still seeing transitioner problems days after that change was made.

At least here I can correct you. ;)

"the database currentlly needs a 3Mbit write pipe to the disks and we are not achiving that with the current disk subsystem.
The new disk subsystem is supposed to have a 10Mbit write pipe capability."

In an earlier post Rom mentioned this:
"I continued to investigate ways of bringing down the database server load through the use of batching requests, while another group of people are preparing to move the database to another storage medium that should increase our overall system throughput by a factor of 5 or 6."

Putting this together, and adding that before hitting the limit they accepted 200k results/day and handed out 600k/day... If the batch processing doubled the output, the new hardware will bring the total up to 2M results/day. "Classic" at its top AFAIK was around 1.8M results/day...


If they also currently run with a "debug" database full of unnecessary info, starting to actually optimize it should increase the throughput...
EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 12941 - Posted: 28 Jul 2004, 2:20:24 UTC - in response to Message 12914.  
Last modified: 28 Jul 2004, 2:26:44 UTC

> > (BTW, C2 seems to have failed almost as bad as "New Coke"!)
>

No mention of Boinc here, just a note of interest...

Kind of like the meaningless pics that others post!


Are you paranoid?
Pilgrim
Volunteer tester

Joined: 5 May 03
Posts: 34
Credit: 12,239
RAC: 0
United States
Message 12943 - Posted: 28 Jul 2004, 2:25:45 UTC - in response to Message 12909.  

> Wow,
>
> good discussion.
>
Hey Paul:

I worked with IBM's mainframe DB2 product from the mid 1980's to the mid 1990's, using IBM's Assembler language (some Fortran too).

I worked on database design and optimization, table normalization, etc., for a large corporation with tables that had many millions of rows, and a foolish number of columns.

Reading some of your comments on SQL confirmed some thoughts I have had with the DB problems.....but I had no idea if my knowledge was pertinent to the other languages/products in use today.

Is Table Normalization still taught with today's languages, and if so, are the principles any different than back "then"? How would a designer approach this project differently than they would a mainframe DB2 project?

And I sure could have used you on my design team.
Pathogen

Joined: 17 May 99
Posts: 34
Credit: 13,549
RAC: 0
United States
Message 12947 - Posted: 28 Jul 2004, 2:46:58 UTC - in response to Message 12909.  

> To be honest I don't like C, C++ is a nightmare bolted onto an obscenity. Java is not much better.
> These languages are designed by geeks (which I am, but only on SQL) and do not lend themselves to
> clarity (see my note in the definition of C++) or understanding. I am not sure that we would have
> been better off with any other language as the primary programming language for the system.
***
Hi Paul. A lot of good points you made. Just to clarify my points, however, about Java, since I am a Java developer, i.e. geek: I was not necessarily saying the entire app should have been written in Java. All this RPC stuff, however, should have been. Java excels at network code, and things like the scheduler are just begging for J2EE technology like servlets, IMHO. I'm no fan of C or C++ either, but IF the Seti developers absolutely needed to, they could have used JNI to call code in either of those languages. I also think in general Java makes for much clearer code and forces developers to think more in OO terms. Granted, you can still write a procedural program in Java if you want to for some reason, but in C and C++ that is the norm (unfortunately) for a LOT of code out there. Anyway... I don't want to get into a language flame-war. I just thought since you are a self-proclaimed SQL guy, I'd offer the Java perspective...
