Seti, you need to give Standard Credit to GPUs
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Did the analyses come up with a result and if so what was it? I'd assume that 5.5 page document is the result, but I haven't seen it - I only got the executive summary. I'll leave it to the other two to report on their work, so it doesn't get lost in translation. What I will say is that, in common with all complicated programming efforts, any proposed remedies will need testing in the field to confirm that we're on the right track (and to demonstrate to Dr. A that the work is sound). Testing credit awards to a live BOINC community is tricky, but we do have an 'in principle' offer from a neutral BOINC project (i.e. not one in the SETI group) to test the progress so far. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Did the analyses come up with a result and if so what was it? I only got the executive summary. It's borked. Grant Darwin NT |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Ooh. Do you want to start from scratch, or do you want the preliminary findings for independent review? In the interests of openness, I'm good with supplying the 5.5 pages of analyses, if you want to start from that. They're engineering heavy, but well documented. The 5.5 pages refers to an email describing four identified design issues so far, that was sent to Eric, and Bernd from albert@home, before Christmas. That was the set of key findings, Richard mentioned earlier, of myself and another walking credit.cpp and credit related mechanisms with an Engineering (control systems) approach. Probably trimming the email fluff will bring it to 3-4 pages. If so interested, there's also a related spreadsheet document that models possible fixes for the two main current issues, scaling and stability, replacing the undamped averages with PID feedback control. A less detailed observational starting direction might be simply to look at the definition of the cobblestone scale, apply it to some real MB & AP tasks, and note the discrepancies by factors of about 3 and 1.5 times respectively. Doing so might be a simpler conceptual introduction to the 2 key issues, depending on your familiarity with classical control systems theory. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Did the analyses come up with a result and if so what was it? I only got the executive summary. Pretty much, lol. No bugs actually found :). 4 core (2 key) identified design issues. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
Seeing as the latest "down scaling" seems to have happened at the same time that the latest Nvidia GTX 7xx series cards started crunching here, do you think that this could be a factor Jason or just a coincidence? Cheers. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Seeing as the latest "down scaling" seems to have happened at the same time that the latest Nvidia GTX 7xx series cards started crunching here, do you think that this could be a factor Jason or just a coincidence? Actually there have been 2 'major' downscaling events, the first at CreditNew's introduction that was more subtle (as the stock CPU apps didn't change) and the second traceable to the introduction of AVX into the stock multibeam CPU application at V7's introduction. The logic, with the way estimates are performed now, actually keeps GPUs from dictating the overall scaling, because the CPU side uses the Whetstone FPU benchmark for vectorised applications, which gives a radical underestimate of operation counts. Instabilities in the system can certainly make it look like there are causal links elsewhere, but the resulting behaviour appears to be chaotic, as opposed to deterministic. That commonly happens when you put Scientists and Mathematicians in charge of developing control systems... things oscillate, ram against the safeties, then explode. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Actually there have been 2 'major' downscaling events, the first at CreditNew's introduction ... Remember, that was late May or early June 2010. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
The 5.5 pages refers to an email describing four identified design issues so far, that was sent to Eric, and Bernd from albert@home, before Christmas. That was the set of key findings, Richard mentioned earlier, of myself and another walking credit.cpp and credit related mechanisms with an Engineering (control systems) approach. Probably trimming the email fluff will bring it to 3-4 pages. If so interested, there's also a related spreadsheet document that models possible fixes for the two main current issues, scaling and stability, replacing the undamped averages with PID feedback control. That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found. IIRC (it's been a while), that particular detail is on the first page of the unrelayed issue register spreadsheet, though the relevance of line numbers & filenames can be tenuous when dealing with design flaws, as opposed to outright bugs. That's mostly because design issues are somewhat abstract (more precisely, at a different level of abstraction). It's usually conceptual flaws that apply rather than implementation code. In those regards, I don't know if Boinc's mechanisms as designed are formally documented using standard (software engineering) tools like UML etc, or perhaps more appropriately formal protocol specifications etc. [..but I've never seen any...] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found. Fair enough. It's been a while on this side of the globe too, and I'm in that awkward gap between beer and bedtime. But I'd assume "2. Scaling averages undamped", at least, has an identifiable location. Let's see if we can take it forward when the team is fully re-assembled after the holidays, next week. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found. Several 'locations' yes. Pretty much 80% of credit.cpp and some connected definitions elsewhere. [Searching the server code comments for the word 'misnomer' will bring up one key place.] [Edit:] having a quick look, you want everything that feeds [into and out of] this: //credit.cpp, line 580 with r.flops_estimate being the 'misnomer'. It's Boinc FPU Whetstone in disguise. Two scalings get applied after that; neither is damped. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
rob smith Send message Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380 |
I agree totally with Guy - many "simple fixes" end up causing bigger headaches than the one they set out to fix. If you want any examples take a look at the news feeds around banking software problems many of which have been caused by applying a "simple fix" to cure a minor problem but the "simple fix" has brought some major part of a bank's software to its knees and resulted in thousands of p'd off customers. While Richard has identified the problem he has not (to the best of my understanding) worked out a viable long term solution that Dr.A will accept to cure the imbalance that is the result of design decisions made by.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
While Richard has identified the problem he has not (to the best of my understanding) worked out a viable long term solution that Dr.A will accept to cure the imbalance that is the result of design decisions made by.... Not me. Like the rest of you, I just observed the symptoms: the other two tracked down the details. From that point, I'm just a messenger/rapporteur, though I do try to contribute to the overall planning of the long-term trajectory. From what I've heard in my privileged seat rather closer to the heart of the action, there isn't actually much wrong with the original design concept. Where it's fallen down is in the detail of the implementation. That's where the distinction between scientist and engineer comes into play: I've seen it happen first-hand in business too, where the role of scientist is played by the entrepreneur. I've seen some very sharp business ideas, in danger of falling flat on their faces for lack of any attention to the details of implementation. What we have to do now is to avoid making the same mistake ourselves. Now we have a better idea of where the current implementation fails, our next job is to act as midwife for a better implementation. I say midwife, because I don't think it necessarily involves writing the code ourselves: there are people of goodwill on that ProjectPeople list I posted yesterday who could probably write the code more quickly (and in better conformance with 'house style') - simply because of their familiarity with the areas of code concerned. (We're talking about the programs which run on BOINC servers and link to BOINC databases, which is a bit different from the application optimisations our volunteers here spend so much time on). 
If we can convince those experienced BOINC developers to come on board and contribute to converting the engineering drawings into a functional prototype implementation (and I think we're 90% of the way there), and then take that prototype out for an instrumented test drive, the resulting charts and figures will be the best possible tool for convincing David Anderson, not that his design was wrong, but that it could work even better. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
In other words: the disease is known, and so is the medicine; now we need to make the patient accept the treatment, which is the most difficult part. For the good of the project (SETI and others) I really hope Richard, Jason and the other team players working on the problem succeed in their task and finally convince Dr. A & Co of the issues and get the code fixed. The CreditNew idea is fantastic, providing a way to balance credit across the Boinc universe; what is wrong is the implementation. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Yes, and as Rob says, what is needed is a full, thorough, course of treatment, that goes to the root of the problem. If we just end up with a quick fix for SETI, I'll feel that we have failed the community at large. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I agree, a long-term treatment for the whole of Boinc is the right thing to do, not just a SETI patch. That could possibly get CreditNew used by other projects, or at least with less resistance. |
Batter Up Send message Joined: 5 May 99 Posts: 1946 Credit: 24,860,347 RAC: 0 |
All I ask is to be given something I can measure in less than three months. Waiting that long makes adjustments next to impossible. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Go to Free-DC; it gives you a good view of how your host is performing. http://stats.free-dc.org/stats.php?page=users&proj=sah&sort=yesterday |
rcthardcore Send message Joined: 23 Nov 08 Posts: 48 Credit: 1,306,006 RAC: 0 |
Definitely using my EVGA GTX 780TI. Going to push it to its limits, whatever they may be. |
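
As a rough illustration of two points jason_gee raises in this thread (checking real tasks against the cobblestone scale, and the difference between an undamped and a damped scaling average), here is a minimal Python sketch. It is not taken from credit.cpp. It assumes the commonly cited cobblestone definition of 200 credits per day of work on a reference machine sustaining 1 GFLOPS. The function names, the gain values and every sample number are hypothetical, and the first-order update is only a simplified stand-in for the PID feedback control modelled in the spreadsheet, which is not reproduced here.

```python
"""Illustrative sketch only; not taken from BOINC's credit.cpp."""

SECONDS_PER_DAY = 86_400.0
REFERENCE_FLOPS = 1e9      # assumption: reference machine sustains 1 GFLOPS
CREDITS_PER_DAY = 200.0    # assumption: that machine earns 200 credits per day


def cobblestones(flop_count: float) -> float:
    """Credit the cobblestone scale implies for a given operation count."""
    return flop_count * CREDITS_PER_DAY / (SECONDS_PER_DAY * REFERENCE_FLOPS)


def discrepancy_factor(flop_count: float, granted_credit: float) -> float:
    """How many times larger the cobblestone figure is than the credit granted."""
    return cobblestones(flop_count) / granted_credit


def track(samples, gain):
    """First-order update: move the estimate a fraction `gain` toward each sample.

    gain == 1.0 follows every sample exactly (no damping);
    0 < gain < 1 damps the response to sudden changes.
    """
    estimate = samples[0]
    history = []
    for sample in samples:
        estimate += gain * (sample - estimate)
        history.append(estimate)
    return history


if __name__ == "__main__":
    # Hypothetical task: 6.48e13 operations, granted 50 credits.
    print(discrepancy_factor(6.48e13, 50.0))
    # Hypothetical scale samples containing a single outlier.
    spike = [1.0, 1.0, 1.0, 4.0, 1.0, 1.0]
    print(max(track(spike, 1.0)), max(track(spike, 0.1)))
```

The inputs are invented to show the arithmetic, not measured from real MB or AP tasks. With `gain=1.0` the estimate jumps straight to the outlier, while a small gain lets only a fraction of it through, which is the kind of behaviour a damped scaling would be expected to show.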