Seti, you need to give Standard Credit to GPUs

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1463305 - Posted: 11 Jan 2014, 22:59:44 UTC - in response to Message 1463297.  

Did the analyses come up with a result and if so what was it?

Cheers.

I'd assume that 5.5 page document is the result, but I haven't seen it - I only got the executive summary. I'll leave it to the other two to report on their work, so it doesn't get lost in translation.

What I will say is that, in common with all complicated programming efforts, any proposed remedies will need testing in the field to confirm that we're on the right track (and to demonstrate to Dr. A that the work is sound). Testing credit awards to a live BOINC community is tricky, but we do have an 'in principle' offer from a neutral BOINC project (i.e. not one in the SETI group) to test the progress so far.
ID: 1463305 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1463320 - Posted: 11 Jan 2014, 23:28:06 UTC - in response to Message 1463305.  
Last modified: 11 Jan 2014, 23:28:49 UTC

Did the analyses come up with a result and if so what was it?
I only got the executive summary.

It's borked.
Grant
Darwin NT
ID: 1463320 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1463338 - Posted: 12 Jan 2014, 0:01:14 UTC - in response to Message 1463258.  
Last modified: 12 Jan 2014, 0:13:15 UTC

Ooh. Do you want to start from scratch ? or do you want the preliminary findings for independent review ? In the interests of openness, I'm good with supplying the 5.5 pages of analyses, if you want to start from that. They're engineering heavy, but well documented.


Upon a quick scan of credit.cpp, I see that previous results weigh the current results, which explains why one of the two identical crunchers I have is way behind the other, given that it had much slower GPUs in it for a while.

What is this 5.5 page analyses you speak of? I'm not sure I need to start from the beginning. I'm not totally bewildered by C/C++. I've downloaded the code for SAH but I've never really been able to muster up the motivation to start analyzing it. In this case, credit.cpp is all I need for the moment. I may look at it more closely in a little while.


The 5.5 pages refers to an email, sent to Eric and to Bernd from albert@home before Christmas, describing the four design issues identified so far. That was the set of key findings Richard mentioned earlier, produced by myself and another person walking through credit.cpp and the credit-related mechanisms with an engineering (control systems) approach. Trimming the email fluff will probably bring it to 3-4 pages. If you're interested, there's also a related spreadsheet that models possible fixes for the two main current issues, scaling and stability, by replacing the undamped averages with PID feedback control.

A less detailed, observational starting point might be simply to look at the definition of the cobblestone scale, apply it to some real MB & AP tasks, and note the discrepancies, by factors of about 3 and 1.5 respectively. Doing so might be a simpler conceptual introduction to the 2 key issues, depending on your familiarity with classical control systems theory.
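
As a rough illustration of applying the cobblestone scale to a task, here is a minimal sketch. It assumes the commonly quoted definition of 200 credits per day for a host sustaining 1 GFLOPS; the helper names `credit_for_flops` and `discrepancy_factor` are hypothetical and not taken from the BOINC source.

```cpp
#include <cassert>

// Assumed definition: a host sustaining 1 GFLOPS for one day earns 200 credits.
constexpr double kCreditsPerGflopsDay = 200.0;
constexpr double kSecondsPerDay = 86400.0;
constexpr double kCobblestoneScale = kCreditsPerGflopsDay / (kSecondsPerDay * 1e9);

// Credit implied by the scale for a task that performed `flops`
// floating-point operations. Hypothetical helper for illustration only.
double credit_for_flops(double flops) {
    assert(flops >= 0.0);
    return flops * kCobblestoneScale;
}

// Ratio between the credit implied by the scale and the credit actually granted.
double discrepancy_factor(double flops, double granted_credit) {
    assert(granted_credit > 0.0);
    return credit_for_flops(flops) / granted_credit;
}
```

Feeding in the operation count of a real MB or AP task and the credit it was actually granted gives the factor directly. The figures of about 3 and 1.5 come from the original analysis, not from this sketch.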
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1463338 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1463339 - Posted: 12 Jan 2014, 0:02:39 UTC - in response to Message 1463320.  

Did the analyses come up with a result and if so what was it?
I only got the executive summary.

It's borked.


Pretty much, lol. No bugs actually found :). 4 core (2 key) identified design issues.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1463339 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1463345 - Posted: 12 Jan 2014, 0:13:42 UTC

Seeing as the latest "down scaling" seems to have happened at the same time that the latest Nvidia GTX 7xx series cards started crunching here, do you think that this could be a factor Jason or just a coincidence?

Cheers.
ID: 1463345 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1463351 - Posted: 12 Jan 2014, 0:23:29 UTC - in response to Message 1463345.  
Last modified: 12 Jan 2014, 0:24:38 UTC

Seeing as the latest "down scaling" seems to have happened at the same time that the latest Nvidia GTX 7xx series cards started crunching here, do you think that this could be a factor Jason or just a coincidence?

Cheers.


Actually there have been 2 'major' downscaling events, the first at CreditNew's introduction that was more subtle (as the stock CPU apps didn't change) and the second traceable to introduction of AVX into the Stock multibeam CPU application at V7's introduction.

The logic, with the way estimates are performed now, actually prevents GPUs from dictating the overall scaling: the CPU side uses the Whetstone FPU benchmark for vectorised applications, which gives a radical underestimate of operation counts.
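
To make the underestimate concrete, here is a minimal sketch. It assumes the operation count is estimated as run time multiplied by a scalar benchmark rate, in the spirit of the elapsed_time * flops_estimate product in credit.cpp; the function names and the `vector_speedup` parameter are hypothetical.

```cpp
#include <cassert>

// Operation count as estimated from run time and a benchmark rate.
double estimated_ops(double elapsed_seconds, double benchmark_flops) {
    assert(elapsed_seconds >= 0.0 && benchmark_flops >= 0.0);
    return elapsed_seconds * benchmark_flops;
}

// Factor by which the estimate falls short when the application really
// sustains `vector_speedup` times the scalar benchmark rate.
// Hypothetical parameter, for illustration only.
double underestimate_factor(double elapsed_seconds, double benchmark_flops,
                            double vector_speedup) {
    assert(elapsed_seconds > 0.0 && benchmark_flops > 0.0);
    const double actual_ops = elapsed_seconds * benchmark_flops * vector_speedup;
    return actual_ops / estimated_ops(elapsed_seconds, benchmark_flops);
}
```

With these assumptions the shortfall equals the vector speedup itself, whatever the run time, so any scaling derived from the CPU estimate inherits that factor.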

Instabilities in the system can certainly make it look like there are causal links elsewhere, but the resulting behaviour appears to be chaotic, as opposed to deterministic.

That commonly happens when you put Scientists and Mathematicians in charge of developing control systems... things oscillate, ram against the safeties, then explode.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1463351 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1463352 - Posted: 12 Jan 2014, 0:26:00 UTC - in response to Message 1463351.  

Actually there have been 2 'major' downscaling events, the first at CreditNew's introduction ...

Remember, that was late May or early June 2010.
ID: 1463352 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1463356 - Posted: 12 Jan 2014, 0:38:12 UTC - in response to Message 1463338.  
Last modified: 12 Jan 2014, 0:47:06 UTC

The 5.5 pages refers to an email, sent to Eric and to Bernd from albert@home before Christmas, describing the four design issues identified so far. That was the set of key findings Richard mentioned earlier, produced by myself and another person walking through credit.cpp and the credit-related mechanisms with an engineering (control systems) approach. Trimming the email fluff will probably bring it to 3-4 pages. If you're interested, there's also a related spreadsheet that models possible fixes for the two main current issues, scaling and stability, by replacing the undamped averages with PID feedback control.

A less detailed, observational starting point might be simply to look at the definition of the cobblestone scale, apply it to some real MB & AP tasks, and note the discrepancies, by factors of about 3 and 1.5 respectively. Doing so might be a simpler conceptual introduction to the 2 key issues, depending on your familiarity with classical control systems theory.

That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found.
ID: 1463356 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1463358 - Posted: 12 Jan 2014, 0:49:08 UTC - in response to Message 1463356.  
Last modified: 12 Jan 2014, 0:54:37 UTC

That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found.


IIRC (it's been a while), that particular detail is on the first page of the unrelayed issue register spreadsheet, though the relevance of line numbers & filenames can be tenuous when dealing with design flaws, as opposed to outright bugs. That's mostly because design issues are somewhat abstract (more precisely, at a different level of abstraction): it's usually conceptual flaws that apply, rather than implementation code. In that regard, I don't know if Boinc's mechanisms as designed are formally documented using standard (software engineering) tools like UML, or perhaps more appropriately formal protocol specifications. [..but I've never seen any...]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1463358 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1463364 - Posted: 12 Jan 2014, 0:56:46 UTC - in response to Message 1463358.  

That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found.

IIRC (it's been a while), that particular detail is on the unrelayed issue register spreadsheet first page, though relevance of line numbers & filenames can be tenuous when dealing with design flaws, as opposed to bugs outright. That's mostly because when dealing with design issues, the issues are somewhat abstract (more precisely a different level of abstraction), it's usually conceptual flaws that apply rather than implementation code. In those regards, I don't know if Boinc's mechanisms as designed are formally documented using standard tools like UML etc, or perhaps more appropriately formal protocol specifications etc.

Fair enough. It's been a while on this side of the globe too, and I'm in that awkward gap between beer and bedtime.

But I'd assume "2. Scaling averages undamped", at least, has an identifiable location. Let's see if we can take it forward when the team is fully re-assembled after the holidays, next week.
ID: 1463364 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1463366 - Posted: 12 Jan 2014, 1:01:21 UTC - in response to Message 1463364.  
Last modified: 12 Jan 2014, 1:20:19 UTC

That email (dated 25 November) - which I helped to draft, and which carries my name as co-signatory - actually prints at under 4 pages. Hence the confusion - my apologies. I assumed you were referring to a more detailed report with, perhaps, function names and line numbers where the offending 'findings' were found.

IIRC (it's been a while), that particular detail is on the unrelayed issue register spreadsheet first page, though relevance of line numbers & filenames can be tenuous when dealing with design flaws, as opposed to bugs outright. That's mostly because when dealing with design issues, the issues are somewhat abstract (more precisely a different level of abstraction), it's usually conceptual flaws that apply rather than implementation code. In those regards, I don't know if Boinc's mechanisms as designed are formally documented using standard tools like UML etc, or perhaps more appropriately formal protocol specifications etc.

Fair enough. It's been a while on this side of the globe too, and I'm in that awkward gap between beer and bedtime.

But I'd assume "2. Scaling averages undamped", at least, has an identifiable location. Let's see if we can take it forward when the team is fully re-assembled after the holidays, next week.


Several 'locations', yes. Pretty much 80% of credit.cpp and some connected definitions elsewhere. [Searching the server code comments for the word 'misnomer' will bring up one key place.]

[Edit:] having a quick look, you want everything that feeds [into and out of] this:
//credit.cpp, line 580
double raw_pfc = (r.elapsed_time * r.flops_estimate);
with r.flops_estimate being the 'misnomer'. It's Boinc FPU Whetstone in disguise.

Two scalings get applied after that, neither is damped.
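
For readers unfamiliar with the PID feedback control proposed earlier in the thread as a replacement for undamped averages, here is a minimal textbook-style sketch. It is not the spreadsheet model and not BOINC code; the `Pid` struct, the `track_scale` helper and the gains used are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Minimal discrete PID controller (textbook form); gains are illustrative only.
struct Pid {
    double kp, ki, kd;
    double integral = 0.0;
    double prev_error = 0.0;

    Pid(double p, double i, double d) : kp(p), ki(i), kd(d) {}

    // Returns the correction to apply for one time step of length dt.
    double step(double target, double current, double dt) {
        assert(dt > 0.0);
        const double error = target - current;
        integral += error * dt;
        const double derivative = (error - prev_error) / dt;
        prev_error = error;
        return kp * error + ki * integral + kd * derivative;
    }
};

// Moves a scale factor toward each observed value through the controller
// instead of adopting the observation outright. Hypothetical helper.
double track_scale(Pid pid, double initial_scale,
                   const std::vector<double>& observed) {
    double scale = initial_scale;
    for (double target : observed) {
        scale += pid.step(target, scale, 1.0);
    }
    return scale;
}
```

With a proportional gain of 0.5 and the other gains at zero, a single outlier observation of 3.0 moves a scale of 1.0 only to 2.0, while a sustained change is still tracked within a handful of steps. Choosing real gains would need the kind of field testing discussed above.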
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1463366 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1463456 - Posted: 12 Jan 2014, 9:31:31 UTC

I agree totally with Guy - many "simple fixes" end up causing bigger headaches than the one they set out to fix. If you want examples, take a look at the news feeds around banking software problems: many were caused by a "simple fix" applied to cure a minor problem, which then brought some major part of a bank's software to its knees and resulted in thousands of p'd off customers.
While Richard has identified the problem he has not (to the best of my understanding) worked out a viable long term solution that Dr.A will accept to cure the imbalance that is the result of design decisions made by....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1463456 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1463465 - Posted: 12 Jan 2014, 10:09:43 UTC - in response to Message 1463456.  

While Richard has identified the problem he has not (to the best of my understanding) worked out a viable long term solution that Dr.A will accept to cure the imbalance that is the result of design decisions made by....

Not me. Like the rest of you, I just observed the symptoms: the other two tracked down the details. From that point, I'm just a messenger/rapporteur, though I do try to contribute to the overall planning of the long-term trajectory.

From what I've heard in my privileged seat rather closer to the heart of the action, there isn't actually much wrong with the original design concept. Where it's fallen down is in the detail of the implementation. That's where the distinction between scientist and engineer comes into play: I've seen it happen first-hand in business too, where the role of scientist is played by the entrepreneur. I've seen some very sharp business ideas, in danger of falling flat on their faces for lack of any attention to the details of implementation.

What we have to do now is to avoid making the same mistake ourselves. Now we have a better idea of where the current implementation fails, our next job is to act as midwife for a better implementation. I say midwife, because I don't think it necessarily involves writing the code ourselves: there are people of goodwill on that ProjectPeople list I posted yesterday who could probably write the code more quickly (and in better conformance with 'house style') - simply because of their familiarity with the areas of code concerned. (We're talking about the programs which run on BOINC servers and link to BOINC databases, which is a bit different from the application optimisations our volunteers here spend so much time on).

If we can convince those experienced BOINC developers to come on board and contribute to converting the engineering drawings into a functional prototype implementation (and I think we're 90% of the way there), and then take that prototype out for an instrumented test drive, the resulting charts and figures will be the best possible tool for convincing David Anderson, not that his design was wrong, but that it could work even better.
ID: 1463465 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1463483 - Posted: 12 Jan 2014, 10:48:26 UTC
Last modified: 12 Jan 2014, 10:51:51 UTC

In other words: the disease is known, and so is the medicine; now we need to make the patient accept the treatment, which is the most difficult part. For the good of the project (SETI and others), I really hope Richard, Jason and the other team players working on the problem succeed in their task and finally convince Dr. A & Co of the issues and get the code fixed. The CreditNew idea is fantastic, providing a way to balance credit across the Boinc universe; what is wrong is the implementation.
ID: 1463483 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1463485 - Posted: 12 Jan 2014, 10:53:23 UTC - in response to Message 1463483.  

Yes, and as Rob says, what is needed is a full, thorough, course of treatment, that goes to the root of the problem. If we just end up with a quick fix for SETI, I'll feel that we have failed the community at large.
ID: 1463485 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1463491 - Posted: 12 Jan 2014, 11:07:36 UTC

I agree, a long-term treatment for the whole of Boinc is the right thing to do, not just a SETI patch. That could possibly get CreditNew used by other projects, or at least met with less resistance.
ID: 1463491 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1463659 - Posted: 12 Jan 2014, 19:00:56 UTC

All I ask is: give me something I can measure in less than three months. As it stands, making adjustments is next to impossible.
ID: 1463659 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1463720 - Posted: 12 Jan 2014, 22:10:22 UTC

Go to Free-DC; it gives you a good view of how your host is performing.

http://stats.free-dc.org/stats.php?page=users&proj=sah&sort=yesterday
ID: 1463720 · Report as offensive
rcthardcore
Joined: 23 Nov 08
Posts: 48
Credit: 1,306,006
RAC: 0
United States
Message 1464885 - Posted: 16 Jan 2014, 5:14:10 UTC

Definitely using my EVGA GTX 780TI. Going to push it to its limits, whatever they may be.
ID: 1464885 · Report as offensive


 