Update on Linux 64 -Nividia-V8-MB ?????

William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1771530 - Posted: 14 Mar 2016, 10:35:41 UTC - in response to Message 1771529.  

Estimates don't matter.
As per the previous example: if the resource share is 50%, then BOINC will schedule tasks so that the project gets 50% of the available time [task availability permitting]. IIRC GPUs have a factor attached; it's been a long time since I walked that area.


Estimates matter, both at server-side issue and for client-side scheduling. In fact they are key to avoiding the boom/bust behaviour seen throughout; they are the main direct cause of oscillation in the mechanism.

It's purely time allocation. I don't have the time right now to shove the code under your nose.
We are talking about a BOINC v7 client with the revamped work-fetch and scheduling mechanism.


With estimates all over the shop, anything after that is nonsensical... as even with fixed timeslots, you aren't requesting the right amount of work.

that affects cache, not the crunch time allocated to a project.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1771530
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771531 - Posted: 14 Mar 2016, 10:37:49 UTC - in response to Message 1771530.  

that affects cache, not the crunch time allocated to a project.


Not the crunch time - the server's and client's estimates of your crunch time. Don't confuse reality with a crappy simulation (much as it might seem appropriate sometimes).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771531
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1771545 - Posted: 14 Mar 2016, 11:44:58 UTC - in response to Message 1770924.  

Guys, if you've quite finished, I think all this goes back to...

Guys, I don't know what you are talking about... there's no work for my Quadro 1700 on Ubuntu 14.04 LTS x64:
http://setiathome.berkeley.edu/results.php?hostid=7784676
:/

All that works is SETI@home Beta, which is throttled to 10%... while SETI@home is at 100%!
;)

We're interpreting those percentages as a resource share question, yes? They can't be adding up to 110%.

Resource share, in the long term and without external constraint (such as no application being available on the main project yet, hence no tasks), is maintained by client work-fetch decisions - overworked/underworked projects fetch fewer/more workunits respectively.
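
To put rough numbers on it (assuming the 100%/10% mentioned above are actually resource shares of 100 and 10): in the long run, and with both projects able to supply work, BOINC would aim to give SETI@home 100/(100+10) ≈ 91% of the weighted processing time and SETI@home Beta 10/110 ≈ 9%.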

The white paper on this subject - ClientSchedOctTen - talks in terms of REC:

The recent estimated credit REC(P) of a project P is maintained by the client...

The estimated credit for a T-second segment of job execution is given by

T * ninstances(P) * peak_flops(P)

For typical hardware, a project with GPU applications will accrue REC far faster than a project with CPU applications only (because of the higher peak_flops). That would (and does) depress the work fetch priority greatly, skewing the time allocation between projects. Projects with CPU applications only divide their CPU time (in the long-term average) in accordance with Resource Share: projects with GPU applications tend to do no CPU work at all, because the REC contribution from even a short period of GPU crunching swamps the CPU projects.
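
As a rough illustration of that skew, here's a minimal sketch of the white paper's per-segment accrual (not the BOINC client source; the hardware figures are hypothetical, purely to show the ratio):

#include <cstdio>

// White paper accrual for a T-second segment: T * ninstances(P) * peak_flops(P).
// The peak_flops values below are made up for illustration only.
double segment_credit(double T, int ninstances, double peak_flops) {
    return T * ninstances * peak_flops;
}

int main() {
    const double T = 3600.0;                      // one hour of crunching
    double cpu_only = segment_credit(T, 4, 5e9);  // 4 cores at ~5 GFLOPS benchmark speed
    double gpu      = segment_credit(T, 1, 1e12); // 1 GPU at a 1 TFLOPS "marketing" peak
    printf("CPU-only: %.3g  GPU: %.3g  ratio: %.0fx\n",
           cpu_only, gpu, gpu / cpu_only);        // ~50x with these figures
    return 0;
}

IIRC the client also decays REC over time with a half-life, but it's the raw accrual ratio above that produces the skew.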

That's from the white paper. I suspect the actual implementation (which would require reference back to the notes from the code-walking which took place three years ago) differs in at least two respects:

1) The white paper talks in terms of a single 'scheduling priority': in practice, there are two priority values maintained - one to decide which project to request work from next, and the other to decide which cached task to run next.

2) The white paper talks in terms of 'peak_flops' only. Unless there's some push back from the server in terms of APR, we know that the client only knows 'benchmark' speeds for CPUs (which underestimate throughput), and 'marketing' speeds for GPUs (which overestimate throughput).

That would tend to suggest that the client would form an even more skewed assessment of the Resource Share achieved in practice between CPU and GPU projects.
ID: 1771545
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771547 - Posted: 14 Mar 2016, 11:53:36 UTC - in response to Message 1771545.  

Thanks Richard. Yes, chaos is a harsh mistress. Here is a video on the subject that Julia Saori tweeted me: https://youtu.be/fUsePzlOmxw
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771547
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1771578 - Posted: 14 Mar 2016, 15:28:39 UTC

Peak flops are from benchmarks, IIRC, yes.

As I said, if you have both GPU and CPU it gets messy.
But still, all the numbers for scheduling are exclusively client-generated, with no input from the server.

And I suppose he meant he assigned resource shares of 10 and 100 respectively.
However, if a project doesn't have the right kind of app for a system, it's quite irrelevant how big the share is... That would only kick in if at some point an app became available.

If you want to see what's going on under the hood, you need to enable work_fetch_debug and cpu_sched_debug.
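
For anyone who hasn't used them before, those flags go in the <log_flags> section of cc_config.xml in the BOINC data directory (then restart the client, or use 'Read config files' in the Manager) - a minimal example:

<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>

The extra detail then shows up in the event log / stdoutdae.txt.
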
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1771578
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771702 - Posted: 15 Mar 2016, 7:18:29 UTC - in response to Message 1771578.  
Last modified: 15 Mar 2016, 8:05:52 UTC

...all the numbers for scheduling are exclusively client generated, with no input from server.


Exactly - the client-side projected_flops, which is the key estimate component of CreditNew, a closed-loop control system that estimates how long tasks should take. It is read server-side in estimate_duration(), and the result is then scaled by the available fraction.

[Edit2:] Correction: projected_flops is updated server side as data accumulates
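
For those following along, the gist of that server-side step is roughly the following (a paraphrase from memory, not the actual scheduler source; names and the exact availability factors may differ):

// Rough paraphrase of the server-side duration estimate.
// wu_fpops_est:      the task's estimated floating-point operation count
// projected_flops:   CreditNew's running speed estimate for this host + app version
// availability_frac: combination of the host's reported on/active fractions
double estimate_duration(double wu_fpops_est, double projected_flops,
                         double availability_frac) {
    double unscaled = wu_fpops_est / projected_flops; // raw elapsed-time estimate
    return unscaled / availability_frac;              // stretched by host availability
}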

[Edit:] My issues with the mechanism don't directly concern resource share, or what is done with the resulting throughput and elapsed estimate(s), but rather the noticeable, classic control-systems-engineering instabilities in those estimates that propagate through the mechanism.

Before host+app convergence: Slow convergence
Small change after host+app convergence: overshoot and/or ringing, or nothing, then sudden jolt.
---> sensitivity to initial conditions
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771702
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1771705 - Posted: 15 Mar 2016, 8:07:30 UTC - in response to Message 1771702.  

All of which is fascinating, in a slow-motion-car-crash sort of way, but doesn't really address KLiK's question about not getting Linux 64 -Nividia-V8-MB tasks (because of the app not being ready yet), or how many he will run when it is ready.
ID: 1771705
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771706 - Posted: 15 Mar 2016, 8:11:56 UTC - in response to Message 1771705.  
Last modified: 15 Mar 2016, 8:15:54 UTC

True. Reposting my prior posts, which I believe address that:
[Edit: made link clickable]
If you're looking for or expecting stock deployment on main, there are some problems to solve first, as outlined in the previous post.


A probing Cuda build went to beta last week, and we determined that more Cuda versions are needed to cover older Linux distribution kernels, with Cuda 4.2 looking like the weapon of choice to cover the majority on that platform.

Minimum requirements for the Cuda6 build at http://jgopt.org/download.html are marginally better understood, though stock release requires completion there (to get the scheduler issuing the right things to the right hosts).

Not much since then due to work commitments, but at least we have a path to follow. Will be juggling some more and feeding Eric further builds of mine and TBar's come the weekend (which applies to both Linux and Mac OS X).


Since then, some slow progress, though more beta packages are available when Eric's ready for them (I will upload them to my site later).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771706
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1771715 - Posted: 15 Mar 2016, 9:44:02 UTC

Yes, we know all that in its glory - estimates, credit, APR, the whole rigmarole.
That's not my point.

The question was 'how does resource share affect how BOINC distributes available resources across projects?'

And that, in steady-state operation, is a simple percentage time allocation, with CPU and GPU time weighted (factored) according to peak_flops from the CPU benchmark and the manufacturer's peak_flops for the GPU.
That's how it's done, and the whole messy, inadequate CreditNew loop doesn't come into it at that point. The system has its very own shortcomings, but credit and estimates are not part of it, and you got that wrong in your initial answer to KLiK.

Estimates do indeed come into play when rr_sim finds that high-priority/EDF scheduling is needed - but only for scheduling priority.
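
For the curious, a minimal sketch of that steady-state balancing act (a paraphrase, not the client source - the normalisation details are from memory): the project that has had the least peak_flops-weighted time relative to its share is the one served next.

#include <vector>

struct Project {
    double resource_share;   // user-set share
    double rec;              // recent estimated credit (peak_flops-weighted time)
    double sched_priority;
};

// Projects that are "owed" time relative to their share end up with the
// higher (less negative) priority, and get run / fetched from next.
void compute_priorities(std::vector<Project>& projects) {
    double share_sum = 0, rec_sum = 0;
    for (auto& p : projects) { share_sum += p.resource_share; rec_sum += p.rec; }
    for (auto& p : projects) {
        double share_frac = p.resource_share / share_sum;
        double rec_frac   = (rec_sum > 0) ? p.rec / rec_sum : 0;
        p.sched_priority  = -rec_frac / share_frac;
    }
}

int main() {
    // Hypothetical: shares 100 and 10, with recent work split 90% / 10%.
    std::vector<Project> ps = {{100, 9.0, 0}, {10, 1.0, 0}};
    compute_priorities(ps);
    // ps[0] ends up at -0.99, ps[1] at -1.1, so the share-100 project is served next.
}
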
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1771715
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1771717 - Posted: 15 Mar 2016, 10:05:48 UTC - in response to Message 1771715.  

At some point I'd like to look into the code to find out whether Resource Share - which is a function of REC - is thereby a function of peak flops (as the white paper says), or of some guesstimated fraction of peak flops, or of something derived from the real world via APR.

But that requires a codewalk, and is properly for another thread.
ID: 1771717
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771718 - Posted: 15 Mar 2016, 10:06:01 UTC - in response to Message 1771715.  

Estimates do indeed come into play when rr_sim finds that high-priority/EDF scheduling is needed - but only for scheduling priority.


And also to inhibit additional work fetch, which is a problem if the project+app on board has unconverged overinflated estimates (likely on a new host+app).

It's possible to give a stuck anonymous-platform project+app a kick with a realistic <flops> entry, but given that KLiK (I imagine) is running beta as stock, he'd likely have to just keep processing beta work and hope the times converge; then resource share and work fetch should return to normal.

(assuming he installed the Linux app under anon platform, which I don't think he had from what I could tell, and posted about stock delays accordingly)
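
For reference, this is the sort of thing meant by a <flops> entry - a fragment of an anonymous-platform app_info.xml (the names and value are purely illustrative; set it near the card's realistic sustained speed rather than its marketing peak):

<app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <plan_class>cuda60</plan_class>
    <flops>1.0e11</flops>   <!-- illustrative: ~100 GFLOPS sustained -->
    ...
</app_version>
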
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771718
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1771719 - Posted: 15 Mar 2016, 10:07:33 UTC - in response to Message 1771718.  

KLiK (I imagine) is running beta as stock

He was when I checked yesterday.
ID: 1771719
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771720 - Posted: 15 Mar 2016, 10:16:20 UTC - in response to Message 1771717.  

At some point I'd like to look into the code to find out whether Resource Share - which is a function of REC - is thereby a function of peak flops (as the white paper says), or of some guesstimated fraction of peak flops, or of something derived from the real world via APR.

But that requires a codewalk, and is properly for another thread.


Will have to scan the most recent changes at leisure (since I resynced my Git clone earlier from a six-month-old clone), but if unchanged since then it should still be avp->flops, which (from memory) was initialised to a fraction of peak flops, or read from an app_info <flops> entry. That'll need checking though, since avp->flops is accessed or manipulated in about 19 of the .cpp files in the client, including rr_sim and work fetch.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771720
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1771722 - Posted: 15 Mar 2016, 10:43:40 UTC - in response to Message 1771720.  

At some point I'd like to look into the code to find out whether Resource Share - which is a function of REC - is thereby a function of peak flops (as the white paper says), or of some guesstimated fraction of peak flops, or of something derived from the real world via APR.

But that requires a codewalk, and is properly for another thread.


Will have to scan the most recent changes at leisure (since I resynced my Git clone earlier from a six-month-old clone), but if unchanged since then it should still be avp->flops, which (from memory) was initialised to a fraction of peak flops, or read from an app_info <flops> entry. That'll need checking though, since avp->flops is accessed or manipulated in about 19 of the .cpp files in the client, including rr_sim and work fetch.


IIRC, REC draws off the manufacturer's GPU peak_flops and the benchmark CPU peak_flops.

We are talking about global flops for scheduling, not app flops.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1771722
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771724 - Posted: 15 Mar 2016, 11:10:59 UTC - in response to Message 1771722.  
Last modified: 15 Mar 2016, 12:05:48 UTC

It would seem odd to me to use an imaginary figure when a real one is on hand, but whatever.

I've traced the REC updates, and they point to a new mystical number whose origin I haven't found yet, called 'relative_speed'.

[Edit:] A bit later: it looks like relative_speed is initialised to
coprocs.coprocs[i].count * 0.2 * coprocs.coprocs[i].peak_flops / cpu_flops
giving a unitless ratio of estimated GPU speed to host single-CPU-core speed (likely Whetstone).
Later, for REC accounting, it is turned into a total peak estimate by multiplying by host_pfpops (Whetstone).

So the same as CreditNew's initial [new project app] estimate, only 4x faster. Probably won't matter for work fetch itself on a single project, as it should over-request due to optimism, though I can easily see those over-requests clogging up project switching if fulfilled. Typically, single-instance GPU app efficiency is in the realm of 4-10% (10% being the compute efficiency of the extremely highly optimised CUFFT library functions at the sizes we use, for example).

I do not know why one would want to use a 5% efficiency estimate on one hand but a 20% efficiency estimate on the other. That's not in the comments.
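
To put illustrative numbers on it (hypothetical hardware, not figures from the code): a GPU with a 1 TFLOPS marketing peak on a host whose CPU cores Whetstone at 5 GFLOPS gives relative_speed = 1 * 0.2 * 1000/5 = 40, i.e. REC accounting treats the card as worth 40 CPU cores, or 200 GFLOPS of assumed real throughput (the 20% figure). A realistic ~5% single-instance efficiency would be nearer 50 GFLOPS - hence 'only 4x faster' above.
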
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771724
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1771728 - Posted: 15 Mar 2016, 12:31:28 UTC
Last modified: 15 Mar 2016, 12:32:19 UTC

Device peak flops: 9638.
APR: 930-1000 (varying).
What is the real performance?
To overcome Heisenberg's:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1771728
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1771729 - Posted: 15 Mar 2016, 12:36:32 UTC - in response to Message 1771728.  
Last modified: 15 Mar 2016, 12:41:25 UTC

Device peak flops: 9638.
APR: 930-1000 (varying).
What is the real performance?


For your 'special' application, ~10% of theoretical peak is about right
(so about 1 TFLOPS - the averaging used is pretty volatile, though).
With the other improvements you mentioned to me, plus some of my own special sauce, I believe we will approach/pass 20% in the future. That's with some new algorithms, though.
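
(As a quick sanity check on those figures: 930/9638 ≈ 9.6% and 1000/9638 ≈ 10.4%, so the APR range quoted above sits right around that 10%-of-peak mark.)
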
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1771729
fractal
Volunteer tester
Joined: 5 Mar 16
Posts: 5
Credit: 1,000,547
RAC: 0
Message 1771829 - Posted: 16 Mar 2016, 5:12:49 UTC - in response to Message 1766610.  

Hi!

Works for me.

Thanks for the config. It works for me as well on a pair of 750 Tis using driver 355.11 and Ubuntu 12.04.

I found that a single work unit per card kept GPU utilization around 80%. Two units per card kept it around 90%, and three units per card kept the cards at 98-99% without increasing run times by 3x over a single unit.

One gotcha was the need to "chmod +x setiathome_x41zi_x86_64-pc-linux-gnu_cuda60" after "p7zip -d setiathome_x41zi_x86_64-pc-linux-gnu_cuda60.7z" so BOINC would run it.
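
For anyone wanting to reproduce the multiple-tasks-per-card setup, one common way (assuming a reasonably recent client, and that the app name below matches what your client_state.xml shows - both are assumptions here) is an app_config.xml in the project directory, for example:

<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>   <!-- three tasks per GPU -->
      <cpu_usage>0.04</cpu_usage>   <!-- CPU reserved per task -->
    </gpu_versions>
  </app>
</app_config>

With an anonymous-platform app_info.xml, the equivalent is the <count> value inside the <coproc> element of the relevant <app_version>.
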
ID: 1771829
KLiK
Volunteer tester
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1771840 - Posted: 16 Mar 2016, 6:46:35 UTC - in response to Message 1771705.  

All of which is fascinating, in a slow-motion-car-crash sort of way, but doesn't really address KLiK's question about not getting Linux 64 -Nividia-V8-MB tasks (because of the app not being ready yet), or how many he will run when it is ready.

Well, we (those of us also running Linux) have to run the betas to develop an app that works for v8... ;)

But I don't know why BOINC didn't pick up any v7 AP work?! :/


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1771840
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1772094 - Posted: 17 Mar 2016, 9:28:15 UTC - in response to Message 1771840.  

All of which is fascinating, in a slow-motion-car-crash sort of way, but doesn't really address KLiK's question about not getting Linux 64 -Nividia-V8-MB tasks (because of the app not being ready yet), or how many he will run when it is ready.

Well, we (those of us also running Linux) have to run the betas to develop an app that works for v8... ;)

But I don't know why BOINC didn't pick up any v7 AP work?! :/

Host ID?
If there is a suitable app, have you checked in your preferences that you allow AP?
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1772094