Posts by jason_gee


1) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380584)
Posted 4 days ago by Profile jason_gee
Could that explain why these hosts are more affected by the video lag from the VLARs?


Possibly. Make sure that when you compare again they run with the same apps, same settings, and same drivers. *Sometimes* there are also ways to reassign interrupts if they are shared across slots, but I think only on the highest-end mobos with extra slots (so you can move cards to other slots). If something like that turned out to be the case (a genuine hardware limit), then it would justify hardware changes.

With DPCs (deferred procedure calls) it can be any device [driver] causing peaks: chipset, even a USB device or controller, sound card, or, more likely, network. As these things are software now, one poor driver can ruin the lot (and a device that works doesn't necessarily have a 'good' driver).

On my machine that had DPC issues, I just disable everything I don't use in Device Manager, and then those devices and their slow drivers are out of the picture.
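DPC Latency Checker measures this at the driver level. As a much cruder, user-mode proxy, the sketch below (hypothetical function name, not part of any tool mentioned here) requests many short sleeps and records how late each wake-up is. Running it before and after disabling devices gives a rough comparison: a worst case far above the median hints at latency spikes. The baseline depends on the OS timer resolution, so only compare numbers from the same machine.

```python
import time


def sleep_overshoot_ms(request_ms: float = 1.0, samples: int = 100) -> list[float]:
    """Request many short sleeps and return how late each wake-up was, in ms."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(request_ms / 1000.0)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # Clamp tiny negative values caused by timer granularity to zero.
        overshoots.append(max(0.0, elapsed_ms - request_ms))
    return overshoots


if __name__ == "__main__":
    late = sorted(sleep_overshoot_ms())
    print(f"median {late[len(late) // 2]:.3f} ms, worst {late[-1]:.3f} ms")
```

This only sees the combined effect of scheduling, timers and interrupt/DPC load on one thread; it cannot tell you which driver is responsible, which is what the disable-and-isolate approach above is for.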
2) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380565)
Posted 4 days ago by Profile jason_gee
Fixed those links, worth a look, Juan.

I haven't Black Viper'd a rig in a long time.
Just don't have the time or ambition to lately.
When things were more CPU based, I'd spend a lot of time doing it.
Good to know he is still around.

The advice there was always sound.
Yes, it IS possible to kill a lot of unnecessary overhead in Windows...LOL.


With your 'different rig, same hardware/setup, different behaviour' situation, Mark, the DPC latency one will be worth a play with.
3) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380558)
Posted 4 days ago by Profile jason_gee
Fixed those links, worth a look, Juan.
4) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380554)
Posted 4 days ago by Profile jason_gee
Personally, I think that because of the huge variety of hardware combinations and software types available and in use these days, both BOINC and SETI (and most other projects connected to BOINC) can no longer be classed as an easy "set & forget" proposition for just anyone; they should be classed as "for above-average users" these days, as demonstrated by the problems of the last couple of years.

Much more information must now be considered (no, has to be considered) on both BOINC's and its projects' pages, so that new users can be much better informed about using the software and the problems they could face, with more links to where to get help (the current pages really are useless for most beginners now). They should also stipulate that YMMV, as demonstrated by this subject alone, and provide links users can use to check that their rigs are doing the right thing.

Cheers.


I agree that as gpGPU takes more and more hold, the learning curve needs streamlining. With 20-20 hindsight, it's easy to see that solving the as-yet relatively unaddressed issue of rogue GPU hosts is likely to require a three-pronged approach: end-user education, new tools for diagnostics & monitoring, and application-side integrity checks.
5) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380551)
Posted 4 days ago by Profile jason_gee
Did anyone see VLARs crunch in a time which was anywhere near* the original estimate?

* For near I will accept a figure that is lower than twice initial estimate.

And a quick note: since we stopped doing VLARs, the APR for the Opt app on my computer has gone up from ~180 to ~200 (the CUDA50 figure is ~170).



I did, but I modified my own Boinc to converge estimates locally very quickly, without overshoot or ringing, so I probably don't count (nor do any of the authorised alpha testers of that who are still using it).
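The post doesn't describe how that modified estimator works. As a generic illustration of "converging without overshoot or ringing", here is a first-order (exponential smoothing) update, a hypothetical sketch and not the actual Boinc change: each completed task moves the duration estimate a fixed fraction `alpha` of the way toward the observed time, so with `0 < alpha <= 1` a constant run time is approached monotonically and never overshot. The starting estimate of 20 min is an illustrative assumption; the ~44 min figure is the per-VLAR time reported elsewhere in this thread.

```python
def update_estimate(estimate: float, observed: float, alpha: float = 0.5) -> float:
    """Move the duration estimate a fraction `alpha` of the way to `observed`."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return estimate + alpha * (observed - estimate)


def converge(initial: float, observations: list[float], alpha: float = 0.5) -> list[float]:
    """Return the estimate after each observed task duration."""
    history = []
    estimate = initial
    for observed in observations:
        estimate = update_estimate(estimate, observed, alpha)
        history.append(estimate)
    return history


if __name__ == "__main__":
    # Illustrative: initial estimate 20 min, actual VLAR tasks ~44 min each.
    for minutes in converge(20.0, [44.0] * 6):
        print(f"{minutes:.2f} min")
```

A larger `alpha` converges faster but passes more task-to-task noise into the estimate; a second-order filter can converge faster still, but then needs damping to avoid the overshoot and ringing mentioned above.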
6) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380550)
Posted 5 days ago by Profile jason_gee
...I wish I knew how to "brutally tweak" my hosts, but there is only one Jason, which is why we all depend on your knowledge.

For general system operation, these two sites I mentioned will be a good start:
Black Viper
DPC Latency Checker

I really believe that sooner or later you will find out why the problem appears, fix it, and share that info with us "mere mortals" as usual.


Haha, thanks. The problem is actually well understood, and the difference between 6.08 (unusable at all for VLAR) and 6.09 (operational at VLAR, though not fantastic...) demonstrates that nVidia was trying to solve it. One of their recommendations is not to use Cuda on an active display :-D... well, it's us who push that boundary, choosing to use GeForce for gpGPU instead of Tesla compute clusters with no displays.

It's the 'fixing' part that is the tough one, and the sharing of knowledge there would have a limited nerdy audience ;)
7) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380497)
Posted 5 days ago by Profile jason_gee
Please, my friend, in plain English... (you know, even plain English is difficult for me to understand, I'm sorry)... the grey elephant is here... my beer stock is totally drained... I know it's all his fault. All I could understand was: change the MB/CPU to a better one... is that what you mean? I'm almost ready to order a few new X79 MBs to replace my old P8B75-MLE MBs, I just need a little push from my master guru. A few new 690s & 670s are already on the way here...



LoL, no, I wasn't suggesting a better mobo (mine is old). But don't let that stop you spending money if it makes you feel good :D.

Probably the main picture to draw from my post is that my system (OS and drivers) is 'brutally tweaked' compared to average systems, with careful diagnostics etc, so runs well (supremely stable) and fast despite no OC at the moment.

I normally OC it, but wanted to keep power usage down on the CPU lately, so it's stock 3.0 GHz & still doing everything.
8) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380465)
Posted 5 days ago by Profile jason_gee
Jason,

Winterknight has a 670 like me and sees the same problems (the 670 is almost the same generation as my 690 too). Could it be something related to our GPUs, since it appears the video lag isn't happening on the 680/660 hosts, as reported by you and Cliff?


To answer as many of the earlier questions as I can: it's an old 45 nm Core 2 Duo (Wolfdale) CPU (2 cores), on a G33 Express chipset (so PCIe 1.1, I believe).

Some differences may include:
- The 680, I think, has a larger L2 cache than the 670, iirc 512KiB vs 384KiB, which for the size of the datasets & the current pulsefinding code might have a profound impact. That doesn't explain the 690s, though, which are 2 x 680s with the larger cache. [To that end: no, I haven't noticed VLAR + non-VLAR interference, though cache is a likely suspect and the effect has been replicated elsewhere.]
- Being my dev system, I Black Viper the snot out of it. [Fluid development tends to require fairly slick system response so as not to interrupt trains of thought, including fast reboots if needed for configuration changes or updates etc.; RAID-10 helps there.]
- Measuring with DPC Latency Checker when I migrated to Win7, I found severe driver quality issues in the chipset and WiFi card drivers, using the recommended isolation strategy. Forcing the Intel chipset driver updates was challenging, requiring careful reading of the Intel chipset INF update utility readme (the 'after OS installation' section), but that fixed those. The WiFi now uses modified drivers from laptopvideo2go, which brought the system into a usable range. I also avoid Firefox now and use Chrome instead, and get no DPC latency spikes at all (browsing or watching video while crunching).

Every system has its own character, for example the i5 I run in the lounge exhibited no DPC latency spikes out of the box, and didn't require any special convincing to run the latest chipset drivers.

HTH
Jason
9) Message boards : Number crunching : V7 VLARs & Kepler GPUs? (Message 1380251)
Posted 5 days ago by Profile jason_gee
ATI cards never had issues with VLARs.



OK, anyone with NVidia cards that didn't have any problems, then. I'm trying to have an honest conversation here.


Single GTX 680, on a 45 nm Core 2 Duo, aggressive priority and pulsefinding settings: 2 VLARs in 1h28m (~44 min each), with no particular display or keyboard lag of the kind others have described.

I would like opt-in too (for the science value and the big work supply).
10) Message boards : Number crunching : Linux: v7 CUDA? (Message 1380233)
Posted 5 days ago by Profile jason_gee
64 bit Cuda is of course non-ideal unless you require massive amounts of VRAM (which we don't)

We never did have a stable Linux 32 build for the MB CUDA application, did we? As I recall, only a 64-bit version was available. I'm presuming there were issues that couldn't be resolved for 32-bit at that time.

That was most likely a function of available dev resources at the time. Trying to build 32-bit with wide compatibility on x86_64, without a fully functional autotools setup and multiple compilers/libraries available, is awkward at best. So I've gone the simpler route of installing an older 32-bit distro. The only problems I've encountered there so far are boincapi- and equipment-failure-related. AFAICT the app code builds perfectly. Once the logistical issues are sorted out, I expect it'll be no more challenging than the x64 build was.
11) Message boards : Number crunching : Linux: v7 CUDA? (Message 1380080)
Posted 6 days ago by Profile jason_gee

Discussed in 'Porting s@h V7 to Linux':
http://setiathome.berkeley.edu/forum_thread.php?id=71818


And unfortunately as far as I'm concerned, real work is intruding on my efforts. My direct port works well as a standalone, and produces results as close to the CPU executable as you could expect. But it produces access violations when run from BOINC. JG has had more success with his port but last I heard wasn't fully satisfied with it. We'll get there someday -- I keep threatening to retire to get more time for this sort of thing but there are small problems like the lack of enough pension fund...


My status is that I have an app that sort of works well, for 64-bit later kernels only (lots of dependencies etc.), but issues with later Boinc versions are showing up. 64-bit Cuda is of course non-ideal unless you require massive amounts of VRAM (which we don't).

In an ongoing effort I've managed to get an older Fedora 13 distro up and running (listed for Cuda 3.2 toolkit), though am experiencing issues with the flaky GTX260 I have in that machine (freezing X), and naturally a different set of issues building against older boincapi ;)

I expect to change the card in that machine once the test live run with x86_64, Cuda 5 on Ubuntu 12.04 LTS is completed (currently NNT is set). Despite the flakiness of the card, it looks like once proper error handling & threadsafe exits were enabled like on Windows, reliability of reported results has been great.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=6439256
At the time of this post: errors stopped on the 7th, when I enabled error handling & threadsafe exits. Plenty of valids, and a couple of inconclusives that are probably this card's fault.

It would help out if by the time I get things sorted for an x86 build with less dependencies, interested parties were signed up at Crunchers Anonymous, ready for some controlled tests (alpha, debug & beta).

Jason
12) Message boards : Number crunching : GTX 6xx / 7xx series overview v1.0 (Message 1379912)
Posted 6 days ago by Profile jason_gee
...
GTX 770
1536 CUDA Cores
1046 Base Clock (MHz)
1085 Boost Clock (MHz)
7.0 Gbps Memory Speed
230 W Maximum Graphics Card Power
...


Here the specs are very telling. This is a tweaked/improved GTX 680. They made the memory controller better. The higher max power figure is because of a better cooler and 'GPU Boost 2'. This means you can OC the card more if you want, with Boost 2 allowing voltage unlock as well as a higher power target. Clock for clock at reference voltage, it will use the same or less power and run cooler... it's just possible to push this card higher than a reference 680 without modifying the hardware.
BTW, the GTX7xx series will get a SAHv7 cuda55 app? ;-)
Of course, but this will make the most difference on the 780 and TITAN, after I manage to get a 780, as these have some special instructions that do *exactly* the same things as some of my optimised code, with single hardware instructions, without memory transfer costs.
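To put the GTX 770's "7.0 Gbps Memory Speed" into bandwidth terms you need the memory bus width, which the spec list doesn't give. Assuming a 256-bit bus for both the GTX 770 and the reference GTX 680, and roughly 6 Gbps memory on the reference 680 (both assumptions, not from the quoted specs), the peak figures work out as below. These are theoretical peaks; achieved bandwidth is always lower.

```python
def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin data rate (Gbit/s) x bus width (bits) / 8."""
    return data_rate_gbps * bus_width_bits / 8


BUS_WIDTH_BITS = 256  # assumed: both GK104 cards use a 256-bit memory bus

gtx770 = peak_bandwidth_gb_s(7.0, BUS_WIDTH_BITS)  # 7.0 Gbps from the spec list
gtx680 = peak_bandwidth_gb_s(6.0, BUS_WIDTH_BITS)  # assumed ~6 Gbps reference memory

if __name__ == "__main__":
    print(f"GTX 770: {gtx770:.0f} GB/s, GTX 680: {gtx680:.0f} GB/s, "
          f"ratio {gtx770 / gtx680:.2f}")
```

On these assumptions the 770 has about a sixth more peak memory bandwidth than a reference 680, which fits the "better memory controller" reading of the specs.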
13) Message boards : Number crunching : VLAR's to Kepler cards. (Message 1379604)
Posted 7 days ago by Profile jason_gee
...[edit]It would be interesting to know why the memory controller load drops by 50% while running VLARs rather than increasing; maybe this is where the problem lies?[/edit]


In a nutshell, that's exactly where 'the problem' lies ( even though it's really a set of related problems). Those are symptoms rather than causes though.

Since the whole basis of Eric's & Alex Kan's refinements to pulsefinding orients around managing these difficult patterns on CPU, 'the problem' on CPU is hidden by algorithm-level optimisation (Eric K), combined with microarchitectural ones (AK & now Joe Segur). [Edit: in turn a refinement of Staelin's work looking for pulsars around 1968, so some ~45 years' worth of work]

Difficult access patterns initiate 'bank conflicts' that induce cache thrashing, which triggers processing (front-end) stalls (goto 1). Averaged over human monitoring timescales, the GPU would then look like it's running cooler (doing less work), underutilised in some aspects and looking maxed out when it isn't (it's 'stalled' for some portion of the time, which is still utilisation of a form, in that the resource can't be used for something else while stalled).
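A simplified model helps show why some access patterns are 'difficult'. The sketch below (hypothetical names; it models shared-memory banks, not the actual pulsefinding kernels) assumes 32 banks of 4-byte words, as in Kepler's default 32-bit bank mode. Thread `t` of a 32-thread warp reads word `t * stride`, and the result is how many distinct words land on the busiest bank, i.e. how many serialised passes the access needs. Threads reading the very same word are served by a broadcast, so they don't count as a conflict.

```python
NUM_BANKS = 32  # assumed: 32 shared-memory banks, 4-byte words (Kepler 32-bit bank mode)


def conflict_degree(stride_words: int, threads: int = 32) -> int:
    """Serialisation factor when thread t of a warp reads 32-bit word t * stride_words.

    Only *distinct* words mapping to the same bank count against it
    (same-word reads are broadcast). 1 means conflict-free.
    """
    words_per_bank: dict[int, set[int]] = {}
    for t in range(threads):
        word = t * stride_words
        words_per_bank.setdefault(word % NUM_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


if __name__ == "__main__":
    for stride in (1, 2, 4, 8, 16, 32, 33):
        print(f"stride {stride:2d}: {conflict_degree(stride)}-way")
```

Power-of-two strides pile onto fewer and fewer banks (stride 32 is fully serialised), while padding the row length to an odd stride such as 33 restores conflict-free access; that padding trick is one common way of turning an unfriendly pattern into a friendly one.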

All *someone* needs to do is 'invent' a way to map pulsefinding onto GPUs with friendly access patterns. I have my own ideas on that, though mature they are not, especially while basic v7 release functionality takes precedence over optimisation (algorithmic & microarchitectural).
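For readers unfamiliar with the underlying operation: pulse searches of the Staelin lineage are built on folding a time series at a trial period, summing every `period`-th sample into one of `period` phase bins (Staelin's fast folding algorithm speeds this up across many trial periods). The naive sketch below is illustrative only, with hypothetical names, and is not the SETI@home pulsefinding code.

```python
def fold(samples: list[float], period: int) -> list[float]:
    """Naive fold: sum samples into `period` phase bins, using whole periods only.

    The trailing partial period is dropped.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    n_periods = len(samples) // period
    profile = [0.0] * period
    for row in range(n_periods):        # one pass per period ("row")
        base = row * period
        for phase in range(period):     # consecutive samples -> consecutive bins
            profile[phase] += samples[base + phase]
    return profile


if __name__ == "__main__":
    # A pulse every 4 samples: folding at the right period stacks it into one bin,
    # while a wrong trial period smears it across the profile.
    data = [0.0, 0.0, 1.0, 0.0] * 5 + [0.0, 0.0]
    print(fold(data, 4))
    print(fold(data, 3))
```

The access pattern matters on a GPU: if each thread owns one phase bin and all threads step through the rows together, adjacent threads read adjacent samples on every step (a friendly, coalesced pattern), whereas giving each thread its own row would make adjacent threads read addresses `period` apart.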
14) Message boards : Number crunching : v7 cuda23 WUs getting ERR_TOO_MANY_EXITS (Message 1379575)
Posted 7 days ago by Profile jason_gee
Apropos, have you ever compared the LZMA size of a UPX file, compared to the LZMA of the raw uncompressed file? Eric's trick might work in other distribution situations, too.


I use various forms of executable packing for assembly and embedded development. It reduces load time and transfer/storage costs significantly. Unfortunately, from time to time, poor antivirus heuristics designed as catch-all mechanisms routinely flag this practice with false positives, along with a number of other useful assembly language techniques. So in general, for developments I've done in the past, the larger the distribution, the more the added support cost of such false positives tends to outweigh the immediate benefits.
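The comparison asked about in the quote is easy to try on any file. In the sketch below, zlib stands in for an executable packer such as UPX (an assumption: UPX itself uses different compressors and a loader stub), and Python's `lzma` module provides the LZMA/xz compression. The function name and sample data are hypothetical.

```python
import lzma
import zlib


def compressed_sizes(raw: bytes) -> dict[str, int]:
    """Compare LZMA of the raw bytes with LZMA of the already-packed bytes."""
    packed = zlib.compress(raw, 9)  # stand-in for an executable packer
    return {
        "raw": len(raw),
        "lzma(raw)": len(lzma.compress(raw)),
        "packed": len(packed),
        "lzma(packed)": len(lzma.compress(packed)),
    }


if __name__ == "__main__":
    sample = b"".join(f"record {i:06d} status=OK\n".encode() for i in range(20000))
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>13}: {size} bytes")
```

Already-packed data usually has little redundancy left for LZMA to find, so compressing the raw file with the stronger compressor typically wins for distribution; the exact numbers depend entirely on the input.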
15) Message boards : Number crunching : v7 cuda23 WUs getting ERR_TOO_MANY_EXITS (Message 1379569)
Posted 7 days ago by Profile jason_gee
...If v7 cuda23 tasks are being sent out to other stock users, aren't they all going to be failing like this on an ongoing basis (unless all those users just happen to be preemptively following this thread :))? Why not just hold those tasks back until an application-based solution can be applied?
...


There are only two small problems with hoping for an application-based solution, namely that the cuda libraries (all versions) don't support delay loading of the DLLs, and that the libraries themselves are interdependent (cufft depends on cudart), so there's no overriding the names to work around the issue. Ideally cuda 2.2 & 2.3 shouldn't co-reside, and Boinc getting confused by existing different-sized ones in the project directory is an added complication.

One possibility for a manual workaround would be to take the 'correct' (2.3) copies of cudart.dll and cufft.dll and put them in WINDOWS\SYSTEM32. These should be found first (by Windows), and Boinc can do whatever it wants with links & copies etc., but it will probably still complain about mismatches in the project folder and kill the app before it starts (under various stock V7 + prior 2.2-2.3 stock/opt V6 scenarios). So while it might solve any complaints the app might have, Boinc is trying to be a bit too clever during the switchover.
16) Message boards : Number crunching : What has your RAC done......since..... (Message 1379296)
Posted 7 days ago by Profile jason_gee
Looking at the comments, and the credits being granted, I can't help feeling that someone has forgotten to include a chunk of the calculations in the credit awarded equation. I suggest this because, like others, I'm seeing tasks taking "twice" as long, but only getting "half" the credits. Surely if a task takes twice as long to execute on a given processor it should get double the credit, not half.

That's exactly what we all want, but with CreditNew it seems to be an impossible target.


I was just asleep and woke up with a start, having had an idea that might just work... Calculate David Anderson's pay cheque with CreditNew....

Back to sleep
17) Message boards : Number crunching : Lunatics Windows Installer v0.41 Release Notes (Message 1379246)
Posted 8 days ago by Profile jason_gee

301.42 and 301.24 both support cuda 4.2 out of the gate Mark, just in case you didn't know.

I was only going by the information given in the installer.


As per the readme (*), in some cases there may be beta, day 0, or day 1 drivers supporting the cuda revision described. In most cases, unless there is some specific reason, I specifically avoid an implied recommendation of boxed or beta drivers, and state the developer driver shipped with the Cuda toolkit instead. Early 300.x and 301.10 drivers are problematic, 301.24 being the first Kepler beta that didn't have major issues (though it has some subtle ones underneath).
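The driver thresholds above can be encoded as a quick check. This is a hypothetical helper, not part of the Lunatics installer; treating *every* version below 301.24 as not recommended for Kepler is an extrapolation from the "early 300.x / 301.10" remark, and the check says nothing about which Cuda revision a driver supports.

```python
FIRST_GOOD_KEPLER = (301, 24)  # first Kepler beta without major issues, per the post


def parse_version(text: str) -> tuple[int, int]:
    """Turn an nVidia-style 'major.minor' string such as '301.42' into (301, 42)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def driver_note(version: str) -> str:
    """Classify a driver version against the 301.24 threshold (assumed cut-off)."""
    if parse_version(version) < FIRST_GOOD_KEPLER:
        return "not recommended for Kepler (early 300.x / 301.10 era or older)"
    return "ok for Kepler (subtle issues still possible)"


if __name__ == "__main__":
    for v in ("301.10", "301.24", "301.42"):
        print(v, "->", driver_note(v))
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where, for example, "301.9" would sort after "301.24".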
18) Message boards : Number crunching : VLAR's to Kepler cards. (Message 1379085)
Posted 8 days ago by Profile jason_gee
Thanks Cliff. In basic principles I agree, though I am sensitive to look for 'something new', and do expect to be able to 'use' my machine while crunching. Something that in days gone by, from time to time, was not feasible.

For context, during beta tests I was watching 'The Hobbit' in HD while running 2 x VLARs on a 680. From the experiences I hear about, this wouldn't be possible with specific configurations. Exactly what those conditions are is a mystery to me for now, but at the same time I fully know how much new work is in V7 (not just VLAR) and have a pretty firm idea of the limitations of the OS, drivers and Boinc infrastructure, as well as the Boinc client and multibeam demands.

What it comes down to is that under V6 with Cuda x41g user expectation was raised somewhat over the initial GPU offerings. I hope to be able to push through the new challenges & improve on that over & above extra demands presented with V7, and forthcoming GBT reobservation processing.

Jason
19) Message boards : Number crunching : VLAR's to Kepler cards. (Message 1379037)
Posted 8 days ago by Profile jason_gee
Would it possibly depend what foreground application people were using as the 'subjective lag' monitor? I used to use solitaire in Windows XP, but I can't stand the Windows 7 version...

A bit of light Chroming might be different from a full multi-tab Firefox with hardware acceleration turned on.


Certainly the subjective part can account for some of it, possibly even a large proportion, on top of innate prejudice against any tasks marked vlar.

More profound indicators of something else going on though, are descriptions of 'typing waiting for characters to appear'. To me those are well beyond subjective feel into fundamental hardware and/or driver differences.
20) Message boards : Number crunching : VLAR's to Kepler cards. (Message 1379011)
Posted 8 days ago by Profile jason_gee
... Each task takes approx. 1 hr. to complete when using Lunatics 0.41 / cuda50. I see no reason whatsoever to have the server quit sending VLARs to GPUs. GIVE ME MORE!!! ...


This more closely matches my own tests on a 680, as well as tests on beta prior to release. It's certainly one for the head-scratching department to work out in the long run: why there's a clear discrepancy in user experience between remarkably similarly equipped dual GTX 670 machines. How much is subjective and how much is some hidden real capability difference remains to be explored (PCIe slot shared interrupts and DPC latency come to mind, for example).

In any case, with initial V7 under our belts (warts & all ;) ), that allows a post-analysis (post-mortem?) of sorts to occur. It wouldn't be any fun if there weren't any mysteries left to solve (or any code left to optimise).




Copyright © 2013 University of California