Eric J Korpela
Message 45152 - Posted: 12 Mar 2013, 22:48:10 UTC

SETI@home 7.00 for graphics cards has been released. Most of the changes are small apart from there now being a version for CUDA 5.0.

Right now the version for ATI/AMD cards is only for BOINC 7. BOINC 6 support may follow, if it can be done at all. AMD is also forcing us to give up on Windows XP by dropping OpenCL support in Windows XP. The Windows version detection isn't working, so those of you with XP and new enough cards may get work that fails.

CPU versions should be out in the next day or so.

I expect to see all the usual work assignment problems (too much/too little/bad deadlines) that come with new version releases.
Claggy
Message 45154 - Posted: 13 Mar 2013, 2:35:03 UTC - in response to Message 45152.

Right now the version for ATI/AMD cards is only for BOINC 7. BOINC 6 support may follow, if it can be done at all. AMD is also forcing us to give up on Windows XP by dropping OpenCL support in Windows XP. The Windows version detection isn't working, so those of you with XP and new enough cards may get work that fails.

I hope you set a maximum Catalyst version of Cat 12.8 (that is, APP runtime 938.2) for the ATI/AMD version. The APP runtime included in Cat 12.10 compiles a kernel file that then produces inconclusives where incorrect pulses are found, and later Catalyst versions either produce a kernel file that causes driver restarts or fail to do the compilation at all.

Claggy
Mike
Message 45156 - Posted: 13 Mar 2013, 8:49:07 UTC

If 7.00 is the HD5 version, as Raistmer mentioned, only 12.10 is a problem.
No driver restarts or faulty results with Cat 13.1.
Wedge009 already confirmed it on main with his hosts.

Raistmer
Message 45157 - Posted: 13 Mar 2013, 14:20:32 UTC - in response to Message 45152.

I expect to see all the usual work assignment problems (too much/too little/bad deadlines) that come with new version releases.

Maybe it's worth a day of brainstorming with David and everyone on this? It's getting really nasty to hit this on each and every app version update. It makes a version update a much more painful (and much more significant) event than it should be.
Raistmer
Message 45158 - Posted: 13 Mar 2013, 14:21:42 UTC - in response to Message 45156.

If 7.00 is the HD5 version, as Raistmer mentioned, only 12.10 is a problem.
No driver restarts or faulty results with Cat 13.1.
Wedge009 already confirmed it on main with his hosts.

Yes, it should be the r1779 HD5 build I posted recently.
Also, it looks like only VLARs are computed incorrectly, and GPU apps receive no VLARs now...
Claggy
Message 45159 - Posted: 13 Mar 2013, 14:59:10 UTC - in response to Message 45158.

If 7.00 is the HD5 version, as Raistmer mentioned, only 12.10 is a problem.
No driver restarts or faulty results with Cat 13.1.
Wedge009 already confirmed it on main with his hosts.

Yes, it should be the r1779 HD5 build I posted recently.
Also, it looks like only VLARs are computed incorrectly, and GPU apps receive no VLARs now...

No, it's not just the VLAR WUs; refquick_v7.wu also computes incorrectly:

MB7_win_x86_SSE_OpenCL_ATi_HD5_r1779.exe -verb -nog / refquick_v7.wu :
AppName: MB7_win_x86_SSE_OpenCL_ATi_HD5_r1779.exe
AppArgs: -verb -nog
Started at  : 21:36:56.066
Ended at    : 21:37:17.236
21.138 secs Elapsed
9.266 secs CPU time
Speedup     : 85.17%
Ratio       : 6.74x

R2: .\ref\ref-setiathome_6.98_windows_intelx86.exe-refquick_v7.wu.res
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
Spike              0      7      7      7      0        0      7      7      7      0
Autocorr           0      5      5      5      0        0      5      5      5      0
Gaussian           0      6      6      6      0        0      6      6      6      0
Pulse              0      4      4      4      0        0      4      4      4      1
Triplet            0      5      5      5      0        0      5      5      5      0
Best Spike         0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      0      1      1      1      0        0      1      1      1      0
Best Pulse         0      1      1      1      0        0      1      1      1      0
Best Triplet       0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0     32     32     32      0        0     32     32     32      1

Unmatched signal(s) in R2 at line(s) 235
For R1:R2 matched signals only, Q= 99.89%
Result      : Weakly similar.

Then there are the hundreds of inconclusives I've had from wingmen who did their compilations under Cat 12.10 and found the wrong pulses, for example:

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4424313

Claggy
Eric J Korpela
Message 45162 - Posted: 13 Mar 2013, 16:26:38 UTC - in response to Message 45154.

I hope you set a maximum Catalyst version of Cat 12.8 (that is, APP runtime 938.2) for the ATI/AMD version. The APP runtime included in Cat 12.10 compiles a kernel file that then produces inconclusives where incorrect pulses are found, and later Catalyst versions either produce a kernel file that causes driver restarts or fail to do the compilation at all.

Unfortunately there's no easy way to exclude a range of version numbers. I suppose I could split it into two plan classes, one for 13+ and one for 12.0 to 12.8.
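The two-plan-class workaround amounts to covering two inclusive Catalyst version ranges and leaving a gap where 12.9/12.10 fall. Below is a minimal Python sketch of that idea, not the scheduler's actual plan-class code; the plan-class names and the (min, max) table are hypothetical, and it assumes each plan class can only express a single minimum and/or maximum version.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. Catalyst 12.8 -> (12, 8)

# Hypothetical plan classes: 12.0..12.8 inclusive, and 13.0 with no upper bound.
PLAN_CLASSES: Dict[str, Tuple[Version, Optional[Version]]] = {
    "opencl_ati_cat12": ((12, 0), (12, 8)),
    "opencl_ati_cat13": ((13, 0), None),
}

def parse_catalyst(s: str) -> Version:
    """Turn '12.10' into (12, 10) so versions compare numerically."""
    major, minor = s.split(".")[:2]
    return int(major), int(minor)

def matching_plan_class(catalyst: str) -> Optional[str]:
    """Return the first plan class whose range contains the version, else None."""
    v = parse_catalyst(catalyst)
    for name, (lo, hi) in PLAN_CLASSES.items():
        if v >= lo and (hi is None or v <= hi):
            return name
    return None  # e.g. 12.10 is in neither range, so it gets no OpenCL work
```

Comparing integer tuples matters here: a plain string comparison would sort "12.10" before "12.8" and let the bad driver through the lower range.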

Claggy
Message 45163 - Posted: 13 Mar 2013, 18:00:03 UTC - in response to Message 45162.

I hope you set a maximum Catalyst version of Cat 12.8 (that is, APP runtime 938.2) for the ATI/AMD version. The APP runtime included in Cat 12.10 compiles a kernel file that then produces inconclusives where incorrect pulses are found, and later Catalyst versions either produce a kernel file that causes driver restarts or fail to do the compilation at all.

Unfortunately there's no easy way to exclude a range of version numbers. I suppose I could split it into two plan classes, one for 13+ and one for 12.0 to 12.8.

That's what Einstein did for the Nvidia 295.xx/296.xx Sleeping Monitor Bug.

Claggy
TRuEQ & TuVaLu
Message 45164 - Posted: 13 Mar 2013, 18:23:21 UTC

No WUs?
Claggy
Message 45166 - Posted: 13 Mar 2013, 20:53:00 UTC - in response to Message 45164.

I got some of my existing CPU WUs resent, first as CPU 6.98 WUs, then as Cuda32 and Cuda42 7.00 WUs. For some reason the scheduler wouldn't resend WUs for the Cuda5 7.00 app or the OpenCL AMD/ATI 7.00 app.

Claggy
Eric J Korpela
Message 45168 - Posted: 14 Mar 2013, 0:18:58 UTC - in response to Message 45166.

Yep, the version assignment logic is still pretty messed up. In theory it shouldn't be possible for a single machine to get the cuda22, cuda32, and cuda50 apps, but I just saw it happen.

cuda22 shouldn't go to anything with a driver above 190.37, cuda32 shouldn't go to anything with a driver below 263.06, and cuda50 needs a driver of at least 304.48.

It has to have something to do with app_version caching in the scheduler.

Mystery solved. It was an older version of BOINC that doesn't report driver version numbers. Not a whole lot I can do about that.
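The driver gates above, and why a client that reports no driver version could end up with all three apps, can be modelled in a few lines. This is a minimal Python sketch, not the BOINC scheduler's logic: the thresholds come from the post, while the `skip_unknown` flag is a hypothetical stand-in for "only apply a gate when the client actually reported a driver version".

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int]  # NVIDIA driver "310.70" -> (310, 70)

def parse_driver(s: str) -> Version:
    major, minor = s.split(".")
    return int(major), int(minor)

# Inclusive (min, max) driver bounds per plan class, as stated in the post.
GATES: Dict[str, Tuple[Optional[Version], Optional[Version]]] = {
    "cuda22": (None, parse_driver("190.37")),
    "cuda32": (parse_driver("263.06"), None),
    "cuda50": (parse_driver("304.48"), None),
}

def eligible_plan_classes(driver: Optional[str], skip_unknown: bool = True) -> List[str]:
    """List the plan classes a host qualifies for.

    driver=None models an older client that doesn't report a driver version.
    With skip_unknown=True (hypothetical behaviour) no gate can reject it.
    """
    if driver is None:
        return list(GATES) if skip_unknown else []
    v = parse_driver(driver)
    return [
        name
        for name, (lo, hi) in GATES.items()
        if (lo is None or v >= lo) and (hi is None or v <= hi)
    ]
```

With a reported driver the three ranges barely overlap (310.70 qualifies only for cuda32 and cuda50), but an unreported version passes every gate, which matches the "single machine got cuda22, cuda32 and cuda50" symptom.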

Richard Haselgrove
Message 45174 - Posted: 14 Mar 2013, 10:08:57 UTC - in response to Message 45168.

<edit>
Mystery solved. It was an older version of BOINC that doesn't report driver version numbers. Not a whole lot I can do about that.
</edit>

We probably ought to work out when driver version number reporting was introduced, so that we can advise people not to run BOINC clients earlier than that (with GPUs).

And, in case it ever becomes relevant in the future: so far as I can tell, *no* Linux version of BOINC is currently reporting NVIDIA driver version numbers. Whatever __cuDriverGetVersion in http://boinc.berkeley.edu/trac/browser/boinc/client/gpu_nvidia.cpp is supposed to be doing, hosts like http://www.gpugrid.net/show_host_detail.php?hostid=136850 aren't showing a driver version from BOINC v7.0.53, even though that project does show ATI driver versions for Linux clients.
Gary Charpentier
Message 45179 - Posted: 14 Mar 2013, 13:34:49 UTC - in response to Message 45174.

<edit>
Mystery solved. It was an older version of BOINC that doesn't report driver version numbers. Not a whole lot I can do about that.
</edit>

We probably ought to work out when driver version number reporting was introduced, so that we can advise people not to run BOINC clients earlier than that (with GPUs).

Yes, a nice message on the Notices tab telling people to update, if their client version supports that.

William
Message 45181 - Posted: 14 Mar 2013, 16:45:16 UTC - in response to Message 45168.

Yep, the version assignment logic is still pretty messed up. In theory it shouldn't be possible for a single machine to get the cuda22, cuda32, and cuda50 apps, but I just saw it happen.

cuda22 shouldn't go to anything with a driver above 190.37, cuda32 shouldn't go to anything with a driver below 263.06, and cuda50 needs a driver of at least 304.48.

It has to have something to do with app_version caching in the scheduler.

<edit>
Mystery solved. It was an older version of BOINC that doesn't report driver version numbers. Not a whole lot I can do about that.
</edit>

I'm sorry to report that my host with a GT 9800 with 512 MiB RAM and driver 310.70 picked up a Cuda22 task.

Actually per preferences it shouldn't have gotten a GPU task at all, but when attaching a new host the initial fetch is 1 sec for all devices, regardless of preferences - those only reach the host with the scheduler reply that has the initial work... sort of shoot first - ask questions later.
That unwanted workfetch also meant downloading the big .exe and .dll - at least it was a cuda22 - those higher cuda dlls are horribly big.

Oh BTW, I found when packaging that those .dlls compress really well with 7z. Any chance they could be transferred compressed and unpacked locally? That would save quite a bit of bandwidth.
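The "transfer compressed, unpack locally" idea can be sketched as a round trip with a checksum check on the unpacked file. This is a minimal Python illustration under stated assumptions: it uses the standard-library `lzma` module (the .xz container, same LZMA family as 7z, not the 7z format itself), and the `pack`/`unpack_and_verify` helpers are hypothetical, not anything BOINC provides.

```python
import hashlib
import lzma

def pack(data: bytes) -> bytes:
    """Server side (hypothetical): keep an LZMA-compressed copy of the DLL."""
    return lzma.compress(data, preset=9)

def unpack_and_verify(blob: bytes, expected_md5: str) -> bytes:
    """Client side (hypothetical): decompress after download, then check the
    MD5 of the *uncompressed* bytes before the file is used."""
    data = lzma.decompress(blob)
    if hashlib.md5(data).hexdigest() != expected_md5:
        raise ValueError("checksum mismatch after decompression")
    return data
```

Verifying against the uncompressed checksum keeps the integrity check independent of how the file was shipped.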

And for reference, the initial estimates are 18h52:51 for CPU and 1h43:04 for GPU. At least BOINC 6.12.x still has DCF. Better not lift David's carpet...
Raistmer
Message 45182 - Posted: 14 Mar 2013, 17:03:31 UTC

Shouldn't BOINC skip re-downloading the cuFFT DLLs if they're already on the host alongside an older app executable?
Eric J Korpela
Message 45183 - Posted: 14 Mar 2013, 17:15:23 UTC - in response to Message 45182.

It shouldn't re-download, but it's possible that the timestamp on the file has been changed by the update_versions script which might trigger a new download.
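To illustrate why a touched timestamp shouldn't matter, here is a minimal Python sketch of a content-based reuse check: a local copy is kept if its size and MD5 match what the server advertises, regardless of modification times. It is a hypothetical illustration, not the BOINC client's actual download logic; `FileSpec` and `needs_download` are invented names.

```python
import hashlib
import os
from dataclasses import dataclass

@dataclass
class FileSpec:
    """Hypothetical description of an app file as advertised by the server."""
    name: str
    nbytes: int
    md5: str

def md5_of(path: str) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_download(spec: FileSpec, project_dir: str) -> bool:
    """Re-fetch only if the file is missing or its size/MD5 differ.
    File timestamps are deliberately ignored."""
    path = os.path.join(project_dir, spec.name)
    if not os.path.isfile(path):
        return True
    if os.path.getsize(path) != spec.nbytes:
        return True
    return md5_of(path) != spec.md5
```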
Richard Haselgrove
Message 45184 - Posted: 14 Mar 2013, 17:15:54 UTC - in response to Message 45182.

Shouldn't BOINC skip re-downloading the cuFFT DLLs if they're already on the host alongside an older app executable?

Thinking ahead to the transfer to main - that's a lot of (stock) users, none of whom will have anything later than cuda30 (6.10 Fermi). That's cuda32, cuda42, cuda50 - all the big ones. And as we know, the server has a tendency to hand out more than one app while it's feeling its way towards the fastest app for the host.

BOINC doesn't recycle DLLs downloaded for another project - that would be a big one for David to tackle.
Raistmer
Message 45185 - Posted: 14 Mar 2013, 17:40:31 UTC - in response to Message 45184.

BOINC doesn't recycle DLLs downloaded for another project - that would be a big one for David to tackle.

Perhaps we should copy all BOINC-related issues to the BOINC dev/alpha lists, because David is unlikely to look here.
Claggy
Message 45186 - Posted: 14 Mar 2013, 18:03:39 UTC

Still can't get any fresh GPU work for either my Nvidia or my AMD GPUs. Yesterday I managed to get a whole lot resent as Cuda32 and Cuda42 work; today when I tried it, the server timed the remainder out.
The reply from the server to a work request doesn't say why work isn't being sent:

14/03/2013 17:57:55 | SETI@home Beta Test | [sched_op] Starting scheduler request
14/03/2013 17:57:55 | SETI@home Beta Test | Sending scheduler request: Requested by user.
14/03/2013 17:57:55 | SETI@home Beta Test | Requesting new tasks for NVIDIA and ATI
14/03/2013 17:57:55 | SETI@home Beta Test | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
14/03/2013 17:57:55 | SETI@home Beta Test | [sched_op] NVIDIA work request: 40053.90 seconds; 0.00 devices
14/03/2013 17:57:55 | SETI@home Beta Test | [sched_op] ATI work request: 87264.00 seconds; 1.00 devices
14/03/2013 17:57:58 | SETI@home Beta Test | Scheduler request completed: got 0 new tasks
14/03/2013 17:57:58 | SETI@home Beta Test | [sched_op] Server version 701
14/03/2013 17:57:58 | SETI@home Beta Test | Project requested delay of 7 seconds
14/03/2013 17:57:58 | SETI@home Beta Test | [sched_op] Deferring communication for 7 sec
14/03/2013 17:57:58 | SETI@home Beta Test | [sched_op] Reason: requested by project

Application details for host 45274

Claggy
Eric J Korpela
Message 45187 - Posted: 14 Mar 2013, 18:14:26 UTC - in response to Message 45186.

[sched_op] NVIDIA work request: 40053.90 seconds; 0.00 devices

The 0.00 devices would probably have something to do with not getting any work.
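Requests like this can be spotted in an event log mechanically: the client asks for seconds of work on a resource while reporting 0.00 devices for it. Below is a minimal Python sketch that scans `[sched_op]` lines in the format quoted above. Reading the "devices" figure as the number of idle instances the client wants work for is an assumption, and `suspicious_requests` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches e.g. "[sched_op] NVIDIA work request: 40053.90 seconds; 0.00 devices"
WORK_REQ = re.compile(
    r"\[sched_op\] (?P<res>\w+) work request: "
    r"(?P<secs>[\d.]+) seconds; (?P<inst>[\d.]+) devices"
)

def suspicious_requests(log: str) -> List[Tuple[str, float]]:
    """Return (resource, seconds) for requests that ask for work time
    but report zero devices for that resource."""
    out = []
    for m in WORK_REQ.finditer(log):
        secs, inst = float(m["secs"]), float(m["inst"])
        if secs > 0 and inst == 0:
            out.append((m["res"], secs))
    return out
```

Run over the log above, only the NVIDIA line is flagged: the CPU line requests no time, and the ATI line reports 1.00 devices.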
