Average Processing Rate Not Being Calculated?

Message boards : Number crunching : Average Processing Rate Not Being Calculated?

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 1,255
Australia
Message 1273258 - Posted: 20 Aug 2012, 11:39:41 UTC
Last modified: 20 Aug 2012, 11:40:51 UTC

Thanks for the advice. I've given BOINC 7.0.33 a try (after increasing the time limits) - as you suggested, existing GPU time estimates became ridiculously short, but freshly received work-units seem to be okay. I haven't been able to check on this much, though, with the current network issues that seem to be plaguing the servers (more so than usual).

For the host which had no average processing rate for MB/ATI, the estimate is now a more reasonable 1.5 hours, compared with 12-13 hours previously. Unfortunately, the other host already has a pre-existing (too low) average processing rate, so the new BOINC doesn't help there. But I've put up with it this long; I can put up with it a bit longer.

I haven't been able to follow everything that's been said in the other threads Claggy pointed to. Is there a summary of progress, ideas or development on the average-processing-rate outlier issue? Is it being actively worked on? Thanks for any information you can share.
Soli Deo Gloria
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 8
United Kingdom
Message 1272908 - Posted: 19 Aug 2012, 12:20:59 UTC - in response to Message 1272903.  

It's not so much that r390 is broken, as that there is no stock ATI MB app, so there's no Peak Flop Count for a stock ATI MB app. I believe anonymous platform would use that figure, but not add to it.

You could try BOINC 7.0.33; it has a revised GPU FLOPS estimation for anonymous-platform hosts:

client: when estimating FLOPS for an anonymous-platform app version for which no estimate has been supplied by user, use (CPU speed)*(cpu_usage + 10*gpu_usage) (--> add the 10*)


But since it increases the FLOPS estimate tenfold, existing GPU WUs will be on the verge of erroring out with Maximum Time Exceeded. You'd want to either complete all your GPU WUs before upgrading, or get them resent as soon as you've installed BOINC 7.0.33.
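A minimal Python sketch of the changelog arithmetic and of why it threatens tasks already on the host. It assumes that "CPU speed" is the host's benchmarked FLOPS figure, that the pre-7.0.33 formula was the same expression without the 10*, and that the client aborts a task with Maximum Time Exceeded once its elapsed time passes rsc_fpops_bound / flops. The function names and all numbers are hypothetical, chosen only to line up with the run-times mentioned in this thread.

```python
def anon_platform_flops(cpu_flops: float, cpu_usage: float, gpu_usage: float,
                        gpu_factor: float = 10.0) -> float:
    """FLOPS estimate for an anonymous-platform app version with no
    user-supplied flops: (CPU speed) * (cpu_usage + gpu_factor * gpu_usage).
    gpu_factor=10 follows the 7.0.33 changelog; gpu_factor=1 is the
    assumed older formula."""
    return cpu_flops * (cpu_usage + gpu_factor * gpu_usage)


def max_elapsed_seconds(rsc_fpops_bound: float, flops: float) -> float:
    """Assumed time limit: a task running longer than this is aborted."""
    return rsc_fpops_bound / flops


# Hypothetical host and work unit.
CPU_FLOPS = 3e9            # benchmarked CPU speed, FLOPS
CPU_USAGE, GPU_USAGE = 0.05, 1.0
FPOPS_BOUND = 4.5e13       # rsc_fpops_bound of a task already on the host

old = anon_platform_flops(CPU_FLOPS, CPU_USAGE, GPU_USAGE, gpu_factor=1.0)
new = anon_platform_flops(CPU_FLOPS, CPU_USAGE, GPU_USAGE)

for label, flops in (("before 7.0.33", old), ("7.0.33", new)):
    limit = max_elapsed_seconds(FPOPS_BOUND, flops)
    vlar_ok = 28 * 60 < limit  # ~28-minute VLAR task from this thread
    print(f"{label}: {flops:.3g} FLOPS, limit {limit / 60:.1f} min, "
          f"VLAR survives: {vlar_ok}")
```

With these made-up figures the limit on an existing task falls from roughly 238 minutes to roughly 25, so a 28-minute VLAR task would be aborted while a 12-minute one would still finish. The jump is about 9.6x rather than exactly 10x because cpu_usage is non-zero.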

Claggy
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 32170
Credit: 79,922,639
RAC: 181
Germany
Message 1272907 - Posted: 19 Aug 2012, 12:20:48 UTC

Adding a flops value to app_info.xml will help a little, but check your DCF (duration correction factor) first so you don't get -177 errors.
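Mike's warning can be made concrete with a rough check, sketched in Python under an assumed model: the displayed estimate is rsc_fpops_est / flops scaled by the DCF, while the abort limit is rsc_fpops_bound / flops with no DCF involved. Under that model, whether a task hits -177 depends only on the flops value and the real runtime; the DCF only changes what the client shows. The helper name, the 2x safety margin and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlopsCheck:
    raw_estimate_s: float    # rsc_fpops_est / flops
    shown_estimate_s: float  # raw estimate scaled by the current DCF
    time_limit_s: float      # rsc_fpops_bound / flops (no DCF in this model)
    safe: bool               # observed runtime fits under the limit with margin


def check_flops(flops: float, rsc_fpops_est: float, rsc_fpops_bound: float,
                dcf: float, observed_runtime_s: float,
                margin: float = 2.0) -> FlopsCheck:
    """Hypothetical pre-flight check for a candidate flops value."""
    raw = rsc_fpops_est / flops
    limit = rsc_fpops_bound / flops
    return FlopsCheck(raw, raw * dcf, limit, limit >= margin * observed_runtime_s)


# Hypothetical work unit and host state: a 12-minute task, DCF sitting at 0.2.
EST, BOUND, DCF, RUNTIME = 4.5e12, 4.5e13, 0.2, 720.0

print(check_flops(EST / RUNTIME, EST, BOUND, DCF, RUNTIME))  # matched to real runtime
print(check_flops(5e10, EST, BOUND, DCF, RUNTIME))           # set far too high
```

In the first case the task is safe, but the client would still show a 144-second estimate until the DCF drifts back toward 1. In the second, the 900-second limit leaves less than the 2x margin over the 720-second runtime, so a slightly slower task would hit -177.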

With each crime and every kindness we birth our future.
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 1,255
Australia
Message 1272903 - Posted: 19 Aug 2012, 11:59:03 UTC
Last modified: 19 Aug 2012, 12:06:08 UTC

That's interesting. If it's related to video cards, it might be worth noting that I have used different (slower) ATI cards in those hosts in the past. Thanks for the information.

I don't suppose there are any ideas for a solution or workaround, aside from manually setting flops (which I understand is not recommended)?

Edit: Your remark about MB/ATI r390 having broken outlier detection must also apply to r426, then.

Also, I just checked: other applications (CPU, AP/ATI, etc.) seem to be working okay, although they also occasionally skip a task in their counts, perhaps treating it as an 'outlier'.
Soli Deo Gloria
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 8
United Kingdom
Message 1272900 - Posted: 19 Aug 2012, 11:47:55 UTC - in response to Message 1272898.  
Last modified: 19 Aug 2012, 11:54:27 UTC

Probably because of this:

http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1916&postid=43585

OK, I think I see the problem... There is a sanity check on the claimed credit. If the claimed credit is too high, the result is also counted as an outlier. But that's a problem: since you never get an updated Peak Flop Count because all the credit claims are too high, your credit claims will never go down.

The gears in the back of my head are now working on finding a solution.
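The loop described in the quote above can be reproduced with a toy model. This is not BOINC's server code: it simply assumes that the claimed credit tracks the host's stored Peak Flop Count estimate, that the "expected" credit tracks the true value, and that any claim above a fixed multiple of the expected value is discarded as an outlier. The function name, the 3x threshold and the averaging weight are all hypothetical.

```python
def simulate(stored_pfc: float, true_pfc: float, n_results: int,
             max_ratio: float = 3.0, alpha: float = 0.1) -> tuple[float, int]:
    """Toy outlier feedback loop.

    Claimed credit is taken to be proportional to stored_pfc and the
    expected credit to true_pfc. A claim more than max_ratio times the
    expected value is treated as an outlier and does not update stored_pfc.
    Returns (final stored_pfc, number of results that updated it).
    """
    updates = 0
    for _ in range(n_results):
        claimed, expected = stored_pfc, true_pfc
        if claimed > max_ratio * expected:
            continue  # outlier: sample discarded, so the estimate is never corrected
        stored_pfc += alpha * (true_pfc - stored_pfc)  # exponential averaging
        updates += 1
    return stored_pfc, updates


print(simulate(10.0, 1.0, 100))  # starts too high: stuck
print(simulate(2.0, 1.0, 100))   # starts within range: converges
```

In this model, simulate(10.0, 1.0, 100) returns (10.0, 0): every claim is rejected, so the estimate never moves. Starting at 2.0 instead, all 100 results are accepted and the estimate converges to about 1.0.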


It is also what Fred J. Verster is suffering from in this thread: Is it hosts or work erring out?

and what Snowmain is partly suffering from in this thread: Not getting everything out of my setup.

Claggy
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 1,255
Australia
Message 1272898 - Posted: 19 Aug 2012, 11:32:02 UTC
Last modified: 19 Aug 2012, 11:33:56 UTC

For the longest time I've been trying to work out why the estimated run-times of my SETI@home applications have settled down properly for everything except MB/ATI on two of my hosts, and I think I have discovered at least part of the reason:

I've been watching the application details page for my Remiem host, a relatively recent addition. See the third application listed, 'SETI@home Enhanced (anonymous platform, ATI GPU)': it has no 'average processing rate' calculated, and, absurdly, it registers a 'number of tasks completed today' yet still lists 'total number of tasks completed' as 0. Consequently, its run-time estimates sit at 12-13 hours instead of the actual ~3 minutes for short WUs, ~12 minutes for 'normal' WUs and ~28 minutes for VLAR WUs.

On Zanarkand, my longest-serving host, the entry for 'SETI@home Enhanced (anonymous platform, ATI GPU)' is likewise increasing its counts for 'number of tasks completed today' and 'consecutive valid tasks', yet its 'total number of tasks completed' remains constant. Interestingly, on this host the 'total number of tasks completed' is non-zero, meaning it was correctly registering tasks and calculating the 'average processing rate' at some point in the past. Since it's no longer being updated, my estimated run-times remain stuck at about 58 minutes instead of the run-times above.
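A rough model of the symptom described above, where 'number of tasks completed today' and 'consecutive valid tasks' keep rising while 'total number of tasks completed' and the 'average processing rate' stay put. It assumes, without confirmation from the server code, that every validated task bumps the first two counters but only non-outlier tasks feed the rate average and the total. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostAppVersionStats:
    """Hypothetical per-host, per-app-version counters as shown on the
    application details page."""
    tasks_today: int = 0
    consecutive_valid: int = 0
    total_completed: int = 0
    avg_rate: Optional[float] = None  # FLOPS; None means "not yet calculated"

    def record_valid(self, fpops_est: float, runtime_s: float,
                     outlier: bool, alpha: float = 0.1) -> None:
        self.tasks_today += 1
        self.consecutive_valid += 1
        if outlier:
            return  # sample discarded: rate and total left unchanged
        rate = fpops_est / runtime_s
        if self.avg_rate is None:
            self.avg_rate = rate
        else:
            self.avg_rate += alpha * (rate - self.avg_rate)  # exponential average
        self.total_completed += 1


# A host whose every result is flagged as an outlier (like Remiem above).
stats = HostAppVersionStats()
for _ in range(20):
    stats.record_valid(fpops_est=4.5e12, runtime_s=720.0, outlier=True)
print(stats)
```

After 20 outlier results the model shows 20 tasks today and 20 consecutive valid tasks, but a total of 0 and no average rate, which matches the page for Remiem; a single non-outlier result would finally set the rate and bump the total to 1.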

Finally, a third host running MB/ATI, Valfarre, seems to be calculating the 'average processing rate' correctly. This host only has CPU + ATI, unlike CPU + ATI + NV as with the other two hosts.

So my question is this: does anyone know what might be causing this issue? Is it something I've configured incorrectly at my end, or is something going on at the server side? For reference, I'm currently using BOINC 7.0.28, with optimised MB/ATI application build 390 (build 426 on Remiem because of the OpenCL enumeration bug with mixed ATI/NV hosts). My app_info.xml specifies version 610 with the ati14 plan class, and always has (given that the average processing rate has been calculated properly in the past, I don't expect the app_info.xml details to make a difference...).
Soli Deo Gloria

©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.