I am puzzled...

Message boards : Number crunching : I am puzzled...

Seahawk
Volunteer tester
Joined: 8 Jan 08
Posts: 926
Credit: 5,748,161
RAC: 0
United States
Message 1202091 - Posted: 3 Mar 2012, 14:51:09 UTC

I am puzzled by the behavior of my Q6600 system overnight. Last night when I last looked, all the ATI WUs had a remaining time of 7 hours or so. This morning they are all at 90+ hours. The CPU WUs are showing 1-3 hours remaining, which looks to be the same as last night. Everything in the log looks OK to me, and work is being done in the expected amount of time. I'm just not sure why the DCF for the GPU went whacko.


I used to be a cruncher like you, then I took an arrow to the knee.

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29579
Credit: 49,107,650
RAC: 17,227
Germany
Message 1202093 - Posted: 3 Mar 2012, 14:56:43 UTC

You had one unit running over 16000 seconds.


With each crime and every kindness we birth our future.

Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11141
Credit: 83,779,454
RAC: 46,056
United Kingdom
Message 1202094 - Posted: 3 Mar 2012, 14:59:52 UTC - in response to Message 1202093.  

You had one unit running over 16000 seconds.

And several VLAR running over 25,000 seconds - like task 2334375975.

Claggy
Project Donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4623
Credit: 46,349,811
RAC: 2,941
United Kingdom
Message 1202098 - Posted: 3 Mar 2012, 15:03:34 UTC - in response to Message 1202094.  

You had one unit running over 16000 seconds.

And several VLAR running over 25,000 seconds - like task 2334375975.

Sounds like the ATI low GPU usage bug.

Claggy

Seahawk
Volunteer tester
Joined: 8 Jan 08
Posts: 926
Credit: 5,748,161
RAC: 0
United States
Message 1202099 - Posted: 3 Mar 2012, 15:07:07 UTC

OK, so they should come back down over time? The system has only been up and crunching for 38 hours since it got overhauled. I know the 2 ATI 6670 GPUs aren't the fastest crunchers, but 90 hours is a little out of the ballpark.


Seahawk
Volunteer tester
Joined: 8 Jan 08
Posts: 926
Credit: 5,748,161
RAC: 0
United States
Message 1202101 - Posted: 3 Mar 2012, 15:09:07 UTC

How do I check for this bug and is it correctable?


Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1202103 - Posted: 3 Mar 2012, 15:12:51 UTC
Last modified: 3 Mar 2012, 15:13:33 UTC

Each time a GPU task completes, the estimate for each WU in the queue can drop by 30 seconds to 1 minute, but when a CPU task completes, especially a big long one, the estimates can jump by 8 to 12 minutes.

BOINC doesn't do estimations for CPU and GPU separately. It's one estimation for both.
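
As a rough illustration of the single shared estimate described above, here is a toy model in Python. The update rule and its constants (jump up at once on an overrun, ease down by 10% otherwise) are assumptions made for this sketch and are not taken from the BOINC client source; the raw estimates are hypothetical too. Only the 16000 second overrun comes from this thread.

```python
# Toy model of one correction factor shared by CPU and GPU estimates.
# The update rule and its constants are assumptions for illustration,
# not code taken from the BOINC client.


def update_dcf(dcf: float, raw_estimate_s: float, actual_s: float) -> float:
    """Return the new shared factor after one task finishes."""
    ratio = actual_s / raw_estimate_s
    if ratio > dcf:
        return ratio                 # overrun: jump straight up (assumed)
    return 0.9 * dcf + 0.1 * ratio   # otherwise ease down slowly (assumed)


def shown_estimate(raw_estimate_s: float, dcf: float) -> float:
    """Remaining time as displayed: raw estimate scaled by the shared factor."""
    return raw_estimate_s * dcf


GPU_RAW_S = 1800.0  # hypothetical uncorrected GPU estimate, in seconds
CPU_RAW_S = 7200.0  # hypothetical uncorrected CPU estimate, in seconds

dcf_start = 1.0
# One GPU task overruns badly (16000 s, the figure quoted in this thread).
dcf_after_overrun = update_dcf(dcf_start, GPU_RAW_S, 16000.0)
# The next GPU task finishes on schedule, so the factor only eases down a bit.
dcf_after_normal = update_dcf(dcf_after_overrun, GPU_RAW_S, GPU_RAW_S)

for label, raw in (("GPU", GPU_RAW_S), ("CPU", CPU_RAW_S)):
    hours = shown_estimate(raw, dcf_after_overrun) / 3600
    print(label, "estimate after the overrun:", round(hours, 1), "hours")
```

In this model a single long task scales every queued estimate by the same factor, and it takes many on-time tasks to bring the factor back down.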


Am I right, guys?


Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11141
Credit: 83,779,454
RAC: 46,056
United Kingdom
Message 1202107 - Posted: 3 Mar 2012, 15:21:32 UTC

The Application details for that host are showing a pretty insane APR for the ATI app.

I'll leave it to the ATI specialists to decide how plausible it is.

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29579
Credit: 49,107,650
RAC: 17,227
Germany
Message 1202117 - Posted: 3 Mar 2012, 15:44:13 UTC - in response to Message 1202101.  

How do I check for this bug and is it correctable?


Not much you can do.
First thing is to restart the computer.
Reduce instances to 3; better would be 2 on your card.

Each time you notice a slowdown, suspend the GPU for 30 seconds and resume again.

janneseti
Joined: 14 Oct 09
Posts: 11101
Credit: 624,378
RAC: 206
Sweden
Message 1202210 - Posted: 3 Mar 2012, 22:31:12 UTC

I have the same problem with my ATI GPU.
The APR value shown on the Application details page is insane.
But if you want the remaining time in BOINC Manager (BM) to show more accurately for your GPU, there is a way.
In your BOINC folder C:\ProgramData\BOINC\projects\setiathome.berkeley.edu there is a file app_info.xml.
Add a new entry <flops>16000000000</flops> to the <app_version> section.
The value, in my case 16 GFlops, will probably have to be adjusted to match your GPU.

Here is a snippet from app_info.xml:

<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>0.05</max_ncpus>
    <flops>16000000000</flops>
    <plan_class>ati13ati</plan_class>
    <cmdline>-period_iterations_num 20 -instances_per_device 1</cmdline>
    <coproc>
        <type>ATI</type>
        <count>1</count>
    </coproc>
    <file_ref>
        <file_name>MB6_win_x86_SSE3_OpenCL_ATi_HD5_r390.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>MultiBeam_Kernels_r390.cl</file_name>
        <copy_file/>
    </file_ref>
</app_version>
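
To see why the <flops> value matters for the remaining-time display, here is a minimal Python sketch. It assumes the uncorrected estimate is simply the task's estimated operation count divided by the claimed speed, before any correction factor is applied. The operation count below is a hypothetical figure, and 2.2e12 stands in for an implausibly high rate of the kind the Application details page was showing.

```python
# Sketch: uncorrected runtime estimate = estimated operations / claimed speed.
# RSC_FPOPS_EST is a hypothetical value chosen only to make the numbers readable.

RSC_FPOPS_EST = 3.0e13  # hypothetical operation count for one task


def raw_estimate_seconds(fpops_est: float, flops: float) -> float:
    """Estimated runtime in seconds for a task of fpops_est operations."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return fpops_est / flops


modest = raw_estimate_seconds(RSC_FPOPS_EST, 16e9)       # the 16 GFlops entry above
inflated = raw_estimate_seconds(RSC_FPOPS_EST, 2.2e12)   # an implausibly high rate

print("estimate at 16 GFlops: ", round(modest), "seconds")
print("estimate at 2.2 TFlops:", round(inflated, 1), "seconds")
```

The estimate is inversely proportional to the value, so an inflated speed gives an estimate that is far too short, which is why the figure should be tuned until it matches what the card actually achieves.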

arkayn
Project Donor
Volunteer tester
Joined: 14 May 99
Posts: 4097
Credit: 51,576,341
RAC: 968
United States
Message 1202230 - Posted: 4 Mar 2012, 0:18:03 UTC - in response to Message 1202210.  

Actually, you want it in a different format than that.

It should look something like this:
<flops>200.52843553287e09</flops>

That is for my GPUs.



janneseti
Joined: 14 Oct 09
Posts: 11101
Credit: 624,378
RAC: 206
Sweden
Message 1202235 - Posted: 4 Mar 2012, 0:46:04 UTC - in response to Message 1202230.  

Actually, you can use either format for this value.
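
A quick way to convince yourself that the plain and exponential spellings denote the same number is to parse them. The sketch below uses Python's float() purely as a stand-in; it says nothing about how BOINC's own parser is written. The last lines compare the long value quoted earlier in this thread with a rounded 2e11 to show how little the extra digits change.

```python
# Stand-in check with Python's float(); this is not BOINC's parser.

spellings = ["16000000000", "16e9", "1.6e10", "16.0E+09"]
values = [float(s) for s in spellings]
all_equal = all(v == values[0] for v in values)
print("all spellings equal:", all_equal)

# The long value quoted in this thread versus a rounded version of it.
detailed = float("200.52843553287e09")
rounded = float("2e11")
relative_gap = abs(detailed - rounded) / rounded
print("relative difference from rounding:", round(relative_gap * 100, 2), "percent")
```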

Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11141
Credit: 83,779,454
RAC: 46,056
United Kingdom
Message 1202237 - Posted: 4 Mar 2012, 0:50:49 UTC - in response to Message 1202230.  
Last modified: 4 Mar 2012, 1:28:57 UTC

Actually you want it in a different format than that.

it should look something like this.
<flops>200.52843553287e09</flops>

That is for my GPU's.

According to Joe Segur, the parser that handles that value can cope with most reasonable numeric and scientific formats.

You do, however, need to keep your wits about you when dealing with numbers as large as that. Not many of us (except the bankers) regularly deal with a couple of hundred billion - of anything, even flops.

If you're used to exponential notation, 200e9 is indeed easier to get right than 200000000000. But then, why not 2e11? For this purpose, the minute fractional bits after the decimal point really don't matter.

There is a nice example of this in today's New Scientist magazine. The website is subscription-only, so I'll have to quote instead of link:

BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10^-13 metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer.

Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11141
Credit: 83,779,454
RAC: 46,056
United Kingdom
Message 1202321 - Posted: 4 Mar 2012, 10:28:57 UTC - in response to Message 1202237.  
Last modified: 4 Mar 2012, 10:31:46 UTC

Testing superscripts for DA.

BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10^-13 metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer.

Superscripts seem OK, but don't click the "Use BBCode tags to format your text" link while editing - you'll lose your changes. I'll tell him.

tullio
Project Donor
Volunteer tester
Joined: 9 Apr 04
Posts: 5715
Credit: 973,130
RAC: 2,789
Italy
Message 1202325 - Posted: 4 Mar 2012, 11:00:32 UTC

I can read many articles in New Scientist without even registering. I am a registered non-paying guest at Nature magazine and can read all its editorials and even some articles; the same goes for Nature Communications.
Tullio


Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11141
Credit: 83,779,454
RAC: 46,056
United Kingdom
Message 1202327 - Posted: 4 Mar 2012, 11:16:40 UTC - in response to Message 1202325.  

I can read many articles in New Scientist without even registering. I am a registered non paying guest in Nature magazine and can read all its editorials and even some articles, same in Nature Communications.
Tullio

I tried the section link on the front page for that story (Feedback: Highway exit with no return) before posting, but it told me I had to log in first. Not everybody here will want to create even a guest registration just to read one silly story.

tullio
Project Donor
Volunteer tester
Joined: 9 Apr 04
Posts: 5715
Credit: 973,130
RAC: 2,789
Italy
Message 1202328 - Posted: 4 Mar 2012, 11:28:25 UTC - in response to Message 1202327.  

I have just read and printed a short article "Oceans acidifying at unprecedented speed". I grab what I can without registering from New Scientist, but Nature is a more reliable source.
Tullio


Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1202411 - Posted: 4 Mar 2012, 18:13:59 UTC - in response to Message 1202107.  

The Application details for that host are showing a pretty insane APR for the ATI app.

I'll leave it to the ATI specialists to decide how plausible it is.

So it was indicating 1.4 TeraFlops and has since grown to 2.2 TeraFlops. I'm no ATI specialist but need not be to say that's ridiculously high.

The cause is result_overflow tasks which haven't been marked as runtime_outlier by a sah_validate process, because the BOINC core client API all too often truncates the stderr. Task 2334954485 is a current example though due to be purged soon. It has Run time 39.31, CPU time 30.91, Credit 0.33, but only the first 8 lines of stderr.txt captured and the result_overflow line would be much later. The Validator has always looked for that result_overflow keyword so the assimilated result is marked, and now it is also used to tell BOINC not to include the runtime in its averages.
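
The mechanism can be sketched in a few lines of Python. This is a simplified stand-in, not the sah_validate code or BOINC's actual averaging: it uses a plain mean, made-up stderr text, and a hypothetical operation count. Only the result_overflow keyword, the 8-line truncation, and the 39.31 second run time come from the description above.

```python
# Simplified stand-in for outlier detection by keyword; not the real validator.

MARKER = "result_overflow"  # keyword the validator is said to look for


def is_runtime_outlier(stderr_text: str) -> bool:
    """A task counts as an outlier only if the keyword survived in its stderr."""
    return MARKER in stderr_text


def truncate(stderr_text: str, max_lines: int = 8) -> str:
    """Keep only the first max_lines lines, as in the truncated reports."""
    return "\n".join(stderr_text.splitlines()[:max_lines])


def average_rate(tasks):
    """Plain mean of operations per second over tasks not marked as outliers."""
    rates = [fpops / runtime_s
             for fpops, runtime_s, stderr_text in tasks
             if not is_runtime_outlier(stderr_text)]
    return sum(rates) / len(rates)


FPOPS_EST = 3.0e13  # hypothetical operation count per task
normal_stderr = "\n".join(f"made-up log line {i}" for i in range(1, 21))
overflow_stderr = normal_stderr + "\nmade-up message: result_overflow"

normal_task = (FPOPS_EST, 2000.0, normal_stderr)
overflow_full = (FPOPS_EST, 39.31, overflow_stderr)
overflow_cut = (FPOPS_EST, 39.31, truncate(overflow_stderr))

rate_with_full_stderr = average_rate([normal_task, overflow_full])
rate_with_cut_stderr = average_rate([normal_task, overflow_cut])
print("average with full stderr:     ", rate_with_full_stderr)
print("average with truncated stderr:", rate_with_cut_stderr)
```

With the full stderr the short overflow run is excluded and the average stays sensible; with the truncated stderr the keyword is lost, the 39.31 second run is counted, and the average is pulled far upward.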

In January, Matt Arsenault (Milkyway) noted on the boinc_dev list that about 2% of reports had truncated stderr sections, and provided a patch to fix the problem. Dr. Anderson apparently wasn't convinced it needs fixing, and even if he had implemented the patch it would only be effective for those alpha testing BOINC 7.0.x clients.

As a related note, the runtime_outlier detection for Astropulse v6 will be based on information in the uploaded result file rather than the stderr which is supposed to go back in requests to the Scheduler. Perhaps it would be a good idea to make a similar change for SETI@home v7.
Joe


Seahawk
Volunteer tester
Joined: 8 Jan 08
Posts: 926
Credit: 5,748,161
RAC: 0
United States
Message 1202422 - Posted: 4 Mar 2012, 18:56:22 UTC

I dropped to 2 tasks per GPU and the times have dropped from 90 hours to around 30. This is still 5-6 times higher than I have seen most tasks finish in. I'll just let it keep crunching and watch it.


Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11141
Credit: 83,779,454
RAC: 46,056
United Kingdom
Message 1202477 - Posted: 4 Mar 2012, 21:34:16 UTC - in response to Message 1202321.  

Superscripts seem OK, but don't click the "Use BBCode tags to format your text" link while editing - you'll lose your changes. I'll tell him.

It's safe to use again now, though I'm not sure I remember all the page headers being in the formatting pop-up before.



 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.