I am puzzled...
Author | Message |
---|---|
Seahawk Send message Joined: 8 Jan 08 Posts: 937 Credit: 8,157,029 RAC: 5 |
I am puzzled by the behavior of my Q6600 system overnight. Last night when I last looked, all the ATI WUs had a remaining time of 7 hours or so. This morning they are all 90+ hours. The CPU WUs are showing 1-3 hours remaining, which looks to be the same as last night. Everything in the log looks OK to me and work is being done in the expected amount of time. Just not sure why the DCF for the GPU went whacko. I used to be a cruncher like you, then I took an arrow to the knee. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
You had one unit running over 16000 seconds. With each crime and every kindness we birth our future. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
You had one unit running over 16000 seconds. And several VLAR running over 25,000 seconds - like task 2334375975. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
You had one unit running over 16000 seconds. Sounds like the ATI low GPU use Bug, Claggy |
Seahawk Send message Joined: 8 Jan 08 Posts: 937 Credit: 8,157,029 RAC: 5 |
OK, so they should come back down over time? The system has only been up and crunching for 38 hours since it got overhauled. I know the 2 ATI 6670 GPUs aren't the fastest crunchers, but 90 hours is a little out of the ballpark. I used to be a cruncher like you, then I took an arrow to the knee. |
Seahawk Send message Joined: 8 Jan 08 Posts: 937 Credit: 8,157,029 RAC: 5 |
How do I check for this bug and is it correctable? I used to be a cruncher like you, then I took an arrow to the knee. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
The Application details for that host are showing a pretty insane APR for the ATI app. I'll leave it to the ATI specialists to decide how plausible it is. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
How do I check for this bug and is it correctable? Not much you can do. First thing is to restart the computer. Reduce instances to 3; better would be 2 on your card. Each time you notice a slowdown, suspend the GPU for 30 seconds and resume again. With each crime and every kindness we birth our future. |
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
I have the same problem with my ATI GPU. The APR value shown on the Application details page is insane. But if you want the remaining time in BM to show more accurately for your GPU, there is a way. In your BOINC folder `C:\ProgramData\BOINC\projects\setiathome.berkeley.edu` there is a file `app_info.xml`. Add a new entry `<flops>16000000000</flops>` to the `<app_version>` section. The value, in my case 16 GFlops, will probably have to be adjusted to match your GPU. Here is a snippet from app_info.xml: `<app_version> <app_name>setiathome_enhanced</app_name> <version_num>610</version_num> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.05</max_ncpus> <flops>16000000000</flops> <plan_class>ati13ati</plan_class> <cmdline>-period_iterations_num 20 -instances_per_device 1</cmdline> <coproc> <type>ATI</type> <count>1</count> </coproc> <file_ref> <file_name>MB6_win_x86_SSE3_OpenCL_ATi_HD5_r390.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>MultiBeam_Kernels_r390.cl</file_name> <copy_file/> </file_ref> </app_version>` |
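For anyone who would rather script the edit described above than do it by hand, here is a minimal sketch in Python. It works on a trimmed-down stand-in for app_info.xml, not a real file, and the helper name `set_flops` is made up for this example. Whether the BOINC client cares about the position of `<flops>` inside `<app_version>` is not something this sketch checks, so back up the real file before trying anything like it.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for app_info.xml; a real file has more entries.
SAMPLE = """<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <plan_class>ati13ati</plan_class>
  </app_version>
</app_info>"""


def set_flops(xml_text, plan_class, flops):
    """Set <flops> on every <app_version> whose <plan_class> matches."""
    root = ET.fromstring(xml_text)
    for version in root.iter("app_version"):
        if version.findtext("plan_class") != plan_class:
            continue
        node = version.find("flops")
        if node is None:
            # No existing entry: append one at the end of the section.
            node = ET.SubElement(version, "flops")
        # Write a plain integer so no zeros get lost in formatting.
        node.text = str(int(flops))
    return ET.tostring(root, encoding="unicode")


updated = set_flops(SAMPLE, "ati13ati", 16e9)
print(updated)
```

The 16e9 here is the 16 GFlops figure from the post above; the right value depends on your own card.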
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
Actually you can use both formats to give this value. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Actually you want it in a different format than that. According to Joe Segur, the parser that handles that value can cope with most reasonable numeric and scientific formats. You do, however, need to keep your wits about you when dealing with numbers as large as that. Not many of us (except the bankers) regularly deal with a couple of hundred billion - of anything, even flops. If you're used to exponential notation, 200e9 is indeed easier to get right than 200000000000. But then, why not 2e11? For this purpose, the minute fractional bits after the decimal point really don't matter. There is a nice example of this in today's New Scientist magazine. The website is subscription-only, so I'll have to quote instead of link: BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10⁻¹³ metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer. |
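To illustrate the point about notations, here is a small Python check showing that the three spellings mentioned above denote the same number. This uses Python's own `float` parser purely as a stand-in; it says nothing about how the BOINC parser behaves beyond what Joe Segur is reported to have said.

```python
# Three spellings of "two hundred billion" from the post above.
spellings = ["200000000000", "200e9", "2e11"]

values = [float(s) for s in spellings]
for text, value in zip(spellings, values):
    print(f"{text:>14} -> {value:.0f}")

# All three parse to exactly the same float.
all_equal = len(set(values)) == 1
print("all equal:", all_equal)

# Counting digits is where the plain form goes wrong most easily:
# 2e11 written out needs a 2 followed by eleven zeros, twelve digits in all.
digit_count = len(spellings[0])
print("digits in plain form:", digit_count)
```

The exponential forms make the magnitude explicit, which is the whole argument for preferring them when typing the value by hand.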
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Testing superscripts for DA. BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10⁻¹³ metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer. Superscripts seem OK, but don't click the Use BBCode tags to format your text link while editing - you'll lose your changes. I'll tell him. |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I can read many articles in New Scientist without even registering. I am a registered non paying guest in Nature magazine and can read all its editorials and even some articles, same in Nature Communications. Tullio |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I can read many articles in New Scientist without even registering. I am a registered non paying guest in Nature magazine and can read all its editorials and even some articles, same in Nature Communications. I tried the section link on the front page for that story (Feedback: Highway exit with no return) before posting, but it told me I had to log in first. Not everybody here will want to create even a guest registration just to read one silly story. |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I have just read and printed a short article "Oceans acidifying at unprecedented speed". I grab what I can without registering from New Scientist, but Nature is a more reliable source. Tullio |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The Application details for that host are showing a pretty insane APR for the ATI app. So it was indicating 1.4 TeraFlops and has since grown to 2.2 TeraFlops. I'm no ATI specialist but need not be to say that's ridiculously high. The cause is result_overflow tasks which haven't been marked as runtime_outlier by a sah_validate process, because the BOINC core client API all too often truncates the stderr. Task 2334954485 is a current example though due to be purged soon. It has Run time 39.31, CPU time 30.91, Credit 0.33, but only the first 8 lines of stderr.txt captured and the result_overflow line would be much later. The Validator has always looked for that result_overflow keyword so the assimilated result is marked, and now it is also used to tell BOINC not to include the runtime in its averages. In January, Matt Arsenault (Milkyway) noted on the boinc_dev list that about 2% of reports had truncated stderr sections, and provided a patch to fix the problem. Dr. Anderson apparently wasn't convinced it needs fixing, and even if he had implemented the patch it would only be effective for those alpha testing BOINC 7.0.x clients. As a related note, the runtime_outlier detection for Astropulse v6 will be based on information in the uploaded result file rather than the stderr which is supposed to go back in requests to the Scheduler. Perhaps it would be a good idea to make a similar change for SETI@home v7. Joe |
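A rough sketch of the effect Joe describes, in Python. It assumes a deliberately simplified model in which the processing rate for each task is an operation estimate divided by elapsed time, and the host figure is the plain mean of those rates; the real server-side averaging may well differ. `FPOPS_EST` and the 2000-second run time are invented for illustration; only the 39.31-second run time comes from the post above.

```python
from statistics import mean

FPOPS_EST = 4e13  # hypothetical per-task estimate of floating-point operations

# (elapsed_seconds, is_overflow) pairs: nine ordinary tasks and one
# result_overflow task that quit after 39.31 s.
tasks = [(2000.0, False)] * 9 + [(39.31, True)]


def average_rate(samples, skip_outliers):
    """Mean of per-task rates, optionally ignoring flagged outliers."""
    rates = [
        FPOPS_EST / elapsed
        for elapsed, overflow in samples
        if not (skip_outliers and overflow)
    ]
    return mean(rates)


naive = average_rate(tasks, skip_outliers=False)
filtered = average_rate(tasks, skip_outliers=True)

print(f"overflow task counted:  {naive / 1e9:.1f} GFLOPS")
print(f"overflow task excluded: {filtered / 1e9:.1f} GFLOPS")
```

Under these assumptions a single unflagged overflow task in ten is enough to multiply the average several times over, which is the same direction as the inflated APR reported for this host.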
Seahawk Send message Joined: 8 Jan 08 Posts: 937 Credit: 8,157,029 RAC: 5 |
I dropped to 2 tasks per GPU and the times have dropped from 90 hours to around 30. This is still 5-6 times higher than I have seen most tasks finish in. I'll just let it keep crunching and watch it. I used to be a cruncher like you, then I took an arrow to the knee. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Superscripts seem OK, but don't click the Use BBCode tags to format your text link while editing - you'll lose your changes. I'll tell him. It's safe to use again now, though I'm not sure I remember all the page headers being in the formatting pop-up before. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.