Could 6.12.33 be the cause for my invalids?

Message boards : Number crunching : Could 6.12.33 be the cause for my invalids?
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1125871 - Posted: 8 Jul 2011, 13:23:04 UTC

From the minute I upgraded to the new 6.12.33 BOINC version, my Lunatics ATI apps (both SETI and Beta) started running for only about 20 seconds and getting marked as invalid (at least the ones that aren't pending anymore).

Of course it could be my GPU, but it would be a weird coincidence for failures to start right after the upgrade; I didn't have a single invalid before. (The GPU temperature is even lower than it used to be.)

The strangest thing is that the stderr output of these tasks (e.g. http://setiathome.berkeley.edu/result.php?resultid=1983386504) is only one line:
<core_client_version>6.12.33</core_client_version>

Anyone else experiencing this? Any clues/suggestions?
Also, is there anything to watch out for when downgrading BOINC?
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1125871 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34262
Credit: 79,922,639
RAC: 80
Germany
Message 1125872 - Posted: 8 Jul 2011, 13:31:18 UTC
Last modified: 8 Jul 2011, 13:31:32 UTC

I don't think it's BOINC, but downgrade just to make sure.
Just install the old version over the top.
I've seen an out-of-memory message in one unit.

I guess it's the card.


With each crime and every kindness we birth our future.
ID: 1125872 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1125873 - Posted: 8 Jul 2011, 13:33:56 UTC

Thanks Mike, I'll try the downgrade.

@Out of memory: I'm pretty sure that's another issue which I'll post in a few minutes. ;)
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1125873 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1125877 - Posted: 8 Jul 2011, 13:42:08 UTC

You were right, it wasn't BOINC. I'm still getting these 20-second tasks.
Are these 'normal' super-shorties?
But that would still not explain the stderr (now with the older version number of course):
<core_client_version>6.12.26</core_client_version>
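For anyone who wants to check many results for the same symptom, here is a minimal Python sketch that flags a stderr whose only content is the `<core_client_version>` line quoted above. The function name and the second sample string are made up for illustration; only the one-line sample comes from this thread.

```python
import re


def stderr_is_effectively_empty(stderr_text: str) -> bool:
    """Return True if the text holds nothing besides the core_client_version tag."""
    # Drop the version tag that appears in the stderr output quoted in this thread.
    remainder = re.sub(
        r"<core_client_version>.*?</core_client_version>", "", stderr_text
    )
    # Anything left besides whitespace counts as real application output.
    return remainder.strip() == ""


# Sample taken from the post above.
only_version = "<core_client_version>6.12.33</core_client_version>"
# Hypothetical sample: the same tag followed by some application output.
with_output = only_version + "\nsome application output"

print(stderr_is_effectively_empty(only_version))  # True
print(stderr_is_effectively_empty(with_output))   # False
```

A result flagged this way only tells you the app wrote nothing to stderr; it does not say why.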
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1125877 · Report as offensive
Profile skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1125884 - Posted: 8 Jul 2011, 14:00:37 UTC - in response to Message 1125877.  

Have you tried a reboot yet?


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1125884 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1125889 - Posted: 8 Jul 2011, 14:08:00 UTC

Yes, rebooted before starting this thread.
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1125889 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34262
Credit: 79,922,639
RAC: 80
Germany
Message 1125898 - Posted: 8 Jul 2011, 14:44:52 UTC

Which settings are you running?
Did you change something in app_info.xml?

Do you have something in autostart?



With each crime and every kindness we birth our future.
ID: 1125898 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1125923 - Posted: 8 Jul 2011, 15:47:32 UTC

Settings: -period_iterations_num 14 -instances_per_device 1 (or did you mean other settings?)
I haven't changed app_info.xml for a week.

Autostart: Do you mean what is running? Nothing new before this started...IE9, WMP12, WLM2011, Trillian, Skype, MSE and CCC.

Just 'found' a task which would get over the 20 seconds and finished in an hour including a stderr:
http://setiathome.berkeley.edu/result.php?resultid=1987166007
(perhaps there is still nothing out of the ordinary)
...so not every task is affected, however one that looks just like it has the same issue again:
http://setiathome.berkeley.edu/result.php?resultid=1987166005
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1125923 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34262
Credit: 79,922,639
RAC: 80
Germany
Message 1125932 - Posted: 8 Jul 2011, 15:56:46 UTC

Maybe something is consuming your resources.

Have you run a virus scan lately?
It could be malware or spyware.

Make sure the BOINC data directory is excluded from the virus scanner.



With each crime and every kindness we birth our future.
ID: 1125932 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1126000 - Posted: 8 Jul 2011, 18:44:38 UTC

As I expected, the virus scan was negative.

> Something is consuming your resources maybe.
GPU resources? Other than Windows Aero and IE9? Well, nothing that CCC or GPU-Z would show, because when BOINC is suspended the GPU shows 0% workload @157/300 (minimum MHz). Anyway, what harm would that do? There were days when I played Star Trek Online while everything including the GPU was crunching without any error/problem.

> Make sure Boinc data directory is excluded from virus scanner.
Makes no difference regarding this issue.


In any case I won't be able to track this next week, because I'm going to be in Graz cheering for Austria @ the American Football World Championship. ;)
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1126000 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1126048 - Posted: 8 Jul 2011, 20:38:26 UTC - in response to Message 1126000.  

Looking through your tasks, it looks as if MB r177 worked when you had Cat 11.5 (CAL 1.4.1385) drivers installed and hasn't worked since you had Cat 11.6 (CAL 1.4.1417) installed,
So try uninstalling Cat 11.6, and installing Cat 11.5 again,

Claggy
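The kind of comparison described above (outcomes grouped by driver version) can be sketched in a few lines of Python. The records below are hypothetical placeholders, not the actual task list; only the two driver labels come from the post.

```python
from collections import defaultdict

# Hypothetical sample records: (driver label, outcome). These are NOT the real
# results from the host discussed in this thread.
tasks = [
    ("Cat 11.5 (CAL 1.4.1385)", "valid"),
    ("Cat 11.5 (CAL 1.4.1385)", "valid"),
    ("Cat 11.6 (CAL 1.4.1417)", "invalid"),
    ("Cat 11.6 (CAL 1.4.1417)", "invalid"),
    ("Cat 11.6 (CAL 1.4.1417)", "valid"),
]


def outcomes_by_driver(records):
    """Count outcomes per driver label."""
    counts = defaultdict(lambda: defaultdict(int))
    for driver, outcome in records:
        counts[driver][outcome] += 1
    # Convert the nested defaultdicts to plain dicts for easy printing.
    return {driver: dict(outcomes) for driver, outcomes in counts.items()}


summary = outcomes_by_driver(tasks)
for driver, outcomes in sorted(summary.items()):
    print(driver, outcomes)
```

A clean split between the two drivers would point at the driver; a mix of valid and invalid results under both would not.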
ID: 1126048 · Report as offensive
Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1126136 - Posted: 9 Jul 2011, 0:43:45 UTC - in response to Message 1126048.  

OP I'd also suggest upgrading to the Lunatics 39 build, completely killed any invalids I was having.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1126136 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1126194 - Posted: 9 Jul 2011, 6:50:13 UTC - in response to Message 1126136.  
Last modified: 9 Jul 2011, 7:06:38 UTC

OP I'd also suggest upgrading to the Lunatics 39 build, completely killed any invalids I was having.


It's a good app indeed, but won't help his ATi card.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1126194 · Report as offensive
W5GA, W5TAT, W8QR, K6XT

Joined: 25 Sep 99
Posts: 42
Credit: 23,144,377
RAC: 6
United States
Message 1127048 - Posted: 12 Jul 2011, 23:44:46 UTC - in response to Message 1126136.  

OP I'd also suggest upgrading to the Lunatics 39 build, completely killed any invalids I was having.

Is this installed automatically by Lunatics unified installer 0.38?
ID: 1127048 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1127049 - Posted: 12 Jul 2011, 23:49:14 UTC - in response to Message 1127048.  

No, the installer gives you the x38g build. This thread will lead you to the x39e build... http://setiathome.berkeley.edu/forum_thread.php?id=64739&nowrap=true#1125635 You have to make some changes to the app_info file.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1127049 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1129104 - Posted: 18 Jul 2011, 8:23:23 UTC - in response to Message 1126048.  

Thanks for the suggestions while I was away.

Looking through your tasks, it looks as if MB r177 worked when you had Cat 11.5 (CAL 1.4.1385) drivers installed and hasn't worked since you had Cat 11.6 (CAL 1.4.1417) installed,
So try uninstalling Cat 11.6, and installing Cat 11.5 again,

Claggy

I thought about the driver version myself and was sure I installed them at least a week before the problems. Anyhow I reverted back to Cat 11.5 to test this, of course. No change.

A couple of tests I ran myself with the last 6 workunits suggest the weird coincidence I mentioned: a hardware issue. <_<
It seems that when a workunit fails, it fails at the beginning; after that it seems to run fine. With these 6 workunits the failure rate is about 50% (assuming the ones that ran through would validate). The failures seem to be independent of the workunit, the CPU load and the GPU temperature, because there were failures and successes in each case.
'Failing' will also send something back of course, however, the XML lacks a few starting tags which would explain the empty stderr.
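As a rough check on how much a sample of 6 workunits can say, here is a small Python sketch that computes a 95% Wilson score interval for the failure rate. It assumes "about 50%" means 3 failures out of 6, which is a reading of the post and not a figure stated in it.

```python
import math


def wilson_interval(failures: int, total: int, z: float = 1.96):
    """Return (low, high) of the Wilson score interval for a proportion."""
    p_hat = failures / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    ) / denom
    return center - half_width, center + half_width


# Assumption: 3 of the 6 test workunits failed.
low, high = wilson_interval(3, 6)
print(f"failure rate between {low:.2f} and {high:.2f}")
```

With these inputs the interval runs from roughly 0.19 to 0.81, so six workunits are too few to pin the true failure rate down, and more runs would be needed before comparing settings.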

Furthermore, gameplay doesn't show any kind of (graphical) error, and neither do GPU stability tests, which only resulted in higher temperatures than while crunching but still didn't show anything wrong. I found a memtest for CUDA...does anyone know one for ATI/OpenCL?
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1129104 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1129113 - Posted: 18 Jul 2011, 9:12:30 UTC - in response to Message 1129104.  

The Astropulse WU ran OK. I wonder if it's your settings for the r177_HD5 app. Can you post that section of the app_info, please?

Claggy
ID: 1129113 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1129119 - Posted: 18 Jul 2011, 9:36:51 UTC

    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.05</avg_ncpus>
        <max_ncpus>0.05</max_ncpus>
        <plan_class>ati13ati</plan_class>
        <cmdline>-period_iterations_num 14 -instances_per_device 1</cmdline>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>MB_6.10_win_SSE3_ATI_HD5_r177.exe</file_name>
            <main_program/>                           
        </file_ref>
        <file_ref>
            <file_name>MultiBeam_Kernels_r177.cl</file_name>
            <open_name>MultiBeam_Kernels.cl</open_name>
            <copy_file/>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <platform>windows_x86_64</platform>
        <avg_ncpus>0.05</avg_ncpus>
        <max_ncpus>0.05</max_ncpus>
        <plan_class>ati13ati</plan_class>
        <cmdline>-period_iterations_num 14 -instances_per_device 1</cmdline>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>MB_6.10_win_SSE3_ATI_HD5_r177.exe</file_name>
            <main_program/>                           
        </file_ref>
        <file_ref>
            <file_name>MultiBeam_Kernels_r177.cl</file_name>
            <open_name>MultiBeam_Kernels.cl</open_name>
            <copy_file/>
        </file_ref>
    </app_version>


The only thing I changed from the original that the Lunatics Installer created was the -period_iterations_num.
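To double-check which options each `<app_version>` section actually passes to the app, a minimal Python sketch like the following could be used. It works on a shortened copy of one section from the listing above, wrapped in an `<app_info>` root element (an assumption made here so that the fragment parses as one document). The helper `cmdline_options` is made up for illustration and assumes every option takes exactly one value.

```python
import xml.etree.ElementTree as ET

# Shortened copy of one <app_version> section from the post above, wrapped in
# an <app_info> root element (an assumption made for this sketch).
APP_INFO = """<app_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <platform>windows_intelx86</platform>
        <plan_class>ati13ati</plan_class>
        <cmdline>-period_iterations_num 14 -instances_per_device 1</cmdline>
    </app_version>
</app_info>"""


def cmdline_options(cmdline: str) -> dict:
    """Turn '-name value -name value' into a dict (one value per option assumed)."""
    tokens = cmdline.split()
    return {
        tokens[i].lstrip("-"): tokens[i + 1] for i in range(0, len(tokens) - 1, 2)
    }


root = ET.fromstring(APP_INFO)
for version in root.iter("app_version"):
    platform = version.findtext("platform")
    options = cmdline_options(version.findtext("cmdline", default=""))
    print(platform, options)
# windows_intelx86 {'period_iterations_num': '14', 'instances_per_device': '1'}
```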
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1129119 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1129124 - Posted: 18 Jul 2011, 10:11:15 UTC - in response to Message 1129119.  
Last modified: 18 Jul 2011, 10:25:59 UTC

The only thing I changed from the original that the Lunatics Installer created was the -period_iterations_num.

The app_info section looks fine. When I first ran r177_HD5, my HD5770 only needed -period_iterations_num 2.

I can't remember what driver/SDK that was on, probably Cat 10.10/SDK2.2.

I'd just try -period_iterations_num 2 first; if there is no improvement, downgrade your driver to Cat 11.2/SDK2.3.
There have been some changes in SDK2.4 that increased the amount of memory available to OpenCL apps, so there might be side effects.

The other thing that is strange is that your HD5770 is only reporting 9 Compute Units at 700MHz; my HD5770 has 10 Compute Units at 850MHz.

Claggy
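The `-period_iterations_num` change suggested above is a one-value edit to the `<cmdline>` string. A minimal Python sketch of that edit, with a made-up helper name, might look like this; it only rewrites the string and leaves writing the result back to both `<app_version>` sections of the file as a manual step.

```python
import re


def set_period_iterations(cmdline: str, value: int) -> str:
    """Return cmdline with -period_iterations_num set to value, adding it if absent."""
    pattern = r"-period_iterations_num\s+\d+"
    replacement = f"-period_iterations_num {value}"
    if re.search(pattern, cmdline):
        return re.sub(pattern, replacement, cmdline)
    # Option not present yet: append it to whatever is already there.
    return f"{cmdline} {replacement}".strip()


old = "-period_iterations_num 14 -instances_per_device 1"
print(set_period_iterations(old, 2))
# -period_iterations_num 2 -instances_per_device 1
```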
ID: 1129124 · Report as offensive
Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1130165 - Posted: 20 Jul 2011, 18:51:31 UTC - in response to Message 1129124.  

Neither -period_iterations_num 2 (or other values) nor downgrading to Cat 11.2/SDK2.3 seems to make any difference.

I've got the HD5750, which has only 9 Compute Units @ 700 MHz.
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page
ID: 1130165 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.