Discussion of Invalid Host Messaging

Message boards : Number crunching : Discussion of Invalid Host Messaging

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1325538 - Posted: 7 Jan 2013, 17:19:44 UTC - in response to Message 1324689.  

So, not a single AstroPulse "Computation error" since updating to BOINC 7.0.38. Interesting. I had been getting around one, sometimes two, a day with 7.0.28. I see the AstroPulse error was vaporized, but I did receive 4 others with the nVidia card before adding the FLOP entry. You can see when I updated to 7.0.38 here: Task 2780311703. It's become something of a cliffhanger now: will it pass or throw the error? The suspense grows with every passing hour: All AstroPulse v6 tasks for computer 6797524
I suppose I could just change the Unroll to 2 and see what happens, but that would remove the drama...

This could save a large amount of computer time if updating BOINC solved the problem :-)
ID: 1325538
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1325721 - Posted: 8 Jan 2013, 7:04:50 UTC - in response to Message 1325538.  

*** I suppose I could just change the Unroll to 2 and see what happens ***

I've completed 6 APs in a row with the Unroll setting at 2. If I had done that back here ^Something isn't right^, around 5 of those 6 would have ended in a "Computation error".
Here are the first 3 of the 6:
Task 2784334429
Task 2784338147
Task 2784338745

I'm ready to declare Victory.
ID: 1325721
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1326061 - Posted: 9 Jan 2013, 14:16:13 UTC - in response to Message 1325721.  
Last modified: 9 Jan 2013, 14:23:18 UTC

You could adjust the Fetch_Block and Thread_Block. I see you use a 1:3 ratio, with Fetch_Block 2048 and Thread_Block 6144. Have you tried other values, e.g. Thread_Block 10240 and Fetch_Block 5120, or 8192 and 4096 (1:2)?
Running on device number: 0
DATA_CHUNK_UNROLL at default:2
DATA_CHUNK_UNROLL set to:2
FFA thread block override value:6144
FFA thread fetchblock override value:2048


With UNROLL=2, Thread_Block and Fetch_Block can be bigger!
Which values are most effective also depends on which GPU you're using.
I see you use a 9800GT and a BARTS GPU: quite different architectures, and thus
different register sizes.
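As a quick check of the ratios being compared, here is a small Python sketch that reads the active values out of the quoted stderr lines and reduces each fetch:thread pair to lowest terms. The helper names are hypothetical, and it assumes the log wording is exactly as quoted in this post (with "FFA thread block" corresponding to Thread_Block and "FFA thread fetchblock" to Fetch_Block).

```python
import re
from math import gcd

# The stderr excerpt quoted above, copied verbatim.
LOG = """Running on device number: 0
DATA_CHUNK_UNROLL at default:2
DATA_CHUNK_UNROLL set to:2
FFA thread block override value:6144
FFA thread fetchblock override value:2048"""

def parse_overrides(log: str) -> dict:
    """Pull the active unroll and FFA block values out of an AstroPulse stderr excerpt."""
    patterns = {
        "unroll": r"DATA_CHUNK_UNROLL set to:(\d+)",
        "ffa_block": r"FFA thread block override value:(\d+)",
        "ffa_block_fetch": r"FFA thread fetchblock override value:(\d+)",
    }
    values = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log)
        if match:
            values[key] = int(match.group(1))
    return values

def fetch_to_block_ratio(block: int, fetch: int) -> str:
    """Reduce fetch:block to lowest terms, e.g. 2048:6144 -> '1:3'."""
    d = gcd(block, fetch)
    return f"{fetch // d}:{block // d}"

current = parse_overrides(LOG)
print(current)
print("current:", fetch_to_block_ratio(current["ffa_block"], current["ffa_block_fetch"]))

# The two alternative (Thread_Block, Fetch_Block) pairs suggested above.
for block, fetch in [(10240, 5120), (8192, 4096)]:
    print(block, fetch, fetch_to_block_ratio(block, fetch))
```

For the quoted log this reports 1:3, and both suggested pairs reduce to 1:2.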
ID: 1326061
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1326114 - Posted: 9 Jan 2013, 17:37:28 UTC - in response to Message 1326061.  

Look at my results from last night and you will see different -ffa_block & -ffa_block_fetch numbers: All AstroPulse v6 tasks for computer 6797524. Since declaring Victory I have been trying to raise the average GPU usage back to around 90%; BOINC 7.0.38 seems to have lowered the average to around 80%.

In my experience the -ffa_block & -ffa_block_fetch numbers don't make much difference above the 6144 & 1536 setting I was using for a long time. The Unroll numbers do make a noticeable difference. From what I've read, the optimum Unroll number equals your GPU's number of compute units, which for the 6850 is 12. I only run Multibeam on the 8800 and try to keep the 6850 on AstroPulses; they use different apps and different settings.

A large number of people are getting the "Access Violation (0xc0000005) at address 0x0040xxxx read attempt to address 0x04A3xxxx" error; just look around and you will find plenty. I find them simply by looking at the results from my AstroPulse wingmen.
ID: 1326114
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1326116 - Posted: 9 Jan 2013, 17:49:36 UTC

GPU usage has nothing to do with the BOINC version.
Blanking is what lowers GPU usage, because the blanking calculation is done on the CPU.



With each crime and every kindness we birth our future.
ID: 1326116
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1326125 - Posted: 9 Jan 2013, 18:20:51 UTC - in response to Message 1326116.  

Sorry Mike. My experience has shown that different BOINC versions do make a difference. I'm well aware of blanking slowing down GPU usage; I've watched the process for quite a while now. I just watched one crawl by at around 40% GPU usage.

BTW, remember the problem with running 2 MBs at once with BOINC 7.0.36? It was solved by going back to BOINC 7.0.28. Well, the problem is back with 7.0.38. Not only that, NOW I'm having problems running 2 APs at once with 7.0.38. It's fine until I hit 2 of those blanked APs at the same time, then I get a hang. But... but... I didn't have that problem with APs on BOINC 7.0.28 OR 7.0.36. BOINC versions do make a difference on my MacPro running Win XP SP3 with an AMD 6850. How many of those do you have around here? A MacPro, running Win XP SP3, with a 6850?
ID: 1326125
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1326128 - Posted: 9 Jan 2013, 18:27:58 UTC

Do you keep a CPU core free?

It's more the mixed setup, I would say.
Running 2 different GPUs always uses a lot of resources.
I'm fully aware of the 6850; it has issues running multiple instances.



ID: 1326128
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1326134 - Posted: 9 Jan 2013, 19:00:06 UTC - in response to Message 1326128.  

A while back, I switched to a CPU setting of 60% for multiprocessors. That gave me about the same result as my earlier setting: 2 CPUs for 603 tasks and 2 CPUs for the GPUs. I've never had a problem with keeping 2 CPUs free for the 2 GPUs, even when running multiple instances on 1 GPU.

I'm running the wonderful System Information Viewer continuously, and it has a CPU graph for each process. When the hang occurs, the CPU graph for one AP maxes out in the red. My guess is BOINC 7.0.38 wants me to sacrifice another CPU for the GPUs. Again, I didn't have that problem in 7.0.28 or 7.0.36.

I'm really not interested in running 2 APs at the same time; I was just testing to see if I got the (0xc0000005) error. I didn't get the (0xc0000005) error, I got something else. I'm happy running 1 AP at a time, at around 90% GPU usage. If you look back at my results from 7.0.28, you will notice I was getting some fast times; those were with the GPU running around 90% on most tasks. Since updating to 7.0.38, the most I've seen is around 80% usage on the fastest APs I've run so far. I should have run across quite a few at 90%. I haven't... so far.
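Why 60% comes out the same as "2 CPUs for 603 tasks and 2 for the GPUs" is simple arithmetic, but only under assumptions. A minimal Python sketch, assuming a 4-core machine (inferred from the 2 + 2 split above) and assuming the client truncates the percentage of cores down to a whole number with a minimum of one core; the function name is hypothetical, not BOINC's own code.

```python
import math

def usable_cpus(total_cores: int, max_ncpus_pct: float) -> int:
    """Cores available to CPU tasks under a 'use at most N% of the processors' preference.

    Assumption: the result is truncated to a whole core, with a minimum of 1.
    """
    return max(1, math.floor(total_cores * max_ncpus_pct / 100))

TOTAL_CORES = 4  # assumed: 2 cores for 603 tasks + 2 cores left for the GPU feeders
cpu_tasks = usable_cpus(TOTAL_CORES, 60)
print(f"CPU tasks: {cpu_tasks}, cores left for the GPUs: {TOTAL_CORES - cpu_tasks}")
```

With 4 cores, 60% is 2.4 cores, which truncates to 2 CPU tasks and leaves 2 cores free for the GPUs.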
ID: 1326134
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1326253 - Posted: 10 Jan 2013, 1:41:08 UTC - in response to Message 1326134.  
Last modified: 10 Jan 2013, 2:39:55 UTC

BTW, I still haven't received the dreaded "Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x00399F64" error on my MacPro since updating to 7.0.38. Later tonight I will be updating the machine with the ATI 4670 to 7.0.38. I'm not sure what's going on with that machine; it's had a terrible week. It usually doesn't give any problems. Now it's given the "Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x00399F64" error, followed by an Invalid, and then more questionable results. We'll see what happens after the update.

When it rains...
Check this out: I just looked at my last completed AP, and look at what my wingman got:
Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x0052901C
You can't make this stuff up...
ID: 1326253
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1327159 - Posted: 12 Jan 2013, 19:25:44 UTC

After 9 days, another error. Well, it's better than it used to be.

Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x003AA2DC

When the other results are in, I'll bet they validate the results that are currently labeled "Invalid". Something is causing that Error after the results are written...
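One pattern shows up across the three full reports quoted in this thread: the faulting address is always 0x0040A1FA, while the address being read changes. For anyone scanning wingmen's stderr output the same way, here is a minimal Python sketch that groups Access Violation lines by faulting address. The helper names are hypothetical, and it assumes the stderr wording is exactly as quoted here.

```python
import re
from collections import defaultdict

# The three full Access Violation lines quoted in this thread, copied verbatim.
REPORTS = [
    "Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x00399F64",
    "Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x0052901C",
    "Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x003AA2DC",
]

# Assumed stderr wording: exactly as quoted in the posts above.
AV_PATTERN = re.compile(
    r"Access Violation \((0x[0-9a-fA-F]+)\) at address (0x[0-9a-fA-F]+) "
    r"read attempt to address (0x[0-9a-fA-F]+)"
)

def normalize(addr: str) -> str:
    """Render a hex address as 0x followed by 8 upper-case hex digits."""
    return f"0x{int(addr, 16):08X}"

def group_by_fault_address(lines):
    """Map each faulting instruction address to the addresses it tried to read.

    Lines that don't match (e.g. ones with elided 0x0040xxxx addresses) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        match = AV_PATTERN.search(line)
        if match:
            _code, fault_addr, read_addr = match.groups()
            groups[normalize(fault_addr)].append(normalize(read_addr))
    return dict(groups)

for fault, reads in group_by_fault_address(REPORTS).items():
    print(fault, "->", ", ".join(reads))
```

On the three quoted lines it prints a single group: 0x0040A1FA followed by the three different read addresses.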
ID: 1327159
trader
Volunteer tester
Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1350299 - Posted: 24 Mar 2013, 22:52:13 UTC - in response to Message 1327159.  

If you are getting errors using the stock app and are running Windows x64, PM me and I might be able to help.
ID: 1350299
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1350322 - Posted: 24 Mar 2013, 23:53:07 UTC - in response to Message 1350299.  

i might be able to help

With a 10-week old problem?

Do share.
ID: 1350322
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1350361 - Posted: 25 Mar 2013, 2:22:02 UTC
Last modified: 25 Mar 2013, 2:31:18 UTC

Maybe someone could send a PM to Cody Sharp and explain to him how to add the command line entry to his ap_cmdline_6.04_windows_intelx86__opencl_ati.txt file. He just showed up at the top of one of my Workgroups again. I guess his file might have a different name since he's running Win 7 64-bit. If he just added one of these lines to that file, he could probably avoid most of those errors. I keep thinking about it every time I see one of his errors, but there are so many affected hosts...

Maybe SETI could make a post on the News page about the problem and explain how to add one of these lines:
Command Line Parameters
High end cards (more than 12 compute units)
-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp

Mid range cards (less than 12 compute units)
-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp

Entry level GPU (less than 6 compute units)
-unroll 4 -ffa_block 2048 -ffa_block_fetch 1024 -hp

It might save some bandwidth...and give people sending PMs a link to reference.
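To make the tier choice concrete, here is a minimal Python sketch that picks one of the three lines above from a GPU's compute-unit count and writes it to a command-line file. The helper names are hypothetical. The list does not say where a card with exactly 12 compute units belongs, so this sketch assumes such cards get the high-end line (consistent with the "Unroll equals compute units" rule of thumb mentioned earlier in the thread); it also assumes the file should contain only that one line, so `write_cmdline` replaces whatever is already there.

```python
from pathlib import Path

# The three parameter lines listed above, copied verbatim.
HIGH_END = "-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp"
MID_RANGE = "-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp"
ENTRY_LEVEL = "-unroll 4 -ffa_block 2048 -ffa_block_fetch 1024 -hp"

def cmdline_for(compute_units: int) -> str:
    """Pick the suggested AstroPulse OpenCL command line for a GPU's compute-unit count."""
    if compute_units < 6:
        return ENTRY_LEVEL
    if compute_units < 12:
        return MID_RANGE
    return HIGH_END  # assumption: exactly 12 compute units is treated as high end

def write_cmdline(path: Path, compute_units: int) -> None:
    """Replace the contents of the app's command-line file with the chosen parameters."""
    path.write_text(cmdline_for(compute_units))

print(cmdline_for(12))  # e.g. an HD 6850, which has 12 compute units
```

The file name to pass is the one named above, ap_cmdline_6.04_windows_intelx86__opencl_ati.txt, although, as noted, it may differ on 64-bit Windows.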
ID: 1350361
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1350451 - Posted: 25 Mar 2013, 8:52:22 UTC

The app has the lowest values set as defaults.
You only need to do that if you want to go faster.



ID: 1350451
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1350508 - Posted: 25 Mar 2013, 14:52:11 UTC - in response to Message 1350451.  
Last modified: 25 Mar 2013, 15:15:52 UTC

It also affects something else related to the error. What happens if you set the values too high in XP? *Out of Memory*. Higher settings make the app use more memory. But you should know that...

It sure helped me back here: Then try adding -unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 to your ap_cmdline_6.04_windows_intelx86__opencl_ati.txt file. Send someone who's getting a lot of these errors a message and see if it helps them.

Oh, I'm about out of APs again...
ID: 1350508
trader
Volunteer tester
Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1350638 - Posted: 25 Mar 2013, 18:54:54 UTC - in response to Message 1350322.  

i might be able to help

With a 10-week old problem?

Do share.


An answer via an objective interpolation of your definition of the word "share", when applied to the meager number of posts I have, precludes me from providing you with the answer you desire without violating the august moderator's interpretation of what a flame/hate mail is, and I actually care about what he thinks.
ID: 1350638
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1350640 - Posted: 25 Mar 2013, 18:59:03 UTC - in response to Message 1350638.  

Which 'he' would that be now, in the last sentence? *puzzled look*

A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1350640
andybutt
Volunteer tester
Joined: 18 Mar 03
Posts: 262
Credit: 164,205,187
RAC: 516
United Kingdom
Message 1350644 - Posted: 25 Mar 2013, 19:01:55 UTC - in response to Message 1350638.  

We would still like to hear your input on the problem, whatever Richard said to offend you.

Andy
ID: 1350644
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1353958 - Posted: 6 Apr 2013, 1:58:08 UTC

Bump.

Member of the People Encouraging Niceness In Society club.

ID: 1353958
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1354203 - Posted: 6 Apr 2013, 22:07:55 UTC - in response to Message 1353958.  

You just had to do that, didn't you?

It prompted me to go through my long list of non-anonymous bad rigs from last October and see how those rigs are doing now.

A lot have gone, some have been fixed, but sadly a lot are still in "mangle mode".

If I get a chance today I'll go through my "Anonymous" list from that same time and see how that one stacks up.

Then I may just feel nasty enough to post the full details of all those bad rigs.

Cheers.
ID: 1354203


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.