10ja10zz. WU's ???

Message boards : Number crunching : 10ja10zz. WU's ???
Bernd Noessler

Joined: 15 Nov 09
Posts: 99
Credit: 52,635,434
RAC: 0
Germany
Message 1277978 - Posted: 31 Aug 2012, 7:38:52 UTC

I have got some of them.

http://setiathome.berkeley.edu/workunit.php?wuid=1056302852

They all end with a div by zero exception.

ID: 1277978 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1277987 - Posted: 31 Aug 2012, 8:01:33 UTC

Not got any here, but I'll keep an eye open for them.
ID: 1277987 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1278294 - Posted: 31 Aug 2012, 16:17:49 UTC

Hmm, Windows stock applications (both CPU and CUDA) aren't showing an error but quit very soon. See WU 1056302893 for instance.

Both Linux and OSX do show the divide by zero, as well as at least one Lunatics Windows CPU app, see task 2584521118. The Windows dump there seems to indicate the error happens while doing baseline smoothing.
Joe
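
For illustration only, here is a minimal Python sketch of how a boxcar (moving-average) smoother can hit an integer divide by zero when its window length is zero, which is the value the workunit headers quoted later in this thread contain. This is not the SETI@home or Lunatics code; `boxcar_smooth` and its arguments are hypothetical names chosen for the example.

```python
def boxcar_smooth(samples, boxcar_length):
    """Trailing moving average using integer arithmetic (hypothetical helper)."""
    smoothed = []
    running = 0
    for i, value in enumerate(samples):
        running += value
        # Drop the sample that has just left the window.
        if i >= boxcar_length:
            running -= samples[i - boxcar_length]
        # Integer divide by the window length; a length of 0 faults here.
        smoothed.append(running // boxcar_length)
    return smoothed


if __name__ == "__main__":
    print(boxcar_smooth([2, 4, 6, 8], 2))
    try:
        boxcar_smooth([2, 4, 6, 8], 0)
    except ZeroDivisionError as err:
        print("divide by zero:", err)
```

An application that checks the window length before dividing would skip the fault, which may be one reason different builds behave differently on the same workunit.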
ID: 1278294 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1278297 - Posted: 31 Aug 2012, 16:26:45 UTC
Last modified: 31 Aug 2012, 16:39:49 UTC

Out of the 4 I have processed so far, 2 were completed by the stock app.

Workunit: 1056405533
10ja10zz.7361.4578.6.10.227.vlar
Workunit: 1056385201
10ja10zz.7361.897.6.10.78
Workunit: 1056325001 - Completed w/o error by Stock app.
10ja10zz.7591.2942.3.10.103.vlar
Workunit: 1056308870 - Completed w/o error by Stock app.
10ja10zz.7591.1306.3.10.29.vlar

EDIT: Across all of my systems I only found 1 more of these.
Workunit: 1056385156 - Already errored by wingmate on anon app.
10ja10zz.7361.897.6.10.65_1
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1278297 · Report as offensive
Bernd Noessler

Joined: 15 Nov 09
Posts: 99
Credit: 52,635,434
RAC: 0
Germany
Message 1278324 - Posted: 31 Aug 2012, 17:09:48 UTC

This one is from a Windows 7 machine.
It looks like a pre stock 6.03 version and gives a lot of debugging output.

http://setiathome.berkeley.edu/result.php?resultid=2584566724

How can the Windows stock 6.03 clients (CPU) finish the tasks in 3-5 secs without an error or overflow?
I think they catch the math interrupt and finish without a result.

ID: 1278324 · Report as offensive
Profile VQ-2 Ghost
Joined: 18 Jul 02
Posts: 55
Credit: 1,165,715
RAC: 0
United States
Message 1278333 - Posted: 31 Aug 2012, 17:18:58 UTC - in response to Message 1277978.  

It seems that this is happening to a lot of us. There's got to be something wrong with these 10ja10zz vlar units.

Two of my lowly machines started getting these units, which end with a div by zero exception.
ID: 1278333 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1278347 - Posted: 31 Aug 2012, 17:34:20 UTC - in response to Message 1278324.  
Last modified: 31 Aug 2012, 18:09:41 UTC

This one is from a Windows 7 machine.
It looks like a pre stock 6.03 version and gives a lot of debugging output.

http://setiathome.berkeley.edu/result.php?resultid=2584566724

How can the Windows stock 6.03 clients (CPU) finish the tasks in 3-5 secs without an error or overflow?
I think they catch the math interrupt and finish without a result.

These WUs are another case where the polyphase filter method of splitting is in use; the good news is that it now at least ends up with the correct file size.

However, the template being used for analysis_cfg clearly indicates they did not intend to actually deliver these WUs. There are multiple items with non-useful parameters; these 3 lines are enough to account for the observed issues:

<analysis_fft_lengths>0</analysis_fft_lengths>
<bsmooth_boxcar_length>0</bsmooth_boxcar_length>
<bsmooth_chunk_size>0</bsmooth_chunk_size>

A normal WU has:

<analysis_fft_lengths>262136</analysis_fft_lengths>
<bsmooth_boxcar_length>8192</bsmooth_boxcar_length>
<bsmooth_chunk_size>32768</bsmooth_chunk_size>


If an application doesn't error on the baseline smoothing of a 10ja10zz, it won't do any analysis since there are no FFT lengths defined.
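
A rough Python sketch of the point above, for anyone who wants to check a workunit header themselves. The `<analysis_cfg>` wrapper element and the helper names are assumptions made for the example, and treating `analysis_fft_lengths` as a bitmask with one bit per power-of-two FFT length is also an assumption (it is at least consistent with 262136, which has bits 3 through 17 set). This is not how the applications themselves read the header.

```python
import xml.etree.ElementTree as ET

# The three fields quoted above; the enclosing element is assumed.
FIELDS = ("analysis_fft_lengths", "bsmooth_boxcar_length", "bsmooth_chunk_size")

BAD_CFG = """<analysis_cfg>
  <analysis_fft_lengths>0</analysis_fft_lengths>
  <bsmooth_boxcar_length>0</bsmooth_boxcar_length>
  <bsmooth_chunk_size>0</bsmooth_chunk_size>
</analysis_cfg>"""

GOOD_CFG = """<analysis_cfg>
  <analysis_fft_lengths>262136</analysis_fft_lengths>
  <bsmooth_boxcar_length>8192</bsmooth_boxcar_length>
  <bsmooth_chunk_size>32768</bsmooth_chunk_size>
</analysis_cfg>"""


def zero_fields(cfg_xml):
    """Return the names of the checked fields that are missing or zero."""
    root = ET.fromstring(cfg_xml)
    return [name for name in FIELDS
            if int(root.findtext(name, default="0")) == 0]


def enabled_fft_lengths(mask):
    """Assumed bitmask decoding: bit n set means FFT length 2**n is analysed."""
    return [1 << bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


if __name__ == "__main__":
    print(zero_fields(BAD_CFG))          # all three fields flagged
    print(zero_fields(GOOD_CFG))         # nothing flagged
    print(enabled_fft_lengths(0))        # no FFT lengths, so nothing to analyse
    print(enabled_fft_lengths(262136))   # powers of two from 8 upward
```

Under that bitmask assumption, a value of 0 leaves an empty list of FFT lengths, which matches the "won't do any analysis" behaviour described above.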

{edit:} I did fire off an email to the staff.
Joe
ID: 1278347 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1278384 - Posted: 31 Aug 2012, 18:17:38 UTC

The 'tape' ID of zz does seem to indicate this is something odd, unless they really did have 676 'tapes' that day.
Perhaps this is an attempt to get at some of the old data that was unsplittable before.
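
The 676 figure is just 26 x 26. A small Python sketch of the arithmetic, assuming the two-letter suffix is handed out in order from aa to zz (that ordering is an assumption, and `suffix_ordinal` is a made-up helper name):

```python
from string import ascii_lowercase


def suffix_ordinal(suffix):
    """1-based position of a two-letter suffix in the sequence aa, ab, ..., zz."""
    first, second = (ascii_lowercase.index(ch) for ch in suffix)
    return first * 26 + second + 1


if __name__ == "__main__":
    print(suffix_ordinal("aa"))
    print(suffix_ordinal("zz"))
```

With that ordering, zz comes out as the 676th and last possible suffix for a given day.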
ID: 1278384 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1279650 - Posted: 3 Sep 2012, 14:55:24 UTC



The 6 I received all ended in computation error (because I'm on 'anonymous platform'??), so I used a modified vlar resend process to delete them.

I already had NNT set so they never had a chance!

I can do this until they expire...:)


Lt

ID: 1279650 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1279714 - Posted: 3 Sep 2012, 17:35:10 UTC - in response to Message 1279650.  

The 6 I received all ended in computation error (because I'm on 'anonymous platform'??), so I used a modified vlar resend process to delete them.

Why go to all this effort? If they error out, they do so within a few seconds...
ID: 1279714 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1279734 - Posted: 3 Sep 2012, 18:19:06 UTC - in response to Message 1279650.  

The 6 I received all ended in computation error (because I'm on 'anonymous platform'??), so I used a modified vlar resend process to delete them.

Either way they'll end up being regarded as an error; you could get them resent to the stock app and complete them that way.

Claggy
ID: 1279734 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1279765 - Posted: 3 Sep 2012, 19:53:00 UTC - in response to Message 1279714.  

The 6 I received all ended in computation error (because I'm on 'anonymous platform'??), so I used a modified vlar resend process to delete them.

Why go to all this effort? If they error out, they do so within a few seconds...

Sorry, but the ones that I had on my video cards were running between 5 and 10 min with no progress before I aborted them.

Cheers.
ID: 1279765 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1279795 - Posted: 3 Sep 2012, 22:04:21 UTC - in response to Message 1279765.  

The 6 I received all ended in computation error (because I'm on 'anonymous platform'??), so I used a modified vlar resend process to delete them.

Why go to all this effort? If they error out, they do so within a few seconds...

Sorry, but the ones that I had on my video cards were running between 5 and 10 min with no progress before I aborted them.

Yeah, that would be my way of solving that problem too (if I needed to do so), but "modified vlar resend process"?
ID: 1279795 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1279823 - Posted: 3 Sep 2012, 23:00:46 UTC - in response to Message 1279734.  

The 6 I received all ended in computation error (because I'm on 'anonymous platform'??), so I used a modified vlar resend process to delete them.

Either way they'll end up being regarded as an error; you could get them resent to the stock app and complete them that way.

Claggy


They were CPU WUs that all ended with Integer Divide by Zero faults. The result texts show no CPU seconds and no run times.

My Lunatics CPU app is AK_v8b2_win_SSE41.exe.

The files were resent, but they did not reappear in client_state.xml, so I deleted the files a second time. I have made another manual update since then, with no more activity from the servers. They will time out on 9/8.

Because they don't seem to be 'valid' workunits, I would rather they all time out than get computation errors.


Lt


ID: 1279823 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1279982 - Posted: 4 Sep 2012, 14:07:12 UTC - in response to Message 1279823.  

The 6 I received all ended in computation error (because I'm on 'anonymous platform'??), so I used a modified vlar resend process to delete them.

Either way they'll end up being regarded as an error; you could get them resent to the stock app and complete them that way.

Claggy


They were CPU WUs that all ended with Integer Divide by Zero faults. The result texts show no CPU seconds and no run times.

My Lunatics CPU app is AK_v8b2_win_SSE41.exe.

The files were resent, but they did not reappear in client_state.xml, so I deleted the files a second time. I have made another manual update since then, with no more activity from the servers. They will time out on 9/8.

Because they don't seem to be 'valid' workunits, I would rather they all time out than get computation errors.

Lt


Simply aborting them would have been too difficult? Then they would have been sent to another host instead of waiting 6-8 weeks to time out and then being sent to another host.
ID: 1279982 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1280038 - Posted: 4 Sep 2012, 21:37:31 UTC - in response to Message 1279982.  

Simply aborting them would have been too difficult? Then they would have been sent to another host instead of waiting 6-8 weeks to time out and then being sent to another host.



Yes, I would have aborted them if I had seen them before they errored.

Anyway, they were put on a very short deadline after being resent and timed out a few hours later.


Lt
ID: 1280038 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1280172 - Posted: 5 Sep 2012, 7:09:34 UTC - in response to Message 1280038.  

Simply aborting them would have been too difficult? Then they would have been sent to another host instead of waiting 6-8 weeks to time out and then being sent to another host.



Yes, I would have aborted them if I had seen them before they errored.

And what's wrong with reporting them as errors? I mean, that would have happened automatically, with no action from the user required...
ID: 1280172 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1280337 - Posted: 5 Sep 2012, 17:28:07 UTC
Last modified: 5 Sep 2012, 22:46:12 UTC

In a word: oops.

This is a test tape I generated with completely artificial data for testing/calibration purposes. It seems the splitter didn't like it, and obviously we sent some garbage through the whole system. Sorry about that.

We will tweak a couple of things and send this file out again. Please process them normally, as you'll get normal credit, etc.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1280337 · Report as offensive
Bernd Noessler

Joined: 15 Nov 09
Posts: 99
Credit: 52,635,434
RAC: 0
Germany
Message 1280492 - Posted: 6 Sep 2012, 4:36:41 UTC

Thanks for the info.

ID: 1280492 · Report as offensive
Profile shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1280523 - Posted: 6 Sep 2012, 9:32:34 UTC - in response to Message 1280337.  

This is a test tape I generated...


So that's why they're called JAZZ?:)
ID: 1280523 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.