Panic Mode On (85) Server Problems?

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1452900 - Posted: 11 Dec 2013, 19:02:16 UTC - in response to Message 1452890.  
Last modified: 11 Dec 2013, 19:07:32 UTC

All have

  <error_code>-119</error_code>
  <error_message>MD5 check failed</error_message>

I had a feeling there were a lot more of those out there.
I just now finally received a new download on my machine that was graced with one of those errors.
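For anyone who wants to confirm a suspect download by hand, here is a minimal Python sketch of the kind of check behind the "MD5 check failed" message: hash the file that arrived and compare it with the checksum the server advertised. The function names, and the idea of passing the expected checksum in as a plain string, are assumptions for illustration; this is not the BOINC client's own code.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_md5):
    """Compare the local file's MD5 with the expected value (hypothetical helper)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A file that was only partly written on the server would hash to something other than the advertised value, so `download_is_intact` would return `False` on every host that fetched it.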
ID: 1452900 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1452919 - Posted: 11 Dec 2013, 19:21:36 UTC - in response to Message 1452900.  

All have

  <error_code>-119</error_code>
  <error_message>MD5 check failed</error_message>

I had a feeling there were a lot more of those out there.
I just now finally received a new download on my machine that was graced with one of those errors.

Feel like identifying the task/WU, so the staff can find a common pattern?

My laptop has received more than 20 new tasks since the server came back online (mostly MB shorties), without error.

I checked Mike's -131, and it came from exactly the same batch (22jl08aa.7244.207012.438086664195.12.xxx) as the one I was told about privately.
ID: 1452919 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1452923 - Posted: 11 Dec 2013, 19:26:38 UTC - in response to Message 1452890.  

Yep, I got one of those also, WU #1376247105. Have successfully gotten quite a few other AP tasks this morning, but so far that was the only one attempted from the 01dc13ac file.
ID: 1452923 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1452933 - Posted: 11 Dec 2013, 19:45:26 UTC - in response to Message 1452919.  
Last modified: 11 Dec 2013, 19:47:51 UTC

It's the one from the above post, Message 1452823

Workunit 1376247104
Task        Computer  Sent                       Time reported              Status                   Run time  CPU time  Credit  Application
3278388381  6796479   11 Dec 2013, 4:55:36 UTC   11 Dec 2013, 17:13:24 UTC  Error while downloading  0.00      0.00      ---     AstroPulse v6 Anonymous platform (ATI GPU)
3278388382  1504137   11 Dec 2013, 4:55:36 UTC   11 Dec 2013, 18:55:02 UTC  Error while downloading  0.00      0.00      ---     AstroPulse v6 Anonymous platform (NVIDIA GPU)
3278461149  7136250   11 Dec 2013, 17:13:26 UTC  5 Jan 2014, 17:13:26 UTC   In progress              ---       ---       ---     AstroPulse v6 v6.04 (opencl_nvidia_100)
3278609646  5618795   11 Dec 2013, 18:55:06 UTC  11 Dec 2013, 19:00:14 UTC  Error while downloading  0.00      0.00      ---     AstroPulse v6 v6.04 (opencl_nvidia_100)
3278617197  5915829   11 Dec 2013, 19:00:20 UTC  11 Dec 2013, 19:05:26 UTC  Error while downloading  0.00      0.00      ---     AstroPulse v6 v6.01
3278625745  7080783   11 Dec 2013, 19:05:34 UTC  5 Jan 2014, 19:05:34 UTC   In progress              ---       ---       ---     AstroPulse v6 v6.01
ID: 1452933 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1452950 - Posted: 11 Dec 2013, 20:19:10 UTC - in response to Message 1452933.  
Last modified: 11 Dec 2013, 20:22:31 UTC

Thanks. I've passed the message on.

Edit - Eric says:

They were probably being written when georgem died last night. I'll look for anything with odd sizes.
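A hunt for "odd sizes" could be sketched roughly as below in Python. It assumes the files in a batch are normally all about the same size, takes the median as the typical size, and flags anything further away than a tolerance. The function name, the 5% tolerance, and the name-to-size mapping are all assumptions for illustration; nothing here is taken from the lab's actual tooling.

```python
from statistics import median


def odd_sized(sizes, tolerance=0.05):
    """Return the names whose size deviates from the median by more than `tolerance`.

    `sizes` maps a file name to its size in bytes (hypothetical input format).
    """
    if not sizes:
        return []
    typical = median(sizes.values())
    return sorted(
        name
        for name, size in sizes.items()
        if abs(size - typical) > tolerance * typical
    )
```

On a real directory the mapping could be built with `os.path.getsize`; a file cut short by a crash would stand out as smaller than its neighbours.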
ID: 1452950 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1452977 - Posted: 11 Dec 2013, 21:08:46 UTC - in response to Message 1452845.  

One of my units failed with this error.
Long time no see.

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>22jl08aa.7244.207012.438086664195.12.53_4_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>

That appears to be a problem in the lab.

I had one reported to me privately this morning: now the download servers are working, I've checked the datafile, and it's 411KB in size (instead of the normal 367KB). The extra size is in the <workunit_header> - there are 412 occurrences of <coordinate_t> - and the header alone is 66KB. That's what makes the result file greater than the 64K max_nbytes allowed, and hence your ERR_FILE_TOO_BIG.
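The header check described above can be approximated with a short Python sketch. It assumes the data file can be read as raw bytes and that the header is delimited by literal `<workunit_header>` and `</workunit_header>` tags. The function name and the returned dictionary keys are hypothetical; the 64K limit is the max_nbytes figure quoted in the post.

```python
import re

MAX_NBYTES = 64 * 1024  # the 64K max_nbytes limit mentioned in the post


def inspect_workunit(raw: bytes) -> dict:
    """Measure the <workunit_header> block and count its <coordinate_t> entries."""
    match = re.search(rb"<workunit_header>.*?</workunit_header>", raw, re.DOTALL)
    header = match.group(0) if match else b""
    return {
        "header_bytes": len(header),
        "coordinate_count": header.count(b"<coordinate_t>"),
        # If the header alone is over the limit, a result file that carries
        # the header cannot fit under it either.
        "header_exceeds_limit": len(header) > MAX_NBYTES,
    }
```

Usage would be along the lines of `inspect_workunit(open(path, "rb").read())`, where `path` is the downloaded data file.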

I've emailed the lab, and Eric has acknowledged receipt: they're looking into it.


What I find annoying is that our consecutive valid tasks will be reset for something the servers are responsible for.



With each crime and every kindness we birth our future.
ID: 1452977 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1452991 - Posted: 11 Dec 2013, 21:24:16 UTC - in response to Message 1452977.  

One of my units failed with this error.
Long time no see.

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>22jl08aa.7244.207012.438086664195.12.53_4_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>

That appears to be a problem in the lab.

I had one reported to me privately this morning: now the download servers are working, I've checked the datafile, and it's 411KB in size (instead of the normal 367KB). The extra size is in the <workunit_header> - there are 412 occurrences of <coordinate_t> - and the header alone is 66KB. That's what makes the result file greater than the 64K max_nbytes allowed, and hence your ERR_FILE_TOO_BIG.

I've emailed the lab, and Eric has acknowledged receipt: they're looking into it.

What I find annoying is that our consecutive valid tasks will be reset for something the servers are responsible for.

I don't hold much stock in the application numbers. As an example.
Number of tasks completed 6469
Consecutive valid tasks 6890

More valid tasks than complete... O.o
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1452991 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1453008 - Posted: 11 Dec 2013, 22:06:41 UTC - in response to Message 1452977.  

What I find annoying is that our consecutive valid tasks will be reset for something the servers are responsible for.

What I find annoying is that after half a day of NO downloads my libeled host is hit with a daily quota of less than HALF of its daily output.
It makes it difficult to build the cache back...
ID: 1453008 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1453011 - Posted: 11 Dec 2013, 22:13:08 UTC - in response to Message 1452991.  

I don't hold much stock in the application numbers. As an example.
Number of tasks completed 6469
Consecutive valid tasks 6890

More valid tasks than complete... O.o

This happens because a completed task has a set of requirements in order to count. -9 overflows don't count. I don't know what other criteria MB uses for that. With AP, a task only counts for "number of tasks completed" if it was less than 10% blanked and did not 30/30 exit.
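As a rough illustration of how the two counters can drift apart, here is a small Python sketch of the AP rule as stated above: a task only counts as "completed" if it was less than 10% blanked and did not 30/30 exit. The `ApResult` class, its field names, and the `tally` helper are hypothetical; the server's real bookkeeping may well differ.

```python
from dataclasses import dataclass


@dataclass
class ApResult:
    valid: bool              # result passed validation
    blanked_fraction: float  # 0.0 to 1.0, share of the data that was blanked
    exited_30_30: bool       # task ended early with a 30/30 exit


def counts_as_completed(result: ApResult) -> bool:
    """Apply the rule from the post: under 10% blanked and no 30/30 exit."""
    return result.blanked_fraction < 0.10 and not result.exited_30_30


def tally(results):
    """Return (valid_count, completed_count) for a list of results."""
    valid = sum(1 for r in results if r.valid)
    completed = sum(1 for r in results if r.valid and counts_as_completed(r))
    return valid, completed
```

With three valid results, one heavily blanked and one a 30/30 exit, `tally` gives 3 valid but only 1 completed, which is the same kind of gap as in the numbers quoted.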
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1453011 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1453034 - Posted: 11 Dec 2013, 23:12:28 UTC

Have a look at:

http://setiathome.berkeley.edu/workunit.php?wuid=1376247109

This one has already been classified as "error, may have bug", even though no one has actually tried to crunch it yet!!
ID: 1453034 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1453097 - Posted: 12 Dec 2013, 4:27:38 UTC - in response to Message 1452923.  

Yep, I got one of those also, WU #1376247105. Have successfully gotten quite a few other AP tasks this morning, but so far that was the only one attempted from the 01dc13ac file.

Looks like I had another one this afternoon (about 4 hours ago), WU #1376247102, in case anybody's still keeping a log of these. This came in on a different one of my machines, but it looks like it was split from the same file as the first one. Since then, though, I've successfully received several AP tasks split from 01dc13ac.
ID: 1453097 · Report as offensive
Thomas
Volunteer tester

Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1453310 - Posted: 12 Dec 2013, 18:59:08 UTC

Maintenance...


ID: 1453310 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1453315 - Posted: 12 Dec 2013, 19:07:56 UTC

SSP not updating....
Wonder if georgem crashed again.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1453315 · Report as offensive
Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1453327 - Posted: 12 Dec 2013, 19:50:07 UTC

I think there has been little to complain about server-wise over the last few months, so two crashes in one week is forgivable...

Fingers crossed this isn't a bad omen of things to come *searching for the duct tape again*
ID: 1453327 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1453328 - Posted: 12 Dec 2013, 20:03:09 UTC - in response to Message 1453315.  

SSP not updating....
Wonder if georgem crashed again.


Looks like it. Georgem now disabled, and everything else running fine with downloads coming through OK.

Sorry to hear about Squirrel.
ID: 1453328 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1453329 - Posted: 12 Dec 2013, 20:03:59 UTC
Last modified: 12 Dec 2013, 20:10:06 UTC

I'm connecting now and got work, Server Status Page is updating. GeorgeM is disabled for downloads but is running AP splitters, so not sure what's going on. Some downloads had a problem but did finish. Not sure I'd call this a crash since the message

219 SETI@home 12/12/2013 12:39:55 PM Project is temporarily shut down for maintenance
was being sent out - looks like an intentional shutdown to me.

Anyone with only one download server in a hosts file may need to modify that setting.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1453329 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1453490 - Posted: 13 Dec 2013, 6:57:14 UTC - in response to Message 1453328.  

SSP not updating....
Wonder if georgem crashed again.


Looks like it. Georgem now disabled, and everything else running fine with downloads coming through OK.

Sorry to hear about Squirrel.

Thanks for the thoughts on Squirrel.

Eric replied that georgem had indeed crashed again, and Matt has switched some data transfers to another machine to try to make georgem more stable.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1453490 · Report as offensive
Thomas
Volunteer tester

Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1453493 - Posted: 13 Dec 2013, 7:12:40 UTC - in response to Message 1453490.  

Eric replied that georgem had indeed crashed again, and Matt has switched some data transfers to another machine to try to make georgem more stable.

Thanks for the heads-up, Mark!
ID: 1453493 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1453494 - Posted: 13 Dec 2013, 7:13:44 UTC

GeorgeM isn't that old. Is it hard drive problems? Or too much I/O?

Old James
ID: 1453494 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1453540 - Posted: 13 Dec 2013, 14:26:31 UTC - in response to Message 1452820.  

One of my units failed with this error.
Long time no see.

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>22jl08aa.7244.207012.438086664195.12.53_4_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>


I've got 3 of those too. Every host on every one of them reports the same error. One has already reached "too many errors" status and the others will soon.

If I had time, I'd remote into my machine and abort any more of that series that it may have.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1453540 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.