Error Uploadserver

Message boards : Number crunching : Error Uploadserver

Skywalker66 @ Berlin
Joined: 31 Jan 01
Posts: 78
Credit: 27,692,349
RAC: 0
Germany
Message 1017925 - Posted: 20 Jul 2010, 13:52:14 UTC
Last modified: 20 Jul 2010, 13:57:54 UTC

I have a new error that I've never seen before:

20.07.2010 15:49:53 SETI@home [error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/2b9/04jn10aa.3527.18886.15.10.105_2_0: Read-only file system
20.07.2010 15:49:53 SETI@home Temporarily failed upload of 04jn10aa.3527.18886.15.10.105_2_0: transient upload error
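The log pairs a server-side failure (the upload server could not open the destination file because its file system was read-only) with a client-side "transient upload error", meaning the client keeps the result and retries later. Below is a minimal Python sketch of how an upload handler might draw that line. The set of errno values treated as transient and both function names are hypothetical; this is not the BOINC file upload handler's actual code.

```python
import errno
import os

# Assumption (hypothetical policy): errors an operator can clear without the
# client doing anything (a read-only remount, a full disk, an I/O error) are
# reported as transient, so the client keeps the result file and retries.
TRANSIENT_ERRNOS = {errno.EROFS, errno.ENOSPC, errno.EIO}


def classify_open_error(exc: OSError) -> str:
    """Return 'transient' or 'permanent' for a failed open() of an upload target."""
    return "transient" if exc.errno in TRANSIENT_ERRNOS else "permanent"


def server_error_message(path: str, exc: OSError) -> str:
    """Build a message in the same shape as the one the clients logged."""
    return f"can't open file {path}: {exc.strerror}"


if __name__ == "__main__":
    # Simulate what open(path, "wb") raises on a file system remounted read-only.
    path = "/home/boincadm/projects/sah/upload/2b9/04jn10aa.3527.18886.15.10.105_2_0"
    err = OSError(errno.EROFS, os.strerror(errno.EROFS), path)
    print(server_error_message(path, err))  # strerror text comes from the OS
    print(classify_open_error(err))         # transient
```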
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1017933 - Posted: 20 Jul 2010, 14:16:53 UTC - in response to Message 1017925.  

Yes, new one for me too. I saw a brief total network outage (uploads, message boards, status page) starting around 13:20 UTC (I was doing shorties at the time):

20-Jul-2010 14:18:31 [SETI@home] Finished upload
20-Jul-2010 14:21:01 [SETI@home] Started upload
20-Jul-2010 14:21:24 [SETI@home] Temporarily failed upload

The network came back quite quickly, but all subsequent attempts show that error:

20-Jul-2010 14:26:40 [SETI@home] [error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/31e/19my10aa.29386.9479.7.10.208_0_0: Read-only file system

Unfortunately, it accepts the upload of the whole file before discovering the error, so the cricket graphs show high inbound activity.
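One way to avoid pulling in the whole file before failing is to check the destination before reading the request body. This is a sketch under stated assumptions: a POSIX server, Python's `os.statvfs` to read the mount's `ST_RDONLY` flag, and a hypothetical `accept_upload` handler in which `read_body` stands in for receiving the file over the network. It is not how the SETI@home upload server is written.

```python
import os


def filesystem_is_read_only(path: str) -> bool:
    """True if the file system holding `path` is mounted read-only (POSIX only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def accept_upload(upload_dir: str, name: str, read_body) -> str:
    """Hypothetical handler that checks the destination before reading the body.

    `read_body` stands in for pulling the file off the network; it is only
    called once we know the data has somewhere to go.
    """
    if filesystem_is_read_only(upload_dir):
        return "transient error: Read-only file system"  # body never read
    data = read_body()
    with open(os.path.join(upload_dir, name), "wb") as f:
        f.write(data)
    return "ok"
```

The pre-check only narrows the window: the file system can still go read-only mid-transfer, so the handler must still treat a failing `open()` (errno `EROFS`) as a transient error.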

I'll PM Jeff and ask if he can fix it before the total shutdown for the outage.
Skywalker66 @ Berlin
Joined: 31 Jan 01
Posts: 78
Credit: 27,692,349
RAC: 0
Germany
Message 1017937 - Posted: 20 Jul 2010, 14:35:51 UTC

Thanks, Richard!!!!
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1017939 - Posted: 20 Jul 2010, 14:44:18 UTC

Extra information (I pointed Jeff to this thread, so any more useful information should catch his eye here):

Beta uploads are still running fine, it's only the main project which is affected.
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1017940 - Posted: 20 Jul 2010, 14:47:29 UTC
Last modified: 20 Jul 2010, 14:53:49 UTC

I'm getting the same error. I rebooted just before noticing the error. I aborted the first one in the transfers tab and then aborted it in the tasks tab. The next WU to finish has also hung with this message. Let us know what happens.

OK, one more thing: it's happening on both CPU and GPU work. The GPU WU was dated 04jn10aa and the CPU failure was dated 06jn10aa, if that helps any.


PROUD MEMBER OF Team Starfire World BOINC
Lint trap
Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1017944 - Posted: 20 Jul 2010, 15:02:12 UTC


Yep, suspending network activity...

Martin
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1017946 - Posted: 20 Jul 2010, 15:14:54 UTC - in response to Message 1017944.  
Last modified: 20 Jul 2010, 15:20:50 UTC

I guess somebody is working on it; I'm getting the "projects may be temporarily down" message now. I wish they had waited a few more minutes, though: I just finished a CPU task from 05no09ad and would have liked to see whether it might just be the newer WUs going bad.

Edit: Just noticed I'm getting validate errors now too. Probably related.


PROUD MEMBER OF Team Starfire World BOINC
Lint trap
Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1017947 - Posted: 20 Jul 2010, 15:27:12 UTC - in response to Message 1017946.  

My first error occurred:

7/20/2010 10:33:51 AM	SETI@home	Started upload of 05jn10aa.18791.65145.15.10.65_0_0
7/20/2010 10:33:54 AM	SETI@home	[error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/6b/05jn10aa.18791.65145.15.10.65_0_0: Read-only file system


So I'll just leave the network suspended now. It was going to "auto"-suspend at noon EST anyway...

No big loss. I'll just have one extra WU to report on Friday/Saturday, whenever I can...

Martin
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1017949 - Posted: 20 Jul 2010, 15:31:17 UTC - in response to Message 1017946.  

Yeah, Server Status page shows all red and orange except for the Databases and webpages.

Probably figured the upload server would not be back online before 0900 PDT (1600 UTC) so they started the shutdown early.

I had 2 short VHARs trying to upload, guess they'll have to wait until Friday.

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1017951 - Posted: 20 Jul 2010, 15:31:35 UTC - in response to Message 1017946.  
Last modified: 20 Jul 2010, 15:35:26 UTC

But all of a sudden it appears to be online, at least the forums. I hope I can post this message :)
Well, then it wasn't offline yet......
If the outage begins, I've still got work: a bunch of shorties and some AP tasks.
Well, just wait and see.
Nemesis
Joined: 14 Mar 07
Posts: 129
Credit: 31,295,655
RAC: 0
Canada
Message 1017954 - Posted: 20 Jul 2010, 15:33:22 UTC
Last modified: 20 Jul 2010, 15:38:36 UTC

I sure hope it's being worked on and that the problem with validation errors can be corrected. I'm well over 100 "invalid" WUs now from all 3 of my crunchers, all stuff that was supposedly uploaded today, July 20th. If it were only one box I would assume it was my problem, but since it's happening on all 3 of my crunchers, it must be something wrong on the SETI side of things.

Now that the weekly outage has started at least I won't be uploading any more trouble.
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1017961 - Posted: 20 Jul 2010, 16:11:14 UTC - in response to Message 1017954.  

I've noticed some WUs which have this message:

<core_client_version>6.10.56</core_client_version>
<![CDATA[
<message>
- exit code -529697949 (0xe06d7363)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce GTS 250

19no09ac.11796.885.9.10.220
application SETI@home Enhanced
created 13 Jul 2010 10:36:54 UTC

Error code: -529697949 (0xffffffffe06d7363)

And a lot of "client detached" messages, some 40 tasks. I wonder what caused this, as I have no clue.
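The two error codes above are the same 32-bit value: -529697949 is 0xE06D7363 read as a signed integer, and 0xffffffffe06d7363 is that value sign-extended to 64 bits. 0xE06D7363 is the exception code Windows uses for C++ exceptions thrown by Microsoft Visual C++ code (its low three bytes spell "msc" in ASCII), so it most likely indicates an unhandled C++ exception inside the application; that reading is an interpretation, not something the thread establishes. A small Python sketch of the conversions:

```python
def as_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned bit pattern."""
    return code & 0xFFFF_FFFF


def as_signed32(code: int) -> int:
    """Reinterpret a 32-bit bit pattern as a signed (two's complement) integer."""
    code &= 0xFFFF_FFFF
    return code - 0x1_0000_0000 if code & 0x8000_0000 else code


exit_code = -529697949
print(hex(as_unsigned32(exit_code)))           # 0xe06d7363
print(hex(exit_code & 0xFFFF_FFFF_FFFF_FFFF))  # 0xffffffffe06d7363 (sign-extended to 64 bits)
print(as_signed32(0xE06D7363))                 # -529697949
# The low three bytes of the code are ASCII: 0x6d 'm', 0x73 's', 0x63 'c'
print(as_unsigned32(exit_code).to_bytes(4, "big")[1:].decode("ascii"))  # msc
```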
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1017998 - Posted: 20 Jul 2010, 23:49:28 UTC
Last modified: 20 Jul 2010, 23:52:33 UTC

My errors for the same period
20/07/2010 23:17:31 SETI@home Started upload of 03jn10aa.8800.8247.11.10.84_0_0
20/07/2010 23:17:31 SETI@home Started upload of 03jn10aa.8800.8247.11.10.83_0_0
20/07/2010 23:17:37 SETI@home [error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/c/03jn10aa.8800.8247.11.10.84_0_0: Read-only file system
20/07/2010 23:17:37 SETI@home [error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/30e/03jn10aa.8800.8247.11.10.83_0_0: Read-only file system
20/07/2010 23:17:37 SETI@home Temporarily failed upload of 03jn10aa.8800.8247.11.10.84_0_0: transient upload error
20/07/2010 23:17:37 SETI@home Backing off 1 min 0 sec on upload of 03jn10aa.8800.8247.11.10.84_0_0
20/07/2010 23:17:37 SETI@home Temporarily failed upload of 03jn10aa.8800.8247.11.10.83_0_0: transient upload error
20/07/2010 23:17:37 SETI@home Backing off 1 min 0 sec on upload of 03jn10aa.8800.8247.11.10.83_0_0

Similar to the others, but with no mix-up of the file names. Times are in Australian Central Standard Time (UTC + 9.5 hours), about 1 minute after the problem started.
All machines were showing this error and the Server Status page showed all SAH Validators in Red - Everything else was green
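Converting the local log timestamps to UTC lines them up with the UTC times used elsewhere in the thread. A small sketch with fixed offsets: ACST as UTC+9.5 is stated above, while treating Skywalker66's German-format log as Berlin summer time (CEST, UTC+2) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_utc(stamp: str, fmt: str, utc_offset_hours: float) -> datetime:
    """Parse a local log timestamp with a fixed UTC offset and convert it to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.strptime(stamp, fmt).replace(tzinfo=tz).astimezone(timezone.utc)


# T.A.'s client log, Australian Central Standard Time (UTC+9.5)
print(to_utc("20/07/2010 23:17:37", "%d/%m/%Y %H:%M:%S", 9.5))  # 2010-07-20 13:47:37+00:00
# Skywalker66's log, assuming Berlin summer time (UTC+2)
print(to_utc("20.07.2010 15:49:53", "%d.%m.%Y %H:%M:%S", 2))    # 2010-07-20 13:49:53+00:00
```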

T.A.
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1018088 - Posted: 21 Jul 2010, 6:40:49 UTC - in response to Message 1017998.  
Last modified: 21 Jul 2010, 7:01:32 UTC

In Pappa's sticky thread "Extended Outage July 20 2010 - Problems", he mentions that there was a BOINC database crash this morning shortly before the shutdown. That could explain many of the error messages cited in this thread.

I suspect getting the Database back online was one of the first tasks after the shutdown. Hope Jeff or Papa will give us an update on Wednesday, but if not, I can wait until Thursday or Friday.

Until then, crunch 'em if you got 'em.
Hellsheep
Volunteer tester
Joined: 12 Sep 08
Posts: 428
Credit: 784,780
RAC: 0
Australia
Message 1018370 - Posted: 22 Jul 2010, 6:25:08 UTC

Just to clarify: as far as I am aware, the way web servers and file systems work, after a crash or a serious error the system reboots in read-only mode. Usually an fsck (file system check) is also run on the server automatically.

It would seem the server encountered an error, and was either rebooted or rebooted itself into read-only mode to prevent any further issues. :)
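On Linux, a closely related mechanism is the ext3/ext4 `errors=remount-ro` mount option: when the file system detects an error, the kernel remounts it read-only (no reboot needed), and the current state is visible in `/proc/mounts`, whose fourth field lists the mount options. A minimal sketch that lists read-only mount points from text in that format; the sample lines are invented for illustration and are not taken from the SETI@home servers.

```python
def read_only_mounts(mounts_text: str) -> list[str]:
    """Return mount points whose option list contains 'ro' (/proc/mounts format)."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:  # exact option match, so "errors=remount-ro" alone does not count
            result.append(mount_point)
    return result


# Hypothetical sample: the root fs is fine, /home has been remounted read-only.
sample = """\
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/sdb1 /home ext3 ro,errors=remount-ro 0 0
"""
print(read_only_mounts(sample))  # ['/home']
```

On a live Linux machine the same function can be fed the contents of `/proc/mounts`.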

(Good thing web servers and servers are my specialty.) ;)
- Jarryd
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1018373 - Posted: 22 Jul 2010, 6:43:41 UTC - in response to Message 1018370.  

As I said in Pappa's thread yesterday, that makes a lot more sense than his off-the-cuff remark about a BOINC database crash. I don't know much about web or general *nix servers, but I do know a bit about databases - and if the early outage was invoked by staff because of database problems, then they would have been the result of the spontaneous reboot, not the original cause. Different symptoms entirely.
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1018455 - Posted: 22 Jul 2010, 16:24:45 UTC - in response to Message 1018373.  

Richard, if I understand this right, you saw indications of a short power / Internet access interruption, which may have caused the upload/download servers (and maybe others) to reboot in read-only mode, which then caused the master BOINC database to crash. With ALL that chaos, they shut everything down and did a full restart.

That DOES make a whole lot more sense than just a Database crash.


Hellsheep
Volunteer tester
Joined: 12 Sep 08
Posts: 428
Credit: 784,780
RAC: 0
Australia
Message 1018585 - Posted: 23 Jul 2010, 4:03:56 UTC - in response to Message 1018455.  

100% correct, a power outage or surge would cause the servers to reboot in read only mode due to it thinking it was a possible hardware failure. :)

Database probably did crash, but only as a result of it being in read only mode and unable to write anything. :)
- Jarryd

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.