Panic Mode On (17) Server problems

Message boards : Number crunching : Panic Mode On (17) Server problems
Chelski
Joined: 3 Jan 00
Posts: 121
Credit: 8,979,050
RAC: 0
Malaysia
Message 910738 - Posted: 24 Jun 2009, 14:36:24 UTC

I have a different problem. I was assigned work, but the files just refuse to download properly. Is anyone else seeing this wrong size error? Is the server sending an erroneous signal to BOINC that the download is complete when in reality nothing was transferred?

6/24/2009 10:34:03 PM SETI@home [error] File 06ap09ab.20174.20931.14.8.253 has wrong size: expected 375338, got 0

ID: 910738 · Report as offensive
cliff west
Joined: 7 May 01
Posts: 211
Credit: 16,180,728
RAC: 15
United States
Message 910742 - Posted: 24 Jun 2009, 14:49:08 UTC

I have 6 trying to download since last night, and I'm out of CUDA work. I have my cache set for 6 days; maybe I need to push it to the max.
ID: 910742 · Report as offensive
Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 910743 - Posted: 24 Jun 2009, 14:49:34 UTC - in response to Message 910738.  
Last modified: 24 Jun 2009, 14:50:58 UTC

I have a different problem. I was assigned work, but the files just refuse to download properly. Is anyone else seeing this wrong size error? Is the server sending an erroneous signal to BOINC that the download is complete when in reality nothing was transferred?

6/24/2009 10:34:03 PM SETI@home [error] File 06ap09ab.20174.20931.14.8.253 has wrong size: expected 375338, got 0


06ap09ab.20174.20931.14.8.253 has wrong size: expected 375338, got 0

This means the server tried to send the file, but you did not receive any of it.
BOINC will keep trying until you do receive it.
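
If you want to pick these errors out of your own message log, here is a minimal Python sketch. It assumes the message wording matches the log line quoted in this thread; other client versions may word it differently, and the function name `parse_wrong_size` is just an illustrative choice, not part of BOINC.

```python
import re

# Pattern for the "wrong size" message as it appears in the log line quoted
# in this thread. The wording in other client versions is an unknown here.
WRONG_SIZE = re.compile(
    r"File (?P<name>\S+) has wrong size: "
    r"expected (?P<expected>\d+), got (?P<got>\d+)"
)


def parse_wrong_size(line):
    """Return (filename, expected_bytes, received_bytes), or None if no match."""
    match = WRONG_SIZE.search(line)
    if match is None:
        return None
    return match.group("name"), int(match.group("expected")), int(match.group("got"))


line = (
    "6/24/2009 10:34:03 PM SETI@home [error] File "
    "06ap09ab.20174.20931.14.8.253 has wrong size: expected 375338, got 0"
)
name, expected, got = parse_wrong_size(line)
print(f"{name}: received {got} of {expected} bytes")
```

A result with `got` equal to 0 is the case described above: the transfer was started but no bytes arrived.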

EDIT: Fixed tags.
Flying high with Team Sicituradastra.
ID: 910743 · Report as offensive
triepke
Joined: 3 Apr 99
Posts: 39
Credit: 14,102
RAC: 0
Belgium
Message 910744 - Posted: 24 Jun 2009, 14:49:43 UTC - in response to Message 910738.  

yes, same problem here
ID: 910744 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 910763 - Posted: 24 Jun 2009, 16:43:35 UTC - in response to Message 910707.  

I too posted something today, well, two messages, and they have disappeared. I'm having trouble downloading again, mainly "connect failed" or "expected [size], got 0".


I had to shut down my 2nd (8-core) cruncher because no work is being sent. Ya know, I came back to SETI because I got an email, and built 2 octocore machines to process WUs...seems like a waste right now.

Why can't these probs be fixed? Is the h/w too old and flaky? Is the s/w inadequate for the load? In particular, is mysql strong enough? Any thoughts or ideas? Thanks!

There are two main issues with the back-end reliability:

1) Yes, a lot of the hardware is either hand-me-downs or prototypes being tested by the manufacturer. Either case means special and careful attention is needed when issues do arise, and they are usually hardware issues.
2) Most of the software is made in-house for a specific purpose, and MySQL is probably being used more intensely than it was ever intended to be used.

For either one of those two reasons, when problems arise, it takes a while to figure out what failed, and then to develop a solution to fix it by means of a band-aid, or in the best-case scenario, keep it from being a problem again. The only problem with fixing problems is that by fixing one problem, you often create 5 new ones.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 910763 · Report as offensive
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 910766 - Posted: 24 Jun 2009, 16:47:35 UTC
Last modified: 24 Jun 2009, 17:14:10 UTC

Got 20 tasks and 4 of them downloaded. Hopefully it's a sign things are starting to clear up.

Edit: so far 9 downloaded. It's slow but happening.
ID: 910766 · Report as offensive
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30683
Credit: 53,134,872
RAC: 32
United States
Message 910768 - Posted: 24 Jun 2009, 16:49:49 UTC - in response to Message 910728.  

Has anyone else noticed missing posts on the forums? I saw a few threads where posts appear to have disappeared.

I've seen a thread that has gone away. It is obvious that something crashed hard right after it came back up from the outage and they have had to pull the backup. Now it looks like it crashed again, as there is no I/O on cricket.
I know it will get fixed. Wishing them godspeed.

ID: 910768 · Report as offensive
King Leo
Joined: 14 Apr 07
Posts: 49
Credit: 2,271,270
RAC: 0
United States
Message 910772 - Posted: 24 Jun 2009, 16:58:11 UTC

I cannot connect to the server. I am receiving "Project communication failed" error messages. What is happening?

6/24/2009 9:53:32 AM||Starting BOINC client version 6.4.7 for windows_intelx86
6/24/2009 9:53:32 AM||log flags: task, file_xfer, sched_ops
6/24/2009 9:53:32 AM||Libraries: libcurl/7.19.4 OpenSSL/0.9.8j zlib/1.2.3
6/24/2009 9:53:32 AM||Running as a daemon
6/24/2009 9:53:32 AM||Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
6/24/2009 9:53:32 AM||Running under account boinc_master
6/24/2009 9:53:33 AM|SETI@home|Found app_info.xml; using anonymous platform
6/24/2009 9:53:33 AM|SETI@home|[error] No app version for result: windows_intelx86 608 cuda
6/24/2009 9:53:33 AM||Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 940 Processor [x86 Family 16 Model 4 Stepping 2]
6/24/2009 9:53:33 AM||Processor features: fpu tsc pae nx sse sse2 3dnow mmx
6/24/2009 9:53:33 AM||OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00)
6/24/2009 9:53:33 AM||Memory: 3.25 GB physical, 5.09 GB virtual
6/24/2009 9:53:33 AM||Disk: 279.47 GB total, 16.39 GB free
6/24/2009 9:53:33 AM||Local time is UTC -7 hours
6/24/2009 9:53:33 AM||Not using a proxy
6/24/2009 9:53:33 AM||CUDA devices found
6/24/2009 9:53:33 AM|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 4732015; location: home; project prefs: home
6/24/2009 9:53:33 AM||General prefs: from SETI@home (last modified 19-Jun-2009 08:14:55)
6/24/2009 9:53:33 AM||Computer location: home
6/24/2009 9:53:33 AM||General prefs: using separate prefs for home
6/24/2009 9:53:33 AM||Preferences limit memory usage when active to 3327.23MB
6/24/2009 9:53:33 AM||Preferences limit memory usage when idle to 3327.23MB
6/24/2009 9:53:33 AM||Preferences limit disk usage to 16.40GB
6/24/2009 9:53:33 AM||Suspending computation - time of day
6/24/2009 9:53:33 AM|SETI@home|Started download of setiathome_6.08_windows_intelx86__cuda.exe
6/24/2009 9:53:33 AM|SETI@home|Started download of cudart.dll
6/24/2009 9:53:55 AM|SETI@home|Temporarily failed download of setiathome_6.08_windows_intelx86__cuda.exe: HTTP error
6/24/2009 9:53:55 AM|SETI@home|Backing off 1 min 0 sec on download of setiathome_6.08_windows_intelx86__cuda.exe
6/24/2009 9:53:55 AM|SETI@home|Temporarily failed download of cudart.dll: HTTP error
6/24/2009 9:53:55 AM|SETI@home|Backing off 1 min 0 sec on download of cudart.dll
6/24/2009 9:53:55 AM|SETI@home|Started download of cufft.dll
6/24/2009 9:53:55 AM|SETI@home|Started download of libfftw3f-3-1-1a_upx.dll
6/24/2009 9:54:17 AM||Project communication failed: attempting access to reference site
6/24/2009 9:54:17 AM|SETI@home|Temporarily failed download of cufft.dll: HTTP error
6/24/2009 9:54:17 AM|SETI@home|Backing off 1 min 0 sec on download of cufft.dll
6/24/2009 9:54:17 AM|SETI@home|Temporarily failed download of libfftw3f-3-1-1a_upx.dll: HTTP error
6/24/2009 9:54:17 AM|SETI@home|Backing off 1 min 0 sec on download of libfftw3f-3-1-1a_upx.dll
6/24/2009 9:54:17 AM|SETI@home|Started download of setiathome-6.08_cuda_AUTHORS
6/24/2009 9:54:17 AM|SETI@home|Started download of setiathome-6.08_cuda_COPYING
6/24/2009 9:54:18 AM||Internet access OK - project servers may be temporarily down.
6/24/2009 9:54:39 AM||Project communication failed: attempting access to reference site
6/24/2009 9:54:39 AM|SETI@home|Temporarily failed download of setiathome-6.08_cuda_AUTHORS: HTTP error
6/24/2009 9:54:39 AM|SETI@home|Backing off 1 min 0 sec on download of setiathome-6.08_cuda_AUTHORS
6/24/2009 9:54:39 AM|SETI@home|Temporarily failed download of setiathome-6.08_cuda_COPYING: HTTP error
6/24/2009 9:54:39 AM|SETI@home|Backing off 1 min 0 sec on download of setiathome-6.08_cuda_COPYING
6/24/2009 9:54:39 AM|SETI@home|Started download of setiathome-6.08_cuda_COPYRIGHT
6/24/2009 9:54:39 AM|SETI@home|Started download of setiathome-6.08_cuda_README
6/24/2009 9:54:40 AM||Internet access OK - project servers may be temporarily down.
6/24/2009 9:55:01 AM||Project communication failed: attempting access to reference site
6/24/2009 9:55:01 AM|SETI@home|Temporarily failed download of setiathome-6.08_cuda_COPYRIGHT: HTTP error
6/24/2009 9:55:01 AM|SETI@home|Backing off 1 min 0 sec on download of setiathome-6.08_cuda_COPYRIGHT
6/24/2009 9:55:01 AM|SETI@home|Temporarily failed download of setiathome-6.08_cuda_README: HTTP error
6/24/2009 9:55:01 AM|SETI@home|Backing off 1 min 0 sec on download of setiathome-6.08_cuda_README
6/24/2009 9:55:01 AM|SETI@home|Started download of seti_608.jpg
6/24/2009 9:55:01 AM|SETI@home|Started download of 06ap09ac.4419.9888.14.8.32
6/24/2009 9:55:02 AM||Internet access OK - project servers may be temporarily down.
ID: 910772 · Report as offensive
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 910780 - Posted: 24 Jun 2009, 17:14:57 UTC - in response to Message 910763.  


There are two main issues with the back-end reliability:

1) Yes, a lot of the hardware is either hand-me-downs or prototypes being tested by the manufacturer. Either case means special and careful attention is needed when issues do arise, and they are usually hardware issues.
2) Most of the software is made in-house for a specific purpose, and MySQL is probably being used more intensely than it was ever intended to be used.

For either one of those two reasons, when problems arise, it takes a while to figure out what failed, and then to develop a solution to fix it by means of a band-aid, or in the best-case scenario, keep it from being a problem again. The only problem with fixing problems is that by fixing one problem, you often create 5 new ones.


I thought MySQL might be the problem - is there any chance someone could/would donate a more Enterprise-Oriented DB? SQLServer? Oracle? Other? Any corporate DB users/admins out there who have a feel for what might work better?
ID: 910780 · Report as offensive
Hammeh
Volunteer tester
Joined: 21 May 01
Posts: 135
Credit: 1,143,316
RAC: 0
United Kingdom
Message 910791 - Posted: 24 Jun 2009, 17:49:07 UTC

There seemed to be a few server problems earlier; none of my hosts were able to report tasks all morning.

Just let it play out; it will sort itself out in the end.
ID: 910791 · Report as offensive
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 910807 - Posted: 24 Jun 2009, 18:52:49 UTC
Last modified: 24 Jun 2009, 19:03:29 UTC





BOINC Database Engine State            #         As of 
Master database queries/second        696*         0m 
Replica seconds behind master        62,807*       0m 

[* seconds]


Hmm.. maybe it has something to do with the DB that some posts/threads are not available?
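
For a sense of scale, the replica lag from the status table can be converted to hours with a quick Python sketch. This is plain arithmetic on the 62,807 figure shown above, nothing project-specific, and the variable names are only illustrative.

```python
# "Replica seconds behind master" as shown in the status table above.
lag_seconds = 62_807

# Split the lag into hours, minutes and seconds.
hours, remainder = divmod(lag_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{hours}h {minutes}m {seconds}s")  # 17h 26m 47s
```

So at that moment the replica was roughly 17 and a half hours behind the master.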


Also:
wrong size error @ download
no jobs available @ work request


Also, did the Berkeley crew mod the forum?
The post view now has more functions available [user-friendly BBCode]..

ID: 910807 · Report as offensive
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 910812 - Posted: 24 Jun 2009, 19:02:27 UTC - in response to Message 910807.  
Last modified: 24 Jun 2009, 19:04:46 UTC

Much more user friendly
ID: 910812 · Report as offensive
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30683
Credit: 53,134,872
RAC: 32
United States
Message 910818 - Posted: 24 Jun 2009, 19:19:27 UTC

The two stuck from last night before the crash are still stuck with a wrong size error,
as are the four new workunits it just tried to get, which fail with connect() errors,
but it did report 2 tasks as done.
Wed Jun 24 12:09:55 2009 SETI@home [error] File 06ap09ab.20174.6207.14.8.25 has wrong size: expected 375329, got 0
Wed Jun 24 12:09:55 2009 SETI@home Started download of 06ap09ab.20174.6207.14.8.25
Wed Jun 24 12:10:04 2009 SETI@home [error] File 01mr09ad.18953.3753.16.8.173 has wrong size: expected 375335, got 0
Wed Jun 24 12:10:04 2009 SETI@home Started download of 01mr09ad.18953.3753.16.8.173
Wed Jun 24 12:10:29 2009 SETI@home Sending scheduler request: Requested by user.
Wed Jun 24 12:10:29 2009 SETI@home Reporting 2 completed tasks, requesting new tasks
Wed Jun 24 12:11:11 2009 Project communication failed: attempting access to reference site
Wed Jun 24 12:11:11 2009 SETI@home Temporarily failed download of 06ap09ab.20174.6207.14.8.25: connect() failed
Wed Jun 24 12:11:11 2009 SETI@home Backing off 10 min 35 sec on download of 06ap09ab.20174.6207.14.8.25
Wed Jun 24 12:11:12 2009 Internet access OK - project servers may be temporarily down.
Wed Jun 24 12:11:14 2009 SETI@home Scheduler request completed: got 4 new tasks
Wed Jun 24 12:11:16 2009 SETI@home Started download of 01mr09af.7406.18477.15.8.152
Wed Jun 24 12:12:31 2009 Project communication failed: attempting access to reference site
Wed Jun 24 12:12:31 2009 SETI@home Temporarily failed download of 01mr09af.7406.18477.15.8.152: connect() failed
Wed Jun 24 12:12:31 2009 SETI@home Backing off 1 min 0 sec on download of 01mr09af.7406.18477.15.8.152
Wed Jun 24 12:12:31 2009 SETI@home Started download of 01mr09af.7406.18477.15.8.113
Wed Jun 24 12:12:32 2009 Internet access OK - project servers may be temporarily down.
Wed Jun 24 12:13:46 2009 Project communication failed: attempting access to reference site
Wed Jun 24 12:13:46 2009 SETI@home Temporarily failed download of 01mr09af.7406.18477.15.8.113: connect() failed
Wed Jun 24 12:13:46 2009 SETI@home Backing off 1 min 0 sec on download of 01mr09af.7406.18477.15.8.113
Wed Jun 24 12:13:46 2009 SETI@home Started download of 01mr09af.7406.18477.15.8.161
Wed Jun 24 12:13:47 2009 Internet access OK - project servers may be temporarily down.
Wed Jun 24 12:15:02 2009 Project communication failed: attempting access to reference site
Wed Jun 24 12:15:02 2009 SETI@home Temporarily failed download of 01mr09af.7406.18477.15.8.161: connect() failed
Wed Jun 24 12:15:02 2009 SETI@home Backing off 1 min 0 sec on download of 01mr09af.7406.18477.15.8.161
Wed Jun 24 12:15:02 2009 SETI@home Started download of 01mr09af.7406.18477.15.8.158
Wed Jun 24 12:15:03 2009 Internet access OK - project servers may be temporarily down.
Wed Jun 24 12:15:14 2009 Project communication failed: attempting access to reference site
Wed Jun 24 12:15:14 2009 SETI@home Temporarily failed download of 01mr09ad.18953.3753.16.8.173: HTTP error
Wed Jun 24 12:15:14 2009 SETI@home Backing off 1 hr 3 min 13 sec on download of 01mr09ad.18953.3753.16.8.173
Wed Jun 24 12:15:14 2009 SETI@home [error] File 01mr09af.7406.18477.15.8.152 has wrong size: expected 375331, got 0
Wed Jun 24 12:15:14 2009 SETI@home Started download of 01mr09af.7406.18477.15.8.152
Wed Jun 24 12:15:15 2009 Internet access OK - project servers may be temporarily down.
Wed Jun 24 12:16:17 2009 Project communication failed: attempting access to reference site
Wed Jun 24 12:16:17 2009 SETI@home Temporarily failed download of 01mr09af.7406.18477.15.8.158: connect() failed
Wed Jun 24 12:16:17 2009 SETI@home Backing off 1 min 0 sec on download of 01mr09af.7406.18477.15.8.158
Wed Jun 24 12:16:18 2009 Internet access OK - project servers may be temporarily down.
Wed Jun 24 12:16:18 2009 SETI@home [error] File 01mr09af.7406.18477.15.8.113 has wrong size: expected 375335, got 0
Wed Jun 24 12:16:18 2009 SETI@home Started download of 01mr09af.7406.18477.15.8.113

ID: 910818 · Report as offensive
Kevin Benfield

Joined: 29 Dec 03
Posts: 39
Credit: 30,085,439
RAC: 0
United Kingdom
Message 910831 - Posted: 24 Jun 2009, 19:47:52 UTC

Does anyone know when the current issues are likely to be fixed? I am out of work units for the CPU. It is trying to download about 11 units but they fail to download. Even if they did download, they would only take a few hours to process, as the CPU does 8 at once.

I have lots of CUDA units to process, which is odd as I only have a single GPU.
ID: 910831 · Report as offensive
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 910834 - Posted: 24 Jun 2009, 19:49:02 UTC - in response to Message 910812.  

Much more user friendly



How is something that doesn't seem to work a large percentage of the time (recently, anyway) user friendly in any way, shape or form????
ID: 910834 · Report as offensive
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 910835 - Posted: 24 Jun 2009, 19:53:05 UTC

FYI: I just started up my second 8-core (which I had shut down before because of no work) and I "got" about 50 WUs. Unfortunately, they are not downloading to my machine; they show up in both the Transfers tab and the Tasks tab.

How can I shut down my machine now and put them back in the system queues so someone else can waste money (KWH) waiting for the data to move?
ID: 910835 · Report as offensive
Rob.B

Joined: 23 Jul 99
Posts: 157
Credit: 1,439,682
RAC: 0
United Kingdom
Message 910839 - Posted: 24 Jun 2009, 20:03:08 UTC
Last modified: 24 Jun 2009, 20:04:26 UTC

I'm getting this on my servers: singles, duals and quads.

24/06/2009 21:01:05 SETI@home Scheduler request completed: got 0 new tasks
24/06/2009 21:01:05 SETI@home Message from server: No work sent
24/06/2009 21:01:05 SETI@home Message from server: No work is available for SETI@home Enhanced
24/06/2009 21:01:05 SETI@home Message from server: (reached daily quota of 10 results)
24/06/2009 21:01:05 SETI@home Message from server: (Project has no jobs available)


Any ideas? (The 10 results bit in particular; the only errors have been killed VLARs.)
ID: 910839 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 910840 - Posted: 24 Jun 2009, 20:11:27 UTC - in response to Message 910839.  


Any ideas? (The 10 results bit in particular; the only errors have been killed VLARs.)

The key word here is "errors" -- each time you have an error, your quota is reduced by 1.

If you've intentionally "errored" 90 VLARs, then your daily quota is 10. Return one valid work unit, and it'll double to 20, etc.
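
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this post, not the server's actual code: the starting quota of 100 is an assumption inferred from "90 errors leaves 10", and the floor of 1 and the cap at the maximum are also assumptions. The function names are made up for the example.

```python
MAX_QUOTA = 100  # assumption: inferred from "90 errored VLARs leaves a quota of 10"
MIN_QUOTA = 1    # assumption: the quota is never reduced below one task per day


def after_error(quota):
    """Each errored result lowers the daily quota by 1 (down to the assumed floor)."""
    return max(MIN_QUOTA, quota - 1)


def after_valid(quota):
    """Each valid result doubles the daily quota (up to the assumed maximum)."""
    return min(MAX_QUOTA, quota * 2)


quota = MAX_QUOTA
for _ in range(90):          # 90 intentionally errored VLARs
    quota = after_error(quota)
print(quota)                 # 10

quota = after_valid(quota)   # one valid work unit returned
print(quota)                 # 20
```

Under these assumptions, a few valid results in a row bring the quota back to the maximum quickly, since each one doubles it.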
ID: 910840 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 910841 - Posted: 24 Jun 2009, 20:13:15 UTC - in response to Message 910834.  
Last modified: 24 Jun 2009, 20:17:20 UTC

Much more user friendly



How is something that doesn't seem to work a large percentage of the time (recently, anyway) user friendly in any way, shape or form????

Follow the thread to the previous message. He's referring to the buttons to automatically add BBCODE tags.

If you aren't referring to the buttons that add BBCODE tags, then you're responding to the wrong topic.

Edit:

I've said this many times, and this is a good time to repeat it.

If you are referring to the servers being out of work, that isn't meant to be "user friendly" -- those transactions are between the BOINC client and the BOINC servers.

SETI strongly suggests that you crunch for more than one project, so that during lean times here you will not run out.

Alternately, you can shut down for a day. The work assigned (but not yet downloaded) will be here tomorrow -- or sooner.

... but I'd suggest another project, or a bigger cache, or both.
ID: 910841 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.