Gasping for Air (May 14 2007)

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11562
Credit: 170,591,731
RAC: 102,381
Australia
Message 567794 - Posted: 15 May 2007, 9:07:38 UTC - in response to Message 567785.  

About the spikes on the graph:

  • Matt mentions that yesterday he was cutting some wires on the rack: the dead line was because the servers were unplugged from the net
  • Bruno choked: no data was transferred to BOINC clients
  • Bruno is on again, but overloaded: BOINC clients will get work slowly
  • Because of this overload, bruno can appear as disabled on the status page


Dropping connections is normal when coming out of an outage, especially a long outage. But the jaggedness of the graph shows there is more than just dropped connections. There are periods of time where there are no connections, which is unusual.
Hopefully the reboot during the usual outage will sort it out.
Grant
Darwin NT
ID: 567794 · Report as offensive
EvoDude

Joined: 5 May 07
Posts: 1
Credit: 11,465
RAC: 0
United Kingdom
Message 567799 - Posted: 15 May 2007, 9:12:37 UTC - in response to Message 567778.  

I think it is getting better

How is throughput going down "getting better"?


Come to that, how is 'still not a single work unit received' getting better?
ID: 567799 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11562
Credit: 170,591,731
RAC: 102,381
Australia
Message 567804 - Posted: 15 May 2007, 9:21:10 UTC - in response to Message 567799.  

I think it is getting better

How is throughput going down "getting better"?


Come to that, how is 'still not a single work unit received' getting better?

The lower the load the less likely it is to drop connections & the more likely it is to be able to complete an upload or download.
Grant
Darwin NT
ID: 567804 · Report as offensive
TarracoServer
Volunteer tester

Joined: 11 Apr 07
Posts: 38
Credit: 595,022
RAC: 0
Spain
Message 567806 - Posted: 15 May 2007, 9:24:02 UTC - in response to Message 567794.  
Last modified: 15 May 2007, 9:33:44 UTC

Dropping connections is normal when coming out of an outage, especially a long outage. But the jaggedness of the graph shows there is more than just dropped connections. There are periods of time where there are no connections, which is unusual.
Hopefully the reboot during the usual outage will sort it out.


Yeah, but Cricket reports a "net use" statistic, reading the amount of data that passes through the main router/switch of the SETI@home LAN. If the servers are unplugged or the MAC address isn't "reachable", it's possible that the DNS won't send any info to those machines (so, outage).

Probably the noise that can be observed on the Cricket graph is the use of the internal LAN by the different computers in the lab (and internal communication with the servers too) and/or redirected packets from outside the LAN.

Anyway, a persistent connection (like the whole upload/download process) is different from only an "ask" whether I can connect.
ID: 567806 · Report as offensive
Florian Robardet (Nairolf)
Joined: 17 Dec 99
Posts: 2
Credit: 283,639
RAC: 0
France
Message 567811 - Posted: 15 May 2007, 9:30:16 UTC - in response to Message 567806.  

Hello...
I might not be posting in the right thread, but I have completed two work units and cannot upload any. I have the same problem as the guy who posted a long part of his log. Is the upload server down? Thank you for your help :)
ID: 567811 · Report as offensive
Conrad Human
Volunteer tester

Joined: 17 Nov 00
Posts: 67
Credit: 2,009,224
RAC: 0
South Africa
Message 567813 - Posted: 15 May 2007, 9:36:49 UTC

Let's rather not fight about it in this thread; it can just end in a flame war that I mistakenly started.



Guys, please be patient. I myself have not yet received a single workunit this morning, and had got 10 reported last night.

The better way would have been to throttle bruno's number of connections to a number it can handle safely.

Let's wait until after today's outage and see what will happen.
ID: 567813 · Report as offensive
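
For anyone wondering what "throttling the number of connections" could look like, here is a minimal sketch. It is not how bruno is actually configured (the thread doesn't say); `ConnectionLimiter`, `handle_request` and the limit of 2 are hypothetical names and numbers picked for illustration. The idea is that a server that is already full refuses extra requests right away instead of accepting them and then dropping them halfway through.

```python
import threading


class ConnectionLimiter:
    """Admit at most max_active requests at once; refuse the rest immediately."""

    def __init__(self, max_active: int) -> None:
        self._slots = threading.BoundedSemaphore(max_active)

    def try_acquire(self) -> bool:
        # Non-blocking: a full server answers "busy" instead of queueing the client.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle_request(limiter: ConnectionLimiter, work):
    """Run work() only if a slot is free; otherwise tell the client to come back."""
    if not limiter.try_acquire():
        return "busy, retry later"
    try:
        return work()
    finally:
        limiter.release()


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_active=2)  # hypothetical limit
    print(handle_request(limiter, lambda: "upload accepted"))
```

The refused clients still have to retry later, so this trades "everybody fails slowly" for "a few succeed quickly", which is roughly the point being argued above.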
TarracoServer
Volunteer tester

Joined: 11 Apr 07
Posts: 38
Credit: 595,022
RAC: 0
Spain
Message 567817 - Posted: 15 May 2007, 9:44:25 UTC - in response to Message 567813.  

Let's rather not fight about it in this thread; it can just end in a flame war that I mistakenly started.



Guys, please be patient. I myself have not yet received a single workunit this morning, and had got 10 reported last night.

The better way would have been to throttle bruno's number of connections to a number it can handle safely.

Let's wait until after today's outage and see what will happen.


I agree, but the weekly outage might not be sufficient. Too many requests.
The main thing to do is to be patient ;)
ID: 567817 · Report as offensive
Pooh Bear 27
Volunteer tester

Joined: 14 Jul 03
Posts: 3222
Credit: 4,603,826
RAC: 0
United States
Message 567819 - Posted: 15 May 2007, 9:45:35 UTC

I think one IMPORTANT LINE in Matt's post is:

The server situation will be in major flux, and generally in a positive direction, over the next week or so.


ID: 567819 · Report as offensive
kittyman Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50378
Credit: 983,759,333
RAC: 31,727
United States
Message 567822 - Posted: 15 May 2007, 9:48:17 UTC - in response to Message 567819.  

I think one IMPORTANT LINE in Matt's post is:

The server situation will be in major flux, and generally in a positive direction, over the next week or so.



Don't mean to be a bitch, but the kitties and I are looking for the 'positive direction'.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 567822 · Report as offensive
TarracoServer
Volunteer tester

Joined: 11 Apr 07
Posts: 38
Credit: 595,022
RAC: 0
Spain
Message 567827 - Posted: 15 May 2007, 9:59:04 UTC - in response to Message 567822.  

I think one IMPORTANT LINE in Matt's post is:

The server situation will be in major flux, and generally in a positive direction, over the next week or so.



Don't mean to be a bitch, but the kitties and I are looking for the 'positive direction'.


Hee, hee, hee. Yes, but right now, we must wait.

Remember everything those guys did to keep the BOINC clients running, and all those results must be integrated into thumper's database, new WUs created (done, I know), the system cleaned up and rebooted, plus all the physical work (unscrew, screw, push, move, bla, bla, bla).

A lot of work has been done but, as with everything, now is the time to tune it.
On small systems it can be crazy work (Windows blue screen, reformatting the HD, ... all of us know this). So imagine it on a server with 64 HDs, various CPUs, and a huge quantity of wires (SCSI, net, power, ATA/SATA if any, memory banks, ...). It isn't plug'n'play!

That's why the whole system will be online and running properly in several days (100% running and tested -> several WEEKS).
ID: 567827 · Report as offensive
dirk.b

Joined: 11 Sep 06
Posts: 6
Credit: 91,097
RAC: 0
Belgium
Message 567831 - Posted: 15 May 2007, 10:08:08 UTC - in response to Message 567413.  



Hey, we are patient, but do we still have to wait, or is it just for me that it still doesn't work?
Downloading the work is no problem, but that's it; then the computer waits for something, but for what? Or should I reinstall BOINC?
Or is this still normal? Greetings, Dirk
ID: 567831 · Report as offensive
Andy Rozman

Joined: 18 May 99
Posts: 5
Credit: 483,539
RAC: 0
Ireland
Message 567841 - Posted: 15 May 2007, 10:24:02 UTC

Hi !

I am also having problems connecting to SETI@home, but I am not sure where the problem is. I installed the 64-bit BOINC manager and this problem could be connected to that.

5/15/2007 12:19:09 PM|SETI@home|Requesting 30240 seconds of new work
5/15/2007 12:19:19 PM|SETI@home|Scheduler RPC succeeded [server version 509]
5/15/2007 12:19:19 PM|SETI@home|Message from server: platform 'windows_x86_64' not found
5/15/2007 12:19:19 PM|SETI@home|Deferring communication for 1 days 0 hr 0 min 0 sec
5/15/2007 12:19:19 PM|SETI@home|Reason: requested by project

If someone can comment on that I would be very thankful... I have Windows XP 64-bit, and since BOINC didn't work because of the SETI problems I thought that maybe the problem was on my side, so I downloaded a new version (64-bit this time)...

Andy
ID: 567841 · Report as offensive
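
Going only by the log lines quoted above, the one-day deferral looks like it comes from the "platform 'windows_x86_64' not found" reply ("Reason: requested by project") rather than from the overloaded servers. Below is a minimal sketch of pulling that information out of such a log. It assumes the `date time AM/PM|project|message` layout seen in the post; `parse_line`, `missing_platform` and `deferral` are hypothetical helper names, not part of BOINC.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    when: datetime
    project: str
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp|project|message' line as shown in the post."""
    stamp, project, message = line.strip().split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(when, project, message)


PLATFORM_RE = re.compile(r"platform '([^']+)' not found")
DEFER_RE = re.compile(
    r"Deferring communication for (\d+) days (\d+) hr (\d+) min (\d+) sec"
)


def missing_platform(entry: LogEntry) -> Optional[str]:
    """Return the platform name the server rejected, if this line reports one."""
    match = PLATFORM_RE.search(entry.message)
    return match.group(1) if match else None


def deferral(entry: LogEntry) -> Optional[timedelta]:
    """Return how long the client says it will wait, if this line says so."""
    match = DEFER_RE.search(entry.message)
    if match is None:
        return None
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    log = [
        "5/15/2007 12:19:19 PM|SETI@home|Message from server: platform 'windows_x86_64' not found",
        "5/15/2007 12:19:19 PM|SETI@home|Deferring communication for 1 days 0 hr 0 min 0 sec",
    ]
    for entry in map(parse_line, log):
        print(entry.when, missing_platform(entry), deferral(entry))
```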
toffuuu
Volunteer tester

Joined: 30 Mar 00
Posts: 87
Credit: 1,871,193
RAC: 46
United States
Message 567854 - Posted: 15 May 2007, 10:43:18 UTC - in response to Message 567841.  

Is anyone here having trouble uploading finished work units?
I keep getting this sort of error, and have for at least the past 2 days now:
2007-05-15 05:42:22 [SETI@home] [file_xfer] Temporarily failed upload of 18fe05aa.26345.4592.103400.3.111_2_0: http error
Can anyone shed some light on this...?
ID: 567854 · Report as offensive
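
"Temporarily failed" means the transfer will be tried again later. A common way for clients to retry against an overloaded server is exponential backoff with random jitter, so that thousands of hosts don't all come back at the same instant. The sketch below illustrates that general technique only; it is not taken from the BOINC client, and `backoff_delays`, `upload_with_retry` and the one-minute base and four-hour cap are hypothetical choices.

```python
import random
from typing import Callable, List, Optional, Sequence


def backoff_delays(
    base: float = 60.0,
    cap: float = 4 * 3600.0,
    attempts: int = 6,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Delays in seconds: the ceiling doubles each attempt, capped, with full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        # Picking uniformly in [0, ceiling] spreads the retries of many clients out.
        delays.append(rng.uniform(0, ceiling))
    return delays


def upload_with_retry(
    upload: Callable[[], bool],
    delays: Sequence[float],
    sleep: Callable[[float], None],
) -> bool:
    """Try upload(); after each failure wait the next delay, then try once more."""
    for delay in delays:
        if upload():
            return True
        sleep(delay)
    return upload()


if __name__ == "__main__":
    # Print a sample schedule instead of really sleeping.
    print([round(d) for d in backoff_delays(rng=random.Random(1))])
```

Passing `sleep` in as an argument keeps the sketch easy to try without actually waiting; a real client would pass `time.sleep` or schedule the retry some other way.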
HachPi
Joined: 2 Aug 99
Posts: 481
Credit: 14,294,995
RAC: 14,323
Belgium
Message 567871 - Posted: 15 May 2007, 11:20:56 UTC - in response to Message 567854.  

Is anyone here having trouble uploading finished work units?
I keep getting this sort of error, and have for at least the past 2 days now:
2007-05-15 05:42:22 [SETI@home] [file_xfer] Temporarily failed upload of 18fe05aa.26345.4592.103400.3.111_2_0: http error
Can anyone shed some light on this...?


Same problem over here: NO uploads, NO downloads...
Suggestion: more patience

Greetz from Belgium, Europe
HP

ID: 567871 · Report as offensive
Andy
Joined: 7 Feb 06
Posts: 12
Credit: 12,042,162
RAC: 3,689
United States
Message 567873 - Posted: 15 May 2007, 11:26:10 UTC - in response to Message 567413.  

What a weekend. As noted by the others they successfully got the replacement science database server from Sun and brought it to the lab Friday afternoon. As we hoped it was basically plug n' play after putting the old thumper's drives in it. After some file system syncing and data checking Eric started the splitters on Saturday. All was well until bruno's httpd processes choked (more on that below). So we were not sending work for a whole day until Jeff kicked bruno this morning. The bright side is this allowed the splitters to create a whole pile of work in the meantime which we are sending out right now as fast as we can. The main bottleneck is NFS on the workunit file server which is (and always has been) choking at around 60 Mbps. It'll take a while for things to catch up.

We officially retired both koloth and kryten as of today - both are powered down, and in the case of koloth completely removed from the closet to make way for thumper, sidious, and then some. With the closet as empty as it has been in a long time I finally removed dozens of unused SCSI/ethernet/terminal/power cables that came with the rack, all tucked in various corners and secured with cable ties. The process of cutting the tightly wound ties in sharp metal cages left me with four bleeding wounds on my hands - nothing bad, only two required band aids - but I've wanted to get that particular clutter out of that rack for years.

With koloth and kryten gone bruno has been taking up most of the slack. I noticed last week it gets into these periods of malaise where httpd just stops working. I think this may be buggy restart logic when we rotate web logs, but it's a little weirder than that. Adding insult to injury one of its internal drives just up and died today. Luckily it was a RAID spare so nothing was harmed, and we had replacement drives already donated to us a while back. Eric replaced the drive, but we may need to reboot to fully pick it up. Probably during the usual outage tomorrow. Bruno is dropping lots of packets right now, resulting in all kinds of upload/download snags and showing up as "disabled" on the server status page. This should clear up over time.

The server situation will be in major flux, and generally in a positive direction, over the next week or so. I'll be trying to keep updating the server status page, but I make no guarantees about its accuracy.

Thanks again for your patience during the past couple of weeks. While I appreciate the kind words and sentiments I should point out that this past weekend for me wasn't exactly restful time off. I was working at my other job.

- Matt

Now I know what SUPERMAN does for giggles!
ID: 567873 · Report as offensive
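
A rough back-of-envelope for why "it'll take a while for things to catch up" at about 60 Mbps. Only the 60 Mbps figure comes from Matt's post; the workunit size (375 KB) and the backlog (500,000 workunits) are made-up inputs to show the arithmetic, and protocol overhead is ignored.

```python
def workunits_per_second(link_mbps: float, workunit_kb: float) -> float:
    """How many workunits per second fit through a link of the given speed."""
    bytes_per_second = link_mbps * 1_000_000 / 8  # megabits/s -> bytes/s
    return bytes_per_second / (workunit_kb * 1_000)


def hours_to_drain(backlog: int, link_mbps: float, workunit_kb: float) -> float:
    """Hours needed to send a backlog of workunits at the full link rate."""
    seconds = backlog / workunits_per_second(link_mbps, workunit_kb)
    return seconds / 3600


if __name__ == "__main__":
    ASSUMED_WORKUNIT_KB = 375   # hypothetical size, not from the thread
    ASSUMED_BACKLOG = 500_000   # hypothetical backlog, not from the thread
    rate = workunits_per_second(60, ASSUMED_WORKUNIT_KB)
    hours = hours_to_drain(ASSUMED_BACKLOG, 60, ASSUMED_WORKUNIT_KB)
    print(f"{rate:.1f} workunits/s, about {hours:.1f} hours to drain the backlog")
```

With those assumed inputs the link moves 20 workunits per second, so half a million of them need roughly seven hours even if nothing is dropped along the way.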
Ragnarock_83835
Joined: 18 Feb 06
Posts: 16
Credit: 1,053,610
RAC: 0
United States
Message 567877 - Posted: 15 May 2007, 11:35:03 UTC
Last modified: 15 May 2007, 11:35:59 UTC

I have been able to download at a very slow rate. It seems that if one of my four computers talks to bruno at just the right time, I get about one successful download every 1-2 hours... Only a few uploads worked last night...
ID: 567877 · Report as offensive
Floyd A. Wright

Joined: 30 Aug 99
Posts: 1
Credit: 1,812,634
RAC: 0
Message 567884 - Posted: 15 May 2007, 11:48:50 UTC

All I get is HTTP errors when uploading.
ID: 567884 · Report as offensive
lee clissett

Joined: 12 Jun 00
Posts: 46
Credit: 2,647,496
RAC: 0
United Kingdom
Message 567898 - Posted: 15 May 2007, 12:32:20 UTC

Found this file on one of my systems: boinc.gorlaeus.net. Does anybody know what it is? It's not on my other three.
ID: 567898 · Report as offensive
Dbm

Joined: 9 Apr 07
Posts: 1
Credit: 232,642
RAC: 0
Netherlands
Message 567903 - Posted: 15 May 2007, 12:39:36 UTC

As of 5 min ago everything is working again! (at least for me ;) )
ID: 567903 · Report as offensive
Modesto
Volunteer tester

Joined: 4 Jul 04
Posts: 47
Credit: 321,752
RAC: 0
Canada
Message 567909 - Posted: 15 May 2007, 12:54:18 UTC - in response to Message 567898.  
Last modified: 15 May 2007, 12:54:48 UTC

Found this file on one of my systems: boinc.gorlaeus.net. Does anybody know what it is? It's not on my other three.


Well... it is the Leiden Classical url (see HERE)... so maybe only that system is or has been attached to that project?
ID: 567909 · Report as offensive
 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.