Can't talk.. Debugging.. (May 15 2007)

Message boards : Technical News : Can't talk.. Debugging.. (May 15 2007)
DFilipowski

Send message
Joined: 4 Mar 07
Posts: 2
Credit: 2,607
RAC: 0
Message 568807 - Posted: 16 May 2007, 18:48:06 UTC
Last modified: 16 May 2007, 18:48:44 UTC

I used to run the older version of SETI. I loved it. Then it changed to this so-called Boinc. It plugged-up my computer and didn't work. I had done thousands of packets. I asked for help - never got any and my credit was lost. So finally I came back - and this is what I find: Pop-ups telling me to reset my connection. Can't connect. Can't bla bla bla.

I have a life outside of looking for Little Green Men. This is annoying that life.

I quit.
ID: 568807 · Report as offensive
Profile Dr.Okun_@_SETI.USA
Volunteer tester

Send message
Joined: 15 Dec 06
Posts: 7
Credit: 149,128
RAC: 0
United States
Message 568808 - Posted: 16 May 2007, 18:49:37 UTC

Thinking the best move is to totally block internet regions while others are allowed to upload to the project. Do a round robin of blockages until the project is caught up with the demand.
ID: 568808 · Report as offensive
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 568812 - Posted: 16 May 2007, 18:50:59 UTC - in response to Message 568807.  

I used to run the older version of SETI. I loved it. Then it changed to this so-called Boinc. It plugged-up my computer and didn't work. I had done thousands of packets. I asked for help - never got any and my credit was lost. So finally I came back - and this is what I find: Pop-ups telling me to reset my connection. Can't connect. Can't bla bla bla.

I have a life outside of looking for Little Green Men. This is annoying that life.

I quit.


OK, thanks for trying. After all, this is a science experiment, and things will go wrong from time to time. If it's too much to handle, then the experiment isn't for you. It's nothing to get OCD over.
ID: 568812 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 568813 - Posted: 16 May 2007, 18:51:43 UTC - in response to Message 568803.  

Hi all.
I'm not a net guru and I cannot figure out what the hell is going on.
This is what I noticed on my side:
There's a size mismatch in each uploaded result that's actually preventing successful uploading (in my case mostly around 0.72 KB).
Ciao.


I noticed that, it is a symptom, not the cause.


Symptom of What?


Communication errors, perhaps in this case, perhaps not. Usually, transmitting more data than the source's size implies that packets were resent.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 568813 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 568826 - Posted: 16 May 2007, 19:05:52 UTC

I've seen the same symptoms for a while: the result gets uploaded, with approximately 0.72 KB extra. Then, *after* having used precious bandwidth on your side, it fails.

This means the network traffic graph may not be a good indicator that things are moving in the right direction.
ID: 568826 · Report as offensive
Profile Teratoma [SETI.USA]
Avatar

Send message
Joined: 30 Mar 00
Posts: 16
Credit: 2,200,914
RAC: 0
United States
Message 568834 - Posted: 16 May 2007, 19:09:46 UTC
Last modified: 16 May 2007, 19:10:26 UTC

Correct me if I am wrong (it wouldn't be the first time and I know I don't need to ask to be corrected here ;))...Upload issues started after Bruno was moved to the closet and I am assuming was connected to the new Gig Switch there. Maybe they should patch it back into the switch that it was using before. It may not be as fast but it works! The issue may be due to mismatched network configs. My .02 <sigh>

..
ID: 568834 · Report as offensive
Profile Pilot
Avatar

Send message
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 568836 - Posted: 16 May 2007, 19:12:04 UTC - in response to Message 568790.  

A month ago I started up again after a 5-year absence and am having a problem with downloading work. It was doing fine before the server went down but now, even after my computer being on for 12 hours, there are no new projects. This is what it says:

Requesting 17280 seconds of new work
Scheduler RPC succeeded [server version 509]
Deferring communication for 11 sec
Reason: requested by project
Deferring communication for 28 min 22 sec
Reason: no work from project

Can anybody help?

In a word, no. I'm pretty sure that the project has deliberately turned off the supply of new work, to let the queues of work already allocated and crunched download and upload respectively.

Following Eric's "Addendumb" moment (wonderful what a good night's sleep can do!) those queues are now draining much more quickly: depending how things go, the staff may, or may not, start issuing new WUs before they finish work today.

LOL, kinda like draining the swamp before going out to kill all the alligators.
When we finally figure it all out, all the rules will change and we can start all over again.
ID: 568836 · Report as offensive
Profile Jim Franklin

Send message
Joined: 3 Apr 99
Posts: 108
Credit: 10,843,395
RAC: 39
United Kingdom
Message 568845 - Posted: 16 May 2007, 19:24:23 UTC

Thanks for the update, Matt. I know you and the team are working extremely hard on the current issues. I have found that none of my machines are able to connect to the servers at the moment, and they are slowly running out of work. The good thing is that most are dual boot and have work units to process in the backup system, so I should be good for a few more days, but if we cannot connect by Monday then I'll be shutting machines down... Not a pleasant prospect.
ID: 568845 · Report as offensive
Profile William T. Guiher

Send message
Joined: 15 Feb 01
Posts: 7
Credit: 46,768,864
RAC: 30
United States
Message 568854 - Posted: 16 May 2007, 19:30:43 UTC

Matt, I can't seem to find a good place to report what I think is an error of some unknown description. Maybe I'm worrying unnecessarily. While running 18fe05aa.26345.25584.884652.3.33_3 an undefined computational error occurred. Do I need to report this to someone somewhere, or just let the system take care of it -- quit worrying and keep computing?

ID: 568854 · Report as offensive
Profile Paul Hayslett Project Donor
Avatar

Send message
Joined: 3 Aug 00
Posts: 15
Credit: 14,207,862
RAC: 0
United States
Message 568858 - Posted: 16 May 2007, 19:36:18 UTC

I can see progress! I had about 60 uploads pending this morning. Now down to 45. WUs have been trickling in. It seems like bruno is slowly catching up. Eric really rang the bell with that config change.
ID: 568858 · Report as offensive
B. Davies

Send message
Joined: 6 Oct 02
Posts: 2
Credit: 8,716,917
RAC: 0
United States
Message 568864 - Posted: 16 May 2007, 19:48:52 UTC

just uploaded 2 WUs
ID: 568864 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 568866 - Posted: 16 May 2007, 19:56:10 UTC - in response to Message 568808.  

Thinking the best move is to totally block internet regions while others are allowed to upload to the project. Do a round robin of blockages until the project is caught up with the demand.

This has occurred to me as well: using the first octet gives about 180 active blocks, and by allowing a few blocks at a time you dramatically reduce the incoming connections.

Slowing the number of connections increases the number of successful connections, each success reduces the number of retries, and before you know it you can turn the rest of the world "on" and not worry about it.
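
As a rough sketch of the round-robin idea (not anything the project actually runs): admit only a few first-octet blocks per time window and rotate through the list. The function names, the window index, and the sample octet list are all made up for illustration; a real list would hold the roughly 180 active first octets mentioned above.

```python
import ipaddress


def allowed_first_octets(window_index, active_octets, blocks_per_window):
    """Return the set of first octets admitted during one time window.

    window_index counts elapsed windows (0, 1, 2, ...); the selection
    wraps around so every block gets its turn eventually.
    """
    n = len(active_octets)
    start = (window_index * blocks_per_window) % n
    return {active_octets[(start + i) % n] for i in range(blocks_per_window)}


def is_admitted(client_ip, window_index, active_octets, blocks_per_window):
    """True if the client's first octet is in the currently admitted set."""
    first_octet = int(ipaddress.IPv4Address(client_ip)) >> 24
    return first_octet in allowed_first_octets(
        window_index, active_octets, blocks_per_window
    )


# Hypothetical sample: six active blocks, two admitted per window.
SAMPLE_OCTETS = [12, 24, 64, 128, 192, 203]
print(is_admitted("64.1.2.3", 1, SAMPLE_OCTETS, 2))
```

With fewer blocks admitted at once, each admitted client has a better chance of completing its upload, which is the retry-reducing effect described above.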
ID: 568866 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 568868 - Posted: 16 May 2007, 19:58:15 UTC - in response to Message 568834.  

Correct me if I am wrong (it wouldn't be the first time and I know I don't need to ask to be corrected here ;))...Upload issues started after Bruno was moved to the closet and I am assuming was connected to the new Gig Switch there. Maybe they should patch it back into the switch that it was using before. It may not be as fast but it works! The issue may be due to mismatched network configs. My .02 <sigh>

Bruno was moved to the closet at the same time the project restarted, so it's hard to say if it was caused by the restart or the move.

... but if I was going to bet, I'd say "restart."
ID: 568868 · Report as offensive
Profile Sterling_Aug
Avatar

Send message
Joined: 27 Sep 02
Posts: 54
Credit: 14,105,725
RAC: 0
United States
Message 568869 - Posted: 16 May 2007, 19:58:26 UTC

I just uploaded 9 WUs, and about 15 WUs so far today.
ID: 568869 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 568874 - Posted: 16 May 2007, 20:06:46 UTC - in response to Message 568866.  


Slowing the number of connections increases the number of successful connections, each success reduces the number of retries, and before you know it you can turn the rest of the world "on" and not worry about it.


Nah... ;-)
ID: 568874 · Report as offensive
HachPi
Avatar

Send message
Joined: 2 Aug 99
Posts: 481
Credit: 21,807,425
RAC: 21
Belgium
Message 568875 - Posted: 16 May 2007, 20:10:44 UTC
Last modified: 16 May 2007, 20:17:13 UTC

Now I get this??? Any ideas ??


5/16/2007 10:02:08 PM|SETI@home|Scheduler request failed: HTTP internal server error
5/16/2007 10:02:08 PM|SETI@home|Deferring communication for 2 min 6 sec
5/16/2007 10:02:08 PM|SETI@home|Reason: scheduler request failed
5/16/2007 10:02:12 PM|SETI@home|[file_xfer] Started upload of file 16fe05ab.10775.2368.859632.3.35_0_0
5/16/2007 10:03:01 PM|SETI@home|[file_xfer] Finished upload of file 16fe05ab.10775.2368.859632.3.35_0_0
5/16/2007 10:03:01 PM|SETI@home|[file_xfer] Throughput 605 bytes/sec
5/16/2007 10:03:10 PM|SETI@home|Sending scheduler request: Requested by user
5/16/2007 10:03:10 PM|SETI@home|Requesting 726696 seconds of new work, and reporting 15 completed tasks
5/16/2007 10:04:21 PM|SETI@home|Scheduler request failed: HTTP internal server error
5/16/2007 10:04:21 PM|SETI@home|Deferring communication for 4 min 7 sec
5/16/2007 10:04:21 PM|SETI@home|Reason: scheduler request failed
5/16/2007 10:04:26 PM|SETI@home|Sending scheduler request: Requested by user
5/16/2007 10:04:26 PM|SETI@home|Requesting 726832 seconds of new work, and reporting 15 completed tasks
5/16/2007 10:04:57 PM|SETI@home|Scheduler RPC succeeded [server version 509]
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 16fe05ab.10775.2368.859632.3.29_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 18fe05aa.26345.3009.617326.3.196_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 18fe05aa.26345.3009.617326.3.200_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 18fe05aa.26345.3009.617326.3.195_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 18fe05aa.26345.3009.617326.3.199_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 18fe05aa.26345.3009.617326.3.203_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 16fe05ab.10775.2368.859632.3.38_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 16fe05ab.10775.2368.859632.3.36_1 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 18fe05aa.26345.5745.467332.3.227_0 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Message from server: Completed result 11fe05aa.24379.10048.978420.3.138_1 refused: result already reported as success
5/16/2007 10:04:57 PM|SETI@home|Deferring communication for 11 sec
5/16/2007 10:04:57 PM|SETI@home|Reason: requested by project
5/16/2007 10:04:59 PM|SETI@home|[file_xfer] Started download of file 15mr05aa.23822.7713.336072.3.142
5/16/2007 10:04:59 PM|SETI@home|[file_xfer] Started download of file 15mr05aa.23822.7713.336072.3.153
5/16/2007 10:05:08 PM|SETI@home|Sending scheduler request: To fetch work
5/16/2007 10:05:08 PM|SETI@home|Requesting 48656 seconds of new work
5/16/2007 10:05:11 PM|SETI@home|[file_xfer] Finished download of file 15mr05aa.23822.7713.336072.3.142
5/16/2007 10:05:11 PM|SETI@home|[file_xfer] Throughput 34351 bytes/sec
5/16/2007 10:05:11 PM|SETI@home|[file_xfer] Started download of file 15mr05aa.23822.7713.336072.3.157
5/16/2007 10:05:14 PM|SETI@home|Scheduler RPC succeeded [server version 509]
5/16/2007 10:05:14 PM|SETI@home|Deferring communication for 11 sec
5/16/2007 10:05:14 PM|SETI@home|Reason: requested by project
5/16/2007 10:05:32 PM|SETI@home|[file_xfer] Finished download of file 15mr05aa.23822.7713.336072.3.157
5/16/2007 10:05:32 PM|SETI@home|[file_xfer] Throughput 18774 bytes/sec
5/16/2007 10:05:32 PM|SETI@home|[file_xfer] Started download of file 14fe05aa.4646.18018.54842.3.241
5/16/2007 10:07:16 PM|SETI@home|[file_xfer] Finished download of file 14fe05aa.4646.18018.54842.3.241
5/16/2007 10:07:16 PM|SETI@home|[file_xfer] Throughput 3519 bytes/sec
5/16/2007 10:07:16 PM|SETI@home|[file_xfer] Started download of file 14fe05aa.4646.18018.54842.3.243

Those were new units sent on Saturday during the short up period and uploaded a couple of minutes ago. They were new for my machine and yet not accepted... for whatever reason???
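
For anyone wanting to tally a log like the one above, here is a minimal sketch. It assumes the `timestamp|project|message` layout seen in these lines; the category names and the `summarize` function are invented for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# One log line: "<timestamp>|<project>|<message>"
LINE = re.compile(r"^(?P<stamp>[^|]+)\|(?P<project>[^|]+)\|(?P<message>.*)$")


def summarize(lines):
    """Count a few message types in BOINC-style message log lines."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the expected layout
        message = match.group("message")
        if "refused: result already reported as success" in message:
            counts["refused_duplicate_report"] += 1
        elif message.startswith("Scheduler request failed"):
            counts["scheduler_failures"] += 1
        elif message.startswith("Scheduler RPC succeeded"):
            counts["scheduler_successes"] += 1
        elif message.startswith("[file_xfer] Finished upload"):
            counts["uploads_finished"] += 1
        elif message.startswith("[file_xfer] Finished download"):
            counts["downloads_finished"] += 1
    return counts


sample = [
    "5/16/2007 10:02:08 PM|SETI@home|Scheduler request failed: HTTP internal server error",
    "5/16/2007 10:04:57 PM|SETI@home|Scheduler RPC succeeded [server version 509]",
]
print(dict(summarize(sample)))
```

Feeding it the full log would show at a glance how many scheduler requests failed before one succeeded and how many reports the server refused as already received.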

Greetz, HP ;-))
ID: 568875 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 568879 - Posted: 16 May 2007, 20:20:55 UTC - in response to Message 568874.  


Slowing the number of connections increases the number of successful connections, each success reduces the number of retries, and before you know it you can turn the rest of the world "on" and not worry about it.


Nah... ;-)

And you're basing this on what, Brian?

I think a big part of what we're seeing is too many connections. Bruno gets a TCP SYN and sends the SYN-ACK, but there are so many other packets that the final ACK never comes through.

... or, the TCP SYN comes in, and the server can't open another handle.

... or, the connection comes up (SYN/SYN-ACK/ACK) but the server is too busy.

It's called the c10k problem.

There are web servers like lighttpd that are designed to handle a large number of simultaneous connections, by pulling a whole bunch of tricks like keeping things in RAM, building date strings once a second instead of once per connection, etc.
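
To illustrate just the date-string trick, here is a small sketch in Python. It is not lighttpd's code; the class name and the `rebuilds` counter are made up for the example. The HTTP date is formatted at most once per second and reused for every response in that second.

```python
import time
from email.utils import formatdate


class CachedHttpDate:
    """Rebuild the HTTP Date header text only when the second changes."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the behaviour can be tested
        self._second = None      # whole second the cached text belongs to
        self._text = None
        self.rebuilds = 0        # how many times the string was formatted

    def get(self):
        now = int(self._clock())
        if now != self._second:
            self._second = now
            self._text = formatdate(now, usegmt=True)
            self.rebuilds += 1
        return self._text


dates = CachedHttpDate()
for _ in range(1000):
    dates.get()              # 1000 "responses", very few rebuilds
print(dates.rebuilds)
```

The saving per request is tiny, but at thousands of connections per second these small costs are exactly what such servers try to shave off.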

IMHO, there should be code in the BOINC client to slow the connection rate. Throttle clients so they don't exceed the rate Bruno can comfortably handle, and we'd be back a lot quicker.
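
One common way to do that kind of throttling is exponential backoff with random jitter, so that clients which failed together do not all retry together. The sketch below is only an illustration of the technique, not the BOINC client's actual logic; the function name, the 60-second base, and the 4-hour cap are assumptions.

```python
import random


def next_retry_delay(failures, base=60.0, cap=4 * 3600.0, rng=random.random):
    """Seconds to wait after `failures` consecutive failed attempts.

    The ceiling doubles with each failure up to `cap`; the actual delay
    is drawn uniformly below that ceiling ("full jitter").
    """
    ceiling = min(cap, base * (2 ** failures))
    return rng() * ceiling


# Example: delays chosen after 0 to 5 consecutive failures.
for attempt in range(6):
    print(attempt, round(next_retry_delay(attempt), 1))
```

The randomness matters as much as the doubling: without it, every client that failed at the same moment would come back at the same moment.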

-- Ned
ID: 568879 · Report as offensive
zombie67 [MM]
Volunteer tester
Avatar

Send message
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 568886 - Posted: 16 May 2007, 20:29:32 UTC

Finally able to upload and report ~90 WUs over the course of the morning. Very good.

Now I'm getting "no work" when requesting new work. According to the server status page, there are over 500k ready to download.
Dublin, California
Team: SETI.USA
ID: 568886 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Send message
Joined: 18 May 99
Posts: 19060
Credit: 40,757,560
RAC: 67
United Kingdom
Message 568899 - Posted: 16 May 2007, 20:37:53 UTC

So, what did you do there at Berkeley, suddenly all my uploads disappeared.

But on request/report, I get HTTP internal server error.

Andy
ID: 568899 · Report as offensive
Profile Fuzzy Hollynoodles
Volunteer tester
Avatar

Send message
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 568901 - Posted: 16 May 2007, 20:40:17 UTC - in response to Message 568899.  

So, what did you do there at Berkeley, suddenly all my uploads disappeared.

But on request/report, I get HTTP internal server error.

Andy


Yes, now I've been able to upload but can't report, but baby steps, baby steps...

So we're approaching slowly. :-)



"I'm trying to maintain a shred of dignity in this world." - Me

ID: 568901 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.