Panic Mode On (20) Server problems
Author | Message |
---|---|
Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Same story as last week. AP becomes available, bandwidth goes to maximum, and my uploads and downloads are borked again. When the AP work runs out, perhaps a day later I can finally get through. By then all the AP work is gone. Yeah... this is working really well! Boinc....Boinc....Boinc....Boinc.... |
Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I normally carry a 4 day cache. I have set all my computers to No New Work. Next Monday (July 20) I am shutting down my computers even if they still have completed work to upload. Then, color me gone! [edit]On Vacation to Utah!! Boinc....Boinc....Boinc....Boinc.... |
Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
I have looked at everything I can, and I have a bit of information coming from SETI staff. Considering that MB, AP and CUDA used to "all share" the same upload/download link and still recover, there must be something else wrong. In an email that was sent to SETI staff: at one point the 100 Mbit link was full duplex, meaning uploads should not interfere with downloads and vice versa (each is in its own channel). It may be that something happened (a hardware failure, a lost configuration) to cause a link/connection coming up the hill to revert to half-duplex mode. Regards Please consider a Donation to the Seti Project. |
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Here is a picture of what's going on right now: I'm not able to upload or report, as I'm getting nothing but HTML errors. It seems a bit backwards to me to kill the uploads. However, this is also the machine that processes validations. They might just be trying to catch up before going down for maintenance. SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Here is a picture of what's going on right now: I'm not able to upload or report, as I'm getting nothing but HTML errors. Thank you, SJ. Please do set No New Tasks; it will leave room for others. Matt and others come and go at all hours of the day, and many things happen unnoticed. Apparently you are trying to create panic during what may be troubleshooting that is not obvious to anyone... The hard part is that, rather than asking a question or notifying someone, your response was designed to create panic. Regards Please consider a Donation to the Seti Project. |
Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
I have looked at everything I can, and I have a bit of information coming from SETI staff. Considering that MB, AP and CUDA used to "all share" the same upload/download link and still recover, there must be something else wrong. Heeey, this actually makes perfect sense. If I try to recall what's been happening, this behaviour of not being able to upload etc. started to arise about 2 months ago in a quite distinct manner. Before this, I was hardly ever met in the morning with BOINC's message that it couldn't connect to the project and that I should check my internet connection. If full duplex isn't present, the communication gets clogged to the extent that it feels like a DSL connection: if uploads on the DSL line are through the roof, downloads slow to a crawl when you have multiple connections to deal with. Do you know how the internal interconnect is between the servers? I hope that it's at least decent switches with paired Gbit trunks so the bottlenecks are minimised. Decent Gbit switches cost peanuts nowadays, and HP is a favourite of mine for internal backbone structure. Kind regards Vyper _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Here is a picture of what's going on right now: I'm not able to upload or report, as I'm getting nothing but HTML errors. I read it just as the description says: "Disabled: Program has been disabled by staff (for debugging/maintenance)". I figured it was something for the overall good of things. If it said "Not Running: Program failed or ran out of work (or the project is down)", then I would be a little worried, but not really. I'm expecting the servers to be down Tuesday. Maybe they are just getting a jump on things this week? I think it is just the "not knowing" that is hard to deal with sometimes. SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
Joined: 19 Jan 03 Posts: 205 Credit: 1,248,845 RAC: 0 |
Do you know how the internal interconnect is between the servers? I think it's been stated before that the INTERNAL structure already consists of a Gigabit backbone. I am TCP JESUS...The Carpenter Phenom Jesus....and HAMMERING is what I do best! formerly known as...MC Hammer. |
Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
Aaaand I remember one more thing that I noticed. About three to four months ago I started to get a lot of messages indicating that I couldn't connect to the scheduler, while downloading and uploading work was fine. It seems that my GPU machine couldn't keep the link to Berkeley up long enough for Berkeley to receive the schedxxx.xml files. If I set up a proxy server on my DSL line at home, instead of using my work's fibre ISP connection, and set my GPU machine to talk to that proxy, it could connect to Berkeley and properly reach the scheduler. Why?! I don't really know, actually! All I know is that this started to happen about three to four months ago, so nowadays I need to manually switch my home proxy on and off so it can connect properly to S@H, because the larger the schedxxxx.xml files are, the harder it is to send them to Berkeley, and this would fail once a report grew beyond 500+ WUs. So the last four to five months have been a constant supervising of the GPU machine, because it simply can't manage itself; it would get borked in one way or another. Kind regards Vyper _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Aaaand I remember one more thing that I noticed. I have noticed that BOINC often "forgets" to upload results and I have to do a manual Update to get them to send in. I'll see logs of "finished wu, uploading wu, downloading wu" and such with no errors. I'll just have 6-10 results "Ready for upload" in my tasks. I have read around to see if this is a "normal" thing to occur, but it happens on all of my hosts. SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
In an email that was sent to SETI staff: at one point the 100 Mbit link was full duplex, meaning uploads should not interfere with downloads and vice versa (each is in its own channel). We forget that TCP is a sliding window protocol. If the 100 megabit line is saturated inbound, part of that inbound traffic is the ACKs for the outbound traffic. When the ACKs are delayed or lost, at some point the sender stops sending new data and waits. When the ACKs don't arrive (because they were lost), data is resent. In either direction, when the load is very high, data in the other direction will suffer too. |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
I offer another analogy -- I call it the snow plow effect. I've used that to describe workload before and after a vacation -- the snowplow clearing the road (vacation) creates very large piles of snow (work) on either side of the road. It seems often enough for SETI that the snowplow (the Tuesday outage) results in very large piles of snow (upload/download congestion) for anywhere from 12 to 24 hours on either side. Analogies are sloppy, I realize this. |
Joined: 19 Jan 03 Posts: 205 Credit: 1,248,845 RAC: 0 |
So...how does taking the upload server offline the day before a known outage (Tuesday maintenance) help network congestion again? I am TCP JESUS...The Carpenter Phenom Jesus....and HAMMERING is what I do best! formerly known as...MC Hammer. |
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
So...how does taking the upload server offline the day before a known outage (Tuesday maintenance) help network congestion again? I'm guessing it would give the DBs more free time to catch up, so that the servers are not down for 24 hours? SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13927 Credit: 208,696,464 RAC: 304 |
Hmm. I see the upload server is disabled. This would probably explain why I can't upload and why there is no inbound network traffic worth mentioning. Grant Darwin NT |
Joined: 25 Nov 01 Posts: 21702 Credit: 7,508,002 RAC: 20 |
In an email that was sent to SETI staff: at one point the 100 Mbit link was full duplex, meaning uploads should not interfere with downloads and vice versa (each is in its own channel). That's a very 'subdued' way of describing the situation. Lose the TCP control packets in either direction and the link is DoSed with an exponentially increasing stack of resend attempts that DoS for further attempts that then DoS for... until the link disgracefully degrades to being totally blocked. Max link utilisation, but no useful information gets through. The only limiting factors are the TCP timeouts and the rate of new connection attempts. And I thought the smooth 71 Mb/s was due to some cool traffic management. OK, so restricting the available WUs is also a clumsy way to "traffic manage"! In short, never let the link carry more than 89 Mb/s and everyone is happy! Happy smooth crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Chelski Joined: 3 Jan 00 Posts: 121 Credit: 8,979,050 RAC: 0 |
Interesting observation. I was scratching my head earlier over why downloading was very smooth while uploads didn't even start properly, unlike in earlier problems (when they tended to get stuck at 100% because the ACK was lost in space). Well, very soon the smooth 71 Mbps will die down, as clients stop requesting work after a certain number of WUs (3?) are stuck in the upload queue. |
Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
If you have > 'CPUs x 2' in the UL overview, BOINC will not ask for new work. Currently I have ~400 results ready for UL, and the number increases every few minutes. |
Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
|
Zebra3 Joined: 22 Oct 01 Posts: 186 Credit: 13,658,148 RAC: 0 |
As I wade into this quasi firestorm, I look at my machines and see that my earliest deadline is July 20th, which, other than the day's importance in history, is still 6 days away. I have a few hundred WUs waiting to upload, but I carry a decent 4-day cache and have NEVER run out of MB WUs to crunch. Occasionally I will run out of the demon CUDA WUs, but when that happens I load a WU from CPUGRID and the GPU is happy crunching away on it for the next 24 hours or so. So in my mind, if your cache of WUs is fresh and you crunch and report them in a timely fashion, a 1 or 2 day interruption should be of no concern, other than that you are staring at all those 100% completed WUs and hoping that your hard drive doesn't crash like mine did recently. There are bumps in the road all the time that don't need to be made into mountains...just deal with what is most important...FAMILY, and the rest of the world will come together just fine. As the song goes...DON'T WORRY...BE HAPPY! Cheers http://www.novascotia.com |
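
The point made earlier in the thread, that TCP is a sliding window protocol and that ACKs lost on a saturated inbound link stall the outbound transfers too, can be illustrated with a toy simulation. This is only a sketch under loud assumptions: it is not real TCP (no congestion control, no cumulative ACKs, no fast retransmit), and the function name `simulate_sender`, the window size, the fixed retransmit timer and the loss rates are all hypothetical values chosen for illustration, not measurements of the SETI@home link.

```python
import random


def simulate_sender(ack_loss_rate, window=8, rounds=2000, timeout_rounds=4, seed=1):
    """Toy sliding-window sender (hypothetical model, not real TCP).

    Returns (acked, lost_acks): packets acknowledged within `rounds`, and how
    many ACKs were lost on the congested reverse path (each forces a resend).
    """
    rng = random.Random(seed)   # fixed seed so runs are repeatable
    countdown = {}              # seq -> rounds left before the retransmit timer fires
    next_seq = 0
    acked = 0
    lost_acks = 0
    for _ in range(rounds):
        # New data may only enter while fewer than `window` packets are unacknowledged.
        while len(countdown) < window:
            countdown[next_seq] = 0
            next_seq += 1
        for seq in list(countdown):
            if countdown[seq] > 0:
                countdown[seq] -= 1   # slot is stalled, waiting for the timer
                continue
            # Packet is sent (or resent) this round; its ACK must cross the busy link.
            if rng.random() < ack_loss_rate:
                lost_acks += 1
                countdown[seq] = timeout_rounds
            else:
                del countdown[seq]
                acked += 1
    return acked, lost_acks


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3):
        acked, lost = simulate_sender(rate)
        print(f"ACK loss {rate:.0%}: {acked} packets acknowledged, {lost} ACKs lost")
```

With no ACK loss, every window slot completes one packet per round. As the loss rate rises, slots sit idle waiting for their timers, so outbound throughput in this model falls even though the outbound direction itself has spare capacity.
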
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.