Panic Mode On (68) Server problems?

Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 9,325,586
RAC: 3,766
Russia
Message 1196777 - Posted: 18 Feb 2012, 10:02:30 UTC

As I see it, that's the wrong point of view. If that were so, the splitters would be disabled, but they are running, though without producing new WUs. Curious situation. Moreover, if I were S@H staff, I would hardly schedule such a task for a holiday when I myself am away from the lab.
ID: 1196777 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13034
Credit: 143,355,064
RAC: 198,178
United Kingdom
Message 1196778 - Posted: 18 Feb 2012, 10:09:36 UTC - in response to Message 1196773.  

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.

I forgot about that. Does make sense now. I would have thought queries/sec would be higher than they are for that operation though.

I think the query rate is for the BOINC database, rather than the science database.

At least, new WUs are still being split (11/sec), even if they're being allocated as fast as they can be produced - so there are none spare for a 'ready to send' buffer.
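
For a rough sense of scale behind the guess above, here is a back-of-envelope sketch in Python. The 48-hour "weekend" window is an assumption made purely for illustration, not a figure from Matt's post; the 946 million and 11/sec numbers are the ones quoted above, and the function names are hypothetical.

```python
# Back-of-envelope estimate only. The 48-hour window is an assumed figure,
# not something stated by the project staff.
SECONDS_PER_HOUR = 3600


def rows_per_second(total_rows: int, hours: float) -> float:
    """Average copy rate needed to move total_rows within the given hours."""
    return total_rows / (hours * SECONDS_PER_HOUR)


def per_day(rate_per_second: float) -> float:
    """Convert a per-second rate into a per-day count."""
    return rate_per_second * 24 * SECONDS_PER_HOUR


remaining_workunits = 946_000_000  # figure quoted in the post above
assumed_weekend_hours = 48         # assumption for illustration
split_rate = 11                    # WUs/sec, as quoted from the status page

copy_rate = rows_per_second(remaining_workunits, assumed_weekend_hours)
split_per_day = per_day(split_rate)

print(f"Copy rate needed: {copy_rate:,.0f} rows/sec")
print(f"WUs split per day at {split_rate}/sec: {split_per_day:,.0f}")
```

Under that assumed window, the copy would have to sustain roughly 5,500 rows per second, which fits the idea that anything else needing the science database would spend a lot of time waiting.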
ID: 1196778 · Report as offensive
Profile Wiggo "Democratic Socialist"
Joined: 24 Jan 00
Posts: 16577
Credit: 221,421,990
RAC: 177,123
Australia
Message 1196780 - Posted: 18 Feb 2012, 10:21:52 UTC - in response to Message 1196770.  

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.


Also related, from Matt's previous post:

Speaking of network competition - yes, we're aware that we are dropping all kinds of connections during uploads/downloads. This isn't because of our router (which was definitely the problem over the summer before we added RAM to it), but somewhere else further up the pipeline. Still figuring this out, but it's certainly load related.

which still needs to be fixed (which is where a good proxy will get you around it).

Cheers.
ID: 1196780 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17597
Credit: 397,504,806
RAC: 208,290
United Kingdom
Message 1196786 - Posted: 18 Feb 2012, 10:41:30 UTC

It's worth noting that quite a number of US-hosted sites are suffering a performance hit just now, so there may be a wider (US) problem.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1196786 · Report as offensive
Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1331
Credit: 2,970,814
RAC: 60
Canada
Message 1196799 - Posted: 18 Feb 2012, 11:50:58 UTC

I guess the splitters saw the amount of work to do (13 tapes) and decided to stage a lockout or go on hunger strike ^^
ID: 1196799 · Report as offensive
kittyman Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50377
Credit: 982,630,944
RAC: 49,499
United States
Message 1196803 - Posted: 18 Feb 2012, 12:19:14 UTC

The number of completed WUs being returned is rising steadily as well. Either because of a shorty storm or some rigs that finally got some work.

Looking at the cache on my top rig, out of about 1500 WUs, 1100 are VHAR shorties. My total cache across 9 rigs has dropped by about 600 total over the last 2 hours.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 1196803 · Report as offensive
Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1331
Credit: 2,970,814
RAC: 60
Canada
Message 1196806 - Posted: 18 Feb 2012, 12:25:47 UTC
Last modified: 18 Feb 2012, 12:28:31 UTC

Both of mine look to be working fine.

No diet in sight.

And I've got some CUDA work lately; the downloads were perfect:
2/18/2012 6:52:57 AM SETI@home Scheduler request completed: got 5 new tasks
2/18/2012 7:03:24 AM SETI@home Scheduler request completed: got 7 new tasks


No HTTP errors at all.
ID: 1196806 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1196811 - Posted: 18 Feb 2012, 13:35:53 UTC - in response to Message 1196806.  

Hi,
Back on a diet again :-) All servers working, but no new WUs showing up for BOINC to grab.
Guess it's gonna take a bit of time for stuff to get back in the swing.

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1196811 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 62423
Credit: 50,095,758
RAC: 21,795
United States
Message 1196814 - Posted: 18 Feb 2012, 13:41:28 UTC

I got up because my shoulder blades are hurting (OA), noticed I had about 85 WUs waiting to download, and started downloading them. There must not have been a lot of traffic, as I've got them all now.
My Amazon Wishlist
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, One of America's First HST's
ID: 1196814 · Report as offensive
Grumpy Swede
Volunteer tester
Joined: 1 Nov 08
Posts: 7594
Credit: 47,512,384
RAC: 18,634
Sweden
Message 1196817 - Posted: 18 Feb 2012, 13:45:36 UTC
Last modified: 18 Feb 2012, 13:52:17 UTC

Lots of files for the splitters to work on, but nothing seems to be produced. Strange, though, that cricket is pegged, and that it was pegged when the database was full too, when all the splitters were doing was producing errors and not sending out anything. The only time cricket was at the bottom was when they had the Overnight Outage.

The cricket graph doesn't seem as reliable as it used to be. When there was little work, or some other problem, it was visible directly on the cricket graph. Now, though, cricket is pegged all the time when the system is up, no matter whether there seems to be anything to send out or not. Just like now, when the status page shows that not much is being split at all, despite the splitters having tons of files to work on.
WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system.
ID: 1196817 · Report as offensive
Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1196819 - Posted: 18 Feb 2012, 13:52:35 UTC - in response to Message 1196814.  

I got up because my shoulder blades are hurting (OA), noticed I had about 85 WUs waiting to download, and started downloading them. There must not have been a lot of traffic, as I've got them all now.


I cleared my DL cache (500ish) in minutes. Not sure if that's a good sign or not.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1196819 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1196825 - Posted: 18 Feb 2012, 14:43:31 UTC - in response to Message 1196817.  

The cricket graph doesn't seem as reliable as it used to be.

I'm sure the simple traffic grapher is as reliable as ever, it's just that there is probably one or more rogue hosts out there, constantly in a state of downloading resends...or something similar.
ID: 1196825 · Report as offensive
Grumpy Swede
Volunteer tester
Joined: 1 Nov 08
Posts: 7594
Credit: 47,512,384
RAC: 18,634
Sweden
Message 1196826 - Posted: 18 Feb 2012, 14:51:16 UTC - in response to Message 1196825.  
Last modified: 18 Feb 2012, 14:51:32 UTC

The cricket graph doesn't seem as reliable as it used to be.

I'm sure the simple traffic grapher is as reliable as ever, it's just that there is probably one or more rogue hosts out there, constantly in a state of downloading resends...or something similar.


I dunno, but if that is the case: Kill 'em, I tell you, Kill 'em :-)
WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system.
ID: 1196826 · Report as offensive
Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1331
Credit: 2,970,814
RAC: 60
Canada
Message 1196833 - Posted: 18 Feb 2012, 15:23:35 UTC - in response to Message 1196825.  
Last modified: 18 Feb 2012, 16:10:32 UTC

The cricket graph doesn't seem as reliable as it used to be.

I'm sure the simple traffic grapher is as reliable as ever, it's just that there is probably one or more rogue hosts out there, constantly in a state of downloading resends...or something similar.


But there's an important thing we need to consider, and I'm pretty sure it's the case.
Please feel free to disagree and correct me if it isn't...

I'm pretty sure the green bars in the cricket graph count not only the outgoing connections but also all the bandwidth inside their own network.
When data is outgoing, it can go to me, to you, to us, but it can also go to another server on the network:
from tape to splitter to storage to the 100_WU_buffer, etc. And then when we send results, there are some internal moves. The results and the data need to be assimilated and sent to the big database, and the old data deleted because it is no longer required.

I dunno, I'm not working there and don't know exactly what's going on, but I noticed something.

We don't get tons of downloads anymore, presently, and the cricket is full at 94.81 Mbits/sec. And when we had the shortage (last weekend), they started by cutting the external link to us, and it didn't drop a lot; it went down to 85 Mbits/sec.
It's only when they stop all the servers (splitters, assimilators, validators, transitioners, etc.) that the cricket really stops, dropping to 500-990 kb/sec.

So it's not 100% of the 95 Mbits/sec in that graph going just to us.


Those are my thoughts. But correct me if I'm wrong (probably I am).

EDIT: And sorry about my English; I'm trying my best to explain my thinking. I just hope you understand what I want to say.
ID: 1196833 · Report as offensive
Profile John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 1196841 - Posted: 18 Feb 2012, 15:51:43 UTC

After a couple of days with downloads becoming increasingly slow, and the number of WUs growing, this morning the download speed was good, and they all cleared while I was in town.

It's good to be back amongst friends and colleagues



ID: 1196841 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 62423
Credit: 50,095,758
RAC: 21,795
United States
Message 1196844 - Posted: 18 Feb 2012, 15:57:21 UTC - in response to Message 1196841.  

After a couple of days with downloads becoming increasingly slow, and the number of WUs growing, this morning the download speed was good, and they all cleared while I was in town.

Yeah and I'm on the rebound too, now I just have to get the energy to dust out 2 295 cards today...
My Amazon Wishlist
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, One of America's First HST's
ID: 1196844 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 787
Credit: 207,187,888
RAC: 387,722
United Kingdom
Message 1196851 - Posted: 18 Feb 2012, 16:04:20 UTC - in response to Message 1196817.  



The cricket graph doesn't seem as reliable as it used to be. When there was little work, or some other problem, it was visible directly on the cricket graph. Now, though, cricket is pegged all the time when the system is up, no matter whether there seems to be anything to send out or not. Just like now, when the status page shows that not much is being split at all, despite the splitters having tons of files to work on.


There is a big gray area: we know how many WUs are ready to send, but if a WU is removed from this list as soon as it is allocated to a machine, which I suspect it is, then it goes into that big unknown - awaiting download.

When the pipe is running at max, the queue could build up to a substantial amount. Remember all of those machines running 6.12.xx: some of those WUs could be waiting days before they get downloaded.



Kevin


ID: 1196851 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13034
Credit: 143,355,064
RAC: 198,178
United Kingdom
Message 1196853 - Posted: 18 Feb 2012, 16:10:15 UTC - in response to Message 1196833.  

I'm pretty sure the green bars in the cricket graph count not only the outgoing connections but also all the bandwidth inside their own network.
When data is outgoing, it can go to me, to you, to us, but it can also go to another server on the network: from tape to splitter to storage_rdy_to_send to the 100_WU_buffer, etc. And then when we send results, there are some internal moves, linked to the results and to the old data that is deleted because it is no longer required.

Not true, I'm afraid. The cricket graphs are run by the campus-wide infrastructure guys, and are outside the SSL labs themselves.

You can see the whole campus network: our graphs are just one of the 48 gigabit interfaces on router inr-250 - about half-way down the Tier2 column.

We are "gigabitethernet2_3: 169.229.0.190: SETI@Home_P2P_to_sslringeva1fes_Gi0/1" - third up from the bottom of the inr-250 page. As in the more normal view, the busy direction (light blue in the first column) is downloads from the lab to us - labelled 'bits in' from the point of view of that router, receiving data from SSL.

Having said that, and having bothered to call that page up for the first time in ages, I'm a bit worried by the number of 'Broadcast/Multicast Packets/Second' (third column) being received from the lab on that port. Ideas, anyone?
ID: 1196853 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 787
Credit: 207,187,888
RAC: 387,722
United Kingdom
Message 1196854 - Posted: 18 Feb 2012, 16:11:59 UTC - in response to Message 1196803.  

The number of completed WUs being returned is rising steadily as well. Either because of a shorty storm or some rigs that finally got some work.

Looking at the cache on my top rig, out of about 1500 WUs, 1100 are VHAR shorties. My total cache across 9 rigs has dropped by about 600 total over the last 2 hours.


No shorties here, I left this machine in shorty bashing mode.

OTOH GPU cache is only at 50%:-(



Kevin


ID: 1196854 · Report as offensive
kittyman Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50377
Credit: 982,630,944
RAC: 49,499
United States
Message 1196859 - Posted: 18 Feb 2012, 16:17:53 UTC

Little soon to tell, but it looks like something may have come to a crashing halt. And the Cricket graph is starting to show signs of download traffic coming down too.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 1196859 · Report as offensive


 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.