Panic Mode On (78) Server Problems? |
Message boards : Number crunching : Panic Mode On (78) Server Problems?
| Author | Message |
|---|---|
|
Thank you, Juan! | |
| ID: 1301867 · | |
They will be acknowledged as report right now only if you select "No New Tasks" on the projects tab. Well it does on my system. Task gets done, it uploads and then the scheduler reports it, it goes through and the task vanishes from my client's task list. Don't know what to say. ____________ "Life is just nature's way of keeping meat fresh." - The Doctor | |
| ID: 1301871 · | |
Something other than just bandwidth saturation is at play here. That's my feeling. There have been many times in the past where network traffic was maxed out and downloads were pretty much impossible, but you were still able to contact the Scheduler to report work & get more work allocated. The fact is that even now, with the network traffic maxed out, if you do (somehow) manage to get some work, it's downloading fairly quickly. Certainly much, much faster than in the past, and when you were still able to get a response from the Scheduler. Over the last few months we've had issues with Scheduler timeouts, but not for nearly as long as this time, nor nearly as severe. From memory, I would get a response in about 1 in 5 to 7 attempts. Now I'm lucky if it's 1 in 20, No New Tasks set or not. Hence I suspect it's a system configuration/load problem, not a network load one. ____________ Grant Darwin NT. | |
| ID: 1301872 · | |
They will be acknowledged as report right now only if you select "No New Tasks" on the projects tab. It gets really hard to quantify it when I have 9 rigs trying to report 1000s of WUs. Hits and misses go by unnoticed by me. Until I check the stats page and I see some rigs have not reported for hours. That page is usually my barometer for the rigs, if I see one has not reported for a while, I suspect a crash and check it out. Not a reliable barometer at the moment. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1301875 · | |
They will be acknowledged as report right now only if you select "No New Tasks" on the projects tab. My client_state is 2.4MB in size, my sched_request_setiathome.berkeley.edu is 450kB in size. I suspect yours are a lot smaller. You're in the US, I'm a few thousand km away. End result: you may be able to get work, I'm lucky if I can even report work, even after 30 min of endless Update clicking with No New Tasks set. ____________ Grant Darwin NT. | |
| ID: 1301878 · | |
They will be acknowledged as report right now only if you select "No New Tasks" on the projects tab. Grant, don't feel special. The kitties are equally fukayed from the midwest USA..... I could be on the Berk campus and still be screwed as bad as you right now. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1301880 · | |
|
| |
| ID: 1301887 · | |
Been getting those for the last 24 hours or more. #2 rig has not been able to get through for an hour and a half. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1301888 · | |
|
I'm having no trouble reporting 1-5 tasks every couple of hours. My cache got a little over-filled when I started hoarding APs, so I'm not asking for more work presently, which I suspect is nearly the equivalent of NNT, since both are effectively "not asking for more work." | |
| ID: 1301899 · | |
|
My Linux boxes are reporting and obtaining work normally, but my one windoz box shows nothing but scheduler timeouts. It seems to upload OK... slowly, but OK. | |
| ID: 1301901 · | |
|
You folks don't have a clue about hosts processing and trying to return 100s of results an hour. | |
| ID: 1301903 · | |
When I was downloading the APs, they were coming in at around 25KB/sec for each one, sometimes I had 5-8 of them going at a time. Which is more support for the Scheduler issues being server related, not network traffic. ____________ Grant Darwin NT. | |
| ID: 1301905 · | |
|
msattler wrote: The kitties need access to the servers 24/7. They ought to just send you 500gb unsplit raw drives to crunch :-) Then we'd all have some network bandwidth to spare. | |
| ID: 1301912 · | |
|
I can't believe it is Karma. It is just that the big guys are constipating the system. Everybody knows that. Until a politically acceptable and economically doable solution is proposed, this is just venting. | |
| ID: 1301914 · | |
|
Tron, that gets away from the concept of distributed computing. | |
| ID: 1301916 · | |
Hey, that might work....LOL. I'll broach the subject with Eric in our next chat. I'd have to install the splitting software. I think the GPUUG has sent enough HDs for shuttle service. That would be a grand solution, I think. They would have to trust the kitties' science trail....err, tails. Something tells me that letting raw data out of the house would not work scientifically. The kitties are simply scintillated by the thought, though. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1301917 · | |
Tron, that gets away from the concept of distributed computing. They still distribute the work, and only, say, the top 25 machines would participate in an HD exchange program. I don't think the work would need to be split either, just one long stream with nominal checkpointing. | |
| ID: 1301919 · | |
You folks don't have a clue about hosts processing and trying to return 100s of results an hour. I wasn't trying to say that it was a situation of "it must just be a problem on your end," I was merely pointing out that I don't have any/many scheduler contact issues, even when only reporting a very small number of tasks. Others are having connection issues when reporting a small number of tasks, and so are those who are reporting a large number. Consider my message as a data point on a graph, or a breadcrumb for trying to pin-point the actual problem. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 1301920 · | |
|
| |
| ID: 1301958 · | |
|
Perhaps this might be a situation where there is more than one problem occurring at the same time. That, as most of us know from experience, makes diagnosis exceedingly difficult, which seems to fit the current crisis. | |
| ID: 1301975 · | |
| Copyright © 2013 University of California |