Panic Mode On (81) Server Problems?


Message boards : Number crunching : Panic Mode On (81) Server Problems?

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4298
Credit: 1,067,168
RAC: 1,010
United States
Message 1334686 - Posted: 4 Feb 2013, 19:53:28 UTC - in response to Message 1334663.

Hmm... so once again they dumped lots of unsplit AP data. Is AP not worth crunching, or how should we understand that?

They also probably have hundreds of terabytes of data recorded before the release of the Astropulse application. Dumping those just adds some more to the pool of data processed only by the S@h Enhanced algorithms.

The production rate of AP "splitting" is probably heavily influenced by the amount of server-side blanking. The reason the AP splitters were going so slowly might possibly have been a lot of detected RADAR RFI, in which case splitting those tapes might be considered less worthwhile than cleaner data.
Joe

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,271,925
RAC: 4,006
United States
Message 1334733 - Posted: 4 Feb 2013, 22:31:02 UTC - in response to Message 1334686.
Last modified: 4 Feb 2013, 22:37:23 UTC

Add that to the disruptive effect on those not in the AP land...

It is a bit troublesome that this problem seems to be re-enacted without much in the way of a learning process...

Perhaps, since there seems to be this compulsion to revisit this operational problem, what folks could do is announce in advance something along these lines:

AP disruptive release anticipated -- consider shifting over to another project until we realize it is disruptive and back off the process

Insert rueful smile/shrug here.

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1958
Credit: 10,429,458
RAC: 8,354
United States
Message 1334738 - Posted: 4 Feb 2013, 23:03:28 UTC
Last modified: 4 Feb 2013, 23:04:03 UTC

I'm wondering about "10dc12ac 0.00 GB": why even bother "hanging a tape" if there's nothing on it? (Yes, I know they are disks; that's why "hanging a tape" is in quotes.)
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2287
Credit: 8,797,847
RAC: 3,948
United States
Message 1334789 - Posted: 5 Feb 2013, 2:30:23 UTC

I thought the server-side stuff was supposed to be detecting APs that would be 100% blanked and not even send them out in the first place.

If they know what sections to inject random noise into, they know what sections need to be blanked. A simple byte-map would handle that, plus a tiny bit of splitter logic.

Basically, if your list of byte offsets and byte lengths is defined, the splitter can look at the list and say "oh, this WU that I'm splitting starts and finishes inside one length of blanked data; tell the science DB this one is bad and move on to the next start point."

Saves 16 MiB of data transfer right there. Do that a couple thousand times... and it adds up real fast.
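The splitter check described above could look roughly like this. It is a minimal, hypothetical Python sketch, not the actual SETI@home splitter code: it assumes the blanked regions are available as (byte offset, byte length) pairs, that workunits are cut back-to-back at a fixed length with no overlap, and the names `merge_ranges`, `fully_blanked` and `plan_split` are made up for illustration. Touching or overlapping blanked regions are merged first, so a WU that straddles two adjacent regions is still caught.

```python
from bisect import bisect_right

MIB = 1024 * 1024
WU_LEN = 16 * MIB  # assumed WU size, taken from the post above


def merge_ranges(blanked):
    """Merge (offset, length) pairs into sorted, non-overlapping (start, end) spans."""
    merged = []
    for offset, length in sorted(blanked):
        start, end = offset, offset + length
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fully_blanked(wu_start, wu_len, merged, starts):
    """True if [wu_start, wu_start + wu_len) lies inside a single merged span."""
    i = bisect_right(starts, wu_start) - 1  # last span starting at or before wu_start
    if i < 0:
        return False
    _, end = merged[i]
    return wu_start + wu_len <= end


def plan_split(tape_len, wu_len, blanked):
    """Return (send, skip): start offsets of WUs to split vs. to mark bad and skip."""
    merged = merge_ranges(blanked)
    starts = [s for s, _ in merged]
    send, skip = [], []
    for wu_start in range(0, tape_len - wu_len + 1, wu_len):
        if fully_blanked(wu_start, wu_len, merged, starts):
            skip.append(wu_start)  # hypothetically recorded as bad in the science DB
        else:
            send.append(wu_start)
    return send, skip


if __name__ == "__main__":
    # Hypothetical 64 MiB tape with two touching blanked regions (16-36 MiB, 36-48 MiB).
    blanked = [(16 * MIB, 20 * MIB), (36 * MIB, 12 * MIB)]
    send, skip = plan_split(64 * MIB, WU_LEN, blanked)
    print("send:", [s // MIB for s in send], "skip:", [s // MIB for s in skip])
    print(len(skip) * WU_LEN // MIB, "MiB not transferred")
```

In this toy example the WUs starting at 16 and 32 MiB are skipped and only those at 0 and 48 MiB are split. Using the post's figure of 16 MiB per WU, skipping a couple of thousand fully blanked WUs would avoid about 2,000 × 16 MiB = 32,000 MiB, roughly 31 GiB, of downloads.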
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5861
Credit: 60,432,342
RAC: 49,200
Australia
Message 1334878 - Posted: 5 Feb 2013, 7:59:13 UTC - in response to Message 1334789.
Last modified: 5 Feb 2013, 8:06:35 UTC

Still struggling to get work, "Couldn't connect to server" is still the standard response to a Scheduler request.
Very, very few requests result in contact, and then "Failure when receiving data from the peer" tends to be the response.
____________
Grant
Darwin NT.

Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 12,005,018
RAC: 13,336
United Kingdom
Message 1334884 - Posted: 5 Feb 2013, 8:42:05 UTC - in response to Message 1334878.

Still struggling to get work, "Couldn't connect to server" is still the standard response to a Scheduler request.
Very, very few requests result in contact, and then "Failure when receiving data from the peer" tends to be the response.


Seems that way, with full contact very intermittent, but I just got 15 units, including one AP.
____________

Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 204
Credit: 247,448,350
RAC: 160,322
Greece
Message 1334888 - Posted: 5 Feb 2013, 9:38:46 UTC
Last modified: 5 Feb 2013, 9:39:01 UTC

If no one can connect to server, who uses all the bandwidth?

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=Octets

Tim
____________

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5383
Credit: 305,244,507
RAC: 329,953
Brazil
Message 1334890 - Posted: 5 Feb 2013, 10:03:07 UTC - in response to Message 1334888.

If no one can connect to server, who uses all the bandwidth?

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=Octets

Tim

Certainly not me; none of my hosts can connect to the servers.

____________

Jasper
Joined: 29 Nov 11
Posts: 8
Credit: 1,026,295
RAC: 0
Switzerland
Message 1334891 - Posted: 5 Feb 2013, 10:06:14 UTC
Last modified: 5 Feb 2013, 10:06:31 UTC

BOINCstats says SETI has been offline since yesterday (see below), while the Server Status page seems to show it's OK, or at least that there should be something available to crunch. What gives?

SETI@Home Production offline since 2013-02-04 17:45:21 0 2262910 39 5:40
SETI@Home Beta Beta offline since 2013-02-04 19:45:50 0 46753 1 3:00

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5383
Credit: 305,244,507
RAC: 329,953
Brazil
Message 1334895 - Posted: 5 Feb 2013, 10:37:18 UTC
Last modified: 5 Feb 2013, 10:37:47 UTC

If you switch to a US-based proxy, everything works: downloads are very fast and the servers give you work. So the old problem with the HE/router connection is back for the rest of us around the world; the servers really are online and working.

But as always, we use too much bandwidth, so the proxy admins kick us off very quickly.
____________

Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 204
Credit: 247,448,350
RAC: 160,322
Greece
Message 1334902 - Posted: 5 Feb 2013, 11:16:56 UTC

With or without a proxy, nothing is downloading here.

Sending my main rig to another project.

Tim

____________

Profile Tim
Volunteer tester
Joined: 19 May 99
Posts: 204
Credit: 247,448,350
RAC: 160,322
Greece
Message 1334903 - Posted: 5 Feb 2013, 11:42:16 UTC
Last modified: 5 Feb 2013, 11:43:00 UTC

Just went back to Milkyway. Download speed 500kbps.

I can't even see the downloads. My cache is full in less than a minute.

Tim
____________

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5383
Credit: 305,244,507
RAC: 329,953
Brazil
Message 1334904 - Posted: 5 Feb 2013, 11:54:09 UTC - in response to Message 1334903.
Last modified: 5 Feb 2013, 11:58:58 UTC

Just went back to Milkyway. Download speed 500kbps.

I can't even see the downloads. My cache is full in less than a minute.

Tim

Milkyway uses double-precision math, and our 590 is not good at DP math, so it's no match for the ATI GPUs there. But at least you get plenty of jobs, and it will keep the GPUs warm. There are a few projects where the 590 works better, Collatz and GPUGrid for example, but be careful: each of them has its own problems too.

BTW, proxy downloads have stopped working here too.
____________

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1334905 - Posted: 5 Feb 2013, 11:59:01 UTC
Last modified: 5 Feb 2013, 12:03:23 UTC

Does anyone have an idea of when this might become a bit more stable?
I cannot even report completed tasks.

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5383
Credit: 305,244,507
RAC: 329,953
Brazil
Message 1334906 - Posted: 5 Feb 2013, 12:03:24 UTC - in response to Message 1334905.

Does anyone have an idea of when this might become a bit more stable?

When they get more bandwidth, split the projects by using two separate connections (one for MB and one for AP), or stop/slow the production of WUs for one of them. Nothing in the near future, as far as I've noticed.

____________

Profile Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,570,879
RAC: 246
Germany
Message 1334910 - Posted: 5 Feb 2013, 12:30:50 UTC - in response to Message 1334906.

Does anyone have an idea of when this might become a bit more stable?

When they get more bandwidth, split the projects by using two separate connections (one for MB and one for AP), or stop/slow the production of WUs for one of them. Nothing in the near future, as far as I've noticed.

Or they could slow down the feeder. Preferably they should have an intelligent feeder that knows what it gave to the scheduler (how many MB/AP) and waits accordingly.
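One way to picture such an "intelligent feeder" is a token bucket measured in bytes: it remembers how much MB and AP work it has already handed out and holds back the next WU until the link budget allows it, so a single AP WU costs far more budget than an MB one. This is a hypothetical Python sketch, not BOINC's actual feeder code; the class name, rate, burst and per-WU sizes are placeholder assumptions, not measured values.

```python
# Placeholder WU sizes in bytes (assumptions for illustration only).
WU_SIZE = {"MB": 500_000, "AP": 8_000_000}


class ByteBudgetFeeder:
    """Token bucket in bytes: delays WUs so downloads stay near a target rate."""

    def __init__(self, rate, burst):
        self.rate = rate      # bytes per second the download link should carry
        self.burst = burst    # bytes allowed to go out back-to-back
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time up to which the bucket has been accounted

    def next_send_time(self, kind, now):
        """Reserve budget for one WU of `kind` and return when it may be handed out."""
        size = WU_SIZE[kind]
        start = max(now, self.last)  # cannot send before earlier reservations
        # Refill for the time elapsed since the last accounting, capped at burst.
        self.tokens = min(self.burst, self.tokens + (start - self.last) * self.rate)
        self.last = start
        if self.tokens >= size:
            self.tokens -= size
            return start
        # Not enough budget: wait until the missing bytes have been "earned".
        wait = (size - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = start + wait
        return start + wait


if __name__ == "__main__":
    feeder = ByteBudgetFeeder(rate=1_000_000, burst=2_000_000)
    for kind in ["MB", "AP", "MB"]:
        print(kind, feeder.next_send_time(kind, now=0.0))
```

With these made-up numbers, the first MB WU goes out at once, the AP WU is held until 6.5 s, and the following MB WU until 7.0 s. Budgeting in bytes rather than counting WUs is what lets the feeder treat one AP WU as the bandwidth equivalent of many MB WUs.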
____________

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5383
Credit: 305,244,507
RAC: 329,953
Brazil
Message 1334929 - Posted: 5 Feb 2013, 14:35:55 UTC

Totally dry on SETI WUs only (CPU & GPU):

05/02/2013 12:07:11 | SETI@home | Reporting 45 completed tasks
05/02/2013 12:07:11 | SETI@home | Requesting new tasks for CPU and NVIDIA
05/02/2013 12:07:34 | SETI@home | Scheduler request failed: Couldn't connect to server
05/02/2013 12:07:48 | | Project communication failed: attempting access to reference site
05/02/2013 12:07:51 | | Internet access OK - project servers may be temporarily down.

The proxy stopped working too.

Some help from the lab is needed, please. I just want a few WUs to crunch and a way to report the already crunched WUs.
____________

Profile James Sotherden
Joined: 16 May 99
Posts: 8900
Credit: 35,811,350
RAC: 45,023
United States
Message 1334934 - Posted: 5 Feb 2013, 14:53:56 UTC

Seeing as today is maintenance day, don't expect anything until after the outage.
____________

Old James

Profile Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 24481
Credit: 33,791,295
RAC: 24,181
Germany
Message 1334936 - Posted: 5 Feb 2013, 15:00:04 UTC - in response to Message 1334934.

Seeing as today is maintenance day, don't expect anything until after the outage.


At least I got 200 ghosts and 1 VLAR on my GPU so far.
LOL

____________

Copyright © 2014 University of California