Panic Mode On (81) Server Problems?

Message boards : Number crunching : Panic Mode On (81) Server Problems?

kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45913
Credit: 815,149,765
RAC: 125,382
United States
Message 1334664 - Posted: 4 Feb 2013, 18:57:53 UTC

And now none of the rigs has connected for half an hour.
From the looks of the Cricket graph, either things fell over again, or they are rebooting.
And so it goes.


Cats.....what more does one need?

Have made friends in this life.
Most were cats.

ID: 1334664
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1334686 - Posted: 4 Feb 2013, 19:53:28 UTC - in response to Message 1334663.  

Hmm... so once again they dumped lots of unsplit AP. Is AP not worth crunching, or how should we understand that?

They also probably have hundreds of terabytes of data recorded before the Astropulse application was released. Dumping those just adds some more to the pool of data processed only by the S@h Enhanced algorithms.

The production rate of AP "splitting" is probably heavily influenced by the amount of server-side blanking. The reason the AP splitters were going so slowly might possibly have been a lot of detected RADAR RFI, in which case splitting those tapes might be considered less worthwhile than cleaner data.
                                                                    Joe

ID: 1334686
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 14,089,610
RAC: 1,238
United States
Message 1334733 - Posted: 4 Feb 2013, 22:31:02 UTC - in response to Message 1334686.  
Last modified: 4 Feb 2013, 22:37:23 UTC

Add that to the disruptive effect on those not in the AP land...

It is a bit troublesome that this problem seems to be re-enacted without much in the way of a learning process....

Perhaps, since there seems to be this compulsion to revisit this operational problem, what folks could do is announce something along these lines in advance:

AP disruptive release anticipated -- consider shifting over to another project until we realize it is disruptive and back off the process

Insert rueful smile/shrug here.

ID: 1334733
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 2580
Credit: 34,731,381
RAC: 19,948
United States
Message 1334738 - Posted: 4 Feb 2013, 23:03:28 UTC
Last modified: 4 Feb 2013, 23:04:03 UTC

I'm wondering about "10dc12ac 0.00 GB": why even bother "hanging a tape" if there's nothing on it? (Yes, I know they are disks; that's why "hanging a tape" is in quotes...)

ID: 1334738
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,621,656
RAC: 330
United States
Message 1334789 - Posted: 5 Feb 2013, 2:30:23 UTC

I thought the server-side stuff was supposed to be detecting APs that would be 100% blanked and not even send them out in the first place.

If they know which sections to inject random noise into, they know which sections need to be blanked. A simple byte-map would handle that, plus a tiny bit of splitter logic.

Basically, if the list of byte offsets and byte lengths is defined, the splitter can look at the list and say "oh, this WU that I'm splitting starts and finishes inside one length of blanked data; tell the science DB this one is bad, and I'm moving on to the next start point."

Saves 16 MiB of data transfer right there. Do that a couple thousand times... and it adds up real fast.
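The skip test described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual splitter code: the blanking map is assumed to be a sorted list of non-overlapping (offset, length) byte ranges with adjacent ranges already merged, WUs are assumed to be consecutive fixed-length slices of the tape, and the names `is_fully_blanked`, `plan_split` and `transfer_saved` are hypothetical. The last helper just does the "16 MiB, a couple thousand times" arithmetic from the post.

```python
from bisect import bisect_right
from typing import List, Tuple

MIB = 1024 * 1024

# Hypothetical blanking map: sorted, non-overlapping (offset, length) byte ranges.
# Assumes adjacent ranges have already been merged into one.
BlankMap = List[Tuple[int, int]]


def is_fully_blanked(blank_map: BlankMap, wu_start: int, wu_length: int) -> bool:
    """True if [wu_start, wu_start + wu_length) lies inside a single blanked range."""
    starts = [offset for offset, _ in blank_map]
    # Last blanked range that begins at or before the WU start.
    i = bisect_right(starts, wu_start) - 1
    if i < 0:
        return False
    offset, length = blank_map[i]
    return wu_start + wu_length <= offset + length


def plan_split(blank_map: BlankMap, tape_size: int, wu_length: int) -> List[Tuple[int, str]]:
    """Return (start, action) per WU start point; action is "split" or "skip"."""
    plan = []
    for start in range(0, tape_size - wu_length + 1, wu_length):
        if is_fully_blanked(blank_map, start, wu_length):
            # Would be reported to the science DB as bad instead of being sent out.
            plan.append((start, "skip"))
        else:
            plan.append((start, "split"))
    return plan


def transfer_saved(skipped: int, wu_bytes: int = 16 * MIB) -> int:
    """Download bytes avoided by not sending `skipped` WUs of `wu_bytes` each."""
    return skipped * wu_bytes


if __name__ == "__main__":
    blanks = [(0, 40), (100, 50)]  # toy map, sizes in bytes
    plan = plan_split(blanks, tape_size=200, wu_length=10)
    print(sum(1 for _, action in plan if action == "skip"))  # 9
    print(transfer_saved(2000) / 2**30)  # 31.25 (GiB)
```

Taking the post's 16 MiB per WU at face value, 2000 skipped WUs come to 31.25 GiB that never has to leave the lab. Rebuilding `starts` on every call keeps the sketch short; a real splitter would compute it once per tape.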


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)

ID: 1334789
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7483
Credit: 91,068,686
RAC: 46,377
Australia
Message 1334878 - Posted: 5 Feb 2013, 7:59:13 UTC - in response to Message 1334789.  
Last modified: 5 Feb 2013, 8:06:35 UTC

Still struggling to get work, "Couldn't connect to server" is still the standard response to a Scheduler request.
Very, very few requests result in contact, and then "Failure when receiving data from the peer" tends to be the response.


Grant
Darwin NT

ID: 1334878
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 937
Credit: 20,559,960
RAC: 8,878
United Kingdom
Message 1334884 - Posted: 5 Feb 2013, 8:42:05 UTC - in response to Message 1334878.  

Seems that way; full contact is very intermittent, but I just got 15 units, including one AP.

ID: 1334884
Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1334888 - Posted: 5 Feb 2013, 9:38:46 UTC
Last modified: 5 Feb 2013, 9:39:01 UTC

If no one can connect to server, who uses all the bandwidth?

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=Octets

Tim


ID: 1334888
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,546,456
RAC: 7,824
Panama
Message 1334890 - Posted: 5 Feb 2013, 10:03:07 UTC - in response to Message 1334888.  

Sure isn't me; I can't connect to the servers from any host.

ID: 1334890
Jasper
Joined: 29 Nov 11
Posts: 8
Credit: 1,026,591
RAC: 0
Switzerland
Message 1334891 - Posted: 5 Feb 2013, 10:06:14 UTC
Last modified: 5 Feb 2013, 10:06:31 UTC

Boinc Stats says SETI has been offline since yesterday (see below), while the Server Status page seems to show it's OK, or at least that there should be something available to crunch. What gives?

SETI@Home Production offline
since 2013-02-04 17:45:21 0 2262910 39 5:40
SETI@Home Beta Beta offline
since 2013-02-04 19:45:50 0 46753 1 3:00

ID: 1334891
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,546,456
RAC: 7,824
Panama
Message 1334895 - Posted: 5 Feb 2013, 10:37:18 UTC
Last modified: 5 Feb 2013, 10:37:47 UTC

If you switch to a US-based proxy, everything works: downloads are very fast and the servers give you work. So the old problem with the HE/router connection has returned for those of us in the rest of the world. The servers really are online and working.

But as always, we use too much bandwidth, so the proxy admins kick us off very quickly.


ID: 1334895
Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1334902 - Posted: 5 Feb 2013, 11:16:56 UTC

With or without proxy nothing is downloading here.

Sending my main rig to another project.

Tim


ID: 1334902
Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,573,354
RAC: 182
Greece
Message 1334903 - Posted: 5 Feb 2013, 11:42:16 UTC
Last modified: 5 Feb 2013, 11:43:00 UTC

Just went back to Milkyway. Download speed 500kbps.

I can't even see the downloads. My cache is full in less than a minute.

Tim


ID: 1334903
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,546,456
RAC: 7,824
Panama
Message 1334904 - Posted: 5 Feb 2013, 11:54:09 UTC - in response to Message 1334903.  
Last modified: 5 Feb 2013, 11:58:58 UTC

Milkyway uses double-precision math, and our 590s are not good at DP math, so they're no match for the ATI GPUs there. But at least you get plenty of jobs, and it will keep the GPUs warm. There are a few projects the 590 works better on, Collatz and GPUGRID for example, but be careful: each of them has its own problems too.

BTW, proxy downloads have stopped working here too.

ID: 1334904
fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1334905 - Posted: 5 Feb 2013, 11:59:01 UTC
Last modified: 5 Feb 2013, 12:03:23 UTC

Does anyone have an idea of when this might become a bit more stable?
I cannot even report completed tasks.

ID: 1334905
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,546,456
RAC: 7,824
Panama
Message 1334906 - Posted: 5 Feb 2013, 12:03:24 UTC - in response to Message 1334905.  

When they get more bandwidth, split the projects by using two separate connections (one for MB and one for AP), or stop/slow the production of WUs on one of them. None of that is coming in the near future, as far as I've noticed.

ID: 1334906
Link
Joined: 18 Sep 03
Posts: 805
Credit: 1,678,562
RAC: 22
Germany
Message 1334910 - Posted: 5 Feb 2013, 12:30:50 UTC - in response to Message 1334906.  

Or they could slow down the feeder. Preferably, they should have an intelligent feeder that knows what it has given to the scheduler (how many MB/AP) and waits accordingly.
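The "intelligent feeder" idea amounts to pacing handouts by how much download traffic they will generate. Here is a minimal sketch, assuming a fixed outbound byte budget and a known download size per application; `PacedFeeder`, `hand_out` and the numbers below are hypothetical, and this is not how the real BOINC feeder works. Each handout pushes forward the time at which the link is projected to be free, and the next handout waits until then, so a big AP task buys a longer pause than a small MB task.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PacedFeeder:
    """Hypothetical feeder that waits according to what it already gave the scheduler."""
    budget_bytes_per_sec: float          # assumed outbound budget for WU downloads
    task_bytes: Dict[str, int]           # assumed download size per app, e.g. {"mb": ..., "ap": ...}
    clock: Callable[[], float] = time.monotonic
    sleep: Callable[[float], None] = time.sleep
    next_free: float = 0.0               # when earlier handouts should have finished downloading
    given: Dict[str, int] = field(default_factory=dict)  # handouts so far, per app

    def hand_out(self, app: str) -> float:
        """Give one `app` task to the scheduler, first waiting out earlier handouts.

        Returns the number of seconds waited.
        """
        now = self.clock()
        wait = max(0.0, self.next_free - now)
        if wait > 0:
            self.sleep(wait)
        start = max(now, self.next_free)
        # Reserve link time proportional to this task's download size.
        self.next_free = start + self.task_bytes[app] / self.budget_bytes_per_sec
        self.given[app] = self.given.get(app, 0) + 1
        return wait
```

For example, with a 1000 B/s budget and sizes {"mb": 100, "ap": 2000}, handing out one AP task and then an MB task makes the MB handout wait 2 seconds, while a second MB task right after waits only 0.1 seconds. The `clock` and `sleep` hooks are there so the pacing can be tested without real delays.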

ID: 1334910
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,546,456
RAC: 7,824
Panama
Message 1334929 - Posted: 5 Feb 2013, 14:35:55 UTC

Totally dry on SETI WUs only (CPU & GPU):

05/02/2013 12:07:11 | SETI@home | Reporting 45 completed tasks
05/02/2013 12:07:11 | SETI@home | Requesting new tasks for CPU and NVIDIA
05/02/2013 12:07:34 | SETI@home | Scheduler request failed: Couldn't connect to server
05/02/2013 12:07:48 | | Project communication failed: attempting access to reference site
05/02/2013 12:07:51 | | Internet access OK - project servers may be temporarily down.

The proxy has stopped working too.

Some help from the lab is needed, please. I just want a few WUs to crunch and a way to report the already-crunched ones.


ID: 1334929
James Sotherden
Project Donor
Joined: 16 May 99
Posts: 10133
Credit: 65,790,720
RAC: 37,551
United States
Message 1334934 - Posted: 5 Feb 2013, 14:53:56 UTC

Seeing as today is maintenance day, don't expect anything until after the outage.

Old James

ID: 1334934
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29578
Credit: 49,086,846
RAC: 16,958
Germany
Message 1334936 - Posted: 5 Feb 2013, 15:00:04 UTC - in response to Message 1334934.  

At least I got 200 ghosts and 1 VLAR on my GPU so far.
LOL

With each crime and every kindness we birth our future.

ID: 1334936
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.