Panic Mode On (112) Server Problems?

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1934478 - Posted: 9 May 2018, 1:08:39 UTC - in response to Message 1934475.  

Ever since the database reorg, I have been getting either the permanent download error or missing header error. The ERR_XML_PARSE error would make the third type of database error we've seen.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1934478 · Report as offensive
rob smith Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22240
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1934520 - Posted: 9 May 2018, 5:10:50 UTC

ERR_XML_PARSE

Is not a database error, but a parser error - part of the code that is being accessed is poorly formed and so the parser can't do its job. Looking back in the thread it would appear to be connected with download errors in a somewhat perverse way, so it may be a transmission error that is corrupting the odd task.
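For anyone who wants to see what "the parser can't do its job" looks like in general, here is a minimal sketch. It uses Python's standard XML parser, not the BOINC client's own parser, and the element names are made up for the example. A document cut off before its closing tags is simply rejected:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a workunit; element names are illustrative.
complete = "<workunit><header>params</header><data>abc</data></workunit>"
truncated = complete[:40]  # simulate a file cut off part-way through


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(complete))   # True
print(is_well_formed(truncated))  # False
```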
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1934520 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 1934522 - Posted: 9 May 2018, 5:14:03 UTC - in response to Message 1934475.  

I don't like errors and not only do I still have 1 of them download errors left on my main rig, but now I have 3 of them "ERR_XML_PARSE" or "Bad Work Header" jobs on my backup rig. :-(

Cheers.

Been hit with a "Bad WU header" error myself.
:-/
Grant
Darwin NT
ID: 1934522 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1934528 - Posted: 9 May 2018, 6:22:43 UTC - in response to Message 1934520.  
Last modified: 9 May 2018, 6:29:57 UTC

ERR_XML_PARSE

Is not a database error, but a parser error - part of the code that is being accessed is poorly formed and so the parser can't do its job. Looking back in the thread it would appear to be connected with download errors in a somewhat perverse way, so it may be a transmission error that is corrupting the odd task.
The component most likely to be implicated is the splitter - that knows the recording parameters of the tape it's working on, and writes out first the parameters and processing instructions (in XML format), and then the actual data, into a combined workunit file. If I wasn't scheduled to be out at a meeting all day, I'd download one of the files and try to read the XML bit with my own eyes.

Edit - OK, here's my example for y'all to play with. http://boinc2.ssl.berkeley.edu/sah/download_fanout/d9/05my18aa.15528.2934.10.37.56
ID: 1934528 · Report as offensive
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1934530 - Posted: 9 May 2018, 6:32:36 UTC - in response to Message 1934528.  

ID: 1934530 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1934531 - Posted: 9 May 2018, 6:45:29 UTC - in response to Message 1934530.  

Mine was WU 2964747029. It had a normal header, but only about 20 lines of data, and no XML to say that the end of the data had been reached - it should end with

</data>
</workunit>
And as a consequence, it was a tiny file - 12 KB instead of 700-odd KB.
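A rough way to spot files like this one without reading them by eye is to check the tail of the file. This is only a sketch: it assumes, per the description above, that a complete workunit file ends with the two closing tags, and `looks_truncated` is a hypothetical helper name, not part of any BOINC tool.

```python
import sys
from pathlib import Path


def looks_truncated(path: Path, tail_bytes: int = 256) -> bool:
    """Return True if the file does not end with </data> followed by </workunit>."""
    size = path.stat().st_size
    with path.open("rb") as fh:
        # Only the last few bytes matter, so skip the bulk of the data.
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read().decode("ascii", errors="replace").rstrip()
    if not tail.endswith("</workunit>"):
        return True
    # Drop the final tag and check that the data section was closed before it.
    tail = tail[: -len("</workunit>")].rstrip()
    return not tail.endswith("</data>")


if __name__ == "__main__":
    # Usage: python check_wu.py <downloaded workunit files...>
    for name in sys.argv[1:]:
        p = Path(name)
        status = "TRUNCATED?" if looks_truncated(p) else "ok"
        print(f"{p.name}: {p.stat().st_size} bytes, {status}")
```

Comparing the reported size against the usual 700-odd KB gives a second, cruder check.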
ID: 1934531 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 1934532 - Posted: 9 May 2018, 6:54:18 UTC

A result of the pre-outage outage where the server status data stopped updating, then several hours later there was no work other than resends?
Grant
Darwin NT
ID: 1934532 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1934533 - Posted: 9 May 2018, 7:12:52 UTC - in response to Message 1934532.  

Could be. Maybe "created 7 May 2018, 23:58:30 UTC" is significant?

OK, I've got to get ready to go out. If anyone else wants to play, a reminder of WOW! Where on earth did that come from?
ID: 1934533 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 1934536 - Posted: 9 May 2018, 7:39:25 UTC
Last modified: 9 May 2018, 7:55:38 UTC

9/05/2018 17:07:06 | SETI@home | Project has no tasks available

With 180k Ready-to-send, I'm hoping that was just a one-off glitch.

Edit-
Just a glitch, next couple of requests got work.
*fingers crossed*
Grant
Darwin NT
ID: 1934536 · Report as offensive
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1934665 - Posted: 9 May 2018, 16:57:58 UTC
Last modified: 9 May 2018, 17:03:00 UTC

Remember, the buffer that actually sends you work is only about 400 WUs long - when that runs out, it has to dip into the database for more WUs to send...

This is why you will get the dreaded "no work to send" on occasion, when the status page says there are plenty of WUs... (also check to see that the "status" page is up-to-date...)
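A toy model of that explanation, for illustration only. This is not BOINC's actual scheduler code: the 400-slot buffer is the figure from this post and the 180k ready-to-send count is from earlier in the thread, while the class name, request sizes and refill timing are all made up.

```python
from collections import deque


class ToyFeeder:
    """Toy model: work is handed out only from a small in-memory buffer."""

    def __init__(self, db_ready: int, slots: int = 400):
        self.db_ready = db_ready  # tasks "ready to send" in the database
        self.slots = slots        # buffer capacity (about 400 per the post)
        self.buffer = deque()

    def refill(self) -> None:
        """Top the buffer up from the database, as a periodic refill pass would."""
        take = min(self.slots - len(self.buffer), self.db_ready)
        self.buffer.extend(range(take))
        self.db_ready -= take

    def request(self, wanted: int) -> int:
        """Serve a request from the buffer only; return how many tasks were sent."""
        sent = min(wanted, len(self.buffer))
        for _ in range(sent):
            self.buffer.popleft()
        return sent


feeder = ToyFeeder(db_ready=180_000)
feeder.refill()
# Five hosts each ask for 100 tasks before the next refill happens.
results = [feeder.request(100) for _ in range(5)]
print(results)          # [100, 100, 100, 100, 0]
print(feeder.db_ready)  # 179600
```

The fifth host is told there is no work even though the database still holds nearly 180k tasks; its next request, after a refill, would succeed.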

Hello, from Albany, CA!...
ID: 1934665 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1934752 - Posted: 10 May 2018, 1:27:07 UTC

. . It may just be me but my downloads are r.e.a.l slow.

Stephen

??
ID: 1934752 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 1934776 - Posted: 10 May 2018, 5:45:11 UTC - in response to Message 1934665.  

Remember, the buffer that actually sends you work is only about 400 WUs long - when that runs out, it has to dip into the database for more WUs to send...

This is why you will get the dreaded "no work to send" on occasion, when the status page says there are plenty of WUs... (also check to see that the "status" page is up-to-date...)

That may be the case after an extended outage, but with the much shortened outages, once there is work available I'm able to get some. So it shouldn't be the case at any other time.
Grant
Darwin NT
ID: 1934776 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 1934777 - Posted: 10 May 2018, 5:48:31 UTC - in response to Message 1934752.  

. . It may just be me but my downloads are r.e.a.l slow.

Might depend on the download server you got for that download. For the last few weeks my downloads have either been very fast, or just OK (other than when we had that system issue).


I notice the Ready-to-send buffer has settled around the 500k level. So either they've changed it, or the Splitters can't quite keep up their output to fill in that last 100k.
Grant
Darwin NT
ID: 1934777 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 1935027 - Posted: 11 May 2018, 8:02:25 UTC
Last modified: 11 May 2018, 8:03:32 UTC

Anyone else finding the web site slower than a month of Sundays for the last 10 minutes or so? Threads load OK, but clicking on any link results in a very long wait for something to actually happen.

EDIT- and now it's OK again.
(should have posted about it sooner).
Grant
Darwin NT
ID: 1935027 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 1935054 - Posted: 11 May 2018, 9:52:23 UTC

Arecibo file appears to be stuck. Been 9 channels done, 5 in progress for several hours now.
Grant
Darwin NT
ID: 1935054 · Report as offensive
Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1935065 - Posted: 11 May 2018, 10:40:27 UTC

Sorry to all, I've had a major issue with comp/seti and have a lot of work that has been deemed abandoned. Spent all day since the early hours trying to figure out what went wrong. Again, sorry to all.
ID: 1935065 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1935071 - Posted: 11 May 2018, 11:21:11 UTC - in response to Message 1935065.  

Sorry to all, I've had a major issue with comp/seti and have a lot of work that has been deemed abandoned. Spent all day since the early hours trying to figure out what went wrong. Again, sorry to all.


. . Abandoned is better than ghosted, they will be resent and with luck soon completed.

. . My attempts to run SoG r3584 have had several failures with eventual success, but I have ghosted a couple of hundred tasks along the way. It will take me most of the week to recover and complete them. Hopefully my wingmen will not be too put out by the delay.

Stephen

:(
ID: 1935071 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1935073 - Posted: 11 May 2018, 11:25:34 UTC - in response to Message 1935071.  

In our discussions of app_info.xml, did we ever say "Set small cache levels when testing, until you're sure that it's working"?
ID: 1935073 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1935082 - Posted: 11 May 2018, 12:07:58 UTC - in response to Message 1935073.  

In our discussions of app_info.xml, did we ever say "Set small cache levels when testing, until you're sure that it's working"?


. . Yep, and Bruce even advocated running the cache down to zero. But I am fatalistic and becoming quite adept at the ghost recovery procedure. I find practice helps me improve ... :)

. . I only have this machine online a few times a day unlike the Linux boxes, so I have to keep the cache large enough to keep it busy between online times. And that number of ghosts is from several failed attempts. Thankfully all the help I got resolved the problem before it went into 4 digits ...

. . At the end I would set NNT before reporting and then try the next rewrite of app_info. Otherwise the number would be bigger ...

. . It'll all be OK in the end.

Stephen

:)
ID: 1935082 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1935084 - Posted: 11 May 2018, 12:12:37 UTC - in response to Message 1935082.  

Use local preferences to set 0.01 day.

Try new app_info.

If result 'Oh f...', edit and try again.

If result 'Yeah!', revert to normal settings, walk away, pour beer.

:)
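For those who prefer a file to clicking through the Manager, the first step can be sketched as below. Assumptions, stated loudly: local preferences live in a file called global_prefs_override.xml in the BOINC data directory, and the cache settings are the work_buf_min_days and work_buf_additional_days elements. Check both against your own client before relying on this. The helper name is made up, and it overwrites any existing override file, so keep a copy first.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_small_cache_override(data_dir: Path, days: float = 0.01) -> Path:
    """Write a local preferences override asking for a tiny work cache.

    Assumed file name and element names; overwrites any existing override.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = str(days)
    ET.SubElement(root, "work_buf_additional_days").text = "0"
    target = data_dir / "global_prefs_override.xml"
    target.write_text(ET.tostring(root, encoding="unicode"))
    return target


# Example call; the data directory varies by install, so this path is only a placeholder:
# write_small_cache_override(Path("/path/to/boinc/data"))
```

The client then needs to re-read its local preferences (or be restarted) before the smaller cache takes effect. Once the new app_info is confirmed working, delete the override or restore the saved copy to get back to normal settings.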
ID: 1935084 · Report as offensive


 