Panic Mode On (77) Server Problems?

Message boards : Number crunching : Panic Mode On (77) Server Problems?
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1289884 - Posted: 30 Sep 2012, 23:56:43 UTC - in response to Message 1289878.  

[quote]Wiggo,

My computers won't download any CPU workunits at all unless they already have at least one GPU workunit (not necessarily from the same BOINC project).[/quote]

Well, I have plenty of GPU work on hand, but obviously my Q6600 didn't have the 8 days' worth of CPU work that my cache is set to.

Each of my PCs is set to a different venue, so I just edit those preferences according to what work is required at the time. Both my other rigs have now been set back to accepting GPU again, and when the Q6600 stops requesting CPU I'll set it back to accepting GPU work.

Cheers.
ID: 1289884 · Report as offensive
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1289904 - Posted: 1 Oct 2012, 2:21:27 UTC - in response to Message 1289884.  

[quote]Each of my PCs is set to a different venue, so I just edit those preferences according to what work is required at the time. Both my other rigs have now been set back to accepting GPU again, and when the Q6600 stops requesting CPU I'll set it back to accepting GPU work.

Cheers.[/quote]

I've recently realised that a useful work-around, given that we have three locales to play with (work, home, school) would be to set one to accept both CPU & GPU, one GPU only and one CPU only -- and then switch machines as needs dictate to the locale providing the desired downloads.
ID: 1289904 · Report as offensive
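
The venue trick ivan describes can be pictured as a small lookup table. This is only a sketch in Python: the `VENUE_PREFS` table and the `venue_for` helper are hypothetical names, and which venue gets which setting is an arbitrary choice made for illustration, not something the post specifies.

```python
# Hypothetical mapping of BOINC venues to the kinds of work each one accepts.
# The assignment of settings to venues is an arbitrary illustration.
VENUE_PREFS = {
    "home": {"cpu": True, "gpu": True},     # accepts both
    "work": {"cpu": False, "gpu": True},    # GPU only
    "school": {"cpu": True, "gpu": False},  # CPU only
}


def venue_for(want_cpu: bool, want_gpu: bool) -> str:
    """Return the venue whose preferences match exactly what we want to download."""
    for venue, prefs in VENUE_PREFS.items():
        if prefs["cpu"] == want_cpu and prefs["gpu"] == want_gpu:
            return venue
    raise ValueError("no venue accepts that combination of work")


if __name__ == "__main__":
    # A host that has run dry of CPU work would be moved to the CPU-only venue.
    print(venue_for(want_cpu=True, want_gpu=False))
```

The point of the design is that the preferences themselves never need editing; only the venue assigned to each machine changes.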
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1289914 - Posted: 1 Oct 2012, 2:49:43 UTC

scheduling server synergy
scheduler process synergy
ap_splitter1 synergy
ap_splitter4 synergy
ap_splitter5 synergy
ap_splitter6 synergy

Since the upload/download transfers seem to have improved with the
cessation of the ap splitting, I wonder if there might be a connection
to the above functions all being on synergy?
ID: 1289914 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1289957 - Posted: 1 Oct 2012, 6:01:06 UTC - in response to Message 1289863.  

[quote]Will I live to see the day when Berkeley does its homework and there is a decent connection to the network?

The PCs constantly run dry, no matter what is set in BOINC Manager. Done by hand, the downloads and uploads stumble along..... tiresome.[/quote]


What BOINC version are you running, and what cache settings are you using?

Claggy
ID: 1289957 · Report as offensive
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1290010 - Posted: 1 Oct 2012, 10:59:33 UTC
Last modified: 1 Oct 2012, 11:02:08 UTC

Not seen this before, 37 tasks undownloadable with :-
01/10/2012 10:37:08 | | [error] Can't create HTTP response output file projects/setiathome.berkeley.edu/14se10ab.13157.275891.16.10.211
Ran out of work because this lot got stuck, or something.
Methinks just abort them and move on.

Edit - we get a lot of this over on Cosmology@home

Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>21ap11ah.12802.10039.7.10.248</file_name>
<error_code>-197</error_code>
<error_message>user requested transfer abort</error_message>
</file_xfer_error>

</message>
]]>
ID: 1290010 · Report as offensive
Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1290016 - Posted: 1 Oct 2012, 11:24:29 UTC

Well, there seems to be work handed out again... got 300+ GPU shorties this morning which are, of course, all running high priority now.

Now got 100 or so of them stuck downloading. So, new approach: "No new work".

Maybe tomorrow's outage will sort stuff out.
ID: 1290016 · Report as offensive
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1290024 - Posted: 1 Oct 2012, 11:49:18 UTC - in response to Message 1290016.  

[quote]Well, there seems to be work handed out again... got 300+ GPU shorties this morning which are, of course, all running high priority now.

Now got 100 or so of them stuck downloading. So, new approach: "No new work".

Maybe tomorrow's outage will sort stuff out.[/quote]


I currently have 939 CUDA tasks cached and another 622 downloading.

ID: 1290024 · Report as offensive
musicplayer

Joined: 17 May 10
Posts: 2430
Credit: 926,046
RAC: 0
Message 1290045 - Posted: 1 Oct 2012, 14:19:34 UTC

May I ask the following question?

I guess we went through a "shorties" storm once again.

In these tasks, the Gaussian search is not carried out.

The same of course goes for the VLARs.

But then, if some of the numbers (including pulses and possible triplets) from these tasks looked better, do we go back and try to find better results by carrying out the additional Gaussian search on these tasks?

I assume this is the only way this can be done. Or is there something else that can be tried as well for the selected tasks?
ID: 1290045 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1290111 - Posted: 1 Oct 2012, 17:17:22 UTC - in response to Message 1290010.  

[quote]Not seen this before, 37 tasks undownloadable with :-
01/10/2012 10:37:08 | | [error] Can't create HTTP response output file projects/setiathome.berkeley.edu/14se10ab.13157.275891.16.10.211
...[/quote]

That's a file system error: the file couldn't be opened for writing some data which had been received. The "Can't create" is poor wording, since the same message is shown if an existing partial file can't be opened to append new data.

I've never seen that, and don't remember any previous posts mentioning it. Seems like the kind of thing which might happen if a virus scanner is allowed to check the BOINC directory hierarchy, though.
Joe
ID: 1290111 · Report as offensive
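
Joe's point, that one error message covers both a new file and an existing partial file, can be illustrated with a short Python sketch. This is not the BOINC client's code; `append_chunk` is a hypothetical helper that only shows how a single open-for-append call can fail in either situation, for example when another process (such as a virus scanner) holds the file or the directory is unavailable.

```python
import os
import tempfile


def append_chunk(path: str, chunk: bytes) -> bool:
    """Append received bytes to a download file; return False if it cannot be opened."""
    try:
        # Mode "ab" creates the file if it is missing and appends if it exists,
        # so one failure path covers both the new-file and partial-file cases.
        with open(path, "ab") as f:
            f.write(chunk)
        return True
    except OSError as err:
        # One generic message, whichever case actually failed.
        print(f"[error] can't open output file {path}: {err}")
        return False


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    target = os.path.join(folder, "partial.bin")
    append_chunk(target, b"first part, ")    # creates the file
    append_chunk(target, b"second part")     # appends to the partial file
    append_chunk(os.path.join(folder, "no_such_dir", "x.bin"), b"data")  # fails
```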
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1290163 - Posted: 1 Oct 2012, 19:25:50 UTC
Last modified: 1 Oct 2012, 19:27:47 UTC

Thanks for that, Joe.
I dumped them and moved on.
Edit - [short version] don't bother with an antivirus on that crunch box.
ID: 1290163 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1290214 - Posted: 1 Oct 2012, 21:41:06 UTC - in response to Message 1290163.  


Still struggling to build up a cache of CPU work. Overnight, many of the requests for work resulted in "Project has no tasks available" messages.

Over the last few days I would get that message on probably 1 in 5 requests. For the last 8 hours or so it's more like 4 in 5 requests resulting in "Project has no tasks available" messages.
Looks like something else has now gotten tangled up.
Grant
Darwin NT
ID: 1290214 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1290309 - Posted: 2 Oct 2012, 5:14:10 UTC - in response to Message 1290214.  
Last modified: 2 Oct 2012, 5:14:24 UTC


[quote]Still struggling to build up a cache of CPU work. Overnight, many of the requests for work resulted in "Project has no tasks available" messages.

Over the last few days I would get that message on probably 1 in 5 requests. For the last 8 hours or so it's more like 4 in 5 requests resulting in "Project has no tasks available" messages.
Looks like something else has now gotten tangled up.[/quote]


Still no joy getting work. Most requests result in "Project has no tasks available" or "No tasks sent" messages.
Even if I only got 6-8 WUs with each request my caches would be full by now, even with all the shorties. But with the vast majority of requests resulting in no work, I'm as close as I ever was to running out of CPU work again.
Grant
Darwin NT
ID: 1290309 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1290311 - Posted: 2 Oct 2012, 5:20:09 UTC

Watching the performance of the servers: soon after "something" is done, the download/upload/report performance is acceptable. Gradually the delivery rate slows down and the retry rate increases, until "not a lot" is happening apart from retries, back-offs and more retries. Then something is done to the servers, and the whole cycle starts again. This suggests to me a memory leak of some sort in one of the server processes....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1290311 · Report as offensive
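
The "retries, back-offs and more retries" pattern can be made concrete with a generic capped exponential back-off. This is a sketch only, and not the BOINC client's actual algorithm: the function name `backoff_delays` and the `base`, `cap` and `jitter` values are all assumptions chosen for illustration.

```python
import random


def backoff_delays(failures, base=60.0, cap=3600.0, jitter=0.0, rng=None):
    """Return the wait in seconds before each retry after `failures` consecutive failures.

    All parameter values are illustrative assumptions, not BOINC's real settings.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(failures):
        # Double the wait after every failure, but never exceed the cap.
        delay = min(cap, base * 2 ** attempt)
        # Optional random stretch so clients do not all retry at the same instant.
        delay *= 1.0 + jitter * rng.random()
        delays.append(delay)
    return delays


if __name__ == "__main__":
    print(backoff_delays(4))
```

With these assumed values, a transfer that keeps failing waits longer and longer between attempts, which matches the observed drift from steady delivery to mostly idle waiting as the servers degrade.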
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1290340 - Posted: 2 Oct 2012, 8:19:06 UTC - in response to Message 1290311.  

[quote]Watching the performance of the servers: soon after "something" is done, the download/upload/report performance is acceptable. Gradually the delivery rate slows down and the retry rate increases, until "not a lot" is happening apart from retries, back-offs and more retries. Then something is done to the servers, and the whole cycle starts again. This suggests to me a memory leak of some sort in one of the server processes....[/quote]

One cause of that is database table fragmentation, which is one of the reasons for Tuesday maintenance and the improvements afterwards.
ID: 1290340 · Report as offensive
Sp@ceNv@der Project Donor
Joined: 10 Jul 05
Posts: 41
Credit: 117,366,167
RAC: 152
Belgium
Message 1290348 - Posted: 2 Oct 2012, 10:02:00 UTC - in response to Message 1290340.  

What is the explanation behind a "shorties storm"? They don't seem to originate from the same tapes, yet almost everything being sent out is a VHAR unit. Is this a server problem? I'm curious to read some info on this ;) Work flows slowly at best right now, which is better than nothing at all of course.

Kind regards.
To boldly crunch ...
ID: 1290348 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1290366 - Posted: 2 Oct 2012, 10:59:35 UTC - in response to Message 1290348.  

[quote]What is the explanation behind a "shorties storm"? They don't seem to originate from the same tapes, yet almost everything being sent out is a VHAR unit. Is this a server problem? I'm curious to read some info on this ;) Work flows slowly at best right now, which is better than nothing at all of course.

Kind regards.[/quote]

In simple terms: SETI gets its data for free, by taking its own copy of the data being recorded during the course of astronomical observations at the Arecibo radio telescope.

Different groups of radio astronomers are allocated observing time on the telescope, according to an observatory schedule which can be searched online if you're really interested. Each separate group of observers has control of the telescope during their assigned time slot, and controls its movement and observing patterns.

Some astronomers are interested in long, steady, deep-space observations of, near enough, point sources. The focal point of the telescope remains steady in relation to the sky - the recordings have a low 'angle range' between the beginning and end of the 109 seconds we study in each workunit. Those sessions create the 'VLAR' tasks when we get to crunch the recordings.

Other observing teams are more interested in fast surveys of large parts of the sky. They use the observatory's radio antenna in what is known as a 'basketweave' mode, with the telescope nodding from side to side while the earth turns under the sky. That leads to the high angle range tasks - we know them as 'shorties', because it's not worth doing such intense analysis when potential signal sources remain in the field of view for such a short time.

And in between, there are observations - or even recordings taken during telescope maintenance - where the antenna is not being actively steered at all, but simply receiving whatever happens to be coming from the sky patch directly overhead as the earth turns. That gives us the normal, mid-AR tasks which form our staple diet.
ID: 1290366 · Report as offensive
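
Richard's three categories amount to a classification on angle range. A minimal Python sketch follows; the function name `classify_task` and the two threshold defaults are assumptions for illustration, since the post gives no numeric cut-offs. Treat the defaults as placeholders to be checked against the project's own figures.

```python
def classify_task(angle_range, vlar_max=0.12, vhar_min=1.127):
    """Label a workunit by how far the telescope moved during the recording.

    The two thresholds are placeholder assumptions, not values from this thread.
    """
    if angle_range < 0:
        raise ValueError("angle range cannot be negative")
    if angle_range <= vlar_max:
        return "VLAR"    # telescope tracking a point source: very low angle range
    if angle_range >= vhar_min:
        return "VHAR"    # fast sky survey: a 'shorty'
    return "mid-AR"      # antenna not actively steered: sky drifts past at Earth's rotation rate


if __name__ == "__main__":
    for ar in (0.01, 0.42, 2.5):
        print(ar, classify_task(ar))
```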
Sp@ceNv@der Project Donor
Joined: 10 Jul 05
Posts: 41
Credit: 117,366,167
RAC: 152
Belgium
Message 1290376 - Posted: 2 Oct 2012, 11:37:01 UTC - in response to Message 1290366.  


[quote]Different groups of radio astronomers are allocated observing time on the telescope, according to an observatory schedule which can be searched online if you're really interested. Each separate group of observers has control of the telescope during their assigned time slot, and controls its movement and observing patterns.[/quote]


Thanks for the information Richard. I'd love to learn some more, so if you can give one or more useful links, be my guest. I'll see what I can find on my own using this information already,

Kind regards ;)

To boldly crunch ...
ID: 1290376 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1290379 - Posted: 2 Oct 2012, 11:51:12 UTC - in response to Message 1290376.  


[quote][quote]Different groups of radio astronomers are allocated observing time on the telescope, according to an observatory schedule which can be searched online if you're really interested. Each separate group of observers has control of the telescope during their assigned time slot, and controls its movement and observing patterns.[/quote]

Thanks for the information Richard. I'd love to learn some more, so if you can give one or more useful links, be my guest. I'll see what I can find on my own using this information already,

Kind regards ;)[/quote]

A good place to rummage is the Arecibo Observatory Telescope Schedule.
ID: 1290379 · Report as offensive
Sp@ceNv@der Project Donor
Joined: 10 Jul 05
Posts: 41
Credit: 117,366,167
RAC: 152
Belgium
Message 1290388 - Posted: 2 Oct 2012, 12:30:38 UTC - in response to Message 1290379.  


[quote]A good place to rummage is the Arecibo Observatory Telescope Schedule.[/quote]


Thanks again. I had already come across that link, but the information I saw there is out of my league, to be honest ;). But I'll nose around further and see where it leads me.

Regarding the servers of the SETI project: are we to assume, then, that the system (the software on the SETI project servers) lacks some kind of safety mechanism to prevent these problems? I mean, the more VHARs they send out, the more traffic they generate, because we crunchers chew right through them at very high speed and so ask for lots of new units in return, eventually choking traffic for everybody. It seems the system is unable to maintain some kind of balance between the 3 main types of units being sent out to the crunchers. That would of course mean it would have to be able to select data from different tapes, or better yet, from multiple tapes holding different types of recording (as you've specified earlier), to maintain a balanced mixture of units being sent out. If I understand it correctly, the data from the tapes that have been split and are being sent out right now are all 'basketweave' mode recordings, leading inevitably to the current problems (shorties storm).

Please feel free to comment whether I see things right or wrong ;)

Kind regards
To boldly crunch ...
ID: 1290388 · Report as offensive
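
The balancing idea raised above can be sketched as a round-robin draw across one queue per task type. This is purely an illustration of the proposal in Python, not a description of how the SETI@home servers select work; `balanced_batch` and the queue names are hypothetical.

```python
from collections import deque
from itertools import cycle


def balanced_batch(queues, batch_size):
    """Draw up to batch_size tasks, rotating across task types so none dominates."""
    batch = []
    order = cycle(list(queues))
    empty_streak = 0
    # Stop when the batch is full or every queue was found empty in a row.
    while len(batch) < batch_size and empty_streak < len(queues):
        kind = next(order)
        if queues[kind]:
            batch.append(queues[kind].popleft())
            empty_streak = 0
        else:
            empty_streak += 1
    return batch


if __name__ == "__main__":
    ready = {
        "VLAR": deque(["v1", "v2"]),
        "mid-AR": deque(["m1"]),
        "VHAR": deque(["s1", "s2", "s3", "s4"]),
    }
    print(balanced_batch(ready, 5))
```

Note the limitation the sketch makes visible: once the other queues are empty, only shorties remain to hand out, so balancing can only help when tapes of different observation types are available to split at the same time.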
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1290446 - Posted: 2 Oct 2012, 22:30:28 UTC - in response to Message 1290388.  


Well, we're back up after the weekly outage, but the Scheduler appears to be struggling already.
Most requests result in "Couldn't connect to server" messages & the uploads are pretty hit or miss at the moment as well.
Maybe Bruno & Synergy have got more than they can handle?
Grant
Darwin NT
ID: 1290446 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.