Message boards :
Number crunching :
Panic Mode On (77) Server Problems?
Author | Message |
---|---|
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
Wiggo, Well I have plenty of GPU work on hand, but obviously my Q6600 didn't have the 8 days' worth of CPU work on hand that my cache is set to. Each of my PCs is set to a different venue, so I just edit those preferences according to what work is required at the time. Both my other rigs have now been set back to accepting GPU again, and when the Q6600 stops requesting CPU I'll set it back to accepting GPU work. Cheers. |
ivan Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
Each of my PCs is set to a different venue, so I just edit those preferences according to what work is required at the time. Both my other rigs have now been set back to accepting GPU again, and when the Q6600 stops requesting CPU I'll set it back to accepting GPU work. I've recently realised that a useful work-around, given that we have three locales to play with (work, home, school), would be to set one to accept both CPU & GPU, one GPU only and one CPU only -- and then switch machines as needs dictate to the locale providing the desired downloads. |
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
scheduling server: synergy; scheduler process: synergy; ap_splitter1: synergy; ap_splitter4: synergy; ap_splitter5: synergy; ap_splitter6: synergy. Since the upload/download transfers seem to have improved with the cessation of the AP splitting, I wonder if there might be a connection to the above functions all being on synergy? |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Whether do I still experience it that in Berkeley the homework is made and there is a well-arranged binding to the network? What BOINC version are you running, and what cache settings are you using? Claggy |
.clair. Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69 |
Not seen this before, 37 tasks undownloadable with :- 01/10/2012 10:37:08 | | [error] Can't create HTTP response output file projects/setiathome.berkeley.edu/14se10ab.13157.275891.16.10.211 Ran out of work coz this lot got stuck, or something Me thinks just abort them and move on. Edit - we get a lot of this over on Cosmology@home Stderr output <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> WU download error: couldn't get input files: <file_xfer_error> <file_name>21ap11ah.12802.10039.7.10.248</file_name> <error_code>-197</error_code> <error_message>user requested transfer abort</error_message> </file_xfer_error> </message> ]]> |
S@NL Etienne Dokkum Joined: 11 Jun 99 Posts: 212 Credit: 43,822,095 RAC: 0 |
Well, there seems to be work handed out again... got 300+ GPU shorties this morning which are - of course - all running high priority now. Now got 100 or so of them stuck downloading. So, new approach: "No new work". Maybe tomorrow's outage will sort stuff out. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
well, there seems to be work handed out again... got 300+ GPU shorties this morning which are - of course - all running high priority now. I currently have 939 CUDA tasks cached and another 622 downloading. |
musicplayer Joined: 17 May 10 Posts: 2430 Credit: 926,046 RAC: 0 |
May I ask the following question? I guess we went through a "shorties" storm once again. In these tasks, the Gaussian search was not carried out. The same of course goes for the .vlar's. But then, if some of the numbers (including pulses and possible triplets) from these tasks showed up better, are we back to the task of finding the better results once again by means of carrying out the additional Gaussian search on these tasks? I assume this is the only way this can be carried out. Or is there something else that can be tried as well for the selected tasks? |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Not seen this before, 37 tasks undownloadable with :- That's a file system error, the file couldn't be opened for writing some data which had been received. The "Can't create" is poor wording, the same message is shown if an existing partial file can't be opened to append new data. I've never seen that, and don't remember any previous posts mentioning it. Seems like the kind of thing which might happen if a virus scanner is allowed to check the BOINC directory hierarchy, though. Joe |
.clair. Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69 |
Thanks for that Joe, I dumped them and moved on. Edit - [short version] don't bother with an antivirus on that crunch box. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
Still struggling to build up a cache of CPU work - overnight many of the requests for work resulted in "Project has no tasks available" messages. Over the last few days I would get that message on probably 1 in 5 requests. For the last 8 hours or so it's more like 4 in 5 requests resulting in "Project has no tasks available" messages. Looks like something else has now gotten tangled up. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
Still no joy getting work - most requests result in "Project has no tasks available" or "No tasks sent" messages. Even if I only got 6-8 WUs with each request my caches would be full by now, even with all the shorties. But with the vast majority of requests resulting in no work I'm as close as I ever was to running out of CPU work again. Grant Darwin NT |
rob smith Joined: 7 Mar 03 Posts: 22158 Credit: 416,307,556 RAC: 380 |
watching the performance of the servers - soon after "something" is done the download/upload/report performance is acceptable. Gradually the delivery rate slows down, and the re-try rate increases, until "not a lot" is happening apart from retries, back-offs and more retries. Then something is done to the servers, and the whole cycle starts again. This suggests to me a memory bleed of some sort in one of the server processes.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
watching the performance of the servers - soon after "something" is done the download/upload/report performance is acceptable. Gradually the delivery rate slows down, and the re-try rate increases, until "not a lot" is happening apart from retries, back-offs and more retries. Then something is done to the servers, and the whole cycle starts again. This suggests to me a memory bleed of some sort in one of the server processes.... One cause of that is database table fragmentation, which is one of the reasons for Tuesday maintenance and the improvements afterwards. |
Sp@ceNv@der Joined: 10 Jul 05 Posts: 41 Credit: 117,366,167 RAC: 152 |
What is the explanation behind a "shorties storm"? They don't seem to originate from the same tapes? Yet almost everything being sent out is VHAR units. Is this a server problem? I'm curious to read some info on this ;) Work flows slowly at best right now, better than nothing at all of course. Kind regards. To boldly crunch ... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
What is the explanation behind a "shorties storm"? They don't seem to originate from the same tapes? Yet almost everything being sent out is VHAR units. Is this a server problem? I'm curious to read some info on this ;) Work flows slowly at best right now, better than nothing at all of course. In simple terms: SETI gets its data for free, by taking its own copy of the data being recorded during the course of astronomical observations at the Arecibo radio telescope. Different groups of radio astronomers are allocated observing time on the telescope, according to an observatory schedule which can be searched online if you're really interested. Each separate group of observers has control of the telescope during their assigned time slot, and controls its movement and observing patterns. Some astronomers are interested in long, steady, deep-space observations of, near enough, point sources. The focal point of the telescope remains steady in relation to the sky - the recordings have a low 'angle range' between the beginning and end of the 109 seconds we study in each workunit. Those sessions create the 'VLAR' tasks when we get to crunch the recordings. Other observing teams are more interested in fast surveys of large parts of the sky. They use the observatory's radio antenna in what is known as a 'basketweave' mode, with the telescope nodding from side to side while the earth turns under the sky. That leads to the high angle range tasks - we know them as 'shorties', because it's not worth doing such intense analysis when potential signal sources remain in the field of view for such a short time. And in between, there are observations - or even recordings taken during telescope maintenance - where the antenna is not being actively steered at all, but simply receiving whatever happens to be coming from the sky patch directly overhead as the earth turns. That gives us the normal, mid-AR tasks which form our staple diet. |
Sp@ceNv@der Joined: 10 Jul 05 Posts: 41 Credit: 117,366,167 RAC: 152 |
Thanks for the information Richard. I'd love to learn some more, so if you can give one or more useful links, be my guest. I'll see what I can find on my own using this information already, Kind regards ;) To boldly crunch ... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
A good place to rummage is the Arecibo Observatory Telescope Schedule. |
Sp@ceNv@der Joined: 10 Jul 05 Posts: 41 Credit: 117,366,167 RAC: 152 |
Thanks again. I had already come across that link, but the information I did see on there is out of my league to be honest ;). But I'll nose around further and see where it may lead me. Regarding the servers of the SETI project: are we to assume then that the system (software of the SETI project servers) "lacks" some kind of safety mechanism preventing these problems? I mean, the more VHARs they send out, the more traffic they generate, because "us crunchers" chew right through them at very high speeds, hence asking for lots of new units in return, thus choking up traffic eventually for everybody. It seems the system is unable to maintain some kind of balance between the 3 main types of units being sent out to the crunchers: that of course would mean it would have to be able to select data from different tapes, or better yet, multiple tapes holding a different type of recording (as you've specified earlier), to maintain a balanced mixture of units being sent out. If I understand it correctly, right now the data from the different tapes that have been split and are being sent out are all 'basketweave' mode recordings, leading inevitably to the current problems (shorties storm). Please feel free to comment whether I see things right or wrong ;) Kind regards To boldly crunch ... |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
Well, we're back up after the weekly outage, but the Scheduler appears to be struggling already. Most requests result in "Couldn't connect to server" messages & the uploads are pretty hit or miss at the moment as well. Maybe Bruno & Synergy have got more than they can handle? Grant Darwin NT |
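
Richard's explanation earlier in the thread sorts workunits into three groups by angle range: VLAR (telescope held steady on a point source), mid-AR (antenna not actively steered) and VHAR "shorties" (fast 'basketweave' surveys). A minimal Python sketch of that bucketing, and of checking how lopsided a batch is during a shorties storm, might look like the following. The two cut-off values and all function names are assumptions made up for illustration; the thread gives no numeric thresholds, and this is not the project's actual splitter or scheduler code.

```python
from collections import Counter
from typing import Iterable

# Hypothetical cut-offs, for illustration only (not taken from the thread).
VLAR_MAX_AR = 0.12   # below this: very low angle range (telescope tracking)
VHAR_MIN_AR = 1.127  # above this: very high angle range ("shorties")


def classify_angle_range(angle_range: float) -> str:
    """Put one workunit into the VLAR / mid-AR / VHAR bucket."""
    if angle_range < 0:
        raise ValueError("angle range cannot be negative")
    if angle_range < VLAR_MAX_AR:
        return "VLAR"
    if angle_range > VHAR_MIN_AR:
        return "VHAR"
    return "mid-AR"


def task_mix(angle_ranges: Iterable[float]) -> Counter:
    """Count how many workunits in a batch fall into each bucket."""
    return Counter(classify_angle_range(ar) for ar in angle_ranges)


if __name__ == "__main__":
    # A made-up batch dominated by high angle range tasks.
    batch = [0.01, 0.42, 2.5, 3.0]
    for kind, count in sorted(task_mix(batch).items()):
        print(f"{kind}: {count}")
```

With the made-up batch above, `task_mix` counts one VLAR, one mid-AR and two VHAR tasks; a batch where nearly every task lands in the VHAR bucket is what the thread calls a shorties storm.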
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.