Panic Mode On (95) Server Problems?
Author | Message |
---|---|
Wiggo Send message Joined: 24 Jan 00 Posts: 36809 Credit: 261,360,520 RAC: 489 |
Oh well, my main rig's GPU's are now on their backup work and the 2nd rig's GPU's have about 8hrs work left before it may have to do the same. :-( Cheers. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
@Grant, I was referring to the fact that there were over 3 million results waiting to be deleted; there was no backlog and the servers weren't running behind. I think the purging totals are higher at the moment because there are/were shorter work units going through. As I write I looked at the graphs at the time you posted, and there were no WUs at all waiting to be deleted- none. Hence my question. The only thing that came close to the 3 million you mentioned was the number of results (WUs) that were in progress at the time. And they are what they sound like- all the WUs people have downloaded, the system waiting for results to be returned. They aren't there waiting for deletion. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
Oh well, my main rig's GPU's are now on their backup work and the 2nd rig's GPU's have about 8hrs work left before it may have to do the same. :-( Got a couple of hours of GPU work left myself. Add to the present Scheduler woes & frozen server status, the forums are very slow to load at present. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
In the last 30min or so both of my machines have been able to report the work they had; it took over a minute for the Scheduler to finally respond & accept them. However requests for work, even when the Scheduler does respond, result in "Project has no tasks available" messages. Another half hour & 1 system will be out of GPU work, another half hour or so after that & the other system will be out of GPU work as well. Luckily the CPUs have quite a few VLARs lined up, so they should keep going for most of the night. Grant Darwin NT |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Well there's two things I wanted to point out as interesting: 1) "MB results/WUs waiting for DB purge" did not drop after the maintenance. 2) A little after midnight (Berkeley time (UTC-8)), there is the fairly-predictable slight surge on the cricket graph that only lasts for a short period of time. In addition to that, I have noticed that the RTS buffer has dropped a bit. So that means there ARE, in fact, WUs being assigned, but something is very, very significantly hindering DB queries. Methinks that is why most people get scheduler time-outs, or long delays before a reply, and the successful replies result in "no tasks available," because of A) whatever is impeding the DB, also impedes the feeder's ability to run its query to get a new batch of WUs to assign, which leads to.. B) what little there was in the feeder (even if it was a full helping of 200 (? is that what it still is these days?)) when the successful scheduler contact went through, had already been assigned. Hopefully whatever query/operation is being done on the DB finishes up before long. I wonder if it's another massive ~16 TB backup operation? Last week, that operation didn't seem to hinder performance much, if at all, but it was going out over the network and limited by either the gigabit link itself, or the ability to write the data at the destination. Maybe this week, the same thing is being done, but on a local volume, so it can read/write much faster because I'm sure it is at least (in one way or another) a 5-disk stripe. Oh well, I guess we'll have to see how this plays out. btw.. if you couldn't tell by now, I like trying to sort-out logic puzzles with minimal information. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I have a feeling they are doing a database merge or manipulation of data on the master AB db causing the slowdown. As long as they still have the original WU results somewhere they will be able to rebuild everything from there once they sort the data out. My old computer grabbed 3 WUs 3 hours ago so yes some work is making it out. The buffer has dropped at least 5,000 plus the resends so there is a little trickle happening. But likely those are ending up in the bottom of someone's 10 day cache :( I imagine the 0.3-0.6 creation rate is just the resends that are being generated from timeouts. |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 31012 Credit: 53,134,872 RAC: 32 |
|
OTS Send message Joined: 6 Jan 08 Posts: 371 Credit: 20,533,537 RAC: 0 |
Cricket back to 120 Mbps - hope in Mudville - at least for MBs? |
KWSN THE Holy Hand Grenade! Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
I'm not getting scheduler time-outs, but I'm also not getting any WU's assigned to any of my seven computers! (but only on production: Beta's fine...) (yes, one computer tried at least three times...) . Hello, from Albany, CA!... |
rob smith Send message Joined: 7 Mar 03 Posts: 22535 Credit: 416,307,556 RAC: 380 |
All the hundreds of tasks sitting around to report have reported, now just waiting for some nice shiny new tasks to crunch.... (And as has already been reported the Crickets have come back to life) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
Both my systems now have some GPU work. Most requests for work result in "Project has no tasks available" messages. Given the length of the outage & the lack of any AP work I think that will be the case for a fair while yet as the feeder struggles to meet the demand. Grant Darwin NT |
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
Both my systems now have some GPU work. Well, as of 2042 UTC, all my machines have their full quota of MB tasks, except for my Celeron J1900 which I switched from Linux to Windows 10 on Monday. It's not done enough GPU jobs yet to have its performance quantified (currently its GPU is running at 70 or 80% of a single CPU core, but the CPU is reportedly running at 2.4 GHz, for a 2 GHz part!). |
David S Send message Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
When I saw that the crickets jumped up to ~400 again, I was hoping AP was running. No such luck, I see. Oh well. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
When I saw that the crickets jumped up to ~400 again, I was hoping AP was running. No such luck, I see. Oh well. Oh, meow. When Eric does get things sorted and AP starts to validate again, the kitties are gonna have a fine day....LOL. "Time is simply the mechanism that keeps everything from happening all at once." |
JaundicedEye Send message Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
Oh, meow. Have you counted your kitties today? There was some loose talk at Rockie's Cafe yesterday concerning a 'catserole'........ :Dg "Sour Grapes make a bitter Whine." <(0)> |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
So....Why are VLARS being sent to my GPUs? I see them on All 3 of my machines. Beware, the VLARS are coming! |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Yup, got a whole bunch of those as well. Haven't tried processing any of them yet. Will have to wait and see what they do. Edit.. yup, they are acting just like they did on Beta... Going to take a longer time to process each of those. Looks like the majority are coming from 28no12ab and 28no12ad; there are a couple from some others. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
So....Why are VLARS being sent to my GPUs? Noticed that myself. Will be interesting to see how my GTX 750Tis handle them, as they do better at processing longer running WUs than they do the shorties. EDIT- I also noticed some interesting recurring spikes in the network traffic. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
.. Will be interesting to see how my GTX 750Tis handle them, as they do better at processing longer running WUs than they do the shorties. Ok, I got bored. Suspended all the other tasks to get the VLARs running. End result- longer run times than the estimates- 33min estimated, actual run times around 40 min. Also slows down processing of other GPU WUs if they are running while a VLAR is being done (I've got 2 * GTX 750Tis & run 2 WUs at a time.) However no effect on other programmes, no screen lag, display stuttering etc. Interestingly, GPU load is increased. On my Win7 system, GPU load is generally around mid to high 80%. With the VLARs it hits 95-99% often, but it is very bursty- without the VLARs with the lower GPU utilisation it does vary, but not as much, or by nearly as large a range as with the VLARs. Also the drop in GPU utilisation corresponds with large jumps in Memory Controller load. On my Vista system, GPU load is usually in the high 90s, so the slowdown caused by the VLARs is more pronounced. Also whereas on this system the variation in GPU load is usually very slight, the VLARs cause similar drops in GPU utilisation as on the Win7 system although the overall GPU utilisation remains higher. These GPU utilisation drops also coincide with Memory Controller load increases. EDIT- I spoke too soon. 33min estimated, more like 60min actual run time. Usually with the longer running WUs a 30min estimated time will be done in 25min or less. Grant Darwin NT |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
Oh, meow. LOL....Yes, all kitties are accounted for. I saw the pic and banter... "Time is simply the mechanism that keeps everything from happening all at once." |
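
Cosmic_Ocean's reasoning earlier in the thread (a slow database starves the feeder, so even scheduler contacts that get through come back with "no tasks available") can be sketched as a toy simulation. This is only an illustration, not the actual BOINC feeder or scheduler code: the function name, the tick model, and all the rates are made up for the example, and the 200-slot figure is the one quoted with a question mark in that post.

```python
FEEDER_SLOTS = 200  # assumed size; the thread itself is unsure of this number


def simulate(refill_per_tick, requests_per_tick, tasks_per_request, ticks):
    """Toy model: return (served, empty) counts of scheduler replies.

    refill_per_tick stands in for how many WUs the feeder's DB query
    can fetch per tick; a struggling database means a small value.
    """
    slots = FEEDER_SLOTS  # the feeder starts with a full helping
    served = 0
    empty = 0
    for _ in range(ticks):
        # The feeder tops up its slots, capped at its fixed size.
        slots = min(FEEDER_SLOTS, slots + refill_per_tick)
        for _ in range(requests_per_tick):
            if slots == 0:
                empty += 1  # host sees "Project has no tasks available"
            else:
                slots -= min(tasks_per_request, slots)
                served += 1
    return served, empty


if __name__ == "__main__":
    # Healthy DB: the feeder is refilled faster than hosts drain it.
    print(simulate(refill_per_tick=200, requests_per_tick=10,
                   tasks_per_request=10, ticks=5))
    # Impeded DB: only a trickle reaches the feeder each tick.
    print(simulate(refill_per_tick=5, requests_per_tick=10,
                   tasks_per_request=10, ticks=5))
```

With the made-up numbers above, the healthy case serves all 50 requests, while the impeded case serves 23 and turns 27 away empty once the initial 200 slots are gone, which matches the pattern people were reporting: a trickle of work going out while most requests get nothing.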
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.