Panic Mode On (109) Server Problems?

Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1913739 - Posted: 18 Jan 2018, 16:00:29 UTC - in response to Message 1913730.  

Caches are full and RAC is steadily dropping...


Indeed... the lack of work accelerated the fall to where it was going to end up anyway; like ripping off the proverbial bandage.
ID: 1913739
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11358
Credit: 29,581,041
RAC: 66
United States
Message 1913743 - Posted: 18 Jan 2018, 16:30:25 UTC

Those who use Einstein as a backup project should take note and have a sufficient cache next Tuesday or they may very well run out of work.
Einstein has decided to join the fun by stating
We are going to shut down the project next Tuesday, Jan 23rd at around 10 AM CET for an upgrade of our database backend systems to make them ready for the years to come. We're going to upgrade hardware parts, operating systems as well the databases themselves, which is why we need to shut down the entire project, including the BOINC backend and this very website.

We should have the pleasure of a double outrage.
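For anyone topping up their cache ahead of the outage: the amount of work BOINC keeps on hand is set by the work-buffer preferences. Below is a minimal sketch of a global_prefs_override.xml (in the BOINC data directory) holding roughly three days of work; the values are only illustrative, and the same settings can be changed in the Manager's computing preferences instead:

    <global_preferences>
        <!-- "Store at least this many days of work" -->
        <work_buf_min_days>2.0</work_buf_min_days>
        <!-- "Store up to an additional N days of work" -->
        <work_buf_additional_days>1.0</work_buf_additional_days>
    </global_preferences>

The client picks the file up after a re-read of the local prefs file (or a restart); the exact menu entry varies by Manager version.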
ID: 1913743
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1913778 - Posted: 18 Jan 2018, 18:42:44 UTC - in response to Message 1913743.  

Those who use Einstein as a backup project should take note and have a sufficient cache next Tuesday or they may very well run out of work.
Einstein has decided to join the fun by stating
We are going to shut down the project next Tuesday, Jan 23rd at around 10 AM CET for an upgrade of our database backend systems to make them ready for the years to come. We're going to upgrade hardware parts, operating systems as well the databases themselves, which is why we need to shut down the entire project, including the BOINC backend and this very website.

We should have the pleasure of a double outrage.

Yeah, it is a shame about the scheduling. Any other day would have sufficed ...
ID: 1913778
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913807 - Posted: 18 Jan 2018, 20:58:17 UTC - in response to Message 1913778.  

Can always choose another backup project. I'll have MilkyWay and GPUGrid.net as backups also. Though if I build a big enough cache of Einstein work, that shouldn't be an issue either.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1913807
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11358
Credit: 29,581,041
RAC: 66
United States
Message 1913813 - Posted: 18 Jan 2018, 21:21:18 UTC - in response to Message 1913778.  

Yeah, it is a shame about the scheduling. Any other day would have sufficed ...

But a double outrage is something to behold.
ID: 1913813
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1913833 - Posted: 18 Jan 2018, 23:01:11 UTC - in response to Message 1913807.  

Can always choose another backup project. I'll have MilkyWay and GPUGrid.net as backups also. Though if I build a big enough cache of Einstein work, that shouldn't be an issue either.
Last time I crossed swords with MilkyWay, it was appallingly badly managed. And GPUGrid is giving me RSI in the mouse-click finger, because I run it but have extreme difficulty snagging new work.
ID: 1913833
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913844 - Posted: 19 Jan 2018, 0:05:46 UTC - in response to Message 1913833.  

Can always choose another backup project. I'll have MilkyWay and GPUGrid.net as backups also. Though if I build a big enough cache of Einstein work, that shouldn't be an issue either.
Last time I crossed swords with MilkyWay, it was appallingly badly managed. And GPUGrid is giving me RSI in the mouse-click finger, because I run it but have extreme difficulty snagging new work.

MilkyWay?? Badly managed?? Wow, very different experience here. MW is the most set and forget project I have run. I never have to micromanage it at all. I love the hard limit of 80 tasks per GPU at any one time: never a chance of getting too much work and never any chance of running out. I only crunch GPU tasks, so that means I ran the Binary Pulsar Search while it lasted and now run the Gamma Ray Pulsar Search. The only issue I have seen with the project is the occasional bad work unit, which gets tossed out very quickly. The servers seem to stay up for very long times, months at a time in fact.

Yes, I have just recently joined GPUGrid.net, and the GPU work availability is very spotty and random. The tasks, when they are made available, are quickly gobbled up by many fingers bashing the update button. CPU work for Linux hosts has been available pretty much all the time, so there is no problem getting CPU work for the Linux host. We are asking the project scientists to make the CPU work available for Windows too.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1913844
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1913850 - Posted: 19 Jan 2018, 0:27:43 UTC - in response to Message 1913844.  

Well, I found it necessary to make Post 58550. Read through to Post 58572, and note his titles.
ID: 1913850
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913852 - Posted: 19 Jan 2018, 0:40:27 UTC - in response to Message 1913850.  
Last modified: 19 Jan 2018, 0:46:25 UTC

Well, I found it necessary to make Post 58550. Read through to Post 58572, and note his titles.

Well, even Project Scientists and Developers are human. Witness our own Eric K. and the recent spate of typos. I do remember them (MW) having issues initially with the n-body mt application, but they evidently sorted it out, and I didn't follow any of the threads since, as I stated, I don't do MW CPU work. I haven't seen many posts about n-body issues other than host configuration questions.

And the mt documentation must be mostly stable and well understood by now, as the mt CPU app deployed at GPUGrid just this month by a student had a relatively easy startup. It was nice to find it obeys the app_config core-usage setting, so it doesn't hog all cores on my Ryzen 1800X. I am using 4 cores to process those CPU tasks, leaving the other cores for SETI CPU tasks.
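For anyone wanting to do the same, here is a minimal sketch of an app_config.xml for a multithreaded CPU app, placed in the project's directory under the BOINC data folder. The app name below is a placeholder, not GPUGrid's actual one (take the real name from the project's entry in client_state.xml), and whether the app accepts a thread-count argument is project-specific:

    <app_config>
        <app_version>
            <app_name>example_mt_app</app_name>   <!-- placeholder; substitute the project's real app name -->
            <plan_class>mt</plan_class>
            <avg_ncpus>4</avg_ncpus>              <!-- tell the scheduler each task uses 4 cores -->
            <cmdline>--nthreads 4</cmdline>       <!-- only if the app takes a thread-count option -->
        </app_version>
    </app_config>

The client reads it after a re-read of the config files from the Manager (or a client restart).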
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1913852
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1913859 - Posted: 19 Jan 2018, 1:09:44 UTC

MW is fine, but IIRC to do something productive there you need to have an AMD GPU; NV cards have trouble with the DP calculations used by MW.

Has that changed?

GPUGrid, with its long WU crunch times, is not really a project to use as a backup, IMHO.
ID: 1913859
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1913874 - Posted: 19 Jan 2018, 1:39:21 UTC - in response to Message 1913844.  

MW is the most set and forget project I have run. I never have to micromanage it at all.
I only keep a backup available on one of my crunch-only machines, just to make sure it maintains a little heat in the bedroom on chilly nights when SaH runs out of work. My first choice is Asteroids, but they're often out of work, too, so I added MilkyWay as a backup to the backup. The last time it ran on Windows was about 3 years ago. It worked fine. But that machine is now Linux, and when MilkyWay kicked in one night a couple months ago, it turned out to be a colossal waste of time. I don't remember how many tasks it ran, but when I checked the results the next day, I found that all but one of them had been marked Invalid. I think they all ran to completion without throwing any errors, but it was all just wasted electricity (except for the little bit of extra heat). I never did try to figure out what might have happened, just turned off MilkyWay and added Einstein for the next time that both SaH and Asteroids ran out.
ID: 1913874
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1913877 - Posted: 19 Jan 2018, 1:45:56 UTC - in response to Message 1913859.  
Last modified: 19 Jan 2018, 1:52:26 UTC

MW is fine, but IIRC to do something productive there you need to have an AMD GPU; NV cards have trouble with the DP calculations used by MW.

Has that changed?

GPUGrid, with its long WU crunch times, is not really a project to use as a backup, IMHO.

Milkyway uses Double Precision calculations.
Most GeForce GPUs are limited to DP performance of 1/32 of SP.
Radeon GPUs are limited to DP performance of 1/16 of SP.

So if both GPUs were rated at 6000 GFLOPS in Single Precision, the GeForce would be about 188 GFLOPS DP and the Radeon 375 GFLOPS DP.

If you move to the workstation GPUs, they can have DP performance of up to 1/2 of SP, which is likely why they have 4-digit price tags.
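A quick back-of-the-envelope check of those numbers; the 6000 GFLOPS card is the hypothetical one from the post above, and the ratios are the usual consumer/workstation caps:

    # Rough DP throughput from an SP rating and a DP:SP ratio (illustrative only).
    def dp_gflops(sp_gflops: float, dp_ratio: float) -> float:
        """Peak double-precision GFLOPS given the single-precision rating."""
        return sp_gflops * dp_ratio

    sp = 6000.0  # hypothetical card rated at 6000 GFLOPS single precision
    print(f"GeForce (1/32 of SP):    {dp_gflops(sp, 1/32):.0f} GFLOPS DP")  # ~188
    print(f"Radeon (1/16 of SP):     {dp_gflops(sp, 1/16):.0f} GFLOPS DP")  # 375
    print(f"Workstation (1/2 of SP): {dp_gflops(sp, 1/2):.0f} GFLOPS DP")   # 3000

For MilkyWay, which relies on those DP calculations, the ratio matters far more than the headline SP number.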
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1913877
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913883 - Posted: 19 Jan 2018, 1:59:59 UTC - in response to Message 1913859.  

MW is fine, but IIRC to do something productive there you need to have an AMD GPU; NV cards have trouble with the DP calculations used by MW.

Has that changed?

GPUGrid, with its long WU crunch times, is not really a project to use as a backup, IMHO.

No, MW has no issue with Nvidia as long as the card can do double precision. Probably any card newer than Kepler, or at least avoiding the lowest-end model of any family. Nvidia doesn't have the same degree of double-precision performance as ATI/AMD, but the cards still work fine. I do a Gamma Ray Binary Pulsar task in 190 seconds and get awarded 227 credits for it. The credit is static for all task types.

The longest GPUGrid GPU task I've run so far took 8 hours and was awarded 387,150 credits. The shortest task was 3 hours. The longest CPU task was 1 hour and the shortest 20 minutes.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1913883
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1913914 - Posted: 19 Jan 2018, 4:03:02 UTC

In-progress back up to around 5 million, Received-last-hour back over 120k.
WU-awaiting-deletion climbing, splitter output dropped down to a lower level again. Will clearing awaiting-deletion fire up the splitters again?
We'll just have to wait and see!
Grant
Darwin NT
ID: 1913914
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913916 - Posted: 19 Jan 2018, 4:11:09 UTC - in response to Message 1913914.  

In-progress back up to around 5 million, Received-last-hour back over 120k.
WU-awaiting-deletion climbing, splitter output dropped down to a lower level again. Will clearing awaiting-deletion fire up the splitters again?
We'll just have to wait and see!

NEWS at 10!
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1913916
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1913943 - Posted: 19 Jan 2018, 6:17:52 UTC

Around 4 min 55 sec on my GTX 1070s for the current BLC_02s, now that we have some BLC_02s that aren't VLARs. About 50 sec quicker to crunch, but they do cause some noticeable system/display lag.
Grant
Darwin NT
ID: 1913943
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1913965 - Posted: 19 Jan 2018, 9:07:18 UTC

And there we go.
Awaiting-deletion backlog cleared, splitters crank out the WUs.
Grant
Darwin NT
ID: 1913965
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913967 - Posted: 19 Jan 2018, 9:12:11 UTC - in response to Message 1913965.  
Last modified: 19 Jan 2018, 9:12:26 UTC

I'd say that is pretty convincing evidence that the two are directly linked. If you overlay the graphs, they are coincident.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1913967
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1913974 - Posted: 19 Jan 2018, 9:46:21 UTC - in response to Message 1913967.  
Last modified: 19 Jan 2018, 9:46:49 UTC

I'd say that is pretty convincing evidence that the two are directly linked. If you overlay the graphs, they are coincident.

Correlation isn't causation, but yeah, when returned-per-hour hits its present highs and work-in-progress gets right up there, the deleters & splitters certainly aren't able to both run at 100% at the same time; the splitters crank out the work, the deleter backlog grows. It gets to a certain point & the splitters slow down and stay there till the delete backlog clears. And it's continued to occur after the weekly outage.
It's choking on its own I/O.
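One crude way to picture that: treat the splitters and deleters as sharing a fixed pool of disk I/O, with the splitters throttled once the awaiting-deletion backlog passes some threshold. This is purely an illustrative toy model of the behaviour described above, with made-up numbers, not anything from the actual server code:

    # Toy model: splitters and deleters compete for one fixed I/O budget.
    # Once the awaiting-deletion backlog passes a threshold, the deleters get
    # priority and splitter output drops until the backlog clears again.
    IO_BUDGET = 100          # arbitrary units of I/O per time step
    DELETE_THRESHOLD = 500   # backlog size that triggers throttling
    RETURN_RATE = 60         # results returned per step (each adds to the backlog)

    backlog = 0
    for step in range(40):
        if backlog > DELETE_THRESHOLD:
            deleter_io = 80                  # deleters hog the I/O
        else:
            deleter_io = 40                  # splitters run flat out
        splitter_io = IO_BUDGET - deleter_io
        backlog = max(0, backlog + RETURN_RATE - deleter_io)
        print(f"step {step:2d}  splitter I/O {splitter_io:3d}  deletion backlog {backlog:4d}")

Run it and the splitter share see-saws much like the pattern described above: flat out while the backlog builds, throttled while it drains.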
Grant
Darwin NT
ID: 1913974
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1914035 - Posted: 19 Jan 2018, 17:50:14 UTC - in response to Message 1913974.  

I'd say the administrators need to shorten the cron-job interval on the deleters' purge task so that we can maintain a higher average RTS buffer quantity. Or, if the purge is threshold-based, lower the threshold.
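For illustration only, since the actual server-side setup isn't visible from out here: if the purge really is driven by cron, shortening the interval is a one-line change. The script name and path below are invented, not SETI@home's real ones:

    # hypothetical crontab entries; the script path is purely illustrative
    # current: run the purge pass once an hour, on the hour
    # 0 * * * *     /home/boincadm/projects/sah/bin/purge_workunits.sh
    # proposed: run it every 15 minutes instead
    */15 * * * *    /home/boincadm/projects/sah/bin/purge_workunits.sh

If it's threshold-based instead, the equivalent change would be lowering that threshold in the project configuration.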
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1914035