The Server Issues / Outages Thread - Panic Mode On! (118)

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2028638 - Posted: 20 Jan 2020, 13:26:20 UTC - in response to Message 2028636.  

It could be my "Seti Toaster"

To win a SETI Toaster you need to crunch at least 1 Billion credits. LOL


. . I don't think my current toaster will last that long :(

Stephen

:)
ID: 2028638 · Report as offensive
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2028642 - Posted: 20 Jan 2020, 13:49:00 UTC

Others hit their record RACs in backup projects, but I have no backups and my S@H RAC has now climbed to its all-time high!

After the extra long Tuesday outage last week I seem to have received consistently higher than normal credit from the tasks I have crunched. And also because the server is behind and trying to catch up, I have received credit for more than 24 hours of work per day.
ID: 2028642 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2028648 - Posted: 20 Jan 2020, 15:34:52 UTC

We are approaching the "magic" number of 23 million WUs.

Hope all gets back to normal soon.
ID: 2028648 · Report as offensive
Profile Retvari Zoltan

Send message
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2028652 - Posted: 20 Jan 2020, 16:46:26 UTC - in response to Message 2028630.  
Last modified: 20 Jan 2020, 16:53:32 UTC

So E@H doesn't generate as much "heat"?
GPU projects / apps (I am running) in the order of heat generation:
(i3-4160 3.6GHz, 2x4GB DDR3 1333MHz, RTX2080Ti PCIe3.0x16 RAM@13600MHz)
1.       GPUGrid / Acemd3 (cuda10)              GPU@1700MHz 331W
2.     SETI@home / GPU special app (cuda10.2)   GPU@1875MHz 325W
3. Einstein@home / O2MDF 2.07 GW-OpenCL-NVidia  GPU@1875MHz 295W
4. Einstein@home / FGRPB1G 1.20 OpenCL-NVidia   GPU@1875MHz 293W
The power consumption shown is the peak average power consumption while the task is running.
The long-term heat output is best for GPUGrid, as it runs for 1h41m without significant change in power consumption.
The SETI@home special app has lower long-term heat output, as it frequently drops to 95W during the workunit change (this can be fixed by using the mutex build).
Einstein@home also has a long-term heat output lower than 295W, as it drops to ~130W at 99% with ~250W spikes.
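
To put a rough number on "long-term" output, the average draw can be estimated as a time-weighted mean of the running level and the dip level. This is only a sketch: the 325W and 95W levels are the SETI@home figures quoted above, but the 120 s task time and 5 s changeover time are made-up placeholders, not measurements.

```python
def average_power(phases):
    """Time-weighted mean power in watts over (watts, seconds) phases."""
    total_time = sum(seconds for _, seconds in phases)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    energy = sum(watts * seconds for watts, seconds in phases)  # joules
    return energy / total_time


# 325 W while crunching and 95 W during the workunit change are from the post;
# the 120 s and 5 s durations are HYPOTHETICAL placeholders.
seti_special = [(325.0, 120.0), (95.0, 5.0)]
print(f"{average_power(seti_special):.1f} W average")
```

Swapping in real task and changeover times from a run would show how much the dips actually cost compared to a steady load like GPUGrid.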
ID: 2028652 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028655 - Posted: 20 Jan 2020, 17:14:47 UTC
Last modified: 20 Jan 2020, 17:16:06 UTC

all data distribution stats on the SSP have been stale for over an hour now. but looking at my systems, it seems like I'm actually getting a consistent stream of work now.

come on forum phantom, work your magic!
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028655 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028657 - Posted: 20 Jan 2020, 17:19:33 UTC - in response to Message 2028630.  

So E@H doesn't generate as much "heat"?

Einstein@home definitely uses less power and generates less heat, at least when compared to the Linux CUDA Special application. I can see in the nvidia-smi output that each card pulls less and the temps drop.
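
For anyone who wants to log the same per-card numbers instead of eyeballing them, here is a minimal sketch that reads power draw and temperature through nvidia-smi's CSV query mode. The function names and the dict keys are made up for illustration, and it assumes the driver reports power.draw for the card (rows that cannot be parsed, such as "[N/A]", are skipped).

```python
import csv
import io
import subprocess

QUERY = "index,power.draw,temperature.gpu"


def parse_gpu_csv(text):
    """Parse 'index, power.draw, temperature.gpu' CSV rows into dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # blank or malformed line
        try:
            index, watts, temp = int(fields[0]), float(fields[1]), int(fields[2])
        except ValueError:
            continue  # e.g. "[N/A]" when the card does not report a value
        rows.append({"gpu": index, "watts": watts, "temp_c": temp})
    return rows


def read_gpus():
    """Run nvidia-smi once and return one dict per card."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling read_gpus() while a SETI task runs and again while an Einstein task runs would give the comparison described above as numbers.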

no idea on the power draw for SETI SoG but it's probably less than the special app since it's much less optimized.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028657 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13960
Credit: 208,696,464
RAC: 304
Australia
Message 2028661 - Posted: 20 Jan 2020, 18:11:53 UTC

Server status values still frozen.
Grant
Darwin NT
ID: 2028661 · Report as offensive
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2028663 - Posted: 20 Jan 2020, 18:22:47 UTC - in response to Message 2028655.  
Last modified: 20 Jan 2020, 18:29:39 UTC

all data distribution stats on the SSP have been stale for over an hour now
When the top of the page lists an 'as of' time that is hours old and those individual stats too have hours in their 'as of' column, do I have to add those hours together to get the true age of the data or does the page actually update without updating the time stamp at the top but still keeping the 'as of' column fresh?

And then there is the replica database lag. So when the data distribution stats say "as of 2h", Replica is 23.8 hours behind master and the time stamp at the top is half an hour in the past, I guess the numbers actually reflect the situation 2.3 hours ago yesterday!
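
The arithmetic behind that guess can be sketched as follows, assuming the three delays simply add up. That assumption is only one reading of the page (the reply below disputes it for the 'as of' column), and the function name is made up for illustration.

```python
from datetime import timedelta


def worst_case_age(as_of, page_age, replica_lag):
    """Age of the data if every delay stacks on top of the others."""
    return as_of + page_age + replica_lag


age = worst_case_age(
    as_of=timedelta(hours=2),          # "as of 2h" in the stats column
    page_age=timedelta(minutes=30),    # time stamp at the top of the page
    replica_lag=timedelta(hours=23.8), # replica behind master
)
days, rest = divmod(age, timedelta(days=1))
print(days, "day(s) and", round(rest / timedelta(hours=1), 1), "hours ago")
```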
ID: 2028663 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028666 - Posted: 20 Jan 2020, 18:37:32 UTC - in response to Message 2028663.  
Last modified: 20 Jan 2020, 18:38:28 UTC

you can trust the "as of" times as being the value "as of" that long ago. so those were the values as of, say, 2 hours ago. you do not add them up, they are independent.

the replica database figure is already a lag value, telling you how far behind it is. that will usually remain up to date, until it stops updating; then the lag value is itself stale. you just won't know if the replica delay is getting better or worse since there is no update. right now, probably still getting worse.

the only value that is current/updating is the master db queries/s
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028666 · Report as offensive
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2028668 - Posted: 20 Jan 2020, 19:24:59 UTC - in response to Message 2028666.  

you can trust the "as of" times as being the value "as of" whatever time ago. so those were the values right now say 2 hours ago. you do not add them up, they are independent.
2 hours ago when the page was updated. But if the timestamp at the top is one hour in the past when I load the page, then 'as of 2h' really means 3 hours ago.

SSP seems to be updating normally now and the replica is almost 27 hours behind! The numbers of WUs and results have started shrinking quite a bit faster!
ID: 2028668 · Report as offensive
Speedy
Volunteer tester
Avatar

Send message
Joined: 26 Jun 04
Posts: 1647
Credit: 12,921,799
RAC: 89
New Zealand
Message 2028669 - Posted: 20 Jan 2020, 19:55:44 UTC

With the results out in the field back to over 6 million, I have a feeling they may have reached the number they were looking for. When I went to bed last night it was somewhere around 5.5 million.
ID: 2028669 · Report as offensive
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2028676 - Posted: 20 Jan 2020, 22:23:35 UTC - in response to Message 2028669.  

With the results out in the field back to over 6 million, I have a feeling they may have reached the number they were looking for.
The situation is not yet ok. The replica database is still falling behind about 24 minutes per hour. It's now one day, 4 hours and 40 minutes behind.
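
As a rough projection, assuming the lag keeps growing linearly at the rate quoted above (a simplification; the real rate will change with server load), the numbers work out like this. The function name is made up for illustration.

```python
from datetime import timedelta


def projected_lag(current_lag, growth_per_hour, hours_ahead):
    """Linear extrapolation of replica lag; assumes a constant growth rate."""
    return current_lag + growth_per_hour * hours_ahead


now = timedelta(days=1, hours=4, minutes=40)  # lag reported in the post
rate = timedelta(minutes=24)                  # lag gained per wall-clock hour

for hours in (1, 6, 12):
    print(f"in {hours:2d} h: {projected_lag(now, rate, hours)} behind")
```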
ID: 2028676 · Report as offensive
Boiler Paul

Send message
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2028694 - Posted: 21 Jan 2020, 1:46:47 UTC

I checked the SSP and as of 21 Jan 2020, 1:40:04 UTC they took the replica off line.
ID: 2028694 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028695 - Posted: 21 Jan 2020, 1:52:36 UTC - in response to Message 2028694.  

I checked the SSP and as of 21 Jan 2020, 1:40:04 UTC they took the replica off line.

I see that. There wasn't much point to it other than telling us how far it had fallen behind the main database. Maybe it can be synced up offline or something.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028695 · Report as offensive
Boiler Paul

Send message
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2028696 - Posted: 21 Jan 2020, 1:57:48 UTC

I guess that they took it off line for tomorrow's outage, FWIW.
ID: 2028696 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028700 - Posted: 21 Jan 2020, 2:15:43 UTC - in response to Message 2028696.  

I guess that they took it off line for tomorrow's outage, FWIW.

I assume it will help the servers recover because I am sure it has an I/O impact on the main database. Anything that can help the servers clear the validations and purge/delete backlogs will be welcome.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028700 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028707 - Posted: 21 Jan 2020, 3:32:43 UTC
Last modified: 21 Jan 2020, 3:32:58 UTC

had a good run for about half the day. now back to no tasks available
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028707 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028711 - Posted: 21 Jan 2020, 4:01:24 UTC

Think the splitters have commandeered the I/O and the schedulers are getting short-changed in responding to work requests.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028711 · Report as offensive
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2028713 - Posted: 21 Jan 2020, 4:18:58 UTC - in response to Message 2028711.  

Think the splitters have commandeered the I/O and the schedulers are getting short-changed in responding to work requests.
Or all the stuff that hit the replica before is now bombing the master and commandeering the I/O.
ID: 2028713 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028716 - Posted: 21 Jan 2020, 4:54:23 UTC - in response to Message 2028713.  

We'll see if the splitter eventually throttles down after 1.2M or so. At least the replica is back in sync after being taken offline.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028716 · Report as offensive