Low available work.

KWSN - Sir Nutsalot

Joined: 4 Jun 99
Posts: 5
Credit: 22,114,565
RAC: 47
United Kingdom
Message 2029158 - Posted: 25 Jan 2020, 10:28:00 UTC - in response to Message 2029038.  

I have some machines still getting work and others now getting nothing at all, for example:

AMD 3600 with AMD RX 580: empty, no units coming in for CPU or GPU.
AMD 2600 with Nvidia 1660 Ti: empty, no units coming in for CPU or GPU.
Intel 2600K with AMD RX 580: still occasionally getting units and crunching plenty of them.
Intel 6700K with AMD RX 590: still occasionally getting units and crunching, but running out.
Intel 8750H with Nvidia 1060: still occasionally getting units and crunching, but nearly running out.

Clearly there are still issues, and for many months I never had a problem.
ID: 2029158
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2029162 - Posted: 25 Jan 2020, 10:48:50 UTC

Have all the computed WUs been reported?
ID: 2029162
grote

Joined: 7 May 01
Posts: 2
Credit: 2,056,727
RAC: 0
United States
Message 2029166 - Posted: 25 Jan 2020, 11:24:30 UTC - in response to Message 2029036.  

Is this issue still present? I noticed most of the servers show as running, but I still have zero tasks and keep getting "Communication deferred".


Still waiting for the tasks to begin!!
ID: 2029166
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 2029209 - Posted: 25 Jan 2020, 18:49:54 UTC - in response to Message 2029166.  

Is this issue still present? I noticed most of the servers show as running, but I still have zero tasks and keep getting "Communication deferred".


Still waiting for the tasks to begin!!


Your machine has not contacted the servers since the 16th of Jan
ID: 2029209
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3807
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2029241 - Posted: 25 Jan 2020, 21:48:21 UTC
Last modified: 25 Jan 2020, 21:49:53 UTC

(cc'ed from NC)

I contacted Dr. Korpela and he indicated that there is still some throttling going on to keep the total results below 20M (lest we have the same issue where the results table exceeds memory), which is probably why the BLC splitters were disabled earlier. No doubt the "shorty storm" from blc35_2bit_guppi_58691_* is causing this. In the interim, things seem to be improving and I'm getting just enough work to keep my machines busy, so it should be over soon.
ID: 2029241
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2029275 - Posted: 25 Jan 2020, 22:44:03 UTC - in response to Message 2029241.  

(cc'ed from NC)

I contacted Dr. Korpela and he indicated that there is still some throttling going on to keep the total results below 20M (lest we have the same issue where the results table exceeds memory), which is probably why the BLC splitters were disabled earlier. No doubt the "shorty storm" from blc35_2bit_guppi_58691_* is causing this. In the interim, things seem to be improving and I'm getting just enough work to keep my machines busy, so it should be over soon.
I thought the point of the extended-outage database reorganisation was to rejig the database so it could deal with the increased load, process the work, and keep those backlogs from occurring?
Grant
Darwin NT
ID: 2029275
KWSN - Sir Nutsalot

Joined: 4 Jun 99
Posts: 5
Credit: 22,114,565
RAC: 47
United Kingdom
Message 2029276 - Posted: 25 Jan 2020, 22:49:08 UTC - in response to Message 2029241.  

All my machines are now full to the apparent 150 GPU and CPU unit limit :)

Therefore I am now back to normal.

Apart from the real scheduled outages, uploading units was never an issue for my machines.
ID: 2029276
Profile John Ellis

Joined: 22 Jun 05
Posts: 4
Credit: 23,932,244
RAC: 4
United States
Message 2029426 - Posted: 26 Jan 2020, 19:10:17 UTC
Last modified: 26 Jan 2020, 19:11:30 UTC

Greetings,

Seems this issue has taken longer to resolve than originally expected. Is there any "official" estimate for returning to normal operations?

Only one of my two SETI rigs is processing; the idle one has been in that state for hours now.

Thanks,
John
ID: 2029426
Miklos M.

Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 2029575 - Posted: 27 Jan 2020, 22:33:48 UTC

I am eager for the day when work will be plentiful again. Only two of my four rigs get work, and even then only rarely.
ID: 2029575
Mike Ryan

Joined: 24 Jun 99
Posts: 46
Credit: 24,363,752
RAC: 47
United States
Message 2029640 - Posted: 28 Jan 2020, 9:57:01 UTC

Not sure where to post this (it's really getting hard to find a specific tree in the forest) but I'll try here because it seems to be a "dishing out the work" issue... perhaps related to allowing more work (yeah!) to be sent to each device.

I had three work units time out as "no response", but the time given to compute (which is typically a month or so) was less than 7 minutes (6:55 to be exact). I've never seen that happen before. With a current queue on this machine of 300 units in progress (a 50/50 split between CPU and Nvidia GPU), it does seem a bit odd to expect three of the units to be finished within 7 minutes, when all the tasks queued before they were downloaded would run first. Here are the work units in question:
Work unit / Sent / Time Reported or deadline / Status
3857435802 / 27 Jan 2020, 22:52:34 UTC / 27 Jan 2020, 22:59:29 UTC / Time out - no response
3857435646 / 27 Jan 2020, 22:52:34 UTC / 27 Jan 2020, 22:59:29 UTC / Time out - no response
3857435722 / 27 Jan 2020, 22:52:34 UTC / 27 Jan 2020, 22:59:29 UTC / Time out - no response

Not a huge deal, but seems like if this was widespread there would be a LOT of unnecessary duplicate work units sent out for processing.
ID: 2029640
Profile Wiggo
Joined: 24 Jan 00
Posts: 36873
Credit: 261,360,520
RAC: 489
Australia
Message 2029641 - Posted: 28 Jan 2020, 10:05:47 UTC - in response to Message 2029640.  

Not sure where to post this (it's really getting hard to find a specific tree in the forest) but I'll try here because it seems to be a "dishing out the work" issue... perhaps related to allowing more work (yeah!) to be sent to each device.

I had three work units time out as "no response", but the time given to compute (which is typically a month or so) was less than 7 minutes (6:55 to be exact). I've never seen that happen before. With a current queue on this machine of 300 units in progress (a 50/50 split between CPU and Nvidia GPU), it does seem a bit odd to expect three of the units to be finished within 7 minutes, when all the tasks queued before they were downloaded would run first. Here are the work units in question:
Work unit / Sent / Time Reported or deadline / Status
3857435802 / 27 Jan 2020, 22:52:34 UTC / 27 Jan 2020, 22:59:29 UTC / Time out - no response
3857435646 / 27 Jan 2020, 22:52:34 UTC / 27 Jan 2020, 22:59:29 UTC / Time out - no response
3857435722 / 27 Jan 2020, 22:52:34 UTC / 27 Jan 2020, 22:59:29 UTC / Time out - no response

Not a huge deal, but seems like if this was widespread there would be a LOT of unnecessary duplicate work units sent out for processing.
They were ghosts that never got delivered to you and as they're in red they don't count against you. ;-)

Cheers.
ID: 2029641
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029657 - Posted: 28 Jan 2020, 13:38:01 UTC - in response to Message 2029641.  

They were ghosts that never got delivered to you and as they're in red they don't count against you. ;-)
Instantly timed-out ghosts do get counted against you. I've had a lot of them, because every time I have tried to redownload ghosts using the ghost recovery protocol, the server has decided to instantly expire them instead of letting me download them. When a big bunch of those happens, the host and app involved get their max tasks per day heavily penalized. When it drops lower than the number of tasks already downloaded that day, the server stops giving the host any more work for that app :(
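
For anyone wondering how that lockout works mechanically, here is a minimal sketch of a per-host daily quota of the kind described above. It is not the actual BOINC server code; the class name, starting quota, and halving rule are assumptions for illustration only.

# Minimal sketch (not actual BOINC server code) of a per-host daily quota:
# expired "ghost" tasks shrink the daily allowance, and once it falls below
# the number of tasks already sent that day, no more work goes out.
class HostAppQuota:
    def __init__(self, base_quota=100):  # base_quota is an illustrative number
        self.max_tasks_per_day = base_quota
        self.tasks_sent_today = 0

    def on_task_sent(self):
        self.tasks_sent_today += 1

    def on_timeout(self):
        # Each instantly expired ghost cuts the daily allowance (assumed halving).
        self.max_tasks_per_day = max(1, self.max_tasks_per_day // 2)

    def on_valid_result(self):
        # Successfully returned work slowly restores the allowance.
        self.max_tasks_per_day += 1

    def may_send_work(self):
        return self.tasks_sent_today < self.max_tasks_per_day

quota = HostAppQuota()
for _ in range(10):
    quota.on_task_sent()      # ten tasks fetched today
for _ in range(7):
    quota.on_timeout()        # a burst of instantly expired ghosts
print(quota.may_send_work())  # False: the host is locked out for this app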
ID: 2029657
Profile Chris

Joined: 3 Apr 06
Posts: 1
Credit: 2,824,282
RAC: 3
United States
Message 2030248 - Posted: 1 Feb 2020, 7:36:28 UTC - in response to Message 2029657.  

My Raspberry Pi cluster has not been getting work. Two machines are idle and the others are just finishing up what they have.
ID: 2030248
Nostra
Joined: 18 Feb 01
Posts: 2
Credit: 1,612,863
RAC: 7
United Kingdom
Message 2030253 - Posted: 1 Feb 2020, 8:13:05 UTC - in response to Message 2030248.  

Same here. No work 😭
ID: 2030253
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3807
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2030257 - Posted: 1 Feb 2020, 8:47:29 UTC
Last modified: 1 Feb 2020, 8:56:53 UTC

This thread (and its future descendants) in the Number Crunching forum is a good go-to whenever there are problems like this, as it's regularly updated with findings on the causes of issues. Also, Computing > Server Status is helpful to see which project machines are disabled or down, and the work stats.

In this case, work generation has been turned off because there are too many results in the field, due to the millions of "shorties" or "noise bombs" from the most recent data files... work units that take only a few seconds to be determined as noise and completed. For example, "Results returned and awaiting validation" is currently over 13M.
ID: 2030257
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11416
Credit: 29,581,041
RAC: 66
United States
Message 2030269 - Posted: 1 Feb 2020, 11:14:39 UTC
Last modified: 1 Feb 2020, 11:15:24 UTC

Almost out of CPU work and the GPU is back to Einstein.
ID: 2030269
Profile bigSPAM Project Donor

Joined: 12 May 00
Posts: 4
Credit: 62,452,292
RAC: 292
United States
Message 2030287 - Posted: 1 Feb 2020, 13:09:19 UTC
Last modified: 1 Feb 2020, 13:49:42 UTC

My most productive machine (and likely the most power-hungry) has about two hundred WUs in transfers, 'Download pending', and no work for the last 18 hours.
Are there likely to be some sent, or should I just shut this one down until Wednesday?
Thanks!
Just looked at my #2 machine, and it too has a few hundred in the 'Download pending' state and is running only GPU WUs.

Edit
Noticed #1 PC did download, but mostly GUPPI WUs.
Those all ran in a couple of minutes, with half my CPU working the sole 8 remaining AstroPulse and 'normal' WUs.
Also got a bunch of GPU WUs to crunch.
The GUPPI units have issues, I gather?
ID: 2030287
Gary Easton

Joined: 14 Nov 00
Posts: 9
Credit: 12,118,453
RAC: 65
United Kingdom
Message 2030302 - Posted: 1 Feb 2020, 14:27:30 UTC

No work coming in again; this just makes me wanna go Hmmmmm!
ID: 2030302
Profile bigSPAM Project Donor

Joined: 12 May 00
Posts: 4
Credit: 62,452,292
RAC: 292
United States
Message 2030316 - Posted: 1 Feb 2020, 15:24:04 UTC - in response to Message 2030287.  

My most productive machine (and likely the most power-hungry) has about two hundred WUs in transfers, 'Download pending', and no work for the last 18 hours.
Are there likely to be some sent, or should I just shut this one down until Wednesday?
Thanks!
Just looked at my #2 machine, and it too has a few hundred in the 'Download pending' state and is running only GPU WUs.

Edit
Noticed #1 PC did download, but mostly GUPPI WUs.
Those all ran in a couple of minutes, with half my CPU working the sole 8 remaining AstroPulse and 'normal' WUs.
Also got a bunch of GPU WUs to crunch.
The GUPPI units have issues, I gather?


Blew through all the downloads except the remaining two Astropulses on #1
#2 is in the same boat.
ID: 2030316
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3807
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2030320 - Posted: 1 Feb 2020, 15:31:05 UTC - in response to Message 2030310.  
Last modified: 1 Feb 2020, 16:18:24 UTC

I've only been a member since the 10th of January, so I have to ask: are these maintenance shutdowns and work slowdowns a regular weekly occurrence? It seems just as I settle into a routine, everything gets screwed up.


The maintenance shutdowns prior to this problem lasted only a few hours; the last two have lasted a day or more. It takes about as long for things to stabilize after a shutdown as the shutdown itself lasted, because the longer it's down, the more short of work the participants' computers get.

The recent issues, which have meant two weeks of long outages and no work, are an anomaly. They seem to be caused by a backlog from the servers not processing returned work, so it accumulates. The hope is that eventually the SETI@home team will run a script or equivalent to clear the logjam and let things get back to normal.

Edit: To add some details to this, this is from the last look at Computing > Server status:

Results out in the field = 3,932,277
Results returned and awaiting validation = 12,302,507
Results waiting for db purging = 5,075,815


The total of these three is 21,310,599, which is too large. Dr. Korpela has noted that the functional limit of the active result table is 20M; beyond this it may no longer fit into memory (which is already maxed out) on the machine it's running on. At that point virtual memory will probably start paging it to disk, which is extremely slow (hundreds or thousands of times slower than memory bandwidth), so everything grinds to a halt. Thus the splitters are disabled from generating more work until this total falls below 20M.
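
As a quick sanity check on those figures (a throwaway sketch; the 20M ceiling is the limit Dr. Korpela quoted, and the values are just the server-status numbers above):

# Throwaway check of the server-status figures against the ~20M result-table ceiling.
in_field = 3_932_277              # Results out in the field
awaiting_validation = 12_302_507  # Results returned and awaiting validation
awaiting_purge = 5_075_815        # Results waiting for db purging

total = in_field + awaiting_validation + awaiting_purge
print(total)                      # 21310599
print(total > 20_000_000)         # True, so the splitters stay off for now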

However, this is a band-aid. The total is not falling nearly as fast or as low as it should, hence the need to run a process to get the validation and purging done (the latter should be very close to zero). That process/script may not even exist yet, requiring someone (inevitably the already-overloaded project director, Dr. Korpela) to develop it.

Hope this explains things a little. :^)
ID: 2030320