How to Fix the current Issues - One man's opinion

W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19047
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2029638 - Posted: 28 Jan 2020, 9:23:33 UTC - in response to Message 2029632.  

The deadline for AP is 25 days and I haven't seen any problems with that. As AP tasks take longer to crunch than MB, I would think a maximum of 21 days for MB would be reasonable.
ID: 2029638
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2029649 - Posted: 28 Jan 2020, 11:27:59 UTC

Greetings,

What you guys may or may not know is that there is already a 2nd project here at SETI. It's called Beta. That project resides on the same servers as SETI Prime does. What you are suggesting is for a 3rd project to be installed. I don't see the point. Just one man's opinion on this topic. ;)

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2029649
Profile Retvari Zoltan
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2029775 - Posted: 29 Jan 2020, 17:00:11 UTC - in response to Message 2029649.  

Greetings,

What you guys may or may not know is that there is already a 2nd project here at SETI. It's called Beta. That project resides on the same servers as SETI Prime does. What you are suggesting is for a 3rd project to be installed. I don't see the point. Just one man's opinion on this topic. ;)

Have a great day! :)

Siran
Perhaps the Beta should be created so it can handle a few task-size doublings now, and several more in the future.
ID: 2029775
Profile Retvari Zoltan
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2029776 - Posted: 29 Jan 2020, 17:09:27 UTC - in response to Message 2029627.  

This amount decays exponentially as we go back in time, and the volunteers of this project can provide the computing power to convert (even re-calculate) that amount of data (as the available computing power grows exponentially), but I'm not sure it should be converted at all. The architecture of the science database can be changed without changing the meaning of the data in it, so this project can use a different architecture in the future.
Not as exponential as you might think, as the number of active users has diminished over that period, for several reasons; BOINC and the credit screw, to name but two.
That trend will follow the uptime/downtime ratio of this project (plus many other aspects).
The goal should be to reduce downtime (ideally to 0), as the frequent and extended downtime periods resulted in counterproductive user action.
ID: 2029776
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2029778 - Posted: 29 Jan 2020, 17:33:48 UTC - in response to Message 2029775.  
Last modified: 29 Jan 2020, 17:38:59 UTC

Greetings,

What you guys may or may not know is that there is already a 2nd project here at SETI. It's called Beta. That project resides on the same servers as SETI Prime does. What you are suggesting is for a 3rd project to be installed. I don't see the point. Just one man's opinion on this topic. ;)

Have a great day! :)

Siran
Perhaps the Beta should be created so it can handle a few task-size doublings now, and several more in the future.

Hi Retvari,

Beta IS a project in and of itself and does not need to be created. It already exists on the same servers as SETI Prime does. It is there to test new apps and server software, hence the name Beta. I don't see the point of messing with Beta when Prime needs more fixing. I am currently without any work on my main host and have been for quite some time now. My Pis and Linux PC each have just over a day's work, and my laptop just over 2 days' worth.

Beta is shut down right now, I assume, so that the SETI team can concentrate on fixing Prime.

Have a great day! :)

Siran

[edit]
My main just got some WUs. Woohoo! :)
[/edit]
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2029778
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22186
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2029779 - Posted: 29 Jan 2020, 17:34:26 UTC - in response to Message 2029775.  

What do you actually mean by "doubling task size"?
Do you mean just adding more data points to increase the file size from 700k to 1400k?
Do you mean doubling the resolution, so doubling the file size?
Do you mean putting two data sets into one file, so doubling the file size?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2029779
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22186
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2029780 - Posted: 29 Jan 2020, 17:38:53 UTC - in response to Message 2029778.  

Where do you get that Beta is shut down just now?
There are tasks ready to send, the splitters are not disabled, the board is alive and kicking.
Remember, Beta is not about processing "real" data; it is for testing something prior to release on main. If there is nothing in the Beta test schedule just now, it just sits there idle until there is something to test.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2029780
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2029781 - Posted: 29 Jan 2020, 17:44:15 UTC - in response to Message 2029780.  

Where do you get that Beta is shut down just now?
There are tasks ready to send, the splitters are not disabled, the board is alive and kicking.
Remember, Beta is not about processing "real" data; it is for testing something prior to release on main. If there is nothing in the Beta test schedule just now, it just sits there idle until there is something to test.

Hi Rob,

I was going by something I read here in the forum several days ago. When I got the link to the server status page at Beta, I was just looking at the server names and nothing else. I suppose if I'd looked at the other stats I would not have made that statement. My bad. Sorry. :(

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2029781
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029782 - Posted: 29 Jan 2020, 18:05:29 UTC - in response to Message 2029780.  

Where do you get that Beta is shut down just now?
There are tasks ready to send, the splitters are not disabled, the board is alive and kicking.
Remember, Beta is not about processing "real" data; it is for testing something prior to release on main. If there is nothing in the Beta test schedule just now, it just sits there idle until there is something to test.
The next thing to test will be the 715 server code fix, so BOINC can proceed with a full 'Server Stable v1.2.1' release for the benefit of other projects. I still got an 'internal server error' with anonymous platform when I tested this morning. Unfortunately, we only have one Eric.
ID: 2029782
Profile Retvari Zoltan
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2029784 - Posted: 29 Jan 2020, 18:29:06 UTC - in response to Message 2029779.  

What do you actually mean by "doubling task size"?
Do you mean just adding more data points to increase the file size from 700k to 1400k?
Do you mean doubling the resolution, so doubling the file size?
Do you mean putting two data sets into one file, so doubling the file size?
I would go for the 1st option. A task that covers a longer period of time would also mean that less overlap (= less network traffic, less disk space) is needed for data processing / transfer.
The ideal solution would be to send as much data to a host as its actual device (CPU/GPU) could process in 1-2 hours. For example, a very fast host would receive up to 256 times longer chunks of data to process. I can easily spot tasks that were processed by my wingmen over 400 times more slowly; in other words, my host puts a 400 times higher load on the servers than the other host does. This is not necessary. The way the data is split between hosts should be adapted to reduce the workload on the servers, as future GPUs will be even faster.
I'm aware that the per-workunit storage limits for the found spikes / pulses / triplets / Gaussians would have to be increased as well.
The 2nd option is also viable, but the 3rd wouldn't change things much.
The point is to reduce the number of tasks out in the field, and the number of server-client transactions, to make it easier for the servers to handle their job.
ID: 2029784
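A minimal sketch of the power-of-two task sizing idea above, in Python; the function name, the baseline speed, and the GFLOPS figures are assumptions for illustration, not anything from the project:

    # Sketch: pick a power-of-two task-size multiplier for a host so a
    # task lasts roughly 1-2 hours, capped at 256x the current size.
    # baseline_gflops (the speed the current task size is tuned for)
    # is an assumed figure.
    def task_size_multiplier(host_gflops, baseline_gflops=50.0, cap=256):
        ratio = max(1.0, host_gflops / baseline_gflops)
        multiplier = 1
        while multiplier * 2 <= min(ratio, cap):
            multiplier *= 2
        return multiplier

    # A host ~400x faster than the baseline would get the capped 256x chunk:
    print(task_size_multiplier(400 * 50.0))  # -> 256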
Profile Retvari Zoltan

Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2029785 - Posted: 29 Jan 2020, 18:38:49 UTC - in response to Message 2029778.  
Last modified: 29 Jan 2020, 18:39:31 UTC

Beta IS a project in and of itself and does not need to be created. It already exists on the same servers as SETI Prime does. It is there to test new apps and server software, hence the name Beta. I don't see the point of messing with Beta when Prime needs more fixing.
We're discussing ideas this project needs to adopt to get fixed for good. That's what Beta could be used for. Tinkering with the old stuff won't achieve that in the long term.
ID: 2029785
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22186
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2029786 - Posted: 29 Jan 2020, 18:55:05 UTC

There are a couple of things to consider with simply doubling the number of data points delivered as a task.
Currently each successive pair of work units has an overlap in the data; this has an impact on the analysis which is not easy to predict.
Also, as has already been identified, the maximum number of signals per task is set to 30, and this is set in the servers; changing it would mean a revision to the database structure.

As to the concept of determining task size by predicting the performance of the host, and so predicting the calculation time: this is fraught with difficulties, as one would end up with multiple task sizes. That would mean the splitters would have to split each task (or group of tasks) for each "type" of host, so one would lose the diversity in processors that is inherent in the fixed-size, randomly assigned method of working currently employed. It would work if there were no overlap between work units - I think some other projects that do not have overlapping data do use this sort of approach.

Sadly, I suspect we are stuck with the 700k task size unless the project wants to restructure the database and rewrite all the applications to cope with the increased permitted number of signals.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2029786
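To make the overlap point concrete, a toy Python splitter that carves overlapping windows from a tape; the window and overlap sizes here are invented, not the real multibeam splitter values:

    # Toy splitter: carve overlapping, fixed-size windows from a tape.
    # window/overlap values are illustrative only.
    def split(tape_len, window=1024, overlap=128):
        step = window - overlap
        return [(start, start + window)
                for start in range(0, tape_len - window + 1, step)]

    # Doubling the window while keeping the same overlap changes both the
    # number of workunits per tape and where signals can straddle a boundary:
    print(len(split(100_000)))               # windows at the current size
    print(len(split(100_000, window=2048)))  # doubled windows, same overlap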
J. Mileski
Volunteer tester
Joined: 9 Jun 02
Posts: 632
Credit: 172,116,532
RAC: 572
United States
Message 2029816 - Posted: 29 Jan 2020, 22:31:41 UTC

I don't remember seeing whether or not Astropulse-style work can be created from the Green Bank data.
ID: 2029816
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2029824 - Posted: 30 Jan 2020, 0:23:29 UTC

No, it can't, unless they develop an AP splitter for GBT work. I'm not sure whether a new application would be needed, though. Someone here knows the answer definitively.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2029824
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2030010 - Posted: 31 Jan 2020, 1:59:46 UTC - in response to Message 2029620.  

The clients could process the bigger workunits in several parts, producing multiple independent sets of results, each covering a similar time window as before.
Wouldn't that just produce the same number of results as the present system, and we would still have 11 million "Results returned and awaiting validation"?
No, because this result set would still be just one result file, so only one database row is needed to reference it.
ID: 2030010
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2030013 - Posted: 31 Jan 2020, 2:20:33 UTC - in response to Message 2029784.  

I'm aware that the per-workunit storage limits for the found spikes / pulses / triplets / Gaussians would have to be increased as well.
If you have really long work units, then increasing the limits for returned signals is not enough. An RFI spike will fill any reasonable limit, and this will then mask all the good parts of the data. Bigger time windows mean more observation time is lost due to these events.

This is why I suggested in another post that the clients would process the long workunits in multiple parts that would match the size of the current workunits and produce result data separately for each part. So you could have result overflow for one part but good results for the rest.
ID: 2030013
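A minimal sketch of the per-part scheme Ville describes, in Python: walk the long workunit in current-size parts, apply the signal limit per part, and bundle everything into one result file (hence one database row). The names and result structure are assumptions, not the actual client code:

    # Sketch: process a long workunit in current-size parts, applying the
    # per-part signal limit so one noisy part cannot mask the others.
    SIGNAL_LIMIT = 30  # the current per-workunit limit, applied per part

    def process_long_wu(samples, part_len, find_signals):
        results = []
        for part_no, start in enumerate(range(0, len(samples), part_len)):
            signals = find_signals(samples[start:start + part_len])
            results.append({
                "part": part_no,
                "overflow": len(signals) > SIGNAL_LIMIT,
                "signals": signals[:SIGNAL_LIMIT],
            })
        # The whole list is serialized into one result file, so the server
        # still needs only one database row to reference it.
        return results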
alanb1951 Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Joined: 25 May 99
Posts: 10
Credit: 6,904,127
RAC: 34
United Kingdom
Message 2030054 - Posted: 31 Jan 2020, 6:07:53 UTC - in response to Message 2030013.  

I'm aware that the per-workunit storage limits for the found spikes / pulses / triplets / Gaussians would have to be increased as well.
If you have really long work units, then increasing the limits for returned signals is not enough. An RFI spike will fill any reasonable limit, and this will then mask all the good parts of the data. Bigger time windows mean more observation time is lost due to these events.

This is why I suggested in another post that the clients would process the long workunits in multiple parts that would match the size of the current workunits and produce result data separately for each part. So you could have result overflow for one part but good results for the rest.

MilkyWay@home already does this batching up of sub-tasks. That works provided the different parts all validate! However, if one sub-task fails to validate, the whole batch is sent out to another client to resolve the mismatch, and they keep trying until two clients send in a matched set or the limits are hit; fortunately, MilkyWay tasks are reasonably short...

I suspect that almost any attempt to deal with validation errors in a more sub-task-related way would introduce unwanted levels of complexity; perhaps not an issue if there were enough developer time available, but I fear that is not the case.

However, something needs to be done; I'm just not sure what!

Cheers - Al.
ID: 2030054
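A minimal sketch of the all-or-nothing batch validation Al describes, in Python (the matching predicate and names are hypothetical):

    # Sketch: two hosts' batched sub-results validate only if every pair
    # matches; a single mismatch forces the whole batch to be re-issued.
    def validate_batch(batch_a, batch_b, matches):
        if len(batch_a) == len(batch_b) and all(
                matches(a, b) for a, b in zip(batch_a, batch_b)):
            return "valid"
        return "reissue"  # the whole batch goes out again for one bad part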
juan BFP Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2030105 - Posted: 31 Jan 2020, 12:53:29 UTC
Last modified: 31 Jan 2020, 13:06:05 UTC

Please forgive me, but I have another opinion on what could help solve the current issues, since changing the WU size or the server configuration (separating the types of hosts, as suggested) at this moment will only add more gasoline to the already burning problem.

In the past, when we talked about the spoofed client, somebody posted (I don't remember who) that the problem with the DB was not its size but the number of queries/second. I disagreed at the time, since I have always believed size matters. In the end we could all see that the decision to keep the release of the spoofed client in a closed loop was right. The impact of changing the user-side limits hit the servers hard.

Now I can see the real answer is BOTH, and that is why the fix must address both at the same time. Fixing one without fixing the other could leave us circling around and never solving the entire problem.

In desperate times we need to take desperate measures.

A few controversial actions must be taken:

- Go back to the old and well-tested validation quorum of 2.
- Stop sending new WUs to hosts with faulty GPUs/drivers.
- Drastically reduce the deadline of new WUs.
- Reduce the limits even further.
- Reduce the number of days in the client WU cache.
- Resend the expired/not-validated WUs with a short deadline (maybe a week), and only to the fastest hosts with a high APR.
- Stop making changes on the servers until the system is back to working fine.

All this will take time to do its job and reduce both the number of queries/second and the DB size. Be aware that time will be weeks: there are so many WUs with deadlines in March, and probably even later, sitting around waiting for a wingman who will probably never appear. To reduce the size of the DB we need to try to clear those WUs as fast as possible.

Yes, I know these are all controversial, but after about a month we can all see that something extreme must be done, for the good of the project and the sanity of the DB.

Keeping the project producing a lot of new WUs/day without solving the core of the problem will make us fall into a black hole of no return.

Then, after the system is back to stability and everything is working fine, some of these measures could slowly be changed, one at a time.

One point we need to agree on: the decision to raise the limits and change the server version without any major testing was wrong and led us into all this mess. The problem with the cross-validation caused by faulty GPUs/drivers just added more gasoline to the fire.

my 0.02
ID: 2030105
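Most of the measures in the list above map onto standard BOINC workunit and scheduler parameters (min_quorum, target_nresults, delay_bound, max_wus_in_progress). A hypothetical sketch of the values being argued for; only the quorum of 2 and the roughly one-week deadline come from the post itself:

    # Hypothetical mapping of the proposals onto standard BOINC settings:
    proposed = {
        "min_quorum": 2,           # back to quorum-of-2 validation
        "target_nresults": 2,      # initial replication of 2
        "delay_bound": 7 * 86400,  # ~1 week deadline (seconds) for resends
    }
    # Plus a lower scheduler-side max_wus_in_progress and a smaller client
    # cache; the post gives no specific numbers for those.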
Profile Retvari Zoltan

Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2030119 - Posted: 31 Jan 2020, 14:59:52 UTC - in response to Message 2030013.  
Last modified: 31 Jan 2020, 15:00:13 UTC

If you have really long work units, then increasing the limits for returned signals is not enough. An RFI spike will fill any reasonable limit, and this will then mask all the good parts of the data. Bigger time windows mean more observation time is lost due to these events.
RFI spikes can easily be detected by the app and omitted from the result, so no observation time would be lost.

This is why I suggested in another post that the clients would process the long workunits in multiple parts that would match the size of the current workunits and produce result data separately for each part. So you could have result overflow for one part but good results for the rest.
This would leave the load on the servers unchanged. Further tweaking and optimizing of client behavior would make the servers' job harder; this isn't the right way. There's no easy way to fix the problems we face.
ID: 2030119
Profile Retvari Zoltan
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2030121 - Posted: 31 Jan 2020, 15:41:41 UTC

In my opinion this project needs a new splitting / validation process that can handle the ultra-high performance of present and future GPUs as well as the oldest CPUs. It could be achieved by sending larger chunks of data to fast hosts (expanding in powers of 2, limited by the actual processing speed of the slowest device (GPU/CPU) in the given system).
It also needs a new client app, as the app should omit the parts of the data poisoned by RFI.
I think the transition to this adaptive splitting algorithm needs to happen now.
Please share your ideas! (Besides "it can't be done".)
ID: 2030121
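A minimal sketch of the RFI-omission step suggested above, in Python; the median-based threshold is an assumption for illustration, not the project's actual RFI handling:

    import numpy as np

    # Sketch: zero out samples whose power exceeds k times the median
    # power, so impulsive RFI doesn't flood the per-workunit signal limit.
    def blank_rfi(samples, k=10.0):
        power = np.abs(samples) ** 2
        cleaned = samples.copy()
        cleaned[power > k * np.median(power)] = 0
        return cleaned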