The spoofed client - Whatever about

Previous · 1 · 2 · 3 · 4 · 5 · Next

Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 621
Credit: 2,040,078
RAC: 2,696
United States
Message 2004569 - Posted: 28 Jul 2019, 1:14:28 UTC

I think the seti team has been doing a great job of upgrading the servers (with more upgrades planned). The unplanned outages are only a bad memory, and the planned outages have been of a reasonable (your definition may vary) length.

There tends to be a horrible crunch right after an outage, and I don't think the problem lies with the people on the boards who are trying to do their best to lessen the load on the system one way or another. I think we can have a healthy debate, with science and numbers, to figure out what works.

But hopefully, as Grant said, "Personally i'm hopeful that the new upload server, which also does file deletion & Validation duties, will help reduce the present bottlenecks (or better yet, remove them. Which would of course expose the next bottleneck area)", these discussions are only important in the moment, as the system changes and with it the bottleneck.

I'm glad we have a community who is passionate about how they SETI, and that we can all SETI the way we want.

The team recently even changed the code to allow more ghosts to be recovered at one time. That means they are paying attention and tweaking things within their money/time constraints to help the system. If the spoofing caused a big problem, I'm guessing a message would be sent out, or something would be put into place.
ID: 2004569 · Report as offensive
MarkJ Special Project $250 donor
Volunteer tester
Joined: 17 Feb 08
Posts: 1132
Credit: 80,672,044
RAC: 332
Australia
Message 2004609 - Posted: 28 Jul 2019, 7:27:09 UTC

So, following the logic of reducing the out-in-the-field number, surely shortening the deadlines would improve that. The project could adjust it, wait a while to see the effect, and then look at increasing the per-host limits. Once that's done you don't need the spoofed client.
BOINC blog
ID: 2004609 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11871
Credit: 184,449,026
RAC: 235,832
Australia
Message 2004610 - Posted: 28 Jul 2019, 7:43:23 UTC - in response to Message 2004609.  
Last modified: 28 Jul 2019, 7:44:22 UTC

So, following the logic of reducing the out-in-the-field number, surely shortening the deadlines would improve that.
Not as much as you might think.
I can't remember who did it, but they spent a considerable amount of time & effort and looked at how long it took for their WUs to be Validated. I can't remember the actual numbers, but I think it was something like well over 90% of WUs are returned and Validated within 48 hrs or so, and the next 5% or so within a week. For all of the WUs people see in their Pending Validation list from 1, 2 or even more months ago, the vast majority of work sent out is actually returned within 48 hrs (or less).

The only way it would have a significant effect would be to make the deadlines something like 96 hours (4 days). But who hasn't had an issue with their ISP? A power failure? A system crash? Moved house? Gone on a holiday?
While a 1 week deadline probably would reduce the number of Results-out-in-the-field significantly, there really needs to be some extra time to allow problems that crop up to be sorted out, and still leave time to finish the work rather than having it just time out. I reckon a 1 month deadline would be a good compromise, but its effect on Results-out-in-the-field would be minimal.
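A rough back-of-envelope sketch of that trade-off, using Little's law (results in the field ≈ send rate × average days a result stays out). The send rate and the return-time mix below are illustrative assumptions based on the rough figures above, not project statistics:

# Sketch only: how deadline length might change the steady-state
# "Results out in the field" count, given an assumed turnaround mix.
SEND_RATE = 100_000                   # results sent per day (illustrative)

def results_in_field(deadline_days):
    buckets = [
        (0.90, 1.0),                  # ~90% back within ~48 hrs (avg ~1 day out)
        (0.05, 4.0),                  # ~5% back within a week (avg ~4 days out)
        (0.05, deadline_days),        # the rest sit there until the deadline expires
    ]
    avg_days_out = sum(frac * days for frac, days in buckets)
    return SEND_RATE * avg_days_out

for deadline in (7, 30, 42):          # 1 week, ~1 month, ~6 weeks
    print(f"{deadline:2d}-day deadline -> ~{results_in_field(deadline):,.0f} in the field")

With these made-up numbers a 6-week deadline gives roughly 320,000 in the field, a 1-month deadline only trims that to about 260,000, and a 1-week deadline roughly halves it, which is broadly the shape of the argument above.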
Grant
Darwin NT
ID: 2004610 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 26013
Credit: 51,039,725
RAC: 21,834
United States
Message 2004628 - Posted: 28 Jul 2019, 14:01:13 UTC - in response to Message 2004555.  

Host 1; State: All (29150) · In progress (500) · Validation pending (17846) · Validation inconclusive (352) · Valid (10452) · Invalid (0) · Error (0)

Host 2; State: All (28428) · In progress (3200) · Validation pending (14583) · Validation inconclusive (425) · Valid (10220) · Invalid (0) · Error (0)

Host 3; State: All (26577) · In progress (5172) · Validation pending (10703) · Validation inconclusive (374) · Valid (10229) · Invalid (2) · Error (97)

Another look at those Three Top Hosts;
Host 1; State: All (28282) · In progress (500) · Validation pending (17600) · Validation inconclusive (337) · Valid (9845)

Host 2; State: All (27806) · In progress (3200) · Validation pending (14143) · Validation inconclusive (448) · Valid (10015)

Host 3; State: All (27821) · In progress (6500) · Validation pending (10860) · Validation inconclusive (378) · Valid (9987)

This is more like what I would expect: the 'All' numbers are very close together. Any perceived 'help' to the database from lowering the cache setting is offset by an increase in Pending tasks.
This is for Hosts completing 10000 tasks a day; is it the same for those completing 5000 per day? 1000 a day? Since Validated tasks are removed after 24 hours, I assume we need to consider this a daily cycle.


Hmmm, I suppose I need to order another 1070 to break out of this deadlock...

Thought it might be interesting to compare to a host at the extreme other end of the world, very slow
SLOW State: All (16) · In progress (2) · Validation pending (9) · Validation inconclusive (1) · Valid (4) · Invalid (0) · Error (0)
You would think that wingmates would on average have already returned so what is up with the validation pending?!

Remember, for each w/u there are at least two files on the server, one for each wingmate. Since you can't validate one of your machines against another of your own machines, your work can get paired with a super slow machine (there are lots more of them) and have files sitting server-side for a long, long time. There will be a sweet spot.
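A small Monte Carlo sketch of that pairing effect. The wingmate turnaround mix, the deadline and the fast host's own return time are all assumptions, loosely based on the figures mentioned earlier in the thread:

import random

DEADLINE_DAYS = 42            # assumed ~6-week deadline
MY_TURNAROUND = 0.1           # a fast host returns its copy within a few hours

def wingmate_turnaround():
    r = random.random()
    if r < 0.90:
        return random.uniform(0.1, 2.0)   # most wingmates: back within ~2 days
    if r < 0.95:
        return random.uniform(2.0, 7.0)   # some: back within a week
    return DEADLINE_DAYS                  # stragglers: sit there until the deadline

def pending_days():
    # My result waits in "Validation pending" until the wingmate reports
    # (simplified: a timed-out WU is counted as waiting the full deadline,
    # although in reality it is then reissued and the wait can be even longer).
    return max(wingmate_turnaround() - MY_TURNAROUND, 0.0)

samples = [pending_days() for _ in range(100_000)]
print("average days pending:", round(sum(samples) / len(samples), 1))
print("share pending > 1 week:", sum(d > 7 for d in samples) / len(samples))

The handful of very slow or absent wingmates ends up dominating the average, which is why a fast host's files can sit server-side far longer than its own turnaround would suggest.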
ID: 2004628 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13300
Credit: 165,806,317
RAC: 216,425
United Kingdom
Message 2004631 - Posted: 28 Jul 2019, 14:24:01 UTC - in response to Message 2004628.  

Thought it might be interesting to compare to a host at the extreme other end of the world, very slow
SLOW State: All (16) · In progress (2) · Validation pending (9) · Validation inconclusive (1) · Valid (4) · Invalid (0) · Error (0)
You would think that wingmates would on average have already returned so what is up with the validation pending?!
Well, from that link you can explore every wingmate for the pending tasks. The first five have an average turnround time ranging from 0.9 days to 8.33 days - remember that other people's computers may have a low resource share for SETI.
ID: 2004631 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 8249
Credit: 518,664,011
RAC: 398,699
Panama
Message 2004634 - Posted: 28 Jul 2019, 14:55:55 UTC - in response to Message 2004628.  
Last modified: 28 Jul 2019, 14:57:58 UTC

There will be a sweet spot.

You will never satisfy all the hosts, but you could help a lot by doing something like GPUGrid does: they set a maximum time to return a WU. OK, theirs is a couple of days and it doesn't need to be that extreme, but setting a deadline of around 1 month could fit most of the slow hosts and could help squeeze the DB a little. That could be clearly explained in the project rules. I believe even a very slow host could return a crunched WU within 1 month.

I was looking at my WU buffer and I have some (more than 100) WUs with deadlines as late as Oct 10... about 3 months from now! No need to explain the consequences for the DB size of keeping these around. Now imagine if a couple of them go to those "dead hosts": another 3 months of waiting?
ID: 2004634 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17944
Credit: 408,906,691
RAC: 33,123
United Kingdom
Message 2004642 - Posted: 28 Jul 2019, 16:38:41 UTC

It would be helpful to know which group of tasks has a twelve week deadline instead of the more normal six weeks.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2004642 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13300
Credit: 165,806,317
RAC: 216,425
United Kingdom
Message 2004646 - Posted: 28 Jul 2019, 16:47:30 UTC - in response to Message 2004642.  

It would be helpful to know which group of tasks has a twelve week deadline instead of the more normal six weeks.
I don't have any in my home cache at the moment.

But looking at deadlines revisited, they will be tasks with AR 0.22549 or just above - rare, but not unknown.
ID: 2004646 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 26013
Credit: 51,039,725
RAC: 21,834
United States
Message 2004647 - Posted: 28 Jul 2019, 17:05:40 UTC - in response to Message 2004631.  

Thought it might be interesting to compare to a host at the extreme other end of the world, very slow
SLOW State: All (16) · In progress (2) · Validation pending (9) · Validation inconclusive (1) · Valid (4) · Invalid (0) · Error (0)
You would think that wingmates would on average have already returned so what is up with the validation pending?!
Well, from that link you can explore every wingmate for the pending tasks. The first five have an average turnround time ranging from 0.9 days to 8.33 days - remember that other people's computers may have a low resource share for SETI.

Very true, and the last two have now timed out and are now unsent. If this little host is "average" and 20% of work units are timing out... look what that does to the database. Since I posted the link, it has returned the 2 that were in progress; one validated immediately.
ID: 2004647 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 8249
Credit: 518,664,011
RAC: 398,699
Panama
Message 2004648 - Posted: 28 Jul 2019, 17:12:28 UTC - in response to Message 2004642.  
Last modified: 28 Jul 2019, 17:25:27 UTC

It would be helpful to know which group of tasks has a twelve week deadline instead of the more normal six weeks.


Application: Local: setiathome_v8 8.01 (cuda90)
Name: blc41_2bit_guppi_58642_03113_PSR_B1133+16_0015.19208.0.22.45.235.vlar
State: Ready to report
Received: Wed 17 Jul 2019 07:53:23 PM EST
Report deadline: Thu 10 Oct 2019 09:36:14 PM EST
Resources: 1 CPU + 1 NVIDIA GPU
Estimated computation size: 6,806 GFLOPs
CPU time: 00:00:15
Elapsed time: 00:00:56
Executable: setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101


Application: Local: setiathome_v8 8.01 (cuda90)
Name: blc41_2bit_guppi_58642_02075_3C295_0008.32290.0.22.45.30.vlar
State: Ready to report
Received: Wed 17 Jul 2019 06:14:45 PM EST
Report deadline: Fri 27 Sep 2019 09:31:39 AM EST
Resources: 1 CPU + 1 NVIDIA GPU
Estimated computation size: 247,577 GFLOPs
CPU time: 00:00:15
Elapsed time: 00:00:52
Executable: setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101

and a lot of others from the same set, 295 WUs on this one host alone.

But I see the same on the other hosts with other sets.

Is that what you asked for?
ID: 2004648 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13300
Credit: 165,806,317
RAC: 216,425
United Kingdom
Message 2004650 - Posted: 28 Jul 2019, 17:28:55 UTC - in response to Message 2004648.  

Is that what you asked for?
Well, the Task ID 7871006337 is easier for others to look at, but it leads us in the right direction - thanks.

WU true angle range is: 0.026279, which corresponds to Joe's graph.
ID: 2004650 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4928
Credit: 666,871,123
RAC: 1,418,784
United States
Message 2004711 - Posted: 29 Jul 2019, 4:22:47 UTC

Those Three Top end machines are about the same as before. I looked at the lower producing machines and was able to make a general assessment. Basically, the In Progress number shouldn't exceed the Host's Daily output, or Valid number. If the Host caches more tasks than it can complete in a Day, the Database impact rises. This means a Host with a Valid number of 2000 shouldn't have a cache of 6400, or, as in the old days, a Host completing 100 tasks a day shouldn't have a cache of 10000.

It basically agrees with the suggestion I made a while back: the Maximum allowed cache should be set to One Day. This means a Host completing 50 tasks a day should have a cache of 50, and so forth. This will also lower the Pending number a bit, as Pending tasks generally would only stay Pending for a Day. If the Wingman doesn't run his machine, at least there will only be One day of tasks on his machine instead of 10 days. Of course lowering the Deadlines would also help, but there seems to be a reluctance to lower deadlines.
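A minimal sketch of that rule of thumb (my reading of it, not project policy): size the cache from what the host actually completes per day, capped by whatever server-side limit applies.

def recommended_cache(completed_per_day, server_cap=None):
    """Suggest an In Progress cache of roughly one day's completed work."""
    daily_output = sum(completed_per_day) / len(completed_per_day)
    cache = int(daily_output)             # ~one day of work
    if server_cap is not None:
        cache = min(cache, server_cap)    # respect any per-host limit
    return cache

# Figures from the examples above: a 2000-a-day host should not sit on 6400
# tasks, and a 100-a-day host certainly not on 10000.
print(recommended_cache([1950, 2100, 2050]))   # ~2033
print(recommended_cache([100, 95, 105]))       # 100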
ID: 2004711 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17944
Credit: 408,906,691
RAC: 33,123
United Kingdom
Message 2004723 - Posted: 29 Jul 2019, 7:52:22 UTC

With a max one-day cache you automatically alienate those who grab a few days' work at the weekend, process it off-line during the week, then return and refill at the weekend - that is, folks who use their cache properly, not those who just like to have a huge cache they do not need, apart from polishing their egos by having thousands of tasks sitting around that they will never process. But, as with the current 100+100 cache, people would eventually find ways to work around it.

One solution to part of the "problem" is of course to seriously restrict the flow of tasks to hosts that fail to return large numbers of tasks by their deadline: one non-error task returned, one task sent out (I can think of one user whose cache would be reduced from >>30k to a single figure if this were implemented, and I dare say there are other hosts doing much the same).
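A toy illustration of that "one returned, one sent" idea (not BOINC scheduler code; the 20% timeout threshold and the field names are assumptions for the sketch):

TIMEOUT_RATIO_LIMIT = 0.20    # assumed cut-off for a "fails to return" host

def tasks_to_send(requested, recent_sent, recent_timeouts, returned_ok_since_last_request):
    # Hosts with a poor on-time record only get work replaced one-for-one
    # as good results come back; everyone else follows the normal rules.
    if recent_sent and recent_timeouts / recent_sent > TIMEOUT_RATIO_LIMIT:
        return min(requested, returned_ok_since_last_request)
    return requested

print(tasks_to_send(100, recent_sent=200, recent_timeouts=80, returned_ok_since_last_request=3))   # -> 3
print(tasks_to_send(100, recent_sent=200, recent_timeouts=5, returned_ok_since_last_request=3))    # -> 100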
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2004723 · Report as offensive
Profile tazzduke
Volunteer tester

Joined: 15 Sep 07
Posts: 161
Credit: 26,026,864
RAC: 13,620
Australia
Message 2004729 - Posted: 29 Jul 2019, 10:02:58 UTC - in response to Message 2004723.  

With a max one-day cache you automatically alienate those who grab a few days' work at the weekend, process it off-line during the week, then return and refill at the weekend - that is, folks who use their cache properly, not those who just like to have a huge cache they do not need, apart from polishing their egos by having thousands of tasks sitting around that they will never process. But, as with the current 100+100 cache, people would eventually find ways to work around it.

One solution to part of the "problem" is of course to seriously restrict the flow of tasks to hosts that fail to return large numbers of tasks by their deadline: one non-error task returned, one task sent out (I can think of one user whose cache would be reduced from >>30k to a single figure if this were implemented, and I dare say there are other hosts doing much the same).


+1
ID: 2004729 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4928
Credit: 666,871,123
RAC: 1,418,784
United States
Message 2004768 - Posted: 29 Jul 2019, 15:18:23 UTC - in response to Message 2004723.  

With a max one-day cache you automatically alienate those who grab a few days' work at the weekend, process it off-line during the week, then return and refill at the weekend - that is, folks who use their cache properly, not those who just like to have a huge cache they do not need, apart from polishing their egos by having thousands of tasks sitting around that they will never process. But, as with the current 100+100 cache, people would eventually find ways to work around it.

One solution to part of the "problem" is of course to seriously restrict the flow of tasks to hosts that fail to return large numbers of tasks by their deadline: one non-error task returned, one task sent out (I can think of one user whose cache would be reduced from >>30k to a single figure if this were implemented, and I dare say there are other hosts doing much the same).
This sounds as though it is two Very small groups of users. As far as somehow fixing the 'bad Host' problem, good luck with that. SETI has been trying to 'fix' that problem forever and I doubt it will be fixed anytime soon. As for the other 99% of SETI users, tests indicate the most efficient cache setting is to set the cache just below the Host's Daily output. That setting will result in the best In Progress and Pending validation ratio along with the Smallest Impact on the Database, for those who can actually download that many tasks. For the most productive Hosts, which will still be stuck with just a couple of hours' worth of cache, you can take solace in the knowledge that at least those few people who can't seem to find the Internet for up to a week at a time will still be able to run those dozen or so tasks a day.
ID: 2004768 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17944
Credit: 408,906,691
RAC: 33,123
United Kingdom
Message 2004774 - Posted: 29 Jul 2019, 15:57:28 UTC

From the evidence in front of me it would appear that the former group (the weekly hitters) represent somewhere between five and ten percent of the users that are contributing to slow validations.
As for the second group (those that never return in time) who have huge caches, it only takes a very small number of them to trap a very large number of tasks - 10 such hosts, each with 30,000, is 300,000, and just now there are ~5,400,000 tasks out in the field - that's about 5.5% of the tasks out in the field trapped on only ten machines, which is far from an insignificant fraction, and well worth releasing back so they can be processed in an acceptable period of time.
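The back-of-envelope arithmetic above, spelled out with the figures as quoted:

hoarding_hosts = 10
tasks_per_hoarder = 30_000
out_in_field = 5_400_000

trapped = hoarding_hosts * tasks_per_hoarder           # 300,000 tasks
print(f"trapped share: {trapped / out_in_field:.1%}")   # just over 5.5% of the field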
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2004774 · Report as offensive
pututu

Joined: 21 Jul 16
Posts: 10
Credit: 8,634,354
RAC: 966
United States
Message 2005605 - Posted: 4 Aug 2019, 1:35:56 UTC
Last modified: 4 Aug 2019, 1:36:53 UTC

This PC is not mine, but I give a thumbs up for his/her effort to find ways to download more tasks per rig.
They are running multiple BOINC instances on one PC. It appears there are 4 instances running on one 1080 card, most likely due to a memory size limitation, hence the run time for the 1080 appears to be about 4 times slower than mine.
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8782554
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8782555
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8782556
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8782557

There are many guides on how one can set up multiple BOINC instances. The upcoming Wow! event has certainly inspired some ingenuity in getting around the 100-task limit per GPU/CPU.
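For what it's worth, a hedged sketch of how those guides typically do it: each extra client instance gets its own data directory and GUI-RPC port, so the scheduler treats it as a separate host with its own per-host limit. The paths, port numbers and instance count here are made up for the example; check the BOINC documentation for your platform before relying on these options.

import subprocess

INSTANCES = 4
for i in range(INSTANCES):
    data_dir = f"/var/lib/boinc-instance{i}"   # one data directory per instance (assumed layout)
    subprocess.Popen([
        "boinc",
        "--allow_multiple_clients",            # permit several clients on one machine
        "--dir", data_dir,                     # keep each instance's state separate
        "--gui_rpc_port", str(31416 + i),      # distinct port for the manager/boinccmd
    ])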
ID: 2005605 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 8249
Credit: 518,664,011
RAC: 398,699
Panama
Message 2005606 - Posted: 4 Aug 2019, 1:42:10 UTC - in response to Message 2005605.  
Last modified: 4 Aug 2019, 2:04:26 UTC

There are many guides on how one can set up multiple BOINC instances. The upcoming Wow! event has certainly inspired some ingenuity in getting around the 100-task limit per GPU/CPU.

See this: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8773146
This is what a real spoofed-client host can do. Observe there is no slowdown in the crunching times.
Now imagine what a fleet of 10 or more of these babies could do.
This shows why spoofed hosts must be managed with extreme care to avoid swamping the servers.
ID: 2005606 · Report as offensive
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4785
Credit: 164,389,980
RAC: 241,103
Australia
Message 2005613 - Posted: 4 Aug 2019, 2:32:50 UTC - in response to Message 2004321.  

Right, because all 90,000 active users are running Linux, and have the skills and means to do this. Pull the other one, Keith, it's got bells on...


"With great power . . . comes great responsibility"

...comes something, that's for sure ... Was it your intent to sound that arrogant?


I guess my attempt at levity fell on deaf ears. Yes that was a popular movie reference. I interpreted your comment as an affront to the GPUUG. We take our responsibilities seriously.

If you look at the Top 100 participants list of those running Linux, you will notice that not everyone is a member of the GPUUG. So without the assistance of the GPUUG, independent users have been able to figure out how to spoof the client. It is not rocket science. Have you even bothered to download the master and look at the code in the client directory? The instructions for compiling the BOINC code are published right on the BOINC pages. All it takes is some effort on the part of the participant to read the code, figure out what needs to be changed and follow the instructions to make your own client that spoofs GPUs. Heck, you don't even have to be running Linux to run a spoofed client. Make the Windows client and run the SoG app.

All I was trying to get across is that we shouldn't willy-nilly make it as easy as downloading the AIO for the masses and have everyone spoof multiple GPUs. I'm sure the database and the servers would crumble under the load. You can always use a rescheduler if you are running out of work on Tuesdays. Heck, the need for a spoofed client is less and less now that the outages are reasonable again at 4-5 hours. The only reason the spoofed client came into existence was having to deal with the 12-14 hour or longer outages we were experiencing for a great long while.


. . Hi Keith

. . Your comment about 'you have to be a member of GPU Users group' to get the information did come across as a bit elitist at the least and rather political at worst, but I would have thought that Jimbocous has been around long enough to know that most of the serious contributors take their efforts and responsibilities seriously. My fear is not so much about the abuse of ridiculously oversized caches (personally my philosophy has always been that your cache does not need to be bigger than 1 day's work) but rather about a plague of screwed-up clients misbehaving and causing havoc.

. . Sadly not all of us have the programming skills/experience to bring to the party, so this remains beyond our personal abilities. And you are of course right in pointing out that a great proliferation of humungous caches would cause possibly terminal database bloat.

. . Just my 2 cents worth ...

Stephen

:)
ID: 2005613 · Report as offensive
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4785
Credit: 164,389,980
RAC: 241,103
Australia
Message 2005614 - Posted: 4 Aug 2019, 2:51:17 UTC - in response to Message 2004405.  

These restrictions were put into place to stop huge amounts of data being returned after an outage as it caused database problems. That is it, no other reason.
When I raised this point before I was assured that people who spoofed were aware of this fact and controlled their task return so as not to flood the servers. Is this still correct?


. . Hi Bernie, this is a very good point, but I think not the only reason. Database bloat is another, and the database(s) have bumped their heads on the ceiling on more than one occasion. But while that may have been improved with the updates that have been implemented, it is still worth remembering.

Stephen

:-{
ID: 2005614 · Report as offensive