The Server Issues / Outages Thread - Panic Mode On! (118)

AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2032186 - Posted: 13 Feb 2020, 6:49:15 UTC - in response to Message 2032183.  
Last modified: 13 Feb 2020, 6:52:24 UTC

The limit is per CPU not per core. So we only get 150 WUs whether you have 4 cores or 64 cores.


See...this is where we are messing up. We could create a SETI@home Pro, as a tiered level system.
Free just as it currently is.
$5/mo for 2 days advanced work
$10/mo for 3
up to $50 for 10.

Then an additional $5/mo for customizations: pick and choose which data types to work with for each machine.

Edit: Ok, I'll stop with the comedy.
ID: 2032186 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032187 - Posted: 13 Feb 2020, 6:49:59 UTC - in response to Message 2032165.  

As Stephen mentioned, each host no matter how many cpus or cores gets 150 cpu tasks. That's it. That is all. No more. So you can never carry more than the server allotment of 150 on any host.

So I start the outage with 150 tasks and retire 24 tasks per hour. So I am out of cpu tasks in 6 1/4 hours. And that is if I get no overflows or shorties with high angle range which can be finished in 20 minutes or less.

Since our outages lately have gone on for 12 hours or more before you actually start getting replacement work, the Ryzen cpus sit cold for several hours with no work.
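
The arithmetic in this post can be sketched in a few lines of Python. The 150-task cap, the 24 tasks per hour, and the 12-hour outage come from the post; the function names are made up for illustration, and the sketch assumes a constant crunch rate with no overflows or shorties.

```python
def hours_until_dry(cached_tasks: int, tasks_per_hour: float) -> float:
    """Hours a host can keep crunching from its cache with no new work arriving."""
    return cached_tasks / tasks_per_hour


def idle_hours(cached_tasks: int, tasks_per_hour: float, outage_hours: float) -> float:
    """Hours the CPU sits idle if nothing is downloaded during the outage."""
    return max(0.0, outage_hours - hours_until_dry(cached_tasks, tasks_per_hour))


print(hours_until_dry(150, 24))  # 6.25
print(idle_hours(150, 24, 12))   # 5.75
```

Under those assumptions, a 12-hour outage leaves the CPU idle for a little under six hours.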
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032187 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19214
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2032189 - Posted: 13 Feb 2020, 7:03:31 UTC - in response to Message 2032187.  

As Stephen mentioned, each host no matter how many cpus or cores gets 150 cpu tasks. That's it. That is all. No more. So you can never carry more than the server allotment of 150 on any host.

So I start the outage with 150 tasks and retire 24 tasks per hour. So I am out of cpu tasks in 6 1/4 hours. And that is if I get no overflows or shorties with high angle range which can be finished in 20 minutes or less.

Since our outages lately have gone on for 12 hours or more before you actually start getting replacement work, the Ryzen cpus sit cold for several hours with no work.

I thought it was per core.

I can only conclude the system is illogical.
ID: 2032189 · Report as offensive
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2032192 - Posted: 13 Feb 2020, 7:35:06 UTC - in response to Message 2032189.  

I can only conclude the system is illogical.


I get all tingly inside when I see deductive humor.
ID: 2032192 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19214
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2032193 - Posted: 13 Feb 2020, 8:17:49 UTC - in response to Message 2032192.  
Last modified: 13 Feb 2020, 8:18:25 UTC

I can only conclude the system is illogical.


I get all tingly inside when I see deductive humor.

LMAO.

Next step in the conclusion.
The special sauce needs more ingredients, so that each core or thread is translated into a CPU when asking for work.
ID: 2032193 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032195 - Posted: 13 Feb 2020, 9:00:25 UTC - in response to Message 2032193.  

I can only conclude the system is illogical.


I get all tingly inside when I see deductive humor.

LMAO.

Next step in the conclusion.
The special sauce needs more ingredients, so that each core or thread is translated into a CPU when asking for work.

We already tried. The scheduler wants nothing to do with spoofed cpus. A host can only have one cpu so only gets 150 tasks.

The only way around the issue is to reschedule gpu tasks to the cpu.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032195 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19214
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2032196 - Posted: 13 Feb 2020, 9:12:07 UTC - in response to Message 2032195.  

I can only conclude the system is illogical.


I get all tingly inside when I see deductive humor.

LMAO.

Next step in the conclusion.
The special sauce needs more ingredients, so that each core or thread is translated into a CPU when asking for work.

We already tried. The scheduler wants nothing to do with spoofed cpus. A host can only have one cpu so only gets 150 tasks.

The only way around the issue is to reschedule gpu tasks to the cpu.

I had a dual-CPU computer on here at one time, and I'm pretty sure the cache was per CPU; that's why I'm surprised the cache is not per core or thread.

That computer isn't listed here these days, but it is still in the list at Beta https://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=6318
ID: 2032196 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2032197 - Posted: 13 Feb 2020, 9:38:00 UTC

My Ryzen 7 3700X crunches 150 tasks in about 12 hours, so it was just barely able to coast over the last downtime without idling. It has 16 threads, but I run only 8 parallel CPU tasks on it. That's the number of true cores it has, and going over that number would give very little benefit, as the two tasks running in the threads of the same core would be fighting for the same FPU.
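
As a rough cross-check of those figures, the average time per task can be backed out from the totals. This is only a sketch: the function name is illustrative, and it assumes all 8 slots stay busy for the whole 12 hours.

```python
def minutes_per_task(total_tasks: int, wall_hours: float, parallel_tasks: int) -> float:
    """Average minutes one task occupies one core, given the overall throughput."""
    core_hours = wall_hours * parallel_tasks
    return core_hours / total_tasks * 60


print(round(minutes_per_task(150, 12, 8), 1))  # 38.4
```

So each task would average roughly 38 minutes on one core under those assumptions.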
ID: 2032197 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13790
Credit: 208,696,464
RAC: 304
Australia
Message 2032199 - Posted: 13 Feb 2020, 9:41:46 UTC - in response to Message 2032196.  

I had a dual-CPU computer on here at one time, and I'm pretty sure the cache was per CPU; that's why I'm surprised the cache is not per core or thread.
The allocation of 150 (or whatever) WUs per GPU was a glitch in the system. It was meant to be the same as for CPUs: a fixed number of WUs allocated to that type of computing resource (CPU or GPU), whether there is only one device or 500.
An octal-socket CPU system will still only get 150 WUs.
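
The intended rule described here can be written down as a small sketch. It is not the scheduler's actual code: the constant and function names are hypothetical, the 150 figure is the one quoted in this thread, and it models only the intended behaviour (one allotment per resource type), not the per-GPU glitch.

```python
TASK_LIMIT_PER_RESOURCE = 150  # figure quoted in this thread; an assumption here


def intended_task_limits(cpu_sockets: int, gpus: int) -> dict:
    """One allotment per resource type present, regardless of how many devices."""
    return {
        "cpu": TASK_LIMIT_PER_RESOURCE if cpu_sockets > 0 else 0,
        "gpu": TASK_LIMIT_PER_RESOURCE if gpus > 0 else 0,
    }


print(intended_task_limits(cpu_sockets=1, gpus=0))  # {'cpu': 150, 'gpu': 0}
print(intended_task_limits(cpu_sockets=8, gpus=0))  # {'cpu': 150, 'gpu': 0}
```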
Grant
Darwin NT
ID: 2032199 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2032200 - Posted: 13 Feb 2020, 9:46:14 UTC - in response to Message 2032196.  

I had a dual-CPU computer on here at one time, and I'm pretty sure the cache was per CPU; that's why I'm surprised the cache is not per core or thread.
That computer isn't listed here these days, but it is still in the list at Beta https://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=6318
In 2006 CPUs were orders of magnitude slower than these days, so few people ever hit the task number limit before hitting their configured time limit. If there even was task number limiting back then.
ID: 2032200 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 2032207 - Posted: 13 Feb 2020, 10:52:57 UTC - in response to Message 2032196.  

I had a dual-CPU computer on here at one time, and I'm pretty sure the cache was per CPU; that's why I'm surprised the cache is not per core or thread.

If so, I haven't seen a way to configure it, and don't think there is one.
This machine has dual hexa-core Xeons with hyper-threading, so 12 physical cores in two chips, for a total of 24 logical threads. I have it set for 14 CPUs in cc_config.xml, as that makes the most sense for my config. Regardless of that setting, I'm limited to 150 (was 100 until recently) tasks at a time. Changing the CPU count affects work unit assignment, including the number of tasks in progress at once, but not the maximum number of tasks in the queue.
Wish it wasn't so ...
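
For anyone who has not touched that setting, here is a minimal sketch of what such a cc_config.xml could look like, generated from Python so it can be checked. It assumes the CPU count override is the `<ncpus>` element inside `<options>`, which is how I understand the BOINC client configuration; the helper name is made up. As noted above, this changes how many tasks run at once, not the 150-task cap.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ncpus: int) -> str:
    """Return the text of a cc_config.xml that overrides the reported CPU count."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(14))
```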
ID: 2032207 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19214
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2032209 - Posted: 13 Feb 2020, 10:56:30 UTC - in response to Message 2032200.  

I had a dual-CPU computer on here at one time, and I'm pretty sure the cache was per CPU; that's why I'm surprised the cache is not per core or thread.
That computer isn't listed here these days, but it is still in the list at Beta https://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=6318
In 2006 CPUs were orders of magnitude slower than these days, so few people ever hit the task number limit before hitting their configured time limit. If there even was task number limiting back then.

That has to be so, but since then the sensitivity of the MB app has been doubled twice, and it follows the inverse-square law, so each doubling quadruples the processing. And the tasks have been doubled in length.
So the time to crunch has not gone down as much as you might think. A modern-day i5 is only about twice as fast as my mid-range Pentium M, with similar clock speeds, which I got back in 2008.
The real increase in computing power has been in the use of GPUs.
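
Taking those claims at face value, the combined growth in work per task can be worked out as below. This is only a sketch: the function name is illustrative, and it assumes processing scales linearly with task length.

```python
def relative_work(sensitivity_doublings: int, length_doublings: int) -> int:
    """Work per task relative to the original, under the inverse-square claim."""
    per_sensitivity_doubling = 2 ** 2  # 2x sensitivity -> 4x processing
    return per_sensitivity_doubling ** sensitivity_doublings * 2 ** length_doublings


print(relative_work(2, 1))  # 32
```

Two sensitivity doublings and one length doubling would mean about 32 times the processing per task.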
ID: 2032209 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032210 - Posted: 13 Feb 2020, 10:56:34 UTC - in response to Message 2032197.  

It has 16 threads, but I run only 8 parallel CPU tasks on it. That's the number of true cores it has, and going over that number would give very little benefit, as the two tasks running in the threads of the same core would be fighting for the same FPU.

That's a misconception about Ryzen 3000 carried over from the FX CPU days. There is no penalty for having two threads performing two FP operations. It can even do an FP operation at the same time as an integer operation.
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Floating_Point_Unit

Much more sensible dispatching of operations. Plus the FP register is 256 bits wide now.
Four ALUs, two AGUs/load–store units, and two floating-point units per core.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032210 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2032221 - Posted: 13 Feb 2020, 13:59:20 UTC - in response to Message 2032186.  


Edit: Ok, I'll stop with the comedy.


Actually I wonder about the non-profit and similar issues with the idea? I am presuming the technical mechanics of the idea are possible. And heck, even non-profits are allowed to charge fees as long as they don't make money at the end of the year.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2032221 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2032238 - Posted: 13 Feb 2020, 16:10:24 UTC - in response to Message 2032193.  

I can only conclude the system is illogical.


I get all tingly inside when I see deductive humor.

LMAO.

Next step in the conclusion.
The special sauce needs more ingredients, so that each core or thread is translated into a CPU when asking for work.


. . The special sauce is only on Nvidia GPUs, it does not apply to CPUs.

Stephen

:(
ID: 2032238 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2032239 - Posted: 13 Feb 2020, 16:18:10 UTC - in response to Message 2032209.  
Last modified: 13 Feb 2020, 16:25:21 UTC

I had a dual-CPU computer on here at one time, and I'm pretty sure the cache was per CPU; that's why I'm surprised the cache is not per core or thread.
That computer isn't listed here these days, but it is still in the list at Beta https://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=6318
In 2006 CPUs were orders of magnitude slower than these days, so few people ever hit the task number limit before hitting their configured time limit. If there even was task number limiting back then.

That has to be so, but since then the sensitivity of the MB app has been doubled twice, and it follows the inverse-square law, so each doubling quadruples the processing. And the tasks have been doubled in length.
So the time to crunch has not gone down as much as you might think. A modern-day i5 is only about twice as fast as my mid-range Pentium M, with similar clock speeds, which I got back in 2008.
The real increase in computing power has been in the use of GPUs.


. . The doubling in sample size was the change from 2 bit to 4 bit per sample in the data. It improves the resolution, but testing showed it made very little difference to run times. Barely even noticeable.

. . My i5-6600 will crunch a task in about 1/8 the time it took on my old Pentium 4, but much of that improvement was the advent of AVX, and that is per core, of which the Pentium 4 had only one, not 4 like the i5. So the output of the i5 is more than an order of magnitude greater than that of a Pentium 4, which could only do a few tasks per day (3 to 5) compared to over 100.
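
Those numbers hang together, as a quick sketch shows. It assumes all four cores crunch CPU tasks at the same per-task speed, and the function name is made up for illustration.

```python
def throughput_ratio(per_task_speedup: int, cores: int) -> int:
    """Whole-machine speedup: per-task speedup times the number of cores in use."""
    return per_task_speedup * cores


ratio = throughput_ratio(8, 4)
print(ratio)                 # 32
print(3 * ratio, 5 * ratio)  # 96 160
```

A Pentium 4 doing 3 to 5 tasks per day would scale to roughly 96 to 160 tasks per day, which matches the "over 100" figure.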

Stephen

:)
ID: 2032239 · Report as offensive
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2032248 - Posted: 13 Feb 2020, 17:21:52 UTC

When I said it didn't take long, all I meant was it didn't affect my main machine nearly as much as the last couple of outages, so I'll take it, and be happy with it.
https://imgur.com/aWojoXt
ID: 2032248 · Report as offensive
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2032257 - Posted: 13 Feb 2020, 17:55:45 UTC - in response to Message 2032221.  
Last modified: 13 Feb 2020, 18:06:59 UTC

Actually I wonder about the non-profit and similar issues with the idea? I am presuming the technical mechanics of the idea are possible. And heck, even non-profits are allowed to charge fees as long as they don't make money at the end of the year.

Actually, Tom M,
The mechanics would be nearly impossible without a very large group of dedicated volunteer programmers. Something like that would require a top-to-bottom overhaul of the BOINC client, as well as changes at SETI itself.

Now, I'm not suggesting that BOINC doesn't need a massive overhaul, because there are several places where it could be made much better: prioritization of tasks; possible communication between hosts for coordination of tasks; getting rid of the RPC component; adding SSH for communication between client and manager; multiple setup options so that LAN and datacenter operations can customize the apps for things like central storage; local task handling by a network manager as opposed to a host manager; and a backend at the projects that can help set priorities such as Science Priority, Science Necessity, Time Necessity, and Storage Necessity, with the manager able to set local priorities that would only be additive to the backend priorities.

I mean, there are a lot of things which could be done to improve both BOINC and SETI. The real question is whether the manpower exists. I've seen this community fund things in record time, such as that recent hard drive purchase. I don't think money is as big an issue as it appears to be on paper.

Guy

Edit: One thing I would definitely love to see added is a heartbeat message from clients to servers which could be a signal to preemptively reassign work units to active machines and cut tasks from machines which have gone dark. That helps both the number crunching community and the storage at the backend.
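
To make the heartbeat idea concrete, here is a rough sketch of the server-side bookkeeping it would need. None of this is BOINC or SETI@home code: the `Host` class, `is_dark`, `rebalance`, and the timeout value are all hypothetical, and tasks are simply handed out round-robin.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    last_heartbeat: float                      # time of last heartbeat, in seconds
    tasks: list = field(default_factory=list)  # work units currently assigned


def is_dark(host: Host, now: float, timeout: float) -> bool:
    """A host is 'dark' if it has not sent a heartbeat within the timeout."""
    return now - host.last_heartbeat > timeout


def rebalance(hosts: list, now: float, timeout: float) -> list:
    """Move tasks from dark hosts to active ones; return tasks left unassigned."""
    active = [h for h in hosts if not is_dark(h, now, timeout)]
    reclaimed = []
    for host in hosts:
        if is_dark(host, now, timeout):
            reclaimed.extend(host.tasks)
            host.tasks = []
    if not active:
        return reclaimed  # nobody to give them to
    for i, task in enumerate(reclaimed):
        active[i % len(active)].tasks.append(task)  # simple round-robin
    return []


hosts = [Host("alpha", 1000.0, ["wu1", "wu2"]), Host("beta", 100.0, ["wu3", "wu4"])]
leftover = rebalance(hosts, now=1060.0, timeout=300.0)
print(hosts[0].tasks)  # ['wu1', 'wu2', 'wu3', 'wu4']
print(hosts[1].tasks)  # []
print(leftover)        # []
```

A real version would also have to respect per-host task limits and deal with a dark host coming back and reporting work that has already been reassigned.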
ID: 2032257 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19214
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2032263 - Posted: 13 Feb 2020, 18:37:27 UTC - in response to Message 2032239.  
Last modified: 13 Feb 2020, 18:40:21 UTC

Way, way long before that. https://setiathome.berkeley.edu/forum_thread.php?id=18850&postid=154163#154163
I didn't even consider that event in my resume of changes.
ID: 2032263 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3779
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2032265 - Posted: 13 Feb 2020, 18:50:42 UTC

Thread is getting off-topic... I suggest that another thread, such as this one, be used for proposed improvements. Don't want to have to go moving dozens of posts. Thanks!
ID: 2032265 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.