Intel security flaw

rob smith Special Project $250 donor
Volunteer tester

Joined: 7 Mar 03
Posts: 16295
Credit: 325,919,319
RAC: 227,934
United Kingdom
Message 1913719 - Posted: 18 Jan 2018, 13:29:10 UTC

...Ouch - that's some hit in performance, and it would certainly explain the change in behaviour that we've seen from the splitters in the last few days.
It makes me wonder: is there a better way of managing the splitting and distribution process to reduce the number of I/O actions required per task transaction? But nothing pops to mind right now, and such a change would probably need some serious re-engineering of the underlying databases.....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1913719 · Report as offensive
kittyman Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50015
Credit: 923,576,938
RAC: 167,903
United States
Message 1913720 - Posted: 18 Jan 2018, 13:33:41 UTC

I am waiting for the class action lawsuits to start - people claiming that they are no longer getting the performance levels they paid for. I am sure there are lawyers just chomping at the bit.

Meow.
Happy is the person who shares their life with a cat. (Or two or three or........) =^.^=

Have made friends here.
Most were cats.
ID: 1913720 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12055
Credit: 121,956,825
RAC: 68,357
United Kingdom
Message 1913722 - Posted: 18 Jan 2018, 13:55:50 UTC
Last modified: 18 Jan 2018, 13:59:49 UTC

And to add to the woes:

Intel fix causes reboots and slowdowns

The company said it had reproduced the problem and was "making progress toward identifying the root cause".
Reading further down, Intel now acknowledges:

The most significant reduction in performance involved computer servers that store and retrieve large volumes of data. For those, the slowdown could be as severe as 25%.
That's more honest - theory and reality begin to match at least.
ID: 1913722 · Report as offensive
kittyman Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50015
Credit: 923,576,938
RAC: 167,903
United States
Message 1913725 - Posted: 18 Jan 2018, 14:12:54 UTC - in response to Message 1913722.  

And to add to the woes:

Intel fix causes reboots and slowdowns

The company said it had reproduced the problem and was "making progress toward identifying the root cause".
Reading further down, Intel now acknowledges:

The most significant reduction in performance involved computer servers that store and retrieve large volumes of data. For those, the slowdown could be as severe as 25%.
That's more honest - theory and reality begin to match at least.

Looks like Moore's Law took a little bit of a hit there, eh?
Happy is the person who shares their life with a cat. (Or two or three or........) =^.^=

Have made friends here.
Most were cats.
ID: 1913725 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1635
Credit: 361,146,476
RAC: 287,341
United States
Message 1913732 - Posted: 18 Jan 2018, 14:47:09 UTC

Ugh. So, my next question is: I presume they are incorporating the 'fix' into the silicon, at least for the upcoming generations of processors that are going to be released, but my understanding of the problem is that it was introduced when they implemented pre-fetching years and years ago, which of course boosted performance when the data was already sitting in the cache instead of having to be read from memory.

So, is the fix to just disable it from now on, and is this performance 'hit' going to be the new normal? Sort of a back-to-the-future situation? Or is there a way to do the pre-fetching securely, preserving the performance gains while shielding it from the security exposures? I haven't read anything about that yet, though I would think it would be a huge concern, especially for their server-side business.

ID: 1913732 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12055
Credit: 121,956,825
RAC: 68,357
United Kingdom
Message 1913738 - Posted: 18 Jan 2018, 15:44:49 UTC - in response to Message 1913732.  

I've just been out for a walk to fetch my evening read (newspaper printed on dead trees - no power supply needed), and I was musing on much the same question as I walked.

The worst hit of all is going to be rebooting Centurion from a cold start. So, by the time I got home, I'd got as far as:

Could we keep a tiny, tiny bit of independent disk storage (100 MB of SSD, say - even a flash card) to hold configuration and status information - the most critical being the list of active tape files/channels being split? (I gather each channel is interleaved along the entire length of the disk file, so the whole 50 GB has to be read (again!) for each new channel.)

Then, the cold boot loader would start a user session with the sole purpose of reading that config and pipelining the contents of the required disk files into memory. Could such a boot-load session sleep so deep that it would effectively eliminate the kernel-mode switching?

Then, once the data was loaded, the boot loader could trigger the full BOINC server working environment, start the daemons, and set to work - suffering the kernel hit, but to a much lesser extent than it would during pre-load. Can it be done? Ideas?
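Something along these lines, perhaps - purely a sketch on my part, where the config path, config format and start-script location are all invented rather than anything the project actually uses:

#!/usr/bin/env python3
# Hypothetical pre-load stage: warm the page cache with the active tape
# images before the BOINC daemons start. Paths and the config format are
# illustrative assumptions, not the project's real layout.
import subprocess
from pathlib import Path

CONFIG = Path("/mnt/boot-config/active_tapes.conf")  # tiny dedicated SSD/flash partition
CHUNK = 64 * 1024 * 1024                             # 64 MB sequential reads

def warm_cache(tape: Path) -> None:
    # Read the whole tape image sequentially so later splitter passes hit RAM.
    with tape.open("rb") as f:
        while f.read(CHUNK):
            pass

def main() -> None:
    # One tape-image path per line in the (hypothetical) config file.
    tapes = [Path(s.strip()) for s in CONFIG.read_text().splitlines() if s.strip()]
    for tape in tapes:
        if tape.exists():
            warm_cache(tape)
    # Only once the data is resident do we start the full server environment
    # (hypothetical path to the project's start script).
    subprocess.run(["/home/boincadm/projects/sah/bin/start"], check=False)

if __name__ == "__main__":
    main()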
ID: 1913738 · Report as offensive
rob smith Special Project $250 donor
Volunteer tester

Joined: 7 Mar 03
Posts: 16295
Credit: 325,919,319
RAC: 227,934
United Kingdom
Message 1913744 - Posted: 18 Jan 2018, 16:32:53 UTC

ARGGGGGHHHHH - how inefficient is that :-(
Spinning the whole tape each time a new channel is to be split is just plain *p
The reason the data runs "along" the whole "tape" is to preserve its synchronicity, but we aren't interested in that. So, using a bit of Richard's logic from his earlier post to keep things together, and adding a bit from a few years back when I was dragging data off multi-channel data loggers that recorded in much the same way:
Assuming we know how many "channels" there are in the "tape", when a new tape is loaded, set up the required channel files.
Now block-read the tape into memory in big blocks, n channels wide by x segments long, parse the blocks and dump the split data into the channel files; repeat as necessary. "Wide-long" block reads are hairy to set up, but they are low on I/O count, so they avoid that overhead, and they can also be very efficient if scaled properly (that's the "hairy to set up" bit).
Now the splitters only have to work on the prepared files, not the whole tape, so they don't have as much I/O to do.
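Something like this rough sketch of the de-interleaving pass - the frame size, channel count and file names are made-up numbers for illustration, not the real recorder format:

#!/usr/bin/env python3
# Hypothetical de-interleaving pass: one big sequential read of the tape
# image, split into per-channel files, so the splitters never have to
# re-read the whole 50 GB for each channel. Frame layout is an assumption.

N_CHANNELS = 14            # assumed number of channels on the tape
FRAME_BYTES = 1024 * 1024  # assumed bytes per channel per frame
BLOCK_FRAMES = 64          # frames read per block (n channels wide, x frames long)

def split_tape(tape_path: str, out_prefix: str) -> None:
    # Open one output file per channel up front.
    outs = [open(f"{out_prefix}.ch{ch:02d}", "wb") for ch in range(N_CHANNELS)]
    frame = 0  # global frame counter so the channel phase survives short reads
    try:
        with open(tape_path, "rb") as tape:
            while True:
                block = tape.read(N_CHANNELS * FRAME_BYTES * BLOCK_FRAMES)
                if not block:
                    break
                # Walk the block frame by frame, routing each to its channel file.
                for off in range(0, len(block), FRAME_BYTES):
                    outs[frame % N_CHANNELS].write(block[off:off + FRAME_BYTES])
                    frame += 1
    finally:
        for f in outs:
            f.close()

if __name__ == "__main__":
    # Illustrative paths only.
    split_tape("/data/tapes/example_tape.dat", "/data/split/example_tape")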
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1913744 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 4922
Credit: 316,470,317
RAC: 725,120
United States
Message 1913749 - Posted: 18 Jan 2018, 16:42:51 UTC - in response to Message 1913720.  

I am waiting for the class action lawsuits to start - people claiming that they are no longer getting the performance levels they paid for. I am sure there are lawyers just chomping at the bit.

Meow.

They've already started - three, in fact, in California in the same week as the announcement.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1913749 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12055
Credit: 121,956,825
RAC: 68,357
United Kingdom
Message 1913751 - Posted: 18 Jan 2018, 16:48:55 UTC - in response to Message 1913744.  

That sounds good. We certainly do know the format and structure of the files on 'tape' (remember, they're really disk images) - I think we designed it, and even built the bespoke data recorder which interfaces with the telescope feed. That may even be what Matt went off to do.

Assuming that to be the case (and be a little cautious - I wasn't taking technical briefing notes), it would be worth suggesting that to Eric - now, rather than later. He's beginning to think about processing the Parkes data: we touched on that in our chat too, and he mentioned some of the differences. IIRC - and I'm less certain about this bit - the philosophy at Green Bank is to search "every channel for selected sources", but for Parkes it will be "selected channels for every source" - it makes sense to get an overview of an area of sky we haven't seen before. So he's got to make changes anyway - what a good time to test out a new idea as well.
ID: 1913751 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 31213
Credit: 64,694,704
RAC: 23,083
Germany
Message 1913753 - Posted: 18 Jan 2018, 17:03:45 UTC

Maybe this would be a good addition to the server closet.

https://www.theregister.co.uk/2017/11/21/hpe_brings_amds_epyc_processor_to_mainstream_2p2u_server_box/
With each crime and every kindness we birth our future.
ID: 1913753 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 4922
Credit: 316,470,317
RAC: 725,120
United States
Message 1913755 - Posted: 18 Jan 2018, 17:15:18 UTC

I think that if the servers were AMD-based instead of Intel-based, at least they wouldn't be suffering the 25% I/O penalty, since they wouldn't need the Meltdown patch. I don't know if I have seen any performance degradation tests done on AMD hardware with a Spectre patch yet, so that parameter is unknown.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1913755 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1635
Credit: 361,146,476
RAC: 287,341
United States
Message 1913770 - Posted: 18 Jan 2018, 18:07:52 UTC

Any ruminations on what might be coming down the pike from the manufacturers on how to mitigate this down the road, or whether it is even possible without re-engineering how the basics of the CPU have functioned for well over a decade?

ID: 1913770 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 4556
Credit: 267,765,544
RAC: 406,465
United States
Message 1913774 - Posted: 18 Jan 2018, 18:26:31 UTC - in response to Message 1913753.  

Maybe this would be a good addition to the server closet.

https://www.theregister.co.uk/2017/11/21/hpe_brings_amds_epyc_processor_to_mainstream_2p2u_server_box/


Would be nice. No problems yet? Huh... December. I hope they were talking about last month.
ID: 1913774 · Report as offensive
rob smith Special Project $250 donor
Volunteer tester

Joined: 7 Mar 03
Posts: 16295
Credit: 325,919,319
RAC: 227,934
United Kingdom
Message 1913781 - Posted: 18 Jan 2018, 18:53:55 UTC - in response to Message 1913751.  

Thanks Richard - I'll drop Eric a note to open a dialogue.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1913781 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12055
Credit: 121,956,825
RAC: 68,357
United Kingdom
Message 1913804 - Posted: 18 Jan 2018, 20:52:26 UTC - in response to Message 1913706.  

I've just sent this email round to a small discussion group.....
And it's just produced a very interesting response. Turns out the Meltdown / Spectre patches simply tipped the database over the edge of a problem which had been growing (unnoticed) anyway. Fortunately, WCG has access to hot and cold running database engineers, and after several iterations of

'We investigated multiple paths in order to determine the issue.'

'After doing research, the team concluded that...'

... their database had multiple damaged SQL indices. A quick index drop and recreate later,

After those rebuilds were done, the database server dropped to a load of between 2 and 5 and a CPU utilization between 150% and 300%. Much lower than the load of 20 and 2800% CPU utilization the server had been experiencing.
I've suggested that the report should be used to start a BOINC server administrator's Knowledge Base, and that the final diagnostic test they used should be scripted and made available to less well endowed BOINC projects as well. We'll see. (Eric is a member of the discussion group, so he'll get the full report and my suggestion directly).
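I don't know exactly what their diagnostic looked like, but as a starting point for such a knowledge base it could be something as simple as this sketch (BOINC's backend is MySQL; the credentials and table list here are placeholders, not what the WCG engineers actually ran):

#!/usr/bin/env python3
# Rough sketch of an index-health check for a BOINC MySQL database.
# Credentials and the table list are placeholders - this is not the
# script the WCG engineers actually used.
import mysql.connector  # pip install mysql-connector-python

TABLES = ["result", "workunit", "host", "user"]  # illustrative subset of BOINC tables

def main() -> None:
    conn = mysql.connector.connect(host="localhost", user="boincadm",
                                   password="changeme", database="boinc_db")
    cur = conn.cursor()
    for table in TABLES:
        # CHECK TABLE reports corrupted data or index pages.
        cur.execute(f"CHECK TABLE {table}")
        for row in cur.fetchall():
            print(row)
        # OPTIMIZE TABLE rebuilds the table and its indices if needed -
        # the heavy-handed equivalent of the drop-and-recreate above.
        cur.execute(f"OPTIMIZE TABLE {table}")
        for row in cur.fetchall():
            print(row)
    cur.close()
    conn.close()

if __name__ == "__main__":
    main()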
ID: 1913804 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 4922
Credit: 316,470,317
RAC: 725,120
United States
Message 1913808 - Posted: 18 Jan 2018, 21:00:10 UTC - in response to Message 1913770.  

Any ruminations on what might be coming down the pike from the manufacturers on how to mitigate this down the road, or whether it is even possible without re-engineering how the basics of the CPU have functioned for well over a decade?

All chip manufacturers will have to engineer new silicon. That means end products won't be available till 5 years from now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1913808 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 4922
Credit: 316,470,317
RAC: 725,120
United States
Message 1913810 - Posted: 18 Jan 2018, 21:04:44 UTC

The Administrator's Knowledge Base is a great idea, Richard. It should be used, as suggested, to disseminate all the server database engineering discoveries and fixes.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1913810 · Report as offensive
OzzFan Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15659
Credit: 65,385,612
RAC: 14,102
United States
Message 1913853 - Posted: 19 Jan 2018, 0:46:36 UTC - in response to Message 1913732.  

Ugh. So, my next question is: I presume they are incorporating the 'fix' into the silicon, at least for the upcoming generations of processors that are going to be released, but my understanding of the problem is that it was introduced when they implemented pre-fetching years and years ago, which of course boosted performance when the data was already sitting in the cache instead of having to be read from memory.


Not prefetching. Speculative execution (Spectre) and shared page table mapping (Meltdown). For a good read: https://arstechnica.com/gadgets/2018/01/meltdown-and-spectre-every-modern-processor-has-unfixable-security-flaws/

For an in-depth explanation: https://arstechnica.com/gadgets/2018/01/whats-behind-the-intel-design-flaw-forcing-numerous-patches/
ID: 1913853 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1635
Credit: 361,146,476
RAC: 287,341
United States
Message 1913861 - Posted: 19 Jan 2018, 1:13:33 UTC

Thanks, Ozz and Keith. 5 years, huh? Not good... I'll check out those links for some reading with the nightcap this evening.

ID: 1913861 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 18978
Credit: 2,648,040
RAC: 1,703
Ireland
Message 1913896 - Posted: 19 Jan 2018, 2:45:57 UTC - in response to Message 1913770.  

Any ruminations on what might be coming down the pike from the manufacturers on how to mitigate this down the road, or whether it is even possible without re-engineering how the basics of the CPU have functioned for well over a decade?
On June 8th this year, the x86 architecture will be 40 years old. Maybe it's time to move beyond it.
ID: 1913896 · Report as offensive