[HN Gopher] Intel to set its FPGA unit free to pursue its own path
___________________________________________________________________
Intel to set its FPGA unit free to pursue its own path
Author : rbanffy
Score : 45 points
Date : 2023-10-04 16:42 UTC (6 hours ago)
(HTM) web link (www.nextplatform.com)
(TXT) w3m dump (www.nextplatform.com)
| thenobsta wrote:
| Every time I see an FPGA article, I feel a little sad that
| Tabula[1] didn't make it -- 1.6GHz clock and reprogrammable on
| the fly. RIP.
|
| 1. https://en.wikipedia.org/wiki/Tabula,_Inc.
| asfarley wrote:
| Hello to Audrey and James
| almatabata wrote:
| Damn, I hoped we would one day get a customizable FPGA into our
| CPUs. I hoped that it would make sense to install certain
| instructions on your FPGA depending on your workloads. I guess
| this either kills that possibility or pushes it into a very far
| future.
|
| I do not understand this part though:
|
| > There was talk of hybrid CPU-FPGA packages, which never seem to
| > get commercialized because no system architect likes static
| > ratios of compute - unless they are determining the ratios.
| > Like the hyperscalers and cloud builders, who can tell
| > companies like Intel and AMD what their product roadmaps need
| > to look like.
|
| I do not see what they mean by ratio here. Do they mean die
| ratio between CPU and FPGA?
| tverbeure wrote:
| > I hoped that it would make sense to install certain
| instructions on your fpga depending on your workloads.
|
| It's one of those things that seem like a good idea, but they
| just don't work out in practice. FPGA LUTs are just way too
| slow. You'd have to find a case where doing something on a 3GHz
| CPU clock running multiple instructions in parallel gets
| outperformed by LUTs that run at 700MHz (at best). And when
| you cascade the LUTs, they become slower too.
|
| And that's without solving the problem of closely coupling a
| CPU pipeline with FPGA logic.
|
| > I do not see what they mean by ratio here. Do they mean
| die ratio between CPU and FPGA?
|
| What they mean is: in something like the Zynq FPGA family, I
| want a die with 2 CPU cores and 5000K LUTs. The other guy wants
| 8 CPU cores and 2000K LUTs. It works for narrow applications
| like signal processing where power efficiency and cost aren't
| top concerns, but for a hyperscaler, power consumption is a very
| important metric. As is the cost of paying for a significant
| part of the silicon die that's sitting there unused.
| mikewarot wrote:
| The right kind of sea of LUTs can outperform anything even if
| it's clocked at 100 MHz... the trick is to get a pipeline
| filled, instead of trying to outrun light.
|
| Imagine an LLM with a new token every 10 ns
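
The pipelining argument above can be made concrete with a rough timing estimate. This is a minimal sketch, assuming an idealized pipeline that accepts one new input per clock (initiation interval of 1). The function name and the 20-stage depth are illustrative assumptions, not figures from the thread.

```python
def pipeline_timing(clock_hz, stages, initiation_interval=1):
    """Return (latency_s, interval_s) for an idealized, never-stalling pipeline.

    latency_s  : time until the first result leaves the pipeline
    interval_s : time between consecutive results once the pipeline is full
    """
    period = 1.0 / clock_hz
    latency = stages * period
    interval = initiation_interval * period
    return latency, interval


# 100 MHz clock as in the comment; the stage count is a made-up example value.
latency, interval = pipeline_timing(100e6, stages=20)
print(f"first result after {latency * 1e9:.0f} ns, "
      f"then one every {interval * 1e9:.0f} ns")
```

The point of the sketch is that the result rate depends only on the clock and the initiation interval, not on how deep the pipeline is; depth only adds latency.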
| duskwuff wrote:
| > It's one of those things that seem like a good idea, but
| they just don't work out in practice.
|
| GPGPU sucks a lot of air out of the room as well. There
| aren't many purely computational problems which FPGAs can
| solve better than a compute-optimized GPU; even though GPUs
| aren't quite as flexible, they clock a lot faster, they're
| cheaper, and they're easier to develop for.
| KirillPanov wrote:
| > where doing something on a 3GHz CPU clock running multiple
| instructions in parallel gets outperformed by LUTs that run at
| 700MHz
|
| Easy: go wide.
|
| Make the FPGA-CPU interface four times wider on the FPGA side
| than the CPU side. Each tick of the CPU clock reads (or
| writes) one quarter of the bits.
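
The "go wide" idea above amounts to a width converter between two clock domains. The following is a rough sketch, assuming a 64-bit CPU-side port and a 4:1 ratio; the function names, the port widths, and the 2.8 GHz CPU clock (chosen so that it is exactly four times 700 MHz) are assumptions made for illustration only.

```python
def pack_wide(narrow_words, narrow_bits=64, ratio=4):
    """Group `ratio` narrow words into one wide word, first word in the low bits."""
    if len(narrow_words) % ratio != 0:
        raise ValueError("need a whole number of wide words")
    mask = (1 << narrow_bits) - 1
    wide = []
    for i in range(0, len(narrow_words), ratio):
        word = 0
        for lane, value in enumerate(narrow_words[i:i + ratio]):
            word |= (value & mask) << (lane * narrow_bits)
        wide.append(word)
    return wide


def bandwidth_bits_per_s(clock_hz, width_bits):
    """Peak bandwidth of a port that moves `width_bits` on every clock tick."""
    return clock_hz * width_bits


# Hypothetical numbers: narrow/fast CPU side versus wide/slow FPGA side.
cpu_bw = bandwidth_bits_per_s(2_800_000_000, 64)
fpga_bw = bandwidth_bits_per_s(700_000_000, 256)
print(cpu_bw == fpga_bw)
```

This only shows that the two sides can move the same number of bits per second; it says nothing about whether the logic behind the wide port can keep up, which is the concern raised in the parent comment.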
| j_not_j wrote:
| > FPGA LUTs are just way too slow
|
| If, and of course that is a big if, you can repackage a
| (parallelizable) calculation into FPGA look-up tables and
| implement multiples of this (e.g. 8 to 80 times) then you can
| think maybe it's quicker than CPU at 3GHz.
|
| However, you have to include DMA of the data to and fro. It's
| unlikely to be worth the very extensive effort of integrating
| two wildly different technologies.
|
| On the other hand, it may not be a complicated calculation
| but FPGA can do much lower latency and smaller variance in
| latency (hello high-frequency traders). That is a very narrow
| niche.
|
| A simple board with CPU and FPGA is the Arduino MKR Vidor
| 4000 (ARM Cortex 32-bit CPU and Intel Cyclone 10 FPGA).
| Hardware cost: $85. Full suite of development software: $1000
| or more (although lesser tools are available for free).
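
The trade-off described above (replicated FPGA units versus the cost of moving data) can be sketched as a crude break-even model. It assumes each FPGA unit does as much work per clock as the CPU does, which is a strong simplification, and every number in the example is hypothetical rather than measured.

```python
def offload_speedup(cpu_time_s, parallel_units, clock_ratio,
                    bytes_moved, dma_bytes_per_s, dma_setup_s=0.0):
    """Estimated speedup of offloading a job to an FPGA, including DMA time.

    clock_ratio is FPGA clock / CPU clock (e.g. 700 MHz / 3 GHz).
    A result below 1.0 means the offload is slower than staying on the CPU.
    """
    fpga_compute = cpu_time_s / (parallel_units * clock_ratio)
    transfer = dma_setup_s + bytes_moved / dma_bytes_per_s
    return cpu_time_s / (fpga_compute + transfer)


# Hypothetical job: 1 ms on the CPU, 1 MB moved over a 10 GB/s link.
for units in (8, 80):
    s = offload_speedup(1e-3, units, 700e6 / 3e9, 1e6, 10e9)
    print(f"{units} units: {s:.2f}x")
```

In this model the transfer time puts a ceiling on the achievable speedup no matter how many units are instantiated, which is one way to read the "include DMA" caveat.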
| imtringued wrote:
| > However, you have to include DMA of the data to and fro.
| It's unlikely to be worth the very extensive effort of
| integrating two wildly different technologies.
|
| That is exactly the part where having the FPGA next to the
| CPU helps... You can transparently access the CPU cache via
| an AXI slave port on the CPU on AMD's MPSoCs at a rate of
| up to 16 bytes per cycle and you get multiple of those.
| almatabata wrote:
| Thanks for clarifying.
| amluto wrote:
| Integrating an FPGA with the actual front-end and register
| files, so you can invoke it synchronously with fast
| instructions at low latency, seems neat but rather complicated.
| As for an FPGA asynchronously accessing application memory, I
| tentatively expect CXL with some shared virtual memory trickery
| to succeed in this space, at least in a couple years when the
| dust hopefully settles, and then you can do whatever you want.
| dralley wrote:
| > Damn, I hoped we would one day get a customizable FPGA into
| our CPUs. I hoped that it would make sense to install certain
| instructions on your FPGA depending on your workloads. I guess
| this either kills that possibility or pushes it into a very far
| future.
|
| Depends on what AMD does with Xilinx.
| imtringued wrote:
| I am actually surprised how AMD managed to successfully
| leverage its FPGAs for machine learning inference. It is
| competing with Nvidia's Jetson.
| gsmecher wrote:
| > Depends on what AMD does with Xilinx.
|
| Currently the AMD/Xilinx dynamic seems to reverse this:
| "Depends on what Xilinx does with AMD".
|
| AMD's software roadmap for AI/datacentre leans heavily on
| Vitis (for software) and AI Engines (as an execution
| platform). CPUs that integrate AI engines are already
| shipping (Ryzen AI). It's Xilinx technology, but you should
| expect it to look more like a GPU accelerator than a
| traditional LUTs-and-routing FPGA. And, as duskwuff has
| pointed out, this sucks a lot of the oxygen out of the
| CPU-with-FPGA design space.
| bfrog wrote:
| "a customizable fpga into our CPUs": that already happened, it
| just didn't happen in x86 land. There have been a good number
| of products from various vendors that connect up hard cores and
| FPGA fabric.
|
| PowerPC cores, RISC-V cores, and by and large ARM cores
| tverbeure wrote:
| That's not what OP meant though. They were talking about
| custom CPU instructions implemented with FPGA logic.
| bfrog wrote:
| That doesn't sound that beneficial honestly
| throwaway4590 wrote:
| Whenever I see talk about Intel's FPGA unit, I link back to an
| invention I submitted to Intel while I was an intern there [0].
| I went through the patent pipeline, but to my knowledge they
| never did anything with it. This was during the excitement of
| Intel's original acquisition of Altera.
|
| In fairness, I never mocked up a true enough implementation in
| Verilog to get an idea of real world speedup, and even now, I'm
| not sure exactly what operations you could see real gain with
| from small reconfigurable fabrics near the CPU. Still, I liked
| the elegance of having L1-L3+ FPGAs for speeding up operations
| of increasing levels of complexity, and I figured programmers
| smarter than me would find creative ways of using the FPGAs
| with the added instructions.
|
| [0] https://patents.google.com/patent/US10310868B2/
| almatabata wrote:
| Thanks for sharing. Small question about Image 20: does that
| represent a use case for an instruction translator? For
| example, you have an ARM chip and you want to run x86 code, so
| you offload the x86 instructions to the FPGA?
| throwaway4590 wrote:
| I believe my contributions start at Image 25 on Google.
| Images 1-24 are generic CPU boilerplate images that the
| lawyers add to most patents in the field.
| eschneider wrote:
| This doesn't make a lot of sense. I mean, there are SoCs out
| there with asymmetric cores (say, an ARM A53 and an ARM M4 on
| the same die) for folks whose workloads warrant that sorta
| thing. I'd expect there'd be a similar market for CPUs with
| built-in FPGAs of various sizes.
| tverbeure wrote:
| It only makes sense for a few applications. See the popular
| Xilinx Zynq UltraScale MPSoC product line. They are popular
| for digital signal processing, for example. But they are not
| power efficient, and they are very expensive.
|
| Good enough for a low volume custom solution for which custom
| silicon is too expensive. Not for a hyperscaler.
| varelse wrote:
| [dead]
| mips_r4300i wrote:
| Thank goodness. I've been expecting this ever since Intel bought
| Altera, they just stuck with it a couple years longer than I
| figured.
|
| They focused solely on the high end, but it turns out nobody
| really wants FPGA fabric on a CPU. You can already do
| acceleration over a PCI express link, and that's what you more
| often do with embedded applications where the CPU is acting more
| like a dispatch controller than doing the real work.
|
| Intel has also completely ignored the low end of the market. The
| only true low-end part they have is the Cyclone 10 LP, which is
| literally the exact same part as the Cyclone 3/4 from 2008. Just
| slightly die shrunk. No hard IP support like DDR3 controllers, no
| MIPI, nothing that people are getting from the competition now.
|
| Intel did realize this, which is why the new Agilex family
| includes some "low-mid range" parts, but they will still be much
| more expensive. Low-end to Intel means "under $1k unit cost"
| which ignores a huge part of the market.
|
| They have better tools, documentation, and support than Gowin,
| who is a recent Chinese FPGA upstart using stolen Lattice IP and
| hires. But they will lose to Gowin by default in the commodity
| space unless they do something.
| Tuna-Fish wrote:
| They did not ignore the low end by choice.
|
| The entire story of Altera inside Intel can be summarized as:
|
| Intel fabs make amazing promises about process performance and
| availability. Altera builds their product stack on that. In the
| end, the fabs fail to deliver either performance, or sufficient
| amount of manufacturing capability. Now Altera has to pick
| which products they want to ship. They obviously can the low
| end. Even the high end that ships is horribly late, because of
| manufacturing issues.
|
| There would have been massive demand for the combined
| Intel+Altera products. Many large customers built their future
| based on the marketing promises Intel made, and when they
| couldn't deliver, those customers had to redevelop everything
| on something else. As an example, look up Nokia Reefshark.
| trsohmers wrote:
| They have announced the new Agilex 3 line, which should include
| some CPLD price point parts and be a real rebirth for
| ~$100/unit modern devices.
| bfrog wrote:
| Let's see I guess... I'm not holding my breath, but it'd be
| great to not use Vivado's slow ass Java IDE one day. Quartus
| is light years faster seemingly.
| SilverBirch wrote:
| Yeah it was really funny watching Intel buy Altera at the same
| time that they were spinning out McAfee and thinking "well we'll
| see how long this lasts..."
|
| Big chunk of the team from Altera are at AMD now anyway.
|
| Hopefully they finally get back to innovating on the actual FPGA
| now. I'm so tired of the hardened rubbish and cpu integrated
| rubbish.
| aleph_minus_one wrote:
| > I'm so tired of the hardened rubbish and cpu integrated
| rubbish.
|
| Was there actually a way to access a CPU-integrated FPGA as an
| "ordinary" user/customer (i.e. not a "special customer")?
| brucethemoose2 wrote:
| > We wouldn't place heavy bets on Falcon Shores making it to
| completion unless a big HPC center adopts it, and given how
| Argonne National Laboratory was treated, we don't think there
| will be a lot of uptake unless Intel makes some pretty big
| pricing concessions. Which it can ill afford. Hybrid CPU-GPU
| devices - the original plan for Falcon Shores, have also been
| shelved.
|
| That's even more eyebrow raising than an Altera spinoff.
|
| Altera is a good side business, but Falcon Shores is like Intel's
| consolidated future. If they just let that go... What do they
| expect? That everyone will just buy Xeon CPUs and IGP laptops
| forever?
| chx wrote:
| Look at
| https://benchmark.chaos.com/v5/vray?index=1&ordering=desc&by...
| Intel can ill afford to think about forever. They have a runway
| built by illegal monopoly tactics. When that runway ends, the
| music stops unless they do something _very_ drastic to their
| CPUs _right now_. The fastest 96 core AMD CPU alone is 30%
| faster than the fastest Intel offering, 120 cores in two
| sockets -- and that's not the fastest CPU AMD offers.
|
| This is not to say Intel will go bankrupt (look at the number of
| quarters AMD spent in the red), but it really doesn't want to
| become #2.
| tester756 wrote:
| Will it hurt them on earnings in 2024?
|
| What do they gain from it? Is there some deal with TSMC behind
| the scenes?
|
| It seems like TSMC is investing in some of Intel's companies:
| IMS, and now this.
| SilverBirch wrote:
| They're just dumping the distraction from their core business.
| The Altera acquisition was one of Brian Krzanich's many
| missteps, buying and betting instead of running a business.
___________________________________________________________________
(page generated 2023-10-04 23:00 UTC)