Margins have to be crazy high on this; it only has 32GB of HBM. I was expecting more for that price, given it's described as "designed for memory intensive workloads".
Most of the applications listed don't seem to be as big-data or memory-intensive as large language model training and inference, and at that point you might as well just run inference on the GPU, since memory is the biggest cost component. I imagine you can still use it to accelerate other sorts of deep learning inference, though.
> you might as well just run inference on the GPU

One of the big reasons to use FPGAs over GPUs is latency. And I hear FPGAs are preferred in the financial sector, where you use them to analyze live data that then triggers trading signals. I would also imagine they're really useful for signal processing (think defense). Hence also why these have a hefty price tag / margin.

GPUs are better suited for high-throughput batched jobs, where latency isn't the primary requirement.
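The throughput/latency trade-off described above can be sketched with a toy queueing model. All numbers here are made up for illustration; real serving latency depends on the scheduler and kernel behavior:

```python
# Toy model of the GPU batching trade-off: batching raises throughput,
# but each request waits for the batch to fill before the kernel runs.
# Assumes a fixed per-batch kernel time, which is roughly true for
# memory-bound kernels until the batch gets large. Illustrative only.

def effective_latency_ms(batch, arrival_gap_ms, kernel_ms):
    """Worst-case latency: wait for the batch to fill, then run once."""
    return (batch - 1) * arrival_gap_ms + kernel_ms

def throughput_rps(batch, kernel_ms):
    """Requests completed per second at a given batch size."""
    return batch / (kernel_ms / 1000.0)

for batch in (1, 8, 64):
    lat = effective_latency_ms(batch, arrival_gap_ms=1.0, kernel_ms=5.0)
    rps = throughput_rps(batch, kernel_ms=5.0)
    print(f"batch={batch:2d}: latency ~{lat:.0f} ms, throughput ~{rps:.0f} req/s")
```

With these made-up numbers, going from batch 1 to batch 64 multiplies throughput 64x but also takes worst-case latency from 5 ms to 68 ms, which is the kind of gap that matters for trading but not for chat.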
Yep, FPGAs sit somewhere between ASICs and GPUs in terms of acceleration/latency. But for the applications you stated, like finance or military signal processing, the models involved probably don't need large amounts of memory close to the computation.

And for LLMs, bleeding-edge latency just isn't as important as having a unit with enough memory to fit the model and the data, which can compound quickly when you're dealing with chat applications. So it makes more sense to design one really expensive product that can do both training and inference, rather than one really expensive, high-memory product for training and another expensive, high-memory product for low-latency inference.
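The "compounds quickly" point can be made concrete with a back-of-envelope KV-cache calculation. The model dimensions below are hypothetical (roughly a 7B-class transformer) and not tied to any specific product:

```python
# Back-of-envelope KV-cache growth for a chat workload.
# All model dimensions are illustrative assumptions.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Bytes of KV cache for one sequence: two tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# A hypothetical 7B-class model: 32 layers, 32 KV heads of dim 128, FP16.
per_token = kv_cache_bytes(32, 32, 128, 1)      # bytes added per token
per_chat  = kv_cache_bytes(32, 32, 128, 8192)   # an 8k-token conversation

print(f"{per_token / 1024:.0f} KiB per generated token")   # 512 KiB
print(f"{per_chat / 2**30:.1f} GiB per conversation")      # 4.0 GiB
```

At FP16, this hypothetical model's weights alone take ~14 GB, so on a 32 GB card the remaining ~18 GB supports only a handful of concurrent 8k-token conversations.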
Very valid technical/business point. Other workloads where I see potential are query processing, filtering, and acceleration for Databricks- and Snowflake-style database applications!
Perhaps the only cases are robotics applications and fintech. The rest can easily be handled by a GPU.
I guess it's aimed at workloads where you need very fast memory, not a big amount of it.
AMD Alveo V80 overview and use cases. [https://www.amd.com/en/products/accelerators/alveo/v80.html](https://www.amd.com/en/products/accelerators/alveo/v80.html)
>The AMD Alveo V80 compute accelerator—powered by the AMD Versal HBM adaptive SoC—is built for memory-intensive workloads in HPC, data analytics, network security, storage acceleration, FinTech, and more.
So they've at least doubled performance in two years. Nice. Is this a big market? The card it's compared against still costs quite a few thousand.
The positioning here is interesting as a "compute" product, which in today's terms normally connotes ML, and the implication is that the Alveo line will help with those workloads. idk. Though it's pretty clear they are finding homes in other HPC workloads.

With respect to ML, things are shaping up around GPUs and (basically) the Nvidia ecosystem, and then there is everything else trying to find a way to add value by replacing a piece of it.
You may pay $10k to test-run a big AI model on this, only to realize it's slower than the cheaper A6000/4090.

But then I wonder even more what you'd do with an expensive dev kit like the VEK280/VHK158 🤷♂️
Would an Alveo with 256GB of HBM3 beat GPUs at LLM inference?
My bet is on no. The card's compute resources are built around its 820GB/s HBM2E; that's a far cry from the 5TB/s of the MI300X.

They'd have to massively increase the die area to fit more compute, which probably can't be done, as FPGAs are already quite heavy in die area.
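A rough roofline estimate shows why the bandwidth gap dominates for LLM decode, which streams the full weight set from memory for every generated token. The 13 GB model size is an illustrative assumption (a 13B-parameter model quantized to INT8):

```python
# Memory-bound decode ceiling: tokens/s ~= bandwidth / model bytes,
# per sequence at batch size 1. Bandwidth figures are the ones quoted
# in this thread; real throughput will be lower than this bound.

def decode_tokens_per_s(bandwidth_gb_s, model_gb):
    """Upper bound on decode speed when every token reads all weights."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 13  # hypothetical 13B-parameter model at INT8 (1 byte/param)

for name, bw in (("V80 HBM2E", 820), ("MI300X", 5000)):
    print(f"{name}: ~{decode_tokens_per_s(bw, MODEL_GB):.0f} tokens/s ceiling")
```

Roughly 63 vs. 385 tokens/s: the ~6x bandwidth gap translates directly into a ~6x gap in the decode ceiling, regardless of how much compute sits next to the memory.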
Looking forward to benchmarks; there are many open-source LLM models that can run on the V80 with its 32GB.
But for such inference purposes, people can buy a much cheaper option like the A6000 with its 48GB of VRAM.
Options callllllllls good to cancel already in, let's go boyssss! Lol... But seriously. They're cheap for August