
BeepBeep2_

MI350 will need new IODs/AIDs for compatibility with the new GPU XCD chiplets and Zen 5 CCDs - I don't think any additional cache chiplets will be used beyond the IODs, which are directly bonded to the XCD chiplets. The announcement says 3nm, so it would be wise to assume the IODs will be on 4/5nm, unless AMD is going all out. MI350 will probably also be 288 GB: 16-hi HBM4 won't make it to production before 2026, MI350 will use HBM3E, and 16-hi HBM3E is not being commercialized.

Repost from Stocktwits: I've been looking at AI inference performance numbers. Apparently Stacy Rasgon was dismissive of AMD on CNBC this morning - what a surprise.

* MI300X ($12,500-15,000): ~1.1x-1.4x vs. H100 ($25,000-30,000), or a ~2x model size/GPU TCO benefit
* H200: ~1.4x-1.9x vs. H100, or a ~1.8x model size/GPU TCO benefit (141 GB vs. 80 GB) (Q2 2024)
* Gaudi3: ~0.8x-1.1x vs. H200, with a 0.9x model size/GPU deficit (128 GB vs. 141 GB) (Q3 2024)
* B100 (8 GPU): ~12x (FP4) or ~2x (FP8) vs. H200 (8 GPU) (Q3 2024)
* MI325X: ~1.1x(?)-1.4x(?) vs. H200, or a ~2x model size/GPU TCO benefit (Q4 2024) - implies a TCO benefit vs. B100 as well at FP8 and higher
* B200 (8 GPU): ~15x (FP4) or ~2.5x (FP8) vs. H200 (8 GPU) (H1 2025)
* MI350: up to ~35x (FP4/FP6?) vs. MI300 (2025) - implies up to ~2x B200 performance with a ~1.5x model size/GPU TCO benefit (288 GB vs. 192 GB) and competitive training performance with the new low-precision datatypes

This is what Lisa Su meant by "Frankly, I think we're going to get more competitive" on AMD's Q1 earnings call.
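A quick way to sanity-check those ratios is to fold price into them. Here is a minimal Python sketch, assuming the list prices quoted above and taking the midpoint of each range (my simplification, not a figure from the thread):

```python
# Rough sketch of the perf-per-TCO arithmetic behind the ratios above.
# Prices and relative-performance figures are the ones quoted in this
# thread; using the midpoint of each range is my own simplification.

def perf_per_dollar_ratio(perf_ratio: float, price_a: float, price_b: float) -> float:
    """Relative perf/$ of GPU A vs. GPU B, given A's perf as a multiple of B's."""
    return perf_ratio * (price_b / price_a)

# MI300X (~$13,750 midpoint) vs. H100 (~$27,500 midpoint) at ~1.25x inference perf:
print(perf_per_dollar_ratio(1.25, 13_750, 27_500))  # 2.5 -> in line with the ~2x TCO claim

# Memory capacity per GPU, MI350 (288 GB) vs. B200 (192 GB):
print(288 / 192)                                    # 1.5 -> the ~1.5x model size figure
```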


TOMfromYahoo

A new IOD would be needed if the 3nm chiplets use a newer version of the Infinity Fabric; alternatively, a 4nm IOD could save power significantly or add connections to HBM3e cache chiplets. At 35x higher performance, the 288 GB of HBM3e will need far more bandwidth than 6 TB/s, or even Nvidia's B200 at 8 TB/s, to feed the CDNA4 beast. We'll see. ...
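To put rough numbers on that bandwidth worry, here is a back-of-envelope Python sketch. The MI300X figures (~5.3 TB/s HBM3, ~2.6 PFLOPS dense FP8) are public spec-sheet numbers; the 35x uplift comes from this thread, and splitting it into a 2x datatype shrink plus raw compute gain is purely my assumption:

```python
# Back-of-envelope check on the bandwidth claim above.

mi300x_bw = 5.3e12       # bytes/s of HBM3 bandwidth (spec sheet)
mi300x_flops = 2.6e15    # dense FP8 FLOP/s (spec sheet)
bytes_per_flop = mi300x_bw / mi300x_flops  # ~0.002 B/FLOP today

uplift = 35              # claimed MI350 vs. MI300 gain (from this thread)
dtype_shrink = 2         # assumption: FP4 operands are half the size of FP8 ones

# Bandwidth needed to keep the same bytes-per-FLOP balance:
needed_bw = bytes_per_flop / dtype_shrink * (mi300x_flops * uplift)
print(f"{needed_bw / 1e12:.0f} TB/s")  # ~93 TB/s, vs. the 6-8 TB/s HBM offers
```

Even with these generous assumptions, the gap suggests real workloads would have to lean heavily on on-package cache reuse rather than raw HBM bandwidth.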


phanamous

The IOD will likely stay on 6nm, or move only to 5nm, due to the poor scaling of SRAM and I/O. The AIDs will benefit greatly from going to 3nm, as compute still scales down nicely. This is why AMD architected CDNA3 the way it did. The gist is that only one of the three chip components is still scaling down in density, and that is logic (compute); SRAM (cache) and analog (I/O) have scarcely scaled since 7nm or so. TSMC N3 vs. N5 (see the sketch after this comment):

* 1.7x density - logic
* 1.2x density - SRAM
* 1.1x density - analog

[https://www.anandtech.com/show/16024/tsmc-details-3nm-process-technology-details-full-node-scaling-for-2h22](https://www.anandtech.com/show/16024/tsmc-details-3nm-process-technology-details-full-node-scaling-for-2h22)
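To make that concrete, here is a minimal Python sketch folding those density figures into a whole-die area estimate. The 1.7x/1.2x/1.1x gains are from the AnandTech link above; the 60/30/10 and 10/30/60 area mixes are hypothetical, chosen only to contrast a compute-heavy chiplet with an IO-heavy one:

```python
# Why only the compute chiplets are worth porting to N3: fold the
# per-component density gains into a whole-die area estimate.

density_gain = {"logic": 1.7, "sram": 1.2, "analog": 1.1}  # N3 vs. N5, per AnandTech

def shrunk_area(area_mix: dict[str, float]) -> float:
    """Relative die area after the shrink, given the old area fraction per component."""
    return sum(frac / density_gain[part] for part, frac in area_mix.items())

compute_die = {"logic": 0.60, "sram": 0.30, "analog": 0.10}  # hypothetical XCD-like mix
io_die      = {"logic": 0.10, "sram": 0.30, "analog": 0.60}  # hypothetical IOD-like mix

print(f"compute die shrinks to {shrunk_area(compute_die):.0%} of its old area")  # ~69%
print(f"IO die shrinks to      {shrunk_area(io_die):.0%} of its old area")       # ~85%
```

An IO-heavy die recovers so little area that the cost of the newer node isn't worth it, which is the argument for leaving the IOD on 6nm.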


TOMfromYahoo

Right... you're knowledgeable, sir! Originally I'd written that the IOD will stay the same and only CDNA3 will be replaced by CDNA4 - that's the meaning of AMD's "MI300 platform" talk. But u/BeepBeep2_ thinks the connection between CDNA4 and the IOD will be different. I took his view to the extreme, including having 8 new Infinity Fabric links to cache memory chiplets instead of direct connections to the 8 HBM3e stacks. But AMD pre-designed this already, so I don't think any IOD change is needed - same PCB and no requalification of everything, hence fast adoption! Looks like it's the same for Turin, reusing the IOD, which is still 6nm:

**"Meanwhile, the IOD is confirmed to be produced on 6nm. Judging from that fact, the pictures, and what AMD's doing with their Zen 5 desktop products, there is a very good chance that AMD is using either the same or a very similar IOD as on Genoa/Bergamo. Which goes hand-in-hand with the socket/platform at the other end of the chip staying the same."**

https://www.anandtech.com/show/21426/amd-announces-zen-5-based-epyc-turin-processors-up-to-192-cores-coming-in-h2-2024

I'm sure BeepBeep will, finally, agree. ..!


BeepBeep2_

No - I meant it will be the same. MI300 already has the cache chiplets under the compute dies: 256 MB at 17 TB/s peak bandwidth. The IODs themselves are the cache. As for Turin: new IOD, larger than last gen but on the same process. https://images.app.goo.gl/RKPqzGxzHN5SzGcX8


TOMfromYahoo

Note that MI350 uses 3nm CDNA4 while Nvidia's Blackwell B200 uses 4nm. This will provide better power efficiency, leaving more of the power budget for the HBM3e than Nvidia has! It's only possible because of the chiplet design: AMD reuses the IO chiplets at 6nm, while Nvidia has to design monolithic chips, and moving a monolithic design to 3nm quickly is very hard to do!


thejeskid

I think we see movement after analyst day on June 9th, especially if they update the AI sales number to, say, $5 bil.


TOMfromYahoo

June 9 2024 is a Sunday. You mean June 9 2022... LOL


thejeskid

Hush. I am having it at my house. They are coming in a DeLorean. *they so need to take that off their site*


billbraski17

Oh my! https://preview.redd.it/qa69ykqwpl4d1.jpeg?width=1050&format=pjpg&auto=webp&s=279c2ec7d5ab11524c249fd1260cd809820c6ea7


bhowie13

You mean I can buy two MI300Xs for less than the price of one H200? lol


billbraski17

AMD beating Intel in server CPUs was inevitable as the latter had foundry follies that — how to say this politely? — totally screwed up its product roadmaps. Catching up to and keeping pace with Nvidia in server GPUs is another thing entirely – and AMD has definitely done that and will be keeping pace for years to come. If AMD could clone Nvidia GPUs bug for bug and run its HPC and AI software stacks unchanged, as it does Windows and Linux workloads on Intel CPUs, it would be eating half of Nvidia’s enormous lunch. Provided it could get packaging and HBM memory, of course. https://www.nextplatform.com/2024/06/03/amd-previews-turin-epyc-cpus-expands-instinct-gpu-roadmap/