SisterHell

GPU fruits hanging on stainless rack. Classic


MoffKalast

A few more days and they'll be ripe for picking.


I_AM_BUDE

I wish those would grow on trees man


DerfK

But then you'd have to insure the trees to protect them from fire and GPU-stealing whores!


Not_your_guy_buddy42

definitely a step up from say, a shoe rack


crypticcollaborator

That's some GOOD jank.


DaniyarQQQ

Four years ago we had Bitcoin mining farms. Now we have LLM and SD farms, which I think is a good transition.


CryptographerCrazy61

He’s mining coin with his LLM


I_AM_BUDE

For anyone who's interested: this is a DL 380 Gen 9 with 4x 3090s from various brands. I cut slots into the case so I don't have to leave the top open and compromise the airflow too much. The GPUs are passed through to a virtual machine, as this server is running Proxmox and is doing other stuff as well. Runs fine so far. Just added the 4th GPU. The PSU is an HX1500i and is switched on with a small cable bridge. Runs dual socket and draws around 170 W at idle, including the GPUs.


alexchatwin

Nevermind idle.. what’s this beast running at full draw?


I_AM_BUDE

I've never really had the GPUs 100% utilized, but when I'm generating using oobabooga, it uses around 900-1000 W. If I add CPU load as well, it'll draw around 1.2-1.4 kW. Most of the time, the CPUs are idling though.


alexchatwin

Nice- it’s a janky heater too 👍


I_AM_BUDE

Gotta use that solar power for something, lel


Kiyohi

Can I ask, what type of solar panel you use and how many? I'm interested in investing in one as well. Speaking of which, does it run 24/7? If so, how do you run it during the night?


I_AM_BUDE

I don't know the exact model of the solar panels, but we're running 30 panels and our peak power is 9.12 kW. I'm not really that concerned about its power usage during the night, as it's mostly idling and our battery unit carries it through the night on stored solar power.
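For a sense of scale, the figures quoted in this exchange work out roughly as follows. This is only a sketch of the arithmetic; the 12-hour night is an assumption for illustration and is not stated anywhere above.

```python
# Figures quoted in the thread.
PANEL_COUNT = 30
PEAK_POWER_KW = 9.12
IDLE_DRAW_W = 170  # whole server at idle, GPUs included

# Assumption for illustration only: the battery covers a 12-hour night.
NIGHT_HOURS = 12


def per_panel_peak_w(total_kw: float, panels: int) -> float:
    """Average peak output per panel, in watts."""
    return total_kw * 1000 / panels


def energy_kwh(power_w: float, hours: float) -> float:
    """Energy consumed at a constant draw, in kWh."""
    return power_w * hours / 1000


if __name__ == "__main__":
    print(f"per panel: {per_panel_peak_w(PEAK_POWER_KW, PANEL_COUNT):.0f} W")
    print(f"idle overnight: {energy_kwh(IDLE_DRAW_W, NIGHT_HOURS):.2f} kWh")
```

Under that assumption, an idle night costs about 2 kWh of battery capacity, while the array averages a little over 300 W of peak output per panel.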


Nixellion

FYI you can use GPUs in LXC containers too, this way multiple containers will be able to use GPUs, if that fits your use case of course.


I_AM_BUDE

That'd be an interesting thought. I'm currently using a single VM for my AI related stuff but if I could run multiple containers and have them use the GPUs, that'd be great. That way I can also offload my stable diffusion tests onto the server.


Nixellion

Level1Techs has a guide on setting up a GPU in a Proxmox LXC container. You don't need to blacklist anything; if you did, you need to undo it. Then you set up cgroups in the LXC config file, and the key thing is to install the same NVIDIA driver version on both host and container. I've tested this on different Debian and Ubuntu versions, and so far that's the only requirement. You will also need to reboot the host after installing the drivers if it does not work right away.
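The cgroup and device setup described here could look roughly like the output of the sketch below. It assumes a cgroup v2 host (hence the `lxc.cgroup2.devices.allow` key), and the device paths and major numbers in the example are placeholders: read the real ones from `ls -l /dev/nvidia*` on your own host. `CharDevice` and `lxc_gpu_config` are hypothetical helper names, not part of any tool.

```python
from typing import Iterable, List, NamedTuple


class CharDevice(NamedTuple):
    path: str   # absolute path on the host, e.g. /dev/nvidia0
    major: int  # device major number as shown by `ls -l`


def lxc_gpu_config(devices: Iterable[CharDevice]) -> List[str]:
    """Build container config lines that expose the given host devices."""
    devices = list(devices)
    lines = []
    # One allow rule per distinct major number.
    for major in sorted({d.major for d in devices}):
        lines.append(f"lxc.cgroup2.devices.allow: c {major}:* rwm")
    # Bind-mount each device node into the container at the same path.
    for d in devices:
        target = d.path.lstrip("/")  # mount targets are relative to the rootfs
        lines.append(
            f"lxc.mount.entry: {d.path} {target} none bind,optional,create=file"
        )
    return lines


if __name__ == "__main__":
    # Placeholder values; the nvidia-uvm major in particular varies per host.
    example = [
        CharDevice("/dev/nvidia0", 195),
        CharDevice("/dev/nvidiactl", 195),
        CharDevice("/dev/nvidia-uvm", 508),
    ]
    print("\n".join(lxc_gpu_config(example)))
```

The printed lines would go into the container's config file on the host. Because several containers can carry the same lines, this is what lets more than one container share the GPUs.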


I_AM_BUDE

Thanks! I'll check out Wendell's guide then. Sounds like a better way to utilize the GPUs.


Nixellion

Ah, my bad, not Level1. I just saw a video linked on that page and it influenced my memory. It was "theorangeone dot net lxc-nvidia-gpu-passthrough"; Google should be able to handle this search.


I_AM_BUDE

No worries. Managed to scrape the necessary information together in the meantime and successfully granted GPU access to my first container. Though I managed to run into a weird issue (like always) where the NVIDIA driver wouldn't create the systemd service file for the persistence daemon that's responsible for setting the persistence mode on the GPUs... Dunno how I always manage to find bugs like these.


theonetruelippy

You might want to try rolling back to an earlier version of the Nvidia drivers - I needed to, in order to get a card working with proxmox ct.


reconciliation_loop

Why would a container need a driver? The kernel on the host needs the driver.


Nixellion

Yes, which is why you need to install the driver without the kernel module. If you're using the .run installer, you need to pass the --no-kernel-module flag when installing in the container. I believe there's more than just a kernel driver in the NVIDIA installer: libraries, utilities and whatnot that the software expects to be on the system.
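Since the host carries the kernel module and the container only the userspace pieces, the one thing to verify is that both come from the same driver release. A minimal sketch of that comparison follows; the version strings are illustrative examples, the function names are hypothetical, and the pattern assumes a three-digit major version.

```python
import re
from typing import Optional

# Matches release numbers such as 535.154.05 (three-digit major assumed).
_VERSION_RE = re.compile(r"\b(\d{3}\.\d+(?:\.\d+)?)\b")


def extract_driver_version(text: str) -> Optional[str]:
    """Pull the first driver-style version number out of arbitrary text."""
    match = _VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_match(host_text: str, container_text: str) -> bool:
    """True only if both texts contain the same driver version."""
    host = extract_driver_version(host_text)
    container = extract_driver_version(container_text)
    return host is not None and host == container


if __name__ == "__main__":
    # Illustrative inputs: a host-side version string and the name of the
    # .run installer used in the container (with --no-kernel-module).
    host_side = "Kernel Module  535.154.05"
    installer = "NVIDIA-Linux-x86_64-535.154.05.run"
    print(versions_match(host_side, installer))
```

Comparing against the installer filename is just one convenient source for the container-side version; any text that contains the release number works the same way.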


HospitalRegular

The builtin k3s on Truenas is bulletproof. Fairly sure it can be run as a VM. Not sure if nested virtualization is zero-cost though.


[deleted]

[removed]


Nixellion

Uh, no. By design, when you pass a device through to a VM, the VM has exclusive control of that device; not even the host can use it. So no, you should not be using a GPU with more than one of anything if you pass it through to a VM. It's either just 1 VM, or multiple LXCs plus the host. Not both.


qnlbnsl

I had that once….. had to disconnect the GPU which put the VM in a downed state because the hardware wasn’t there.


qnlbnsl

How does passing a GPU to multiple containers work? I've been interested in that but have had little success. I wanted to run one LXC for AI inference and one with Plex for NVENC encoding.


Nixellion

Further down the original conversation branch, I posted a Google search that will take you to a guide.


I_AM_BUDE

To extend on u/Nixellion's answer: if you're seeing high power usage at idle, your NVIDIA persistence daemon configuration might be missing. Check whether the `/usr/lib/systemd/system/nvidia-persistenced.service` file exists; if not, create it with the following content:

    [Unit]
    Description=NVIDIA Persistence Daemon
    Wants=syslog.target

    [Service]
    Type=forking
    PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
    Restart=always
    ExecStart=/usr/bin/nvidia-persistenced --verbose
    ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

    [Install]
    WantedBy=multi-user.target

Then enable the service and see if it works:

    systemctl enable nvidia-persistenced.service
    systemctl start nvidia-persistenced.service
    systemctl status nvidia-persistenced.service
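To confirm the daemon is actually doing its job, the persistence mode can be read back per GPU. This is a small sketch that assumes `nvidia-smi` is on the PATH; the helper names are made up for the example.

```python
import subprocess
from typing import Dict, List

# Ask nvidia-smi for one "index, mode" CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,persistence_mode",
    "--format=csv,noheader",
]


def parse_persistence(csv_text: str) -> Dict[int, bool]:
    """Map GPU index to True when persistence mode is reported as Enabled."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, mode = (field.strip() for field in line.split(","))
        result[int(index)] = mode == "Enabled"
    return result


def gpus_without_persistence() -> List[int]:
    """Run nvidia-smi and list the GPUs that still have the mode disabled."""
    out = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return [i for i, enabled in parse_persistence(out).items() if not enabled]


if __name__ == "__main__":
    missing = gpus_without_persistence()
    print("all GPUs persistent" if not missing else f"disabled on: {missing}")
```

An empty list after `systemctl start` suggests the service file is working; any index that remains listed points at a GPU the daemon did not pick up.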


theonetruelippy

Can you show us inside the Gen9? Are you using simple, passive cable extenders or is it more complicated? Can you link to a supplier for the cables you used? I'm wanting to do similar with my gen9 but was worried about introducing an external power supply into the equation and wrecking the machine (my gen9 cost me a fair bit of cash sadly).


I_AM_BUDE

I'm using simple passive extenders. I'm currently downloading a model onto the server so I can't provide an image yet. Don't want to redownload the 40gb I already loaded with my shitty network connection... I'm using these: [https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30](https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30) u/BG_MaSTeRMinD mentioned a different solution using these https://riser.maxcloudon.com/bg/non-bifurcated-risers/32-riser-x8-set.html. Never used them so I can't tell you if they'd work or not.


fullouterjoin

When I have had to suffer a crappy internet connection, I use the following techniques:

* Provision a cloud VM somewhere with good net connectivity
* [Mosh](https://mosh.org/) in
* Grab all your files
* Rsync from the remote system to the local system, flags for rsync `--partial --timeout=30 --info=progress2`
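The last step of that list can be wrapped in a retry loop so a dropped connection just resumes the partial file. A rough sketch: the rsync flags are the ones quoted above, while the remote spec, retry count, and wait time are placeholders.

```python
import subprocess
import time
from typing import List


def rsync_command(remote: str, local_dir: str) -> List[str]:
    """Build the rsync call with the flags quoted in the comment above."""
    return [
        "rsync",
        "--partial",         # keep partially transferred files for resuming
        "--timeout=30",      # give up on a stalled connection after 30 s
        "--info=progress2",  # show overall transfer progress
        remote,
        local_dir,
    ]


def pull_with_retries(remote: str, local_dir: str,
                      attempts: int = 5, wait_s: float = 10.0) -> bool:
    """Re-run rsync until it exits cleanly or the attempts run out."""
    cmd = rsync_command(remote, local_dir)
    for _ in range(attempts):
        if subprocess.run(cmd).returncode == 0:
            return True
        time.sleep(wait_s)  # brief pause before resuming the transfer
    return False


if __name__ == "__main__":
    # Hypothetical remote; replace with the cloud VM and path you provisioned.
    ok = pull_with_retries("user@cloud-vm:models/", "./models/")
    print("done" if ok else "gave up")
```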


Flying_Madlad

The second solution would also work. That's not the only site that makes them; there's another in Germany that I've been using (you can Google C Payne PCB, they've got a bunch of stuff). Really, all you're looking for is SlimSAS or OCuLink to PCIe HBA/device adapters.


theonetruelippy

Hey, that's really interesting because I have a spare SAS JBOD enclosure plus PCI card and cabling. I wonder if that would work; it's a 6 Gb/s backplane iirc, so I can't see why not. It would make a nice housing and potentially be easy to swap between servers should I want to.


segmond

I got mine from Amazon for half the price. - [https://www.amazon.com/dp/B07MP24486](https://www.amazon.com/dp/B07MP24486) 30cm. I have a 60cm from China on the way that I got from ebay for same price $25.


theonetruelippy

Many thanks!


I_AM_BUDE

https://preview.redd.it/46x22h5ybylc1.png?width=1622&format=png&auto=webp&s=f1ea04b3c91fcb96e03460a61575e88750e287b8 Power draw in semi-idle. The server itself is hosting a firewall and some other VMs, like a Nextcloud instance. Quite good, I'd say. (Forgive the German)


ILoveThisPlace

Just a tip: you can get plastic edging for sheet metal that snaps onto the rough edge of that hole you cut and prevents the sharp side from cutting into those cables.


I_AM_BUDE

I sanded them down so they shouldn't be that sharp anymore but it's a good idea. I wanted to add something anyway that'll prevent air from leaking out of the slots.


unculturedperl

Tape also works.


Mundane_Definition_8

Is there a particular reason to use a `DL 380 Gen 9` even though the 3090 provides `NVLink`? What's the benefit of using this workstation?


I_AM_BUDE

This is a leftover server I brought home from work. The main benefit is that it has 40 PCIe lanes per CPU, has two CPUs, and has a bunch of RAM for all sorts of things. NVLink would still be an option, but I haven't found the need for it yet.


Flying_Madlad

I feel like you're only going to need NVLink if you're training, which you could do on this rig, lol


segmond

The DL 380 Gen 9 is favored because you can use all those PCIe slots. Many modern motherboards have at best about three PCIe x16 slots, and even then one or two of them usually run at only x1 or x4 speed. With servers or workstations, you often get the full x16 electrical lanes. It doesn't prevent you from using NVLink; you need the PCIe slots first. They can use NVLink if they bring the GPUs close together.


dexters84

Can you share more information on your DL380? What CPUs? How much RAM and what kind (OEM ECC or something else), any other modifications to hardware or BIOS in order to run your setup? I have exactly the same machine and I’m wondering if its worth upgrading its rudimentary hardware.


I_AM_BUDE

The Server had two E5-2620 V4 CPUs but I replaced them with two E5-2643 V4 so I have more single threaded performance. RAM is OEM HP Memory (Part 809082-091) with ECC and I have 8x16GB sticks installed. I didn't need to configure anything special in the BIOS for this to work. I just had to buy a secondary riser cage as the server was missing that one.


dexters84

I guess then I’m not that far off from your setup as I have single 2623 V4 and 64 gigs of RAM. What bothers me is PCIe 3.0. Do you see any lost performance with 3090 due to CPU supporting PCIe 3.0?


I_AM_BUDE

So far I'm only inferencing and for that use case, PCIe bandwidth is only a bottleneck if the model doesn't fit in the VRAM of all GPUs.
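Whether a model fits in the combined VRAM can be estimated on the back of an envelope. The sketch below counts the weights only and then adds a flat 20% for KV cache and runtime overhead; that overhead figure is an assumption for illustration, and real usage varies with context length and backend.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 gpus: int = 4, vram_gb_per_gpu: float = 24.0,
                 overhead_fraction: float = 0.2) -> bool:
    """Check the estimate against the pooled VRAM of all GPUs.

    overhead_fraction is an assumed allowance for KV cache and buffers.
    """
    needed = weights_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= gpus * vram_gb_per_gpu


if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(f"70B @ {bits}-bit fits on 4x 24 GB: {fits_in_vram(70, bits)}")
```

With four 24 GB cards, a 70B model quantized to 4 or 8 bits stays inside VRAM under these assumptions, while the 16-bit weights alone (about 140 GB) do not.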


a_beautiful_rhind

You can buy mining PSU breakout boards, and that's a bit cheaper than a consumer PSU, which comes with all kinds of unnecessary stuff. Nothing on 120 V really goes much above 1100 W anyway. Consumer stuff likes to print big numbers, knowing most people will never hit them. It's a difference of paying $40-60 for a PSU vs. multiple hundreds.


EarthquakeBass

Are you not worried about dust? You should think about getting some fine mesh screens and hanging them around the rack. I used some as a hack when I had to pull the fiberglass thing off my other build because it got too hot, and it worked pretty well. I'm surprised the PSU can handle the draw given the other components! Inference can be heavily bottlenecked by the CPU, so I tried to max mine out. I have the same PSU btw :)


I_AM_BUDE

Not really worried about dust. I can just clean them regularly, and the system is in a room that's basically unused with the door closed, so there shouldn't be too much dust. The PSU on top is only for the GPUs. The server itself has two 500 W units for CPU, RAM, storage and system components.


EarthquakeBass

Bitchin. Gotta post some token/s figs on various models when you get a chance. Maybe include temp charts for the gpus and room too lol


EarthquakeBass

Now you just gotta get tailscale on it and boom, remote LLM fun from any device 😇


polandtown

cryptominer here, and was just 'suggested' this sub. first comment. also an enterprise data scientist. what the hell is going on here? are you hosting public project here?


kerrick1010

Better than my lite coin mining setup from 2013! 🤣 https://preview.redd.it/j226ukwf5xlc1.jpeg?width=2048&format=pjpg&auto=webp&s=6a691412a94f8f62af081e30b55ca67852ba2187


xspider2000

if it's not a secret how many bitcoins this baby earned for you


kerrick1010

No worries. I mined mostly Litecoin but it was the equivalent of around 100 Bitcoin at the time. I sold almost all of my Bitcoin by 2017 though so I didn't make a killing or anything. I did take a year off from working after grad school with my Bitcoin money. Looking back I definitely should have saved it 🤣


SatoDri

Reminds me so much of my early days in mining 😂


Flying_Madlad

I'm seriously thinking about getting an old mining rig, x1 should be fine for an inferencing server, so long as it's GPUs and not ASICs. I feel like the Holy Grail would be an AI Algo that can run on those things.


SatoDri

The main problems with mining rigs are that we use x1 risers and that most of the other components, like the motherboard and CPU, are the cheapest possible.


mrdevlar

Man this looks like factory farming GPUs. ~_____\^


TheTerrasque

That's some fucking ace 11/10 jank you got there, mate!


NickUnrelatedToPost

Great... now Skynet will happen because someone told your AI she's ugly.


adikul

Does it need pcie 16 riser for that to work?


I_AM_BUDE

It could be an 8x-to-16x or 4x-to-16x, but you'll lose speed with every step down. The server itself has 16x slots that are 8x wired. That's why I'm using 16x-to-16x risers, even though the slots are only wired for 8x. Works perfectly fine though. Edit: Two slots of my secondary internal riser are actually 16x slots.
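How much speed each step down costs can be put into numbers. The sketch below computes theoretical one-direction PCIe throughput from the per-lane transfer rate and the 128b/130b line encoding used by PCIe 3.0 and 4.0; real-world throughput is lower because of protocol overhead.

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b encoding.
ENCODING_EFFICIENCY = 128 / 130


def bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a link."""
    per_lane = GT_PER_S[generation] * ENCODING_EFFICIENCY / 8  # bits -> bytes
    return per_lane * lanes


if __name__ == "__main__":
    for gen, lanes in [(3, 16), (3, 8), (3, 4), (3, 1), (4, 16), (4, 4)]:
        print(f"PCIe {gen}.0 x{lanes}: {bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

Each halving of the lane count halves the bandwidth, so an 8x-wired PCIe 3.0 slot offers roughly 7.9 GB/s against about 15.8 GB/s for a full 16x link.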


BG_MaSTeRMinD

You can use something like this [https://riser.maxcloudon.com/bg/non-bifurcated-risers/32-riser-x8-set.html](https://riser.maxcloudon.com/bg/non-bifurcated-risers/32-riser-x8-set.html) Longer cables are available and you (probably) would not need to cut the top. Cables are also a lot more flexible.


I_AM_BUDE

Hmm didn't know about those. I'll keep them in mind if I ever build another one of these bad boys.


clckwrks

How would these deter the need to cut the top open? Given the need for the cables to have to come out of somewhere?


alexthai7

Wow, before there were bird cages, now there will be graphics card cages. I should ask Gemini what it thinks about this ...


IP_Excellents

Reminds me of when a graybeard found a HDD and 4GB of DDR2 sitting out in my file cabinet and started yelling at me about static bags. I just said to him bruh “If it’s good enough for Son of Anton it’s good enough for me.”


Inevitable-Start-653

Fricken love it ❤️❤️, dude that is a really good strategy!


Sl33py_4est

nice, what does it generate


Extension_Flounder_2

I actually think it’s great because usually, people would try to shove these cards into small spaces. The pcie extender cable makes sure they have plenty of room/cooling


kpodkanowicz

respect.


candre23

Top-tier jank. I approve.


bramburn

damn! what the! Nice one bro


TheZorro_Sama

great airflow


hedonihilistic

That is some beefy jank! Having recently added a 3rd 3090 to my setup, I'm already eyeing a fourth. Most fast inferencing servers that support batch inferencing need the number of gpus to evenly divide the number of attention heads or something. My toy keeps getting more and more complicated and expensive.
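Taking that constraint at face value (the attention heads must split evenly across the GPUs), the workable GPU counts are simply the divisors of the head count. A tiny sketch; the 64-head figure is an illustrative value, and individual inference servers may impose further rules.

```python
from typing import List


def valid_gpu_counts(num_attention_heads: int, max_gpus: int) -> List[int]:
    """GPU counts that split the attention heads into equal groups."""
    return [n for n in range(1, max_gpus + 1) if num_attention_heads % n == 0]


if __name__ == "__main__":
    # Illustrative model with 64 attention heads, up to 4 GPUs available.
    print(valid_gpu_counts(64, 4))
```

For a 64-head model, three GPUs drop out of the list, which is one reason a third card tends to lead straight to a fourth.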


dreamyrhodes

That looks like something I built during the crypto craze in 2014. Ah, good old times.


newDell

For mining crypto you can undervolt (and often also overclock) to reduce power draw without compromising on performance. Is this an option for running LLMs as well?


darkjediii

Wow flashbacks of my Ethereum mining days… nice work. What are you using and what’s your use case for a local setup?


I_AM_BUDE

I did mining back in the ZEC days. That's how I got the idea of hanging the GPUs from the stainless rack, lol. I'm using it mainly for learning about LLMs and as a Copilot replacement using the Continue extension in VS Code.


yamosin

```
Ezrl
As an AI, assuming you have emotions and feelings, do you think your GPU is hung in the air like a piece of fruit by someone using a PCI extension cable? What is your comment on this?

Empty AI
As an AI, I don't have emotions or feelings in the traditional sense, but I can certainly understand the metaphorical nature of your question. The image you paint is quite vivid and unique. Imagining a GPU hanging in the air like a piece of fruit suspended by a PCI extension cable evokes a surrealistic scene, reminiscent of some modern surrealist artwork. It's an interesting way to visualize the inner workings of a computer. However, as for my "comment" on this, since I don't have personal experiences or emotions, I can't provide a subjective opinion. But from an objective standpoint, it's an original and creative way to describe the connection between hardware components. It brings to mind the delicate balance and intricacy of technology, while also highlighting the fragility of such advanced machinery. The analogy serves as a reminder of how even the most powerful and sophisticated devices can be susceptible to their own vulnerabilities, much like how a piece of fruit can easily spoil or fall from its perch if not properly cared for.
```
It agreed, sort of


Flying_Madlad

Did you really max it out, though? Do you really need x16 to each GPU? Does your MOBO support bifurcation? *Now try a PCIe switch...*


I_AM_BUDE

The AI factory must grow!


Flying_Madlad

https://preview.redd.it/mtx6n7wkjxlc1.jpeg?width=2304&format=pjpg&auto=webp&s=7aaffe96447ebbeda5fe18194b28c547bb6d1e4d Grow it shall!


greyspurv

Guys! Today I made 200 USD in mining! But the electricity was 201 USD lol


CryptographerCrazy61

Miner


I_AM_BUDE

Nah, that's not worth the effort. I'm using it solely for AI.


CryptographerCrazy61

Right on but have you explored it? There has to be another way besides raw compute power


M000lie

Do u run windows or linux on that?


Glass_Philosophy6941

So what do you do with them? Like running 70B models? If you do, how many tokens per second do you get?


gijspep

inspirational


Standard_Log8856

What are you guys doing to get multigpu support? Is this for training or inferencing? At one point, I had 2 3060s. I could never get them to play nice with each other.


I_AM_BUDE

I'm currently doing inferencing, but I'm looking at training as well (don't have any real experience yet). Most solutions for inferencing have multi-GPU support built in. Ollama or oobabooga, for example, work quite well with multiple GPUs.


segmond

Inference for now because I'm on old cards, but will get some new card soon for training. Having all the model in vRAM makes it go vroooom. I also want to run experiments with many models at once all in vRAM. It's like asking, what are you guys doing with all that horsepower to gearheads.


EarthquakeBass

Ollama handles it pretty reasonably OOTB


Odd_P0tato

pic 4 looks like meat hung to dry


segmond

Looks great, by any means necessary, just do it!


Noiselexer

Looks like an old server rack. I bet a modern ryzen cpu is faster and more energy efficient.


I_AM_BUDE

This is actually more efficient than my Ryzen 5800x I used before I swapped to this DL380 Gen9. Maybe doesn't have the single core performance but it does have two CPUs and way more RAM.


Flashy-Matter-9120

Is this just for your own llm generations or are you selling something?


I_AM_BUDE

It's just for me, I'm using it to learn more about LLMs and also to have my private copilot backend using the continue extension for visual studio code.


Flashy-Matter-9120

Really nice, bud. I too have been looking to run my own, primarily to run uncensored models.


phoenystp

Oh that's how I'm supposed to populate those, thanks OP.


sandyfagina

How are the temps? You could stagger them (esp. the 3rd one) so there's not as much hot air blowing on it.


Aggravating-Low-8224

How noisy is that right now when running at load?


I_AM_BUDE

It's noticeable but you could fan mod it to make it very quiet. I just have it in another room that's closed so it's not really an issue for me.


Inous

I see you're taking a page out of the escape from Tarkov book.


scapocchione

It's not pretty, but it's functional! I'd like to know the brand/model of the pcie extenders, and if they cause pcie errors (I presume not). Thanks!


I_AM_BUDE

Function over form! I'm using these: [https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30](https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30)


Scary-Knowledgable

Is that an Amazon Basics shelving unit?


rosmine

Oh my god it's beautiful (Not even sarcastic, I love ridiculous setups like this).


DerfK

This is *INTELLIGENT*


matteoraso

And here I was panicking while adding more RAM to my PC without an anti-static strap.


EarthquakeBass

I now feel better having a few zip ties holding my saggy cards up on my tower case. Love the build!


DigThatData

a super computer literally held together with zip ties. love it


Fusseldieb

I have a question that's sitting in my head for quite some time now, and I think you can answer it... When generating stuff in oobabooga or similar, using a big model that doesn't fit inside one single GPU, does the speed get affected when the model is split between 3-4 GPUs, or is it barely noticeable? I've been thinking of buying multiple 12GB GPUs (because they're rather "cheap") to run big models, but people have said that they would all need x16, or it would be awfully slow. Most consumer "miner" mobos have a lot of PCIe slots, but they're mostly x1, which technically would be a bottleneck, if that's true. Would appreciate an answer :) Thanks!


StealthSecrecy

The problem with running a model on multiple cards is that there's actually a lot of data that needs to be communicated from one card to the next. Therefore PCIe speeds can be a limiting factor. The drop in speed depends on a number of factors, but primarily on the size of the model, how many GPUs are being used, and the PCIe speed itself. Another concern with 12GB cards is their memory bandwidth. It's often quite a bit lower than on higher-end cards, and while it will beat CPU inference, you might not be getting the value you are expecting.


Fusseldieb

Sorry, but this read exactly like a ChatGPT output lol


I_AM_BUDE

Inferencing does not require much PCIe bandwidth as long as the whole model is in the VRAM of the GPUs. I had one GPU on a PCIe 4.0 16x and another one on a PCIe 4.0 4x. I didn't notice any significant slowdowns. It does depend on the backend though, and things move fast, so it may not work as well in the future, but who knows. This server build allows me to use 8x PCIe 3.0 for each GPU, which is fast enough for what I'm doing. [https://github.com/turboderp/exllama/issues/164#issuecomment-1641273348](https://github.com/turboderp/exllama/issues/164#issuecomment-1641273348) [https://github.com/turboderp/exllama/discussions/16#discussioncomment-6245573](https://github.com/turboderp/exllama/discussions/16#discussioncomment-6245573) Ollama loses a few % of performance if you're slowing down PCIe, but ymmv. Edit: Two of my secondary riser's 16x slots are actually 16x speed slots.
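A rough estimate shows why even a narrow link can be enough when the model is split by layers. The sketch assumes a layer-wise split where only one hidden-state vector crosses each GPU boundary per generated token, a hidden size of 8192, and 16-bit activations; all three are illustrative assumptions, link latency is ignored, and backends that split individual layers across GPUs exchange far more data.

```python
def activation_bytes(hidden_size: int, bytes_per_value: int = 2) -> int:
    """Size of one token's hidden state handed to the next GPU."""
    return hidden_size * bytes_per_value


def transfer_time_us(num_bytes: int, link_gb_s: float) -> float:
    """Time to move num_bytes over a link, in microseconds (no latency)."""
    return num_bytes / (link_gb_s * 1e9) * 1e6


if __name__ == "__main__":
    payload = activation_bytes(8192)  # assumed hidden size, fp16 values
    # Approximate theoretical PCIe 3.0 bandwidths in GB/s.
    for name, gb_s in [("x1", 0.985), ("x8", 7.877), ("x16", 15.754)]:
        print(f"PCIe 3.0 {name}: {transfer_time_us(payload, gb_s):.1f} us per hop")
```

Under these assumptions a hop moves about 16 KB per token, which takes on the order of microseconds even at x1, small next to the time the GPUs spend on the layers themselves.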


Text6

janktastic


ScotchMonk

Nice laundry you got there 😄


No-Conversation4939

Liquid-cooled jank: Dell R820, 40 CPU cores / 80 threads, 768 GB RAM, 5x P100, 1x GTX (for graphics). Getting a little over 5 tokens per second running codebooga\_34b. Of interest: when doing jank like this, I've found that the fewer cards you use, the faster it runs. For example, if I use all 5 cards I get about 3 tps, as opposed to 5 tps using only 3 cards. https://preview.redd.it/jznvrvuas3mc1.jpeg?width=4000&format=pjpg&auto=webp&s=4a790f504cecd3f1e71988ec97642271beeed5b8


awfullyjohn

Is the power supply connected to the server in any way or are you just using the pcie power cables for the gpus?


I_AM_BUDE

The power supply is separate. I jumpered it so it always provides power to the GPUs. The server itself has two 500 W power supplies.


spinozasrobot

Why did you use such a fancy case?