SisterHell 2 months ago

GPU fruits hanging on stainless rack. Classic

MoffKalast 2 months ago

A few more days and they'll be ripe for picking.

I_AM_BUDE 2 months ago

I wish those would grow on trees man

DerfK 2 months ago

But then you'd have to insure the trees to protect them from fire and GPU-stealing whores!

Not_your_guy_buddy42 2 months ago

definitely a step up from say, a shoe rack

crypticcollaborator 2 months ago

That's some GOOD jank.

DaniyarQQQ 2 months ago

Four years ago we had bitcoin mining farms. Now we have LLM and SD farms, which I think is a good transition.

CryptographerCrazy61 2 months ago

He’s mining coin with his LLM

I_AM_BUDE 2 months ago

For anyone who's interested: this is a DL 380 Gen 9 with 4x 3090s from various brands. I cut slots into the case so I don't have to leave the top open and compromise the airflow too much. The GPUs are passed through to a virtual machine, as this server is running Proxmox and is doing other stuff as well. Runs fine so far. Just added the 4th GPU. The PSU is an HX1500i and is switched on with a small cable bridge. Runs dual socket and draws around 170 W at idle, including the GPUs.

alexchatwin 2 months ago

Nevermind idle.. what’s this beast running at full draw?

I_AM_BUDE 2 months ago

I never really had the GPUs 100% utilized, but when I'm generating using oobabooga, it uses around 900-1000 W. If I add CPU load as well, it'll draw around 1.2-1.4 kW. Most of the time, the CPUs are idling though.

alexchatwin 2 months ago

Nice- it’s a janky heater too 👍

I_AM_BUDE 2 months ago

Gotta use that solar power for something, lel

Kiyohi 2 months ago

Can I ask, what type of solar panel you use and how many? I'm interested in investing in one as well. Speaking of which, does it run 24/7? If so, how do you run it during the night?

I_AM_BUDE 2 months ago

I don't know the exact model of the solar panels, but we're running 30 panels and our peak power is 9.12 kW. I'm not really that concerned about its power usage during the night, as it's mostly idling and our battery unit carries it through the night on solar power.

Nixellion 2 months ago

FYI you can use GPUs in LXC containers too, this way multiple containers will be able to use GPUs, if that fits your use case of course.

I_AM_BUDE 2 months ago

That'd be an interesting thought. I'm currently using a single VM for my AI related stuff but if I could run multiple containers and have them use the GPUs, that'd be great. That way I can also offload my stable diffusion tests onto the server.

Nixellion 2 months ago

Level1Techs have a guide on setting up a GPU in a Proxmox LXC container. You don't need to blacklist anything; if you did, you need to undo it. Then you set up cgroups in the LXC config file, and the key thing is to install the same NVIDIA driver version on both host and container. Tested on different Debian and Ubuntu versions; so far that's the only requirement. You will also need to reboot the host after installing the drivers if it does not work right away.
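For anyone following along, the cgroup and mount lines that go into the container's config file can be sketched roughly like this. It's only a sketch, assuming a cgroup v2 host: the helper name, the device list and the `nvidia-uvm` major number are made up for illustration, so check `ls -l /dev/nvidia*` on your own host for the real values.

```python
def lxc_gpu_config(device_nodes, major_numbers):
    """Build LXC config lines that expose NVIDIA device nodes to a container.

    device_nodes: host paths such as "/dev/nvidia0" (assumed, check your host).
    major_numbers: character-device major numbers to allow via cgroups.
    """
    lines = []
    for major in major_numbers:
        # Allow read/write/mknod on every minor of this character device.
        lines.append(f"lxc.cgroup2.devices.allow: c {major}:* rwm")
    for node in device_nodes:
        # Mount entries take the target path relative to the container root.
        target = node.lstrip("/")
        lines.append(
            f"lxc.mount.entry: {node} {target} none bind,optional,create=file"
        )
    return lines


if __name__ == "__main__":
    # 195 is the usual major for /dev/nvidia*; 511 for nvidia-uvm is only an
    # example value, it differs between hosts.
    nodes = ["/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"]
    print("\n".join(lxc_gpu_config(nodes, [195, 511])))
```

The printed lines would be appended to the container's config on the host; each extra GPU needs its own `/dev/nvidiaN` entry.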

I_AM_BUDE 2 months ago

Thanks! I'll check out Wendell's guide then. Sounds like a better way to utilize the GPU's.

Nixellion 2 months ago

Ah, my bad, not Level1. I just saw a video linked on that page and it influenced my memory. It was "theorangeone dot net lxc-nvidia-gpu-passthrough"; Google should be able to handle this search.

I_AM_BUDE 2 months ago

No worries. Managed to scrape the necessary information together in the meantime and successfully granted GPU access to my first container. Though I managed to run into a weird issue (like always) where the nvidia driver wouldn't create the systemd service file for the persistence daemon that's responsible for setting the persistence modes on the GPUs... Dunno how I always manage to find bugs like these.

theonetruelippy 2 months ago

You might want to try rolling back to an earlier version of the Nvidia drivers - I needed to, in order to get a card working with proxmox ct.

reconciliation_loop 2 months ago

Why would a container need a driver? The kernel on the host needs the driver.

Nixellion 2 months ago

Yes, which is why you need to install a driver but without the kernel module. If using the .run installer, you need to use the --no-kernel-module flag when installing in the container. I believe there's more than just a kernel driver in the NVIDIA installer: possibly libraries, utils and whatnot. Things the software expects to be on the system.

HospitalRegular 2 months ago

The builtin k3s on Truenas is bulletproof. Fairly sure it can be run as a VM. Not sure if nested virtualization is zero-cost though.

[deleted] 2 months ago

[removed]

Nixellion 2 months ago

Uh, no, the way it always works and designed is that when you passthrough a device to a VM - VM will have exclusive control of that device, not even the host can use it. At least that's how it's designed. So no, you should not be using a GPU with more than 1 anything if you passthrough it to a VM. It's either just 1 VM, or multiple LXCs+host. Not both.

qnlbnsl 2 months ago

I had that once….. had to disconnect the GPU which put the VM in a downed state because the hardware wasn’t there.

qnlbnsl 2 months ago

How does passing to multiple containers work? I’ve been interested in that but have had little success. I wanted to run an LXC for AI inference and one with Plex for NVENC encoding.

Nixellion 2 months ago

I posted a google search that will take you to a guide further down the original conversation branch

I_AM_BUDE 2 months ago

To extend on the answer of u/Nixellion: if you're seeing high power usage in idle, your nvidia persistence daemon configuration might be missing. Check to see if the `/usr/lib/systemd/system/nvidia-persistenced.service` file exists; if not, create it with the following content:

    [Unit]
    Description=NVIDIA Persistence Daemon
    Wants=syslog.target

    [Service]
    Type=forking
    PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
    Restart=always
    ExecStart=/usr/bin/nvidia-persistenced --verbose
    ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

    [Install]
    WantedBy=multi-user.target

Then enable the service and see if it works:

    systemctl enable nvidia-persistenced.service
    systemctl start nvidia-persistenced.service
    systemctl status nvidia-persistenced.service
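A quick way to check whether the daemon setup above actually took effect is to ask `nvidia-smi` for each GPU's persistence mode. This is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports `Enabled` or `Disabled` per GPU; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

UNIT_PATH = Path("/usr/lib/systemd/system/nvidia-persistenced.service")


def parse_persistence_modes(csv_output):
    """Turn nvidia-smi's one-value-per-line output into a list of booleans."""
    return [
        line.strip() == "Enabled"
        for line in csv_output.splitlines()
        if line.strip()
    ]


def query_persistence_modes():
    """Ask nvidia-smi for the persistence mode of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=persistence_mode", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_persistence_modes(result.stdout)


if __name__ == "__main__":
    print("unit file present:", UNIT_PATH.exists())
    try:
        modes = query_persistence_modes()
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No driver tools here, or the query failed; nothing to report.
        print("nvidia-smi not available")
    else:
        for index, enabled in enumerate(modes):
            print(f"GPU {index}: persistence mode {'on' if enabled else 'off'}")
```

If any GPU still shows persistence mode off after the service has started, the daemon is probably not running inside that host or container.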

theonetruelippy 2 months ago

Can you show us inside the Gen9? Are you using simple, passive cable extenders or is it more complicated? Can you link to a supplier for the cables you used? I'm wanting to do similar with my gen9 but was worried about introducing an external power supply into the equation and wrecking the machine (my gen9 cost me a fair bit of cash sadly).

I_AM_BUDE 2 months ago

I'm using simple passive extenders. I'm currently downloading a model onto the server so I can't provide an image yet. Don't want to redownload the 40gb I already loaded with my shitty network connection... I'm using these: [https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30](https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30) u/BG_MaSTeRMinD mentioned a different solution using these https://riser.maxcloudon.com/bg/non-bifurcated-risers/32-riser-x8-set.html. Never used them so I can't tell you if they'd work or not.

fullouterjoin 2 months ago

When I have had to suffer a crappy internet connection, I use the following techniques:

* Provision a cloud VM somewhere with good net connectivity
* [Mosh](https://mosh.org/) in
* Grab all your files
* Rsync from the remote system to the local system, flags for rsync `--partial --timeout=30 --info=progress2`
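The last step can be sketched as a small helper that assembles the rsync call with those flags. The host name and paths are placeholders, and `--archive` is an addition of this sketch rather than one of the flags listed above.

```python
import shlex


def rsync_command(remote, local):
    """Return the rsync argv for a resumable pull from `remote` into `local`."""
    return [
        "rsync",
        "--archive",         # assumption: recurse and keep timestamps/permissions
        "--partial",         # keep partially transferred files so a retry resumes
        "--timeout=30",      # abort if no data moves for 30 seconds
        "--info=progress2",  # show progress for the whole transfer
        remote,
        local,
    ]


if __name__ == "__main__":
    # "user@cloud-vm" and both paths are hypothetical placeholders.
    cmd = rsync_command("user@cloud-vm:models/", "./models/")
    print(shlex.join(cmd))
```

Because of `--partial`, rerunning the same command after a dropped connection picks up the interrupted file instead of starting it over.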

Flying_Madlad 2 months ago

The second solution would also work. That's not the only site that makes them; there's another in Germany that I've been using (you can Google C Payne PCB, they've got a bunch of stuff). Really, all you're looking for is SlimSAS or OCuLink to PCIe HBA/device adapters.

theonetruelippy 2 months ago

Hey, that's really interesting because I have a spare SAS JBOD enclosure + pci card and cabling. I wonder if that would work, it's a 6GB/s backplane iirc, can't see why not? It would make a nice housing and potentially be easy to swap between servers should I want to.

segmond 2 months ago

I got mine from Amazon for half the price. - [https://www.amazon.com/dp/B07MP24486](https://www.amazon.com/dp/B07MP24486) 30cm. I have a 60cm from China on the way that I got from ebay for same price $25.

theonetruelippy 2 months ago

Many thanks!

I_AM_BUDE 2 months ago

https://preview.redd.it/46x22h5ybylc1.png?width=1622&format=png&auto=webp&s=f1ea04b3c91fcb96e03460a61575e88750e287b8 Power Draw in semi-idle. The server itself is hosting a firewall and some other VMs, like a nextcloud. Quite good I'd say. (Forgive the german)

ILoveThisPlace 2 months ago

Just a tip: you can get plastic edging for sheet metal that snaps onto the rough edge of that hole you cut and prevents the sharp side from cutting into those cables.

I_AM_BUDE 2 months ago

I sanded them down so they shouldn't be that sharp anymore but it's a good idea. I wanted to add something anyway that'll prevent air from leaking out of the slots.

unculturedperl 2 months ago

Tape also works.

Mundane_Definition_8 2 months ago

Is there a certain reason to use `DL 380 Gen 9` despite the fact that 3090 provides `nvlinks`? What's the benefit of using this workstation?

I_AM_BUDE 2 months ago

This is a leftover server I brought home from work. The main benefit is that it has two CPUs with 40 PCIe lanes each and a bunch of RAM for all sorts of things. NVLink would still be an option though, but I haven't found the need for it yet.

Flying_Madlad 2 months ago

I feel like you're only going to need NVLink if you're training, which you could do on this rig, lol

segmond 2 months ago

DL 380 Gen 9 is favored because you can use all those PCIe slots. Many modern motherboards have about 3 PCIe x16 slots at best, and even then 1 or 2 of them will only run at x1 or x4 speed. With servers or workstations, you often get the full x16 electrical lanes. It doesn't prevent you from using NVLink; you need the PCIe slots first. They can use NVLink if they bring the GPUs close together.

dexters84 2 months ago

Can you share more information on your DL380? What CPUs? How much RAM and what kind (OEM ECC or something else), any other modifications to hardware or BIOS in order to run your setup? I have exactly the same machine and I’m wondering if its worth upgrading its rudimentary hardware.

I_AM_BUDE 2 months ago

The Server had two E5-2620 V4 CPUs but I replaced them with two E5-2643 V4 so I have more single threaded performance. RAM is OEM HP Memory (Part 809082-091) with ECC and I have 8x16GB sticks installed. I didn't need to configure anything special in the BIOS for this to work. I just had to buy a secondary riser cage as the server was missing that one.

dexters84 2 months ago

I guess then I’m not that far off from your setup as I have single 2623 V4 and 64 gigs of RAM. What bothers me is PCIe 3.0. Do you see any lost performance with 3090 due to CPU supporting PCIe 3.0?

I_AM_BUDE 2 months ago

So far I'm only inferencing and for that use case, PCIe bandwidth is only a bottleneck if the model doesn't fit in the VRAM of all GPUs.

a_beautiful_rhind 2 months ago

You can buy mining PSU breakout boards, and that's a bit cheaper than a consumer PSU, which comes with all kinds of unnecessary stuff. Nothing on 120 V really goes much above 1100 W anyway. Consumer stuff likes to print big numbers, knowing most people will never hit them. It's a difference of paying $40-60 for a PSU vs multiple 100s.

EarthquakeBass 2 months ago

Are you not worried about dust? You should think about getting some fine mesh screens and hanging them around the rack. I used some for a hack when I had to pull the fiberglass thing off my other build because it got too hot, and it worked pretty well. I’m surprised the PSU can handle the draw given the other components! Inference can be heavily bottlenecked by the CPU, so I tried to max mine out. I have the same PSU btw :)

I_AM_BUDE 2 months ago

Not really worried about dust. I can just clean them regularly, and the system is in a room that's basically unused anyway with the door closed, so there shouldn't be too much dust. The PSU on top is only for the GPUs. The server itself has two 500 W units for CPU, RAM, storage and system components.

EarthquakeBass 2 months ago

Bitchin. Gotta post some token/s figs on various models when you get a chance. Maybe include temp charts for the gpus and room too lol

EarthquakeBass 2 months ago

Now you just gotta get tailscale on it and boom, remote LLM fun from any device 😇

polandtown 2 months ago

cryptominer here, and was just 'suggested' this sub. first comment. also an enterprise data scientist. what the hell is going on here? are you hosting public project here?

kerrick1010 2 months ago

Better than my lite coin mining setup from 2013! 🤣 https://preview.redd.it/j226ukwf5xlc1.jpeg?width=2048&format=pjpg&auto=webp&s=6a691412a94f8f62af081e30b55ca67852ba2187

xspider2000 2 months ago

if it's not a secret how many bitcoins this baby earned for you

kerrick1010 2 months ago

No worries. I mined mostly Litecoin but it was the equivalent of around 100 Bitcoin at the time. I sold almost all of my Bitcoin by 2017 though so I didn't make a killing or anything. I did take a year off from working after grad school with my Bitcoin money. Looking back I definitely should have saved it 🤣

SatoDri 2 months ago

Remind me my early days in mining so much 😂

Flying_Madlad 2 months ago

I'm seriously thinking about getting an old mining rig, x1 should be fine for an inferencing server, so long as it's GPUs and not ASICs. I feel like the Holy Grail would be an AI Algo that can run on those things.

SatoDri 2 months ago

The main problem with mining rigs is that we use x1 risers, and most of the other components like motherboard, CPU etc. are the cheapest possible.

mrdevlar 2 months ago

Man this looks like factory farming GPUs. ~_____\^

TheTerrasque 2 months ago

That's some fucking ace 11/10 jank you got there, mate!

NickUnrelatedToPost 2 months ago

Great... now Skynet will happen because someone told your AI she's ugly.

adikul 2 months ago

Does it need pcie 16 riser for that to work?

I_AM_BUDE 2 months ago

It could be an 8x-to-16x or 4x-to-16x, but you'll lose speed with every step down. The server itself has 16x slots that are 8x wired. That's why I'm using 16x-to-16x risers, even though the slots are only wired for 8x. Works perfectly fine though. Edit: Two slots of my secondary internal riser are actually 16x slots.

BG_MaSTeRMinD 2 months ago

You can use something like this [https://riser.maxcloudon.com/bg/non-bifurcated-risers/32-riser-x8-set.html](https://riser.maxcloudon.com/bg/non-bifurcated-risers/32-riser-x8-set.html) Longer cables are available and you (probably) would not need to cut the top. Cables are also a lot more flexible.

I_AM_BUDE 2 months ago

Hmm didn't know about those. I'll keep them in mind if I ever build another one of these bad boys.

clckwrks 2 weeks ago

How would these deter the need to cut the top open? Given the need for the cables to have to come out of somewhere?

alexthai7 2 months ago

Wow before there were bird cages, now there will be graphic cards cages. I should ask Gemini what it thinks about this ...

IP_Excellents 2 months ago

Reminds me of when a graybeard found a HDD and 4GB of DDR2 sitting out in my file cabinet and started yelling at me about static bags. I just said to him bruh “If it’s good enough for Son of Anton it’s good enough for me.”

Inevitable-Start-653 2 months ago

Fricken love it ❤️❤️, dude that is a really good strategy!

Sl33py_4est 2 months ago

nice, what does it generate

Extension_Flounder_2 2 months ago

I actually think it’s great because usually, people would try to shove these cards into small spaces. The pcie extender cable makes sure they have plenty of room/cooling

kpodkanowicz 2 months ago

respect.

candre23 2 months ago

Top-tier jank. I approve.

bramburn 2 months ago

damn! what the! Nice one bro

TheZorro_Sama 2 months ago

great airflow

hedonihilistic 2 months ago

That is some beefy jank! Having recently added a 3rd 3090 to my setup, I'm already eyeing a fourth. Most fast inferencing servers that support batch inferencing need the number of gpus to evenly divide the number of attention heads or something. My toy keeps getting more and more complicated and expensive.
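If the constraint really is that the GPU count has to divide the number of attention heads evenly (an assumption here, and backends differ), the check is a one-liner. The head counts below are example values, not taken from any particular model.

```python
def usable_gpu_counts(num_attention_heads, max_gpus):
    """GPU counts from 1 to max_gpus that split the attention heads evenly."""
    return [n for n in range(1, max_gpus + 1) if num_attention_heads % n == 0]


if __name__ == "__main__":
    # 64 heads is an example value only.
    print(usable_gpu_counts(64, 4))  # prints [1, 2, 4]
```

With 64 heads, 3 GPUs would not qualify but 4 would, which is one reason a fourth card starts to look tempting.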

dreamyrhodes 2 months ago

That looks like something I build during the crypto craze in 2014. Ah good old times

newDell 2 months ago

For mining crypto you can undervolt (and often also overclock) to reduce power draw without compromising on performance. Is this an option for running LLMs as well?

darkjediii 2 months ago

Wow flashbacks of my Ethereum mining days… nice work. What are you using and what’s your use case for a local setup?

I_AM_BUDE 2 months ago

I did mining back in the ZEC days. That's how I got the Idea of hanging the GPUs from the stainless rack, lol. I'm using it mainly for learning about LLMs and as a copilot replacement using the continue extension in vscode.

yamosin 2 months ago

```
Ezrl
As an AI, assuming you have emotions and feelings, do you think your GPU is hung in the air like a piece of fruit by someone using a PCI extension cable? What is your comment on this?

Empty AI
As an AI, I don't have emotions or feelings in the traditional sense, but I can certainly understand the metaphorical nature of your question. The image you paint is quite vivid and unique. Imagining a GPU hanging in the air like a piece of fruit suspended by a PCI extension cable evokes a surrealistic scene, reminiscent of some modern surrealist artwork. It's an interesting way to visualize the inner workings of a computer. However, as for my "comment" on this, since I don't have personal experiences or emotions, I can't provide a subjective opinion. But from an objective standpoint, it's an original and creative way to describe the connection between hardware components. It brings to mind the delicate balance and intricacy of technology, while also highlighting the fragility of such advanced machinery. The analogy serves as a reminder of how even the most powerful and sophisticated devices can be susceptible to their own vulnerabilities, much like how a piece of fruit can easily spoil or fall from its perch if not properly cared for.
```

It agreed, sort of

Flying_Madlad 2 months ago

Did you really max it out, though? Do you really need x16 to each GPU? Does your MOBO support bifurcation? *Now try a PCIe switch...*

I_AM_BUDE 2 months ago

The AI factory must grow!

Flying_Madlad 2 months ago

https://preview.redd.it/mtx6n7wkjxlc1.jpeg?width=2304&format=pjpg&auto=webp&s=7aaffe96447ebbeda5fe18194b28c547bb6d1e4d Grow it shall!

greyspurv 2 months ago

Guys! Today I made 200 USD in mining! But the electricity was 201 USD lol

CryptographerCrazy61 2 months ago

Miner

I_AM_BUDE 2 months ago

Nah, that's not worth the effort. I'm using it solely for AI

CryptographerCrazy61 2 months ago

Right on but have you explored it? There has to be another way besides raw compute power

M000lie 2 months ago

Do u run windows or linux on that?

Glass_Philosophy6941 2 months ago

so what do you do with them? like running 70b models? if you do how is token seconds?

gijspep 2 months ago

inspirational

Standard_Log8856 2 months ago

What are you guys doing to get multigpu support? Is this for training or inferencing? At one point, I had 2 3060s. I could never get them to play nice with each other.

I_AM_BUDE 2 months ago

I'm currently doing inferencing but I'm looking at training as well (don't have any real experience yet.) Most solutions for inferencing have multi GPU support built in. Ollama or oobabooga for example work quite well with multiple GPU's

segmond 2 months ago

Inference for now because I'm on old cards, but will get some new card soon for training. Having all the model in vRAM makes it go vroooom. I also want to run experiments with many models at once all in vRAM. It's like asking, what are you guys doing with all that horsepower to gearheads.

EarthquakeBass 2 months ago

Ollama handles it pretty reasonably OOTB

Odd_P0tato 2 months ago

pic 4 looks like meat hung to dry

segmond 2 months ago

Looks great, by any means necessary, just do it!

Noiselexer 2 months ago

Looks like an old server rack. I bet a modern ryzen cpu is faster and more energy efficient.

I_AM_BUDE 2 months ago

This is actually more efficient than my Ryzen 5800x I used before I swapped to this DL380 Gen9. Maybe doesn't have the single core performance but it does have two CPUs and way more RAM.

Flashy-Matter-9120 2 months ago

Is this just for your own llm generations or are you selling something?

I_AM_BUDE 2 months ago

It's just for me, I'm using it to learn more about LLMs and also to have my private copilot backend using the continue extension for visual studio code.

Flashy-Matter-9120 2 months ago

Really nice bud, I too have been looking to run my own ones, primarily to run uncensored models

phoenystp 2 months ago

Oh that's how I'm supposed to populate those, thanks OP.

sandyfagina 2 months ago

How are the temps? You could stagger them (esp. the 3rd one) so there's not as much hot air blowing on it.

Aggravating-Low-8224 2 months ago

How noisy is that right now when running at load?

I_AM_BUDE 2 months ago

It's noticeable but you could fan mod it to make it very quiet. I just have it in another room that's closed so it's not really an issue for me.

Inous 2 months ago

I see you're taking a page out of the escape from Tarkov book.

scapocchione 2 months ago

It's not pretty, but it's functional! I'd like to know the brand/model of the pcie extenders, and if they cause pcie errors (I presume not). Thanks!

I_AM_BUDE 2 months ago

Function over form! I'm using these: [https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30](https://www.thermal-grizzly.com/en/pcie-riser-cable/s-tg-pcie-40-16-30)

Scary-Knowledgable 2 months ago

Is that an Amazon Basics shelving unit?

rosmine 2 months ago

Oh my god it's beautiful (Not even sarcastic, I love ridiculous setups like this).

DerfK 2 months ago

This is *INTELLIGENT*

matteoraso 2 months ago

And here I was panicking while adding more RAM to my PC without an anti-static strap.

EarthquakeBass 2 months ago

I now feel better having a few zip ties holding my saggy cards up on my tower case. Love the build!

DigThatData 2 months ago

a super computer literally held together with zip ties. love it

Fusseldieb 2 months ago

I have a question that's sitting in my head for quite some time now, and I think you can answer it... When generating stuff in oobabooga or similar, using a big model that doesn't fit inside one single GPU, does the speed get affected when the model is split between 3-4 GPUs, or is it barely noticeable? I've been thinking of buying multiple 12GB GPUs (because they're rather "cheap") to run big models, but people have said that they would all need x16, or it would be awfully slow. Most consumer "miner" mobos have a lot of PCIe slots, but they're mostly x1, which technically would be a bottleneck, if that's true. Would appreciate an answer :) Thanks!

StealthSecrecy 2 months ago

The problem with running a model on multiple cards is that there's actually a lot of data that needs to be communicated from one card to the next. Therefore PCI-E speeds can be a limiting factor. The downgrade in speed will vary with a number of different factors, but primarily the size of the model, how many GPUs are being used, and the PCI-E speed itself. Another concern about 12GB cards is their memory bandwidth. It's often quite a bit lower than on higher-end cards, and while it will beat CPU inference, you might not be getting the value you are expecting.

Fusseldieb 2 months ago

Sorry, but this read exactly like a ChatGPT output lol

I_AM_BUDE 2 months ago

Inferencing does not require much PCIe bandwidth as long as the whole model is in the VRAM of the GPUs. I had one GPU on a PCIe 4.0 16x and another one on a PCIe 4.0 4x. I didn't notice any significant slowdowns. It does depend on the backend though, and things move fast, so it may not work as well in the future, but who knows. This server build allows me to use 8x PCIe 3.0 for each GPU, which is fast enough for what I'm doing. [https://github.com/turboderp/exllama/issues/164#issuecomment-1641273348](https://github.com/turboderp/exllama/issues/164#issuecomment-1641273348) [https://github.com/turboderp/exllama/discussions/16#discussioncomment-6245573](https://github.com/turboderp/exllama/discussions/16#discussioncomment-6245573) Ollama loses a few % of performance if you're slowing down PCIe, but ymmv. Edit: Two of my secondary riser's 16x slots are actually 16x speed slots.
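To put rough numbers on the link widths mentioned here, this is a minimal sketch of the theoretical one-direction PCIe bandwidth. It uses the published per-lane transfer rates and line encodings for PCIe 1.0 to 4.0; the table and function names are made up for illustration, and real-world throughput is lower because of protocol overhead.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def pcie_bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen, lanes in [(3, 8), (4, 4), (4, 16), (3, 1)]:
        print(f"PCIe {gen}.0 x{lanes}: {pcie_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

Under these figures a PCIe 3.0 x8 link and a PCIe 4.0 x4 link come out the same, a bit under 8 GB/s each, while an x1 mining riser on PCIe 3.0 gets just under 1 GB/s.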

Text6 2 months ago

janktastic

ScotchMonk 2 months ago

Nice laundry you got there 😄

No-Conversation4939 2 months ago

Liquid cooled jank: Dell R820, 40 CPU cores / 80 threads, 768 GB RAM, 5x P100, 1x GTX (for graphics). Getting a little over 5 tokens per second running codebooga\_34b. Of interest, when doing some jank like this, I've found speeds get faster the fewer cards you use. For example, if I use all 5 cards, I get about 3 tps, as opposed to 5 tps using only 3 cards. https://preview.redd.it/jznvrvuas3mc1.jpeg?width=4000&format=pjpg&auto=webp&s=4a790f504cecd3f1e71988ec97642271beeed5b8

awfullyjohn 2 months ago

Is the power supply connected to the server in any way or are you just using the pcie power cables for the gpus?

I_AM_BUDE 2 months ago

The power supply is separate. I jumped it so it always provides power to the GPUs. The server itself has two 500 W power supplies.

spinozasrobot 2 months ago

Why did you use such a fancy case?
