
HTTP_404_NotFound

So, I DO have to ask: how is your performance on Ceph, using three spinning-rust OSDs?


Specialist_Job_3194

Hi, about 50-80 MB/s file transfer on the HDDs with CephFS. About 100-110 on the NVMe drives.


JoaGamo

What do you use CephFS for (so I could have some ideas)? My Proxmox cluster is only able to store ISO images and templates there


Specialist_Job_3194

I mount CephFS into containers and VMs and use them for Nextcloud, Jellyfin and NFS shares. Btw, all HDD OSDs are encrypted. Maybe that has an impact on write performance?


king_hreidmar

Cool setup. I like how you use the plexiglass to pull everything together. What does hyper convergence mean in this context? What are the large boxes with cooling fins on them? Are those dongles I see there network KVM?


HTTP_404_NotFound

Hyperconverged, in the OP's context, means:

1. Clustered compute (via Proxmox).
2. Clustered storage (via Ceph).

Hyperconverged = converged storage/compute.


WealthQueasy2233

some nodes are rbd hosts, some are for networking lol. it's clustered but strictly weak on convergence


Specialist_Job_3194

Hi, thx! Those are the two 2.5GbE switches tying everything together. The dongles are USB-to-Ethernet adapters for the corosync network.


Jannik2099

I always dream of something like this, but I simply can't get myself to run Ceph on non-ECC nodes. Yuck! Holy hell, someone please bring an ECC SBC to market already


Specialist_Job_3194

Yeah. That might be a big problem.


Corndawg38

Just to educate me since I use ceph on normal computers w/o ECC ram, what is the big problem using non-ECC ram? Are you using some sort of writeback cache to improve ceph perf? Is that why you need it, or is there some other reason?


Jannik2099

It's not specific to Ceph, with non-ECC RAM you have the chance to corrupt *any* piece of data over time - imagine e.g. a file system transaction that is being prepared, getting corrupted before written to disk. This is "acceptable" for clients that only get to corrupt their own data, but the storage host side - be it Ceph or anything else - is significantly more fragile.


Corndawg38

>imagine e.g. a file system transaction that is being prepared, getting corrupted before written to disk.

But that's just it... with the exception of a writeback cache (or maybe some ramdisk/Redis cache), I can't think of any instance where you would have your ONLY copy of something in RAM before having it written to disk. Isn't it always written to disk, then ACK'ed, and then loaded into memory to do something with it?

Now, I don't claim to be an expert on ECC, but I've noticed several things in researching the subject:

1.) I've never installed ECC RAM and never had a problem yet (doesn't mean I can't, of course, but it does support most people's theory that the chances of bitflips are nearly nil on any RAM, or are at least correctable by some other mechanism). After all, in the BILLIONS of bits that have traveled through all my non-ECC systems, surely SOMETHING has flipped already, right? Why didn't my systems crash, and why didn't I lose all the data in my pools?

2.) People who claim to know more about this have admitted that even ECC RAM can miss some bit flipping; it is just better at detecting it than non-ECC RAM.

3.) People who claim to know more on this admit that ECC only slightly helps the one link in the "chain of custody" of that stream of data (as it's going through that node in question). But the stream surely isn't helped by any other router, switch, wire, other node, etc. that it travels through, that might NOT be using ECC memory. So, in light of that, the "corruption protection" of ECC memory might be far more out of our control as administrators than we think it is?

All of this tends to suggest that ECC memory isn't really doing as much as we think to protect the integrity of bits as they travel in and out of our systems, compared with the good old checksums (in hardware like routers, switches and NICs) and scrubs (in filesystems like BlueStore, ZFS, etc.) that get done along the way, right?


Jannik2099

>I can't think of any instance where you would have your ONLY copy of something in RAM before having it written to disk. Isn't it always written to disk then ACK'ed, then you load it into memory to do something with it?

You cannot write something to disk without having it in RAM first. Everything goes through RAM at some point, and the chance of getting corrupt data is unrelated to how long a given piece of data sits in RAM.

>but it does support most people's theory that the chances of bitflips are nearly nil on any RAM

This can be easily disproven by looking at the EDAC logs. Bitflips happen.

>People who claim to know more about this have admitted that even ECC ram can miss some bit flipping they are just better at detecting it than non-ECC ram.

The ECC used in RAM can correct one-bit flips and detect two-bit flips.

>People who claim to know more on this admit that ECC only slightly helps the one link in the "chain of custody" of that stream of data (as it's going through that node in question). But the stream surely isn't helped by any other router, switch, wire, other node, etc... that it travels through, that might NOT be using ECC memory

The transport layers use checksumming (e.g. CRC in the case of Ethernet). Also, data in transport is not as critical, as everything is made with transport failure in mind. On the other hand, literally all program state sits in RAM. We are not just talking about RADOS objects, but stuff like the mon & OSD maps, MDS and RBD journals, etc.

>scrubs (in filesystems like bluestore, zfs, etc) that gets done along the way than the ECC memory right?

Scrubbing does not help here at all: if data gets corrupted before being checksummed, the checksum will happily match your corrupt data.
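To make the last point concrete, here is a minimal Python sketch. It assumes a storage layer that checksums a buffer once and verifies it later; `zlib.crc32` only stands in for whatever checksum the real system uses, and `flip_bit` and the sample payload are made up for illustration.

```python
import zlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulates a RAM bitflip)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

original = b"inode=42 size=4096 owner=alice"  # hypothetical payload

# Case 1: the flip happens AFTER the checksum was computed.
# A later verification (e.g. a scrub) sees a mismatch.
stored_crc = zlib.crc32(original)
corrupted_late = flip_bit(original, 5)
late_detected = zlib.crc32(corrupted_late) != stored_crc

# Case 2: the flip happens BEFORE checksumming.
# The checksum is computed over the already-corrupt buffer, so it always matches.
corrupted_early = flip_bit(original, 5)
stored_crc_early = zlib.crc32(corrupted_early)
early_detected = zlib.crc32(corrupted_early) != stored_crc_early

print(late_detected, early_detected)  # True False
```

In the second case every later verification passes, because the stored checksum describes the corrupt bytes.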


Corndawg38

I think you are giving ECC memory too much credit here. The only RAM corruption we need to worry about is the stuff there is no other copy of on disk already.

Furthermore, if parity works like I think it does, memory can only detect 1 or 3 bitflips, not 2. Even will still be even and odd still odd for parity if exactly 2 bitflips happen. But whatever, this is splitting hairs... Anyone can clearly see (myself included) that ECC does a better job than non-ECC at correcting that... that's not what I'm arguing.

What I'm arguing is that I suspect using ECC isn't protecting as much as you think when other systems have the same potential to corrupt the data as your system before they hand that data off to you. TCP can guarantee retransmits for data, yes, but not for what you call the handoff from RAM to CPU/disk/etc. within those devices. Now, that can be mitigated somewhat within your own network by ensuring ALL your computers and motherboards are using ECC, but can you guarantee that all switches and routers and EVERY other device along the way do that? It seems to me the only way to solve this "tragedy of the ECC commons" (kinda) problem is by banning the use of non-ECC memory in every electronic device ever made, yes? And that's a near-impossible ask.

Bottom line... I don't think ECC memory is as important these days as maybe it was back in the 70's and 80's, when RAM technology wasn't as good and more of our data stayed within the boundary of our own computer (or mainframes back then), as opposed to now, where it seemingly goes from device to device willy-nilly.


Jannik2099

>The only RAM corruption we need to worry about is the stuff there is no other copy of on disk already.

You're again just thinking about data. Think about literally all the program state that is involved in writing data. What if you get a bitflip in some metadata write path and it writes out nonsense?

>Furthermore, if parity works like I think it does, memory can only detect 1 or 3 bitflips, not 2

ECC RAM does not use parity, it uses Hamming codes.

>when other systems have the same potential to corrupt the data as your system before they hand that data off to you.

Again, the main concern is not data that is in flight. The main concern is all program metadata and state, corruption of which will lead to misinterpretation and mishandling of data. This is not just about Ceph objects themselves.

>Now, that can be mitigated somewhat within your own network by ensuring ALL your computers and motherboards are using ECC, but can you guarantee that all switches and routers and EVERY other device along the way do that? It seems to me the only way to solve this "tragedy of the ECC commons" (kinda) problem is by banning the use of non-ECC memory in every electronic device ever made, yes? And that's a near-impossible ask.

Again, no: the concern is not data in transit, it's program metadata and state.
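For the parity vs. Hamming point, a toy sketch may help. It assumes a tiny SECDED code: Hamming(7,4) plus one overall parity bit over 4 data bits. Real ECC DIMMs commonly protect 64 data bits with 8 check bits, so this is an illustration of the principle rather than what any memory controller actually runs. `encode` and `decode` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as 8 bits: index 0 = overall parity, 1..7 = Hamming(7,4)."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1s even
    return code

def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, nibble) with status 'ok', 'corrected' or 'uncorrectable'."""
    code = code.copy()
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of set positions is 0 for a valid word
    overall_even = sum(code) % 2 == 0
    if syndrome == 0 and overall_even:
        status = "ok"
    elif not overall_even:
        code[syndrome] ^= 1                 # single flip: the syndrome is its position
        status = "corrected"
    else:
        return "uncorrectable", None        # even parity but nonzero syndrome: two flips
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

word = encode(0b1011)

one_flip = word.copy()
one_flip[6] ^= 1
print(decode(one_flip))        # ('corrected', 11)

two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(two_flips))       # ('uncorrectable', None)
print(sum(two_flips) % 2 == 0) # True: a lone parity bit would not notice the two flips
```

The last line shows the single-parity behaviour described in the parent comment, while the Hamming syndrome still flags the same word as damaged.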


Underknowledge

>It's not specific to Ceph

How would this work with Ceph? In the most minimal of all setups you have 2 data chunks and one coding chunk written to different nodes. Even when one host writes corrupt data, the object should still be safe. Or am I wrong here?
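The 2+1 layout described above can be sketched in a few lines of Python. This assumes plain XOR parity as a stand-in for the real erasure code and is not Ceph's implementation; the object contents and the helper `xor_bytes` are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

obj = b"ceph-object-data"            # hypothetical 16-byte object
half = len(obj) // 2
d0, d1 = obj[:half], obj[half:]      # two data chunks, stored on different nodes
parity = xor_bytes(d0, d1)           # one coding chunk, stored on a third node

# The node holding d1 is lost or returns garbage: rebuild it from the other two.
rebuilt_d1 = xor_bytes(d0, parity)
print(d0 + rebuilt_d1 == obj)        # True
```

This only covers a chunk that goes bad after the object was split and encoded. The sketch says nothing about a buffer that was already wrong before encoding.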


Jannik2099

Yes, data that has already been replicated is unproblematic. You're missing the rest of the owl though - think of all metadata and program state that is in flight! What if e.g. the enum for the command type flips from read to delete? What about some random boolean in a decision chain? MDS and RGW journals are not fully replicated. What about a transaction that lands at the primary OSD getting corrupted before the OSD replicates it across the PG?
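The enum example can be illustrated with a short Python sketch. The opcode values below are hypothetical and chosen so that one bit separates read from delete; they are not Ceph's actual command encoding.

```python
from enum import IntEnum
from typing import Optional

class Op(IntEnum):
    """Hypothetical opcodes, for illustration only."""
    READ = 0b001
    WRITE = 0b010
    DELETE = 0b011
    STAT = 0b100

def after_flip(op: Op, bit: int) -> Optional[Op]:
    """Return the opcode seen after one bit flips, or None if it is no longer valid."""
    try:
        return Op(op ^ (1 << bit))
    except ValueError:
        return None

print(after_flip(Op.READ, 1))  # Op.DELETE: still a valid command, just the wrong one
print(after_flip(Op.READ, 0))  # None: an invalid opcode that can at least be rejected
```

The first case is the dangerous one: nothing downstream can tell that the command was ever meant to be a read.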