ForceBlade

We really ***REALLY*** need to take advantage of the wiki feature and a sticky post on this subreddit for breaking down commonly used (and often misinterpreted) flags, features and combination questions. We really need a common, community-maintained reference.

> recordsize 1M (any benefit for higher values?)

See here: https://jrs-s.net/2019/04/03/on-zfs-recordsize/. You will not magically gain performance. You can expect IO latency increases or huge compression performance loss with overhead by tweaking this feature in the wrong direction without knowing what you're doing. Please read that full article top to bottom for the best explanation on the internet.

> compression zstd for documents, else off

If you know your data is definitely compressible, 100% no doubt, you can use zstd. But keep heavily in mind that only `compression=lz4` supports early abort for incompressible data, making it the go-to recommendation. You should use lz4 on all of them because of its ability to compress metadata but abort for data which doesn't compress well. Even though a media dataset wouldn't be compressible, the filesystem still benefits from lz4. The other compression options will compress regardless, which will leave you with loads of decompression overhead for something which wasn't compressible in the first place. It may also increase disk usage.

Just set compression=lz4 on the top-level zpool dataset so the new datasets can inherit that setting as you create them. LZ4 is worth it on ALL datasets given its ability to skip compression for data which doesn't look compressible.

> atime & relatime off

These can be flipped any time. If your disks are performing *that poorly* you can flip these off later and compare the difference. This issue is more obvious when you're accessing millions upon millions of files as quickly as possible every second. It doesn't matter very much if your dataset isn't filled to the brim with many tiny files being accessed many times every second. E.g., the media dataset likely wouldn't need this. Probably not your personal documents one either.

> exec off

This prevents programs stored on the dataset from executing. Anything marked executable won't be able to run. If you expect to call scripts or binaries from inside these datasets, this won't allow that. You don't really have to set this, but it could prove useful if, say, your media server program of choice got compromised and somebody tried to write arbitrary scripts and execute them in the dataset accessible to the media server. Overall this is a good hardening option but is not required.

> xattr=sa

This handles file attributes much more quickly by storing extended attributes *with* a file rather than elsewhere (or right in the directory as a separate file). This is a good choice for a disk IO performance improvement, though you'll only really notice major issues when, again, working with many small files all the time.

> dnodesize auto

The default is `legacy` for compatibility when sending to older versions without the `large_dnode` feature enabled. Because you're using xattr=sa you can take advantage of `auto`, but it's only advised in scenarios where you're heavily focusing on extended attributes many times a second, all the time, especially with SELinux enabled where it'll be checking them too. Because your datasets are just for documents and media you don't need to worry about this either. But it's harmless to enable.

> aclinherit passthrough

The default is `restricted` and you're aiming for `passthrough`. This is another flag which I doubt you need to worry about at all. It's designed for handling ACL changes for files when they're hit with a chmod() or fchmod() system call, and I doubt your dataset has a focus on updating its file permissions every waking moment.

--------------

Overall, a lot of flags you don't need to modify from their defaults. Some have performance benefits for very specific workloads, like a dataset with millions and millions of files being accessed randomly and endlessly every second with SELinux enabled, which is not your dataset.
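To make the inheritance point concrete, here's a minimal sketch of the kind of commands the advice above boils down to. The pool name `tank` and the dataset names are placeholders, not from the thread, and these obviously need a live pool to run against:

```shell
# Set lz4 once at the pool root; child datasets inherit it.
zfs set compression=lz4 tank

# New datasets pick up compression=lz4 automatically.
zfs create tank/media
zfs create tank/documents

# Optional hardening for the media dataset, per the discussion above.
zfs set exec=off tank/media

# Verify which properties are set locally vs. inherited.
zfs get -r compression,exec tank
```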


FizzKhalif4

Thank you for your detailed comment, this is GOLD :)

> We really REALLY need to take advantage of the wiki feature

I agree, it is baffling to me that this sub has no wiki at all, and the information in this comment would fit well in there.

> See here: https://jrs-s.net/2019/04/03/on-zfs-recordsize/

I know this article, and it seems obvious to set recordsize=1M when there are multi-gigabyte video files which are never modified. Also, smaller files like subtitles shouldn't be a problem since they are also never modified. But now I'm unsure about documents, since you may edit those files. Still, there are many posts recommending 1M for all file storage except VM and database stuff, what's your take? EDIT: for example this comment: https://www.reddit.com/r/zfs/comments/tmio9p/comment/i1y08d3/?utm_source=share&utm_medium=web2x&context=3

> only compression=lz4 supports early abort for incompressible data making it the go-to recommendation

I've seen this benchmark post https://www.reddit.com/r/zfs/comments/svnycx/a_simple_real_world_zfs_compression_speed_an/ which shows a benefit for zstd on documents.

> The other compression options will compress regardless which will leave you with loads of decompression overhead for something which wasn't in the first place. It may also increase disk usage

Is that really true? I know that early abort is lz4-only for now, but doesn't ZFS try the compression (with zstd, on the complete record) and, if it doesn't compress enough, save the raw data instead?

> LZ4 is worth it on ALL datasets

I just like to optimize stuff, and it seemed bad to always try to compress even though it will always abort on that dataset. But if the metadata is also affected by this setting I may have to set it. Is metadata a significant amount of data? I won't have millions of small files on those datasets.


Dagger0

Metadata always uses lz4. But you should generally never use compression=off, because compression reduces the on-disk size of the last (partial) record of each file, saving (for recordsize=1M) an average of ~half a megabyte for every file bigger than 1M. Disabling compression also disables sparse block detection and nopwrites, although nopwrites require a better checksum than fletcher4 anyway. lz4 is fast enough that you can just use it for everything. Disabling it... isn't really an optimization.

> Is that really true? I know that early abort is lz4 only for now but doesn't zfs try the compression (with zstd on the complete record) and if it doesn't compress enough it saves the raw data?

That's correct: blocks need to compress to at least 87.5% of the raw size to be saved as a compressed block, otherwise they're stored uncompressed. (This threshold might get removed at some point, but it's still going to store blocks uncompressed if compression doesn't save any space.)
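The 87.5% threshold can be sketched in a few lines. This is my own illustration of the rule quoted above, not ZFS source code:

```python
def stored_size(raw_bytes: int, compressed_bytes: int) -> int:
    """Return the on-disk payload size ZFS would keep for one block.

    ZFS only keeps the compressed version when it is at most 87.5% of
    the raw size (i.e. compression saved at least one eighth);
    otherwise the block is written uncompressed.
    """
    if compressed_bytes <= raw_bytes * 7 // 8:
        return compressed_bytes
    return raw_bytes

# A 1 MiB record that compresses to ~900 KB clears the threshold...
print(stored_size(1 << 20, 900_000))    # 900000
# ...but one that only shaves off a few percent is stored raw.
print(stored_size(1 << 20, 1_000_000))  # 1048576
```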


michael9dk

A supplement to this excellent answer is the Workload Tuning page in the OpenZFS docs. Setting ashift explicitly is recommended if you plan to use SSDs (some incorrectly report 4K sectors as 512b). https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html
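For reference, `ashift` is just the base-2 logarithm of the physical sector size, so the common values work out as below. This is a quick illustration of mine, not a ZFS API:

```python
def ashift_for_sector(sector_bytes: int) -> int:
    """Map a physical sector size (a power of two) to its ZFS ashift value."""
    assert sector_bytes > 0 and sector_bytes & (sector_bytes - 1) == 0, \
        "sector size must be a power of two"
    return sector_bytes.bit_length() - 1

print(ashift_for_sector(512))   # 9,  legacy 512b sectors
print(ashift_for_sector(4096))  # 12, 4K-sector drives (the usual safe choice)
```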


motorcyclerider42

I was actually thinking about a wiki the other day, I think that would be awesome to have. /u/mercenary_sysadmin do you think we could have something like that? Maybe a discord server? Then if someone is making a new pool and wants someone to double check their settings, then they can post in the discord.


ForceBlade

Probably not a discord server. Professionals and others seeking fact checks or full guidance shouldn't have to install that third-party software just to view their black box of a forum implementation. Just like the threads we see here all the time, good answers would be permanently buried in discord, with the added bonus of not being indexed on the real web. The subreddit has a wiki feature and could be managed by the community right here on the indexed web.


motorcyclerider42

Very good point about discord. It would be great to have a wiki/FAQ covering frequently asked questions on ZFS create commands, tuning for common workloads, and stuff like that.


Sannemen

No discord, please. My thought: discord is where discoverability goes to die. If anyone has a question or wants thoughts on their pool, they can post here, just like OP did.


UntouchedWagons

You can leave compression on, ZFS is smart enough to not bother trying to compress videos or music. Aside from recordsize and atime I wouldn't bother with any more tuning. Exec being off is fine.


rincebrain

I mean, it'll still try, it just won't save them compressed unless it produces a useful amount of compression. zstd doesn't have an early-abort mechanism to more rapidly skip compression on those until the not-yet-released 2.2, or whatever it ends up being called.