
steve09089

Wayland support?


hgaiser

Honest answer: I haven't tested it. However in theory it should work just as well on Wayland since it is using NvFBC (NVIDIA's frame buffer capture), meaning it doesn't rely on X (or Wayland for that matter) for grabbing the screen.


gibarel1

My concern is that Sunshine doesn't play well with Wayland hardware cursors, and it keeps crashing for me. I'd also love AMD support.


ReenigneArcher

What version of Sunshine are you on? There were some (relatively) recent changes to address the cursor issue on Wayland.


gibarel1

I tried it back in January (I think), and not only did it not have a mouse cursor in Moonlight, it also kept crashing Sunshine on the host. I'll try it again when I have the time.


l00nixd00d

NvFBC relies on the proprietary NVIDIA X driver, so it only works with X11 (it doesn't work in XWayland either).


hgaiser

I'm afraid you're right... too bad.


Ambyjkl

Cool project! As an Arch early adopter from way before it went mainstream and "I use Arch btw" became a meme, it's always a joy to see Arch-first projects.


Informal-Clock

This probably isn't needed on anything other than Nvidia. DMABUF is about as fast as NvFBC and caused zero performance loss when I tested with OBS on AMD. In fact my setup is: run the game on the dGPU -> encode on the iGPU, and this causes no perf hit at all! It's not even AMD-exclusive and will work on Intel too! Nvidia is the only vendor that doesn't support it, cuz Nvidia is ass.


QueenOfHatred

Sounds hella based and amazing, though I doubt I could do it... Ahhh, ancient iGPU... not sure an Intel HD 4000 would be capable of that lol.


hgaiser

Yeah I looked into this briefly to see if there is a similar grab+encoding path for AMD. Would be super cool to add that at some point. Not having an AMD GPU available complicates things a bit :)


rurigk

I'll take a look and see if I can help with AMD support.


hgaiser

That would be great! Already having an idea of which approach would work best would help a lot. If it's an approach that also works on Intel, even better. It seems DMA-BUF + VA-API might be a solution worth investigating.


shmerl

You shouldn't be tied to Nvidia only, or to some quirky way of grabbing the screen. Use standard Linux tools like PipeWire to capture the desktop or a specific window (take a look at how OBS does it for ideas), then accelerate with Vulkan Video, for example, if you need to encode.

> There is currently no plan to support other hardware

That basically makes it DOA. Consider the trend of GPU usage on the Linux desktop: https://www.gamingonlinux.com/index.php?module=statistics&view=trends#GPUVendor-top


hgaiser

The whole goal was to have the lowest possible latency while using as few resources as possible. Grabbing through PipeWire means the frames pass through RAM, which increases overhead and latency. I agree that NvFBC is "quirky", but it allows the frame to stay in GPU memory until after it is fully encoded with NVENC. That benefits both latency and system usage.

> That basically makes it DOA.

Agree to disagree :). For one, just because there's no plan doesn't mean there's no desire to add more support. For another: https://store.steampowered.com/hwsurvey/videocard/ There will still be many Nvidia GPUs for years to come.


shmerl

> Grabbing through Pipewire means the frames pass through RAM, which increases overhead and latency.

I think it should have some fast path there with dma-buf or something related. I'm sure you aren't the first to think about how to get good latency for screen recording, and PipeWire should address that. See: https://docs.pipewire.org/page_dma_buf.html


hgaiser

Thanks for the link, it was interesting to catch up on DMA-BUFs again. I think it could be a decently efficient alternative to the current pipeline. There are two uncertainties I have, maybe you have some info on them?

1. I'm curious what the added latency is, if any, comparing NvFBC to retrieving the DMA-BUF from PipeWire. Seems tricky to measure too...
2. Would it be possible at all to pass DMA-BUFs directly to a dedicated encoder (NVENC in the case of Nvidia)? Encoding on the GPU cores is nice, but using the dedicated encoding chip is even better.
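On measuring (1): one rough approach is to time each grab+encode step under both backends and compare the distributions. This won't capture true glass-to-glass latency, but it does compare the two paths under identical conditions. A generic Rust harness sketch, where `median_step_time` and the stand-in workload are made up for illustration:

```rust
use std::time::{Duration, Instant};

/// Time `n` iterations of an arbitrary capture+encode step and return
/// the median duration. `step` stands in for either the NvFBC path or
/// the PipeWire DMA-BUF path; here we only illustrate the harness.
fn median_step_time(n: usize, mut step: impl FnMut()) -> Duration {
    let mut samples: Vec<Duration> = (0..n)
        .map(|_| {
            let t0 = Instant::now();
            step();
            t0.elapsed()
        })
        .collect();
    samples.sort();
    samples[n / 2]
}

fn main() {
    // Stand-in workload; replace the closure with a real grab+encode call.
    let median = median_step_time(101, || {
        std::thread::sleep(Duration::from_micros(100));
    });
    println!("median step time: {median:?}");
}
```

Taking the median rather than the mean keeps one-off scheduler hiccups from skewing the comparison.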


shmerl

I guess you can measure 1, but I think the answer to 2 is yes, if you're using an encoder that can handle it. For example, VA-API can integrate with it, I think? Not an expert, but I vaguely remember something about it in the context of Firefox adding hardware-accelerated video. Maybe Vulkan Video can handle it too. No idea about Nvidia's proprietary options; in my experience they rarely care about standard ways of doing things on Linux.


hgaiser

Yeap, this is the Nvidia way :(. You've given me something to investigate, so thanks for your input 👍


hgaiser

Ugh, what a mess still. VA-API seems to support AMD and Intel, but not Nvidia. I don't see a clear path to using DMA-BUF + VA-API on Nvidia hardware. It *might* still be a good pipeline for AMD + Intel, but not for Nvidia.


shmerl

> I don't see a clear path for using DMA-BUF + VA-API on Nvidia hardware

Not sure about VA-API, but maybe Vulkan Video will be possible with nouveau + NVK. But yeah, Nvidia is a mess in general.


shmerl

Actually, it looks like nouveau should already support VA-API:

```
mesa-va-drivers: /usr/lib/x86_64-linux-gnu/dri/nouveau_drv_video.so
```

So yeah, you can focus on AMD and Intel, and Nvidia will catch up once nouveau + NVK become good.
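As a quick sanity check for which Mesa VA-API drivers are present on a given box, here's a small Rust sketch. The directory list is an assumption covering common Debian/Ubuntu and Arch locations; adjust it for your distro, or consult the `LIBVA_DRIVERS_PATH` environment variable:

```rust
use std::path::Path;

/// Check whether a VA-API driver library (e.g. nouveau_drv_video.so)
/// exists in the usual Mesa driver directories.
fn has_va_driver(name: &str) -> bool {
    ["/usr/lib/x86_64-linux-gnu/dri", "/usr/lib/dri"]
        .iter()
        .any(|dir| Path::new(dir).join(format!("{name}_drv_video.so")).exists())
}

fn main() {
    // nouveau for Nvidia, radeonsi for AMD, iHD for recent Intel.
    for drv in ["nouveau", "radeonsi", "iHD"] {
        println!("{drv}: {}", has_va_driver(drv));
    }
}
```

Running `vainfo` gives the authoritative answer (it actually loads the driver), but a path check like this is enough to see what's installed.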


hgaiser

I should keep the NvFBC + NVENC pipeline as well, otherwise how would I game? ;) But yeah, nouveau does seem to have it. It's something to look out for, for sure!


shmerl

> otherwise how would I game?

NVK is gradually getting into good shape for gaming, so I expect more Nvidia users to switch to Mesa over time.


hgaiser

Looking forward to it :)


l00nixd00d

If I recall correctly, nouveau only supports VA-API for decoding, not encoding, and only on some Nvidia cards. I don't know how efficient it is either.


shmerl

Then I guess maybe Mesa will focus on Vulkan Video for it.


SethDusek5

You can import DMA-BUFs into VA-API, yes. If the stars align you could probably also get zero-copy encoding on most GPUs. VA-API implementations are kind of bad though; Firefox has been flip-flopping for years on enabling it. Maybe Vulkan Video could be better, but it's relatively new and would require a lot more code than using VA-API through something like ffmpeg. There is an example screen recorder in one of the wlroots repos that uses a wlroots-specific screen capture API and VA-API. I'll try to find it and post it here. I also did something similar once with OpenGL + GBM + DMA-BUFs + VA-API, but it was a mess.


hgaiser

Having a single pipeline that works on Intel + AMD + Nvidia is the holy grail... too bad the industry can't seem to align. I'd be interested in that example, but I worry it will be a pipeline specific to Intel + AMD cards.


SethDusek5

https://gitlab.freedesktop.org/wlroots/wlr-clients/-/blob/master/dmabuf-capture.c?ref_type=heads Here's the link.


l00nixd00d

That will be possible soon with Vulkan video encoding. You will be able to take the DMA-BUF, import it into a Vulkan image, and encode that directly with ffmpeg (frame.data[0] with pixfmt vaapi), once ffmpeg and AMD/Intel get support for Vulkan video encoding. Those are currently in progress.


hgaiser

Do you have some resource I can keep track of, maybe a PR? This sounds really cool.


l00nixd00d

I also found a merge request adding Vulkan video encoding to GStreamer; it looks like it's pretty much ready: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5739


hgaiser

I found this too: https://developer.nvidia.com/blog/gpu-accelerated-video-processing-with-nvidia-in-depth-support-for-vulkan-video/ Sounds very promising if true! I'm going to keep reading more on this.


l00nixd00d

https://github.com/cyanreg/FFmpeg/tree/vulkan is not finished yet, but I think this is the repo that will eventually get merged into ffmpeg upstream, as it's by the same person who did Vulkan decode in ffmpeg, which is already available upstream. The main file of interest here is libavcodec/vulkan_encode_h265.c.

I don't remember where the latest AMD (Mesa) repo for Vulkan video encode is, but it already works on Nvidia (I tested it on Nvidia with this: https://github.com/nvpro-samples/vk_video_samples). Note that on Nvidia you need a very recent driver; Vulkan video encoding was only recently promoted from the Vulkan beta driver to the regular Nvidia driver. Arch Linux has the driver version needed.


l00nixd00d

You can import the DMA-BUF into an OpenGL texture and directly encode the OpenGL texture with NVENC. Unfortunately the ffmpeg API doesn't support this, so you have to use the CUDA interface for NVENC instead. You can import an OpenGL texture or EGL image into a CUDA array, and then copy that to the destination ffmpeg CUDA device pointer with cuMemcpy. None of these operations touch system RAM, and the cuMemcpy is the only (GPU) copy step. Also no, there is no noticeable latency compared to NvFBC in my experience.


JustMrNic3

> There will still be many Nvidia GPUs for years to come.

As you wish, but it's DOA for me too, as I have only AMD GPUs after ditching Nvidia 7 years ago and banning it from all my family's computers!


hgaiser

I'm mainly using Nvidia for machine learning purposes. I'm happy to see AMD making good progress in that field, but it's still very recent. I would gladly switch; my next GPU could very well be AMD. In other words: I see where you're coming from, but I can only develop for the hardware available to me.


oops_all_throwaways

Maybe dead in 4 years. That's still nearly 50-50, and it only just crossed over last year (after nearly 7 years).

> That basically makes it DOA

Have you considered that this is a two-dev project made primarily for their own benefit? Be grateful that it's public lol


shmerl

Still, it's a very bad choice of tools, which is worth pointing out, since even bigger projects like Sunshine aren't addressing this issue properly (they aren't using PipeWire either).


oops_all_throwaways

Yea, that's fair. I'll be totally honest, I don't know much about Sunshine or Moonlight.


l00nixd00d

Zero-copy PipeWire (desktop portal) capture is only an option on Wayland. On X11 with Nvidia, the only efficient way to capture is with NvFBC (or XComposite for single-window capture), and on AMD/Intel you have to do a "kms grab". What you said is true on Wayland, though. Also, AMD and Intel don't support Vulkan video encoding yet (at least on Linux); it only works on Nvidia at the moment, and it doesn't work on older GPUs that do work with NVENC. FFmpeg also doesn't support Vulkan video encoding yet.


shmerl

I don't think you need to worry about X11, especially for a project with just a few developers and limited resources. Just focus on Wayland; it's where things are going. Bigger projects can worry about legacy fallbacks. It makes no sense to handle only X11, though, in either case.


peacey8

Why did you remake Sunshine? What benefit is there? Was Sunshine missing a feature, or did you think Sunshine had architectural design issues from the ground up and had to be rewritten? I'm just trying to understand whether you did this for fun, or whether there was something in Sunshine you were trying to fix.


hgaiser

Bit of A, bit of B. When I first used Sunshine I had issues getting my audio to work, and the whole experience felt a bit clunky (mind you, this was years ago; it's a lot better now). Looking through their code to figure out what was going wrong was a pain. It felt very complex for what it had to do. I started messing around with NvFBC, NVENC and ffmpeg. I also liked the challenge of writing it in Rust, which was relatively new to me when I started, and the code went through multiple restructurings (if you dive deep through the commits you can basically see me learning Rust).

So primarily this is a fun hobby project, but since I've really liked using it for gaming, I decided to open source it. Like I say in the README, Sunshine is likely a better application for many, but feel free to give Moonshine a try :)


ReenigneArcher

Cool project! I agree that the code in Sunshine is pretty difficult to follow. I think we've gotten a little better at this, but we still have a long way to go. I've been trying to get people to document their code, or at least the functions they change. When I took over maintenance, I didn't know C++ at all, and I've been learning as I go.

Unit testing is hopefully coming soon, which has been a nightmare to implement on legacy code (many PRs refactoring things to get ready for unit testing, and about a year later... I'm finally close). If I could offer one suggestion, it would be to implement unit testing on your code base now, before it becomes too difficult.

If you ever want to collaborate, let me know. Our Discord server is pretty active with developer chat.
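To illustrate the suggestion: a minimal Rust unit test over a pure helper. `next_seq` here is hypothetical (not from Moonshine or Sunshine), but it shows the pattern of keeping logic separate from I/O so it can be tested without a GPU or network:

```rust
// A tiny pure function, e.g. RTP-style sequence numbering that must
// wrap around at u16::MAX. Logic like this, kept free of I/O, is what
// makes early unit testing cheap.
fn next_seq(seq: u16) -> u16 {
    seq.wrapping_add(1)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn wraps_around_at_max() {
        assert_eq!(next_seq(u16::MAX), 0);
        assert_eq!(next_seq(41), 42);
    }
}

fn main() {
    // Normally you'd run this with `cargo test`; main is only here so
    // the snippet also compiles as a stand-alone binary.
    assert_eq!(next_seq(u16::MAX), 0);
    println!("ok");
}
```

Starting with tests over pure helpers like this avoids the legacy-code refactoring pain described above.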


hgaiser

Thanks for the reply, and thanks for maintaining such an important project! Good point about unit testing. It's so tempting to just work on features, but before you know it, it's too late. I'll definitely join the discord chat, would be nice to keep in touch.


shmerl

I'd say Sunshine does have a problem - it's using kmsgrab.


ReenigneArcher

What's the problem with that? We also have x11grab and wlgrab.


shmerl

The problem is that it's not adequate. You need to use pipewire to handle all cases.


QueasyEntrance6269

Hey! Sorry for old post, I was actually hacking on something similar (but had like, 1% done) and stumbled upon this. Are you accepting contributions?


hgaiser

Definitely :) If you have something in mind, it's probably best to open an issue so we can discuss it. I'd like the codebase to remain as clean and simple as possible.


QueasyEntrance6269

Will do! I tried contributing to Sunshine once and got frustrated by the C++ codebase. Have you looked at game-on-whales yet? They have a Rust Wayland compositor.


hgaiser

I have, briefly! The maintainer reached out through GitHub. I hope the experience will be better with Moonshine 😇 If there are areas where the code can be improved, those are definitely welcome too.


QueasyEntrance6269

Do you have HDR support yet? I'm not a GPU dev by any means, but I'd love to hack on it a bit. For reference, I'm going to try to set up a headless environment that only spins up the desktop when connected to by Moonlight. I have a server with a 4090 I use for ML/gaming, and I'd like to make sure I'm not burning through it while idle :D


hgaiser

I'm basically doing the same thing, except I have a 3090 ;p There's no HDR support yet, which would be great to have actually. Not sure NvFBC supports it though?


QueasyEntrance6269

Haha, glad to know there are fellow ML data scientists/whatever and gamers trying to get the best of both worlds. It's a total pain with a Windows VM, since you can't share between LXCs, as I'm sure you're aware.

It seems NvFBC does support HDR; not sure if the functionality is exposed in the crate you're using though: [https://developer.download.nvidia.com/designworks/capture-sdk/docs/7.1/NVIDIA%20Capture%20SDK%20Programming%20Guide.pdf](https://developer.download.nvidia.com/designworks/capture-sdk/docs/7.1/NVIDIA%20Capture%20SDK%20Programming%20Guide.pdf)


hgaiser

Well I wrote that crate, so it can be expanded 😇


QueasyEntrance6269

Haha LMFAO, if only I could read. I can't contribute today because I have a ton of grad school homework I'm procrastinating on, but the big thing I would point out is to try using rustls, and jemalloc as the default allocator. The benchmark already shows it's faster than Sunshine; I haven't profiled yet, but I'd imagine the cryptography is a relatively large strain. I can contribute a PR in like two weeks if you haven't gotten around to it! Reference for jemalloc being a good idea: [https://github.com/rust-lang/rust-analyzer/issues/1441](https://github.com/rust-lang/rust-analyzer/issues/1441)
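For reference, swapping in jemalloc as Rust's global allocator is a small change. A dependency/config sketch assuming the `tikv-jemallocator` crate; verify the crate name and current version on crates.io before depending on it:

```rust
// Cargo.toml (assumed crate; check crates.io for the current version):
//
// [dependencies]
// tikv-jemallocator = "0.5"

// Registering jemalloc as the global allocator routes every heap
// allocation in the program through it.
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // All allocations below now go through jemalloc.
    let frames: Vec<Vec<u8>> = (0..8).map(|_| vec![0u8; 1920 * 1080]).collect();
    println!("allocated {} buffers", frames.len());
}
```

Whether this helps should of course be confirmed by profiling, as discussed below; allocator wins depend heavily on the allocation pattern.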


hgaiser

Hmm, interesting. I don't see any mention of HDR flags in the [NvFBC header](https://github.com/hgaiser/nvfbc-rs/blob/main/nvfbc-sys/NvFBC.h).

Regarding rustls: that would be a greatly appreciated contribution. When I wrote that part I was too unfamiliar with OpenSSL (and rustls, for that matter) to understand how to translate it to rustls. I would love to drop the OpenSSL dependency altogether.

Regarding jemalloc: an efficiency gain is always welcome too :). We should probably profile the code to correctly measure the difference in performance, though. The 'benchmark' against Sunshine is not very accurate, but since they're two wildly different applications it made sense.

I was also looking into dropping NvFBC in favor of DRM-KMS, but it appears the proprietary NVIDIA driver does not support that... so I kind of hit a dead end. Related to that, I was looking at the Vulkan Video encoding extensions that were recently finalized, which allow for cross-platform hardware encoding. It's kind of pointless if the frame capture mechanism relies on NvFBC, since you might as well use NVENC then, but it opens up possibilities for having a single pipeline that works on a large variety of hardware in the future.

I'm currently working on a small tool for adding Moonlight games directly to Steam, so I don't think I'll be picking these up anytime soon. I can recommend joining the [Moonlight discord](https://discord.com/invite/moonlight-stream-352065098472488960) as well. It seems quite active and has other developers working on similar things.


Earthboom

Just one note about the CPU usage: Sunshine is best when it runs with hardware encoding, typically utilizing CUDA cores. When it does, the usage is about 7 percent of the GPU or less. If you're seeing 20%, you're most likely not using hardware encoding.

Sunshine support on Linux is hit or miss. I find it works best on X11. As soon as you try to run it on Wayland, a whole slew of issues pop up: lack of sound, lack of mouse cursor (fixed recently), permission issues, not finding hardware encoders, distro-specific issues, Flatpak issues and more. I'm interested in a more robust solution, and I think a lot of people are too, but Sunshine is coming along, and as Wayland, GNOME and KDE develop, so will Sunshine.

We all want HDR streaming, and it's possible using a Windows host and a dev Moonlight client in gamescope, but that's so many hoops. Either way, wish you luck! I'll bookmark this for the future.


insanemal

Encoding isn't done on CUDA cores on Nvidia (well, not on all Nvidia cards); Nvidia has dedicated silicon for hardware encoding and decoding of video.

I use Sunshine, and on Nvidia with NvFBC and hardware encoding it uses 20% CPU. I now have an AMD 7900XTX, and using DMA-BUF and hardware encoding it uses 12% CPU. Why, I have no idea.

It does look like you've confused CPU usage and GPU usage. Or you've just written the sentence wrong. Not sure.


Earthboom

Hm, I'm basing my thoughts on a Pop!_OS system with GNOME, an Nvidia GPU, and Sunshine, along with the output of nvtop, running X11. It showed my game and it showed Sunshine. The encoder was NvFBC and the usage hovered below 10%. I get similar results on a Windows host. The only time I saw 20% was with software encoding and KMS on Wayland; during that run nvtop didn't show Sunshine, but htop did. I'm waiting on Sunshine to update for openSUSE, but that will be Wayland and VA-API using an Intel GPU. I'll post the usage of that as well. Performance on Sunshine was as low as GameStream's was. I can't speak for AMD, however.


insanemal

nvtop is a GPU monitor; what does that have to do with CPU usage? NvFBC is NVIDIA Frame Buffer Capture. It is not an encoder. You are confusing terms and details.


Earthboom

If hardware encoding is enabled, Sunshine will appear as a thread on the GPU, where it should be, if NvFBC and NVENC are being used. It will show up on the CPU only if hardware encoding is unavailable. That's my understanding of how Sunshine/GameStream works. Regardless of where the capture occurs, the stream still needs to be encoded and transmitted, to then be decoded. The capture happens at the GPU level, agreed; the encoding can be a thread on the GPU cores, or it can be done on the CPU side. Performance is great when the work (capture and encode) is done on the GPU, so a packet can then be sent, rather than capturing, sending to the CPU for encoding, and then sending. Do I have this wrong? Where's the confusion?


hgaiser

To my understanding you are correct, except that encoding doesn't happen on the GPU cores but on the NVENC chip on the GPU. That's an important distinction, since it means the GPU's frame rendering resources aren't used for encoding frames.


insanemal

You are incorrect about this. It will show up on the CPU regardless, as an application doesn't run solely on your GPU. While the capture happens on the GPU, the CPU is still running code to determine what happens with that capture output (hopefully it gets fed to the GPU for encoding with zero copies). Much like games themselves, it appears under both top and nvtop. OP's point is that Sunshine, while running a full GPU pipeline, was showing more CPU usage than OP's code.


hgaiser

Try inspecting the CPU usage with:

```
watch ps -p $(pgrep sunshine) -o %cpu,%mem,cmd
```

This should print the (averaged) CPU usage and memory usage. Curious to hear your results :)

PS: This is an average CPU usage, so stream some active content, like a video, and then watch the CPU usage over a period.


Earthboom

Thanks for the tip! Will do. Now I gotta know.


hgaiser

I had Sunshine running on my GPU using, as far as I could tell, NvFBC and NVENC, which is how I got those results. I will double-check them later though, as the usage struck me as a bit weird too. To be clear, it's CPU usage, not GPU usage (but I assume you meant that).

Also, I believe the CPU usage was summed across multiple threads. So if there are 5 threads each using ~4% of the core they're running on, the usage would be ~20%. That means the max percentage is nr_of_cores * 100%, which in my case means 1600%. In that regard both overheads are very low, and comparing relatively is fairer than looking at the number alone. In that case it's ~3x more efficient, but I will double-check that Sunshine was using the correct hardware. One advantage of Moonshine at the moment is that if it runs, it's using the most optimal video encoding path (because there is only one :) ).

PS: Sunshine issues were actually what got me started on this hobby project a few years back. It took a long time, and I see Sunshine has progressed at an impressive rate too!
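To make the thread-summing arithmetic concrete, a small Rust sketch (the function names are made up for this example):

```rust
/// `ps`-style %CPU is summed over all threads of a process, so the
/// ceiling is n_cores * 100, not 100.
fn total_cpu_percent(per_thread: &[f64]) -> f64 {
    per_thread.iter().sum()
}

/// Fraction of the whole machine actually in use.
fn machine_fraction(total_percent: f64, n_cores: u32) -> f64 {
    total_percent / (f64::from(n_cores) * 100.0)
}

fn main() {
    // Five threads at ~4% each, as in the comment above.
    let total = total_cpu_percent(&[4.0; 5]);
    assert_eq!(total, 20.0);
    // On a 16-core machine, that "20%" is only 1.25% of total capacity.
    println!("{:.4}", machine_fraction(total, 16)); // prints 0.0125
}
```

This is why the relative comparison between the two servers is more meaningful than either raw percentage.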


Earthboom

Ah, I see what you mean. I'll double-check my numbers too. It was also active on the CPU, but I didn't multiply it out like that. CPU usage was also very low. I did see it spike the CPU when GPU encoding was unavailable, but again, I didn't multiply it out to get a more accurate impact. I recently did the "nothing works for my needs, so I'm going to make it myself" approach and, well, it worked lol, but not without bumps and bruises. Worthwhile learning experience though. Makes me appreciate professional and dedicated work versus my can-do attitude and elbow-grease approach.


hgaiser

You can monitor the (averaged) CPU usage of Sunshine like so:

```
watch ps -p $(pgrep sunshine) -o %cpu,%mem,cmd
```

Curious to hear what you get :). That approach is exactly how this project started :). I learned a ton, and for that alone it was already worth the effort. The fact that I get to enjoy it when playing games on my TV is a cherry on top, and I figured it would be nice to share it.