
Low latency streaming in HD with selective audio

Want to stream a program in HD, with only that program's audio plus your mic, to a small group of friends with low latency and nothing but open-source software? Here's a WIP solution that works on Linux.

TL;DR: Use WebRTC in a mesh/P2P configuration, share your screen (through an OBS virtual camera), and use PulseAudio's module-null-sink, module-loopback, module-remap-source and module-combine-sink.

Before I start, here are a few terms:

WebRTC mesh

All peers know of each other and are directly connected to each other in a P2P fashion. As expected, this is the most expensive setup for the peers: each one has to send its stream to every other peer and also needs enough bandwidth to receive the streams of all the other peers.

WebRTC Multipoint Control Unit (MCU) and Selective Forwarding Unit (SFU)

In an MCU setting, all peers stream to a server, which knits the video and audio together into a single stream and sends it back out to the peers. Every peer has one input and one output.

In an SFU setting, all peers stream to the server, and the server forwards each incoming stream to the participants that want it. Every peer has multiple inputs and one output.
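To make the difference between the three setups concrete, here's a small bash sketch that counts how many streams each peer sends (up) and receives (down) in a call with N peers. It assumes every peer publishes exactly one stream, and the function name streams_per_peer is made up for illustration:

```bash
#!/bin/bash
# Hypothetical helper: streams each peer sends (up) and receives (down)
# in a call with N peers, assuming every peer publishes exactly one stream.
streams_per_peer() {
  local topology=$1 peers=$2
  case "$topology" in
    mesh) echo "up=$((peers - 1)) down=$((peers - 1))" ;;  # one copy to, and one from, every other peer
    mcu)  echo "up=1 down=1" ;;                             # server mixes everything into one stream
    sfu)  echo "up=1 down=$((peers - 1))" ;;                # server forwards the other peers' streams
    *)    echo "unknown topology: $topology" >&2; return 1 ;;
  esac
}

for t in mesh mcu sfu; do
  echo "$t: $(streams_per_peer "$t" 5)"
done
# mesh: up=4 down=4
# mcu: up=1 down=1
# sfu: up=1 down=4
```

Mesh is the only setup where a peer's upload grows with the number of participants, which is why it's the most expensive one for the peers and the cheapest one for the server (there is none).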

See webrtc.ventures for a good description.


Now, from the beginning. You can skip to the solution at the bottom.

Attempts

Along the way there were many attempts, which I'll just document here. Hopefully not in long form... let's see.

Share screen to browser

So, this is the most obvious solution, right? Open up Jitsi, Signal, or a closed-source solution like Google Meet, Microsoft Teams (🤮), Zoom, etc., share your screen or game window aaaand - nope.

The input sure is 1080p, but what comes out on the other side? 720p or worse, especially when there's movement.

The reason for this is that they use WebRTC in an MCU or SFU setting. That's expensive for the server, so, especially if you aren't a paying client, it makes sense for them to limit the quality in order to save bandwidth and to give you an incentive to pay for HD.

But quality isn't the only problem; it's also a strain on your GPU. If you have a shitty one, which is highly likely with a standard Mac or an equivalent mid-range laptop with integrated graphics, you'll probably drop frames, your whole PC will struggle, the fans will spin up, and the viewer experience will be dog.

Get an HDMI capture card and stream that through aforementioned services

Soo, yeah. I didn't know yet that the services were throttling the quality, so of course this was bound to fail.

Use an HDMI capture card to stream through Nextcloud Talk

Nextcloud Talk uses a WebRTC mesh setting, so all peers are P2P!

It seemed like the best solution, but if you're running it in Docker (like I am), behind nginx (like I am), don't know the intricacies of PHP (guess who!), and believe the "just install" marketing (oh yeah, me)... you're in for a great time.

Let me introduce you to issue #6738, "Multiple polling of new messages API calls slows down the whole nextcloud instance". No disrespect to the developers of Nextcloud Talk, but honestly, this really shouldn't be an issue. Long-polling a chat room to find out who's in it shouldn't require advanced configuration and in-depth optimization of your server.

After a single peer joined, Nextcloud nearly ground to a halt and couldn't add any more participants.

Stream to Owncast, call through another app

Owncast is a nice streaming solution if you just want viewers and async feedback through a text chat. It uses the protocol that's omnipresent among the most popular streaming services (Twitch, YouTube, Facebook, Periscope, et al.):

Real-Time Messaging Protocol (RTMP)

It uses the term "Real-Time" veeery loosely: latencies of 5-20 seconds are perfectly normal. Wowza has a great article on streaming protocols and latency.

After one attempt, it was clear that this was not the answer. Hearing me react to an event and only seeing the event 5 seconds later is not acceptable.

My solution

Video

Not to beat around the bush: https://p2p.chat (GitHub) solved the video streaming problem for me. Under the hood, it's a WebRTC mesh. Since I have good upstream bandwidth, streaming to multiple people in HD isn't a problem (2 Mb/s per person).

Unfortunately, it doesn't support sharing the desktop, so I had to use OBS with a virtual camera (v4l2loopback).
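Because p2p.chat is a mesh, the streamer uploads a separate copy of the video to every viewer, so the upstream you need grows linearly with the audience. A back-of-the-envelope bash sketch using the 2 Mb/s per person figure from above; the function names are hypothetical, and it ignores audio, protocol overhead, and the streams coming back from your friends:

```bash
#!/bin/bash
# Hypothetical helpers for a mesh where the streamer sends one copy per viewer.
# Default per-viewer bitrate is the 2 Mb/s quoted above.
# Bash only does integer arithmetic, so pass whole Mb/s values.

# Upstream (Mb/s) needed for VIEWERS viewers.
upstream_needed() {
  local viewers=$1 per_viewer_mbps=${2:-2}
  echo $(( viewers * per_viewer_mbps ))
}

# How many viewers an upstream of UPSTREAM_MBPS can feed (rounded down).
max_viewers() {
  local upstream_mbps=$1 per_viewer_mbps=${2:-2}
  echo $(( upstream_mbps / per_viewer_mbps ))
}

upstream_needed 5   # prints 10
max_viewers 20      # prints 10
```

In practice you'd want to keep some headroom below your measured upstream, since the same connection also carries your mic and the call's signaling.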

Audio

Since p2p.chat runs in a browser, it doesn't have advanced access to audio input (just like Jitsi, Teams, Zoom, etc.). That means that if I want only the game audio plus my mic, I can't just let it capture my desktop audio, because then my friends would hear themselves.

Luckily, PulseAudio (currently the default sound server on Linux, soon to be replaced by PipeWire) has a bunch of modules that cover a lot of scenarios, including mine!

PulseAudio "modules" are basically extensions that you can load or unload dynamically. Sources are inputs and sinks are outputs.

  • null-sink is basically a faux sink (like a virtual, fake speaker)
  • combine-sink acts as a sink that forwards the audio to other sinks
  • remap-source creates a new source from an existing one; here that's the null-sink's monitor (every sink has a .monitor source carrying whatever is played into it)
  • loopback simply copies the audio from a source to a sink, no extra program necessary

With these four modules we can:

  • Route the audio of as many mics as we want into a null-sink using the loopback module (one loopback per mic)
  • Create a new source out of the null-sink's monitor using the remap-source module
  • Use the new source as the input to the browser
  • Send the game audio to both the null-sink and the physical speakers using a combine-sink

A quick overview of what I'm aiming for.

As a script, it looks like this:

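#!/bin/bash

# Find your speaker's name with `pactl list short sinks`
# and your mics' names with `pactl list short sources`.

# Virtual speaker that collects the mic(s) and the game audio
pactl load-module module-null-sink sink_name=Virtual-Speaker sink_properties=device.description=Virtual-Speaker
# Copy the default mic into the virtual speaker. Without source=, every loopback
# uses the default mic, so add extra mics as: module-loopback source=<mic name> sink=Virtual-Speaker
pactl load-module module-loopback sink=Virtual-Speaker
# Expose the virtual speaker's monitor as a regular source to pick in the browser
pactl load-module module-remap-source source_name=Remap-Source master=Virtual-Speaker.monitor
# Play the game through Splitter to hear it yourself and send it to the virtual speaker
pactl load-module module-combine-sink sink_name=Splitter slaves=alsa_output.pci-0000_0e_00.6.analog-stereo,Virtual-Speaker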

🎉🎉🎉

That's it.
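Modules loaded with pactl stay around until PulseAudio restarts or you unload them. `pactl load-module` prints the index of the module it just loaded, so a minimal sketch of a setup/teardown pair can remember those indices and hand them to `pactl unload-module` in reverse order. It loads the modules from the script above with a single loopback for the default mic; the function names, the MODULE_IDS array and the SPEAKER variable are my own hypothetical additions, and SPEAKER defaults to the sink name from the script:

```bash
#!/bin/bash
# Hypothetical setup/teardown wrapper around the modules from the script above.
# SPEAKER defaults to the physical sink used there; override it for your machine.
SPEAKER="${SPEAKER:-alsa_output.pci-0000_0e_00.6.analog-stereo}"
MODULE_IDS=()

# Load one module and remember the index pactl prints for it.
load() {
  local id
  id=$(pactl load-module "$@") || return 1
  MODULE_IDS+=("$id")
}

setup_audio() {
  load module-null-sink sink_name=Virtual-Speaker sink_properties=device.description=Virtual-Speaker &&
  load module-loopback sink=Virtual-Speaker &&
  load module-remap-source source_name=Remap-Source master=Virtual-Speaker.monitor &&
  load module-combine-sink sink_name=Splitter "slaves=$SPEAKER,Virtual-Speaker"
}

# Unload in reverse order so nothing is left pointing at a removed sink.
teardown_audio() {
  local i
  for (( i = ${#MODULE_IDS[@]} - 1; i >= 0; i-- )); do
    pactl unload-module "${MODULE_IDS[i]}"
  done
  MODULE_IDS=()
}
```

To use it, `source` the file, run `setup_audio` before the call and `teardown_audio` after it, both from the same shell: the indices only live in that shell's MODULE_IDS array.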

Notes

It would be good if there were a program that made it easy to build this audio graph and visualize the flow of audio. A project for another time...