The problem with those is that there's still no good open-source ecosystem around SIP. There are ad-hoc libraries that implement the low-level protocol, but little when it comes to a full B2BUA (back-to-back user agent) that you can then connect a SIP client to.
The closest to that is Asterisk, but IMO it's far from user-friendly, has very poor & incomplete documentation, uses an arcane configuration language and is plagued by lots of legacy telecoms-specific crap (obscure protocols that are no longer used, etc.) that the majority of users won't need.
So there is SIP hardware, but then you've just moved the problem elsewhere, and now you need a user-friendly SIP server for it to talk to.
You're not wrong regarding SIP. However, having spent some time wrapping my head around it: a SIP video doorbell on a typical /24 residential home network could simply direct-dial a single computer running MicroSIP (on Windows) or Linphone (on Linux), w/o a SIP server. You'd probably have to disable DHCP so the doorbell and the computer keep fixed addresses. If you wanted to connect from multiple devices/computers, the Asterisk setup for that is not TOO bad: a few entries to define the endpoints + passwords, and another few lines to make a call from one ring all the others.
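To give a sense of how few lines that is, here's a rough sketch in Python that renders a minimal `pjsip.conf` and `extensions.conf` for a doorbell plus a couple of clients. It assumes Asterisk's chan_pjsip driver; the endpoint names (doorbell, laptop, phone), passwords, the `from-internal` context, extension 100 and the codec choices are all hypothetical placeholders, and the output hasn't been tested against a live Asterisk, so check it against the docs for your version.

```python
"""Sketch: render a minimal Asterisk (chan_pjsip) config where one endpoint
rings all the others. All names, passwords, codecs and the extension number
are hypothetical placeholders, not values from any real setup."""

from typing import Dict

CONTEXT = "from-internal"  # hypothetical dialplan context name


def pjsip_conf(endpoints: Dict[str, str]) -> str:
    """Return pjsip.conf text: one UDP transport, then an
    endpoint/auth/aor trio (same section name) per device."""
    out = ["[transport-udp]", "type=transport", "protocol=udp", "bind=0.0.0.0", ""]
    for name, password in endpoints.items():
        out += [
            f"[{name}]",
            "type=endpoint",
            f"context={CONTEXT}",
            "disallow=all",
            "allow=ulaw",
            "allow=h264",  # assumption: the doorbell offers H.264 video
            f"auth={name}",
            f"aors={name}",
            "",
            f"[{name}]",
            "type=auth",
            "auth_type=userpass",
            f"username={name}",
            f"password={password}",
            "",
            f"[{name}]",
            "type=aor",
            "max_contacts=1",  # each device registers a single contact
            "",
        ]
    return "\n".join(out)


def extensions_conf(caller: str, endpoints: Dict[str, str], exten: str = "100") -> str:
    """Return a dialplan where dialing `exten` rings every endpoint except the caller."""
    targets = "&".join(f"PJSIP/{n}" for n in endpoints if n != caller)
    return "\n".join([
        f"[{CONTEXT}]",
        f"exten => {exten},1,Dial({targets},30)",  # ring all targets for 30 s
        " same => n,Hangup()",
        "",
    ])


if __name__ == "__main__":
    devices = {"doorbell": "changeme1", "laptop": "changeme2", "phone": "changeme3"}
    print(pjsip_conf(devices))
    print(extensions_conf("doorbell", devices))
```

The doorbell would then be configured (in its own web UI, presumably) to register as `doorbell` and call extension 100, and the Dial() line with `&` rings every other registered device at once.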
I was thinking more along the lines of replicating Ring (or another proprietary cloud-based doorbell). That means that even if you go through all the effort of configuring Asterisk and have a static IP, you'll be exposing Asterisk to the internet, which I don't feel too comfortable with considering it's a huge pile of legacy C code. Otherwise you'll need to put a VPN gateway in front (and have the mobile client VPN into it), which is even more work.