Update default UDP reader interface setup (INADDR_ANY), clarify bind and sockopts logic#552
Update default UDP reader interface setup (INADDR_ANY), clarify bind and sockopts logic#552ce-rvi wants to merge 3 commits into
Conversation
…st read, and remove redundant IP_MULTICAST_IF sockopt Signed-off-by: Charlie Evans <charles.evans@csiro.au>
Signed-off-by: Charlie Evans <charles.evans@csiro.au>
|
Thanks for this! I'd love to have @veggiemike sanity check this, as he's much closer to where the rubber meets the road with respect to UDP. |
Thanks David. I'd definitely appreciate @veggiemike 's perspective as there was obviously a reasoning behind defaulting the interface in this way for multicast which would be great to understand. |
|
FYI, I'm planning to review this sometime this week... just struggling to find the time. ;-) |
This is traditionally how I've done it on other projects, in an attempt to guess a "correct" multicast interface to use for the IGMP requests, hopefully limiting what has to get set in a config file. Since multicast traffic doesn't follow normal routing principles, you have to explicitly specify which ethernet interface to use when joining the IGMP group. Doesn't matter when you only have one, but does when you have multiple (at least, it did when I tested this thoroughly years ago). Could alternatively default to (Not sure why my brain jumped off the track and started talking about localhost up there... ignore that) |
| "`interface` parameter. (You can ignore this message if you're " | ||
| "actually just doing loopback testing.)") | ||
| sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, | ||
| socket.inet_aton(self.interface)) |
There was a problem hiding this comment.
We really do need set the multicast interface. Otherwise people running this on multi-homed machines won't be able to specify which interface to receive multicast traffic on.
There was a problem hiding this comment.
Actually, now that I've said that, I think you're right... you only have to do the IP_MULTICAST_IF call if you're sending multicast traffic. The 2nd param to the IP_ADD_MEMBERSHIP call below is how we specify the interface to receive on.
There was a problem hiding this comment.
That's my understanding as well -- IP_MULTICAST_IF for sending, IP_ADD_MEMBERSHIP for receiving 👍
| ) | ||
|
|
||
|
|
||
| # bind to mc_group:port |
There was a problem hiding this comment.
The intent here, even though you don't have to bind to the mc_group IP, was to hopefully have netstat -pnave --udp output display the mc_group IP instead of INADDR_ANY. That being said, I don't recall if it actually had that effect. It does work, though, and is simply more explicit than binding to INADDR_ANY.
There was a problem hiding this comment.
Fair enough, happy to roll this back 👍
| # NOTE: This means hostname cannot be an alias to localhost, or | ||
| # you won't be able to send IGMP packets correctly. | ||
| # | ||
| self.interface = socket.gethostbyname(socket.gethostname()) |
There was a problem hiding this comment.
The way this was written, we always set the multicast interface using self.interface, but we try to guess a sane default if it wasn't specified. I was trying to avoid using INADDR_ANY by default for this because if your system has multiple ethernet interfaces, the OS will "pick one", the behavior of which is undefined as far as I know, almost definitely varies by OS, and probably not what you want at least half the time.
If what you'd like to see happen is let the OS pick if we haven't specified (which is admittedly probably more likely to be expected by developers), instead of making this resolv-your-hostname guess, then I'd say we should simply set self.interface = socket.INADDR_ANY right here and leave the existing IP_ADD_MEMBERSHIP setsockopt code below alone.
But then, if we're changing the class's behavior, you have to update the docstring at the top of init as well.
There was a problem hiding this comment.
This is interesting and I'd like to understand better how our environments differ, because at least in my environment, I would never expect the current implementation to provide a sane default, but would usually expect the proposed implementation to.
From my side, we are deploying on an Ubuntu server. As you would expect for a 'normal' Ubuntu / Debian server, there is an entry in /etc/hosts like 127.0.1.1 my-example-hostname. As our /etc/nsswitch.conf configuration is also left to defaults, /etc/hosts will be preferred over DNS as a source for host resolution. So, if I call socket.gethostbyname(socket.gethostname()) on my system, I am going to be returned 127.0.1.1, which is not a usable interface for multicast because it's in the loopback range.
In contrast, notwithstanding that the OS might not pick the right or best interface if I specify socket.INADDR_ANY, it should at least pick a non-loopback interface, and (I won't pretend to have a great understanding of what's happening under the hood, but) an interface that it expects to be able to receive on from the specified multicast group.
Curious how this reads from your side, is there anything obviously different in our environments that would account for our different experiences?
There was a problem hiding this comment.
This is interesting and I'd like to understand better how our environments differ, because at least in my environment, I would never expect the current implementation to provide a sane default, but would usually expect the proposed implementation to.
From my side, we are deploying on an Ubuntu server. As you would expect for a 'normal' Ubuntu / Debian server, there is an entry in /etc/hosts like 127.0.1.1 my-example-hostname. As our /etc/nsswitch.conf configuration is also left to defaults, /etc/hosts will be preferred over DNS as a source for host resolution. So, if I call socket.gethostbyname(socket.gethostname()) on my system, I am going to be returned 127.0.1.1, which is not a usable interface for multicast because it's in the loopback range.
This is practically a religious debate among old-school Linux users. ;-) Traditionally, RHEL does NOT set the hostname to an alias in /ec/hosts because servers should be able to resolve their IPs (and it's actually considered INCORECT for your hostname to resolve to localhost). In contrast, Fedora sets a localhost-ish alias to the hostname in /etc/hosts just like Ubuntu now does, because these distros cater to home/mobile users. Ubuntu Server does the same thing as regular Ubuntu (pretty sure), but then they're practically the same product compared to the RHEL/Fedora split.
So, in a RHEL environment (at least as of a few years ago), it's perfectly safe to assume your hostname resolves to your actual IP address. I primarily use Ubuntu (now, because new job was all Debian/Ubuntu), but can't stand the localhost alias and remove it on my servers (I've run down multiple troubleshooting rabbit holes over the years that went off the rails because of bad assumptions being made because the hostname is resolving to localhost). I even go so far as to patch the OpenRVDAS install script to NOT make that alias (pretty sure it still does that).
That being said, years removed from writing this code, I agree you're right: the use case for my resolve-hostname-to-guess is overcomplicated and is going to definitely be wrong on a non-trivial percentage of distros. Just using INADDR_ANY will at least be correct all the time on systems with only a single NIC. Defaulting to INADDR_ANY seems much more "general case".
In contrast, notwithstanding that the OS might not pick the right or best interface if I specify socket.INADDR_ANY, it should at least pick a non-loopback interface, and (I won't pretend to have a great understanding of what's happening under the hood, but) an interface that it expects to be able to receive on from the specified multicast group.
The problem is that in an environment with multiple network interfaces, the kernel CANNOT make a smart choice. Multicast traffic doesn't follow normal routing rules, so it's going to do something silly but hopefully at least deterministic (e.g., always pick the interface with the lowest MAC address). So if you have 2 interfaces, say eth0 on 10.1.10.0/24 and eth1 on 192.168.1.0/24, and INADDR_ANY happens to cause eth1 to be used, your IGMP messages go to 192.168.1.255. But if the actual multicast server is sitting on 10.1.10.0/24, it won't get your IGMP message at all... and you'll get no multicast traffic even though you "joined the group".
There was a problem hiding this comment.
Thanks for the explanation @veggiemike , I've never worked with RHEL so that's news to me but useful to know! And point well taken RE: kernel not being able to make a smart choice between multiple NICs.
Notwithstanding, I'm taking away that you're happy to adopt INADDR_ANY as the default case all things considered (correct me if I'm wrong)
| # | ||
| sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, | ||
| socket.inet_aton(self.mc_group) + socket.inet_aton(self.interface)) | ||
| if self.interface: |
There was a problem hiding this comment.
As suggested above, just blindly use self.interface as originally written w/out any conditional code. We've either been passed in a value or already assigned it to INADDR_ANY.
There was a problem hiding this comment.
I'm totally happy in principle to avoid having conditional code here, however it's a little fiddlier than just setting self.interface = socket.INADDR_ANY in the constructor, because the existing code here calls inet_aton on the value, which we can't pass socket.INADDR_ANY to. We could set self.interface = socket.inet_aton(interface) or socket.INADDR_ANY in the constructor to avoid conditional logic here.
An advantage of the conditional code is that it leaves your battle-tested implementation in place for the one path, and uses our Investigator-battle-tested implementation for the other path.
No strong feelings about this one though, happy to do what the crowd is happy with :)
There was a problem hiding this comment.
I'm totally happy in principle to avoid having conditional code here, however it's a little fiddlier than just setting self.interface = socket.INADDR_ANY in the constructor, because the existing code here calls inet_aton on the value, which we can't pass socket.INADDR_ANY to. We could set self.interface = socket.inet_aton(interface) or socket.INADDR_ANY in the constructor to avoid conditional logic here.
Yep, I missed that. I still like setting it while checking params at the top of the function and then just using it, just make sure it's the same data type. So something like this:
if interface:
# resolve once in constructor
self.interface = socket.inet_aton(socket.gethostbyname(interface))
else:
self.interface = socket.INADDR_ANYThen further down we can just use self.interface consistently w/out calling inet_aton.
While we're at it, we should also prep self.mc_group in the constructor if specified (i.e., make sure if inet_aton is gonna fail, it happens during construction). So stick something like this right after we set self.interface:
if mc_group:
# resolve once in constructor
self.mc_group = socket.inet_aton(socket.gethostbyname(mc_group))
else:
self.mc_group = NoneThat way both self.interface and self.mc_group are ready to get passed into getsockopt calls after construction.
There was a problem hiding this comment.
Agree with what you've suggested here, thanks -- slightly under the pump with other work but will aim to push a commit soon!
There was a problem hiding this comment.
No problem, I know the feeling.
|
And lastly, apologies for taking so long to look at this. Saying "I've been busy" would be a gross understatement. 🤣 |
|
Hi @veggiemike, no worries and thanks for finding the time! I've replied to your comments individually. |
|
Cheers from somewhere near Hawaii! |
|
Hi all - checking in on whether we're close to having this resolved... |
…uctor. Use inet_ntoa to convert addresses back to strings for sock.bind calls. Signed-off-by: Charlie Evans <charles.evans@csiro.au>
|
@veggiemike apols again for the delay, I've been pulled every which way this month! Would you mind taking another look at this in light of my latest commit? I've moved all the conditional Thanks in advance if you've time to take a look! |
Hi folks,
Opening this PR with some suggested changes to the UDP reader, see details below. Feedback welcomed, cheers!
self.interface = socket.gethostbyname(socket.gethostname())is used to derive a default value for the interface. This will resolve to127.0.1.1on many Linux machines which is not a useable interface for reading UDP multicast traffic.Additionally, this PR makes some minor changes in the interest of clarity:
sock.bind((self.mc_group, self.port))tosock.bind(('', self.port)). To my understanding there is no need to specify the group in the bind once the IGMP group has been joined with the preceding logic.socket.IP_MULTICAST_IFsockopt for multicast. This option only controls which interface is used for multicast send, and this class does not need to send.