Multicasting is the strategy needed to efficiently distribute messages to many recipients without running into size limitations (also known as scalability problems). Its primary purpose in PSYC is thus to deliver presence, chatrooms, newscasts and whatever else you may choose to share with more than one other person.
PSYC has a routing layer, formerly called MMP, with a fundamental built-in notion of context and multicasting. The actual implementation of multicasting, however, is flexible and open for any brilliant mind to improve. Let's get into some details on this continuously amazing topic in elaborate chat technology. See CryptoChat for encrypted multicast chatrooms.
Routing Concepts and Strategies
Three meanings of multicast
The word multicast is frequently used to mean the IP Multicast protocol. That has confusing implications, since IP Multicast has increasingly been abused for other jobs like broadcasting in LANs, which has nothing to do with the original idea behind the word multicast.
IP Multicast is an implementation of the multicast concept which doesn't work for a lot of scenarios that PSYC operates in, so PSYC looks at implementing several of its own ideas of multicast.
Recently folks in the XMPP community have started using the word multicast too, but they watered it down to mean any form of addressing that will eventually reach multiple recipients. By their definition, whenever you write an e-mail with Cc: headers, you are multicasting. Of course everyone at the IETF would disagree that SMTP should suddenly be rediscovered as a multicast protocol.
In fact there is a general inflationary trend to use the word multicast for a usage pattern rather than for a technical requirement. On http://ask.slashdot.org/article.pl?sid=08/09/28/1952245 there was a request for multicast document sharing. It prompted a wide choice of fine solutions, none of them actually implementing multicast, if I'm not mistaken. I presume those solutions typically work for up to a dozen users, so it doesn't matter.
What is Multicasting then?
So we are talking about the concept of multicasting and of the many possibilities to implement it ourselves. See also Wikipedia:Multicast.
Here's a quick FAQ session:
- <Shoraneth> like ... if I have 15 buddies on jabber.org whom I need to notify of my presence, my server would instead send only one notification to the server and let the receiving server notify the other users, rather than me notify each one individually?
- <lynX> not just that, that's what we call "smart unicast." It's better than sending a copy to each recipient like XMPP does, but we want more. What we are talking about when using the word multicast works like this: you live in Austria and have 15 buddies in Australia (on several servers, because we like decentralization). Multicast only sends one notification down under, and a router down there (like a trustworthy buddy or server) distributes a copy to all involved servers which then create copies of the messages for each recipient connected to them. That's multicasting in essence, but it can get a lot more complicated than that.
- <coyo> So, GeoIP + Trust + Latency as the routing metrics? sounds awesome, actually. very doable.
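To put numbers on the exchange above, here is a small Python sketch that counts how many copies of one presence notification cross the long-haul link from Austria to Australia under each strategy. The server names and the split of the 15 buddies across three servers are made up, and "multicast" is reduced to a single relay near the recipients; real routing would be more involved.

```python
def long_haul_copies(recipients: list[str], strategy: str) -> int:
    """Count copies of one notification crossing the long-haul link.

    recipients: one entry per buddy, naming the remote server hosting them.
    strategy: "unicast", "smart_unicast" or "multicast".
    """
    if strategy == "unicast":
        # One copy per recipient, as plain XMPP delivery does.
        return len(recipients)
    if strategy == "smart_unicast":
        # One copy per destination server; each server fans out locally.
        return len(set(recipients))
    if strategy == "multicast":
        # One copy to a relay near the recipients, which copies to each server.
        return 1 if recipients else 0
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical example: 15 buddies spread over three Australian servers.
buddies = ["syd.example"] * 7 + ["mel.example"] * 5 + ["per.example"] * 3
for s in ("unicast", "smart_unicast", "multicast"):
    print(s, long_haul_copies(buddies, s))
```

With these assumptions it prints 15, 3 and 1: smart unicast already collapses copies per server, and multicast additionally collapses all servers behind one relay.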
In telephony there is a similar concept called anti-tromboning, usually limited to just two communicating parties whose routing may however have ended up looking like a trombone. Multicast routing is an evolutionary step beyond that.
- For German speakers: fefe has written nice slides on Multicast and IP Multicast (pdf), and the same material in more detail as an article for c't. PSYC, however, operates on a more macroscopic level than IP Multicast, so consider this merely supplementary background.
Multicasting is the thing to do when you need to send data from a source to several recipients. It is usually based on a subscription model to avoid designating recipients and routes each time, so it isn't stateless. Luckily the typical applications like presence, newscasting and chatrooms already expect a subscription model to be in place. This subscription model also helps avoid SPAM.
Even so-called private messages from one person to another may better be modeled as minimal multicast contexts, because quite often graphical chat applications like to provide an "Add person to this chat" function, which would otherwise require a cumbersome transformation of protocol methods.
IRC channels
Just to remind you how IRC works: "In IRC the channel has a role equivalent to that of the multicast group; their existence is dynamic and the actual conversation carried out on a channel MUST only be sent to servers which are supporting users on a given channel. Moreover, the message SHALL only be sent once to every local link as each server is responsible to fan the original message to ensure that it will reach all the recipients." (Internet Relay Chat: Architecture)
IRC is a simplistic form of multicast, as it only has one possible distribution tree: the network itself, which in many cases isn't the optimal distribution path. This is nevertheless very effective and successful. Only when the network gets large do central links become bottlenecks, which can cause so-called netlag or netsplits. Those central nodes are also popular targets for vandalism, since attacking them disrupts the entire system so effectively. Additionally, IRC's huge presence database puts a heavy load on its network. See the IRC page for details.
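The fan-out rule quoted above can be sketched in Python as a walk over a fixed spanning tree of servers: a link is used only if some server behind it has channel members, and each link carries the message once. The tree, the server names and the helper are hypothetical and only illustrate the principle, not any particular IRC server's code.

```python
from typing import Optional


def links_used(tree: dict[str, list[str]], origin: str,
               members: set[str]) -> list[tuple[str, str]]:
    """Directed links a channel message traverses in an IRC-like server tree.

    Each link carries the message at most once, and only towards the side
    of the tree that holds servers with channel members.
    """
    used: list[tuple[str, str]] = []

    def has_members(node: str, parent: Optional[str]) -> bool:
        # True if node, or any server behind it (away from parent), hosts a member.
        if node in members:
            return True
        return any(has_members(n, node) for n in tree[node] if n != parent)

    def forward(node: str, parent: Optional[str]) -> None:
        for neighbour in tree[node]:
            if neighbour != parent and has_members(neighbour, node):
                used.append((node, neighbour))
                forward(neighbour, node)

    forward(origin, None)
    return used


# Hypothetical network: a hub with three leaves, one of which has a leaf of its own.
tree = {
    "a": ["hub"], "b": ["hub"], "hub": ["a", "b", "c"],
    "c": ["hub", "c1"], "c1": ["c"],
}
print(links_used(tree, "a", {"a", "c1"}))
```

Here the link to b stays quiet, yet the message from a to c1 still has to pass the hub. That is the single-tree limitation: the path is dictated by the network layout, not by where the members are. The helper recomputes membership recursively on every call, which is fine for a toy but not how a server would track it.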
XMPP
Jabber instead has no concept of multicast routing at all, so from this point of view it is technically inferior to IRC: It cannot do efficient data distribution. It usually sends a copy of each message to each recipient. At best it can carry a recipient list for several recipients on the same server: XEP-0033 at least turns XMPP into a kind of SMTP with XML-like syntax.
Extending XMPP to multicast means changing fundamental aspects of the protocol RFCs in the IETF, then fixing many XEPs like MUC and pubsub.
<coyo> time to start publishing XEPs, eh? ;P
PSYC junctions
Junctions are useful for contexts with a large number of recipients, like newscasting services. They are currently hand-knitted into a network, much like IRC networks are hand-knitted by their configuration files. But it is up to you to step in and implement strategies for the automatic construction of multicasting trees. It's all there for you to play around with, and of course you will find people to discuss these things with in the PSYC developer room. Please inspect "junctions - an intermediate multicasting strategy in PSYC" for slightly more details, and Junctions for the current practice.
PSYC context slaves
Context slaves are automatically used for small groups and provide what we call "smart unicasting." This means whenever more than one person from server A enters a context on server B, only one message is sent from B to A, then distributed to all recipients on A. This may sound ridiculously obvious, but XMPP for instance doesn't do that - it will send multiple copies over the same TCP link.
In practice, all entities that need to receive a certain context use the internal subscription API to link into its distribution (the register_context() library function in psyced). The authorization to do so has been settled beforehand by means of subscription negotiation. Thus, a list of recipients for a certain context never needs to be sent over the wire.
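A toy model of context slaves in Python, to show where copies are made: the master sends one copy per server, and each slave fans it out to its local members. The class and method names are hypothetical; register() only loosely mirrors the role of psyced's register_context() and does not reproduce its signature, and subscription negotiation is assumed to have happened already.

```python
class ContextSlave:
    """Hypothetical per-server slave: receives one copy and fans it out locally."""

    def __init__(self, server: str) -> None:
        self.server = server
        self.local_members: set[str] = set()
        self.delivered: list[tuple[str, str]] = []  # (member, message)

    def deliver(self, message: str) -> None:
        # Local fan-out: one copy per member connected to this server.
        for member in sorted(self.local_members):
            self.delivered.append((member, message))


class ContextMaster:
    """Hypothetical context master that only talks to one slave per server."""

    def __init__(self) -> None:
        self.slaves: dict[str, ContextSlave] = {}
        self.wire_messages: list[tuple[str, str]] = []  # (server, message)

    def register(self, member: str, server: str) -> None:
        # Loosely modeled on register_context(); assumes the subscription
        # has already been negotiated and authorized.
        slave = self.slaves.setdefault(server, ContextSlave(server))
        slave.local_members.add(member)

    def cast(self, message: str) -> None:
        # One copy per server crosses the wire; no recipient list is sent.
        for server, slave in self.slaves.items():
            self.wire_messages.append((server, message))
            slave.deliver(message)


master = ContextMaster()
for member in ("alice", "bob", "carol"):
    master.register(member, "a.example")
master.register("dave", "c.example")
master.cast("hello")
print(master.wire_messages)  # one copy per server, not one per member
```

Four members, but only two messages on the wire: that is all "smart unicasting" means.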
Context slaves are a building block in the construction of more advanced forms of multicasting, as most strategies need to be able to address all recipients on Server A in one go.
Packet Ids
Some advanced forms of multicast do not strictly think in terms of tree distribution. They transmit packets in more than one way while the recipient needs to figure out which packets are identical. This is done by means of packet identifications.
This allows for redundant network topologies where a message is sent along several routes of a tree either to ensure reliability in unstable environments or to gather intelligence for optimizing routing.
Please read Packet Ids for details.
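A minimal Python sketch of duplicate suppression by packet id, assuming ids are scoped per source and that the receiver keeps a set of ids it has already seen. The Packet fields are assumptions for illustration, not PSYC's actual variable names; Packet Ids describes the real scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: str      # context or sender the id is scoped to (assumed)
    packet_id: int   # assumed per-source counter
    body: str


class Receiver:
    def __init__(self) -> None:
        self.seen: set[tuple[str, int]] = set()
        self.accepted: list[Packet] = []

    def receive(self, packet: Packet) -> bool:
        """Accept a packet the first time its id shows up and drop copies
        arriving over redundant routes. Returns True if accepted."""
        key = (packet.source, packet.packet_id)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.accepted.append(packet)
        return True


p = Packet("psyc://example.net/@news", 42, "hello")
r = Receiver()
print(r.receive(p), r.receive(p))  # True False: second copy came via another route
```

A real receiver would also have to expire old ids instead of keeping the set forever; that policy is left out here.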
Friendcasting
In PSYC you have a channel of all of your friends. By adding a friendivity value, such a message can be forwarded by your friends to their respective friends, making it travel a tree of channels of people, the notorious social network. Friendcasting is not a topologically useful strategy, but semantically. It gives you the safety of knowing who you are talking to and the power of the web of trust. I think the smartest approach is to send invitations by friendcast, then the invitees enter a context which uses a more efficient multicasting approach. Ideally the context manager would retain the web-of-trust information for each member, so that it can still be used for security purposes like SPIM deflection. See command for _request_do_cast in action.
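One way to picture friendcasting is a breadth-first walk over the friendship graph with a hop budget. This Python sketch assumes that the friendivity value is simply the number of hops a message may travel, and that everyone receives it once even if several friends forward it; PSYC's actual semantics for friendivity may differ.

```python
from collections import deque


def friendcast(friends: dict[str, set[str]], sender: str,
               friendivity: int) -> dict[str, int]:
    """Map each reached person to their hop distance from sender.

    Assumes friendivity is a hop budget: 1 reaches only the sender's own
    friends, 2 also reaches friends of friends, and so on.
    """
    reached: dict[str, int] = {}
    seen = {sender}
    queue = deque([(sender, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops >= friendivity:
            continue  # already delivered; budget used up, so do not forward
        for friend in sorted(friends.get(person, set())):
            if friend not in seen:  # each person receives the cast only once
                seen.add(friend)
                reached[friend] = hops + 1
                queue.append((friend, hops + 1))
    return reached


friends = {
    "ann": {"bob", "cat"}, "bob": {"ann", "dan"}, "cat": {"ann"},
    "dan": {"bob", "eve"}, "eve": {"dan"},
}
print(friendcast(friends, "ann", 2))
```

Every hop passes through someone the previous person knows, which is where the web-of-trust value comes from; it is also why this is semantically rather than topologically useful.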
Friendcasting makes full use of trust metrics: by letting the original sender provide a _priority, we let her decide how important a piece of news is. If she abuses this to promote uninteresting things, she risks losing trust, but if she has something truly important to say, she can reach out to a thousand people who should actually find it interesting. This is more upfront than "retwittering", but of course any recipient can still add his own importance to a piece of news by re-sharing it.
Onion Routing
el wrote an experimental implementation of Onion Routing into psycion (perlpsyc), giving PSYC a more P2P twist. We haven't dug deeper into this, however, and today it is probably easier to use an established pseudonymizing technology for onion routing and encapsulate PSYC packets into it.
BitTorrent Routing
Didn't we design PSYC's multicast layer to be pluggable? Let's have BitTorrent as one of several routing solutions. It isn't exactly designed for this task, it isn't very realtime, and it's binary, but so are TCP and UDP. It's here and it works for data-intensive casts (see File transfer and Streamcasting too).
- Then again, there is a need for topology-aware P2P routing, so it would probably be wise to use smarter strategies than BitTorrent.
Multipeer Multicast
A possible future distribution algorithm in PSYC, where several distribution routes exist from each transmitting node to the receiving group of nodes. See Routing for some message types that syntactically support this approach. Once a strategy for one-to-many multicasting is implemented, it is feasible to set several of them up within the same context and thus optimize for multipeer usage. Packet counting must be separate for each multicast source (as if it were multiple channels). If a consistent look is required for all receiving clients, simply add timestamps.
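The per-source bookkeeping described above could look like this Python sketch: each source has its own counter, a gap from one source says nothing about the others, and the display order across sources comes from timestamps. The counter starting at 0 and the tuple layout are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MultipeerReceiver:
    # Next expected counter per multicast source (assumed to start at 0).
    expected: dict[str, int] = field(default_factory=dict)
    # Counters known to be missing, tracked separately for each source.
    missing: dict[str, list[int]] = field(default_factory=dict)
    # (timestamp, source, text) for everything accepted so far.
    log: list[tuple[float, str, str]] = field(default_factory=list)

    def receive(self, source: str, counter: int, timestamp: float, text: str) -> None:
        nxt = self.expected.get(source, 0)
        if counter < nxt:
            gaps = self.missing.get(source, [])
            if counter not in gaps:
                return  # duplicate of a packet already received
            gaps.remove(counter)  # a late or retransmitted packet fills a gap
        else:
            if counter > nxt:
                # Gap in this source's sequence only; other sources are unaffected.
                self.missing.setdefault(source, []).extend(range(nxt, counter))
            self.expected[source] = counter + 1
        self.log.append((timestamp, source, text))

    def timeline(self) -> list[str]:
        # Same order on every client: sort by the sender's timestamp.
        return [text for _, _, text in sorted(self.log)]


r = MultipeerReceiver()
r.receive("alice", 0, 10.0, "a0")
r.receive("bob", 0, 11.0, "b0")
r.receive("alice", 2, 12.0, "a2")  # alice's packet 1 is missing, bob is unaffected
r.receive("alice", 1, 11.5, "a1")  # retransmission fills the gap
print(r.timeline())
```

Keeping the counters separate is what lets one source's packet loss be repaired without stalling the other sources.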
Here's a document that spends some brainpower on the issues involved in multipeering: http://www.tm.uka.de/forschung/sonstiges/klung98/node6.html
The Multicast Sanity Rule
There is a rule that comes with PSYC multicasting: you may never, ever modify the contents of a packet during its journey along the multicast structure. Even if that may have nifty effects in a tree structure, it certainly won't work in a full mesh. Especially when you are using Packet Ids to keep track of packets and recover lost ones, you need to be sure that a packet will be the same no matter whom you ask for a copy of it.
This doesn't mean you can't apply MMP state for optimization within your transmission circuit, as long as the resulting packet structure at the receiver's end is the same as at the sender's end, and consequently the same as at the context master. The order of variables is of course irrelevant. For debugging or paranoia purposes we could define a way to order the variables (alphabetically, for example) so that a checksum can be computed and checked along the trip.
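The ordering-plus-checksum idea can be sketched in a few lines of Python: sort variables by name, serialize them together with the body and hash the result, so that reordering in transit does not change the checksum while any modification does. The serialization format and the choice of SHA-256 are assumptions, not part of PSYC.

```python
import hashlib


def canonical_checksum(variables: dict[str, str], body: str) -> str:
    """SHA-256 over the variables sorted by name, followed by the body.

    The tab/newline serialization is an assumption for illustration only;
    it is not PSYC's wire syntax.
    """
    lines = [f"{name}\t{value}" for name, value in sorted(variables.items())]
    canonical = "\n".join(lines) + "\n\n" + body
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


sent = {"_nick": "lynX", "_context": "psyc://example.net/@news"}
received = {"_context": "psyc://example.net/@news", "_nick": "lynX"}  # reordered in transit
print(canonical_checksum(sent, "hi") == canonical_checksum(received, "hi"))  # True
```

A node along the route can compare its own checksum with the sender's; a mismatch means someone broke the sanity rule.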
But when looking at it from the application layer, there is nothing wrong with changing a message in the context master that you received from a submitter and you are about to send out as a multicast.
- In PSYC2 this isn't a "requirement" anymore, since the packet content is encrypted data anyway. You can't go wrong. ;-D
Questions
Excluding one recipient
MysticOne/Shoraneth asked, whether one can send to a group/room/context excluding a single recipient.
In PSYC that is a non-trivial operation, because you either defeat the multicast principle by fanning out messages manually to all intended recipients, or you define a new multicast context with everyone except that one person.
It is however currently not possible to create a new context without "inviting" the people over and needing interaction from each of them. This is one possible use for channels.
A third option to implement your "exclusion" would be to implement PSYC conference control, which we don't have yet either. It tells all involved routers on a multicast route how certain things are to be done. One of those directives could be to declare a certain member "deaf" so that he doesn't receive content. This could be abused temporarily: set xy deaf; speak to all except xy; unset xy deaf.
So, the final pragmatic note: you can implement a /sendexcept command in your room where you foreach over the list of members and sendmsg() to each of them except the one you want to skip. The message will (or should) then not appear as regular room talk, and instead of using the multicast tree you will send a separate copy of the message to each recipient.
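The pragmatic /sendexcept can be sketched in Python as a plain loop; sendmsg is passed in as a callback standing in for psyced's sendmsg(), so the names and signature here are hypothetical. The cost is one copy per remaining member, which is the price for bypassing the multicast tree.

```python
from typing import Callable


def send_except(members: list[str], skip: str, text: str,
                sendmsg: Callable[[str, str], None]) -> int:
    """Send one unicast copy of text to every member except skip.

    Bypasses the multicast tree, so the cost is one copy per recipient.
    Returns the number of copies sent.
    """
    sent = 0
    for member in members:
        if member == skip:
            continue
        sendmsg(member, text)
        sent += 1
    return sent


outbox: list[tuple[str, str]] = []
count = send_except(["alice", "bob", "carol"], "bob", "surprise party!",
                    lambda to, msg: outbox.append((to, msg)))
print(count, outbox)
```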
Let's look at other chat protocols for a second. In IRC you cannot do this either, because a multicast is a multicast and you have to remove someone from the channel to keep him from receiving something. In Jabber/MUC this is easy to implement, because it doesn't optimize delivery anyway. The /sendexcept hack suggested above would be the "official" way to do it in MUC, as MUC is arguably a hack itself.
The two "future" options, in which PSYC could do something like this and still distribute by real multicast, are where PSYC ventures into technological depths no chat system has reached before. It is okay if we first get PSYC to do the regular job before we dig into that. :)
Submitting input to the master
- Would it be a good idea to submit input along the multicast tree so that the master isn't in danger of becoming a bottleneck for incoming TCP connections?
Since we aren't multipeering yet, we need to submit our messages to the master for multicast. The current strategy is to let anyone who has something to say get directly in touch with the master by TCP or UDP. In most circumstances, even in popular contexts, only a few people will be submitting messages (either because most only want to listen, or because somebody has switched on moderation to avoid chaos). Alternatively it would be possible to stay within the tree, but this is likely to cause not only overhead but also headaches. We should not implement solutions to problems we may never face, but it is a good idea to think about what would have to be done should the problem arise.
Related Work
XCAST, a special case of IP Multicast which doesn't use IP Multicast. If it were functional, we could possibly plug it into PSYC as a distribution strategy. But I have a feeling that what it provides is so minimal that we already provide more than that ourselves. Can anybody look into this to make sure I am correct?
Secure Multicast Chat Plug-In for Lucane Groupware comes with a nice illustration of a cryptographic multicast algorithm called Tree-based Group Diffie-Hellman (TGDH).
See also Wikipedia:Multicast, but apparently there aren't many alternatives when it comes to application-level multicast. In fact the Wikipedia page always lags behind our own pages, and every now and then someone who thinks (s)he knows better denies the existence of any multicast that isn't IP Multicast. How short-sighted.
I should mention that there are huge amounts of theoretical material and implementations around IP Multicast. The impractical aspect of them is that the PSYC approach comes up with a sufficiently different scenario (the group masters as programmable rendezvous points, for example) that half-way through studying them you realize they just don't apply to the PSYC case. Yet I am sure some of the research would be very useful to us. In fact I had always hoped we would not have to solve the multicast problem ourselves, but a suitable back-end for PSYC never showed up in the past years.
With the release of an open source implementation of PGM, this may look different. PGM is a multipeer multicast transport protocol proposed by the IETF.