Google Talk infrastructure
Here's a tech talk about Google Talk given by the charismatic Reza Behforooz.
Even for Google, presence is the number one scalability challenge in IM systems. The mathematical formula presented in the talk confirms that Google Talk is still using broad unicasting in a Jabber style.
presence packets = connected users * buddy list size * online state changes
That is, the number of concurrently connected users, multiplied by how many people each of them knows, multiplied by the number of availability and other presence changes, some of which are triggered automatically by the clients.
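As a rough illustration of how quickly this product grows, here is a minimal sketch of the formula. The function name and the input numbers are made up for the example and do not come from the talk.

```python
def presence_packets(connected_users, buddy_list_size, state_changes):
    """Presence packets under plain unicast, as in the formula above."""
    return connected_users * buddy_list_size * state_changes


# Hypothetical figures, chosen only to show the order of magnitude.
total = presence_packets(
    connected_users=1_000_000,  # concurrently connected users
    buddy_list_size=50,         # average buddies per user
    state_changes=10,           # presence changes per user in the period
)
print(total)  # 500000000
```

Doubling any one of the three factors doubles the total, which is why automatically triggered state changes matter as much as the number of users.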
This formula does not hold when multicast routing is employed. From a virtual point of view you still have that number of notifications, but the actual number of packets going over the wire can be reduced radically by a good protocol and a good strategy for distributing the users across their servers. Reza uses the term "shard", borrowed from computer games, for these servers.
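The difference between the two counts can be sketched for a single presence change. The user names and the shard assignment below are hypothetical, and this only illustrates the general multicast idea, not how Google Talk actually distributes presence.

```python
def unicast_packets(buddies):
    """Plain unicast: one packet per buddy for each presence change."""
    return len(buddies)


def sharded_packets(buddies, shard_of):
    """Shard-level fan-out: one packet per distinct shard hosting a buddy.

    Each receiving shard is assumed to deliver the change locally to
    every buddy it hosts.
    """
    return len({shard_of[buddy] for buddy in buddies})


# Hypothetical assignment of six buddies to three shards.
shard_of = {
    "alice": 0, "bob": 0,
    "carol": 1, "dave": 1, "erin": 1,
    "frank": 2,
}
buddies = list(shard_of)

print(unicast_packets(buddies))            # 6
print(sharded_packets(buddies, shard_of))  # 3
```

The number of notifications delivered to users is the same in both cases. Only the number of packets crossing between servers changes, and the saving grows when buddies cluster on few shards.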
Several Google services would probably benefit from a multicast distribution strategy, and maybe some of the smarter ones actually use one. The Google File System should, probably over a single hard-configured topological tree, but that's just a wild guess.
Reza also mentions lots of TCP connections going everywhere between the shards, which means they are not enjoying the light weight of UDP, but it doesn't mean they have to be using XMPP between their nodes.
In fact, since all users have static, centralized node names, they must be using their own distribution strategy, not XMPP. The history of GTalk also confirms they had a prototype running long before they took the political decision to support XMPP toward the outside world.
I can imagine that embracing the XMPP positivity train has been quite a credibility plus. Most of all, pretending to use Jabber technology throughout gives them a huge advantage over any potential competition: should someone like, say, Akamai try to build a distributed chat system using XMPP protocols only, they would run into much fiercer scalability problems than Google, which secretly uses a powerful proprietary backend.
It also isn't clear which protocol client applications like Gmail use to communicate with the GTalk backend. Knowing the problems of XMPP, it is unlikely they are actually throwing XMPP at each other.
Google will probably get into trouble should users start having serious numbers of friends in their lists, with a majority of them on external XMPP servers. So far no such development is happening, and should it come to that, Google is still the company best equipped to handle the load.
Another question is whether that many people will care about Google Talk at all. Apparently they do, and apparently they are running into scalability issues, or maybe something else is causing some interserver trouble with Google Talk. Stay tuned for news.
Google's Android opting for a binary protocol
android says: "The com.google.android.xmppService package has been replaced by the com.google.android.gtalkservice package. This is driven by the fact that the GTalk API is not XMPP compliant, and will be less so going forward. The reason is that XMPP is too verbose and inefficient for mobile network connection, and the GTalk API will be moving to a binary encoding for the protocol between the client and the server. There will also be mobile specific protocols added. For M5, however, XMPP is still used in general, but not 100% compliant."
Note: this is for Android only (i.e. mobile phones). And they're talking about the future.
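To get a feel for the verbosity argument in the quote, here is a minimal sketch comparing the size of an XMPP-style presence stanza with a binary encoding of the same information. The binary layout, the numeric IDs and the status code are invented for this comparison. The actual GTalk binary protocol is not described in the announcement.

```python
import struct

# An XMPP-style presence stanza announcing an "away" status.
xml_stanza = (
    "<presence from='alice@example.com/phone' to='bob@example.com'>"
    "<show>away</show>"
    "</presence>"
).encode("utf-8")

# Hypothetical binary layout (NOT the real GTalk format):
#   1 byte  message type
#   4 bytes sender ID
#   4 bytes recipient ID
#   1 byte  status code
MSG_PRESENCE = 1   # assumed message type code
SHOW_AWAY = 2      # assumed status code for "away"

# "!" selects network byte order with no padding: 1 + 4 + 4 + 1 bytes.
binary_msg = struct.pack("!BIIB", MSG_PRESENCE, 1001, 1002, SHOW_AWAY)

print(len(xml_stanza), "bytes as XML")
print(len(binary_msg), "bytes as binary")
```

The comparison is not entirely fair, since the binary form assumes both ends already share a mapping from numeric IDs to addresses, but it shows why a fixed binary format is attractive on a slow mobile link.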
And from the reaction of the XMPP people it looks like Steve Jobs is right: “I actually think Google has achieved their goal without Android, and I now think Android hurts them more than it helps them. It’s just going to divide them and people who want to be their partners.”