A system is scalable when it keeps working as the amount of load grows. In our case, we usually mean large numbers of people. See also: Wikipedia:Scalability.

Here is a paper on the scalability requirements of social networks and chat systems: http://social.psyced.org


1 A historic note on XMPP's scalability

If you're going to do a standard for Instant Messaging (even 1-1 messaging), ignoring the requirements imposed by the possibility of group communication would likely lead you to a protocol that doesn't satisfy anyone's actual requirements. 
Larry Masinter, Xerox PARC, co-author of HTTP/1.1, Mon, 18 Oct 1999 16:41:44 PDT, on the IETF IMPP working group mailing list (<004b01bf19c2$56085640$15d0000d@copper.parc.xerox.com>, archived by the Wayback Machine). His comment (and those of various people who agreed with him) was ignored, and so the IMPP and subsequently the XMPP (Jabber) specifications now enforce a standard that fans out presence to a group of peers without any distribution strategy for it.

2 How does PSYC scale better?

Many projects put a lot of time and sweat into making their implementations as fast and scalable as possible, such as an ircd for IRC. In IRC's case this is necessary, because adding more servers also raises the overall amount of IRC protocol overhead.

psyced scales pretty well, too, but we typically don't put more than a thousand people on one instance, as the select() system call doesn't scale that well. Yes, we have heard of poll() and kernel signaling etc. But making the server dramatically fast is not the point. If it were, we would have chosen C++, not LPC.

The point is, PSYC is the only protocol that does not add overhead when millions or billions of people are distributed across psyced servers all over the Internet. It was consciously designed that way and has a track record of successful high-load deployments.

Even if you plan to run a large service, it's easier to run a farm of psyceds in parallel than to optimize the hell out of a single server's networking and scalability. The optimal PSYC network is one where every interested group of people has its own PSYC server somewhere, just as they have mail or web servers today. You may want to call this concept federation, but the difference is that with PSYC it can be expected to actually work.

Update: With PSYC2 we no longer recommend federation; instead we run a distributed network of personal nodes that lets groups of people share information according to their needs (using channels, trust levels, etc.). For the general PSYC philosophy on scalability compared to federation and cloud technology, see also the pubsub page at http://secushare.org/pubsub.

2.1 Registration for the Serverless

So what do you do when a new user starts a client and needs to get started somewhere? In PSYC you have two possibilities. Either you only accept registered uniforms, sending users to a directory of Public Servers to register first, or the server that received the registration request issues a nice and friendly _redirect to some other server it considers appropriate for this person (in this case you can even take network topology into consideration).
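As an illustration of the second option, here is a minimal sketch of how a server might pick the target of its _redirect while taking topology into account. It assumes the server knows a list of public servers, whether each one currently accepts registrations, and a measured round-trip time to the client's network; the PublicServer type, the pick_registration_target function and the example uniforms are all hypothetical, not part of psyced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PublicServer:
    uniform: str     # hypothetical server uniform, e.g. "psyc://a.example"
    rtt_ms: float    # assumed: measured round-trip time to the client's network
    accepting: bool  # assumed: whether the server takes new registrations


def pick_registration_target(local_accepts: bool,
                             directory: List[PublicServer]) -> Optional[str]:
    """Return None to register the user locally, or the uniform of the
    server that a _redirect should point the client to."""
    if local_accepts:
        return None
    candidates = [s for s in directory if s.accepting]
    if not candidates:
        raise LookupError("no public server is accepting registrations")
    # Take network topology into account: prefer the closest accepting server.
    return min(candidates, key=lambda s: s.rtt_ms).uniform
```

Round-trip time is only one possible proxy for topology; a real deployment could just as well use AS numbers or geographic hints.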


3 Surprising Stories from Alternate Realities

3.1 ejabberd: 6000 users is a lot?

process-one announces an AJAX extension to ejabberd that is supposed to be very, very scalable; just look at how often they use the words big, large and scalable. The article says:

We also tried a simple Tsung HTTP benchmark. On the same hardware, we reach 6000 simultaneous users, serving a maximum of 1500 requests per second in our scenario. As you can see, all those new developments are really high-performance.

Alright, 6000 simulated users. symlynX has already had more than 10000 users in real-world business applications on its comet-like HTTP push webchat interface, using decentralized PSYC servers. So PSYC already proved to be more scalable five years ago.

3.2 The only store on Massively Scalable Avenue

So Jabber.com thinks theirs is the only scalable presence server system. Well, I can understand that in Jabberland they might even be right: the XMPP protocol itself cannot scale well (see Jabber for details), as it doesn't solve the many-to-many distribution problem that presence is, so the only way to handle large numbers of users is to keep them all in huge mainframes and ideally run something smarter than XMPP between the CPUs of such a mainframe. So why aren't we doing such mainframe thingos yet? Because by multicasting presence we can keep such a userbase truly decentralized, so there is no need for enterprise-style mainframes. We can use the Internet as it is and as it is intended to stay: decentralized. (If you really need to provide servers for large numbers of users, just set up a network of servers, ideally distributed topologically all over the Internet.) See also load balancing.
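To put rough numbers on the difference, here is a back-of-the-envelope sketch that counts how many copies of a single presence update cross server boundaries. It assumes subscribers are identified by PSYC-style uniforms such as psyc://host/~nick, that unicast fan-out sends one copy per remote contact, and that multicast sends one copy per remote server, which then distributes it locally. The function names are hypothetical, and the model ignores routing trees and the cost of local delivery.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def remote_hosts(subscribers: Iterable[str], home_host: str) -> List[str]:
    """Host part of every subscriber uniform that lives on another server."""
    hosts = (urlsplit(u).hostname for u in subscribers)
    return [h for h in hosts if h != home_host]


def unicast_fanout(subscribers: Iterable[str], home_host: str) -> int:
    # One copy of the presence update per remote contact.
    return len(remote_hosts(subscribers, home_host))


def multicast(subscribers: Iterable[str], home_host: str) -> int:
    # One copy per remote server; each server then delivers it to its own users.
    return len(set(remote_hosts(subscribers, home_host)))
```

For a user on a.example with three contacts on b.example, two on c.example and one local contact, unicast_fanout returns 5 while multicast returns 2. With 1000 contacts spread over 10 servers the gap grows to 1000 versus 10, and it is paid again on every single presence change.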

3.3 ejabberd developer discovers: interserver XMPP doesn't scale

Mickaël Rémond is starting to see the difficulties on the horizon, and speaks of meta routing nodes as a way to work around multicast routing. There are some lessons from IRC to be learned here, and of course you could have a look at PSYC and at PSYC as Jabber S2S. Mickaël says:

This will exercice another possible scale limit: What will happen when two servers (for example AIM and Google) need to connect on each others for any reason, for example if one of the server need to restart ? If those two servers have a large overlap in their users base, that is to say if many users from one servers are linked to many users in the other, those two server would need to synchronize presence for millions of users at once.

On IRC this is known as the connection burst: welcome to the terrain of issues that were addressed decades ago. First of all, neither AOL nor Google runs a single server that suddenly needs to restart, so the situation isn't that bad. Still, synchronizing presence over XMPP is a tough nut. XMPP doesn't have multicast subscriptions that would spare you from sending the same presence to each recipient individually, and it doesn't have decentralized state either, which would allow a restarting server to merely send the updates to the other side rather than dumping the entire thing. So your only option is to make your servers smart enough to pretend they never really restarted: save all presence information, restart, then see which clients reconnect. Those that do not return you can announce as offline. It looks like XMPP servers are going down that path, or rather, they should.
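A minimal sketch of that "pretend you never restarted" approach follows. It assumes the server persisted a user-to-presence snapshot before going down and, after some grace period, knows which clients have reconnected and with what presence. The presence_delta function, the OFFLINE constant and the plain string presence states are hypothetical, not part of any XMPP server's API. Only the returned delta would be sent to the peer server instead of a full presence dump.

```python
from typing import Dict

OFFLINE = "offline"


def presence_delta(saved: Dict[str, str], reconnected: Dict[str, str]) -> Dict[str, str]:
    """Compare the presence snapshot taken before a restart with the clients
    that have reconnected since, and return only the changes."""
    delta = {}
    for user in sorted(saved.keys() | reconnected.keys()):
        before = saved.get(user, OFFLINE)
        # Anyone who has not come back within the grace period is announced offline.
        after = reconnected.get(user, OFFLINE)
        if after != before:
            delta[user] = after
    return delta
```

If alice, bob and carol were online before the restart and only alice (unchanged) and bob (now with a different status) come back, plus a new user dave, the peer server receives three updates instead of the whole presence table.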

See also: http://blogs.sun.com/mridul/entry/s2s_connection_availability_state_recovery

3.4 Is Google really using Jabber technology?

There is good reason to doubt that Google Talk uses XMPP technology internally. Have a look at the Google Talk page.

3.5 Only 180 participants on public television

First, a news ticker runs across the screen during the World Cup round-of-16 match, which millions of viewers are watching (as happened during Italy vs. Australia on 26 June 2006). And when you then go to the website, you get this pop-up:

[Image: sport.ARD.de.witz.gif, screenshot of the pop-up]

Excuse me? Only 180 people are allowed to take part? And for that you advertise on television? To get 180 people together?

A chat damn well ought to cope with as many people as want to take part. It's clear that not every question can be answered, but not even letting 99% of the potential askers ask is just stupid.

P.S.: Why does the <title/> say that Günther Netzer is the guest, anyway?


