SPAM: unsolicited, unwanted messages, most commonly associated with SMTP e-mail; this article, however, looks at the problem from a generic and a PSYC perspective. There is also a more specific term, SPIM, for unsolicited unwanted instant messages. This article is about both.
1 Our Strategies
PSYC has some structural advantages over E-Mail, Jabber and IRC when it comes to fighting unwanted messages. Even centralized systems suffer from SPIM, although they should be the best equipped to handle it. Our current PSYC implementations implement almost none of these concepts, so if you go spimming on PSYC today you will succeed and can cry out "I spammed PSYC", but you will be missing the point.
1.1 Multicast Helps Detect Unwanted Unsolicited Messages
Good multicasting strategies help you detect SPIM and fight it. A SPIM message has only two choices. It can use proper multicasting, in which case it causes little harm in terms of traffic, and if users tag the message as unwanted, it can be traced back and blacklisted along the network using its worldwide unique packet id. The result: a few people do see the message, but it is removed before it reaches many other recipients.
Alternatively, SPIM can avoid multicasting, that is, send copies of itself to people one by one. This can be detected as a protocol breach, since no legal PSYC operation performs anything like a broad unicast.
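The retraction idea for the multicast case can be illustrated with a minimal sketch, assuming a simple tree of relay nodes that remember blacklisted packet ids. The class `RelayNode`, its methods and the string packet ids are hypothetical; PSYC's real packet-id format and retraction messages are not modelled, and the blacklist is simply pushed from the root down the whole tree.

```python
class RelayNode:
    """Hypothetical multicast relay node in a PSYC-like distribution tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.retracted = set()  # packet ids blacklisted at this node
        self.delivered = []     # packet ids shown to this node's local users
        if parent is not None:
            parent.children.append(self)

    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def forward(self, packet_id):
        """Deliver locally and pass down the tree unless the id was retracted."""
        if packet_id in self.retracted:
            return
        self.delivered.append(packet_id)
        for child in self.children:
            child.forward(packet_id)

    def tag_unwanted(self, packet_id):
        """A local user flags the packet: blacklist its id throughout the tree."""
        self.root()._retract(packet_id)

    def _retract(self, packet_id):
        self.retracted.add(packet_id)
        for child in self.children:
            child._retract(packet_id)


root = RelayNode("source")
early = RelayNode("early", root)
late = RelayNode("late", root)
early.forward("pkt-42")       # the first branch receives the SPIM
early.tag_unwanted("pkt-42")  # a user there tags it as unwanted
late.forward("pkt-42")        # dropped: the id is already blacklisted
print(early.delivered, late.delivered)  # ['pkt-42'] []
```

A few recipients on the early branch see the message; everyone reached afterwards does not.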
1.1.1 Subscription Strategies
There are two ways to set up a tree or mesh of recipients for a multicast channel. Either you collect all subscription requests at the top and then, from the top, teach the network who is to send what to whom and where; that is top-down. Or you let the subscribers' servers plug themselves into the distribution network, maybe after kindly asking the authority for permission, and handle distribution themselves, driven by the needs of their recipients; that is bottom-up.
1.1.2 Bottom-Up Multicast Subscription
Thus, in a bottom-up approach, recipients can unsubscribe from a SPIM transmission, eventually leading whole servers to unsubscribe from it, so the SPIM no longer gets anywhere. And if the spimming server insists on sending something on a context that no longer has anyone on it, the receiving server can easily figure out that it is misbehaving and blacklist it.
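A minimal sketch of the receiving side of bottom-up subscription, assuming a server that stays subscribed upstream only while it has local members and blacklists any sender still pushing traffic into a context it has left. The `Server` class and the `psyc://spim.example` addresses are hypothetical illustrations, not the PSYC implementation.

```python
class Server:
    """Hypothetical receiving server that subscribes upstream only while it has local members."""

    def __init__(self, name):
        self.name = name
        self.members = {}       # context -> set of local subscribers
        self.upstream = set()   # contexts this server is subscribed to upstream
        self.blacklist = set()  # senders caught pushing unsubscribed traffic

    def subscribe(self, user, context):
        self.members.setdefault(context, set()).add(user)
        self.upstream.add(context)  # the first local member plugs the server in

    def unsubscribe(self, user, context):
        local = self.members.get(context, set())
        local.discard(user)
        if not local:
            # The last local member left: the whole server unsubscribes.
            self.members.pop(context, None)
            self.upstream.discard(context)

    def receive(self, sender, context):
        """Return local recipients; blacklist a sender pushing to an abandoned context."""
        if sender in self.blacklist:
            return set()
        if context not in self.upstream:
            self.blacklist.add(sender)
            return set()
        return set(self.members[context])


srv = Server("example.net")
srv.subscribe("alice", "psyc://spim.example/@ads")
print(srv.receive("psyc://spim.example", "psyc://spim.example/@ads"))  # {'alice'}
srv.unsubscribe("alice", "psyc://spim.example/@ads")
print(srv.receive("psyc://spim.example", "psyc://spim.example/@ads"))  # set()
print("psyc://spim.example" in srv.blacklist)                          # True
```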
Bottom-up subscription techniques are used by IP Multicast, even by IRC, and of course by ppp, since it is a PSYC-like multicast implementation.
1.1.3 Top-Down Subscription
In a top-down approach, by contrast, the SPIM is distributed to all servers, which then find that most recipients have set up privacy bans against the sender, so the message was transferred for nothing. The receiving server may be able to figure out that the sender is probably a spimmer, but there is nothing it can do to protect itself from the traffic, since the sender is not doing anything illegal.
Examples of the top-down approach are E-Mail/SMTP and also Jabber's JEP-0033, which emulates e-mail addressing. Neither, however, qualifies as multicast; both are merely multiple addressing schemes.
1.1.4 Combined Multicast Subscription
PSYC contexts use an innovative combined strategy, top-down after bottom-up, cashing in on the advantages of both sides while probably being the approach safest from SPIM.
This is achieved using channels. Once you have subscribed to the context bottom-up, the context can reorganize its subscribers top-down into channels, as best suits the application. This makes possible all types of messaging that require a top-down approach, without making them indistinguishable from unsolicited mass mailings.
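The key invariant of the combined strategy can be sketched as follows, assuming that a channel may only ever contain entities that already subscribed to the context. The `Context` class, `assign_channel` and `cast` are hypothetical names chosen for illustration.

```python
class Context:
    """Hypothetical PSYC-like context: bottom-up subscription, top-down channels."""

    def __init__(self, name):
        self.name = name
        self.subscribers = set()
        self.channels = {}  # channel name -> set of subscribers

    def subscribe(self, entity):
        # Bottom-up: an entity (or its server) asks to join the context.
        self.subscribers.add(entity)

    def unsubscribe(self, entity):
        self.subscribers.discard(entity)
        for members in self.channels.values():
            members.discard(entity)

    def assign_channel(self, channel, members):
        # Top-down: the context may only group entities that already subscribed.
        outsiders = set(members) - self.subscribers
        if outsiders:
            raise PermissionError(f"not subscribed to {self.name}: {sorted(outsiders)}")
        self.channels[channel] = set(members)

    def cast(self, channel):
        """Recipients of a message sent on one channel of the context."""
        return set(self.channels.get(channel, set()))


ctx = Context("psyc://example.org/@game")
for player in ("alice", "bob", "carol"):
    ctx.subscribe(player)
ctx.assign_channel("team-red", {"alice", "carol"})
print(sorted(ctx.cast("team-red")))  # ['alice', 'carol']
try:
    ctx.assign_channel("ads", {"alice", "dave"})
except PermissionError as err:
    print(err)  # not subscribed to psyc://example.org/@game: ['dave']
```

Because every channel is a subset of the subscribers, top-down distribution can never reach someone who did not ask for the context.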
1.1.5 Better Flood Control
Since a multicast message is harder to abuse, it can be subjected to less flood control or even exempted from it, so that only unicast traffic needs to be limited in volume. This means far less disturbance to end users than flood control causes on IRC, while maximizing the disturbance to anyone who tries to send SPAM by unicast. You can happily cut & paste things into a chatroom, as long as you are a legal member of that chatroom. If you do it for vandalistic purposes, you are likely to be expelled, that's all. You can freely send new photos over your photo profile channel; you only risk some friend unsubscribing from it. A game server can send several updates per second: if you are subscribed to it, that is what you wanted. An exhibitionist may be streaming a live camera view of himself sitting in front of his computer over his video channel. If your Internet line can take it and you like watching him, there you go. But if a spammer tries to send you advertisements, he first needs to lure you into a subscription, and before he can count to three he may find himself on a blacklist.
Even unicast traffic can be exempted from flood control whenever the sender is on the recipient's subscription list. What remains to be slowed down are messages from strangers. These, in exchange, can be throttled heavily, since your regular communications are never in danger of being interspersed with SPAM.
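The resulting policy can be sketched like this, assuming a standard token bucket per stranger (a generic rate-limiting technique swapped in here, not something PSYC prescribes). `FloodControl`, its parameters and the `subscriptions` mapping are hypothetical.

```python
import time


class TokenBucket:
    """Token bucket: `rate` tokens per second, at most `burst` tokens stored."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.stamp = None  # time of the last refill, set on first use

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.stamp is not None:
            self.tokens = min(self.burst, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class FloodControl:
    """Hypothetical policy: only unicast from strangers is rate-limited."""

    def __init__(self, subscriptions, rate=0.1, burst=2):
        self.subscriptions = subscriptions  # recipient -> set of known senders
        self.rate, self.burst = rate, burst
        self.buckets = {}                   # stranger -> TokenBucket

    def allow(self, sender, recipient, multicast, now=None):
        if multicast:
            return True  # context traffic: the recipient subscribed to it
        if sender in self.subscriptions.get(recipient, set()):
            return True  # unicast from someone on the recipient's list
        bucket = self.buckets.setdefault(sender, TokenBucket(self.rate, self.burst))
        return bucket.allow(now)


fc = FloodControl({"alice": {"bob"}})
print(fc.allow("bob", "alice", multicast=False, now=0.0))  # True: bob is known
print([fc.allow("spammer", "alice", False, now=0.0) for _ in range(4)])
# [True, True, False, False]: burst of 2, then throttled
print(fc.allow("spammer", "alice", False, now=20.0))  # True: tokens refilled
```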
Flood control can thus be as heavy as what SMTP calls greylisting: tell the spammers to retry later. Of course the spammers will quickly learn to retry later, but meanwhile the receiving server has plenty of time to spot the bad guys by listening to the blacklist broadcasts in its web of trusted servers. The time gained is an important strategic factor here.
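A minimal greylisting sketch, assuming a fixed minimum retry delay and a blacklist set that is filled by broadcasts from trusted servers while the stranger waits. The `Greylist` class, the 300-second delay and the result strings are hypothetical choices for illustration.

```python
class Greylist:
    """Hypothetical greylist for unicast from strangers: the first attempt is deferred."""

    def __init__(self, min_delay=300.0, blacklist=None):
        self.min_delay = min_delay  # seconds a stranger must wait before retrying
        self.first_seen = {}        # (sender, recipient) -> time of first attempt
        self.blacklist = blacklist if blacklist is not None else set()

    def check(self, sender, recipient, now):
        """Return 'accept', 'retry-later' or 'reject'."""
        if sender in self.blacklist:
            return "reject"  # a trusted server has reported this sender meanwhile
        first = self.first_seen.setdefault((sender, recipient), now)
        if now - first < self.min_delay:
            return "retry-later"
        return "accept"


shared = set()  # stands in for blacklist broadcasts from trusted servers
grey = Greylist(min_delay=300.0, blacklist=shared)
print(grey.check("psyc://spim.example/~ads", "alice", now=0.0))        # retry-later
shared.add("psyc://spim.example/~ads")                                 # a peer reports it
print(grey.check("psyc://spim.example/~ads", "alice", now=600.0))      # reject
print(grey.check("psyc://friendly.example/~eve", "alice", now=0.0))    # retry-later
print(grey.check("psyc://friendly.example/~eve", "alice", now=600.0))  # accept
```

The spammer's retry arrives after its address has been broadcast, which is exactly the time advantage the greylist buys.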
1.2 Web Of Trust Helps Against SPIM
Additionally, as described in sn_wot on Friendivity, the web of trust also gives intelligence on the trustworthiness of network entities, thus helping to filter SPIM.
Using elridion's new _trustee mechanism, we can already start making friendship requests more trustworthy and less prone to SPIM. In the long run we may even dare to reject uncredited friendship requests.
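One possible reading of a credited friendship request is sketched below, assuming a friendship graph searched breadth-first up to two hops, plus an explicit vouching table standing in for the trustee's confirmation. How the real _trustee mechanism works is not described here, so `trust_distance`, `credited_request` and the `vouches` table are hypothetical.

```python
from collections import deque


def trust_distance(friends, source, target, max_hops=2):
    """Breadth-first search over a friendship graph; None if farther than max_hops."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        if hops == max_hops:
            continue
        for friend in friends.get(node, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


def credited_request(friends, vouches, recipient, requester, trustee=None):
    # 1. The requester is already close to the recipient in the web of trust.
    if trust_distance(friends, recipient, requester) is not None:
        return True
    # 2. Otherwise a named trustee must be trusted by the recipient and must
    #    confirm that it vouches for the requester (hypothetical _trustee reading).
    if trustee is None or trust_distance(friends, recipient, trustee) is None:
        return False
    return requester in vouches.get(trustee, set())


friends = {"alice": {"bob"}, "bob": {"alice", "carol"}, "carol": {"bob"}}
vouches = {"carol": {"dave"}}  # carol confirms she knows dave
print(credited_request(friends, vouches, "alice", "carol"))             # True: friend of a friend
print(credited_request(friends, vouches, "alice", "dave"))              # False: uncredited stranger
print(credited_request(friends, vouches, "alice", "dave", "carol"))     # True: vouched for by carol
print(credited_request(friends, vouches, "alice", "mallory", "carol"))  # False: carol does not vouch
```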
1.3 Credibility Values for Servers and their Users
A user's credibility is the credibility of their server multiplied by the virtual age of the user or entity (how long it has been registered and in use) as reported by that server. The age can be expected to lie within a reasonable range. A hacked server, of course, gets no credibility, so the ages it reports are irrelevant.
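The product can be written down directly. This sketch assumes server credibility on a 0..1 scale, age measured in days, and a hypothetical cap of 365 days as one way of keeping the server-reported age "within a reasonable range"; none of these numbers come from PSYC.

```python
from datetime import date

MAX_AGE_DAYS = 365  # hypothetical cap: older claims are not rewarded further


def user_credibility(server_cred, registered, today, compromised=False):
    """Server credibility (0..1) times the user's age in days, as reported by that server."""
    if compromised:
        return 0.0  # a hacked server gets no credibility, so the age is irrelevant
    age = (today - registered).days
    age = max(0, min(age, MAX_AGE_DAYS))  # keep the server-reported age in a sane range
    return server_cred * age


today = date(2024, 1, 1)
print(user_credibility(0.5, date(2023, 12, 22), today))                  # 5.0
print(user_credibility(0.5, date(1990, 1, 1), today))                    # 182.5 (age capped)
print(user_credibility(0.5, date(2020, 1, 1), today, compromised=True))  # 0.0
```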
1.4 Protocol Breach
As stated before, if you send the same content to several recipients without doing so within a context, you are committing a PSYC protocol breach. It is not hard for a spammer to modify each message, but this rule forces him to do so rather than leaving it optional.
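Detecting the unmodified case is cheap. This is a minimal sketch, assuming a SHA-256 digest of each unicast body and a hypothetical threshold of three distinct recipients per sender within one hour. `BroadUnicastDetector` and both limits are illustrative; a single server also only sees the share of recipients that live on it.

```python
import hashlib
from collections import defaultdict


class BroadUnicastDetector:
    """Hypothetical check: identical unicast content from one sender to many
    distinct recipients within a time window is treated as a protocol breach."""

    def __init__(self, max_recipients=3, window=3600.0):
        self.max_recipients = max_recipients
        self.window = window
        self.seen = defaultdict(dict)  # (sender, digest) -> {recipient: last time}

    def is_breach(self, sender, recipient, body, now):
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        recipients = self.seen[(sender, digest)]
        recipients[recipient] = now
        # Forget recipients that fall outside the observation window.
        for old in [r for r, t in recipients.items() if now - t > self.window]:
            del recipients[old]
        return len(recipients) > self.max_recipients


det = BroadUnicastDetector(max_recipients=3)
ad = "Cheap watches! psyc://spim.example/@shop"
print([det.is_breach("psyc://spim.example/~bot", r, ad, now=0.0)
       for r in ("a", "b", "c", "d")])  # [False, False, False, True]
```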
1.5 Spam Recognition
Since, unlike with e-mail, regular mass messages never have to be told apart from irregular ones, anything that looks like a mass mailing is suspicious. This is a much better starting point for running the remaining messages through SPAM recognition technology. That technology can be far stricter than it currently is with e-mail, and the amount of processing is hardly an issue.
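Because spammers will modify each copy, exact digests are not enough. A common near-duplicate technique, word shingles compared by Jaccard similarity, is swapped in here as one possible "looks like a mass mailing" test; the 3-word shingles, the 0.5 threshold and the function names are assumptions, not part of PSYC.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles, lower-cased, punctuation stripped."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def looks_like_mass_mailing(body, recent_bodies, threshold=0.5):
    """Flag a stranger's unicast if it nearly duplicates another recent unicast."""
    mine = shingles(body)
    return any(jaccard(mine, shingles(other)) >= threshold for other in recent_bodies)


recent = ["Dear Alice, buy cheap watches today at our shop, best prices guaranteed"]
print(looks_like_mass_mailing(
    "Dear Bob, buy cheap watches today at our shop, best prices guaranteed", recent))  # True
print(looks_like_mass_mailing("Are we still meeting for lunch tomorrow?", recent))    # False
```

Only the small remainder of unicast traffic from strangers has to pass through this, so the comparison against recent messages stays affordable.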
1.6 Invitation Tokens
Another way to let a stranger get in touch with you without relying on spam recognition is invitation tokens: you can leave them lying around wherever you find it useful. The moment the wrong person or robot picks one up, you disable it. You can print them on cards and hand them out to people in person. You can limit them to a single use, or keep them valid until they fail you. This is a softer form of whitelisting, since you extend trust without yet requiring anything from the other side, such as a trustworthy electronic identity.
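A minimal token store, assuming random tokens from Python's `secrets` module and a per-token use counter where `None` means "valid until revoked". The `InvitationTokens` class and its methods are hypothetical.

```python
import secrets


class InvitationTokens:
    """Hypothetical store of invitation tokens a user hands out to strangers."""

    def __init__(self):
        self.remaining = {}  # token -> uses left (None means unlimited until revoked)

    def issue(self, uses=1):
        token = secrets.token_urlsafe(16)
        self.remaining[token] = uses
        return token

    def revoke(self, token):
        # Disable a token the moment the wrong person or robot picks it up.
        self.remaining.pop(token, None)

    def redeem(self, token):
        """Return True if the token lets a stranger through, consuming one use."""
        if token not in self.remaining:
            return False
        uses = self.remaining[token]
        if uses is not None:
            if uses <= 1:
                del self.remaining[token]
            else:
                self.remaining[token] = uses - 1
        return True


tokens = InvitationTokens()
card = tokens.issue(uses=1)       # printed on a business card
page = tokens.issue(uses=None)    # posted on a web page until abused
print(tokens.redeem(card), tokens.redeem(card))  # True False
print(tokens.redeem(page))                       # True
tokens.revoke(page)                              # a spam robot found it
print(tokens.redeem(page))                       # False
```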
2 Related Work
XEP 0159 (SPIM-Blocking Control) proposes a Jabber extension that defines an interface for users to control whether SPAM filtering is done on the server, and for whom. It does not address how to filter SPIM in the first place.
XEP 0158 (Robot Challenges) is a rather elaborate protocol for ensuring that there is a human on the other side of the client connection. This is interesting, although it is of course better not to allow broad unicasting in the first place, thus preempting SPAM without disturbing innocent users. In terms of the spam form: users will not put up with it.
XEP 0161 (SPIM Reporting) is quite useful indeed. To detect broad unicast we may still need a little help from users, since the robots will obviously learn to modify and obfuscate their SPIMs so that the copies no longer look technically identical. So in case both multicast and the web of trust fail as anti-SPIM strategies, you can fall back on reporting to repair multicast-based SPIM annihilation.
NEW: XEP 0268 (Incident Reporting)
NEW: XEP 0275 (Entity Reputation)
3 How does this work as a whole?
Let's start at the PSYC routing level. Since most traffic on PSYC is the consequence of a subscription (presence, chatrooms, friendcasts), a router can expect every message arriving in a multicast context to be spam-free (or merely stupid). This may sound simplistic, but it really is that simple. It reduces the number of messages that need to be checked to a ridiculously small amount. Neither SMTP nor XMPP can do this, since both consider multiple unicast a legitimate form of distribution.
So all we need to look at is the unicast traffic. Here we can apply the web of trust strategy to let through anyone who is vaguely known to be somebody's friend. This strategy could work over SMTP and XMPP too, but SMTP lacks a protocol for trust.
If the sender is a stranger in trust terms, she may have come across one of your invitation tokens, which lets her skip the remaining checks. This requires a little work on the recipient's side: keeping her invitation tokens healthy.
In the case of the federated PSYC1 architecture, we additionally have the credibility strategy, which lets us give more trust to a user on a trusted server if he has been a regular user for long enough.
If the sender didn't qualify as spam-unlikely under any of the given strategies, we get to apply flood control/greylist and spam recognition.
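The order of the checks described above can be summarized in one routing function. This is a sketch under the assumption that each strategy is available as a pluggable hook; `Message`, `Checks`, `route` and all hook names are hypothetical, and the demo plugs in trivial lambdas instead of real implementations.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    sender: str
    recipient: str
    body: str
    context: Optional[str] = None  # set when the message travels in a multicast context
    token: Optional[str] = None    # invitation token presented by a stranger, if any


@dataclass
class Checks:
    """Hypothetical hooks; each stands in for one strategy described above."""
    trusted: Callable[[str, str], bool]  # web of trust: sender near recipient?
    redeem: Callable[[str], bool]        # invitation token valid?
    credibility: Callable[[str], float]  # server credibility x user age
    min_credibility: float
    greylist: Callable[[Message], str]   # 'accept' | 'retry-later' | 'reject'
    spammy: Callable[[Message], bool]    # spam recognition


def route(msg: Message, c: Checks) -> str:
    if msg.context is not None:
        return "deliver"  # subscribed multicast: expected to be spam-free
    if c.trusted(msg.sender, msg.recipient):
        return "deliver"  # vaguely known to be somebody's friend
    if msg.token is not None and c.redeem(msg.token):
        return "deliver"  # stranger holding a valid invitation token
    if c.credibility(msg.sender) >= c.min_credibility:
        return "deliver"  # long-standing user on a credible server
    verdict = c.greylist(msg)  # flood control / greylisting
    if verdict != "accept":
        return verdict
    return "reject" if c.spammy(msg) else "deliver"


checks = Checks(
    trusted=lambda s, r: s == "bob",
    redeem=lambda t: t == "card-123",
    credibility=lambda s: 0.0,
    min_credibility=30.0,
    greylist=lambda m: "retry-later",
    spammy=lambda m: "watches" in m.body,
)
print(route(Message("x", "alice", "hi all", context="@news"), checks))   # deliver
print(route(Message("bob", "alice", "lunch?"), checks))                  # deliver
print(route(Message("eve", "alice", "hello", token="card-123"), checks)) # deliver
print(route(Message("bot", "alice", "cheap watches"), checks))           # retry-later
```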
If a spam really does make it through all of these barriers, we can blacklist the address (publicly announce negative trust) by pruning/reporting. That is, the first recipients of a spam can protect others from ever receiving it. This of course means that the flood control lists need to be flushed of the sender in question once it has been sufficiently identified as an offender.
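A minimal sketch of the pruning/reporting step, assuming that reports only count when they come from trusted peers, that a hypothetical threshold of two independent reports is enough, and that blacklisting also drops the sender's messages still held by flood control. The `Blacklist` class and its methods are illustrative names.

```python
from collections import defaultdict


class Blacklist:
    """Hypothetical pruning: reports from trusted peers blacklist a sender and
    purge its messages still waiting in flood control."""

    def __init__(self, trusted_reporters, threshold=2):
        self.trusted_reporters = set(trusted_reporters)
        self.threshold = threshold
        self.reports = defaultdict(set)   # sender -> reporters
        self.blocked = set()
        self.pending = defaultdict(list)  # sender -> messages held by greylisting

    def hold(self, sender, message):
        if sender not in self.blocked:
            self.pending[sender].append(message)

    def report(self, sender, reporter):
        """Count a report; return whether the sender is now blacklisted."""
        if reporter not in self.trusted_reporters:
            return False  # ignore reports from outside the web of trust
        self.reports[sender].add(reporter)
        if len(self.reports[sender]) >= self.threshold and sender not in self.blocked:
            self.blocked.add(sender)
            self.pending.pop(sender, None)  # flush the flood-control queue
        return sender in self.blocked


bl = Blacklist(trusted_reporters={"a.example", "b.example"}, threshold=2)
bl.hold("psyc://spim.example/~bot", "cheap watches")
print(bl.report("psyc://spim.example/~bot", "a.example"))  # False: one report so far
print(bl.report("psyc://spim.example/~bot", "x.example"))  # False: untrusted reporter
print(bl.report("psyc://spim.example/~bot", "b.example"))  # True: blacklisted
print(dict(bl.pending))                                    # {}
```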
The last part of the process is clearly the hardest work, but I expect the first three methods to be so efficient that a final robot challenge approach is enough to handle the very, very few cases in which a total stranger legitimately asks for a private conversation with someone, or a complete stranger legitimately asks for admission to an otherwise public chat room.
You even have the option of declaring this last, very rare case unacceptable and expecting strangers to find a valid invitation token or a patron/advocate/notary/intercessor, as some social network systems already do. This would make the whole pruning and blacklisting technology unnecessary; even flood control becomes unnecessary if misbehaviour kicks you out of contexts and lowers your trust credits.
3.1 The Spam Form Check
You have probably seen the spam form before. Going through its objections: "Mailing lists and other legitimate email uses would be affected": yes, when switching to PSYC, the old habit of just collecting a list of recipients would no longer be okay as such; you are moving to a protocol that carefully watches your subscriptions. "Many email users cannot afford to lose business or alienate potential employers": that is one of several reasons to migrate slowly rather than shut down your existing SMTP catastrophe. A new system would open up a parallel universe, much like messaging on Myspace and co. already does, only with PSYC it would be okay if it took over. "Blacklists suck": that's why we only get to them after the other mechanisms have pointed administrators to the actual candidates for blacklisting, so I guess it's okay compared to end users having to manage white or black lists themselves. "Killing them that way is not slow and painful enough": huh? Phasing out SMTP would certainly be painful and will probably never be complete. Hmm, I guess the PSYC plan passed the spam form. What do you think?
4 Side notes
The history of the term spam in networking suggests that MUDs and Bitnet Relay were involved.
2007 Hint for Spammers: A good way to collect e-mail addresses for spamming is to google for the popular linkTo_UnCryptMailto JavaScript function, then automatically run the results through a linkTo_UnCryptMailto implementation. Harvesting e-mail addresses has rarely been easier or so likely to be error-free. Enjoy! (Funny how you will find many pages that advocate the use of such a function. Yes! Good idea!)