Given that most of the widely used Internet protocols, such as SMTP, NNTP and FTP, are based on commands given as ASCII strings, there seems to be no reason why a conferencing protocol should be any different. The gains from going binary are marginal, and debugging and testing are not as easy as with ASCII. However, it is not unreasonable for some parts of the protocol to be done in binary.
PSYC was kept textual, rather than changed to a binary syntax like Thrift, because text is well suited to being produced from templates in text formatting tools. It is very convenient that simple tools like git2psyc just need to drop some variables into a template string to produce a valid PSYC packet. A binary syntax like Google's Protocol Buffers, DNS or RTMP, or even a technically text-based syntax like BitTorrent's bencode, always requires a renderer implementation to produce a valid packet, whereas PSYC's textuality even led to textdb, a major simplification of rendering operations using a text template engine.
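To make the template argument concrete, here is a minimal Python sketch of how a tool in the spirit of git2psyc could produce a packet with nothing but string substitution. It assumes a simplified packet layout (routing variables, a blank line that in our simplified reading separates routing from entity variables, entity variables as `:_name<TAB>value`, a method line, a body and a closing `|` line); the method `_notice_update_git` and the helper `render_notice` are hypothetical, not taken from git2psyc.

```python
from string import Template

# Hypothetical notification template. Simplified layout assumed here:
# routing variable, blank separator line, entity variables, method, body, "|".
NOTICE = Template(
    ":_target\t$target\n"
    "\n"
    ":_repository\t$repo\n"
    ":_nick\t$nick\n"
    "_notice_update_git\n"
    "$nick pushed to $repo\n"
    "|\n"
)


def render_notice(target: str, repo: str, nick: str) -> str:
    # Plain substitution is only safe while no value contains a newline;
    # such values would need a length prefix instead of a template slot.
    for value in (target, repo, nick):
        if "\n" in value:
            raise ValueError("value needs a length prefix, not a plain template")
    return NOTICE.substitute(target=target, repo=repo, nick=nick)


print(render_notice("psyc://example.org/@dev", "libpsyc", "alice"), end="")
```

No renderer library is involved: the template itself is the packet, which is exactly what a binary format cannot offer.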
Still, PSYC is lightning fast, as the benchmarks show. PSYC is probably the fastest protocol syntax that unites the following characteristics:
- truly text-based: you can edit it by hand and use templates to produce it
- for simple purposes, simple enough that you can code it yourself
- can efficiently store binary data and binary metadata
- can hold data structures up to a certain depth (to be improved with coming versions)
- can be processed as efficiently as a binary protocol
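The last two points hinge on length-prefixed values. The following Python sketch shows why: when a value announces its byte length, the parser can jump over it without inspecting its content, so binary data containing newlines, tabs or `|` is harmless. It assumes a simplified modifier syntax modeled loosely on the PSYC 1.0 spec (`<op><name><TAB><value>` ending at the newline, or `<op><name> <length><TAB><value>`); `parse_modifier` is a hypothetical helper, not libpsyc's API.

```python
def parse_modifier(buf: bytes, pos: int):
    """Parse one modifier starting at buf[pos].

    Simplified syntax assumed for this sketch:
      <op><name>\\t<value>\\n            value must not contain a newline
      <op><name> <length>\\t<value>\\n   value is exactly <length> raw bytes
    Returns (op, name, value, next_pos).
    """
    op = buf[pos:pos + 1]
    tab = buf.index(b"\t", pos)  # names and lengths never contain a TAB
    head = buf[pos + 1:tab]
    if b" " in head:
        name, length = head.split(b" ", 1)
        start = tab + 1
        end = start + int(length)
        # Binary-safe: we skip the value by its length without scanning it.
        if buf[end:end + 1] != b"\n":
            raise ValueError("length does not match value")
        return op, name, buf[start:end], end + 1
    end = buf.index(b"\n", tab)
    return op, head, buf[tab + 1:end], end + 1


gif = b"GIF89a\n\x00|\t..."  # arbitrary bytes, including newline, '|' and TAB
packet = b":_type\timage/gif\n:_data " + str(len(gif)).encode() + b"\t" + gif + b"\n"
pos = 0
while pos < len(packet):
    op, name, value, pos = parse_modifier(packet, pos)
    print(op, name, value)
```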
So you may very well consider using libpsyc for serialization purposes beyond what PSYC was originally designed for. There's nothing wrong with storing data in PSYC packet form in files.
History of PSYC Syntax
From the early drafts of 1995 until 2007, PSYC used a syntax easily recognized by the leading period character '.' instead of the new pipe glyph '|'. That syntax proved quite powerful in many ways, so the new syntax is just an update, and psyced will keep supporting the old one for as long as it doesn't hurt, given the existing deployments and the dozens of code bases using it.
The old syntax suffered from some details that made parsing unnecessarily complicated, in particular its list syntax and multi-line continuation. We dropped those in favour of opaque data variables and lists, which means you can now transmit a list containing 4 GIF images, for example.
The new features make the parser roughly as complicated as it was before, but more powerful. See also: history of PSYC.
The W3C Feature Checklist
The term binary XML popularly sums up the wish to fix all the problems with XML in a brilliant new yet equivalent format. In this vein, the XBC Working Group of the W3C collected a beautiful wishlist of properties XBC should implement: the ultimate, perfect data format syntax. After 2 years of research they concluded as follows:
- The XBC WG developed 18 extensive use cases and documented 38 different format properties and considerations which those use cases might require. The sheer number of requirements has suggested to some that either Binary XML is not achievable or, in attempt to satisfy too many requirements, is destined to collapse under its own weight.
They also concluded that even after dropping many of the properties, the result would still be a useful Binary XML standard which the W3C should produce. Six years later they did in fact come up with Efficient XML Interchange (EXI), a binary encoding of XML.
As an exercise for us to know where we stand and for inspiration, let's look at that XBC feature wishlist and see how the PSYC syntax handles those. PSYC isn't truly binary and certainly not XML, yet it has answers for many of the requirements the W3C experts have collected.
XML maps onto PSYC at best as a sort of list of values assigned to XQuery-like variables, so the same information is flattened out. This sounds rather disrespectful, but considering uppercase compact variables, it may actually be viable to map the XML tree structure onto them, depending on the depth and complexity of the data. It could work out for purely structural uses of XML, such as XMPP, while it would probably explode into too many elements if applied to text markup such as XHTML.
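A minimal Python sketch of this flattening, assuming path-style names for readability (they are XPath-like strings, not valid PSYC variable names) and mapping attributes to children as `/@attr` entries. The function `flatten` and its naming scheme are hypothetical illustrations of the idea, not an existing mapping.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text: str) -> dict[str, str]:
    """Flatten an XML tree into path-named variables.

    Hypothetical mapping: element text becomes a variable named after the
    element's path, attributes become "<path>/@<attr>", and repeated
    siblings get XPath-style 1-based indices from the second one on.
    """
    result: dict[str, str] = {}

    def walk(elem: ET.Element, path: str) -> None:
        for attr, value in elem.attrib.items():
            result[f"{path}/@{attr}"] = value
        text = (elem.text or "").strip()
        if text:
            result[path] = text
        counts: dict[str, int] = {}
        for child in elem:
            n = counts.get(child.tag, 0) + 1
            counts[child.tag] = n
            suffix = f"[{n}]" if n > 1 else ""
            walk(child, f"{path}/{child.tag}{suffix}")

    root = ET.fromstring(xml_text)
    walk(root, "/" + root.tag)
    return result


stanza = '<message to="bob@example.org" type="chat"><body>hi</body><thread>t1</thread></message>'
for name, value in flatten(stanza).items():
    print(name, value)
```

A small XMPP-like stanza flattens into four variables; a markup-heavy XHTML page would produce one variable per text fragment, which illustrates the explosion mentioned above.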
Text markup therefore belongs in the body, which isn't really surprising, while having complex structures accessible in a flat manner brings several advantages, as the analysis below elaborates. To understand the context of the following comments, you may want to open the property wishlist in a separate window.
4 Algorithmic Properties
4.1 Processing Efficiency
PSYC seems to score pretty well here: we worked hard to optimize parsing and rendering, and since we do not try to handle tree-like complexity, our data binding is nice and straightforward.
4.2 Small Footprint
4.3 Space Efficiency
Certainly smaller than any XML parser. Even with advanced features we are very good at this.
5 Format Properties
5.1 Accelerated Sequential Access
There is no built-in indexing, but it makes little difference whether a packet is held in memory in raw or in parsed form. Once parsed, a hash is a natural storage medium, making it easy to access specific elements immediately.
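As a sketch of that "parse once, then look things up in a hash" approach, here is a small Python parser for a simplified packet that only uses `:` modifiers with plain (non-length-prefixed) values. The layout is our simplified assumption, routing and entity variables are merged into one dict for brevity, and `parse_simple_packet` is a hypothetical helper.

```python
def parse_simple_packet(text: str):
    """Parse a simplified PSYC packet into (variables, method, body).

    Layout assumed for this sketch: ':'-modifier lines (blank separator
    lines are skipped), then the method line, body lines and a lone "|".
    Length-prefixed values are not handled here.
    """
    lines = text.split("\n")
    variables = {}
    i = 0
    while i < len(lines) and lines[i][:1] in (":", ""):
        if lines[i]:
            name, _, value = lines[i][1:].partition("\t")
            variables[name] = value  # hash: direct access by variable name
        i += 1
    method = lines[i] if lines[i] != "|" else ""
    end = lines.index("|", i)
    body = "\n".join(lines[i + 1:end])
    return variables, method, body


packet = ":_target\tpsyc://example.org/@dev\n\n:_nick\talice\n_message_public\nhello\n|\n"
variables, method, body = parse_simple_packet(packet)
print(variables["_nick"], method, body)  # alice _message_public hello
```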
5.2 Compactness
In the words of the XBC document, PSYC is clearly delta-based, as it continuously modifies state using its + and - operators. A schema-based approach using signatures is also planned, while a lossy approach isn't considered.
5.3 Content Type Management
Hm, what about psyc/whatever as a content type? ;) I like the sound of that - Coyo
5.4 Deltas
PSYC supports not only +/- deltas but also inheritance.
5.5 Directly Readable and Writable
The newly introduced uppercase characters in compact mode, which applications can use freely, should do the trick, since PSYC simply passes them along transparently.
5.6 Efficient Update
Updating parts of a document, which in our model is the state of a particular entity, is obviously trivial with our +, -, = and : operators. In-depth updates of complex data structures like lists and tables aren't considered yet. Wouldn't it be sufficient to flatten all parts that may need dynamic updating into top-level variables?
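Here is a minimal Python sketch of such updates on flattened top-level variables. The operator semantics are our reading of PSYC and should be treated as assumptions: `:` sets a value for the current packet only, `=` assigns persistent state, `+` appends to a list-valued variable and `-` removes from it. `apply_modifiers` is a hypothetical helper.

```python
def apply_modifiers(state: dict, modifiers: list[tuple[str, str, str]]) -> dict:
    """Apply (op, name, value) modifiers to an entity's persistent state.

    Semantics assumed for this sketch:
      ':'  set for this packet only (persistent state is untouched)
      '='  assign persistent state
      '+'  augment: append value to a list-valued variable
      '-'  diminish: remove value from a list-valued variable
    Returns the variables in effect for the current packet.
    """
    for op, name, value in modifiers:
        if op == "=":
            state[name] = value
        elif op == "+":
            state.setdefault(name, []).append(value)
        elif op == "-":
            items = state.get(name)
            if isinstance(items, list) and value in items:
                items.remove(value)
        elif op != ":":
            raise ValueError(f"unknown operator {op!r}")
    # Packet-local ':' values override persistent state for this packet only.
    current = dict(state)
    current.update({n: v for op, n, v in modifiers if op == ":"})
    return current


state = {}
apply_modifiers(state, [("=", "_nick", "alice"), ("+", "_members", "bob"), ("+", "_members", "carol")])
now = apply_modifiers(state, [("-", "_members", "bob"), (":", "_mood", "happy")])
print(state)  # {'_nick': 'alice', '_members': ['carol']}
print(now)    # {'_nick': 'alice', '_members': ['carol'], '_mood': 'happy'}
```

Because every updatable part is a top-level variable, an update is a single modifier line rather than a path into a nested structure.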
If you really insist on dynamic changes to subparts of complex data structures, a suitable syntax can be developed (or may even exist in embryonic form in experimental projects like ppp), but it should be an optional module rather than a core feature of PSYC, since we don't want implementing PSYC to become too complex and torturous.
5.7 Embedding Support
5.8 Encryptable
We don't have encryption beyond the outer TLS layer yet, but creating protocol extensions that provide more detail shouldn't be a problem. With our layer separation we can even apply encryption at the inter-entity level while leaving routing unaffected: not just end-to-end like OTR, but carrying all the variables of the entity layer over into the encrypted part of the message.
5.9 Explicit Typing
Yes, see types.
5.10 Extension Points
Yes, plenty. One of them is inheritance; another is the ability to simply add more variables.
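Inheritance here means that a more specific method name falls back to a more generic one. A minimal Python sketch, assuming the PSYC convention that names are keywords joined by underscores and that dropping the last keyword yields the parent (so `_notice_presence_here` falls back to `_notice_presence`, then `_notice`). `find_handler` and the handler table are hypothetical.

```python
def find_handler(method: str, handlers: dict):
    """Return the handler for `method`, falling back to more generic names.

    Assumed convention: dropping the trailing "_keyword" gives the parent,
    e.g. _notice_presence_here -> _notice_presence -> _notice.
    """
    name = method
    while name:
        if name in handlers:
            return handlers[name]
        cut = name.rfind("_")
        if cut <= 0:  # only the leading underscore is left: no parent
            break
        name = name[:cut]
    return None


handlers = {"_notice_presence": "presence handler", "_message": "message handler"}
print(find_handler("_notice_presence_here_quiet", handlers))  # presence handler
print(find_handler("_message_private", handlers))             # message handler
```

An application can thus introduce a new, more specific method without breaking receivers that only understand its generic ancestor, which is what makes inheritance an extension point.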
5.11 Format Version Identification
The first character serves that purpose. Our old format used '.', the new one uses '|'. We tend to change the basic syntax about once a decade, so a single character scales well enough.
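Taking the text at its word that the first character identifies the format, a receiver could sniff the syntax generation like this. This is a Python sketch only; `syntax_generation` is hypothetical and assumes a stream starts directly with the format character.

```python
def syntax_generation(data: bytes) -> str:
    """Guess the PSYC syntax generation from the first byte of a stream.

    Assumes old-syntax streams begin with '.' and new-syntax ones with '|'.
    """
    first = data[:1]
    if first == b"|":
        return "new"
    if first == b".":
        return "old"
    raise ValueError("not a recognizable PSYC stream")


print(syntax_generation(b"|\n"))  # new
print(syntax_generation(b".\n"))  # old
```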
To be able to reuse the uppercase compact namespace, you need separate sending and receiving entities for each application. In a flat file format, you could still use routing-like headers to achieve the same effect, or use a content type approach. For XML details, see Roundtrip Support below.
5.14 Human Language Neutral
5.15 Human Readable and Editable
Yes!! PSYC isn't binary in general, so it is still easy to edit by hand. Even when binary data is transferred, you can still tweak the outer structure. A good renderer will not add lengths to values that don't need them, so you can edit those values and at most have to fix the overall packet length, if what you are editing qualifies as a packet at all. A file format holding a single packet doesn't need a packet length, of course.
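A renderer following that rule could look like the Python sketch below. The syntax is our simplified assumption (plain values as `<op><name><TAB><value>`, length-prefixed ones as `<op><name> <length><TAB><value>`), and "needs a length" is approximated as "contains a newline"; `render_modifier` is a hypothetical helper.

```python
def render_modifier(op: str, name: str, value: bytes) -> bytes:
    """Render one modifier, adding a length only when the value needs it.

    Values without newlines stay length-free, so they can be edited by hand
    without recounting bytes; anything containing a newline (e.g. binary
    data) is prefixed with its exact byte length.
    """
    head = op.encode() + name.encode()
    if b"\n" in value:
        head += b" " + str(len(value)).encode()
    return head + b"\t" + value + b"\n"


print(render_modifier(":", "_nick", b"alice"))         # b':_nick\talice\n'
print(render_modifier(":", "_data", b"line1\nline2"))  # b':_data 11\tline1\nline2\n'
```

Changing `alice` to any other single-line nick keeps the packet valid, while editing the multi-line `_data` value would also require updating its length.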
5.16 Integrable into XML Stack
Okay, this was certainly not our goal, and when I started looking into this list I didn't expect we could even consider becoming an XML optimization, but if there is a proper mapping from PSYC to XML and back, then all of the typical XML operations can be applied after conversion, thus making PSYC a possible binary-like encoding of XML, or at least of certain suitable types of schemas.
Structure, yes; markup, not easily, though I can't rule out that it would work out. Transforming something like XHTML into PSYC syntax would need extensive use of psyctext with a massive number of variables for all the subparts, or an altogether smarter template strategy. The result may be less bandwidth-efficient than the current XHTML syntax, but it would allow skipping the traditional parsing process: a web browser's rendering engine could act on data in variables directly and might turn out faster in operation.
5.17 Localized Changes
5.18 No Arbitrary Limits
The limited set of available tokens may be impractical, since multibyte tokens are required as soon as the basic 26 are used up, but it is not a hard limit.
5.19 Platform Neutrality
5.20 Random Access
PSYC does not provide random access, but you can define your own index variable type to implement something similar. It isn't advisable for every application anyway, but a standardized indexing strategy could be interesting.
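As a sketch of application-level random access over a file of stored packets, the hypothetical Python class below records each packet's byte offset and length while appending, so any packet can later be read with one seek. It keeps the index in memory; serializing it as a custom index variable type would be the PSYC-flavoured next step. Offsets are recorded at write time rather than by scanning for `|` lines, because such a line could also occur inside a binary value.

```python
import io


class PacketFile:
    """Append packets to a binary file object and index them for random access.

    Hypothetical helper: PSYC itself defines no index.
    """

    def __init__(self, f):
        self.f = f
        self.offsets = []  # (start, length) per packet, in write order

    def append(self, packet: bytes) -> int:
        self.f.seek(0, io.SEEK_END)
        start = self.f.tell()
        self.f.write(packet)
        self.offsets.append((start, len(packet)))
        return len(self.offsets) - 1  # index of the new packet

    def read(self, index: int) -> bytes:
        start, length = self.offsets[index]
        self.f.seek(start)
        return self.f.read(length)


pf = PacketFile(io.BytesIO())
first = pf.append(b":_nick\talice\n_message\nhi\n|\n")
second = pf.append(b":_nick\tbob\n_message\nho\n|\n")
print(pf.read(second))
```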
5.21 Robustness
Checksums may be added, but PSYC's syntax is focused on compactness, not on recovering from data integrity errors, which lower-layer transport protocols are supposed to handle. PSYC can detect errors, but there is no master plan for recovering from them. Exotic radio applications that cannot afford retransmissions, as described in the XBC document, should probably wrap PSYC in a redundancy encoding, similar to the strategy used by the ISO CD-ROM format. Making this a requirement for normal Internet-based applications would certainly be a bad idea.
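If an application does want a checksum, it can be carried as just another variable. The Python sketch below prefixes packet content with a CRC-32 modifier and verifies it on receipt; the variable name `_checksum_crc32` and both helpers are hypothetical, not part of PSYC. In line with the text, it only detects corruption and leaves recovery to the transport.

```python
import zlib

PREFIX = b":_checksum_crc32\t"  # hypothetical variable name


def add_checksum(content: bytes) -> bytes:
    """Prefix packet content with a CRC-32 modifier line (hex encoded)."""
    return PREFIX + b"%08x\n" % zlib.crc32(content) + content


def verify_checksum(packet: bytes) -> bytes:
    """Return the content if its CRC-32 matches; raise ValueError otherwise."""
    first, _, content = packet.partition(b"\n")
    if not first.startswith(PREFIX):
        raise ValueError("no checksum")
    if int(first[len(PREFIX):], 16) != zlib.crc32(content):
        raise ValueError("checksum mismatch")  # detection only, no recovery
    return content


packet = add_checksum(b":_nick\talice\n_message\nhi\n|\n")
print(verify_checksum(packet))
```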
5.22 Roundtrip Support
PSYC is unlikely to reproduce identical XML, but lossless equivalent XML could be feasible. However, it is a major drawback that XML has both attributes and children; encoding both into PSYC is certainly ugly. Being allowed to map all attributes to children would be a plus.
The combination of signatures and XML schemas may solve this issue by allowing an optimal PSYC encoding and defining a mapping of how that encoding fills the XML document like a template; then XML's children-versus-attributes problem doesn't matter.
psyctext may prove useful for this purpose. In that case even whitespace can be preserved, provided all documents share the same spacing style.
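The template idea can be sketched in Python: psyctext-style `[_variable]` placeholders are filled from PSYC variables, and because the whitespace lives in the template, every generated document shares its spacing. This is an illustrative re-implementation, not the psyced or libpsyc code; leaving unknown placeholders untouched and doing no XML escaping are simplifications of this sketch.

```python
import re

PLACEHOLDER = re.compile(r"\[(_[A-Za-z0-9_]+)\]")


def psyctext(template: str, variables: dict[str, str]) -> str:
    """Replace [_variable] placeholders; unknown ones are left as they are.

    No XML escaping is done here, so values must already be XML-safe.
    """
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


xml_template = '<message to="[_target]">\n  <body>[_body]</body>\n</message>'
print(psyctext(xml_template, {"_target": "bob@example.org", "_body": "hi"}))
```

Here the attribute `to` and the child `body` are filled the same way, which is how a schema-derived template could sidestep the attributes-versus-children question.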
5.23 Schema Extensions and Deviations
That is the normal way for PSYC to see things. Enforcing schemas is the harder job. ;)
5.24 Schema Instance Change Resilience
5.25 Self Contained
PSYC as a format does make sense by itself, so... yes, here.
5.27 Specialized codecs
Sounds like adding custom types or embedding binary data.
5.29 Support for Error Correction
May be easy, may be difficult. See Robustness above.
5.30 Transport Independence
6 Additional Considerations 6.1 Forward Compatibility
Yes, this is very likely to work out.
6.2 Implementation Cost
We win here. ;) <coyo> Indeed.
6.3 Royalty Free
I wish I were making money with this.
6.4 Single Conformance Class
6.5 Widespread Adoption
That I just can't answer. ;)
Looks like the PSYC syntax is pretty useful in a large variety of situations. Only when there are a large number of fields in a deep structure may it turn out better to encode things differently. With the concept of extensible types and the ability of both variables and body to contain arbitrary information, PSYC may not come with a standard solution, but it does offer an open interface to address the problem.
So, to conclude: although PSYC is neither binary nor XML, its syntax might be good enough for what you were looking for. Of course my considerations may be totally wrong, and your careful inspection may lead to different conclusions. Conversion prototypes could be an interesting experiment, however.