Sonic PSYC syntax

Here's a new PSYC syntax which could handle the new requirements of Talk:Variable, Serialisation and the wish list on Decisions to make (in other words: Hell, what a fight to get here!).

Essentially, the existing PSYC syntax apparently did not allow for fully integrated arbitrary data structures - only lists and tables and a special syntax for structures (JSON).

Introducing a new syntax is a tremendous change - a chance we probably won't get again later - and a huge effort, and the PROs and CONs of this new syntax are okay, but not completely impressive. PSYC is doing fine either way. So what do we do?

Still, it isn't entirely clear if the extension to the old syntax works out, or if there are more reasons to play around with this one. So here it comes:

New order: MMP, PSYC method and variables, optional body. In fact all elements are optional, but you have to send an empty line if you need to skip one.
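To make the ordering concrete, here is a minimal sketch in Python of splitting one packet into its three parts. It assumes, going only by the examples on this page, that a packet is framed by lines containing a single `.`, that the first line inside carries the MMP variables, the second the PSYC method and variables, and anything after that is the body. The name `split_packet` is made up for illustration, and the sketch ignores the fact that quoted values may themselves contain newlines.

```python
def split_packet(packet: str):
    """Split a '.'-framed packet into (mmp, psyc, body) strings.

    Assumption: no value contains a newline, so a plain line split is enough.
    """
    lines = packet.split("\n")
    if len(lines) < 2 or lines[0] != "." or lines[-1] != ".":
        raise ValueError("packet must be framed by '.' lines")
    content = lines[1:-1]
    # Skipped elements show up as empty lines, so positions stay fixed.
    mmp = content[0] if len(content) > 0 else ""
    psyc = content[1] if len(content) > 1 else ""
    body = "\n".join(content[2:])
    return mmp, psyc, body


# A packet that skips the MMP part by sending an empty line.
example = ".\n\n_message _nick jOsef _action beisst\n."
print(split_packet(example))
```

A real parser would have to track quoting before splitting on newlines; the sketch only shows why the empty line is needed as a placeholder.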

The : modifier is syntactically no longer necessary - it is the default.

Message text has moved into _text rather than body. psyctext template also has moved into _format rather than body.
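As an illustration of what a receiver does with `_format`, the following Python sketch fills a psyctext-style template from the packet's variables. It assumes placeholders are variable names in square brackets, as in the templates shown on this page; `render_template` is a hypothetical helper, not an existing PSYC API.

```python
import re


def render_template(template, variables):
    """Replace each [name] placeholder with the matching variable value."""
    def substitute(match):
        name = match.group(1)
        # Unknown placeholders are left untouched rather than guessed at.
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\[(\w+)\]", substitute, template)


message = {"_nick": "jOsef", "_action": "beisst", "_text": "Hallo leude wat jeht?"}
print(render_template("[_nick] [_action]: [_text]", message))
# prints: jOsef beisst: Hallo leude wat jeht?
```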

Strings are quoted only when they aren't of any keyword or other escape-safe type.

Complex data can be encoded: if a signature is available it can be decoded appropriately; if not, it can still be decoded as arbitrary data.

The space character is the universal separator. No other whitespace characters have that function (TABs and newlines are just normal value characters).

We could be using something less popular than the space character, but then we'd either suffer from the optical effect of a Bleiwüste (a dense "desert" of text), making PSYC harder to read and type (maybe, just maybe, a . could be okay - but oh, the dot is just as important!), or we'd be using other whitespace characters: TAB (which would explode the syntax horizontally, make a vertical structure appear where there actually isn't one, and stay hard to use in TAB-unfriendly environments like this wiki) or even newline (which would completely destroy the readability of the protocol unless you also keep other elements of old PSYC syntax, resulting in what is described further below in Keep the line feed).

A simple example to start with:

.
_source_relay_uniform psyc://example.org/~josef _context_uniform psyc://psyced.org/@dev
_message _nick jOsef _action beisst _text "Hallo leude wat jeht?"
.

Here are some examples with different module combinations:

Example using _compact & _state

.
=sru //goodadvice.pages.de/~josef cu //psyced.org/@dev
m =n jOsef =a beisst t "Hallo leude wat jeht?" _complex ( xmpp:fox@example.org { +17:"happy to be here" joe:correct } #949494 ) f "[n] [a]: [t]"
.

In the example above variables are compacted but still present, and template is given, too.
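A receiver could expand compact keywords with a simple lookup table. The Python sketch below assumes the short names that can be read off by comparing the compact example with the long ones on this page; the table and the name `expand_keyword` are illustrative guesses, not a negotiated standard.

```python
# Hypothetical table, inferred from the examples on this page.
COMPACT = {
    "sru": "_source_relay_uniform",
    "cu": "_context_uniform",
    "m": "_message",
    "n": "_nick",
    "a": "_action",
    "t": "_text",
    "f": "_format",
}
MODIFIERS = "=+-:"


def expand_keyword(token: str) -> str:
    """Expand a compact keyword, keeping any leading state modifier."""
    modifier, name = "", token
    if token and token[0] in MODIFIERS:
        modifier, name = token[0], token[1:]
    # Names that are not in the table (e.g. _complex) pass through unchanged.
    return modifier + COMPACT.get(name, name)


print(expand_keyword("=sru"))  # prints: =_source_relay_uniform
```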

Grammar of values

Values can appear just as they are, as long as their first character is not any of the following and in their entirety they are free from space characters, colons and newlines.

  • " for arbitrary strings or binary values. " itself is escaped by doubling it, as in "". In this case the value can contain spaces of course.
  • [0-9] digits indicate the length of the subsequent string if _length was negotiated. Otherwise this is an illegal value and MUST be treated as a syntax error. Lengths are usually more elegant for encoding binary values than escaping quotes in the middle of them. Also in this case the value can contain spaces of course.
  • + is an escape prefix which can be used to provide a number without it being interpreted as a length, or to encode a string starting with '"' or another of the prefix characters mentioned here, as long as there are no spaces.
  • ( indicates the beginning of an array declaration. The elements are themselves encoded like values, separated by space.
  • ) marks the ending of the array.
  • { indicates the beginning of a mapping. Each element has two values separated by a : colon without space.
  • } marks the ending of the mapping.

The (, ), { and } glyphs require a subsequent space, so something like (L) MUST be illegal and throw a syntax error (but you can use +(L) if you want to send that).

Further ASCII glyph characters are reserved for future syntax extensions and MUST therefore be treated as syntax errors unless negotiated by a future module.
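The grammar above is small enough to sketch as a recursive-descent parser. The Python below is one possible reading, not a reference implementation. It assumes that _length has not been negotiated (so a leading digit is a syntax error), that the colon only terminates a bare value in mapping-key position (the examples use unquoted values such as xmpp:fox@example.org inside arrays), and that mapping keys are scalar. All function names are made up for illustration.

```python
class SonicSyntaxError(ValueError):
    """Raised when a value violates the grammar sketched here."""


def expect_space(text, i):
    if text[i:i + 1] != " ":
        raise SonicSyntaxError(f"expected a space at offset {i}")
    return i + 1


def parse_quoted(text, pos):
    out, i = [], pos + 1
    while i < len(text):
        if text[i] == '"':
            if text[i + 1:i + 2] == '"':  # doubled quote = escaped quote
                out.append('"')
                i += 2
                continue
            return "".join(out), i + 1
        out.append(text[i])
        i += 1
    raise SonicSyntaxError("unterminated quoted string")


def parse_array(text, pos):
    i = expect_space(text, pos + 1)  # '(' must be followed by a space
    items = []
    while text[i:i + 1] != ")":
        value, i = parse_value(text, i)
        items.append(value)
        i = expect_space(text, i)
    return items, i + 1


def parse_mapping(text, pos):
    i = expect_space(text, pos + 1)  # '{' must be followed by a space
    result = {}
    while text[i:i + 1] != "}":
        key, i = parse_value(text, i, stop_at_colon=True)
        if not isinstance(key, str) or text[i:i + 1] != ":":
            raise SonicSyntaxError(f"expected scalar key and ':' at offset {i}")
        value, i = parse_value(text, i + 1)
        result[key] = value
        i = expect_space(text, i)
    return result, i + 1


def parse_value(text, pos=0, stop_at_colon=False):
    """Parse one value at text[pos]; return (value, position after it)."""
    if pos >= len(text):
        raise SonicSyntaxError("unexpected end of input")
    ch = text[pos]
    if ch == '"':
        return parse_quoted(text, pos)
    if ch == "(":
        return parse_array(text, pos)
    if ch == "{":
        return parse_mapping(text, pos)
    if ch in ")}":
        raise SonicSyntaxError(f"unexpected '{ch}' at offset {pos}")
    if ch in "0123456789":
        raise SonicSyntaxError("length prefix used without _length")
    start = pos + 1 if ch == "+" else pos  # '+' escapes the first character
    end = start
    while end < len(text) and text[end] not in " \n":
        if stop_at_colon and text[end] == ":":
            break
        end += 1
    if end == pos:
        raise SonicSyntaxError(f"empty value at offset {pos}")
    return text[start:end], end


sample = '( xmpp:fox@example.org { +17:"happy to be here" joe:correct } #949494 )'
print(parse_value(sample)[0])
```

Since no signature is applied, every scalar comes back as a plain string; a signature-aware decoder would convert them to their declared types afterwards.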

Example using _state & _template

Same thing not compact and without template:

.
_source_relay_uniform psyc://goodadvice.pages.de/~josef _context_uniform psyc://psyced.org/@dev
_message =_nick jOsef =_action beisst _text "Hallo leude wat jeht?" _complex ( xmpp:fox@example.org { +17:"happy to be here" joe:correct } #949494 )
.

See how _action is now defined to be a one-word-variable. If you want a real /me you need _action_text which is a quoted string.

Example using _signature & _template & _length

Same thing with known signature for all elements except _complex and _place_nick, and a non-standard template (which is thus sent even if templates are turned off). This does not preclude that a signature for _complex is available somewhere else.

.
_source_relay_uniform psyc://goodadvice.pages.de/~josef _context_uniform psyc://psyced.org/@dev
_message jOsef beisst "Hallo leude wat jeht?" _place_nick dev _complex 71 ( xmpp:fox@example.org { +17:"happy to be here" joe:correct } #949494 ) _format "In [_place] [_nick] [_action]: [_text]"
.

Length can be omitted for all values you deem too short to mention a length for. In this example only the contents of _complex have been prefixed by a length.
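The length prefix lets a parser skip over a value without scanning it. Here is a small Python sketch of writing and reading such a prefix. It assumes the length counts bytes and is followed by exactly one space, which matches the 71 in the example above (the encoded _complex value there is 71 ASCII characters long); whether bytes or characters are meant is not settled on this page. The helper names are hypothetical.

```python
def add_length(encoded: bytes) -> bytes:
    """Prefix an already encoded value with its byte length and a space."""
    return str(len(encoded)).encode("ascii") + b" " + encoded


def read_length_prefixed(data: bytes, pos: int = 0):
    """Read '<digits> <payload>' at pos; return (payload, next position)."""
    end = pos
    while end < len(data) and data[end:end + 1].isdigit():
        end += 1
    if end == pos or data[end:end + 1] != b" ":
        raise ValueError("expected '<digits> ' length prefix")
    length = int(data[pos:end])
    start = end + 1
    payload = data[start:start + length]
    if len(payload) != length:
        raise ValueError("payload shorter than announced length")
    return payload, start + length


complex_value = b'( xmpp:fox@example.org { +17:"happy to be here" joe:correct } #949494 )'
framed = add_length(complex_value)
print(framed[:5])  # prints: b'71 ( '
```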

General thoughts

Arbitrary encoding of complex data: Almost like JSON - you can scan strings for being ok with quotelessness, or you can simply quote every string, so guessing is optional. A signature would tell you which elements are _text, allowing you to use all other types unquoted.
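The "scan or just quote" choice can be sketched on the sending side as follows. This Python helper is a conservative reading of the value grammar for strings of unknown type: it emits the string bare when that is safe, uses the + escape when only the first character is a problem, and falls back to quoting otherwise. Treating colons and newlines as reasons to quote is an assumption taken from the strictest wording of the grammar; `encode_string` is a made-up name.

```python
PREFIX_CHARS = set('"+(){}0123456789')  # characters with a special meaning up front
UNSAFE_CHARS = set(" :\n")              # characters a bare value must not contain


def encode_string(value: str) -> str:
    """Encode a string as a bare, '+'-escaped or quoted value."""
    if value and not (UNSAFE_CHARS & set(value)):
        if value[0] not in PREFIX_CHARS:
            return value        # safe as it is
        return "+" + value      # only the first character needs escaping
    # Quote everything else; a quote inside is escaped by doubling it.
    return '"' + value.replace('"', '""') + '"'


for text in ["jOsef", "17", "(L)", "Hallo leude wat jeht?"]:
    print(encode_string(text))
```

A lazy implementation can skip the scan and always take the last branch, at the cost of two extra bytes per value.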

See Talk:JSON#Ballast for all the things we don't want to inherit from JSON. Essentially, there is hardly anything left of it. What you see here is pretty much what you get. Also, since [ and ] are template delimiter chars, we use ( and ) for lists.

We could use '*' as a new packet delimiter to avoid a collision with old PSYC implementations, or maybe we just change the port to 18.

Simplicity is still preserved, since a very simple-minded protocol implementation would not need to support signatures or leaving out templates, and maybe would not even support compact keywords at top level.

Embedding names in complex data is syntactically too hard I guess, and el says nobody needs that anyway. If you want to do serious apps with complex data you must also implement signatures for safety.

Modifiers collide with signatures

The role of modifiers in this is problematic. If they are part of the signatures, this would require a different signature for every combination of modifiers, and probably a different method name for each of these. If they are not part of the signature, how the hell do we encode which one is missing because of a '=' and which one is being modified ('+'/'-') or overridden by a ':' ?!?

What if we consider a _signature module incompatible with the _state module? Then we can drop the modifiers. In fact this new syntax would allow us to drop modifiers whenever _state is unavailable - we simply have to send empty lines in case the MMP or PSYC variable fields are empty.
Then again, I think they can be compatible to the extent that all the elements of a signature must be provided (you can't make parts of a signature stateful) but you can still apply state to all additional variables.

PROs and CONs

Advantages over existing syntax (PSYC with JSON variables):

  • Strings, lists, tables use the same syntax again, not two differing ones.
  • Existing syntax would not allow using signatures for anything but JSON (complex) variables.
  • By modifying existing JSON libraries (maybe) we get to working parser/renderers quicker than working from scratch.
  • Fewer bytes in total, with all the ':' no longer necessary, in exchange for occasional extra quotes.

Undecided:

  • Packets are more compact to look at, but they might wrap if loaded with many vars.
  • Is a multi-line syntax more ldmud-friendly? Not sure.
  • Grepping for a variable has the disadvantage and the advantage that you get all variables at once.

Disadvantages:

  • Long raw data in a variable makes the packet less readable than before.

Which leads to an alternate thought. Where do we get if we...

Keep the line feed?

Syntax that keeps the newlines in the game. This is the _template & _signature example with newlines and tabs instead of spaces. The modifiers are necessary again except for elements described in the signature. Quotes are still necessary because LF-TAB line continuation doesn't work.

.
=_source_relay_uniform     psyc://goodadvice.pages.de/~josef
:_context_uniform          psyc://psyced.org/@dev
_message
jOsef
beisst
"Hallo leude wat jeht?"
:_place_nick dev
:_complex    ( xmpp:fox@example.org { +17:"happy to be here" joe:correct } #949494 )
:_format     In [_place] [_nick] [_action]: [_text]
.

Hm!! Shouldn't _complex also be split over several lines? Then it would look like this... using TAB as a key separator in hashes.

.
=_source_relay_uniform    psyc://goodadvice.pages.de/~josef
:_context_uniform         psyc://psyced.org/@dev
_message
jOsef
beisst
"Hallo leude wat jeht?"
:_place_nick dev
:_complex    (
xmpp:fox@example.org
{
+17    "happy to be here"
joe    correct
}
#949494
)
:_format     In [_place] [_nick] [_action]: [_text]
.

Migration

We don't have to throw away the existing protocol. We keep the renderers and parsers on port 4404, call the scheme ve: and allocate a new port for the new psyc: scheme.

Second thoughts

For the purpose of reliably parsing signatures it is useful to have a syntax where the presence or absence of variable name and type is clearly recognizable.

I don't know how to solve this issue other than by re-introducing the state modifier prefixes in front of the variable name.

This is almost equivalent to going back to the old syntax, so I transferred the new concept of arbitrary complex structures to the old syntax, which means we stick to the established syntax and don't have to change everything if the end result isn't better.