PSYC Syntax Specification - Version 1.0 alpha
PSYC packets travel the wire over either TCP circuits or UDP. UDP is typically used for multicast notices to large contexts whose successful reception is not critical. The PSYC packet format is mostly line-based, with some exceptions.
(See also ABNF)
PSYC packets are byte sequences which have the following syntax:
    packet           = routing-header [ content-length content ] "|" LF
                     ; the length of content is either implicit (scan until LF "|" LF)
                     ; or explicitly reported in content-length.

    routing-header   = *routing-modifier
    entity-header    = *sync-operation *entity-modifier
    content          = entity-header [ body LF ]
    content-length   = [ length ] LF

    routing-modifier = operator variable ( simple-arg / LF )
    sync-operation   = ( "=" LF / "?" LF )
    entity-modifier  = operator variable ( simple-arg / binary-arg / LF )

    body             = method [ LF data ]

    operator         = "=" / ":" / "+" / "-" / "?" / "!" / <more reserved glyphs>
    simple-arg       = HTAB text-data LF
    binary-arg       = SP length HTAB binary-data LF

    length           = 1*DIGIT
    binary-data      = <a length byte long byte sequence>

    method           = 1*kwchar
    variable         = 1*kwchar
    text-data        = *nonlchar

    data             = <amount of bytes as given by length or until
                       the (LF "|" LF) sequence has been encountered>

    nonlchar         = %x00-09 / %x0B-FF ; basically any byte except newline
    kwchar           = <alpha numeric ASCII char or "_">

For the definition of DIGIT, VCHAR, SP, LF and HTAB see RFC 2234 (ABNF).
Either text-data or binary-data can contain lists, which adhere to the following syntax (in ABNF):
    list             = binary-elem *("|" binary-elem)  ; for binary values
                     =/ "|" text-elem *("|" text-elem) ; for visible/non-binary characters

    binary-elem      = length SP binary-data
    text-elem        = *nonlpipechar
    nonlpipechar     = %x00-09 / %x0B-7B / %x7D-FF ; any byte except newline and "|"
Either format can appear in either data container! This list syntax is only valid for variables of the _list type, that is, variables whose names start with _list.
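The two list encodings can be told apart by their first byte: a text list starts with "|", a binary list starts with a length. The following Python function is a minimal sketch of that idea, not a reference implementation. The name parse_list and the choice to return an empty list for an empty value are assumptions made for illustration.

```python
def parse_list(value: bytes) -> list[bytes]:
    """Split the argument of a _list variable into its elements."""
    if not value:
        return []  # assumption: an empty argument means an empty list
    if value.startswith(b"|"):
        # Text form: "|" text-elem *("|" text-elem)
        return value[1:].split(b"|")
    # Binary form: length SP binary-data, elements separated by "|"
    elems = []
    pos = 0
    while True:
        sp = value.index(b" ", pos)  # raises ValueError if SP is missing
        digits = value[pos:sp]
        if not digits.isdigit():
            raise ValueError("element length must be 1*DIGIT")
        start = sp + 1
        end = start + int(digits)
        if end > len(value):
            raise ValueError("element is shorter than its announced length")
        elems.append(value[start:end])
        if end == len(value):
            return elems
        if value[end:end + 1] != b"|":
            raise ValueError("expected '|' between binary elements")
        pos = end + 1
```

Because binary elements are length-prefixed, they may themselves contain "|" without confusing the parser, which the text form cannot offer.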
The following examples illustrate the syntax. Consider the names of variables and methods fictitious: some of them are historic, and some, like _list_topic, will probably never exist.
This is a simple example packet:
    |
    :_source	psyc://example.symlynX.com/~fippo
    :_target	psyc://ente.aquarium.example.org:-32872

    :_nick	fippo
    _info_nickname
    Hello [_nick].
    |
And this is an example packet that covers most of the BNF rules above:
    |
    :_context	psyc://example.org/@democracynow
    :_target	psyc://ente.aquarium.example.org:-32872

    :_list_member	|psyc://example.symlynX.com/~jim|psyc://example.org/~judy
    :_list_topic	9 democracy|3 now
    :_list_image 9213	4404 <binary data>|4798 <binary data>
    :_list_owner 26	|psyc://example.org/~judy
    :_image 4212	<binary data>
    _status_context
    In [_context:_nick]: [_list_member:_nick]
    |
This example uses entity-oriented psyctext. The images could be used to decorate a member list, but the normal approach should be to obtain the member images from the state of each member entity. So this example really serves the purpose of showing several possible encodings of lists and data.
The decision which strategy to pick is left to the implementor. Mainstream server implementations should, however, choose an application-developer-friendly style, which is yet to be defined.
The next example reports its content length (175) explicitly, so the data may contain the LF "|" LF sequence:

    |
    :_source	psyc://base.example.org/~k
    :_target	psyc://localhost:1234
    175
    :_color	#CC0000
    :_nick	k
    :_nick_target	psyc://localhost:1234
    _message_private
    hi there.
    this message contains NL | NL here:
    |
    but it doesn't matter because it has length!
    |
Routing header and content are separated by a special line that is either empty or contains the length of the content (in bytes).
If no length was provided, the bytes after the routing-header are parsed as the content rule in the grammar defines. That means that data ends at the first LF "|" LF.
This means that as soon as the content carries binary data that might contain LF "|" LF, you MUST report the content length.
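A sender can make this decision mechanically. The Python sketch below assumes that the routing header and the content have already been serialized according to the grammar; the function name render_packet is hypothetical. It emits the length only when a receiver scanning for LF "|" LF would otherwise stop too early.

```python
def render_packet(routing_header: bytes, content: bytes) -> bytes:
    """Join a serialized routing header and content into one packet.

    routing_header: zero or more routing-modifier lines, each ending in LF.
    content: entity-header plus optional body, ending in LF (or empty).
    """
    if content and not content.endswith(b"\n"):
        raise ValueError("non-empty content must end with LF")
    # The LF of the length line may be the first byte of a false terminator,
    # so it is included in the scan.
    needs_length = b"\n|\n" in b"\n" + content
    length = str(len(content)).encode("ascii") if needs_length else b""
    return routing_header + length + b"\n" + content + b"|\n"
```

Always reporting the length would be equally valid under the grammar; omitting it where possible merely keeps text-only packets readable by eye.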
If a content length was provided, the given number of bytes is read from the byte stream after the routing-header and then processed as the content rule in the grammar defines, which means that data ends with the end of the content. This makes it possible to transmit arbitrary opaque binary data as the data part of a packet.
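The two ways of delimiting content can be sketched as follows. This Python function is an illustration under simplifying assumptions: the whole packet is already in the buffer (an incomplete packet raises ValueError), and the name split_packet and its return shape are hypothetical. It only separates routing header and content; it does not parse the modifiers themselves.

```python
def split_packet(buf: bytes) -> tuple[list[bytes], bytes, bytes]:
    """Return (routing lines, content, remaining bytes) for the first packet."""
    routing = []
    pos = 0
    while True:
        eol = buf.index(b"\n", pos)
        line = buf[pos:eol]
        pos = eol + 1
        if line == b"|":
            return routing, b"", buf[pos:]  # packet without content
        if line == b"" or line.isdigit():
            break  # this is the content-length line
        routing.append(line)
    if line:
        # Explicit length: read exactly that many bytes, then expect "|" LF.
        end = pos + int(line)
        if buf[end:end + 2] != b"|\n":
            raise ValueError("content is not followed by the packet delimiter")
        return routing, buf[pos:end], buf[end + 2:]
    # Implicit length: scan for LF "|" LF, starting at the LF of the
    # (empty) length line so that empty content is handled as well.
    idx = buf.index(b"\n|\n", pos - 1)
    return routing, buf[pos:idx + 1], buf[idx + 3:]
```

A routing modifier always starts with an operator glyph, so a line consisting only of digits can safely be taken as the length line.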
Variable names and methods are ASCII-encoded strings, while the contents of the body and the arguments of variables are kept as UTF-8 unless specified otherwise.
There are two kinds of variables: routing variables and entity variables.
Routing variables can be persisted, meaning the variable is set via the '=' operator (or '+', '-') in the persistent variable set (see below), and modified during the course of the existence of the circuit. This makes them a simple mechanism for protocol compression. Context entities, in turn, may persist entity variables (by using the '=' modifier) for all their members to keep for the entire duration of existence of the context, which makes this a decentralized storage vehicle.
Each context has its own set of persistent entity variables. If the _context routing variable is NOT set, persistent entity variables MUST NOT be changed. This means that modifiers which change persistent entity variables can only be used when _context is set, and thus updating persistent entity variables can only be done by a context.
Should a modifier change a persistent entity variable while _context is not set, the violation SHOULD be answered with a _failure_unsupported_state_persistent error packet and the circuit MAY be terminated.
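A receiver could check this rule before touching any state. The Python sketch below assumes that modifiers have already been parsed into (operator, variable, value) triples and that routing variables are held in a dict; both representations and the function name are assumptions for illustration.

```python
# Operators that change the persistent variable set.
PERSISTENT_OPERATORS = {"=", "+", "-"}

def persistent_state_violation(routing_vars, entity_modifiers):
    """Return the error method to reply with, or None if the packet is fine."""
    if "_context" in routing_vars:
        return None
    for operator, _name, _value in entity_modifiers:
        if operator in PERSISTENT_OPERATORS:
            return "_failure_unsupported_state_persistent"
    return None
```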
NOTE: In theory, unicast entity communications between a _source and a _target could also each define a set of persistent variables. Such entity state (as opposed to context state) is, however, currently not supported, as it raises the storage requirements of PSYC implementations more than it is likely to prove useful. It is reserved for possible future use.
Out-of-context communication may however still refer to persistent variables from in-context communication in its psyctext template.
Each packet defines a set of current variables which may be different from the persistent set of variables. When passing the variables to an application, the programming interface SHOULD merge current routing and entity variables into a single structure.
Routing and entity modifiers
Each packet comes with a set of routing modifiers and entity modifiers. The routing modifiers belong to the routing-header and are separated by a newline from the entity modifiers (entity-modifier).
The routing modifiers modify the current and persistent routing variables. The entity modifiers modify the current and persistent entity variables.
This means that after the routing modifiers (routing-modifier) have been processed, the persistent variables for the sending entity need to be loaded (persistent context slave) before the handling of the content (entity-modifier) is started.
Recommendation: The modification of incoming and outgoing routing variables should be done by the circuit whereas entity variable modifications are generated by the entities and received by context slaves.
Current and persistent variable handling
When a packet is being parsed, the modifiers modify the set of current and persistent routing and entity variables. To do so, the set of current variables is initialized by the set of persistent variables before the modifiers are applied.
After the packet has been processed, the current routing and entity variables are the significant variables that belong to the instance of the parsed packet.
NOTE: You have to make sure that you don't apply the changes to the persistent routing variables until the whole routing header has been parsed. The same holds for the entity variables.
- = – The variable is modified in the set of current variables and the set of persistent variables. If no variable name is provided, the persistent variables and current variables are deleted.
- : – The variable is only modified in the set of current variables.
- + – For _list variables, the elements in the modifier argument are appended to the persistent variables and the current variables. Further types MAY define custom uses of this modifier.
- - – For _list variables, the elements in the modifier argument are removed from the persistent variables and the current variables. Further types MAY define custom uses of this modifier.
- ? – State sync request.
- !$@%&*/#;, – Reserved for as yet undefined state operations.
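The handling described above can be sketched in Python as follows. This is an illustration under assumptions, not a normative algorithm: modifiers are taken as already parsed (operator, variable, value) triples, list values are Python lists, and the function name apply_modifiers is hypothetical. The persistent set passed in is never mutated, so the caller can commit the returned copy only once the whole header has been parsed successfully.

```python
def apply_modifiers(persistent, modifiers):
    """Return (current, new_persistent) after applying the modifiers."""
    new_persistent = dict(persistent)
    current = dict(persistent)  # current starts out as the persistent set
    for op, name, value in modifiers:
        if op == "=":
            if not name:
                # "=" without a variable name deletes both sets.
                new_persistent.clear()
                current.clear()
            else:
                new_persistent[name] = value
                current[name] = value
        elif op == ":":
            current[name] = value
        elif op in ("+", "-"):
            if not name.startswith("_list"):
                # Assumption: no custom uses for other types are defined here.
                raise ValueError(op + " is only sketched for _list variables")
            for target in (new_persistent, current):
                old = target.get(name, [])
                if op == "+":
                    target[name] = old + list(value)
                else:
                    target[name] = [e for e in old if e not in value]
        else:
            raise ValueError("operator not handled in this sketch: " + op)
    return current, new_persistent
```

Lists are rebuilt rather than extended in place, so a value shared between the two sets is never modified twice by accident.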
The ASCII strings, denoted by the non-terminals variable and method, have to adhere to the keyword naming specification.
MIME Content Type
The content type for complete PSYC packets themselves is message/x-psyc (regardless of the content type of the data contained within). It needs to be encapsulated in an 8-bit transparent way, as it may contain binary data.
The "Parse" about page gives practical instructions on how to write a PSYC parser.
NOTE: When an invalid packet is received after its routing header has already been parsed successfully and the packet has been partially processed, it may or may not have effected state changes before being dismissed as invalid. Thus, after receiving an invalid packet (that is, one with syntax errors in the content part), the current state data of the addressed context MUST be invalidated.
NOTE: With the new syntax strictly expecting LF between lines, no longer accepting CRLF like the old syntax, you can no longer use telnet for testing purposes. Please obtain netcat (nc command) or similar.