You could argue that PSYC itself, when enhanced by state, is a worldwide decentralized database.

Consider that using the networking capabilities of many database systems is very likely to collide with PSYC's own integrity requirements (knowing when data has been modified, etc.). PSYC needs to solve these issues itself, and considering that no database knows how to multicast, PSYC should end up being the better choice for database replication at some point, or at least for its own specific needs.

Database technology should therefore be interesting for PSYC applications on an interprocess or library link level instead, not by means of a network protocol interface. This rules out LDAP pretty much from the start.

Local database

So let's look at local storage, an issue that is frequently approached from the database point of view. For practical reasons I'll discuss it here. It may influence the development of applications such as psyced, but it is in fact a generic discussion of database query languages.

SQL

SQL is the aging standard, with many strengths and shortcomings. The thing that disturbs me most is that you are still supposed to build statements by string concatenation, even if the SQL implementation is a library linked directly into your application, such as SQLite. Stored procedures alleviate the problem, and if your programming language is compiled at runtime anyway, it doesn't make much difference to also compile some database API strings. But if that is not the case, or if the strings are compiled more than once, you are simply running into a performance brake.
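To make the point about repeated string compilation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (profile, nick, status) are made up for illustration and are not psyced's actual schema. The statement text stays constant and the values travel separately as parameters, so the sqlite3 module can reuse the compiled statement from its per-connection statement cache instead of parsing a freshly concatenated string each time.

```python
import sqlite3

# Constant statement texts: values are passed as parameters, never
# concatenated into the SQL string. (Hypothetical schema for illustration.)
CREATE_PROFILE = "CREATE TABLE profile (nick TEXT PRIMARY KEY, status TEXT)"
INSERT_PROFILE = "INSERT INTO profile (nick, status) VALUES (?, ?)"
SELECT_STATUS = "SELECT status FROM profile WHERE nick = ?"


def build_db():
    """Create an in-memory database with one example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_PROFILE)
    return conn


def store_profiles(conn, rows):
    """Insert many rows through one parameterized statement."""
    conn.executemany(INSERT_PROFILE, rows)


def lookup_status(conn, nick):
    """Return the status for a nick, or None if the nick is unknown."""
    row = conn.execute(SELECT_STATUS, (nick,)).fetchone()
    return row[0] if row else None


conn = build_db()
# The quote in "o'brien" needs no escaping because it is bound, not spliced in.
store_profiles(conn, [("lynx", "here"), ("o'brien", "away")])
print(lookup_status(conn, "o'brien"))  # prints "away"
```

This does not remove the SQL string from the picture, it only keeps it constant so that it is compiled once per connection rather than once per call.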

Alternatives to SQL

Some RDBMS provide alternative query languages or native APIs, but they frequently still employ SQL string statements at some point. Several query languages that compile to real data structures exist, but they are either fringe or tied to a single platform such as Java or .NET: .QL, D, LINQ, Datalog and QVT using OCL.
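What "a query as a real data structure" could look like is sketched below in Python. This is not the API of any of the languages listed above; the node types Cmp, And and Or and the functions matches and select are hypothetical names invented for this illustration. The query is built once as a tree of objects and then evaluated directly against records, with no query string being parsed at any point.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Cmp:
    """Leaf node: compare one field of a record against a constant."""
    field: str
    op: Callable[[Any, Any], bool]
    value: Any


@dataclass(frozen=True)
class And:
    left: Any
    right: Any


@dataclass(frozen=True)
class Or:
    left: Any
    right: Any


def matches(query, record):
    """Evaluate a query tree against one record (a plain dict)."""
    if isinstance(query, Cmp):
        return query.field in record and query.op(record[query.field], query.value)
    if isinstance(query, And):
        return matches(query.left, record) and matches(query.right, record)
    if isinstance(query, Or):
        return matches(query.left, record) or matches(query.right, record)
    raise TypeError(f"unknown query node: {query!r}")


def select(query, records):
    """Return all records that satisfy the query tree."""
    return [r for r in records if matches(query, r)]


people = [
    {"nick": "alice", "age": 30, "status": "here"},
    {"nick": "bob", "age": 25, "status": "away"},
    {"nick": "carol", "age": 40, "status": "here"},
]

# Built once as objects; reusable without any parsing step.
query = And(Cmp("status", operator.eq, "here"), Cmp("age", operator.gt, 35))
print([p["nick"] for p in select(query, people)])  # prints ['carol']
```

A real system would add indexes and a planner on top, but the application-facing side could stay this simple.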

Going object-oriented

There is quite a choice in object-oriented database management systems (OODBMS), with FastDB and GigaBASE being some of the most popular open source implementations. FastDB is limited to in-memory applications, but luckily, when your application grows too big, you can switch to GigaBASE which happens to be API-compatible. The advantages and disadvantages paragraph is quite a useful read here. With PSYC being itself very object-oriented and hardly hierarchical/structural, this seems to be a rather natural choice.

XQuery

But what if applications require a more structural approach? The XML industry is producing a whole lot of output, and with the W3C generating so many standards along with it, they may even one day challenge the SQL dominance. When it comes to parsing, XML isn't exciting, but XPath can frequently be precompiled. That is sometimes also the case for XQuery, as with MonetDB, which however is memory-based like FastDB. Sedna is an interesting technology in this area. When both the XML data and the query statements are handled as real data structures instead of being parsed, this stuff should be both powerful and acceptably fast.

Not sure what SPARQL is about, but it goes beyond XQuery as it encompasses several source documents and operates in a graph-oriented way, limiting it to the RDF format for source material. Decentralized RDF as the next generation XML? Does it make any sense for us if PSYC is itself decentralized? After all PSYC is a rather nice platform for the so-called semantic web as it has bidirectionality, event-orientedness and privacy! Could SPARQL/RDF make sense on a data structure level with a different protocol? Maah. Humbug. And no current implementations allow that sort of access to the data, anyway. Probably isn't even intended.

Summary

FastDB/GigaBASE are interesting options; something XQuery-based may prove slower but more flexible; and finally even SQLite may turn out to be useful. What do you think?

Quotes

twitter blogs: Our new architecture will move our reliance to a simple, elegant filesystem-based approach, rather than a collection of database.