Added an article as introduction to the Tanja project + various fixes

2012-02-15 22:17:20 +01:00 · 2012-02-15 22:17:20 +01:00 · fa299b0fd9
commit fa299b0fd9
parent 1d24a2c6bb
5 changed files with 411 additions and 1 deletions
--- a/dat/doc
+++ b/dat/doc
@ -13,6 +13,11 @@ write an article surprasses my urge to get some programming done again.

 =over

+=item C<2012-02-15 > - L<A Distributed Communication System for Modular Applications|http://dev.yorhel.nl/doc/commvis>
+
+In this article I explain a vision of mine, and the results of a small research
+project aimed at realizing that vision.
+
 =item C<2011-11-26 > - L<Multi-threaded Access to an SQLite3 Database|http://dev.yorhel.nl/doc/sqlaccess>

 So you have a single database and some threads. How do you combine these in a
--- a/dat/doc-commvis
+++ b/dat/doc-commvis
@ -0,0 +1,393 @@
+A Distributed Communication System for Modular Applications
+
+=pod
+
+(Published on B<2012-02-15>. Also available in L<POD|http://dev.yorhel.nl/dat/doc-commvis>.)
+
+
+=head1 Introduction
+
+I have a vision. A vision in which rigid point-to-point IPC is replaced with a
+far more flexible and distributed communication system. A vision in which
+different components in the same program can interact with each other without
+having to worry about each other's internal state. A vision where programs can
+be designed in a modular way, without even worrying about whether to use
+threads or an event-based model. A vision where every component communicates
+with others, and where you can communicate with every component. And more
+importantly, a vision in which each component can be implemented in a different
+programming language, without the need for specific code to glue everything
+together.
+
+If that sounds interesting to you, then please read on. As a small research
+project of mine, I've been looking into ways to realize the above vision, and I
+believe I have found an answer. In this article I'll try to explain my ideas
+and how they may be used to realize this vision.
+
+My ideas have been heavily inspired by
+L<Linda|http://en.wikipedia.org/wiki/Linda_(coordination_language)>. If you're
+already familiar with that, then what I present here probably won't be very
+revolutionary. Still, there are several aspects in which my ideas differ
+significantly from Linda, so you won't be bored reading this. :-)
+
+
+
+=head1 The Concept
+
+In this section I'll try to introduce the overall concept and some terminology.
+This is going to be somewhat abstract and technical, but please bear with me.
+I promise that things will get more interesting in the later sections.
+
+Let me first define an abstract communications framework. We have a B<network>
+and a bunch of B<sessions> connected to that network. Sessions can communicate
+with each other through this network (that's usually what a network is for,
+after all). These sessions do not have to be static: they may come and go.
+Keep in mind that, for the purpose of explaining this concept, these terms are
+very abstract: a session can be anything. A process, thread, a single function,
+an object, or even your mobile phone. Anything. In the same way, the network is
+nothing more than an abstract way to connect these sessions. It could be
+sockets, pipes, an HTTP server, a broadcast network or just shared memory
+between threads. If it allows sessions to communicate I'll call it a network.
+
+Unlike many communication systems, this network does not have the concept of
+I<addresses>. There is no direct way for one session to identify another, and
+indeed there is no need to do so for the purposes of communication. Instead,
+the primary means of communication is by using B<tuples> and patterns.
+
+A tuple is an ordered set (list, array, whatever terminology you prefer) of
+zero or more elements.  Each element may have a different type, so it can hold
+booleans, integers, floating point numbers, strings and even more complex data
+structures such as arrays or maps. You may think of a tuple as an array in
+L<JSON|http://json.org/> notation, if that makes things easier to understand.
+
+Sessions send and receive tuples to communicate with each other. On the sending
+side, a session simply "passes" a tuple to the network. This is a non-blocking,
+asynchronous operation. In fact, it makes no sense to make this a blocking
+action, because the sender cannot know whether it will be received by any
+other session anyway. The tuple may be received by many other sessions, or
+there may not even be a single session interested in the tuple at all.
+
+On the receiving side, sessions B<register> patterns. A pattern itself is
+mostly just a tuple, but with a more limited set of allowed types: only those
+types for which exact matching makes sense, like booleans, integers and
+strings. A pattern of C<n> elements matches an incoming tuple if the first
+C<n> elements of the tuple exactly match the corresponding elements of the
+pattern; any further elements of the tuple are ignored. A special
+I<wildcard> element may be used to match any value of any type.
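
The matching rule above can be sketched in a few lines of Python. This is only an illustration, not Tanja's actual API: the names `WILDCARD` and `matches` are made up here, and two details the text leaves open are filled in as assumptions, namely that a tuple shorter than the pattern never matches and that values of different types are never considered equal.

```python
# Hypothetical sketch: WILDCARD and matches() are illustrative names only.
WILDCARD = object()  # sentinel that matches any value of any type


def matches(pattern, tup):
    """Return True if the first len(pattern) elements of tup match pattern."""
    if len(pattern) > len(tup):
        return False  # assumption: a tuple shorter than the pattern never matches
    # Assumption: elements must have the same type and the same value.
    return all(
        p is WILDCARD or (type(p) is type(t) and p == t)
        for p, t in zip(pattern, tup)
    )
```

With these rules, `["fs", "delete", WILDCARD]` matches `["fs", "delete", "notes.txt"]`, while `["object", "add"]` does not match `["object", "delete", 7]`.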
+
+A session thus only receives tuples from other sessions if it has
+registered a pattern for them. As mentioned, it is not illegal to send a tuple
+for which no other sessions have registered. In this case, the tuple will just
+be discarded. It is also possible that many sessions have registered for a
+matching pattern, in which case all of these sessions will receive the tuple.
+As an additional rule, if a session sends out a tuple that matches one of its
+own patterns, then it will receive its own tuple. (However, programming
+interfaces might allow this to be detected and/or disabled if this eases the
+implementation of a session.)
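
As a rough illustration of these delivery rules, here is a toy in-process network. Everything in it is an assumption made for the sketch: the `Network` class, its `register` and `send` methods, and the use of plain callbacks called synchronously in place of real asynchronous delivery.

```python
# Hypothetical sketch of the delivery rules; not an actual Tanja interface.
WILDCARD = object()  # matches any value


def matches(pattern, tup):
    return len(pattern) <= len(tup) and all(
        p is WILDCARD or p == t for p, t in zip(pattern, tup))


class Network:
    def __init__(self):
        self.registrations = []  # (pattern, callback) pairs

    def register(self, pattern, callback):
        self.registrations.append((pattern, callback))

    def send(self, tup):
        # Every matching registration gets the tuple; if nothing matches,
        # the tuple is silently discarded. The sender learns nothing.
        for pattern, callback in list(self.registrations):
            if matches(pattern, tup):
                callback(tup)


net = Network()
inbox_a, inbox_b = [], []
net.register(["chat", WILDCARD], inbox_a.append)
net.register(["chat", "hello"], inbox_b.append)

net.send(["chat", "hello", "world"])  # delivered to both sessions
net.send(["chat", "bye"])             # delivered to the first session only
net.send(["unrelated"])               # nobody registered: discarded
```

Because the sketch has no notion of session identity, a session that sends a tuple matching its own pattern receives it like anyone else, which is the default behaviour described above.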
+
+Finally, there is the concept of a B<return-path>. Upon sending out a tuple, a
+session may indicate that it is interested in receiving replies. The network
+is then responsible for providing a return-path: a way for receivers of the
+tuple to reply to it. When a tuple is received, the session has the option to
+reply to it: a reply consists of one or more tuples that are sent directly to
+the session from which the tuple originated, using this return-path. When a
+receiver is done replying to the tuple or when it has no intention of sending
+back a reply, it should close the return-path to indicate this. The session
+that sent the original tuple is then notified that the return-path is closed,
+and no more replies will be received. If there is no session that has
+registered for the tuple, the return-path is closed immediately (or at least,
+the sending session is notified that there won't be a reply). If the tuple is
+received by multiple sessions, then the replies will be interleaved over the
+return-path, and the path is closed when all of the receiving sessions have
+closed their end.
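
The life cycle of a return-path can be sketched as follows. The names `ReturnPath`, `want_reply` and the handler signature are invented for this example, delivery is synchronous for simplicity, and the sketch trusts every receiver to close its end exactly once.

```python
# Hypothetical sketch of return-path bookkeeping; names are illustrative.
WILDCARD = object()


def matches(pattern, tup):
    return len(pattern) <= len(tup) and all(
        p is WILDCARD or p == t for p, t in zip(pattern, tup))


class ReturnPath:
    """Collects replies to one tuple; closed once every receiver is done."""

    def __init__(self, receivers):
        self.pending = receivers       # receivers that have not closed yet
        self.replies = []              # replies from all receivers, interleaved
        self.closed = receivers == 0   # nobody registered: closed immediately

    def reply(self, tup):
        if self.closed:
            raise RuntimeError("return-path already closed")
        self.replies.append(tup)

    def close(self):
        self.pending -= 1
        if self.pending <= 0:
            self.closed = True


class Network:
    def __init__(self):
        self.registrations = []

    def register(self, pattern, handler):
        self.registrations.append((pattern, handler))

    def send(self, tup, want_reply=False):
        handlers = [h for p, h in self.registrations if matches(p, tup)]
        path = ReturnPath(len(handlers)) if want_reply else None
        for handler in handlers:
            handler(tup, path)
        return path


def echo(tup, path):
    if path is not None:
        path.reply(["echo", tup[1]])
        path.close()


def silent(tup, path):
    if path is not None:
        path.close()  # no reply, but the sender must still be told


net = Network()
net.register(["ping"], echo)
net.register(["ping", WILDCARD], silent)
answered = net.send(["ping", "hi"], want_reply=True)
unanswered = net.send(["nobody", "home"], want_reply=True)
```

Here `answered` ends up closed with a single reply, and `unanswered` is closed right away because no session registered for the tuple.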
+
+
+
+=head1 Common design patterns and solutions
+
+The previous section was rather abstract. This section provides several
+examples of how to implement common tasks and design patterns using the
+previously described concepts.
+
+
+=head2 Broadcast notifications
+
+This is commonly implemented in OOP systems using the I<Observer pattern>.
+Implementing the same using tuples and patterns is an order of magnitude
+simpler, as broadcast notifications are pretty much the native means of
+communication.
+
+In OOP you have the "observers" that can add themselves to the "observer list"
+of any "object". This observer list is usually managed by the object that is to
+be observed. If something happens to the object, it will walk through the
+observer list and notify each observer.
+
+If you represent an object as a session and define a notification as a tuple
+that follows a certain pattern, then you very easily achieve the same
+functionality as with an OOP implementation. In fact, there are some advantages
+to doing it this way:
+
+=over
+
+=item *
+
+Sessions stay registered to the same notifications even if the "object" (the
+session that is being observed) is restarted or replaced with something else.
+It's the network itself that keeps track of the registrations, not the sessions
+that provide the notifications. Of course, this can be seen as a drawback, but
+you can easily emulate OOP behaviour by providing an extra notification when
+the "object" is shut down, indicating that the observing sessions can remove
+their patterns.
+
+=item *
+
+Since there is no need for the session that is being observed to keep a list of
+sessions that are observing it, it also doesn't have to walk the list and send out
+multiple notifications. Notifying the observers is as simple as sending out a
+single tuple.
+
+=item *
+
+Many implementations of the Observer pattern maintain only a single list of
+observers per object, and each listed observer will be notified for every
+change to the object. For example, if an object maintains a list and provides
+notifications when something is added to or deleted from the list, every observer
+will be notified of both the "added" action and the "deleted" action. The use
+of tuples and patterns allows observers to register for all actions, or just
+for a single one. If an "add" action would be notified with a tuple of
+C<["object", "add", id]> and a "delete" action with
+C<["object", "delete", id]>, then an observing session can register with the
+pattern C<["object", *]> to be notified for both actions, or just
+C<["object", "add"]> to register only for additions.
+
+=back
+
+Of course, this is only one way to implement a notification mechanism. There
+are also solutions that more accurately mimic the behaviour of the Observer
+pattern in OOP in cases where that is desired.
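
The selective registration from the last list item could look like this in a toy setting. The module-level `register` and `send` functions are stand-ins invented for the example; they are not part of any real library.

```python
# Hypothetical sketch: observers choose how much they want to hear about.
WILDCARD = object()
registrations = []  # (pattern, callback) pairs kept by the toy network


def matches(pattern, tup):
    return len(pattern) <= len(tup) and all(
        p is WILDCARD or p == t for p, t in zip(pattern, tup))


def register(pattern, callback):
    registrations.append((pattern, callback))


def send(tup):
    for pattern, callback in list(registrations):
        if matches(pattern, tup):
            callback(tup)


all_changes, additions = [], []
register(["object", WILDCARD], all_changes.append)  # both actions
register(["object", "add"], additions.append)       # additions only

# The observed "object" sends a single tuple per change and keeps no list.
send(["object", "add", 1])
send(["object", "delete", 1])
```

After the two notifications, `all_changes` holds both tuples and `additions` holds only the first one.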
+
+
+=head2 Commands
+
+A I<command> is what I call something along the lines of one session telling
+another session to do something. Suppose we have a session representing a file
+system. A command for this session could then be something like "delete file
+X".
+
+In a sense, this isn't much different from a notification as described above.
+The file system session would have registered a pattern like
+C<["fs", "delete", *]>, where the wildcard is used for the file name. If
+another session then wants to have a file deleted, the only thing it will have to
+do is send out a tuple matching that pattern, and the file system session will
+take care of deleting it.
+
+In the above scenario, the session sending the command has no feedback
+whatsoever on whether the command has been successfully executed or not.
+Whether this is acceptable depends of course on the specific application. One
+way of still providing some form of feedback is to have the file system session
+send out a notification tuple, e.g. C<["fs", "deleted", "file"]>. (Note that the
+second element is now C<deleted> rather than C<delete>: using the same tuple
+for actions and notifications would get very messy.) This way the
+session sending the command, in addition to any other sessions that happen to
+be interested in file deletion, will be notified of the deletion of the file.
+An alternative solution is to use the RPC-like method, as described below.
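
A small sketch of the command-plus-notification approach follows. The in-memory `files` set stands in for a real file system, and `register`, `send` and `fs_delete` are hypothetical names. It also shows why the distinct `deleted` element matters: the notification does not match the file system session's own command pattern, so it cannot trigger another delete.

```python
# Hypothetical sketch of a command answered by a broadcast notification.
WILDCARD = object()
registrations = []


def matches(pattern, tup):
    return len(pattern) <= len(tup) and all(
        p is WILDCARD or p == t for p, t in zip(pattern, tup))


def register(pattern, callback):
    registrations.append((pattern, callback))


def send(tup):
    for pattern, callback in list(registrations):
        if matches(pattern, tup):
            callback(tup)


files = {"notes.txt", "todo.txt"}  # stand-in for an actual file system


def fs_delete(tup):
    name = tup[2]
    if name in files:
        files.remove(name)
        send(["fs", "deleted", name])  # notify anyone who is interested


log = []
register(["fs", "delete", WILDCARD], fs_delete)  # the file system session
register(["fs", "deleted"], log.append)          # any interested session

send(["fs", "delete", "notes.txt"])
send(["fs", "delete", "missing.txt"])  # no such file: no notification
```

Note that the sender of the second command hears nothing at all, which is exactly the lack of feedback discussed above.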
+
+
+=head2 RPC
+
+L<RPC|http://en.wikipedia.org/wiki/Remote_procedure_call> is in essence nothing
+more than providing an interface similar to a regular function call to a
+component that can't be reached via a regular function call (e.g. because the
+object isn't inside the address space of the program). RPC is generally a
+request-response type of interaction, and by making use of the return-path
+facility as I described earlier, all of the functionality of RPC is also
+available with the concept of tuple communication.
+
+=head3 Commands, the RPC-way
+
+Take the previous file system example. Instead of just sending the command
+tuple to delete the file, the session could indicate that it is interested in
+replies and the network will create a return-path. If the return-path is closed
+before any replies have been received, then the commanding session knows that
+the file system session is either down or broken. Otherwise, the file system
+session has the ability to send back a response. This could be a simple "okay,
+file has been deleted" tuple if things went alright, or an error indication if
+things didn't go too well. The commanding session has the option to either
+block and wait for a reply (or a close of the return-path), or continue doing
+whatever it wanted to do and asynchronously check for a reply.
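
The choice between blocking and checking asynchronously can be illustrated with Python's standard `queue` and `threading` modules. This is a simplification under several assumptions: pattern registration is left out entirely (a single `inbox` queue stands in for the network), each command carries its own reply queue as its return-path, and `CLOSED` is a made-up marker for the closing of that path.

```python
# Hypothetical sketch: a per-command reply queue plays the return-path.
import queue
import threading

CLOSED = object()  # made-up marker: the receiver has closed the return-path


def fs_session(inbox, files):
    """Handle ["fs", "delete", name] commands until a None marker arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        tup, return_path = item
        name = tup[2]
        if name in files:
            files.remove(name)
            return_path.put(["ok", name])
        else:
            return_path.put(["error", "no such file", name])
        return_path.put(CLOSED)


def delete_file(inbox, name):
    """Send the command, then block until the return-path is closed."""
    return_path = queue.Queue()
    inbox.put((["fs", "delete", name], return_path))
    replies = []
    while True:
        reply = return_path.get()  # blocks until a reply or CLOSED arrives
        if reply is CLOSED:
            return replies
        replies.append(reply)


inbox = queue.Queue()
files = {"notes.txt"}
worker = threading.Thread(target=fs_session, args=(inbox, files))
worker.start()

first = delete_file(inbox, "notes.txt")
second = delete_file(inbox, "notes.txt")  # already gone: error reply

inbox.put(None)  # shut the file system session down
worker.join()
```

A sender that prefers not to block could keep the reply queue around and poll it with `get_nowait()`, which raises `queue.Empty` while no reply is available yet.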
+
+The downside of using the return-path rather than the previously mentioned
+notification approach is that other sessions can't easily be notified of file
+deletion. Of course, another session can register for the same pattern as the
+file system did and thus receive the same command, but it would have no way of
+knowing whether the delete was actually successful or not. For other sessions
+to be notified as well, the file system session would probably have to send out
+a notification tuple. Of course, whether this is necessary depends on the
+application: you only have to implement the functionality that is necessary
+for your purposes.
+
+=head3 Requesting information
+
+Another use of RPC, and thus also of the return-path, is to allow sessions to
+request information from each other. Using the same example again, the file
+system session could register for a pattern such as C<["fs", "list"]>. Upon
+receiving a tuple matching that pattern, the session would send a list of all
+its files over the return-path. Other sessions can then request this list by
+simply sending out the right tuple and waiting for the replies.
+
+
+
+
+=head1 Advantages over other systems
+
+Now that I've hopefully convinced you that my communication concept is powerful
+enough to build applications with it, you may be wondering why you should use
+it instead of the other technologies. After all, you can achieve pretty much
+the same functionality with just regular OOP, RPC, message passing, or other
+systems. Let me present some of the inherent advantages that this system has
+compared to others, and why it will help in designing flexible and modular
+applications.
+
+=head2 Loose coupling of components
+
+Sessions (representing the components of a system) do not have to have a lot of
+knowledge about each other. Sessions implicitly provide abstracted I<services>
+using tuple communications, in much the same way as interfaces explicitly do in
+OOP.
+
+Very much unlike OOP, however, sessions do not even have to know how other
+sessions should be used in threaded or event-based environments. For
+example, threading in OOP is a pain: which objects should implement
+synchronisation and which shouldn't? The answer to this question is not nearly
+as obvious as it should be. With event-based systems, you'll always need to
+worry about how long a certain function call blocks the caller's thread. Since
+communication between the different sessions is completely asynchronous, these
+worries are gone.
+
+=head2 Location independence
+
+Sessions can communicate with other sessions without knowing I<where> they are.
+The major advantage is that a session can be moved around without having
+to change a single line of code in any of the sessions relying on its service.
+This allows sessions that communicate a lot with each other to be placed in the
+same process, while resource-heavy sessions may be distributed among several
+physical devices.
+
+=head2 Programming language independence
+
+All communication is solely done with tuples, which can be represented as
+abstract objects and serialized and deserialized (or marshalled/unmarshalled,
+whichever terminology you prefer) for communication. I used a JSON array as an
+example of a tuple earlier, and perhaps it's not such a bad one: JSON data can
+be interchanged between many programming languages, and is quite often not
+that annoying to use. Still, there are many other alternatives (Bencoding, XML,
+binary encodings, etc.), and it all depends on the exact data types and values
+you wish to use for communication.
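
If JSON were chosen, serializing a tuple could be as simple as the sketch below. The one-tuple-per-line framing and the `encode`/`decode` names are assumptions made for the example; the article does not define a wire format.

```python
# Hypothetical sketch: one JSON array per line as a wire format for tuples.
import json


def encode(tup):
    """Serialize a tuple (a Python list) to one newline-terminated line."""
    return (json.dumps(tup) + "\n").encode("utf-8")


def decode(line):
    """Parse one line back into a tuple, rejecting anything but an array."""
    value = json.loads(line.decode("utf-8"))
    if not isinstance(value, list):
        raise ValueError("a tuple must be encoded as a JSON array")
    return value


original = ["fs", "list", 3, True, None, 1.5, {"k": [1, 2]}]
restored = decode(encode(original))
```

Plain JSON has no wildcard value and no type for raw binary data, so a real protocol would need a convention for patterns and, if desired, for binary strings.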
+
+Language independence allows each session to be (re)implemented in a different
+language, again without affecting any other sessions. Did you write an
+application in a high-level language and notice that performance wasn't as
+good as you wanted? Then you can very easily rewrite the most resource-heavy
+sessions in a low-level language such as C. Similarly, it allows developers to
+hook into your application even when they are not familiar with your favorite
+programming language.
+
+=head2 Easy debugging
+
+Not only can other applications and/or plugins hook into your application, you
+can also connect a simple debugger to the network. The debugger just has to
+register for a pattern and then print out any received tuples, allowing you to
+see exactly what is being sent over the network and whether the sessions react
+as expected. Similarly, the debugger could allow you to send tuples back to the
+network and see whether the sessions react as they should. Unfortunately, what
+is being sent over a return-path is generally not visible to anyone but the
+receiver of the replies, although a network implementation might allow a
+debugging application to look into that as well.
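
Such a debugger could be little more than a session that registers a catch-all pattern. The sketch assumes that an empty pattern (C<n> = 0) matches every tuple, which follows from the matching rule but is not stated explicitly; `register`, `send` and `debugger` are made-up names.

```python
# Hypothetical sketch of a debugging session that sees every tuple.
registrations = []  # (pattern, callback) pairs kept by the toy network


def matches(pattern, tup):
    return len(pattern) <= len(tup) and all(
        p == t for p, t in zip(pattern, tup))


def register(pattern, callback):
    registrations.append((pattern, callback))


def send(tup):
    for pattern, callback in list(registrations):
        if matches(pattern, tup):
            callback(tup)


trace = []


def debugger(tup):
    line = "tuple: %r" % (tup,)
    trace.append(line)  # kept so the trace can be inspected afterwards
    print(line)


register([], debugger)  # empty pattern: nothing to compare, so all tuples match

send(["fs", "delete", "notes.txt"])
send(["object", "add", 7])
```

As noted above, replies travelling over a return-path would not show up in such a trace unless the network implementation exposes them.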
+
+
+
+=head1 Where to go from here
+
+What I've described above is nothing more than a bunch of ideas. To actually
+use this, there's a lot to be done.
+
+=over
+
+=item Defining a "tuple"
+
+What types can be used in tuples? Should a tuple have some maximum size or a
+maximum number of elements? Should a C<NULL> type be included? What about a
+boolean type, why not use the integers 1 and 0 for that? Should it be possible
+to interchange binary data, or only UTF-8 strings?
+
+What will be the size of an integer that a session can reasonably assume to be
+available? Specifying something like "infinite" is going to be either
+inefficient in terms of memory and CPU overhead or will require extra overhead
+(in terms of code) in usage. Specifying that everything should fit in a 64-bit
+integer is a lot more practical, but may be somewhat annoying to cope with in
+many dynamically typed languages running on 32-bit architectures. Specifying
+that integers are 32 bits will definitely ease the implementation of the network
+library in interpreted languages, but lowers the usefulness of the integer type
+and is still a pain to use in OCaml (which has 31-bit integers).
+
+These choices greatly affect the ease of implementing a networking library for
+specific programming languages and the ease of using the network to actually
+develop an application.
+
+=item The exact semantics of matching
+
+Somewhat similar to the previous point, the semantics of matching tuples with
+patterns should also be defined in some way. Some related questions are whether
+values of different types may be equivalent. For example, is the string
+C<"1234"> equivalent to an integer with that value? What about NULL and/or
+boolean types? If there is a floating point type, you probably won't need exact
+matching on those values (floating points are too imprecise for that anyway),
+but you might still want the floating point number C<10.0> to match the integer
+C<10> to ease the use in dynamic languages where the distinction between
+integer and float is blurred.
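
To make the trade-off concrete, here are two candidate equality rules written out in Python. Both functions are hypothetical; neither is claimed to be the rule Tanja uses. Python itself is a good example of the blurring: `10 == 10.0` and even `True == 1` evaluate to true there.

```python
# Hypothetical sketch of two possible equality rules for pattern matching.


def strict_equal(a, b):
    """Values match only if they have the same type and the same value."""
    return type(a) is type(b) and a == b


def numeric_equal(a, b):
    """Like strict_equal, but an integer matches a float of the same value."""
    numbers = (int, float)
    if (isinstance(a, numbers) and isinstance(b, numbers)
            and not isinstance(a, bool) and not isinstance(b, bool)):
        return a == b
    return strict_equal(a, b)
```

Under `strict_equal` the float `10.0` does not match the integer `10`; under `numeric_equal` it does, while the string `"1234"` still differs from the integer `1234` and booleans stay distinct from integers.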
+
+=item Defining the protocol(s)
+
+Making my vision of modularity and ease of use a reality requires that any
+session can easily communicate with another session, even if they have a
+vastly different implementation. To do this, we need a protocol to connect
+multiple processes together, whether they run on a local machine or over a
+physical network.
+
+=item Coding the stuff
+
+Obviously, all of this remains a mere concept if nothing ever gets
+implemented. Easy-to-use libraries are needed for several programming
+languages. And more importantly, actual applications will have to be developed
+using these libraries.
+
+=back
+
+Of course, realizing all of the above is an iterative process. You can't write
+an implementation without knowing what data types a tuple is made of, but it is
+equally impossible to determine the exact definition of a tuple without having
+experimented with an actual implementation.
+
+
+=head2 What's the plan?
+
+I've been working on documenting the basics of the semantics and the
+point-to-point communication protocol, and have started on an early
+implementation in the Go programming language to experiment with. I've dubbed
+the project B<Tanja>, and have published my progress on a
+L<git repo|http://g.blicky.net/tanja.git/>.
+
+My intention is to also write implementations for C and Perl, experiment with
+that, and see if I can refine the semantics to make this concept one that is
+both efficient and easy to use.
+
+Since I still have no idea whether this concept is actually a convenient one to
+write large applications with, I'd love to experiment with that as well. My
+original intention has always been to write a flexible client for the Direct
+Connect network, possibly extending it to other P2P or chat networks in the
+future.  So I'd love to write a large application using this concept, and see
+how things work out.
+
+In either case, if this article managed to get you interested in this concept
+or in project Tanja, and you have any questions, feedback or (gasp!) feel like
+helping out, don't hesitate to contact me! I'm available as 'Yorhel' on Direct
+Connect at C<adc://blicky.net:2780> and IRC at C<irc.synirc.net>, or just drop
+me a mail at C<projects@yorhel.nl>.
--- a/dat/home
+++ b/dat/home
@ -12,6 +12,8 @@ you decide to do with it.

 =over

+=item C<2012-02-15 > Added a new article on my new L<communication system|http://dev.yorhel.nl/doc/commvis>.
+
 =item C<2012-02-13 > ncdc 1.8 released.

 =item C<2012-01-19 > TUWF 0.2 released.
--- a/dat/ncdc
+++ b/dat/ncdc
@ -37,7 +37,6 @@ L<Arch Linux|http://aur.archlinux.org/packages.php?ID=50949> -
 L<FreeBSD|http://www.freshports.org/net-p2p/ncdc/> -
 L<Frugalware|http://frugalware.org/packages/136807> -
 L<Gentoo|http://packages.gentoo.org/package/net-p2p/ncdc> -
-L<Mac OS X|http://www.macports.org/ports.php?by=name&substr=ncdc> -
 L<OpenSUSE|http://packman.links2linux.org/package/ncdc>

 The L<Open Build