Added an article as introduction to the Tanja project + various fixes
parent 1d24a2c6bb
commit fa299b0fd9
5 changed files with 411 additions and 1 deletions
5
dat/doc
|
|
@@ -13,6 +13,11 @@ write an article surpasses my urge to get some programming done again.
|
|||
|
||||
=over

=item C<2012-02-15 > - L<A Distributed Communication System for Modular Applications|http://dev.yorhel.nl/doc/commvis>

In this article I explain a vision of mine, and the results of a small research
project aimed at realizing that vision.

=item C<2011-11-26 > - L<Multi-threaded Access to an SQLite3 Database|http://dev.yorhel.nl/doc/sqlaccess>

So you have a single database and some threads. How do you combine these in a
|
||||
|
|
|
|||
393
dat/doc-commvis
Normal file
|
|
@@ -0,0 +1,393 @@
|
|||
A Distributed Communication System for Modular Applications

=pod

(Published on B<2012-02-15>. Also available in L<POD|http://dev.yorhel.nl/dat/doc-commvis>.)
|
||||
|
||||
|
||||
=head1 Introduction

I have a vision. A vision in which rigid point-to-point IPC is replaced with a
far more flexible and distributed communication system. A vision in which
different components in the same program can interact with each other without
having to worry about each other's internal state. A vision where programs can
be designed in a modular way, without even worrying about whether to use
threads or an event-based model. A vision where every component communicates
with others, and where you can communicate with every component. And more
importantly, a vision in which each component can be implemented in a different
programming language, without the need for specific code to glue everything
together.

If that sounds interesting to you, then please read on. As a small research
project of mine, I've been looking into ways to realize the above vision, and I
believe I have found an answer. In this article I'll try to explain my ideas
and how they may be used to realize this vision.

My ideas have been heavily inspired by
L<Linda|http://en.wikipedia.org/wiki/Linda_(coordination_language)>. If you're
already familiar with that, then what I present here probably won't be very
revolutionary. Still, there are several aspects in which my ideas differ
significantly from Linda, so you won't be bored reading this. :-)
|
||||
|
||||
|
||||
|
||||
=head1 The Concept

In this section I'll try to introduce the overall concept and some terminology.
This is going to be somewhat abstract and technical, but please bear with me.
I promise that things will get more interesting in the later sections.

Let me first define an abstract communications framework. We have a B<network>
and a bunch of B<sessions> connected to that network. Sessions can communicate
with each other through this network (that's usually what a network is for,
after all). These sessions do not have to be static: they may come and go.
Keep in mind that, for the purpose of explaining this concept, these terms are
very abstract: a session can be anything. A process, a thread, a single
function, an object, or even your mobile phone. Anything. In the same way, the
network is nothing more than an abstract way to connect these sessions. It
could be sockets, pipes, an HTTP server, a broadcast network or just shared
memory between threads. If it allows sessions to communicate, I'll call it a
network.

Unlike many communication systems, this network does not have the concept of
I<addresses>. There is no direct way for one session to identify another, and
indeed there is no need to do so for the purposes of communication. Instead,
the primary means of communication is by using B<tuples> and patterns.
|
||||
|
||||
A tuple is an ordered sequence (list, array, whatever terminology you prefer)
of zero or more elements. Each element may have a different type, so it can
hold booleans, integers, floating point numbers, strings and even more complex
data structures such as arrays or maps. You may think of a tuple as an array in
L<JSON|http://json.org/> notation, if that makes things easier to understand.

Sessions send and receive tuples to communicate with each other. On the sending
side, a session simply "passes" a tuple to the network. This is a non-blocking,
asynchronous operation. In fact, it makes no sense to make this a blocking
action, because the sender cannot know whether it will be received by any
other session anyway. The tuple may be received by many other sessions, or
there may not even be a single session interested in the tuple at all.

On the receiving side, sessions B<register> patterns. A pattern itself is
mostly just a tuple, but with a more limited set of allowed types: only those
types for which exact matching makes sense, like booleans, integers and
strings. A pattern matches an incoming tuple if the first C<n> elements of the
tuple exactly match the corresponding elements of the pattern, where C<n> is
the number of elements in the pattern. A special I<wildcard> element may be
used to match any value of any type.
|
||||
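The matching rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any existing implementation: the names C<WILDCARD> and C<matches> are hypothetical, and "exact match" is taken here to mean same type and same value.

```python
# Hypothetical sketch: WILDCARD and matches() are illustrative names only.
WILDCARD = object()  # stands in for the "*" wildcard element


def matches(pattern, tup):
    """Return True if the first len(pattern) elements of tup match pattern."""
    if len(tup) < len(pattern):
        return False  # the tuple is too short to cover the whole pattern
    for p, t in zip(pattern, tup):
        if p is WILDCARD:
            continue  # a wildcard accepts any value of any type
        # Assumption: "exact" means equal type and equal value.
        if type(p) is not type(t) or p != t:
            return False
    return True


print(matches(["object", WILDCARD], ["object", "add", 42]))      # True
print(matches(["object", "add"], ["object", "delete", 42]))      # False
print(matches(["object", "add", WILDCARD], ["object", "add"]))   # False
```

Because only a prefix is compared, a tuple may carry extra trailing elements that the pattern says nothing about.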
|
||||
A session thus only receives tuples from other sessions if it has registered a
pattern for them. As mentioned, it is not illegal to send a tuple for which no
other sessions have registered. In this case, the tuple will just be
discarded. It is also possible that many sessions have registered for a
matching pattern, in which case all of these sessions will receive the tuple.
As an additional rule, if a session sends out a tuple that matches one of its
own patterns, then it will receive its own tuple. (However, programming
interfaces might allow this to be detected and/or disabled if this eases the
implementation of a session).
|
||||
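A minimal in-process sketch of these delivery rules, assuming each session owns a simple inbox queue. The C<Network> class, its C<register> and C<send> methods, and the session names are all hypothetical and chosen for illustration only.

```python
from collections import deque

WILDCARD = object()  # hypothetical stand-in for the "*" wildcard


def matches(pattern, tup):
    """Prefix match: each pattern element is a wildcard or equals the tuple element."""
    return len(tup) >= len(pattern) and all(
        p is WILDCARD or (type(p) is type(t) and p == t)
        for p, t in zip(pattern, tup)
    )


class Network:
    """Keeps the registrations itself; sessions never address each other."""

    def __init__(self):
        self.registrations = []  # list of (pattern, inbox) pairs

    def register(self, pattern, inbox):
        self.registrations.append((pattern, inbox))

    def send(self, tup):
        # Non-blocking: the tuple is queued for every matching registration,
        # and silently dropped when nothing matches.
        for pattern, inbox in self.registrations:
            if matches(pattern, tup):
                inbox.append(tup)


net = Network()
alice, bob = deque(), deque()  # one inbox per session
net.register(["chat", WILDCARD], alice)
net.register(["chat", WILDCARD], bob)

net.send(["chat", "hello"])    # imagine alice sends this: it reaches bob and herself
net.send(["weather", "rain"])  # nobody registered for this, so it is discarded
```

Note that C<send> returns nothing: in line with the text, the sender learns nothing about who, if anyone, received the tuple.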
|
||||
Finally, there is the concept of a B<return-path>. Upon sending out a tuple, a
session may indicate that it is interested in receiving replies. The network
is then responsible for providing a return-path: a way for receivers of the
tuple to reply to it. When a tuple is received, the session has the option to
reply to it: a reply consists of one or more tuples that are sent directly to
the session from which the tuple originated, using this return-path. When a
receiver is done replying to the tuple or when it has no intention of sending
back a reply, it should close the return-path to indicate this. The session
that sent the original tuple is then notified that the return-path is closed,
and no more replies will be received. If there is no session that has
registered for the tuple, the return-path is closed immediately (or at least,
the sending session is notified that there won't be a reply). If the tuple is
received by multiple sessions, then the replies will be interleaved over the
return-path, and the path is closed when all of the receiving sessions have
closed their end.
|
||||
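The return-path rules can be sketched as follows. This is a simplified, synchronous illustration and not a definitive design: C<ReturnPath>, C<Network>, the C<want_reply> flag and the handler signature are all assumptions made for the example, and receivers are plain callbacks invoked directly by C<send>.

```python
WILDCARD = object()  # hypothetical stand-in for the "*" wildcard


def matches(pattern, tup):
    return len(tup) >= len(pattern) and all(
        p is WILDCARD or (type(p) is type(t) and p == t)
        for p, t in zip(pattern, tup)
    )


class ReturnPath:
    """Collects replies; counts as closed once every receiver closed its end."""

    def __init__(self, receivers):
        self.replies = []
        self.open_ends = receivers  # receivers that have not closed yet

    @property
    def closed(self):
        return self.open_ends == 0

    def reply(self, tup):
        self.replies.append(tup)

    def close(self):
        self.open_ends -= 1


class Network:
    def __init__(self):
        self.registrations = []  # (pattern, handler) pairs

    def register(self, pattern, handler):
        self.registrations.append((pattern, handler))

    def send(self, tup, want_reply=False):
        handlers = [h for p, h in self.registrations if matches(p, tup)]
        # With zero receivers the path starts out closed, as the text requires.
        path = ReturnPath(len(handlers)) if want_reply else None
        for handler in handlers:
            handler(tup, path)
        return path


def responder(name):
    def handle(tup, path):
        if path is not None:  # the sender asked for replies
            path.reply(["pong", name])
            path.close()
    return handle


net = Network()
net.register(["ping"], responder("a"))
net.register(["ping"], responder("b"))

answered = net.send(["ping"], want_reply=True)
unheard = net.send(["nobody", "listens"], want_reply=True)
```

Here C<answered> ends up with one reply from each receiver and is closed once both have closed their end, while C<unheard> is closed immediately with no replies.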
|
||||
|
||||
|
||||
=head1 Common design patterns and solutions

The previous section was rather abstract. This section provides several
examples of how to implement common tasks and design patterns using the
previously described concepts.
|
||||
|
||||
|
||||
=head2 Broadcast notifications

This is commonly implemented in OOP systems using the I<Observer pattern>.
Implementing the same using tuples and patterns is an order of magnitude
simpler, as broadcast notifications are pretty much the native means of
communication.

In OOP you have the "observers" that can add themselves to the "observer list"
of any "object". This observer list is usually managed by the object that is to
be observed. If something happens to the object, it will walk through the
observer list and notify each observer.

If you represent an object as a session and define a notification as a tuple
that follows a certain pattern, then you very easily achieve the same
functionality as with an OOP implementation. In fact, there are some advantages
to doing it this way:
|
||||
|
||||
=over
|
||||
|
||||
=item *

Sessions stay registered to the same notifications even if the "object" (the
session that is being observed) is restarted or replaced with something else.
It's the network itself that keeps track of the registrations, not the sessions
that provide the notifications. Of course, this can be seen as a drawback, but
you can easily emulate OOP behaviour by providing an extra notification when
the "object" is shut down, indicating that the observing sessions can remove
their patterns.
|
||||
|
||||
=item *

Since there is no need for the session that is being observed to keep a list of
sessions that are observing it, it also doesn't have to walk the list and send
out multiple notifications. Notifying the observers is as simple as sending out
a single tuple.
|
||||
|
||||
=item *

Many implementations of the Observer pattern maintain only a single list of
observers per object, and each listed observer will be notified for every
change to the object. For example, if an object maintains a list and provides
notifications when something is added to or deleted from the list, every
observer will be notified of both the "added" action and the "deleted" action.
The use of tuples and patterns allows observers to register for all actions, or
just for a single one. If an "add" action is notified with a tuple of
C<["object", "add", id]> and a "delete" action with
C<["object", "delete", id]>, then an observing session can register with the
pattern C<["object", *]> to be notified for both actions, or just
C<["object", "add"]> to register only for additions.
|
||||
|
||||
=back
|
||||
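The selective registration from the last point can be sketched as below. The helper names C<matches> and C<received> and the sample ids are hypothetical, and the example only filters a fixed list of notifications rather than modelling a full network.

```python
WILDCARD = object()  # hypothetical stand-in for the "*" wildcard


def matches(pattern, tup):
    return len(tup) >= len(pattern) and all(
        p is WILDCARD or (type(p) is type(t) and p == t)
        for p, t in zip(pattern, tup)
    )


# Notifications sent out by the observed "object" session, in order.
notifications = [
    ["object", "add", 1],
    ["object", "delete", 1],
    ["object", "add", 2],
]


def received(pattern):
    """Tuples an observer registered with `pattern` would be handed."""
    return [t for t in notifications if matches(pattern, t)]


all_actions = received(["object", WILDCARD])  # both additions and deletions
only_adds = received(["object", "add"])       # additions only
```

The observed session sends each notification once; which observers see it is decided entirely by the patterns they registered.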
|
||||
Of course, this is only one way to implement a notification mechanism. There
are also solutions that more accurately mimic the behaviour of the OOP Observer
pattern in cases where that is desired.
|
||||
|
||||
|
||||
=head2 Commands

A I<command> is what I call something along the lines of one session telling
another session to do something. Suppose we have a session representing a file
system. A command for this session could then be something like "delete file
X".

In a sense, this isn't much different from a notification as described above.
The file system session would have registered a pattern like
C<["fs", "delete", *]>, where the wildcard is used for the file name. If
another session then wants to have a file deleted, the only thing it will have
to do is send out a tuple matching that pattern, and the file system session
will take care of deleting it.

In the above scenario, the session sending the command has no feedback
whatsoever on whether the command has been successfully executed or not.
Whether this is acceptable depends of course on the specific application. One
way of still providing some form of feedback is to have the file system session
send out a notification tuple, e.g. C<["fs", "deleted", "file"]>. (Note that
the second element is now C<deleted> rather than C<delete>. Using the same
tuple for actions and notifications is going to be very messy.) This way the
session sending the command, in addition to any other sessions that happen to
be interested in file deletion, will be notified of the deletion of the file.
An alternative solution is to use the RPC-like method, as described below.
|
||||
|
||||
|
||||
=head2 RPC

L<RPC|http://en.wikipedia.org/wiki/Remote_procedure_call> is in essence nothing
more than providing an interface similar to a regular function call to a
component that can't be reached via a regular function call (e.g. because the
object isn't inside the address space of the program). RPC is generally a
request-response type of interaction, and by making use of the return-path
facility as I described earlier, all of the functionality of RPC is also
available with the concept of tuple communication.
|
||||
|
||||
=head3 Commands, the RPC-way

Take the previous file system example. Instead of just sending the command
tuple to delete the file, the session could indicate that it is interested in
replies and the network will create a return-path. If the return-path is closed
before any replies have been received, then the commanding session knows that
the file system session is either down or broken. Otherwise, the file system
session has the ability to send back a response. This could be a simple "okay,
file has been deleted" tuple if things went alright, or an error indication if
things didn't go too well. The commanding session has the option to either
block and wait for a reply (or a close of the return-path), or continue doing
whatever it wanted to do and asynchronously check for a reply.

The downside of using the return-path rather than the previously mentioned
notification approach is that other sessions can't easily be notified of file
deletion. Of course, another session can register for the same pattern as the
file system did and thus receive the same command, but it would have no way of
knowing whether the delete was actually successful or not. For other sessions
to be notified as well, the file system session would probably have to send out
a notification tuple. Of course, it all depends on the application whether this
is necessary; you only have to implement the functionality that is necessary
for your purposes.
|
||||
|
||||
=head3 Requesting information

Another use of RPC, and thus also of the return-path, is to allow sessions to
request information from each other. Using the same example again, the file
system session could register for a pattern such as C<["fs", "list"]>. Upon
receiving a tuple matching that pattern, the session would send a list of all
its files over the return-path. Other sessions can then request this list by
simply sending out the right tuple and waiting for the replies.
|
||||
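Both RPC uses described in this section, the delete command with a reply and the file listing, can be sketched with a toy file system session. Everything here is an assumption made for illustration: the C<FileSystem>, C<Network> and C<ReturnPath> classes, the C<want_reply> flag, and the shape of the reply tuples (C<["ok"]>, C<["error", ...]>, one tuple per file name) are not prescribed by the text.

```python
WILDCARD = object()  # hypothetical stand-in for the "*" wildcard


def matches(pattern, tup):
    return len(tup) >= len(pattern) and all(
        p is WILDCARD or (type(p) is type(t) and p == t)
        for p, t in zip(pattern, tup)
    )


class ReturnPath:
    def __init__(self, receivers):
        self.replies = []
        self.open_ends = receivers  # receivers that have not closed yet

    @property
    def closed(self):
        return self.open_ends == 0

    def reply(self, tup):
        self.replies.append(tup)

    def close(self):
        self.open_ends -= 1


class Network:
    def __init__(self):
        self.registrations = []  # (pattern, handler) pairs

    def register(self, pattern, handler):
        self.registrations.append((pattern, handler))

    def send(self, tup, want_reply=False):
        handlers = [h for p, h in self.registrations if matches(p, tup)]
        path = ReturnPath(len(handlers)) if want_reply else None
        for handler in handlers:
            handler(tup, path)
        return path


class FileSystem:
    """Toy file system session backed by a set of file names."""

    def __init__(self, net, files):
        self.files = set(files)
        net.register(["fs", "delete", WILDCARD], self.on_delete)
        net.register(["fs", "list"], self.on_list)

    def on_delete(self, tup, path):
        name = tup[2]
        existed = name in self.files
        self.files.discard(name)
        if path is not None:  # only reply when the sender asked for it
            path.reply(["ok"] if existed else ["error", "no such file"])
            path.close()

    def on_list(self, tup, path):
        if path is not None:
            for name in sorted(self.files):
                path.reply([name])  # one reply tuple per file
            path.close()


net = Network()
fs = FileSystem(net, ["a.txt", "b.txt"])
deleted = net.send(["fs", "delete", "a.txt"], want_reply=True)
missing = net.send(["fs", "delete", "nope"], want_reply=True)
listing = net.send(["fs", "list"], want_reply=True)
unheard = net.send(["db", "query"], want_reply=True)
```

A return-path that is closed without any replies, as with C<unheard>, is how the commanding session would notice that no session is handling the command.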
|
||||
|
||||
|
||||
|
||||
=head1 Advantages over other systems

Now that I've hopefully convinced you that my communication concept is powerful
enough to build applications with it, you may be wondering why you should use
it instead of the other technologies. After all, you can achieve pretty much
the same functionality with just regular OOP, RPC, message passing, or other
systems. Let me present some of the inherent advantages that this system has
compared to others, and why it will help in designing flexible and modular
applications.
|
||||
|
||||
=head2 Loose coupling of components

Sessions (representing the components of a system) do not have to have a lot of
knowledge about each other. Sessions implicitly provide abstracted I<services>
using tuple communications, in much the same way as interfaces explicitly do in
OOP.

Very much unlike in OOP, however, sessions do not even have to know how other
sessions should be used in threaded or event-based environments. For
example, threading in OOP is a pain: which objects should implement
synchronisation and which shouldn't? The answer to this question is not nearly
as obvious as it should be. With event-based systems, you'll always need to
worry about how long a certain function call blocks the caller's thread. Since
communication between the different sessions is completely asynchronous, these
worries are gone.
|
||||
|
||||
=head2 Location independence

Sessions can communicate with other sessions without knowing I<where> they are.
The major advantage of this is that a session can be moved around without
having to change a single line of code in any of the sessions relying on its
service. This allows sessions that communicate a lot with each other to be
placed in the same process, while resource-heavy sessions may be distributed
among several physical devices.
|
||||
|
||||
=head2 Programming language independence

All communication is solely done with tuples, which can be represented as
abstract objects and serialized and deserialized (or marshalled/unmarshalled,
whichever terminology you prefer) for communication. I used a JSON array as an
example of a tuple earlier, and perhaps it's not such a bad one: JSON data can
be interchanged between many programming languages, and is quite often not
that annoying to use. Still, there are many other alternatives (Bencoding, XML,
binary encodings, etc.), and it all depends on the exact data types and values
you wish to use for communication.
|
||||
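As a small illustration of the JSON option, the sketch below round-trips one tuple through Python's standard C<json> module. The one-tuple-per-line framing, the C<encode> and C<decode> helper names and the sample tuple are assumptions for this example; the text does not fix a wire format.

```python
import json


def encode(tup):
    """Serialize one tuple as a line of UTF-8 JSON (the framing is an assumption)."""
    return (json.dumps(tup) + "\n").encode("utf-8")


def decode(line):
    """Parse one line produced by encode() back into a list."""
    return json.loads(line.decode("utf-8"))


tup = ["fs", "deleted", "notes.txt", 3, True, {"size": 1024}]
wire = encode(tup)        # bytes that any JSON-capable language can parse
roundtrip = decode(wire)  # equal to the original tuple
```

Any session written in another language only needs a JSON parser to read the same bytes, which is the point of the argument above.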
|
||||
Language independence allows each session to be (re)implemented in a different
language, again without affecting any other sessions. Did you write an
application in a high-level language and notice that performance wasn't as
good as you wanted? Then you can very easily rewrite the most resource-heavy
sessions in a low-level language such as C. Similarly, it allows developers to
hook into your application even when they are not familiar with your favorite
programming language.
|
||||
|
||||
=head2 Easy debugging

Not only can other applications and/or plugins hook into your application, you
can also connect a simple debugger to the network. The debugger just has to
register for a pattern and then print out any received tuples, allowing you to
see exactly what is being sent over the network and whether the sessions react
as expected. Similarly, the debugger could allow you to send tuples back to the
network and see whether the sessions react as they should. Unfortunately, what
is being sent over a return-path is generally not visible to anyone but the
receiver of the replies, although a network implementation might allow a
debugging application to look into that as well.
|
||||
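Such a debugger can be sketched as a session that registers the empty pattern. Under the prefix-matching rule as read here, a pattern with zero elements matches every tuple; that reading, along with the C<Network> class and the C<debugger> function, is an assumption of this example.

```python
WILDCARD = object()  # hypothetical stand-in for the "*" wildcard


def matches(pattern, tup):
    return len(tup) >= len(pattern) and all(
        p is WILDCARD or (type(p) is type(t) and p == t)
        for p, t in zip(pattern, tup)
    )


class Network:
    def __init__(self):
        self.registrations = []  # (pattern, handler) pairs

    def register(self, pattern, handler):
        self.registrations.append((pattern, handler))

    def send(self, tup):
        for pattern, handler in self.registrations:
            if matches(pattern, tup):
                handler(tup)


seen = []  # everything the debugger observed, kept for inspection


def debugger(tup):
    seen.append(tup)
    print("debug:", tup)


net = Network()
net.register([], debugger)  # empty pattern: zero elements to compare
net.register(["fs", "delete", WILDCARD], lambda tup: None)  # a regular session

net.send(["fs", "delete", "a.txt"])
net.send(["chat", "hello"])  # no regular session wants this, the debugger still sees it
```

As the text notes, replies travelling over a return-path would not show up here unless the network implementation exposes them separately.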
|
||||
|
||||
|
||||
=head1 Where to go from here

What I've described above is nothing more than a bunch of ideas. To actually
use this, there's a lot to be done.
|
||||
|
||||
=over
|
||||
|
||||
=item Defining a "tuple"

What types can be used in tuples? Should a tuple have some maximum size or a
maximum number of elements? Should a C<NULL> type be included? What about a
boolean type, why not use the integers 1 and 0 for that? Should it be possible
to interchange binary data, or only UTF-8 strings?

What will be the size of an integer that a session can reasonably assume to be
available? Specifying something like "infinite" is going to be either
inefficient in terms of memory and CPU overhead or will require extra overhead
(in terms of code) in usage. Specifying that everything should fit in a 64-bit
integer is a lot more practical, but may be somewhat annoying to cope with in
many dynamically typed languages running on 32-bit architectures. Specifying
that integers are 32 bits will definitely ease the implementation of the
network library in interpreted languages, but lowers the usefulness of the
integer type and is still a pain to use in OCaml (which has 31-bit integers on
32-bit platforms).

These choices greatly affect the ease of implementing a networking library for
specific programming languages and the ease of using the network to actually
develop an application.
|
||||
|
||||
=item The exact semantics of matching

Somewhat similar to the previous point, the semantics of matching tuples with
patterns should also be defined in some way. One related question is whether
values of different types may be equivalent. For example, is the string
C<"1234"> equivalent to an integer with that value? What about NULL and/or
boolean types? If there is a floating point type, you probably won't need exact
matching on those values (floating points are too imprecise for that anyway),
but you might still want the floating point number C<10.0> to match the integer
C<10> to ease the use in dynamic languages where the distinction between
integer and float is blurred.
|
||||
|
||||
=item Defining the protocol(s)

Making my vision of modularity and ease of use a reality requires that any
session can easily communicate with another session, even if they have a
vastly different implementation. To do this, we need a protocol to connect
multiple processes together, whether they run on a local machine or over a
physical network.
|
||||
|
||||
=item Coding the stuff

Obviously, all of this remains a mere concept if nothing ever gets
implemented. Easy-to-use libraries are needed for several programming
languages. And more importantly, actual applications will have to be developed
using these libraries.
|
||||
|
||||
=back
|
||||
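The open questions about matching semantics become concrete once an element comparison has to be written down. The sketch below contrasts two possible policies in Python; neither is claimed to be the right one, and the names C<strict_equal> and C<numeric_equal> are hypothetical.

```python
def strict_equal(a, b):
    """Policy 1: elements match only if type and value are both equal."""
    return type(a) is type(b) and a == b


def numeric_equal(a, b):
    """Policy 2: integers and floats of equal value also match; booleans do not."""
    if isinstance(a, bool) or isinstance(b, bool):
        return strict_equal(a, b)  # keep True distinct from the integer 1
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    return strict_equal(a, b)


# Python's own == already blurs some of these distinctions:
print(10 == 10.0)      # True
print(True == 1)       # True
print("1234" == 1234)  # False

print(strict_equal(10, 10.0))   # False
print(numeric_equal(10, 10.0))  # True
print(numeric_equal(True, 1))   # False
```

Whichever policy is chosen has to be reproducible in every language that implements the network, which is why it needs to be part of the specification rather than left to each language's native equality.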
|
||||
Of course, realizing all of the above is an iterative process. You can't write
an implementation without knowing what data types a tuple is made of, but it is
equally impossible to determine the exact definition of a tuple without having
experimented with an actual implementation.
|
||||
|
||||
|
||||
=head2 What's the plan?

I've been working on documenting the basics of the semantics and the
point-to-point communication protocol, and have started on an early
implementation in the Go programming language to experiment with. I've dubbed
the project B<Tanja>, and have published my progress on a
L<git repo|http://g.blicky.net/tanja.git/>.

My intention is to also write implementations for C and Perl, experiment with
that, and see if I can refine the semantics to make this concept one that is
both efficient and easy to use.

Since I still have no idea whether this concept is actually a convenient one to
write large applications with, I'd love to experiment with that as well. My
original intention has always been to write a flexible client for the Direct
Connect network, possibly extending it to other P2P or chat networks in the
future. So I'd love to write a large application using this concept, and see
how things work out.

In either case, if this article managed to get you interested in this concept
or in project Tanja, and you have any questions, feedback or (gasp!) feel like
helping out, don't hesitate to contact me! I'm available as 'Yorhel' on Direct
Connect at C<adc://blicky.net:2780> and IRC at C<irc.synirc.net>, or just drop
me a mail at C<projects@yorhel.nl>.
|
||||
2
dat/home
|
|
@@ -12,6 +12,8 @@ you decide to do with it.
|
|||
|
||||
=over

=item C<2012-02-15 > Added a new article on my new L<communication system|http://dev.yorhel.nl/doc/commvis>.

=item C<2012-02-13 > ncdc 1.8 released.

=item C<2012-01-19 > TUWF 0.2 released.
|
||||
|
|
|
|||
1
dat/ncdc
|
|
@@ -37,7 +37,6 @@ L<Arch Linux|http://aur.archlinux.org/packages.php?ID=50949> -
|
|||
L<FreeBSD|http://www.freshports.org/net-p2p/ncdc/> -
L<Frugalware|http://frugalware.org/packages/136807> -
L<Gentoo|http://packages.gentoo.org/package/net-p2p/ncdc> -
L<Mac OS X|http://www.macports.org/ports.php?by=name&substr=ncdc> -
L<OpenSUSE|http://packman.links2linux.org/package/ncdc>

The L<Open Build
|
||||
|
|
|
|||