From fa299b0fd94634753c8deea1657839ae3781ee9f Mon Sep 17 00:00:00 2001 From: Yorhel Date: Wed, 15 Feb 2012 22:17:20 +0100 Subject: [PATCH] Added an article as introduction to the Tanja project + various fixes --- dat/doc | 5 + dat/doc-commvis | 393 ++++++++++++++++++++++++++++++++++++++++++++++++ dat/home | 2 + dat/ncdc | 1 - index.cgi | 11 ++ 5 files changed, 411 insertions(+), 1 deletion(-) create mode 100644 dat/doc-commvis diff --git a/dat/doc b/dat/doc index 29a2213..75602a6 100644 --- a/dat/doc +++ b/dat/doc @@ -13,6 +13,11 @@ write an article surprasses my urge to get some programming done again. =over +=item C<2012-02-15 > - L + +In this article I explain a vision of mine, and the results of a small research +project aimed at realizing that vision. + =item C<2011-11-26 > - L So you have a single database and some threads. How do you combine these in a diff --git a/dat/doc-commvis b/dat/doc-commvis new file mode 100644 index 0000000..e8e6142 --- /dev/null +++ b/dat/doc-commvis @@ -0,0 +1,393 @@ +A Distributed Communication System for Modular Applications + +=pod + +(Published on B<2012-02-15>. Also available in L.) + + +=head1 Introduction + +I have a vision. A vision in which rigid point-to-point IPC is replaced with a +far more flexible and distributed communication system. A vision in which +different components in the same program can interact with each other without +having to worry about each others' internal state. A vision where programs can +be designed in a modular way, without even worrying about whether to use +threads or an event-based model. A vision where every component communicates +with others, and where you can communicate with every component. And more +importantly, a vision in which each component can be implemented in a different +programming language, without the need for specific code to glue everything +together. + +If that sounds interesting to you, then please read on. 
As a small research +project of mine, I've been looking into ways to realize the above vision, and I +believe I have found an answer. In this article I'll try to explain my ideas +and how they may be used to realize this vision. + +My ideas have been heavily inspired by +Linda. If you're +already familiar with that, then what I present here probably won't be very +revolutionary. Still, there are several aspects in which my ideas differ +significantly from Linda, so you won't be bored reading this. :-) + + + +=head1 The Concept + +In this section I'll try to introduce the overall concept and some terminology. +This is going to be somewhat abstract and technical, but please bear with me. +I promise that things will get more interesting in the later sections. + +Let me first define an abstract communications framework. We have a B<network> +and a bunch of B<sessions> connected to that network. Sessions can communicate +with each other through this network (that's usually what a network is for, +after all). These sessions do not have to be static: they may come and go. +Keep in mind that, for the purpose of explaining this concept, these terms are +very abstract: a session can be anything. A process, a thread, a single function, +an object, or even your mobile phone. Anything. In the same way, the network is +nothing more than an abstract way to connect these sessions. It could be +sockets, pipes, an HTTP server, a broadcast network or just shared memory +between threads. If it allows sessions to communicate I'll call it a network. + +Unlike many communication systems, this network does not have the concept of +I<addresses>. There is no direct way for one session to identify another, and +indeed there is no need to do so for the purposes of communication. Instead, +the primary means of communication is by using B<tuples> and patterns. + +A tuple is an ordered set (list, array, whatever terminology you prefer) of +zero or more elements.
Each element may have a different type, so it can hold +booleans, integers, floating point numbers, strings and even more complex data +structures such as arrays or maps. You may think of a tuple as an array in +JSON notation, if that makes things easier to understand. + +Sessions send and receive tuples to communicate with each other. On the sending +side, a session simply "passes" a tuple to the network. This is a non-blocking, +asynchronous operation. In fact, it makes no sense to make this a blocking +action, because the sender cannot know whether it will be received by any +other session anyway. The tuple may be received by many other sessions, or +there may not even be a single session interested in the tuple at all. + +On the receiving side, sessions B<register> patterns. A pattern itself is +mostly just a tuple, but with a more limited set of allowed types: only those +types for which exact matching makes sense, like booleans, integers and +strings. A pattern matches an incoming tuple if the first C<n> elements of the +tuple exactly match the corresponding elements of the pattern (C<n> being the +number of elements in the pattern). A special +I<wildcard> element may be used to match any value of any type. + +A session thus only receives tuples from other sessions if it has +registered a pattern for them. As mentioned, it is not illegal to send a tuple +for which no other sessions have registered. In this case, the tuple will just +be discarded. It is also possible that many sessions have registered for a +matching pattern, in which case all of these sessions will receive the tuple. +As an additional rule, if a session sends out a tuple that matches one of its +own patterns, then it will receive its own tuple. (However, programming +interfaces might allow this to be detected and/or disabled if this eases the +implementation of a session). + +Finally, there is the concept of a B<return-path>. Upon sending out a tuple, a +session may indicate that it is interested in receiving replies.
The network +is then responsible for providing a return-path: a way for receivers of the +tuple to reply to it. When a tuple is received, the session has the option to +reply to it: a reply consists of one or more tuples that are sent directly to +the session from which the tuple originated, using this return-path. When a +receiver is done replying to the tuple or when it has no intention of sending +back a reply, it should close the return-path to indicate this. The session +that sent the original tuple is then notified that the return-path is closed, +and no more replies will be received. If there is no session that has +registered for the tuple, the return-path is closed immediately (or at least, +the sending session is notified that there won't be a reply). If the tuple is +received by multiple sessions, then the replies will be interleaved over the +return-path, and the path is closed when all of the receiving sessions have +closed their end. + + + +=head1 Common design patterns and solutions + +The previous section was rather abstract. This section provides several +examples of how to implement common tasks and design patterns using the previously +described concepts. + + +=head2 Broadcast notifications + +This is commonly implemented in OOP systems using the I<Observer pattern>. +Implementing the same using tuples and patterns is an order of magnitude +simpler, as broadcast notifications are pretty much the native means of +communication. + +In OOP you have the "observers" that can add themselves to the "observer list" +of any "object". This observer list is usually managed by the object that is to +be observed. If something happens to the object, it will walk through the +observer list and notify each observer. + +If you represent an object as a session and define a notification as a tuple +that follows a certain pattern, then you very easily achieve the same +functionality as with an OOP implementation.
In fact, there are some advantages +to doing it this way: + +=over + +=item * + +Sessions stay registered to the same notifications even if the "object" (the +session that is being observed) is restarted or replaced with something else. +It's the network itself that keeps track of the registrations, not the sessions +that provide the notifications. Of course, this can be seen as a drawback, but +you can easily emulate OOP behaviour by providing an extra notification when +the "object" is shut down, indicating that the observing sessions can remove +their patterns. + +=item * + +Since there is no need for the session that is being observed to keep a list of +sessions that are observing it, it also doesn't have to walk the list and send out +multiple notifications. Notifying the observers is as simple as sending out a +single tuple. + +=item * + +Many implementations of the Observer pattern maintain only a single list of +observers per object, and each listed observer will be notified for every +change to the object. For example, if an object maintains a list and provides +notifications when something is added to or deleted from the list, every observer +will be notified of both the "added" action and the "deleted" action. The use +of tuples and patterns allows observers to register for all actions, or just +for a single one. If an "add" action would be notified with a tuple of +C<["object", "add", id]> and a "delete" action with +C<["object", "delete", id]>, then an observing session can register with the +pattern C<["object", *]> to be notified for both actions, or just +C<["object", "add"]> to register only for additions. + +=back + +Of course, this is only one way to implement a notification mechanism. There +are also solutions that more accurately mimic the behaviour of the Observer +pattern in OOP in cases where that is desired. + + +=head2 Commands + +A I<command> is what I call something along the lines of one session telling +another session to do something.
Suppose we have a session representing a file +system. A command for this session could then be something like "delete file +X". + +In a sense, this isn't much different from a notification as described above. +The file system session would have registered a pattern like +C<["fs", "delete", *]>, where the wildcard is used for the file name. If +another session then wants to have a file deleted, the only thing it will have to +do is send out a tuple matching that pattern, and the file system session will +take care of deleting it. + +In the above scenario, the session sending the command has no feedback +whatsoever on whether the command has been successfully executed or not. +Whether this is acceptable depends of course on the specific application. One +way of still providing some form of feedback is to have the file system session +send out a notification tuple, e.g. C<["fs", "deleted", "file"]> (Note that the +second element is now C<"deleted"> rather than C<"delete">. Using the same tuple +for actions and notifications is going to be very messy...). This way the +session sending the command, in addition to any other sessions that happen to +be interested in file deletion, will be notified of the deletion of the file. +An alternative solution is to use the RPC-like method, as described below. + + +=head2 RPC + +RPC is in essence nothing +more than providing an interface similar to a regular function call to a +component that can't be reached via a regular function call (e.g. because the +object isn't inside the address space of the program). RPC is generally a +request-response type of interaction, and by making use of the return-path +facility as I described earlier, all of the functionality of RPC is also +available with the concept of tuple communication. + +=head3 Commands, the RPC-way + +Take the previous file system example.
Instead of just sending the command +tuple to delete the file, the session could indicate that it is interested in +replies and the network will create a return-path. If the return-path is closed +before any replies have been received, then the commanding session knows that +the file system session is either down or broken. Otherwise, the file system +session has the ability to send back a response. This could be a simple "okay, +file has been deleted" tuple if things went alright, or an error indication if +things didn't go too well. The commanding session has the option to either +block and wait for a reply (or a close of the return-path), or continue doing +whatever it wanted to do and asynchronously check for a reply. + +The downside of using the return-path rather than the previously mentioned +notification approach is that other sessions can't easily be notified of file +deletion. Of course, another session can register for the same pattern as the +file system did and thus receive the same command, but it would have no way of +knowing whether the delete was actually successful or not. For other sessions +to be notified as well, the file system session would probably have to send out +a notification tuple. Of course, it all depends on the application whether this +is necessary; you only have to implement the functionality that is necessary +for your purposes. + +=head3 Requesting information + +Another use of RPC, and thus also of the return-path, is to allow sessions to +request information from each other. Using the same example again, the file +system session could register for a pattern such as C<["fs", "list"]>. Upon +receiving a tuple matching that pattern, the session would send a list of all +its files over the return-path. Other sessions can then request this list by +simply sending out the right tuple and waiting for the replies.
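
To make the file system example from the sections above a bit more concrete, here is a minimal in-process sketch in Python. It is only a toy and not how any real implementation works: the names C<Network>, C<Reply>, C<ReturnPath>, C<WILDCARD> and C<matches> are all made up for this illustration, delivery is done with plain synchronous callbacks (the concept itself is asynchronous), and I simply assume that elements only match when both type and value are equal.

```python
WILDCARD = object()  # hypothetical marker for the "match anything" element


def matches(pattern, tup):
    """True if the first len(pattern) elements of tup match the pattern.

    Assumption for this sketch: elements match only when both their type
    and their value are equal; WILDCARD matches any value of any type.
    """
    if len(tup) < len(pattern):
        return False
    return all(p is WILDCARD or (type(p) is type(t) and p == t)
               for p, t in zip(pattern, tup))


class ReturnPath:
    """Sender's view: collected replies plus the number of still-open ends."""

    def __init__(self, receivers):
        self.replies = []
        self.open_ends = receivers

    @property
    def closed(self):
        return self.open_ends == 0


class Reply:
    """One receiver's end of a return-path (a no-op if none was requested)."""

    def __init__(self, path):
        self._path = path
        self._open = path is not None

    def send(self, tup):
        if self._open:
            self._path.replies.append(tup)

    def close(self):
        if self._open:
            self._open = False
            self._path.open_ends -= 1


class Network:
    def __init__(self):
        self._registrations = []  # (pattern, callback) pairs

    def register(self, pattern, callback):
        self._registrations.append((pattern, callback))

    def send(self, tup, want_reply=False):
        receivers = [cb for pat, cb in self._registrations if matches(pat, tup)]
        # With zero receivers the return-path starts out closed.
        path = ReturnPath(len(receivers)) if want_reply else None
        for cb in receivers:
            cb(tup, Reply(path))
        return path


# Usage: a file system session, plus an observer of deletions.
net = Network()
files = {"a.txt", "b.txt"}
seen = []


def fs_delete(tup, reply):
    name = tup[2]
    if name in files:
        files.remove(name)
        reply.send(("ok",))
        net.send(("fs", "deleted", name))  # notification for everyone else
    else:
        reply.send(("error", "no such file"))
    reply.close()


def fs_list(tup, reply):
    for name in sorted(files):
        reply.send((name,))
    reply.close()


net.register(("fs", "delete", WILDCARD), fs_delete)
net.register(("fs", "list"), fs_list)
net.register(("fs", "deleted"), lambda tup, reply: seen.append(tup))

path = net.send(("fs", "delete", "a.txt"), want_reply=True)
```

After the last line, C<path.replies> holds the reply sent by the file system session and C<path.closed> is true, while the observer has received the C<["fs", "deleted", "a.txt"]> notification without the file system session knowing that it exists. Sending a tuple that nobody has registered for returns a return-path that is already closed, which is how the sender can tell that no reply will ever arrive.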
+ + + + +=head1 Advantages over other systems + +Now that I've hopefully convinced you that my communication concept is powerful +enough to build applications with it, you may be wondering why you should use +it instead of the other technologies. After all, you can achieve pretty much +the same functionality with just regular OOP, RPC, message passing, or other +systems. Let me present some of the inherent advantages that this system has +compared to others, and why it will help in designing flexible and modular +applications. + +=head2 Loose coupling of components + +Sessions (representing the components of a system) do not have to have a lot of +knowledge about each other. Sessions implicitly provide abstracted I +using tuple communications, in much the same way as interfaces explicitly do in +OOP. + +Very much unlike in OOP, however, sessions do not even have to know +how the others should be used in threaded or event-based environments. For +example, threading in OOP is a pain: which objects should implement +synchronisation and which shouldn't? The answer to this question is not nearly +as obvious as it should be. With event-based systems, you'll always need to +worry about how long a certain function call blocks the caller's thread. Since +communication between the different sessions is completely asynchronous, these +worries are gone. + +=head2 Location independence + +Sessions can communicate with other sessions without knowing I<where> they are. +This has the major advantage that a session can be moved around without having +to change a single line of code in any of the sessions relying on its service. +This allows sessions that communicate a lot with each other to be placed in the +same process, while resource-heavy sessions may be distributed among several +physical devices.
+ +=head2 Programming language independence + +All communication is solely done with tuples, which can be represented as +abstract objects and serialized and deserialized (or marshalled/unmarshalled, +whichever terminology you prefer) for communication. I used a JSON array as an +example of a tuple earlier, and perhaps it's not such a bad one: JSON data can +be interchanged between many programming languages, and is quite often not +that annoying to use. Still, there are many other alternatives (Bencoding, XML, +binary encodings, etc.), and it all depends on the exact data types and values +you wish to use for communication. + +Language independence allows each session to be (re)implemented in a different +language, again without affecting any other sessions. Did you write an +application in a high-level language and notice that performance wasn't as +good as you wanted? Then you can very easily rewrite the most resource-heavy +sessions in a low-level language such as C. Similarly, it allows developers to +hook into your application even when they are not familiar with your favorite +programming language. + +=head2 Easy debugging + +Not only can other applications and/or plugins hook into your application, you +can also connect a simple debugger to the network. The debugger just has to +register for a pattern and then print out any received tuples, allowing you to +see exactly what is being sent over the network and whether the sessions react +as expected. Similarly, the debugger could allow you to send tuples back to the +network and see whether the sessions react as they should. Unfortunately, what +is being sent over a return-path is generally not visible to anyone but the +receiver of the replies, although a network implementation might allow a +debugging application to look into that as well. + + + +=head1 Where to go from here + +What I've described above is nothing more than a bunch of ideas. To actually +use this, there's a lot to be done.
+ +=over + +=item Defining a "tuple" + +What types can be used in tuples? Should a tuple have some maximum size or a +maximum number of elements? Should a C<NULL> type be included? What about a +boolean type, why not use the integers 1 and 0 for that? Should it be possible +to interchange binary data, or only UTF-8 strings? + +What will be the size of an integer that a session can reasonably assume to be +available? Specifying something like "infinite" is going to be either +inefficient in terms of memory and CPU overhead or will require extra overhead +(in terms of code) in usage. Specifying that everything should fit in a 64bit +integer is a lot more practical, but may be somewhat annoying to cope with in +many dynamically typed languages running on 32bit architectures. Specifying +that integers are 32bits will definitely ease the implementation of the network +library in interpreted languages, but lowers the usefulness of the integer type +and is still a pain to use in OCaml (which has 31bit integers). + +These choices greatly affect the ease of implementing a networking library for +specific programming languages and the ease of using the network to actually +develop an application. + +=item The exact semantics of matching + +Somewhat similar to the previous point, the semantics of matching tuples with +patterns should also be defined in some way. One related question is whether +values of different types may be equivalent. For example, is the string +C<"1234"> equivalent to an integer with that value? What about NULL and/or +boolean types? If there is a floating point type, you probably won't need exact +matching on those values (floating points are too imprecise for that anyway), +but you might still want the floating point number C<10.0> to match the integer +C<10> to ease the use in dynamic languages where the distinction between +integer and float is blurred.
+ +=item Defining the protocol(s) + +Making my vision of modularity and ease of use a reality requires that any +session can easily communicate with another session, even if they have a +vastly different implementation. To do this, we need a protocol to connect +multiple processes together, whether they run on a local machine or over a +physical network. + +=item Coding the stuff + +Obviously, all of this remains a mere concept if nothing ever gets +implemented. Easy-to-use libraries are needed for several programming +languages. And more importantly, actual applications will have to be developed +using these libraries. + +=back + +Of course, realizing all of the above is an iterative process. You can't write +an implementation without knowing what data types a tuple is made of, but it is +equally impossible to determine the exact definition of a tuple without having +experimented with an actual implementation. + + +=head2 What's the plan? + +I've been working on documenting the basics of the semantics and the +point-to-point communication protocol, and have started on an early +implementation in the Go programming language to experiment with. I've dubbed +the project B<Tanja>, and have published my progress on a +L. + +My intention is to also write implementations for C and Perl, experiment with +that, and see if I can refine the semantics to make this concept one that is +both efficient and easy to use. + +Since I still have no idea whether this concept is actually a convenient one to +write large applications with, I'd love to experiment with that as well. My +original intention has always been to write a flexible client for the Direct +Connect network, possibly extending it to other P2P or chat networks in the +future. So I'd love to write a large application using this concept, and see +how things work out. + +In either case, if this article managed to get you interested in this concept +or in project Tanja, and you have any questions, feedback or (gasp!)
feel like +helping out, don't hesitate to contact me! I'm available as 'Yorhel' on Direct +Connect at C and IRC at C, or just drop +me a mail at C. diff --git a/dat/home b/dat/home index 64e16e7..c030ef9 100644 --- a/dat/home +++ b/dat/home @@ -12,6 +12,8 @@ you decide to do with it. =over +=item C<2012-02-15 > Added a new article on my new L. + =item C<2012-02-13 > ncdc 1.8 released. =item C<2012-01-19 > TUWF 0.2 released. diff --git a/dat/ncdc b/dat/ncdc index 4f79f70..5c653b0 100644 --- a/dat/ncdc +++ b/dat/ncdc @@ -37,7 +37,6 @@ L - L - L - L - -L - L The L sub { changelog(shift, 'tuwf-changelog', 'TUWF', 'tuwf', 'changes', 'TUWF Changelog') }, qr{doc} => sub { podpage(shift, 'doc', 'doc', '', 'Articles') }, qr{doc/sqlaccess} => sub { podpage(shift, 'sqlaccess', 'doc', '', 'Multi-threaded Access to an SQLite3 Database', 1) }, + qr{doc/commvis} => sub { podpage(shift, 'doc-commvis', 'doc', '', 'A Distributed Communication System for Modular Applications', 1) }, qr{dump} => sub { podpage(shift, 'dump', 'dump', '', 'Code dump') }, qr{demo} => sub { podpage(shift, 'dump-demo', 'dump', 'demo', 'Demos') }, qr{dump/awshrink} => sub { podpage(shift, 'dump-awshrink', 'dump', 'awshrink', 'AWStats Data File Shrinker') }, @@ -194,6 +195,14 @@ sub htmlHeader { br; a href => 'http://yorhel.nl', 'yh'; txt ' - '; a href => 'http://g.blicky.net', 'git'; txt ' - '; a href => 'http://pgp.mit.edu:11371/pks/lookup?search=0x8c2739fa', 'pgp'; + br;br; + lit q| +
+ + + + +
|; end; end 'div'; div id => 'main'; @@ -307,8 +316,10 @@ sub printCSS { h1.title { margin-top: 0; font-size: 25px } h1 { margin-top: 50px; } h2 { margin-top: 25px; } + h3 { margin-top: 0; margin-left: 10px } h1, h1 a { font-size: 19px; color: #000; margin-bottom: 5px; text-decoration: none } h2, h2 a { font-size: 16px; color: #000; margin-bottom: 1px; text-decoration: none } + h3, h3 a { font-size: 15px; color: #000; margin-bottom: 1px; text-decoration: none } li { margin-left: 35px; margin-right: 15px; text-align: justify } p { margin: 3px 15px 13px 15px; text-align: justify } p + ul, p + ol { margin-top: -10px }