Added article section + SQLite threading article + minor ncdc FAQ fixes
This commit is contained in:
parent
291c0ea142
commit
f1f08a5fe0
3 changed files with 693 additions and 4 deletions
@@ -16,7 +16,7 @@ mentioned projects are dead: neither LDCC, DCTC nor TurboVision are seeing any
 recent development.
 
 L<ShakesPeer|http://shakespeer.bzero.se/> - Appears to have a commandline
-inter-face as well. I haven't personally tried it, but have not heard much
+interface as well. I haven't personally tried it, but have not heard many
 positive things about it. Has not seen any recent development, either.
 
 
@@ -61,8 +61,8 @@ And then follow the instructions in the README.
 Most likely this is caused by a L<bug in
 glib-networking|https://bugzilla.gnome.org/show_bug.cgi?id=664321>. To get
 around it, install the "certtool" utility that comes with gnutls (package
-"gnutls-bin" on Ubuntu), delete the old certificates ("rm ~/.ncdc/cert/*"), and
-then start ncdc again.
+"gnutls-bin" on Ubuntu), delete the old certificates (C<rm ~/.ncdc/cert/*>),
+and then start ncdc again.
 
 
 =head2 Why doesn't ncdc rotate log files automatically?
@@ -109,7 +109,7 @@ the DC Development hub before ncdc had native TLS support:
   accept = 127.0.0.1:16591
   connect = hub.dcbase.org:16591
 
-The URL `adc://127.0.0.1:16591/' can then be used to connect to the hub from
+The URL C<adc://127.0.0.1:16591/> can then be used to connect to the hub from
 within ncdc.
 
 
635
dat/sqlaccess
Normal file
@@ -0,0 +1,635 @@
Multi-threaded Access to an SQLite3 Database


=head1 Introduction

As I was porting L<ncdc|http://dev.yorhel.nl/ncdc> over to use SQLite3 as its
storage backend, I stumbled on a problem: The program uses a few threads for
background jobs, and it would be nice to give these threads access to the
database.

Serializing all database access through the main thread wouldn't have been very
hard to implement in this particular case, but that would have been far from
optimal. The main thread is also responsible for keeping the user interface
responsive and handling most of the network interaction. Overall responsiveness
of the program would improve significantly if the threads could access the
database without involving the main thread.

Which brings us to the following question: What solutions are available for
providing multi-threaded access to an SQLite database? What problems might I
run into? I was unable to find a good overview of this area on the net, so I
wrote this article in the hope of improving that situation.


=head1 SQLite3 and threading

Let's first see what SQLite3 itself has to offer in terms of threading support.
The official documentation mentions threading support several times in various
places, but this information is scattered around and no good overview is given.
Someone has tried to organize this before on a L<single
page|http://www.sqlite.org/cvstrac/wiki?p=MultiThreading>, and while this
indeed gives a nice overview, it has unfortunately not been updated since 2006.
The advice given there is therefore a little on the conservative side.

Nonetheless, it is wise to remain portable across different SQLite versions,
especially when writing programs that dynamically link with whatever version is
installed on someone's system. It should be fairly safe to assume that the
SQLite binaries provided by most systems, if not all, are compiled with thread
safety enabled. Unfortunately, this doesn't mean all that much: The only thing
I<thread safe> means in this context is that you can use SQLite3 in multiple
threads, but a single database connection should still stay within a single
thread.

Since SQLite 3.3.1, which was released in early 2006, it is possible to move a
single database connection between multiple threads. Doing this with older
versions is not advisable, as explained in L<the SQLite
FAQ|http://www.sqlite.org/faq.html#q6>. But even with 3.3.1 and later there is
an annoying restriction: A connection can only be passed to another thread
after all outstanding statements have been finalized. In practice this means
that it is not possible to keep a prepared statement in memory for later
executions.

Since SQLite 3.5.0, released in 2007, a single SQLite connection can be used
from multiple threads simultaneously. SQLite will internally manage locks to
avoid any data corruption. I can't recommend making use of this facility,
however, as there are still many issues with the API. The L<error fetching
functions|http://www.sqlite.org/c3ref/errcode.html> and
L<sqlite3_last_insert_rowid()|http://www.sqlite.org/c3ref/last_insert_rowid.html>,
among others, are still useless without explicit locking in the application. I
also believe that the previously mentioned restriction on having to finalize
statements has been relaxed in this version, so keeping prepared statements in
memory and passing them among different threads becomes possible.

When using multiple database connections within a single process, SQLite offers
a facility to allow L<sharing of its
cache|http://www.sqlite.org/sharedcache.html>, in order to reduce memory usage
and disk I/O. The semantics of this feature have changed with different SQLite
versions and appear to have stabilised in 3.5.0. This feature may prove useful
to optimize certain situations, but does not open up new possibilities for
communicating with a shared database.


=head1 Criteria

Before looking at some available solutions, let's first determine the criteria
we can use to evaluate them.

=over

=item Implementation size

Obviously, a solution that requires only a few lines of code to implement is
preferable over one that requires several levels of abstraction in order to be
usable. I won't be giving actual implementations here, so the sizes will be
rough estimates for comparison purposes. The actual size of an implementation
is of course heavily dependent on the programming environment as well.

=item Memory/CPU overhead

The most efficient solution for a single-threaded application is to simply have
direct access to a single database connection. Every solution is in principle a
modification or extension of this idea, and will therefore add a certain
overhead. This overhead manifests itself in both increased CPU and memory
usage. How much of each varies between solutions.

=item Prepared statement re-use

Is it possible to prepare a statement once and keep using it for the lifetime
of the program? Or will prepared statements have to be thrown away and
recreated every time? Keeping statement handles in memory will result in a nice
performance boost for applications that run the same SQL statement many times.

=item Transaction grouping

A somewhat similar issue to prepared statement re-use: From a performance point
of view, it is very important to try to batch many UPDATE/DELETE/INSERT
statements within a single transaction, as opposed to running each modifying
query separately. Running each query separately will force SQLite to flush the
data to disk every time, whereas a single transaction will batch-flush all the
changes to disk in a single go. Some solutions allow for grouping multiple
statements in a single transaction quite easily, while others require more
involved steps.

=item Background processing

In certain situations it may be desirable to queue a query for later
processing, without explicitly waiting for it to complete. For example, if
something in the database has to be modified as a result of user interaction in
a UI thread, then the application will feel a lot more responsive if the UPDATE
query is simply queued to be processed in a background thread than if the query
runs in the UI thread itself. A database access solution with built-in support
for background processing of queries will significantly help with building a
responsive application.

=item Concurrency

Concurrency indicates how well the solution allows for concurrent access. The
worst possible concurrency is achieved when a single database connection is
used for all threads, as only a single action can be performed on the database
at any point in time. Maximum concurrency is achieved when each thread has its
own SQLite connection. Note that maximum concurrency doesn't mean that the
database can be accessed in a I<fully> concurrent manner. SQLite uses internal
database-level locks to avoid data corruption, and these will limit the actual
maximum concurrency. I am not too knowledgeable about the inner workings of
these locks, but it is at least possible to have a large number of truly
concurrent database I<reads>. Database I<writes> from multiple threads may
still allow for significantly more concurrency than when they are manually
serialized over a single database connection.

=item Portability

What is the minimum SQLite version required to implement the solution? Does it
require any special OS features or SQLite compilation settings? As outlined
above, different versions of SQLite offer different features with regard to
threading. Relying on one of the relatively new features will decrease
portability.

=back


=head1 The Solutions

Here I present four solutions for allowing database access from multiple
threads. Note that this list may not be exhaustive; these are just a few
solutions that I am aware of. Also note that none of the solutions presented
here are in any way new. Most of these paradigms are as old as the notion of
concurrent programming itself, and have been applied in software for decades.

=head2 Connection sharing
|
||||||
|
|
||||||
|
By far the simplest solution to implement: Keep a single database connection
|
||||||
|
throughout your program and allow every thread to access it. Of course, you
|
||||||
|
will need to be careful to always put locks around the code where you access
|
||||||
|
the database handler. An example implementation could look like the following:
|
||||||
|
|
||||||
|
// The global SQLite connection
|
||||||
|
sqlite3 *db;
|
||||||
|
|
||||||
|
int main(int argc, char **argv) {
|
||||||
|
if(sqlite3_open("database.sqlite3", &db))
|
||||||
|
exit(1);
|
||||||
|
|
||||||
|
// start some threads
|
||||||
|
// wait until the threads are finished
|
||||||
|
|
||||||
|
sqlite3_close(db);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
void *some_thread(void *arg) {
|
||||||
|
sqlite3_mutex_enter(sqlite3_db_mutex(db));
|
||||||
|
// Perform some queries on the database
|
||||||
|
sqlite3_mutex_leave(sqlite3_db_mutex(db));
|
||||||
|
}
|
||||||
|
|
||||||
|
=over

=item Implementation size

This is where connection sharing shines: There is little extra code required
compared to using a database connection in a single-threaded context. All you
need to be careful of is to lock the mutex before using the database, and to
unlock it again afterwards.

=item Memory/CPU overhead

As the only addition to the single-threaded case are the locks, this solution
has practically no memory overhead. The mutexes are provided by SQLite, after
all. CPU overhead is also as minimal as it can be: mutexes are the most
primitive type provided by threading libraries to serialize access to a shared
resource, and are therefore very efficient.

=item Prepared statement re-use

Prepared statements can be safely re-used inside a single enter/leave block.
However, if you want to remain portable with SQLite versions before 3.5.0, then
any prepared statements B<must> be freed before the mutex is unlocked. This can
be a major downside if the enter/leave blocks themselves are relatively short
but entered quite often. If portability with older versions is not an issue,
then this restriction is gone and prepared statements can be re-used easily.

=item Transaction grouping

A reliable implementation will not allow transactions to span multiple
enter/leave blocks. So as with prepared statements, transactions need to be
committed to disk before the mutex is unlocked. As with prepared statement
re-use, this limitation may prove to be a significant problem when optimizing
application performance, disk I/O in particular. One way to lessen the effects
of this limitation is to increase the size of a single enter/leave block, thus
allowing more work to be done in a single transaction. Code restructuring may
be required in order to implement this efficiently. Another way to get around
this problem is to allow a transaction to span multiple enter/leave blocks.
Implementing this reliably may not be an easy task, however, and will most
likely require application-specific knowledge.

=item Background processing

Background processing is not natively supported with connection sharing. It is
possible to spawn a background thread to perform database operations each time
this is desirable, but care should be taken to make sure that these background
threads execute dependent queries in the correct order. For example, if thread
A spawns a background thread, say B, to execute an UPDATE query, and later
thread A wants to read that same data back, it must first wait for thread B to
finish execution. This may add more inter-thread communication than is
preferable.

=item Concurrency

There is no concurrency at all here. Since the database connection is protected
by an exclusive lock, only a single thread can operate on the database at any
point in time. Additionally, one may be tempted to increase the size of an
enter/leave block in order to allow for larger transactions or better re-use of
prepared statements. However, any time spent on operations that do not directly
use the database within such an enter/leave block will lower the maximum
possible database concurrency even further.

=item Portability

Connection sharing requires at least SQLite 3.3.1 in order to pass the same
database connection around. SQLite must be compiled with threading support
enabled. If prepared statements are kept around outside of an enter/leave
block, then version 3.5.0 or higher is required.

=back


=head2 Message passing

An alternative approach is to allow only a single thread to access the
database. Any other thread that wants to access the database in any way will
then have to communicate with this database thread. This communication is done
by sending messages (I<requests>) to the database thread, and, when query
results are required, receiving back one or more I<response> messages.

Message passing schemes and libraries are available for many programming
languages and come in many different forms. For this article, I am going to
assume that an asynchronous and unbounded FIFO queue is used to pass around
messages, but most of the following discussion applies to bounded queues as
well. I'll try to note the important differences between the two where
applicable.

A very simple and naive implementation of a message passing solution is given
below. Here I assume that C<queue_create()> will create a message queue (type
C<message_queue>), and C<queue_get()> will return the next message in the
queue, or block if the queue is empty. C<thread_create(func, arg)> will run
I<func> in a newly created thread and pass I<arg> as its argument. Error
handling has been omitted to keep this example concise.

  void *db_thread(void *arg) {
    message_queue *q = arg;

    sqlite3 *db;
    if(sqlite3_open("database.sqlite3", &db))
      return ERROR;

    request_msg *m;
    while((m = queue_get(q))) {
      if(m->action == QUIT)
        break;
      if(m->action == EXEC)
        sqlite3_exec(db, m->query, NULL, NULL, NULL);
    }

    sqlite3_close(db);
    return OK;
  }

  int main(int argc, char **argv) {
    message_queue *db_queue = queue_create();
    thread_create(db_thread, db_queue);
    // Do work.
    return 0;
  }

This example implementation has a single database thread running in the
background that accepts the messages C<QUIT>, to stop processing queries and
close the database, and C<EXEC>, to run a certain query on the database. No
support is available yet for passing query results back to the thread that sent
the message. This can be implemented by including a separate C<message_queue>
object in the request messages, to which the results can be sent.
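
For reference, the C<message_queue> type assumed above is not provided by
SQLite or by C itself. A minimal sketch of such an unbounded queue on top of
POSIX threads could look like the following; the names mirror the ones used in
the example, and C<queue_put()> is the sending counterpart through which
request messages would be submitted:

```c
#include <pthread.h>
#include <stdlib.h>

// A singly-linked list node holding one opaque message.
typedef struct node { void *data; struct node *next; } node;

// An unbounded FIFO queue protected by a mutex, with a condition
// variable so that queue_get() can block while the queue is empty.
typedef struct message_queue {
  pthread_mutex_t lock;
  pthread_cond_t nonempty;
  node *head, *tail;
} message_queue;

message_queue *queue_create(void) {
  message_queue *q = calloc(1, sizeof(message_queue));
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->nonempty, NULL);
  return q;
}

// Append a message to the tail and wake up one waiting consumer.
void queue_put(message_queue *q, void *data) {
  node *n = malloc(sizeof(node));
  n->data = data;
  n->next = NULL;
  pthread_mutex_lock(&q->lock);
  if(q->tail)
    q->tail->next = n;
  else
    q->head = n;
  q->tail = n;
  pthread_cond_signal(&q->nonempty);
  pthread_mutex_unlock(&q->lock);
}

// Remove and return the message at the head, blocking while empty.
void *queue_get(message_queue *q) {
  pthread_mutex_lock(&q->lock);
  while(!q->head)
    pthread_cond_wait(&q->nonempty, &q->lock);
  node *n = q->head;
  q->head = n->next;
  if(!q->head)
    q->tail = NULL;
  pthread_mutex_unlock(&q->lock);
  void *data = n->data;
  free(n);
  return data;
}
```

Since the queue is unbounded, C<queue_put()> never blocks; only readers wait,
which matches the asynchronous behaviour assumed in the discussion above.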

=over

=item Implementation size

This will largely depend on the programming environment used and the complexity
of the database thread. If your environment already comes with a message queue
implementation, and constructing the request/response messages is relatively
simple, then a basic implementation as shown above will not require much code.
On the other hand, if you have to implement your own message queue or want more
intelligence in the database thread to improve efficiency, then the complete
implementation may be significantly larger than that of connection sharing.

=item Memory/CPU overhead

Constructing and passing around messages will incur a CPU overhead, though with
an efficient implementation this should not be significant enough to worry
about. Memory usage is highly dependent on the size of the messages being
passed around and the length of the queue. If messages are queued faster than
they are processed and there is no bound on the queue length, then a process
may quickly run out of memory. On the other hand, if messages are processed
fast enough then the queue will generally not hold more than a single message,
and the memory overhead will remain fairly small.

=item Prepared statement re-use

As the database connection never leaves the database thread, prepared
statements can be kept in memory and re-used without problems.

=item Transaction grouping

A naive but robust implementation will handle each message in its own
transaction. A more clever database thread, however, could wait for multiple
messages to be queued and then batch-execute them in a single transaction.
Correctly implementing this may require some additional information to be
specified along with the request, such as whether the query may be combined in
a single transaction or whether it may only be executed outside of a
transaction. Some threads may want confirmation that the data has been
successfully written to disk, in which case responsiveness will not improve if
such actions are queued for later processing. Nonetheless, since the database
thread has all the knowledge about the state of the database and any
outstanding actions, transaction grouping can be implemented quite reliably.

=item Background processing

Background processing is supported natively with a message passing
implementation: a thread that isn't interested in query results can simply
queue the action to be performed by the database thread without indicating a
return path for the results. Of course, if a thread queues many messages that
do not require results followed by one that does, it will have to wait for all
earlier messages to be processed before receiving any results for the last one.
In the case that the actions are not dependent on each other, the database
thread may re-order the messages in order to process the last request first.
This requires knowledge about dependencies, however, and may significantly
complicate the implementation.

=item Concurrency

As with a shared database connection, database access is exclusive: Only a
single action can be performed on the database at a time. Unlike connection
sharing, however, any processing within the application will not further
degrade the maximum attainable concurrency. As long as unbounded asynchronous
queues are used to pass around messages, the database thread will be able to
continue working on the database without waiting for another thread to process
the results.

=item Portability

This is where message passing shines: SQLite is only used within the database
thread; no other thread has any need to call an SQLite function. This allows
any version of SQLite to be used, even one that has not been compiled with
thread safety enabled.

=back


=head2 Thread-local connections

A rather different approach to giving each thread access to a single database
is to simply open a new database connection for each thread. This way each
connection is local to a specific thread, which in turn has the power to do
with it as it likes without worrying about what the other threads do. The
following is a short example to illustrate the idea:

  void *some_thread(void *arg) {
    sqlite3 *db;
    if(sqlite3_open("database.sqlite3", &db))
      return ERROR;

    // Do some work on the database

    sqlite3_close(db);
    return OK;
  }

  int main(int argc, char **argv) {
    int i;
    for(i=0; i<10; i++)
      thread_create(some_thread, NULL);

    // Wait until the threads are done

    return 0;
  }

=over

=item Implementation size

Giving each thread its own connection is in practice not much different from
the single-threaded case where there is only a single database connection. And
as the example shows, this can be implemented quite trivially.

=item Memory/CPU overhead

If we assume that threads are not created very often and each thread has a
relatively long life, then the CPU and I/O overhead caused by opening a new
connection for each thread will not be very significant. On the other hand, if
threads are created quite often and lead a relatively short life before they
are destroyed again, then opening a new connection each time will soon require
more resources than running the queries themselves.

There is a significant memory overhead: every new database connection requires
memory. If each connection also has a separate cache, then every thread will
quickly require several megabytes just to interact with the database. Since
version 3.5.0, SQLite allows this cache to be shared with the other threads,
which will reduce this memory overhead.

=item Prepared statement re-use

Prepared statements can be re-used without limitations within a single thread.
This allows full re-use of prepared statements if each thread has a different
task, in which case every thread will have different queries and access
patterns anyway. But when every thread runs the same code, and thus also the
same queries, each will still need its own copy of the prepared statement.
Prepared statements are specific to a single database connection, so they can't
be passed around between the threads. The same argument as for CPU overhead
applies here: as long as threads are long-lived, this will not be a very large
problem.

=item Transaction grouping

Each thread has full access to its own database connection, so it can easily
batch many queries in a single transaction. It is not possible, however, to
group queries from the other threads into this same transaction as well. The
grouping may therefore not be as optimal as a message passing solution could
provide, but it is still a large improvement compared to connection sharing.

=item Background processing

Background processing is not easily possible. While it is possible to spawn a
separate thread for each query that needs to be processed in the background, a
new database connection will have to be opened every time this is done. This
solution will obviously not be very efficient.

=item Concurrency

In general, it is not possible to get better concurrency than by providing each
thread with its own database connection. This solution definitely wins in this
area.

=item Portability

Thread-local connections are very portable: the only requirement is that SQLite
has been built with threading support enabled. Connections are not passed
around between threads, so any SQLite version will do. In order to make use of
the shared cache feature, however, SQLite 3.5.0 is required.

=back


=head2 Connection pooling

A common approach in server-like applications is to have a connection pool.
When a thread wishes to have access to the database, it requests a database
connection from a pool of (currently) unused database connections. If no unused
connections are available, it can either wait until one becomes available, or
create a new database connection on its own. When a thread is done with a
connection, it adds it back to the pool to allow it to be re-used by another
thread.

The following example illustrates a basic connection pool implementation in
which a thread creates a new database connection when no connections are
available. A global C<db_pool> is defined, on which any thread can call
C<pool_pop()> to get an SQLite connection if one is available, and
C<pool_push()> to push a connection back onto the pool. This pool can be
implemented as any kind of set: a FIFO or a stack will do the trick, as long as
it can be accessed from multiple threads concurrently.

  // Some global pool of database connections
  pool_t *db_pool;

  sqlite3 *get_database() {
    sqlite3 *db = pool_pop(db_pool);
    if(db)
      return db;
    if(sqlite3_open("database.sqlite3", &db))
      return NULL;
    return db;
  }

  void *some_thread(void *arg) {
    // Do some work

    sqlite3 *db = get_database();

    // Do some work on the database

    pool_push(db_pool, db);
    return NULL;
  }

  int main(int argc, char **argv) {
    int i;
    for(i=0; i<10; i++)
      thread_create(some_thread, NULL);

    // Wait until the threads are done

    return 0;
  }
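
The C<pool_t> type and its C<pool_pop()>/C<pool_push()> functions are likewise
not SQLite facilities. A minimal sketch, implemented as a fixed-capacity stack
of opaque handles guarded by a POSIX mutex; the capacity, the NULL-on-empty
convention and the return value of C<pool_push()> are assumptions of this
example:

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_MAX 16

// A tiny connection pool: a fixed-size stack of opaque handles
// protected by a mutex. pool_pop() returns NULL when the pool is
// empty, telling the caller to open a fresh connection itself.
typedef struct pool_t {
  pthread_mutex_t lock;
  void *items[POOL_MAX];
  int count;
} pool_t;

void pool_init(pool_t *p) {
  pthread_mutex_init(&p->lock, NULL);
  p->count = 0;
}

void *pool_pop(pool_t *p) {
  void *item = NULL;
  pthread_mutex_lock(&p->lock);
  if(p->count > 0)
    item = p->items[--p->count];
  pthread_mutex_unlock(&p->lock);
  return item;
}

// Returns 0 on success, or -1 if the pool is full, in which case the
// caller should close the connection instead of leaking it.
int pool_push(pool_t *p, void *item) {
  int r = -1;
  pthread_mutex_lock(&p->lock);
  if(p->count < POOL_MAX) {
    p->items[p->count++] = item;
    r = 0;
  }
  pthread_mutex_unlock(&p->lock);
  return r;
}
```

Using a stack rather than a FIFO tends to hand out recently used connections,
whose caches are more likely to still be warm.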
|
||||||
|
|
||||||
|
=over
|
||||||
|
|
||||||
|
=item Implementation size
|
||||||
|
|
||||||
|
A connection pool is in essense not very different from thread-local
|
||||||
|
connections. The only major difference is that the call to sqlite3_open() is
|
||||||
|
replaced with a function call to obtain a connection from the pool and
|
||||||
|
sqlite3_close() with one to give it back to the pool. As shown above, these
|
||||||
|
functions can be fairly simple. Note, however, that unlike with thread-local
|
||||||
|
connections it is advisable to "open" and "close" a connection more often in
|
||||||
|
long-running threads, in order to give other threads a chance to use the
|
||||||
|
connection as well.
|
||||||
|
|
||||||
|
=item Memory/CPU overhead

This mainly depends on the number of connections you allow to be in memory at
any point in time. If this number is not bounded, as in the above example, then
you can assume that after running your program for a certain time, there will
always be enough unused connections available in the pool. Requesting a
connection will then be very fast, since the overhead of creating a new
connection, as would have been done with thread-local connections, is
completely gone.

In terms of memory usage, however, it would be more efficient to put a maximum
limit on the number of open connections, and have the thread wait until another
thread gives a connection back to the pool. Similarly to thread-local
connections, memory usage can be decreased by using SQLite's cache sharing
feature.

=item Prepared statement re-use

Unfortunately, this is where connection pooling inherits a drawback of
connection sharing: prepared statements must be cleaned up before passing a
connection to another thread if one aims to be portable. But even if you drop
that portability requirement, prepared statements are always specific to a
single connection. Since you can't assume that you will always get the same
connection from the pool, caching prepared statements is not practical.

On the other hand, a connection pool does allow you to use a single connection
for a longer period of time than with connection sharing, without negatively
affecting concurrency. Unless, of course, there is a limit on the number of
open connections, in which case using a connection for a long period of time
may starve another thread.

=item Transaction grouping

Pretty much the same arguments as with re-using prepared statements apply to
transaction grouping: transactions should be committed to disk before passing a
connection back to the pool.

=item Background processing

This is also where a connection pool shares a lot of similarity with connection
sharing. With thread-local connections, creating a worker thread to perform
database operations in the background would be very inefficient. But since that
inefficiency is exactly what connection re-use in a pool avoids, it is not a
problem here. The same warning with regard to dependent queries still applies,
though.

=item Concurrency

Connection pooling gives you fine-grained control over how much concurrency
you'd like to have. For maximum concurrency, don't put a limit on the number of
open database connections. If there is a limit, then that will decrease the
maximum concurrency in favor of lower memory usage.

=item Portability

Since database connections are being passed among threads, connection pooling
will require at least SQLite 3.3.1 compiled with thread safety enabled. Making
use of its cache sharing capabilities to reduce memory usage will require
SQLite 3.5.0 or higher.

=back
=head1 Final notes

As for what I used in ncdc: I initially chose connection sharing, for its
simplicity. Then, when I noticed that the UI became less responsive than I
found acceptable, I started adding a simple queue for background processing of
queries. Later I stumbled upon the main problem with that solution: I wanted to
read back a value that was written in a background thread, and had no way of
knowing whether the background thread had finished executing that query or not.
I then decided to expand the background thread to allow for passing back query
results, and transformed everything into a full message passing solution. This
appears to be working well at the moment, and my current implementation has
support for both prepared statement re-use and transaction grouping, which
measurably increased performance.

To summarize, there isn't really a I<best> solution that works for every
application. Connection sharing works well for applications where
responsiveness and concurrency aren't of major importance. Message passing
works well for applications that aim to be responsive, and is flexible enough
to optimize CPU and I/O by re-using prepared statements and grouping queries
into larger transactions. Thread-local connections are suitable for
applications that have a relatively fixed number of threads, whereas connection
pooling works better for applications with a varying number of worker threads.

=cut

54
index.cgi

@@ -23,6 +23,8 @@ TUWF::register(
 qr{tuwf} => \&tuwf,
 qr{tuwf/man(?:/(db|misc|request|response|xml))?}
 => \&tuwfmanual,
+qr{doc} => \&docindex,
+qr{doc/sqlaccess} => \&docsqlaccess,
 qr{dump} => \&dump,
 qr{demo} => \&dumpdemo,
 qr{dump/awshrink} => \&dumpawshrink,
@@ -54,6 +56,7 @@ sub home {
 E;
 end;
 h2 'Updates';
+b '2011-11-26'; txt ' Added article section and the article on SQLite.'; br;
 b '2011-11-03'; txt ' ncdc 1.5 and ncdu 1.8 released!'; br;
 b '2011-10-26'; txt ' ncdc 1.4 released!'; br;
 b '2011-10-19'; txt ' PGP-signed all releases of ncdu, ncdc and TUWF.'; br;
@@ -474,6 +477,50 @@ sub tuwfmanual {
 
 
 
+# D O C S T U F F
+
+
+sub docindex {
+my $s = shift;
+$s->htmlHeader(title => 'Articles', page => 'doc');
+p 'When programming stuff, I sometimes come across a situation where I am not
+happy with the documentation or articles available online, and feel the urge
+to do something about this situation. Most of the time I resist this urge
+because I otherwise won\'t get any programming done, but sometimes this
+urge is just too hard to resist.';
+p 'I don\'t really have a blog - at least not one that I take seriously - so
+I\'ll just use this site to publish my articles. Since I\'ve just started
+writing these, this page is still quite empty. I\'ll add more as soon as my
+urge to write an article surpasses my urge to get some programming done
+again.';
+br;
+p;
+txt '2011-11-26 - '; b 'Multi-threaded Access to an SQLite3 Database';
+txt ' ['; a href => '/doc/sqlaccess', 'HTML'; txt ' - '; a href => '/dat/sqlaccess', rel => 'nofollow', 'POD'; txt ']';
+end;
+$s->htmlFooter;
+}
+
+
+sub docsqlaccess {
+my $s = shift;
+$s->htmlHeader(title => 'Multi-threaded Access to an SQLite3 Database', page => 'doc', tab => 'sqlaccess');
+p;
+lit <<' E;';
+Written on <b>2011-11-26</b>. Also available in <a
+href="/dat/sqlaccess">POD format</a>.
+<br />Feedback, questions, comments, additions? Forward those to <a
+href="mailto:projects@yorhel.nl">projects@yorhel.nl</a>.
+E;
+end;
+br;br;
+$s->htmlPOD("$ROOT/dat/sqlaccess");
+$s->htmlFooter;
+
+}
+
+
+
 # C O D E D U M P
 
 
@@ -860,6 +907,7 @@ sub htmlHeader {
 a href => '/ncdu', $o{page} eq 'ncdu' ? (class => 'sel') : (), 'ncdu'; txt ' ';
 a href => '/ncdc', $o{page} eq 'ncdc' ? (class => 'sel') : (), 'ncdc'; txt ' ';
 a href => '/tuwf', $o{page} eq 'tuwf' ? (class => 'sel') : (), 'tuwf'; txt ' ';
+a href => '/doc', $o{page} eq 'doc' ? (class => 'sel') : (), 'articles'; txt ' ';
 a href => '/dump', $o{page} eq 'dump' ? (class => 'sel') : (), 'code dump';
 end;
 if($o{page} eq 'ncdu') {
@@ -886,6 +934,12 @@ sub htmlHeader {
 a href => '/tuwf/man', $o{tab} eq 'man' ? (class => 'sel') : (), 'manual'; txt ' ';
 end;
 }
+if($o{page} eq 'doc' && $o{tab}) {
+use utf8;
+div id => 'mtabs', style => 'margin-right: 620px';
+a href => '/doc', '« article index';
+end;
+}
 if($o{page} eq 'dump') {
 div id => 'mtabs';
 a href => '/dump', !$o{tab} ? (class => 'sel') : (), 'misc'; txt ' ';