Understanding Berkeley DB

1/1/08, by artyom ; Posted in: Storage, Berkeley DB; 2 comments

There are many high quality, high performance, both open and closed source data bases available on the market: MySQL, PostgreSQL, Firebird, Sqlite, Oracle, MS SQL etc. These are industry standard SQL databases that usually power many web sites. The well known LAMP stack is de-facto standard for the web hosting companies.

So why had I chosen to use Berkeley DB instead of many other data bases that most of web technologies work with?

There are several reasons:

Outstanding performance.
Direct C++ API instead of SQL one.
It is as mature as any other databases.

In order to understand the advantages of Berkeley DB over standard SQL data bases we need to understand the architectures of typical DB.

The live of the query.

Lets assume we want to read latest, published, N posts from the data base and get the information about the authors of these posts in the same query. I would write this simple query in SQL like that:

SELECT posts.id, posts.title, authors.author_name
FROM post
JOIN authors ON posts.author_id=authors.id
WHERE posts.state='published'
ORDER BY posts.date DESC
LIMIT [number N]

I'll create the query in my code, giving N the correct number, converting it to text, then I'll send it over the TCP/IP network or local UNIX socket to the SQL server.

It will interpret the query to its internal format and will do following steps:

Read the index of posts.date and will try to fetch the data from it's internal cache.
For each record it finds in posts index, it will check the state=published, and if so, it will fetch records from authors primary index.
Convert the data to output format and return SQL result and send it back over the socket to the application.

The application will receive inputs form the server, check it and convert to the C format -- text of post.id to integer.

Thus for each query, we have several operation that costs a lot:

Sending the query over socket -- IPC or network communication.
SQL query parsing and converting it fetch logic
Converting fetched data to output TCP/IP stream
Sending the information over TCP/IP
Converting data back to internal C format.

Even if the query is simple and typical, even if SQL cached all the required DB information in memory without even requiring disk access, this operation had involved lots of unnecessary high cost operations like text parsing and IPC.

Thus, all this leads to simple results, even if all the data is cached in the process memory of SQL DB and can be fetched as fast as direct memory access it will be served in huge delay due to IPC/Network/Parsing overhead that costs a lot.

What is the ideal DB?

Imagine that you have unlimited process memory space and hardware that never fails and process that runs for a years.

How would you organize your data?

I'd probably use STL. All the data will be stored in maps, lists and other STL structures with fast access. So the query I had written would look like:

for(i=posts.rend(),n=0;i!=posts.rbegin() && n<Max;i++) {
   if(i->second.published) {
      name=authors.find(i->second.author_id);
      n++;
      // Do something with date
   }
}

It is definitely not more complicated then previous example of SQL query. But it is about hundreds times faster then fetching same data from traditional SQL data base.

So lets use STL? No!

Unfortunately the system resources are limited and the data should be frequently flushed to the permanent storage. This "data base" also can't be shared between different processes and do not support transactions and has much more other limitations that do not exist in traditional DB.

So, what do I need?

An efficiency of STL -- the speed of direct memory access bounded by the top limit of best algorithms implementations, together with the stability and scalability of traditional DB.

How can we do it? Let's assume we have an STL container that stores the data on the disk disk and caches parts of it in the memory. Thus every DB access will be as fast as access to STL container (with overhead of more complex data management) in case the data is already cached, and if not it will fetch the required data pages from the disk. This solution will be the best compromise between the speed of STL access and DB sizes and stability.

That is actually the problem that is solved by Berkeley DB.

What Berkeley DB proposes to developer.

C++ API
Cache that is shared between threads or processes.
Support of transactions (optional)
Support of replication.

Thus, using this API I was able to get about 200,000 fetches per second, when MySQL could generate about 10,000 fetches per second and STL map allowed 700,000. Thus the performance of Berkeley DB was only 3.5 times lower then performance of STL map and 20 times faster then typical SQL solution.

And... What are the costs?

As we can expect, we have to pay for these advantages.

A developer that uses Berkeley DB has to know how to fetch data using indexes. In is not enough to write "WHERE a>=10 AND b<=5", the developer needs to translate it into walking on a B-Tree. Usually the learning curve of Berkeley DB API is much steeper then SQL one. Many web developers are familiar with SQL queries and usually know where and how to put correct indexes. The rest they live to SQL engine itself. In case of Berkeley DB, they will have to understand, what exactly SQL engine does when it runs a query. Sometimes these are not trivial questions even for good algorithm developers.

Berkeley DB has good documentation, but it leaves many details for you. You can usually find an answer one the web or with help of your co-workers "how to fetch something" in SQL terms much faster then in Berkeley DB terms. Also, the API is very complex and the documentation assumes your knowledge of DB internals in many cases. So, it is definitely not as easy as work with STL.

The maintenance of the Berkeley DB is significantly more complex then maintenance of SQL DB, because BDB does not understand the structure of your data, it is just a memory block that can be associated with C structure, null terminated string and any other binary data. So, you can't just make a simple SQL query for debug purposes or testing purposes, you need to write a program that knows how you DB is build and what is the meaning of every byte in your data.

The same problem exists on updates of internal structures lite adding new field or changing its format. More problems exist for architecture migration of DB like Little and Big Endian architectures or even 32 and 64 bit software.

It is also not "Relational DB". It does not checks foreign key constrains, do not perform "checks" on your fields. It leaves all these to the developer.

Berkeley DB, does not work over the network, it is a library. It can be replicated using "master-slave" model when sigle node performs updates and others are used for read only access. For classic SQL DB it is not a problem, because the queries are anywhere send over the network and many applications placed at different network nodes can update the DB. In case of Berkeley DB, only one node in network can write to the DB, others can only read.

This adds serious restrictions to your application architecture. For example, from the beginning, I had divided all the URLs for this blog into two types -- that require write access and read only access to DB. Thus in future, if my blog will be scaled up to several servers, the "write" urls will be redirected to the node that is responsible for updating the DB and others will be spread over all other nodes. Sometimes these are very hard restrictions especially for applications that use session or statistics management in DB where almost every web page access requires DB update.

But not all is so bad!

How many times have you tried to find a bad SQL query that do not uses indexes and makes your DB very slow? Probably, many experienced web developers know this well.

Using Berkeley DB makes it very hard to write bad queries. Not because it is very smart, it is just because you have no way to fetch something if you do not have indexes. And if you find out that you should iterate over all the table to fetch your data, there is probably something wrong in the query you do, and it is very hard to miss this.

There are several wrappers for Berkeley DB that simplify the API and make it more "STL like". One of them is a part of CppCMS.

And the last, but not the least, using Berkeley DB you learn a lot about the structure of SQL data bases, and this allows you write better SQL queries in future or understand what are you actually doing when write some complex "SELECT" statement.

Summary

Berkeley DB is really good and high performance DB. It brings lots of advantages to the application it uses, but it also should be used with lots of brain power invested to the correct application design.

Thats why I had chosen Oracle Berkeley DB as a primary data base for CppCMS -- "C++ Web Development Framework"

Comments

Shai, at 1/3/08, 1:41 PM

Hi Artyom,

It appears that you keep taking the line of performance above all other considerations. This is nice; it certainly gives your CMS a distinction.

There are a couple of inaccuracies in your post; if they reflect your arguments for selecting Berkeley, you might want to reconsider this decision, because the situation with alternatives is much better than you describe it.

1) SQLite is a library, not a client-server solution, so it gets rid of the TCP/IP communication overhead.

2) With most client/server solutions (I know it as a fact for Oracle and Ms Sql Server, it's been a while since I programmed with others) you can "prepare" a statement before sending it to the server. This may mean just some partial compilation on the client, or it could mean full compilation on the server; either way, after preparing once, multiple invocations of the query can be done much faster than by sending each as if it was a completely new one. Some of the servers (Oracle, at least) will even hold a cache of prepared statements, so that you get the benefits even if you don't explicitly use the "prepare" API, provided you use the same string for the query. Which brings me to another point:

3) The use of query parameters as you wrote, by splicing their string representation into the query text, is the wrong way to do it. All major SQL dialects support the notion of query parameters; you write something like "where num < :p1", and provide a number for p1 in some other way (without changing the query string). This data is then passed in binary form, saving the format/parse overhead. When the data is a string, this is not more efficient, but a lot safer, because it prevents SQL injections. Most CMSs fail to use this feature because it requires harder code generation, when they already have the mechanism for text splicing, and because this area is not standardized in SQL (most dialects use ":name" for variables, but Sybase and Microsoft use "@name").

4) Your description of what happens when executing a query is just one possibility. In some cases, a database engine might decide to do things in other ways (e.g. start by matching authors to articles, and only then filter for published). The best way to execute a query depends on many factors -- which conditions are imposed on which table, the relative sizes of tables, which indexes are present, et cetera. Using BDB, as you described it, means you make this decision before compile time, and changing it requires source changes -- while the best solution might change either with further, unrelated development (someone add an index later which will make it possible to answer faster) or just with normal use of the system (changes in table sizes). SQL databases can change strategies accordingly, with very little human intervention (if any).

BDB is good when you really know a lot about the system in advance. I'm not sure this can be the case for a general CMS.

artyom, at 1/3/08, 2:31 PM

I know, but sqlite3 fails due to bad concurrency support.
I know that most of DB has prepared SQL statements, but as practice show not many use them, mostly due to additional complexity of managing SQL queries. But you are absolutely right, the good practice is to use "prepared" statements.
You had answered, on this question by yourself. However you right.
I agree, but in general, the developer should know better the structure of the data. And what is rare case and which is frequent. I don't think that SQL

Anyway, there are still enough overheads -- and the hardest one is TCP/IP.

For example: if your application sends out about 100MBit stream of HTML data, the data steam of same size approximately should be received from your DB (I assume you do not cache it). So for high loads, the data traffic becomes very important and dominant.

BDB is good when you really know a lot about the system in advance. I'm not sure this can be the case for a general CMS.

I do not try to do a general solution, when I write specific CMS I know what are my data structures and the relation between them.

Anyway, I agree that proper use of SQL can lead to good results and the "path" I had described is very simple. But simple benchmarks can show that Berkeley DB is usually significantly faster then traditional SQL databases.

I do not enforce any programmer to use BDB, he can use any other DB he wants. EasyDBD API is independent part of CppCMS.

As I had described before, Using BDB has its benefits but it has its costs as well, and they are not low by no means.

P.S.: I think the name of the project CppCMS is misleading. This framework is not CMS it is a framework for writing different CMSs.

Add Comment:

Author
E-Mail	the email would not displayed
URL

You can write your messages using Markdown syntax.

You must enable JavaScript in order to post comments.

Project

Some rights reserved, the content of this blog is available under Creative Commons Attribution License 2.5 Israel.

Creative Commons