What is a good open source database for exploring database design? (DBMS design, not normalization of tables, etc.)

As stated in the question, I'm not looking for help on creating a database in terms of creating tables, normalization, etc.

As a programming project, I want to write my own DBMS. This is for learning more than anything, so reinventing the wheel is my goal.

I began my search by looking at SQLite. I found an SVN branch covering 2001 to 2004 that is amazingly well commented, but it is still a lot to digest at once. Even so, after going through it for only an hour or two, my head is already racing with ideas.

So I am asking here in the hope that someone knows of a small and very simple DBMS that I could draw ideas or inspiration from regarding query analysis, data storage, search creation, etc.

Thanks!

+8
c database sqlite parsing relational-database
6 answers

I have been told that the PostgreSQL source code is very well documented and structured.
But it obviously does not qualify as a "small and simple DBMS."

In addition, the only “little ones” that I know of are Java-based DBMSs:

I am not sure whether a Java-based implementation will help you.

+1

There is Edward Sciore's SimpleDB (not related to Amazon SimpleDB), "a simple multi-user Java system for learning database internals." It is in Java, but I think the ideas would be fairly easy to translate to C.

From http://www.cs.bc.edu/~sciore/simpledb/intro.html :

SimpleDB is a multi-user transactional database server written in Java that interacts with Java client programs through JDBC. The system is intended for pedagogical use only. The code is clean and compact. The APIs are simple. The learning curve is relatively small. Everything about it is geared toward enhancing the experience of a database system internals course. Consequently, the system is intentionally bare bones. It implements only a small part of SQL and JDBC, and has little or no error checking. Although it is a great teaching tool, I cannot imagine that anyone would want to use it for anything else.

There is also a book:

Database Design and Implementation

+1

As already mentioned, SQLite, JavaDB, and SimpleDB are good examples. I would add Berkeley DB to the list. Berkeley DB is well documented, has existed for several years, and offers several APIs as well as many access methods, such as HASH, QUEUE and RECNO, in addition to the traditional B-tree. Berkeley DB is a key/value database library written in C. Berkeley DB XML is an XML data library written in C++ on top of Berkeley DB. Berkeley DB Java Edition is a 100% Java key/value database library. All of them are available under the GPL, and source code is included in the distribution.

The Berkeley DB SQL API incorporates the SQLite API, essentially placing a BDB key/value data store underneath the SQLite query layer. Berkeley DB was also the first data store implementation for MySQL, again taking the SQL query layer and storing the data in a simple key/data format. This is certainly an interesting way to look at the problem: if you have a flexible, fast, scalable and reliable data store, you can add any type of API or data presentation/abstraction on top of it. This is exactly what Berkeley DB does by offering a choice between the core key/value data store and XML, SQL, Java Collections, or a POJO-like persistence layer over the underlying key/value infrastructure.

Berkeley DB is about as close to a "pure" storage engine as you will find. It makes no assumptions about the structure, content, or format of the stored data. This allows the upper tiers to provide those abstractions, while the lower tier focuses on fast, scalable, and reliable storage. One of the reasons Berkeley DB is so widely used is that its simplicity and focus make it very fast, reliable and scalable.

Disclaimer: I am one of the product managers for Berkeley DB, so I am clearly a bit biased. But I have also been working on database products for 25 years, and I know a little about DBMS internals :-)
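To make the layering idea concrete, here is a minimal sketch in C. It is not the Berkeley DB API; every name in it (`KvStore`, `kv_put`, `kv_get`, `UserRow`, `users_insert`, `users_find`) is made up for illustration, and the store is a fixed-size in-memory array rather than anything durable. The lower tier only sees opaque bytes, and the upper tier is the one that decides those bytes are a row in a "users" table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Lower tier: a toy key/value store. Keys and values are opaque byte
   strings; the store never looks inside them. Hypothetical names. */
#define KV_MAX 64

typedef struct { void *data; size_t len; } Blob;
typedef struct { Blob key[KV_MAX], val[KV_MAX]; size_t n; } KvStore;

static Blob blob_copy(const void *p, size_t len) {
    Blob b = { malloc(len ? len : 1), len };
    assert(b.data != NULL);            /* sketch: no out-of-memory handling */
    memcpy(b.data, p, len);
    return b;
}

/* Insert or overwrite; returns 0 on success, -1 if the store is full. */
int kv_put(KvStore *s, const void *k, size_t klen, const void *v, size_t vlen) {
    for (size_t i = 0; i < s->n; i++) {
        if (s->key[i].len == klen && memcmp(s->key[i].data, k, klen) == 0) {
            free(s->val[i].data);
            s->val[i] = blob_copy(v, vlen);
            return 0;
        }
    }
    if (s->n == KV_MAX) return -1;
    s->key[s->n] = blob_copy(k, klen);
    s->val[s->n] = blob_copy(v, vlen);
    s->n++;
    return 0;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
const Blob *kv_get(const KvStore *s, const void *k, size_t klen) {
    for (size_t i = 0; i < s->n; i++)
        if (s->key[i].len == klen && memcmp(s->key[i].data, k, klen) == 0)
            return &s->val[i];
    return NULL;
}

/* Upper tier: a "users" table that gives the bytes a meaning. */
typedef struct { uint32_t id; char name[32]; } UserRow;

int users_insert(KvStore *s, const UserRow *row) {
    return kv_put(s, &row->id, sizeof row->id, row, sizeof *row);
}

int users_find(const KvStore *s, uint32_t id, UserRow *out) {
    const Blob *b = kv_get(s, &id, sizeof id);
    if (b == NULL || b->len != sizeof *out) return -1;
    memcpy(out, b->data, sizeof *out);
    return 0;
}
```

Swapping the linear scan for a hash table or B-tree would change nothing in the upper tier, which is the point of keeping the two apart.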

Good luck with your research.

Dave

+1

Perhaps you could look at the Apache Derby database. It is a full RDBMS implementation. It is written entirely in Java, though, and it is certainly not a small and simple implementation, but it can serve as a good reference.

0

Maybe SQLite is a good start. It is as simple as it gets (no network layer, simplified locking, etc.), yet it understands real SQL, has indexes and constraints, and is implemented in C. Its storage format is somewhat peculiar, however.

0

If you want a simple relational database system that uses the SQL query language, then SQLite is it. Keep reading that code.

But if you are not hung up on fully relational data stores, then google for B+ tree source code. The B+ tree is a fundamental data structure that lets you maintain a sorted index on disk, and several C source code packages implementing it were written 15-20 years ago. This is much simpler because there is no SQL and there are basically two parts: one to manage blocks on disk, and the other to manage the structure of the B+ tree.
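The two parts can be sketched in miniature like this. It is not taken from any of those packages and it is far from a full B+ tree: there are no internal nodes, no splits, and no duplicate handling. The page size, the `LeafPage` layout, and all function names are assumptions made for the example. It only shows a block manager that reads and writes fixed-size pages by number, and a single sorted leaf node laid out to fit exactly one page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define LEAF_CAP  ((PAGE_SIZE - 2 * sizeof(uint32_t)) / (2 * sizeof(uint32_t)))

/* Part 1: block manager. Pages are fixed-size and addressed by number. */
int page_write(FILE *f, uint32_t pageno, const void *buf) {
    if (fseek(f, (long)pageno * PAGE_SIZE, SEEK_SET) != 0) return -1;
    return fwrite(buf, PAGE_SIZE, 1, f) == 1 ? 0 : -1;
}

int page_read(FILE *f, uint32_t pageno, void *buf) {
    if (fseek(f, (long)pageno * PAGE_SIZE, SEEK_SET) != 0) return -1;
    return fread(buf, PAGE_SIZE, 1, f) == 1 ? 0 : -1;
}

/* Part 2: tree structure. One leaf node, sized to fill one page. */
typedef struct {
    uint32_t nkeys;
    uint32_t next_leaf;        /* page number of right sibling, 0 = none */
    uint32_t keys[LEAF_CAP];   /* kept in ascending order */
    uint32_t vals[LEAF_CAP];
} LeafPage;

static_assert(sizeof(LeafPage) == PAGE_SIZE, "leaf must fill one page");

void leaf_init(LeafPage *leaf) { memset(leaf, 0, sizeof *leaf); }

/* Insert keeping keys sorted; returns -1 when the leaf is full
   (this is where a real B+ tree would split the node). */
int leaf_insert(LeafPage *leaf, uint32_t key, uint32_t val) {
    if (leaf->nkeys == LEAF_CAP) return -1;
    uint32_t i = leaf->nkeys;
    while (i > 0 && leaf->keys[i - 1] > key) {
        leaf->keys[i] = leaf->keys[i - 1];
        leaf->vals[i] = leaf->vals[i - 1];
        i--;
    }
    leaf->keys[i] = key;
    leaf->vals[i] = val;
    leaf->nkeys++;
    return 0;
}

/* Binary search within the leaf; returns 0 and sets *val if found. */
int leaf_find(const LeafPage *leaf, uint32_t key, uint32_t *val) {
    uint32_t lo = 0, hi = leaf->nkeys;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (leaf->keys[mid] < key) lo = mid + 1; else hi = mid;
    }
    if (lo == leaf->nkeys || leaf->keys[lo] != key) return -1;
    *val = leaf->vals[lo];
    return 0;
}
```

A typical round trip would be `leaf_init`, a few `leaf_insert` calls, `page_write` to some page number, and later `page_read` followed by `leaf_find`. Internal nodes would use the same page-sized layout but store child page numbers instead of values.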

Once you understand this, you can return to the SQLite code and, without a doubt, identify similar modules among the rest of the code.

Sometimes the best way to learn is to repeat some historical steps.

0