Database Implementation - How to Get Started

I tried to learn programming for a while. I have studied Java and Python, and I like their syntax. Recently, I wanted to use what I found out with encoding tangible software from scratch.

I want to implement a database engine, such as a NoSQL database. I put together a small document, like the specification that should follow during my adventure in coding it. But all I know is a bunch of keywords. I donโ€™t know where to start.

Can someone help me learn how to gather the knowledge that I need for this kind of work and in what order to learn something? I was looking for documents, but I feel that in the end I will find unrelated / erroneous content or start from the wrong point, because implementing the full database engine (it seems, is) is really a difficult task.

I do not want to express that I would prefer abstracts and prints and (e) books for codes of other projects, because I asked a question about good, in which people are usually given answers in the form "read the project - x 'source code". I am not at the level of comfortable reading and understanding of the source code.

+6
source share
2 answers

First, you can see what the answers are. How to write a simple database engine . Although it focuses on the SQL engine, there is still a lot of good material in the answers.

Otherwise a good project tutorial Implementing the B-Tree Database Class . The sample code is in C ++, but a description of what has been done and why, probably, you still want to look.

In addition, there is Design and Implementation of a Structured Storage (Database Engine) at MSDN. Here is a lot of information to help you with your training project.

+8
source

Since the accepted answer only offers (good) links to other resources, I thought I shared my experience with writing webdb , a small experimental database for browsers. I also invite you to read the source code. This is pretty small. You should be able to read this and get a basic idea of โ€‹โ€‹what it does in a couple of hours. Warning I am n00b, and after writing this, I learned a lot more about it and see that I am doing something wrong. This can help you get started.

Theory: BTree

I started by adapting the AVL tree to suit my needs. The AVL tree is a kind of self-balancing binary search tree. You save the key K and its associated data (if any) in node, and then all elements with key < K in node in the left subtree and all elements with key > K in the right subtree. You can use an array to store data items if you want to support non-unique keys.

This tree will give you the basics: creating, updating, deleting and a way to quickly get an element by key or all elements with key <x or with key between x and y, etc. It can serve as an index for our table.

Scheme

As a next step, I wrote code that allows the client code to define a schema. Methods like createTable() , etc. Schemas are usually SQL related, but even no-SQL sorting has a schema; they usually require you to check the ID field and any other fields you want to find. You can make your schema as bizarre as you want, but you usually want to model at least which column serves as the primary key and which fields will often be executed and need an index.

Creating a data structure for storing a table

I decided to use the tree that I created in the first step to save my objects. These were simple JS objects. Having determined which field PK contains, I could simply insert the item into the tree using this field value as a key. This gives me a quick search by ID (range).

Then I added another tree for each column that needs an index. In these trees, I did not store the full record, but only the key. Therefore, to get the customer by last name, I would first use the index for the last name to get the identifier, and then the primary key index to get the actual record. The reason why I didnโ€™t just save the actual object (link to) is because it simplifies installation operations (see next step)

Inquiries

Now that we have a table with indexes for PK and search fields, we can execute the query. I have not gone so far as it gets complicated quickly, but you can get some nice functionality with just a few basics. WebDB does not implement joins; all queries work on only one table. But once you understand this, you see a fairly clear-cut (albeit long and winding) path to creating associations and other complex things.

In WebDB, to get all clients with firstName = 'John' and city = 'New York' (assuming these are two search fields), you should write something like:

 var webDb = ... var johnsFromNY = webDb.customers.get({ firstName: 'John', city: 'New York' }) 

To solve this problem, we first do two searches: we get a set of X of all customer identifiers named "John", and we get a set of Y of all customer identifiers from New York. Then we cross over on these two sets to get all the customer IDs, both of which are called "John" And from New York. Then we run our set of resulting identifiers, getting the actual record for each of them and adding it to the array of results.

Using the given operators, such as union and intersection, we can execute AND and OR. I just implemented AND.

Doing joins (I think) involves creating temporary tables in memory, and then populating them as the query runs with the combined results, and then applying the query criteria to the temp table. I never got there. I tried several synchronization logic, but it was too ambitious, and it went down from here :)

+3
source

All Articles