Java Project made with the use of Java Libraries and core concepts like Java OOPS, Multithreading, Serialization, Java Swing, etc.
Java, Java Swing and AWT, and core Java Programming concepts like OOPS, Multithreading, Serialization, etc.
The relational database i.e. a database in the form of rows and columns is stored in the form of extents and data pages. The extents can be understood as folders and the data pages can be understood as files. So, each row is stored in the data page sequentially. Usually, the limit of a data page is 8KB. So, after a data page gets completely filled, the next row goes into a new data page. Usually, an extent can accomodate 8 data pages i.e. its size is usually limited to 64 KB. So, after an extent has accomodated 8 data pages, a new extent is created and a new data page is created and the further rows start getting stored from there.
Now, if we want to search some data, the entire database is searched for it i.e. every extent, and every data page in those extents and every row in those data pages is searched. So, the time complexity of this searching is O(N) where N is the number of rows in the data base. This technique of searching the data is called Table Scan as the entire table is being scanned.
We know that if the data has some particular indices attached to it, the searching becomes faster. We also know that arranging the data in the form of a Binary Tree makes searching faster. In a binary tree, the data in the left sub-tree of a node is smaller than its data and the data in the right sub-tree of a node is larger than it. So, a B-Tree has the same property but not just with 2 children, with multiple children. So, we create indices upon the attributes of the relational database and for each attribute, we maintain a seperate B-Tree that stores the indices. The data is present in the last level of the B-Tree and it tells us the exact location of the data. Exact location means the extent number, data page number and the row-offset of the data. So, we first have to reach the correct index and then that index helps us reach the data. Reaching the correct index means traversing the B-Tree and it takes O(logdN) (d is in the base of the log) time where d is the number of child nodes of a node in B-Tree and N is the number of rows in the database. Then, it tells us the exact location of the data which means it takes O(1) time to reach the data after we have reached the correct index. This technique of using indices to make searching faster is called Index Seek Method.
So, the project aims at showing the differnce in speed and searching time of the Index Seek and Table Scan Searching methods and analyzing the advantages and siadvantages of both the methods.
The advantage of Index Seek Method over the Table Scan method is very clear. The searching time is reduced from Linear to Logarithmic.
- In index seek method, let us say that there are C number of columns (attributes). So, if we want to create indices upon every attribute, we will have to scan the entire data base C times initially for creating the indices. So, the entire database will be scanned for creating the indices every time. After this, the search upon the attribute using Index Seek method will always be logarithmic.
- The second major drawback is extra space. We will create Index Pages. These index pages are nothing but the nodes of the B-Tree. So, extra space is used for making the searching faster.
Download the source code from GitHub to your local. Now, go to the location B-Tree-master\IndexDemo\IndexDemo\src. Here, compile the Java Client program using the command: javac Client.java Now, you have to create an empty folder at any location. This will be the folder in which the entire database will be created. So, we have to run the Java Client file and while running, provide the path of the empty folder that you have just created as a command line argument. For instance, in my case, I have created a folder named IndexPrj. So, this is the command that I run on the terminal: java Client E:\IndexPrj
When you do this, the Java Application should run and this is what you will see.
For creating the random database, you have to input the number of rows in the text area available in the Data-Panel as shown below.
A random test data of 10000 rows will be created in the same folder whose location you mentioned in the command line arguments.
You can see the data in the form of extents and data pages. You can open any random page in any random extent. You will see some random strings seperated by the pipe (|) symbol. They are Roll Number, Name, UserName and Password respectively.
The Metadata file contains the number of rows in the database.
The size of each page is maximum 8 KB and the size of each extent is maximum 64 KB.
Open any random extnet and random data page. Pick any row from it and choose any attribute. Let us say UserName. Now, search the UserName in the Query Panel.
For instance, the UserName with Roll Number 7658 in Page 4 of extent 5 is chosen here.
As you can see, it takes 201ms time and 35 pages were read.
Now, let us create indices upon UserName.
We can see the indices being created in the progress bar of the Project as well as in the terminal.
This is a time consuming process as the entire database is being read.
After the process is completed, we can see that indices have been created upon the UserName.
As you can see above, we have a root page and all the other pages are the nodes of the B-Tree. The metadata file consists the data that which pages are non-leaf, root, and leaves, etc.
So, let us now search for the same username. Now that indices have been created, Index Seek method will be used automatically by the Project.
So, you can see that the Search time reduced to 7ms and only 2 pages were read. This is the advantage of index seek method over the table scan method.