File Organization
Introduction to storage medium
Database is the computer system which will save, retrieve, update whenever required must be store physically on the part of computer is nothing bus storage medium.
Storage medium mainly divided 3 parts according to access time, storage capacity and cost per bit.
Storage medium
Primary storage medium
Secindary storage medium
Tertiary storage medium
Access time :
Time required to locate and retrieve data
Storage capacity :
Amount of data to be store in storage medium
Cost per bit :
How much cost required to store one bit of data
Primary storage medium
It is the part of CPU, so it is monitor by CPU.
It is the higher level in memory hierarchy.
This medium contains 2 memory.
It is also called high speed buffer.
Main and cache.
The access time of primary storage medium is very high.
The storage capacity of primary storage medium is less.
The cost per bit is high.
Disadvantage of primary storage medium is it has limited storage capacity.
Secondary storage medium
It can't be directly monitor by CPU. To handle such memory first it should be copied into primary storage.
It is middle level in memory hierarchy.
It is called as online storage medium. Because it will automatically store backup of your important files and data into the security data center.
Secondary memory storage contains flash memory and magnetic disk.
Magnetic disc is hard disc and floppy disc.
Access time of secondary storage medium is less.
Storage capacity of secondary storage medium is high.
Cost per bit is also less.
Disadvantage of secondary storage medium is access time is less.
Tertiary storage medium
It is also not part of CPU. This type of memory is usually used for backup.
It is low level in memory hierarchy.
It is also called offline storage medium. Because we can't access to it until it is mounted.
It consists of 2 memory. Magnetic tape and optical disc.
Access time of tertiary storage medium is less than secondary storage medium.
Storage capacity of tertiary storage medium is higher than secondary storage medium.
Cost per bit is less.
Cache memory
It is a primary storage medium that will terporarily store data. CPU can handle it. But we can't.
It contains the instruction of the process. It is also called high speed buffer. Because it's processing time is really near to CPU.
Main memory or Primary
It is internal memory. It is directly connect to CPU through data, address, control bus.
Flash memory
Can store data permanently. It is basically used in digital camera.
Magnetic disc
It is basically used to store large amount of data for user requirements.
Optical disc
It's CD, DVD. It can store data on reflective surface and can be read by laser.
Magnetic tape
File organization
File organization is nothing but a way of arranging record in a file.
Access method
Records from the file access by aplying certain operations like insertion, deletion, updation is known as access method.
File header
In file header, we have to store all the description about the file. It includes file name, no.of records, data types of records, sights of records.
According to updation operation there are 2 types of files.
Static file
Dynamic file
Static file
If update operation is rarely perform on the file such file is known as static file.
Dynamic file
If update operation is frequently perform on the file such file is known as dynamic file.
There are 3 types of file organization.
Heap file
Sorted file
Hash file
Heap file
Heap or Pile file is files of unordered record.
The records are placed into the file in the order which they are inserted.
The new records are inserted at the end of file. Such file organization is called as heap/pile file.
Some operations in the Heap file.
Insertion of new record
The last block of this file will copied into buffer. (Temporary storage area)
Add new record to the buffer.
Rewrite the block from buffer to disc
The address of the last file block is kept in the file header.
Deletion of record
There are 2 method to delete the record from the file.
a. Using buffer
Find first block which is to be deleted.
Copy that block into the buffer.
Delete the record block from the buffer.
Rewrite the block back to the disc.
b. Using deletion marker
An extra byte or bit called deletion marker stored with each record.
If record consist value 1 means record remains as it is in the file.
And if record consist value 0 means record is deleted from the file.
Searching of the record
Searching of the record from the heap/pile file is very difficult and really expensive.
Linear searching is used for searching record from the file. It means record search by block by block. Because heap file is in unordered order.
2. Sorted file
Files of ordered record is known as sorted file.
We can physically order the records of a file on the disc.
Insertion is easy, only find the free space and insert.
Deletion method is using deletion marker only.
We use binary searching method to search value.
List on value of one of the field is called ordering field.
If the ordering field is also has field this field has unique value.
There are some advantages of sorted file.
Reading record in order of the ordering field value is more efficient.
Searching of record is easy by using binary search technique.
Operation of Sorted file
A. Insertion of record
To insert record we must find correct position.
After making space in the file insert record in that position.
B. Deletion of record
For deletion of record we use deletion marker.
C. Searching of record
For searching of record from sorted file, binary searching is in use.
In this method data elements are arrange in sorted manner.
The required value is compare with middle value.
If required value is less than middle, search proceeds in upper half otherwise search proceeds in lover half.
Process will continue till value found.
3. Hash file
File organization is best on technique of hashing. It allows us to avoid accessing and index structure.
In hashing we use the term bucket. It is used to denote a unique of storage that can stored 1 or more records.
Bucket is nothing but a disc block.
In hash file we use a hash function.
h(k) = k mod m for insertion of record
m is nothing but no.of record
n is nothing but value tobe inserted
If records are larger than the capacity of the block, the overflow of collision occur.
There are 3 methods to solve this collision.
Open addressing
Chaining
Multiple hashing
In Open addressing method, it will add new record to the next or previous record.
In chaining, we have to set pointer. Pointer is set to the record for storing new record.
In multiple hashing we have to use various hash functions like h(k) = k mod m
Index file organization
In this chapter we will introduce indexes which are specially designed for speed of the retrieval of record.
Single level ordered index
It is similar to the ordered index scheme of any kind of book. It has 3 types.
Primary index
Clustered index
Secondary index
1. Primary index
It is index whose seach key defines sequential order of file.
Search key of Primary index is usually a primary key.
Primary index is an ordered file whose records are of fixed length having 2 fields.
i. Primary key
ii. Address pointer
It is the first field which having some data as the ordering key field.
It refers to the location of disc where actually data files stored.
Ex. <K(i) = Anand nagar, address of block(1)>
Ordered index
Dense index
Sparse index
Index which can take 1 index entry for every search key value such as dense index.