A Database Fix For The File System
There is broad agreement that migrating unstructured files into a structured, object-oriented database is the best way to manage the explosion of enterprise data. We explain where this effort stands and the obstacles to creating a better file system infrastructure.
File system data is often called "unstructured data" to differentiate it from the structured data in a database. But file systems are actually organized, and they provide an interface between applications, end users and physical disk storage.
The typical Unix file system structure is shown in "The 'I' in Inode" on page 67. The inode contains the file's metadata as presented to the end user or application, and it facilitates the reading and writing of data to the disk itself. The complex combination of metadata and direct and indirect block pointers (extents, in Microsoft parlance) found in inodes reflects an era in which storage, both memory and disk, was expensive and coding elegance was prized over convenience.
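To make the inode description concrete, here is a minimal Python sketch of how direct and single-indirect block pointers map a position in a file to a block on disk. It is an illustration rather than any real file system's layout: the block size, the count of 12 direct pointers, the 4-byte addresses, and the names Inode, resolve_block and read_pointer_block are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumed parameters for this sketch; real file systems differ.
BLOCK_SIZE = 4096                  # bytes per disk block
NUM_DIRECT = 12                    # direct pointers held in the inode itself
PTRS_PER_BLOCK = BLOCK_SIZE // 4   # 4-byte block addresses per pointer block


@dataclass
class Inode:
    # Metadata presented to the end user or application
    owner: str
    mode: int
    size: int
    # Pointers used to read and write the file's data blocks
    direct: List[int] = field(default_factory=list)
    single_indirect: Optional[int] = None  # address of a block full of pointers


def resolve_block(inode: Inode, logical_block: int,
                  read_pointer_block: Callable[[int], List[int]]) -> int:
    """Map a file-relative block number to a disk block address."""
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    if logical_block < NUM_DIRECT:
        # Small files are served straight from the inode.
        return inode.direct[logical_block]
    index = logical_block - NUM_DIRECT
    if index < PTRS_PER_BLOCK and inode.single_indirect is not None:
        # Larger files cost one extra lookup through the pointer block.
        return read_pointer_block(inode.single_indirect)[index]
    raise ValueError("beyond the single-indirect range covered by this sketch")


# A stand-in for the disk: one pointer block stored at address 900.
pointer_blocks = {900: [500 + i for i in range(PTRS_PER_BLOCK)]}


def read_pointer_block(address: int) -> List[int]:
    return pointer_blocks[address]


inode = Inode(owner="alice", mode=0o644, size=14 * BLOCK_SIZE,
              direct=list(range(100, 100 + NUM_DIRECT)), single_indirect=900)

print(resolve_block(inode, 3, read_pointer_block))    # found via a direct pointer
print(resolve_block(inode, 13, read_pointer_block))   # found via the indirect block
```

The second lookup shows the trade-off: the inode stays compact, but reaching data past the direct pointers requires an additional read before the data block itself can be located.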
Still, contemporary file systems present some painful limitations that could be better addressed by an object-oriented database approach to storing data. These limitations include:
Data isolation: Data is accessed via inode stovepipes, which makes coordinating, indexing and managing data a daunting task.
Data duplication: Copies are required for sharing between heterogeneous file systems, which results in wasted space as well as versioning and consistency issues.
Data incompatibility: There is no guarantee that a file can be opened by a given application, and different semantics used by various OS file systems can limit your access to just one instance of the file.
Some of these limitations, however, have more to do with the way users deploy file-system capabilities. If users named their files more carefully, and if directories or folders were more logically defined and administered, it would be easier to identify files that require disaster-recovery backup or are subject to regulatory compliance.
An object-oriented file system is a better option (see "A New File System Paradigm," page 69). Databases provide greater control over who has access to what, and database controls facilitate concurrent server and client access, even when different operating systems are involved. Data is stored in a uniform fashion that all OSs can understand.
Other benefits of the database approach are:
Enhanced data naming and metadata management, thanks to descriptive header information that can be used to establish data classes, enable different data views and improve segregation techniques.
Tracking access frequency and access types (read versus write) to fine-tune data placement on storage and to identify suitable candidates for archiving.
More efficient indexing and retrieval for regulatory compliance and disaster recovery.
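The benefits listed above can be sketched with a small relational example. The following uses Python's built-in sqlite3 module as a stand-in for an object-oriented database, which is a simplification. The table layout, the column names, the sample file names and dates, and the helper functions store_object, record_access, objects_in_class and archive_candidates are all hypothetical, chosen only to show descriptive metadata, data classes, access tracking and indexed retrieval working together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    data_class      TEXT NOT NULL,     -- descriptive header: class of data
    retention_years INTEGER,           -- descriptive header: compliance hint
    payload         BLOB
);
CREATE TABLE access_log (
    object_id   INTEGER REFERENCES objects(id),
    access_type TEXT CHECK (access_type IN ('read', 'write')),
    accessed_on TEXT                   -- ISO date, so string order is date order
);
CREATE INDEX idx_objects_class ON objects(data_class);
""")


def store_object(conn, name, data_class, retention_years, payload=b""):
    cur = conn.execute(
        "INSERT INTO objects (name, data_class, retention_years, payload) "
        "VALUES (?, ?, ?, ?)",
        (name, data_class, retention_years, payload))
    return cur.lastrowid


def record_access(conn, object_id, access_type, accessed_on):
    conn.execute("INSERT INTO access_log VALUES (?, ?, ?)",
                 (object_id, access_type, accessed_on))


def objects_in_class(conn, data_class):
    """A 'data view': every object tagged with the given class."""
    rows = conn.execute(
        "SELECT name FROM objects WHERE data_class = ? ORDER BY name",
        (data_class,)).fetchall()
    return [r[0] for r in rows]


def archive_candidates(conn, cutoff):
    """Objects never accessed, or last accessed before the cutoff date."""
    rows = conn.execute("""
        SELECT o.name
        FROM objects o
        LEFT JOIN access_log a ON a.object_id = o.id
        GROUP BY o.id
        HAVING MAX(a.accessed_on) IS NULL OR MAX(a.accessed_on) < ?
        ORDER BY o.name
    """, (cutoff,)).fetchall()
    return [r[0] for r in rows]


# Illustrative sample data only.
ledger = store_object(conn, "q3-ledger.xls", "financial", 7)
memo = store_object(conn, "old-memo.doc", "general", None)
audit = store_object(conn, "audit-trail.log", "financial", 7)

record_access(conn, ledger, "read", "2005-01-10")
record_access(conn, ledger, "write", "2005-03-02")
record_access(conn, memo, "read", "2003-06-01")

print(objects_in_class(conn, "financial"))
print(archive_candidates(conn, "2004-01-01"))
```

Because the class label and access history live alongside the data rather than in file names or folder conventions, questions such as "which financial records exist?" or "what has not been touched since a given date?" become simple queries instead of directory crawls.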