
Virtual Allocation: A Scheme for Flexible Storage Allocation

Sukwoo Kang and A. L. Narasimha Reddy

Dept. of Electrical Engineering

Texas A&M University

College Station, Texas 77843

swkang, reddy@…

Abstract

Traditional file systems allocate and tie up the entire storage space at the time the file system is created. This creates a situation where one file system can run out of space while another has ample unused storage space. In such an environment, storage management flexibility is seriously hampered. This paper presents virtual allocation, a scheme for flexible storage allocation. It separates storage allocation from the file system. It employs a dynamic allocation strategy based on time of write, which lets applications fit the actual usage of storage space without regard to the configured file system size. Unused storage space can be provided to any application or file system as the need arises. This improves flexibility by allowing storage space to be shared across different file systems. It also makes it possible for a storage device to be expanded easily onto other available storage resources, because storage allocation is not tied to file systems. This paper presents the design of virtual allocation and an evaluation of it through benchmarks. To illustrate our approach, we implemented a prototype system on PCs running Linux. We present the results from the prototype implementation and its evaluation.

1 Introduction

Traditional file systems are closely tied to the underlying storage. Storage space is allocated and owned at the time the file system is created. This directly conflicts with the notion of storage as a discoverable resource, since unused space in one file system cannot easily be used by another file system that is running out of space. File systems currently employ a static allocation approach in which the entire storage space is claimed at the time of file system creation. This is somewhat akin to running with only physical memory in a memory system. Memory systems employ virtual memory for many reasons: to allow flexible allocation of resources, safe sharing, larger programs to run, etc. Traditional file systems lack such flexibility. To address this problem and to enable storage as a discoverable resource, we have developed a virtual allocation technique at the block level.

Figure 1: Example illustrating virtual allocation, comparing normal file systems with file systems using virtual allocation across Windows NT (NTFS) and Linux (Ext2). Virtual allocation lets storage space be allocated based on actual usage. The highlighted region indicates current usage and the bold line represents the actually allocated region. In this case, with virtual allocation, we get 100 GB (60 GB + 40 GB) of unallocated storage space, pooled into a common storage pool across different operating systems and different file systems.

A file system can be created with N GB. Instead of allocating the entire N GB of space, virtual allocation allows the file system to be created with only M GB, where M can be much smaller than N. The remaining N - M GB of storage space is unused and available as a discoverable resource. If the created file system does not exceed the actual allocation of M GB, the remaining N - M GB can be used by another file system or application. As the file system grows beyond M GB, storage space can be allocated on demand or when certain usage thresholds are exceeded.

Such an approach separates storage space allocation from file system size and allows us to create virtual storage systems where unused storage space can be provided to any application or file system as the need arises. This approach allows us to share storage space across multiple (and possibly different) operating systems. File systems can function transparently on top of such virtual storage systems as if they had access to a large storage device, even though only a small fraction of that device is actually allocated and used at a given time. Figure 1 shows two example file systems with and without virtual allocation. In Figure 1, highlighted regions illustrate the current usage of the disks and bold rectangles indicate the actually allocated regions. The figure shows that, with virtual allocation, unused storage space can be pooled across different operating systems and different file systems.

Figure 2: Allocation strategy of virtual allocation, showing the virtual disk and the data allocation unit. Virtual allocation employs a dynamic allocation strategy based on time of write; all writes are done in units of an extent.

Many of the ideas presented in this paper have been pursued in different forms earlier. IBM's Storage Tank [1] separates metadata operations from data operations to detach storage allocation from file systems, and allows a common pool of storage space to be shared across heterogeneous environments. Object-based storage, proposed by CMU [2], allows storage allocation to be done at the storage systems as opposed to a file system manager. Log-structured file systems (for example, LFS [3] and WAFL [4]) allow storage allocation to be detached from file systems because all new data is written at the end of a log. These approaches are either at the file system level or they require a new storage paradigm. In contrast, virtual allocation is a device-level approach, so it can provide a flexible storage allocation scheme transparent to the file systems. It also works well with existing storage environments with minimal changes. Veritas Volume Manager [5] and other existing LVM products provide considerable flexibility in storage management, but allocate storage space based on file system size.

Virtual allocation has the following combination of characteristics:

- It uses the generic block interface widely used in today's systems. This allows it to support a wide range of heterogeneous platforms, and allows the simplest reuse of existing file system and operating system technology.

- It provides a mechanism to create virtual storage systems where unused storage space can be provided to any application or file system as a discoverable resource.

- It can be built on existing systems with few changes to operating systems.

The rest of the paper is organized as follows: Section 2 presents the design rationale for virtual allocation, Section 3 describes some details of our prototype implementation and its evaluation, Section 4 presents discussions and future work, and Section 5 concludes the paper.

2 Design

In this section, we present the design rationale for virtual allocation and its component architecture. Virtual allocation employs an allocate-on-write policy, i.e., storage space is allocated when data is written. Figure 2 illustrates the storage allocation strategy of virtual allocation. First, virtual allocation writes all data to disk sequentially in a log-like structure based on time of write. The figure shows an example in which new data is written at time t = t1 and at time t = t2, where t1 < t2. This approach is similar to log-structured file systems [3, 4], where every write goes to the end of a log. In contrast, however, in virtual allocation only storage allocation is done at the end of the log. Once data is written to disk, it can be accessed from the same location, i.e., data is updated in place. Virtual allocation maintains block address mapping information (a block map) for this purpose.

  Extent Number | Device Number | Start Block Address
  --------------+---------------+--------------------
  0             | Dev0          | 1000
  1             | Dev0          | 5000
  2             | Dev1          | 2000
  3             | Dev2          | 3000

Table 1: Example of a block map in virtual allocation. It records where the data of each extent really resides.

Virtual allocation's block map is similar to a logical volume manager's (LVM's) block map, i.e., it converts file system block addresses to actual physical device block addresses. However, virtual allocation's block map is constructed dynamically as data is written, not at the time of file system creation. Virtual allocation can be seen as a generalization of LVM's functionality.
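The translation in Table 1 can be sketched as a simple lookup structure. The following is an illustrative sketch, not the authors' kernel code; the 32-blocks-per-extent figure (a 128 KB extent of 4 KB blocks) and all names are assumptions made for the example.

```python
# Hypothetical sketch of the block map in Table 1: each extent number maps
# to the (device, start block address) where its data actually lives.
block_map = {
    0: ("Dev0", 1000),
    1: ("Dev0", 5000),
    2: ("Dev1", 2000),
    3: ("Dev2", 3000),
}

EXTENT_BLOCKS = 32  # assumed: a 128 KB extent of 4 KB file-system blocks

def translate(fs_block: int) -> tuple[str, int]:
    """Convert a file-system block address to a physical (device, block)."""
    extent, offset = divmod(fs_block, EXTENT_BLOCKS)
    if extent not in block_map:
        raise LookupError(f"extent {extent} not yet allocated")
    device, start = block_map[extent]
    return device, start + offset

# File-system block 33 falls in extent 1 (offset 1), mapped to Dev0 at 5000.
print(translate(33))  # -> ('Dev0', 5001)
```

A read of a block whose extent has no entry raises an error, matching the paper's treatment of reads to unwritten extents as illegal accesses.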

This block map data structure is maintained in memory and regularly written to disk to harden it against system failures. The on-disk block map, called VA (virtual allocation) metadata, is stored at the front of the allocation log, as shown in Figure 2. Virtual allocation uses an extent, which is a group of (file system) blocks, to reduce the VA metadata that must be maintained. The block map data structure contains information such as the extent number, the (mapped) device number, and the (mapped) block address. Table 1 shows an example of the block map data structure. For example, the first row of Table 1 means that the data corresponding to extent 0 is located at block address 1000 of block device Dev0.

Each entry of the block map corresponds to one extent, and whenever a new extent is written, the metadata is hardened for data consistency in case of a system failure. If we use a 128 KB extent, the size of the VA metadata is less than 96 KB per 1 GB of disk space. Large extents retain spatial locality and reduce the block map size that must be maintained. However, larger extents may cause data fragmentation on disk: since even small requests occupy the allocated space of a full extent, storage space becomes fragmented if the following requests are not sequential. The tradeoffs in the size of an extent are similar to the tradeoffs in page size in virtual memory or block size in cache memory systems. We plan to explore this space thoroughly in the future.
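The quoted bound (under 96 KB of VA metadata per 1 GB of disk with 128 KB extents) can be checked with simple arithmetic. The 12-byte entry size below is an assumption chosen to be consistent with that bound, not a figure from the paper:

```python
# Back-of-the-envelope check: with 128 KB extents, 1 GB of disk holds
# 8192 extents; at an assumed 12 bytes per block-map entry, the VA
# metadata comes to 96 KB per 1 GB of disk space.
KB, GB = 1024, 1024**3

def metadata_per_gb(extent_size: int, entry_size: int = 12) -> int:
    """Bytes of block-map metadata needed to describe 1 GB of disk."""
    extents = GB // extent_size
    return extents * entry_size

print(metadata_per_gb(128 * KB) // KB)  # -> 96   (KB per GB, as in the text)
print(metadata_per_gb(4 * KB) // KB)    # -> 3072 (smaller extents inflate the map)
```

The second line illustrates the tradeoff the paragraph describes: shrinking the extent to the file-system block size would multiply the metadata by 32.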

File systems tend to create metadata at the time of file system creation. Due to our allocation policy, file system metadata is clustered at the front of the allocation log. This metadata clustering is denoted by a single FS (file system) metadata region in Figure 2. IBM Storage Tank takes the approach of separating metadata and normal data at the file system level; our allocation policy tends to result in a similar separation of file system metadata from file system data.

Figure 2 also shows that when the capacity of the physical disk is exhausted or reaches a certain limit (the storage expansion threshold), the physical disk can be expanded onto other available storage resources. This process of storage expansion allows file systems or applications to potentially span multiple devices. Storage expansion is done in terms of allocation units. For example, if the allocation unit is 1 GB, then whenever storage expansion occurs, virtual allocation adds 1 GB of disk space to the existing disk by finding other available storage resources.
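The expansion policy can be sketched as follows. This is a simplified model, assuming the paper's 1 GB allocation unit and 80% threshold; the class and names are illustrative, and the "pool" stands in for whatever resource discovery actually finds:

```python
# Sketch of threshold-triggered storage expansion: when usage crosses the
# expansion threshold, the virtual disk grows by one allocation unit drawn
# from a pool of discoverable storage.
ALLOCATION_UNIT = 1 << 30      # 1 GB allocation unit
EXPANSION_THRESHOLD = 0.80     # expand when usage exceeds 80% of capacity

class VirtualDisk:
    def __init__(self, capacity: int, pool: list[int]):
        self.capacity = capacity   # bytes currently backing the virtual disk
        self.used = 0
        self.pool = pool           # discoverable storage resources, in bytes

    def write(self, nbytes: int) -> None:
        self.used += nbytes
        while self.used > self.capacity * EXPANSION_THRESHOLD:
            if not self.pool or self.pool[0] < ALLOCATION_UNIT:
                raise RuntimeError("no discoverable storage left")
            self.pool[0] -= ALLOCATION_UNIT   # claim one unit from the pool
            self.capacity += ALLOCATION_UNIT  # grow the virtual disk

disk = VirtualDisk(2 * ALLOCATION_UNIT, pool=[10 * ALLOCATION_UNIT])
disk.write(int(1.7 * ALLOCATION_UNIT))   # crosses 80% of 2 GB -> grows by 1 GB
print(disk.capacity // ALLOCATION_UNIT)  # -> 3
```

Writing 1.7 GB onto a 2 GB device crosses the 1.6 GB threshold, so one 1 GB unit is claimed and the device now presents 3 GB.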

We define a virtual disk as the storage device seen by file systems or applications. File systems or applications can function transparently on top of such a virtual disk as if they had access to a large storage device, even though only a small fraction of that device is actually allocated and used at a given time. In virtual allocation, storage expansion is easy to perform because the allocate-on-write policy means storage devices are not tied to file systems.

Virtual allocation is implemented as a loadable kernel module that can be inserted below any file system. Figure 3 shows that virtual allocation is a thin layer between the file system and the disk device driver. Virtual allocation can be integrated into an LVM layer when one is present. When virtual allocation is stacked on top of the disk device driver, all I/O requests are intercepted by virtual allocation before being passed to the underlying disk. For a write request to an unallocated extent, virtual allocation dynamically allocates the space of an extent and routes the request to that location. It adds this allocation information to the block map so that the data can be accessed later from the same location. For a read request, the block map is consulted for mapping information. If a mapping exists, the request is routed to that location. Otherwise, it is an illegal access, because the accessed extent has not been written yet. Such a read miss will not occur unless file system metadata is corrupted. We plan to address this exception in the future for applications that access the raw device.
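The routing logic just described can be sketched at extent granularity. This is an illustrative model, not the authors' kernel module: it tracks only extent-to-slot mappings and ignores devices and block offsets:

```python
# Minimal sketch of allocate-on-write routing: a write to an unmapped
# extent allocates the next slot at the end of the allocation log; once
# mapped, reads and in-place updates go to the same physical location.
# A read of an unmapped extent is an illegal access.
class VirtualAllocator:
    def __init__(self):
        self.block_map = {}   # extent number -> physical extent slot
        self.log_end = 0      # next free slot at the end of the log

    def write(self, extent: int) -> int:
        if extent not in self.block_map:      # allocate-on-write
            self.block_map[extent] = self.log_end
            self.log_end += 1
        return self.block_map[extent]         # in-place update thereafter

    def read(self, extent: int) -> int:
        if extent not in self.block_map:
            raise IOError(f"read of unwritten extent {extent}")
        return self.block_map[extent]

va = VirtualAllocator()
# Writes land sequentially at the log end, regardless of logical address;
# rewriting extent 7 reuses its original slot rather than appending.
print(va.write(7), va.write(3), va.write(7))  # -> 0 1 0
```

The final write illustrates the key difference from a log-structured file system: only the first write of an extent goes to the log end; updates are in place.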

Figure 3 shows the component architecture of virtual allocation. The disk layout manager in Figure 3 routes incoming read and write requests to appropriate locations, and the disk block map manager maintains this information as a block map. The disk capacity manager continuously monitors disk usage and generates a service message to the storage resource manager once disk usage reaches the storage expansion threshold. When the storage resource manager is invoked, it tries to find available storage resources on the local computer or on the network. Once storage is discovered and obtained, the existing device is expanded to incorporate the new storage.

Figure 3: Virtual allocation architecture. Virtual allocation sits between the O/S storage system (file system or application) and the disk device driver layer, and comprises the disk layout manager, disk block map manager, disk capacity manager, and storage resource manager. It intercepts every incoming I/O request and routes it to an appropriate location based on the block map.

3 Implementation and Evaluation

We developed a prototype of virtual allocation as a kernel module for Linux 2.4.22 based on a layered driver architecture. The virtual allocation layered driver resides between the O/S storage system and the disk device driver interface. It intercepts every incoming I/O request and routes it dynamically based on the block map. The in-memory block map was implemented as a hash table.

We evaluated the performance of virtual allocation on a 3.0 GHz Pentium 4 machine with 900 MB of RAM. All experiments were conducted on two 36.7 GB 10,000 RPM Seagate ST336607LW SCSI disks formatted with Ext2. The partition size was just large enough to accommodate the test data in each experiment. The machine ran Red Hat Linux 9 with a 2.4.22 kernel. Before each test, we unmounted the file system on which the experiments took place to ensure a cold cache. We ran all tests at least ten times and computed 95% confidence intervals for the mean throughput.
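The confidence-interval computation used here can be sketched as follows. The throughput samples are made up for illustration, and the normal-approximation z value of 1.96 is a simplification (a Student-t critical value would give a slightly wider interval for ten runs):

```python
# Sketch of the evaluation methodology: run each test at least ten times
# and report a 95% confidence interval for the mean throughput.
from statistics import mean, stdev

def confidence_interval_95(samples: list[float]) -> tuple[float, float]:
    """Mean +/- half-width using the normal approximation (z = 1.96)."""
    m = mean(samples)
    half = 1.96 * stdev(samples) / len(samples) ** 0.5
    return m - half, m + half

runs = [61.2, 60.8, 61.5, 60.9, 61.1, 61.4, 60.7, 61.3, 61.0, 61.2]  # MB/s
lo, hi = confidence_interval_95(runs)
print(f"mean throughput in [{lo:.2f}, {hi:.2f}] MB/s with 95% confidence")
```

With ten runs and low variance, the interval is narrow, which is what makes the small overheads reported below (2.6% and 0.9%) distinguishable from noise.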

Virtual allocation can be used in multiple different configurations. In our benchmarks, we chose indicative configurations to evaluate performance across the available features. We selected four configurations to compare:

NORMAL-DISK: Serves as a performance baseline for the other configurations.

VA-SINGLE: Virtual allocation configured to operate on a single disk. A 128 KB extent is used.

LVM: A logical volume composed of two local disks. We configured LVM with a 1 GB partition on one disk and a 2 GB partition on the other.

VA-MULTI: Virtual allocation configured to span two disks. A 128 KB extent is used. Its allocation unit is 1 GB and its storage expansion threshold is 80%. Its disk configuration is the same as LVM's. If disk usage of one disk exceeds 80%, the volume is expanded onto the other disk by adding 1 GB (one allocation unit) of disk space.

Figure 4: Throughput (MB/s) for the Bonnie++ benchmark under the VA-SINGLE, NORMAL-DISK, LVM, and VA-MULTI configurations. Each group of bars represents a throughput; the leftmost group shows the write and read throughput of a normal disk.

We tested our configurations using two workloads: one with large files and the other with a typical distribution of file sizes in a file system. We chose these benchmarks so that we could evaluate performance under different system activity levels. The first workload was sequential reads and writes of large files with the Bonnie++ benchmark [6]. We used a 2 GB test file and a 4 KB chunk size. Bonnie++ sequentially writes a test file to disk and then sequentially reads it in units of the chunk size. In the LVM and VA-MULTI configurations above, 1 GB of the test file is written to one disk and the other 1 GB to the second disk in both cases.

The second workload we chose was Postmark [8]. We configured Postmark to create 30,000 files (between 8 KB and 64 KB) and perform 100,000 transactions in 200 directories. This file size range matches the file size distributions reported in file system studies [7]. The total data accessed in this test was 5 GB (read: 2 GB, write: 3 GB), which was much larger than the system RAM (900 MB). Postmark stresses the file system by performing a series of file system operations such as file creations, deletions, reads, and appends.

Figure 4 shows the results of the Bonnie++ benchmark. The figure depicts write and read throughputs under the different configurations. The height of each bar depicts the throughput, and each group of bars shows the throughput for a particular configuration. The leftmost group shows the throughput of a normal disk for reference.

Virtual allocation configured on a single disk with a 128 KB extent induced little overhead compared to a normal disk.

Figure 5: Transaction throughput (transactions/sec) for Postmark under the VA-SINGLE, NORMAL-DISK, LVM, and VA-MULTI configurations. Each group of bars represents a transaction throughput; the leftmost group shows the transaction throughput of a normal disk.

Similarly, virtual allocation configured on multiple disks with a 128 KB extent performs comparably to the LVM configuration. These results show that virtual allocation performs comparably to a normal disk in a workload with large files. Figure 5 shows the transaction throughput for Postmark. This figure has the same structure as Figure 4 and shows results for the same configurations under the Postmark workload. The figure shows that virtual allocation on a single disk with a 128 KB extent incurs 2.6% overhead, and virtual allocation on multiple disks with a 128 KB extent incurs 0.9% overhead. Figure 6 shows write and read throughput for Postmark; the read and write performance of virtual allocation is comparable to a normal disk or LVM. These results show that virtual allocation also works well in a workload with small files.

Our preliminary results show that virtual allocation incurs little or no significant overhead in both workloads. As explained earlier, our allocation policy results in the separation of a file system's metadata from its data. We plan to conduct more tests in the future to study the impact of this separation. File systems put significant effort into keeping metadata in close proximity to data; earlier studies [9, 10, 11] have shown the importance of such strategies, and it is necessary to evaluate our approach closely in light of them.

4 Discussions & Future Work

The IP connectivity of I/O devices [12] makes it possible for storage devices to be added to the network and enables storage as a discoverable resource. Virtual allocation, through its separation of storage allocation from file system creation, potentially allows file systems to span multiple devices across the network. If file systems are designed to span multiple devices across a network, locality and the resulting performance issues will be of immediate concern. While flexible deployment may dictate that we not worry about the actual location of storage, performance issues may force us to revisit data allocation and organization in the future.

Figure 6: Write and read throughput (MB/s) for Postmark under the VA-SINGLE, NORMAL-DISK, LVM, and VA-MULTI configurations. Each group of bars represents a throughput; the leftmost group shows the write and read throughput of a normal disk.

When storage resources need to be shared over a network, virtual allocation's block map could be extended to enable local disk caching in unallocated disk space to offset large data access latencies. We plan to pursue this in the future. We also plan to investigate further the effects of metadata and data separation, and to study the tradeoffs in virtual allocation more thoroughly: the impact of the extent size, the impact of the allocation unit, and the thresholds for triggering allocation.

We will also investigate the interaction between virtual allocation and the RAID, mirroring, and striping typically employed within storage systems [13]. Our allocate-on-write policy would need to be suitably modified to fit the logical device characteristics advertised by the LVM.

5 Conclusion

We have proposed virtual allocation, which employs an allocate-on-write policy to improve the flexibility of managing storage across multiple file systems and platforms. By separating storage allocation from file system creation, common storage space can be pooled across different file systems and flexibly managed to meet their needs. Virtual allocation also makes it possible for storage devices to expand easily, so that existing devices can incorporate other available resources. We have shown that this scheme can be implemented with existing file systems without significant changes to operating systems. Our experimental results from a Linux PC-based prototype demonstrate the effectiveness of our approach.

6 Acknowledgments

We would like to thank Uday Gupta and Jerry Cotter at EMC; discussions with them motivated this work. We also thank Pat Shuff for his valuable comments.

References

[1] IBM. IBM Storage Tank: A Distributed Storage System. IBM White Paper, 2002.

[2] Mike Mesnier et al. Object-Based Storage. IEEE Communications Magazine, vol. 41, no. 8, pp. 84-90, August 2003.

[3] Mendel Rosenblum and John K. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Transactions on Computer Systems, 1991.

[4] Dave Hitz et al. File System Design for an NFS File Server Appliance. Proceedings of the USENIX Winter 1994 Technical Conference, 1994.

[5] VERITAS. Volume Manager for Windows with MSCS. VERITAS White Paper, 2002.

[6] Bonnie++. A benchmark suite aimed at performing a number of tests of hard drive and file system performance. 2001.

[7] W. Vogels. File System Usage in Windows NT 4.0. In Proceedings of the 17th ACM Symposium on Operating Systems Principles, pages 93-109, Dec. 1999.

[8] J. Katcher. PostMark: A New Filesystem Benchmark. Technical Report TR3022, Network Appliance, 1997.

[9] M. McKusick, W. Joy, S. Leffler, and R. Fabry. A Fast File System for UNIX. ACM Transactions on Computer Systems, Aug. 1984.

[10] G. Ganger and Y. Patt. Metadata Update Performance in File Systems. Proc. of OSDI, 1994.

[11] R. Wang, T. Anderson, and D. Patterson. Virtual Log Based File Systems for a Programmable Disk. Proc. of OSDI, Feb. 1999.

[12] M. Krueger, R. Haagens, C. Sapuntzakis, and M. Bakke. RFC 3347: Small Computer Systems Interface Protocol over the Internet (iSCSI) Requirements and Design Considerations, 2002.

[13] D. A. Patterson, G. Gibson, and R. H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In Proceedings of ACM SIGMOD, 1988.
