Design and implementation of a self-securing storage device

Self-securing storage prevents intruders from undetectably tampering with or permanently deleting stored data. To accomplish this, self-securing storage devices internally audit all requests and keep all versions of all data for a window of time, regardless of the commands received from potentially compromised host operating systems. Within the window, system administrators are guaranteed to have this valuable information for intrusion diagnosis and recovery. The S4 implementation combines log-structuring with novel metadata journaling and data replication techniques to minimize the performance costs of comprehensive versioning. Experiments show that self-securing storage devices can deliver performance that is comparable with conventional storage. Further, analyses indicate that several weeks' worth of all versions can reasonably be kept on state-of-the-art disks, especially when differencing and compression technologies are employed.

The authors would like to thank the member companies of the Parallel Data Consortium (CLARiiON Array Development, EMC Corporation, Hewlett-Packard Labs, Hitachi, Infineon Technologies, Intel Corporation, LSI Logic, MTI Technology Corporation, Novell, Inc., PANASAS, L.L.C., Procom Technology, Quantum Corporation, Seagate Technology, Sun Microsystems, Veritas Software Corporation, and 3Com Corporation). The authors also thank IBM Corporation and CMU's Data Storage Systems Center for their support of this project.


Introduction
Despite the best efforts of system designers and implementors, it has proven difficult to prevent computer security breaches. This fact is of growing importance as organizations find themselves increasingly dependent on wide-area networking (providing more potential sources of intrusions) and computer-maintained information (raising the significance of potential damage). A successful intruder can obtain the rights and identity of a legitimate user or administrator. With these rights, it is possible to disrupt the system by accessing, modifying, or destroying critical data.
Even after an intrusion has been detected and terminated, system administrators face two difficult tasks: determining the damage caused by the intrusion and restoring the system to a safe state. Damage includes compromised secrets, creation of back doors and Trojan horses, and tainting of stored data. Detecting each of these is made difficult by crafty intruders who understand how to scrub audit logs and disrupt automated tamper detection systems. System restoration involves identifying a clean back-up (i.e., one created prior to the intrusion), reinitializing the system, and restoring information from the back-up. Such restoration often requires a significant amount of time, reduces the availability of the original system, and frequently causes loss of data created between the safe back-up and the intrusion.
Self-securing storage offers a partial solution to these problems by preventing intruders from undetectably tampering with or permanently deleting stored data. Since intruders can take the identity of real users and even the host OS, any resource controlled by the operating system is vulnerable, including the raw storage. Rather than acting as slaves to host operating systems, self-securing storage devices view them, and their users, as questionable entities for which they work. These self-contained, self-controlling devices internally version all data and audit all requests for a guaranteed amount of time (e.g., a week), thus providing system administrators time to detect and recover from intrusions. The critical difference between self-securing storage and host-controlled versioning (e.g., Elephant [29]) is that intruders can no longer bypass the versioning software by compromising a complex OS or its poorly-protected user accounts. Instead, intruders must compromise single-purpose devices that export only a simple storage interface, and in some configurations, they may have to compromise both.

This paper describes self-securing storage and our implementation of a self-securing storage system, called S4. A number of challenges arise when storage devices distrust their clients. Most importantly, it may be difficult to keep all versions of all data for an extended period of time, and it is not acceptable to trust the client to specify what is important to keep. Fortunately, disk capacities increase faster than most computer characteristics (100%+ per annum in recent years). Analysis of recent workload studies suggests that it is possible to version all data on modern 30-100GB drives for far longer than a week [29,35]. Further, aggressive compression and cross-version differencing techniques extend the intrusion detection window offered by self-securing storage devices. Other challenges include maintaining on-disk locality when blocks cannot be overwritten, achieving secure administrative control, and dealing with denial-of-service attacks.
The S4 system addresses these challenges with a new storage management structure. The storage management system uses a log-structured object system for data versions, a novel journal-based structure for metadata versions, and an opportunistic on-disk anti-entropy cache for restoring sequentiality to version-scrambled objects. In addition to reducing space utilization, the metadata journaling simplifies background compaction and reorganization for blocks shared across many versions. Experiments with S4 show that the security and data survivability benefits of self-securing storage can be realized with reasonable performance. Specifically, the performance of network-attached S4/NFS is comparable to FreeBSD's NFS for both microbenchmarks and application benchmarks.
The remainder of this paper is organized as follows. Section 2 discusses intrusion survival and recovery difficulties in greater detail. Section 3 describes how self-securing storage addresses these issues, presents some challenges inherent to self-securing storage, and discusses design solutions for addressing them. Section 4 describes the implementation of S4. Section 5 evaluates the S4 implementation. Section 6 discusses a number of open issues related to self-securing storage. Section 7 discusses related work. Section 8 summarizes this paper's contributions.


2 Intrusion diagnosis and recovery

Upon gaining access to a system, an intruder has several avenues of mischief. Most intruders attempt to destroy evidence of their presence by erasing or modifying system log files. Many intruders also install back doors in the system, allowing them to gain entry at will in the future. They may also install other software, read and modify sensitive files, or use the system as a platform to launch additional attacks. Depending on the skill with which the intruder hides his presence, there will be some detection latency before the intrusion is discovered by an automated intrusion detection system (IDS) or by a suspicious user or administrator.
During this time, the intruder can continue his malicious activities while users continue to use the system, thus entangling legitimate changes with those of the intruder. Once an intrusion has been detected and discontinued, the system administrator is left with two difficult tasks: diagnosis and recovery.
Diagnosis is challenging because intruders can usually compromise the "superuser" account on most operating systems, giving them full control over all resources. In particular, this gives them the ability to manipulate everything stored on the system's disks, including audit logs, file modification times, and tamper detection utilities. Recovery is difficult both because diagnosis is difficult and because user convenience is an important concern. This section discusses intrusion diagnosis and recovery in greater detail, and the next section describes how self-securing storage addresses these issues.

Diagnosis
Intrusion diagnosis consists of three phases: detecting the intrusion, discovering the weaknesses that were exploited (for future prevention), and determining what the intruder has done. All are difficult when the intruder has free rein over storage and the OS.
Without the ability to protect storage from compromised operating systems, intrusion detection may be limited to attentive users and system administrators noticing odd behavior.
Examining the system logs is the most common approach to intrusion detection [6], but when intruders can manipulate the log files, such an approach is not useful. Some intrusion detection systems also look for changes to important system files [17]. These systems are vulnerable to intruders who can change what the IDS thinks is a "safe" copy.
Determining how the intruder compromised the system is often impossible in conventional systems, because he will scrub the system logs. In addition, any tools that may have been stored on the target machine for use in multi-stage intrusions may have been deleted.
The common "solutions" are to try to catch the intruder in the act or to hope that he forgot to delete his exploit tools.
The last step of diagnosing an intrusion is to discover what was accessed and modified by the intruder. This is extremely difficult, because file access and modification times can be changed, and system log files can be doctored. In addition, checksum databases are of limited use, since they are effective only for static files, thus providing no protection for user data.

Recovery
Because it is usually not possible to diagnose an intruder's activities, full system recovery generally requires that the compromised machine be wiped clean and reinstalled from scratch.
Prior to erasing the entire state of the system, users may insist that critical data be saved.
Critical data is any data that has changed since the last backup and requires significant effort to recreate. The more effort that went into creating the changes, the more motivation there is to keep this data. Unfortunately, as the size and complexity of the data grows, the likelihood that tampering will go unnoticed increases. Foolproof assessment of the data is very difficult, and overlooked modifications may hide tainted information or a back door inserted by the intruder.
Upon restoring the OS and any applications on the system, the administrator must identify a backup that was made prior to the intrusion; the most recent backup may not be usable. After restoring data from a verified backup, the critical data can be restored to the system, and users may resume using the system. This process often takes a considerable amount of time, during which users are denied service.
3 Self-Securing Storage

Self-securing storage ensures information survival and auditing of all accesses by establishing a security perimeter around the storage device. Conventional storage devices are slaves to the host operating system, relying on it for protection of the users' data. A self-securing storage device operates as an independent entity, tasked with the responsibility to not only store data, but to protect it as well. This shift of storage security functionality into the storage device's firmware allows data and audit information to be safeguarded in the presence of file server and client system intrusions. Even if the OSes of these systems are compromised and an intruder is able to issue commands directly to the self-securing storage device, the new security perimeter remains intact.
Behind the security perimeter, the storage device ensures data survival by keeping all versions of data. This history pool of old data versions, combined with the audit log of accesses, can be used to diagnose and recover from intrusions. This section discusses the benefits of self-securing storage and several core design issues that arise in realizing it.

Enabling intrusion survival
Self-securing storage assists in intrusion recovery by allowing the administrator to view audit information and quickly restore modified or deleted files. The audit logs of data accesses help to diagnose intrusions and detect the propagation of any maliciously modified data.
Self-securing storage maintains old versions of data objects. This simplifies diagnosis of an intrusion, since system logs and programs cannot be imperceptibly altered. Because of this, self-securing storage makes conventional tamper detection systems obsolete. In addition, since the drive maintains these old versions, they can quickly be restored to their pre-intrusion state.
Since the administrator has the complete picture of the system's state, from intrusion until discovery, it is considerably easier to establish the method used to gain entry. For instance, the system logs would normally have been erased, but by examining the versioned copies of the logs, the administrator can see any messages that were generated during the intrusion and later removed. In addition, any exploit tools temporarily stored on the system may be recovered. Previous versions of system files, from before the intrusion, can be quickly and easily restored by resurrecting them from the history pool. This avoids the need for a complete re-installation of the operating system, and it does not rely on having a recent, off-line backup or up-to-date checksums (for tamper detection) of system files. Additionally, by utilizing the storage device's audit log, it is possible to assess which data might have been directly affected by the intruder. Further, there is no need to pre-back-up "critical files" before the restore, since files can be selectively pulled forward by the user and/or backed up after restoration.
The data protection provided by self-securing storage allows easy detection of modifications, selective recovery of tampered files, prevention of data loss due to out-of-date backups, and speedy recovery, since data need not be loaded from an off-line archive.

Device security perimeter
The device's security model is what makes the ability to keep old versions more than just a user convenience. The security perimeter consists of self-contained software that supports only a simple storage interface to the outside world and verifies each command's integrity before processing it. In contrast, most file servers and client machines run a multitude of services that are susceptible to attack. Since the self-securing storage device is a single-function embedded device, the task of making it secure is much easier; compromising its firmware is analogous to breaking into an IDE or SCSI disk.
For network-attached devices (as compared to devices attached directly to a single host system), the internally managed audit log becomes more useful if the device can verify each request as coming from both a valid user and a valid client. This can allow the device to enforce access control decisions and partially track propagation of tainted data. If clients must be authenticated, requests can be tracked to a single client machine, and the device's audit log can yield the scope of direct damage from the intrusion of a given machine. By making sure any given request is bound to a {client, user} pair, a self-securing storage device can assure the following:

• For an uncompromised client, requests are bound to the correct user's credentials and not those of another user on that machine. Any client not exhibiting this behavior would be considered compromised.

• For a compromised client, accesses are bound to the correct machine's credentials, but user information may or may not be correct.
Network-attached storage must also deal with privacy and authenticity of network traffic [7,9]. One solution would be the use of a network-level mechanism like IPSec [16], for which hardware support is expected to minimize the performance consequences.
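To make the {client, user} binding concrete, the sketch below shows one way a device might verify both credentials on each request before audit-logging it. The two-MAC layout, the key-lookup helpers, and all other names are illustrative assumptions, not the actual S4 wire format.

    /* Hypothetical sketch: binding a request to a {client, user} pair.
     * compute_mac() stands in for any keyed MAC (e.g., an HMAC); the
     * request layout is an assumption for illustration only. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define MAC_LEN 16

    struct s4_request {
        uint32_t client_id;            /* issuing machine */
        uint32_t user_id;              /* credentialed user */
        uint8_t  client_mac[MAC_LEN];  /* MAC under the client's key */
        uint8_t  user_mac[MAC_LEN];    /* MAC under the user's key */
        /* ... opcode, object identifier, payload ... */
    };

    extern void compute_mac(const void *key, const void *msg, size_t len,
                            uint8_t out[MAC_LEN]);
    extern const void *lookup_client_key(uint32_t client_id);
    extern const void *lookup_user_key(uint32_t user_id);

    int verify_request(const struct s4_request *req,
                       const void *body, size_t body_len)
    {
        uint8_t mac[MAC_LEN];

        compute_mac(lookup_client_key(req->client_id), body, body_len, mac);
        if (memcmp(mac, req->client_mac, MAC_LEN) != 0)
            return 0;   /* not a valid client: refuse and log */

        compute_mac(lookup_user_key(req->user_id), body, body_len, mac);
        if (memcmp(mac, req->user_mac, MAC_LEN) != 0)
            return 0;   /* client is valid but user credential is not */

        return 1;       /* audit-log the request against {client, user} */
    }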

History pool management
The old versions of objects kept on the drive comprise the history pool. Every time an object is modified or deleted, the version that existed just prior to the modification becomes part of the history pool. Eventually the previous version will age and have its space reclaimed by the drive. Because clients cannot be trusted to demarcate versions consisting of multiple modifications, a separate version must be kept for every modification. This is in contrast to versioning file systems, which generally create new versions only when a file is closed.
A self-securing storage device guarantees a lower bound on the amount of time that a deprecated object remains in the history pool before it is reclaimed. During this window of time, the old version of the object can be completely restored by requesting that the drive copy forward the old version, thus making a new version. The window of time during which an object can be restored is called the detection window. When determining the size of this window, the administrator must examine the tradeoff between the detection latency provided by a large window and the extra disk space that is consumed by the proportionally larger history pool.
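A minimal sketch of the retention rule this guarantee implies: a superseded version may be reclaimed only once its age exceeds the detection window, regardless of any commands the host issues. The field and function names are assumptions for illustration.

    /* Sketch: lower-bound retention check for the history pool. */
    #include <time.h>
    #include <stdbool.h>

    struct version_record {
        time_t superseded_at;   /* when a newer version replaced this one */
    };

    /* window_secs is set by the administrator, e.g.,
     * 7 * 24 * 60 * 60 for a one-week detection window. */
    bool may_reclaim(const struct version_record *v,
                     time_t now, time_t window_secs)
    {
        /* Versions inside the window are never reclaimed, no matter
         * what a (possibly compromised) host OS requests. */
        return (now - v->superseded_at) > window_secs;
    }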
While the capacity of disk drives is growing at an incredible rate, it is still finite, which presents two problems:

1. Providing a reasonable detection window in exceptionally busy systems.

2. Dealing with malicious users who attempt to fill the history pool. (Note that space exhaustion attacks are not unique to self-securing storage. However, device-managed versioning makes per-user quotas ineffective for limiting them.)

In a busy system, the amount of data written could make providing a reasonable detection window difficult. Fortunately, the analysis in Section 5.2 suggests that multi-week detection windows can be provided in many environments at a reasonable cost. Further, aggressive compression and differencing of old versions can significantly extend the detection window.
Deliberate attempts to overflow the history pool cannot be prevented by simply increasing the space available, and as with most denial-of-service attacks, there is no perfect solution. There are three flawed approaches to addressing this type of abuse. The first is to have the device reclaim the space held by the oldest objects when the history pool is full. Unfortunately, this would allow an intruder to destroy information by causing its previous instances to be reclaimed from the overflowing history pool. The second flawed approach is to stop versioning objects when the history pool fills; while this would allow recovery of the old data, system administrators would no longer be able to diagnose the actions of an intruder or differentiate them from subsequent legitimate changes. The third approach is for the drive to deny any action that would require additional versions once the history pool fills; this would result in denial of service to all users (legitimate or not).
Our hybrid approach to this problem is to try to prevent the history pool from being filled by detecting probable abuses and throttling the source machine's accesses. When successful, this allows human intervention before the system is forced to choose from the above poor alternatives. Selectively increasing latency and/or decreasing bandwidth allows well-behaved users to continue to utilize the system even while it is under attack. Experience will show how well this works in practice.
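The sketch below illustrates the throttling idea under stated assumptions: the drive tracks each client machine's recent history-pool consumption and adds latency once it crosses a threshold. The accounting scheme, threshold, and penalty are all invented for illustration; the actual policy is deliberately left to experience.

    /* Sketch: throttle a client whose writes are filling the history
     * pool. Threshold and penalty values are illustrative guesses. */
    #include <stdint.h>
    #include <unistd.h>

    struct client_stats {
        uint64_t pool_bytes_today;    /* history-pool bytes consumed */
    };

    #define ABUSE_THRESHOLD (1ULL << 30)   /* e.g., 1GB per day */
    #define PENALTY_USEC    10000          /* added latency per request */

    void maybe_throttle(struct client_stats *c)
    {
        if (c->pool_bytes_today > ABUSE_THRESHOLD) {
            /* Slow the suspected abuser so an administrator can
             * intervene; well-behaved clients are unaffected. */
            usleep(PENALTY_USEC);
        }
    }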
Since the history pool will be used for intrusion diagnosis and recovery, not just recovery from accidental destruction of data, it is difficult to construct an algorithm that would save space in the history pool by pruning versions within the detection window. Almost any algorithm that could be constructed to selectively remove versions has the potential to be abused by an intruder to cover his tracks and to successfully destroy or modify information during a break-in.

Interface to history information
The history pool contains a wealth of information about the system's recent activity. This makes accessing the history pool a sensitive operation, since it allows the resurrection of deleted and overwritten objects. This is a standard problem posed by versioning file systems, and it is exacerbated by the inability to selectively delete versions.
There are two basic approaches that can be taken toward access control for the history pool. The first is to allow only a single administrative entity to have the power to view and restore items from the history pool. This could be useful in situations where the old data is considered to be highly sensitive. Having a single tightly-controlled key for accessing historical data decreases the likelihood of an intruder gaining access to it. While this improves security, it prevents users from being able to recover from their own mistakes, thus consuming the administrator's time to restore users' files. The second approach is to allow users to recover their own old objects (in addition to the administrator). This provides the convenience of a user being able to recover their deleted data easily, but it also allows an intruder who obtains valid credentials for a given user to recover that user's old file versions. It is important to note that permitting full deletion of objects would be perilous to the integrity of the data, since such a mechanism could be used by intruders to destroy information.
Our compromise is to allow users to selectively decide, on a file-by-file basis, what remains visible. By choice, a user could thus remove an object, a version, or all versions from visibility by anyone other than the administrator. Complete removal should not be permitted, since permanent deletion of data by any method other than aging would be unsafe. This would allow users to enjoy the benefits of versioning for presentations and source code, while preventing access to visible versions of embarrassing images or unsent e-mail drafts.

Version-administration tools
Since self-securing storage devices store versions of raw data, users and administrators will need assistance in parsing the history pool. Tools for traversing the history must assist by bridging the gap between standard file interfaces and the raw object versions that are stored on the device. By being aware of both the versioning system and the formats of the data objects, utilities can present interfaces similar to that of Elephant [29], with "time-enhanced" versions of standard utilities such as ls and cp.
In addition to allowing a simple view of data objects in isolation, intrusion diagnosis tools can utilize the audit log to provide an estimate of damage. For instance, it is possible to see all files and directories that a client modified during the period of time that it was compromised. Further estimates of the propagation of data written by compromised clients are also possible, though imperfect. For example, diagnosis tools may be able to establish a link between objects based on the fact that one was read just before another is written.
Such a link between a *.c source file and its corresponding *.o would be useful if a user determines that a source file had been tampered with; in this situation, the object file should also be restored or removed. Exploration of such tools will be an important area of future work.
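As a sketch of how such a tool might work, the routine below scans a time-ordered audit log and flags writes that closely follow reads by the same client, yielding candidate propagation links like the *.c/*.o example above. The record layout and the 5-second window are assumptions, not part of S4's actual log format.

    /* Sketch: estimate propagation by linking a read that occurs
     * shortly before a write from the same client. */
    #include <stdint.h>
    #include <stdio.h>

    struct audit_record {
        uint64_t timestamp;    /* seconds; log assumed time-ordered */
        uint32_t client_id;
        uint32_t object_id;
        char     op;           /* 'R' for read, 'W' for write */
    };

    #define LINK_WINDOW 5      /* seconds between read and write */

    void report_links(const struct audit_record *log, int n)
    {
        for (int w = 0; w < n; w++) {
            if (log[w].op != 'W')
                continue;
            for (int r = w - 1; r >= 0; r--) {
                if (log[w].timestamp - log[r].timestamp > LINK_WINDOW)
                    break;   /* older records are outside the window */
                if (log[r].op == 'R' && log[r].client_id == log[w].client_id)
                    printf("object %u may derive from object %u\n",
                           log[w].object_id, log[r].object_id);
            }
        }
    }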
4 Implementation of the S4 device

S4 is a self-securing storage device which maintains an efficient object-versioning system transparently for its clients. It aims to perform comparably to current systems, while providing the benefits of self-securing storage and minimizing the corresponding space explosion.

4.1 Object store
Considerable research has gone into providing a useful abstraction above the standard block-level interface to mass storage devices. This work resulted in several proposals and prototypes for object-based disks [8,25]. It has also shown that the object abstraction simplifies access control by treating an arbitrary collection of bytes as a management unit, as compared to a standard block device. Since the S4 drive is responsible for enforcing and managing its own access control decisions, we have chosen an object-based interface. The S4 object system is layered above two separate subsystems: the infrastructure and the front end (see Figure 1). The infrastructure contains the drive's RPC and disk interface, and exports all of the drive's interfaces to clients through a set of RPCs (see Table 1) layered over the network [24,33]. The front end acts as a communication layer between the RPC and the object store, and enforces security.
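Table 1 itself did not survive extraction, but the operations named later in the paper (Create, Read, Write, Append, SetAttr, GetAttr, and Sync) suggest an interface along the following lines. The signatures below are assumptions sketched for illustration, not the actual S4 RPC definitions.

    /* Sketch of the object interface exported over S4 RPC. Every
     * mutating call creates a new version internally; nothing can
     * overwrite or destroy a version inside the detection window. */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t s4_oid_t;     /* object identifier */
    struct s4_attr;                /* onode attributes (opaque here) */

    int s4_create(s4_oid_t *new_oid, const struct s4_attr *initial);
    int s4_read(s4_oid_t oid, uint64_t off, void *buf, size_t len);
    int s4_write(s4_oid_t oid, uint64_t off, const void *buf, size_t len);
    int s4_append(s4_oid_t oid, const void *buf, size_t len);
    int s4_getattr(s4_oid_t oid, struct s4_attr *out);
    int s4_setattr(s4_oid_t oid, const struct s4_attr *attr);
    int s4_sync(s4_oid_t oid);     /* flush buffered writes to disk */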
To support persistent mount points, an S4 drive uses "named objects."

On-disk data organization
The main goals for the S4 object system are to avoid performance overhead and to minimize wasted space, while keeping all versions of all objects for a given period of time. Achieving these goals required a combination of known and novel techniques for organizing on-disk data.
Since data within the history pool cannot be overwritten, the object system uses a log structure similar to LFS [28]. This structure allows multiple data and metadata updates to be clustered into fewer, larger writes and obviates any need to move previous versions before writing. With additional metadata, the old versions can be retrieved. To do this efficiently, we use a variation of journaling to track object changes. Marking changes with journal entries saves space and simplifies the process of recreating old object versions. A segment cleaner, similar to LFS's cleaner, removes expired versions from the history pool.
Several studies have examined log-structured file systems' performance problems and ways to alleviate them [21,30]. One of the bigger problems is the loss of data locality for frequently changing files. We have created a new data structure called an anti-entropy cache to help solve this problem. An anti-entropy cache opportunistically keeps an additional read-optimized copy of the object on the disk. If the drive is low on space, this copy can be reclaimed to allow for further log growth.

Object structure
An S4 disk object consists of two basic parts, an onode and an ACL table. The onode holds standard inode metadata along with an opaque space for file-system-specific attributes.
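A rough C rendering of this two-part object, with sizes and field names invented for illustration (the text only specifies standard inode metadata plus an opaque attribute area):

    /* Sketch of an S4 object: onode plus ACL table. */
    #include <stdint.h>

    #define S4_OPAQUE_ATTR_BYTES 256    /* file-system-specific space */
    #define S4_ACL_ENTRIES       16

    struct s4_onode {
        uint64_t object_id;
        uint64_t size;
        uint64_t mtime, ctime;                    /* standard metadata */
        uint64_t direct[12];                      /* data block pointers */
        uint64_t indirect, dbl_indirect, tpl_indirect;
        uint8_t  fs_attrs[S4_OPAQUE_ATTR_BYTES];  /* opaque to the drive */
    };

    struct s4_acl_entry {
        uint32_t principal;   /* user or client identity */
        uint32_t rights;      /* permitted operations */
    };

    struct s4_acl_table {
        struct s4_acl_entry entries[S4_ACL_ENTRIES];
    };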

Overall disk layout
The S4 object system divides the disk into fixed-size log segments. LFS used a segment size of 1MB, but based on recent research results [21], we chose a segment size of 64KB. We group these segments into 4GB allocation groups, which contain a summary segment marking each 4GB boundary (see Figure 2). This summary segment contains a copy of the superblock, the free segment bitmap for that allocation group, and indirect blocks used for an object map, similar to LFS's inode map. S4 uses the object map to track the location of the most current copy of an object's metadata within the log, and potentially, its anti-entropy cache. This flexible structure gives us the ability to have a large number of objects in the system without having to pre-allocate space for them.
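The constants and summary-segment contents described above can be sketched as follows; the structure fields follow the text, while the placeholder sizes are assumptions.

    /* Sketch: 64KB segments grouped into 4GB allocation groups, each
     * led by a summary segment. */
    #include <stdint.h>

    #define SEGMENT_SIZE      (64u * 1024)                  /* 64KB */
    #define ALLOC_GROUP_SIZE  (4ull * 1024 * 1024 * 1024)   /* 4GB  */
    #define SEGS_PER_GROUP    (ALLOC_GROUP_SIZE / SEGMENT_SIZE)

    struct superblock_copy { uint8_t raw[512]; };  /* placeholder size */

    struct summary_segment {
        struct superblock_copy sb;                     /* superblock copy */
        uint8_t  free_seg_bitmap[SEGS_PER_GROUP / 8];  /* 1 bit/segment */
        uint64_t objmap_indirect[64];   /* object-map indirect blocks;
                                           count is a guess */
    };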

S4 does all of its log writes to data segments (see Figure 3). Each data segment has one block for summary information and 31 blocks available for journal entries and data. The summary block keeps track of the layout and allocation of space within the segment. Specifically, it has a free count and bitmap to track available space, and a mapping of objects to their metadata within the segment.
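One block of summary plus 31 blocks of journal entries and data implies 2KB blocks within a 64KB segment; a sketch of that layout follows, with invented field names.

    /* Sketch of a data segment and its summary block. */
    #include <stdint.h>

    #define BLOCK_SIZE     2048    /* 64KB segment / 32 blocks */
    #define BLOCKS_PER_SEG 32

    struct segment_summary {
        uint16_t free_count;       /* free blocks remaining */
        uint32_t alloc_bitmap;     /* one bit per block */
        struct {                   /* object -> metadata in this segment */
            uint64_t object_id;
            uint8_t  journal_sector_block;   /* block index in segment */
        } metadata_map[BLOCKS_PER_SEG - 1];
    };

    struct data_segment {
        uint8_t summary[BLOCK_SIZE];   /* holds a segment_summary */
        uint8_t blocks[BLOCKS_PER_SEG - 1][BLOCK_SIZE];
    };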

Journal-based metadata versioning
S4 stores the history of metadata changes in a journal. Because clients are not trusted to notify S4 when objects are closed, every update requires a new version and thus a new onode. Further, when data pointed to by indirect blocks are modified, the indirect blocks must be versioned as well. For triple-indirect blocks, a single-block update could require 4 new blocks and a new onode. Early experiments with such a conventional versioning system showed that writing a large file could cause up to a 4x growth in disk usage. Conventional versioning file systems avoid this performance problem by only creating new versions when a file is closed. S4's journal-based approach significantly reduces these problems. By persistently keeping journal entries of all metadata changes, metadata writes can be safely delayed, since onodes and indirect blocks can be recreated in the event of a failure. To avoid rebuilding an object's current state from the journal during normal operation, an object's onode and ACL table are committed to the log before the object is evicted from the cache. For the same reasons, indirect block writes can also be delayed until this time, since the journal entries contain sufficient data to recreate them.

S4 uses six kinds of journal entries: attribute, ACL, delete, truncate, write, and checkpoint. Each of these entries represents a specific metadata change, with the exception of checkpoint. Checkpointing denotes writing a consistent copy of the onode, ACL table, and indirect blocks of an object into the log. Also, all entries can serve to either undo or redo the specified operation. The undo operation is necessary to allow in-time access of objects, while the redo is helpful for the cleaner, allowing it to efficiently roll changes forward as it deletes expired versions.
Storing an object's changes within the log is done using journal sectors. Each journal sector contains the packed journal entries that refer to a single object's changes made within that data segment (see Figure 4). The sectors are tracked within a data segment using the metadata mapping found in the summary information. Journal sectors are chained together backward in time to allow for version reconstruction.
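A sketch of the six entry kinds and the backward chaining just described; the on-disk layouts are assumptions, with only the entry kinds and the chaining taken from the text.

    /* Sketch: journal entries packed into per-object journal sectors. */
    #include <stdint.h>

    enum journal_op {
        J_ATTR, J_ACL, J_DELETE, J_TRUNCATE, J_WRITE, J_CHECKPOINT
    };

    struct journal_entry {
        uint8_t  op;             /* enum journal_op */
        uint64_t version_time;
        /* Plus enough state to undo (for in-time access to old
         * versions) or redo (for the cleaner rolling forward), e.g.
         * old and new block pointers for J_WRITE. */
    };

    struct journal_sector {
        uint64_t object_id;
        uint64_t prev_sector;    /* older sector for this object, or 0 */
        uint16_t n_entries;
        struct journal_entry entries[8];   /* packed; count varies */
    };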
Journal entries also allow efficient differencing between versions. Since the exact changes between writes are noted within the entry, it is easy to find the blocks that should be compared. Once the differencing is complete, the old blocks can be discarded and the difference left in their place. For subsequent reads of old versions, the data for each block must be recreated as the entries are traversed. Still, cross-version differencing of old data will often be effective in reducing the amount of space used by old versions [2].

Cleaning
Like any log-structured file system, the S4 object system must use a cleaner to recover disk space. Unlike others, S4 can only reclaim sufficiently old versions. The cleaner runs during idle time, when specifically requested by an administrator, or when the system becomes low on disk space.
To reclaim disk space, the cleaner searches through the object map, looking for objects with an oldest time greater than the detection window. Once it finds such an object, the cleaner searches back through the object's journal entries for resources to free.
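The cleaner's outer loop, as described above, can be sketched as follows; the object-map iteration and the helper routines are assumed names, not S4's internal API.

    /* Sketch: reclaim resources from versions older than the window. */
    #include <time.h>
    #include <stddef.h>

    struct object;   /* drive-internal object state */

    extern struct object *objmap_next(struct object *prev);
    extern time_t oldest_version_time(const struct object *o);
    extern void free_expired_journal_entries(struct object *o, time_t cutoff);

    void cleaner_pass(time_t now, time_t detection_window)
    {
        time_t cutoff = now - detection_window;

        for (struct object *o = objmap_next(NULL); o != NULL;
             o = objmap_next(o)) {
            /* Only versions that have aged past the detection window
             * may be reclaimed. */
            if (oldest_version_time(o) < cutoff)
                free_expired_journal_entries(o, cutoff);
        }
    }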
S4/NFS client

When a part of the state of one or more objects must be made persistent, the client sends an additional RPC to the drive to flush buffered writes to the disk. Since this RPC does not return until the synchronization is complete, NFSv2 semantics are supported even though the drive normally buffers writes.

Because the client must overlay a file system on top of the flat object namespace, some operations require several drive operations (and hence several RPC calls) to implement a single NFS operation. These multiple operations are analogous to the multiple operations file systems must perform on block-based devices. To minimize the number of RPC calls necessary, the client aggressively maintains an attribute cache and a directory cache. The drive also supports combining the SetAttr, GetAttr, and Sync operations with the Create, Read, Write, and Append operations.

Evaluation
This section evaluates the feasibility of self-securing storage, finding that it is possible for storage devices to assist with intrusion survival. Experiments with S4 indicate that device-controlled versioning can be done without significant performance reduction. Also, estimates of capacity growth based on reported workload characterizations indicate that history windows of a week or more can easily be supported in several real environments. It is important to note that although the prototype has been implemented on a Linux system, the drive should be viewed as a single-purpose embedded device.

Performance
Our main performance goal for S4 is to be comparable to other networked file systems. To explore this, we ran a number of micro- and macro-benchmarks against an S4 drive mounted through the client NFS module. We compared these results to the FreeBSD 4.0 NFSv2 server. (Since the S4 drive runs on Linux, we would have preferred to compare S4 against the Linux NFS server. However, Linux NFS does not comply with the NFSv2 semantics of committing data to stable storage before operation completion.)

Experimental setup
All experiments were run on three configurations: (1) an S4 drive running on RedHat 6.2 Linux communicating with a Linux client over S4 RPC through the S4 client module mounted via loopback (see Figure 6), (2) a BSD server communicating with a Linux client over UDP NFS, and (3) a BSD server communicating with a BSD client over TCP NFS (the BSD client is used because performance is much worse for a Linux client). In all cases, NFS was configured to use 4KB read/write transfer sizes, which is the only option supported by Linux. Read-ahead was disabled for TCP NFS and is not currently supported by the S4 client. Both BSD NFS configurations export a BSD FFS file system. All experiments were run a minimum of 5 times and have a standard deviation of less than 3% of the mean (unless reported otherwise). The S4 drives were configured with a 128MB buffer cache and an object cache capable of storing 8192 objects. These numbers were chosen given that the NFS servers' caches could grow to fill local memory (512MB). S4 was also configured with the anti-entropy cache and cleaner disabled.
In all experiments, the client system has a 550 MHz Pentium III, an Ultra2 SCSI controller, and an Intel EtherExpress Pro100 100Mb network adapter. The client and server are on the same subnet and are connected by a 100Mb network switch. All versions of Linux used an unmodified 2.2.14 kernel, and all BSD systems used a stock FreeBSD 4.0 installation.

Micro-benchmark results
To measure the performance of reads, writes, creates, and deletes, we ran benchmarks similar to those used by Rosenblum and Ousterhout to evaluate LFS [28]. In addition to creates, deletes, reads, and writes, these benchmarks also heavily exercise attribute and access control list operations, because NFS relies upon them heavily. The first benchmark measures small-file performance; it consists of three phases: creation of 10,000 1KB files (split across 10 directories), reads of the newly created files in creation order, and deletion of the files in creation order. The second benchmark measures large-file performance; it consists of four phases: sequential write, sequential read, random write, and random read. This benchmark either reads or writes a 100MB file in 4KB blocks. For each of these benchmarks, the cache was flushed between each phase.
Figure 7 shows the results of the small file benchmark. Overall, S4/NFS performance is similar to BSD/NFS. S4 performs slightly better on creates and deletes, due to a decreased number of disk I/Os. Since S4 is log-structured, only one disk write is needed to create the file and add the directory entry; FFS needs at least two. (Recall that delayed writes cannot be used without NVRAM, because NFS semantics require immediate persistence.) Reads are slower than the BSD UDP and TCP configurations, due to an artifact of S4 client directory management.
Figure 8 shows the results of the large file benchmark. Again, S4/NFS performance is similar to BSD/NFS. S4 performs somewhat better in all cases except for the sequential read case. Random and sequential writes are better due to the log structure of the drive.

Macro-benchmark results
To evaluate performance for more realistic workloads, we present results from two macro-benchmarks: the Postmark benchmark [14] and the SSH-build benchmark [37]. These benchmarks crudely represent Internet server and software development workloads.
Postmark was designed to measure the performance of a file system used for electronic mail, netnews, and web-based services. It does this by creating a large number of small, randomly-sized files (between 512B and 9KB for our tests), on which a specified number of transactions are performed. Each transaction consists of two sub-transactions, with one being a create or delete and the other being a read or append. The transaction types are chosen randomly, with consideration given to user-definable weights. Our configuration consists of 20,000 transactions on 5,000 files, with the biases for transaction types set equal.

The results of the Postmark benchmark are shown in Figure 9. Each bar shows the total running time in seconds, split between initial file creation time (creating the initial 5,000 files) and the time to complete all transactions. S4/NFS outperforms the other configurations because of its superior performance for non-sequential small-file operations.
The SSH-build benchmark was constructed as a replacement for the Andrew file system benchmark [12]. It consists of three phases. The unpack phase unpacks the compressed tar archive of SSH v1.2.27 (approximately 1MB in size before decompression); this phase stresses metadata operations on files of varying sizes. The configure phase consists of the automatic generation of header files and Makefiles, which involves building various small programs that check the existing system configuration. The build phase compiles, links, and removes temporary files; this last phase is the most CPU-intensive, but it also generates a large number of object files and a few executables.
The times of SSH-build's three phases are shown in Figure 10. Performance is similar across the three configurations. S4 outperforms the BSD configurations in all phases due to the relative performance advantages of creates, deletes, and writes.

Capacity requirements
To evaluate the capacity required to maintain a week-long history pool, we examine data from three recent workload studies (see Table 2). The AFS trace study [32] reports 143MB per day of write traffic. Even if we pessimistically increase this number to 1GB per day, as was observed in Vogels' Windows NT file usage study [35], it is not unreasonable to believe that seven days' worth of data could be kept on a modern 50GB disk. The Elephant paper [29] presents a data rate of 110MB written per day; again, over a week of changes could easily be kept on a modern disk.
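The arithmetic behind these estimates is simple; using the 10GB history allocation assumed in Table 2, a short calculation reproduces the expected windows (no differencing or compression):

    /* Worked example: detection window in days = pool size / daily
     * write rate, for the three reported workloads. */
    #include <stdio.h>

    int main(void)
    {
        const double pool_mb = 10.0 * 1024;   /* 10GB of a 50GB disk */
        const double rate_mb_day[] = { 143, 1024, 110 };
        const char  *source[] = { "AFS study [32]",
                                  "NT study [35] (pessimistic)",
                                  "Elephant paper [29]" };

        for (int i = 0; i < 3; i++)
            printf("%s: %.0f days\n", source[i],
                   pool_mb / rate_mb_day[i]);
        /* Prints roughly 72, 10, and 93 days, respectively. */
        return 0;
    }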

Much work has been done in evaluating the efficiency of differencing and compression [2,3,4]. To briefly explore the potential benefits for S4, we retrieved the code base for S4 from the CVS repository at a single point each day for a week. After compiling the code, both differencing and combined differencing + compression were applied between each tree and its direct neighbor in time using xdelta [19,20]. The total sizes of all seven trees, the object files of all the trees, and the source files of all the trees were then compared to their respective differenced sizes. For all versions of all files, 250MB of storage is needed. This number drops to 80MB with differencing and 50MB with differencing + compression.

Discussion
This section discusses several important implications of self-securing storage.
Selective versioning: There are data that users would prefer not to have backed up at all. The common approach to this is to store them in directories known to be skipped by the backup system. Since one of the goals of S4 is to allow recovery of exploit tools, it does not support designating objects as non-versioned. A system may be configured with non-S4 partitions to support selective versioning. While this would provide a way to prevent versioning of temporary files and other non-critical data, it would also create a location where an intruder could store exploit tools without fear that they will be recovered.
Versioning file systems vs. self-securing storage: Versioning file systems excel at providing users with a safety net for recovery from accidents. They can maintain old file versions long after they would be reclaimed by the S4 system, but provide little additional system security. This is because they rely on the host's OS for security and aggressively prune apparently insignificant versions. By combining self-securing storage with a versioning file system, recovery from users' accidents could be well-supported while also maintaining the benefits of intrusion survival.

Client-side cache effects: In order to improve efficiency, most client systems use caches to minimize storage latencies. This is at odds with the desire to have the device audit users' accesses and capture exploit tools. Client-side read caches hide data dependency information that would otherwise be available to the drive in the form of reads followed quickly by writes. However, this information could be provided by client systems as (questionable) hints during writes. Write caches cause a more serious problem when files are created and then quickly deleted, and thus never sent to the drive. This could make capturing exploit tools difficult, since they may never be written to the drive. While client cache effects may obscure some of the activity in the system, users' data that is stored on the device is still completely protected.
Object-based vs. block-based storage: Attempting to implement a self-securing storage device with a block-based interface presents several problems. Since objects are designed to contain one data item (file or directory), enforcing access control at this level is more manageable than attempting to properly assign permissions on a per-block basis. In addition, maintaining versions of objects as a whole, rather than having to collect and correlate individual blocks, simplifies recovery tools and internal reorganization mechanisms, like the anti-entropy cache. Still, although some of S4's benefits would be lost, we see no roadblock to self-securing block-based storage.
Multi-device coordination: Multi-device coordination is necessary for operations such as striping data or implementing RAID on multiple disks. In addition to the normal coordination that is necessary to make sure that the multiple copies of data are synchronized, recovery operations must also coordinate old versions to make sure that the objects are consistent when they are recovered from the history pool. On the other hand, clusters of self-securing storage devices could maintain a single history pool and balance the load of versioning objects.

Related Work
Self-securing storage and S4 build on many ideas from previous work. Perhaps the clearest example is versioning: many versioned file systems have helped their users to recover from mistakes [22,10]. Santry et al. provide a good discussion of techniques for traversing versions and deciding what to retain [29]. S4's history pool corresponds to Elephant's "keep all" policy (during its time window), and it uses Elephant's time-based access. The largest advantage of S4 over previous versioning systems is that it has been partitioned from the operating system. While this creates another level of indirection, it adds to the survivability of the storage.
S4's device-embedded storage management is another instance of many recent "smart disk" systems [1,7,15,27,36]. All of these exploit the increasing computation power of such devices. Some also put these devices on networks and exploit an object-based interface.
There is now an ANSI X3T10 (SCSI) working group looking to create a new standard for object-based storage devices. The S4 interface is similar to these.
The standard method of intrusion recovery is to keep a periodic backup of files on trusted storage. Several file systems simplify and extend this process by allowing a snapshot of the file system to be taken.


Figure 2: Disk layout overview. Shows the allocation group summary segment and the object map.

Figure 5: Anti-entropy segment. Holds the anti-entropy caches for various objects.

Table 1: S4 Remote Procedure Call List. Note that all modifications create new versions without affecting the previous version.

Table 2: Space usage survey. This table shows the expected detection window that could be provided by utilizing 10GB of a modern disk drive, assuming no differencing or compression. This conservative history pool would consume only 20% of a 50GB disk's total capacity.