BiTDB: Constructing A Built-in TEE Secure Database for Embedded Systems

In this paper, we propose BiTDB, a built-in Trusted Execution Environment (TEE) database for embedded systems that achieves higher system availability while ensuring data confidentiality. BiTDB significantly reduces or eliminates three dilemmas faced by state-of-the-art research on secure embedded databases: (i) the complicated design and realization of searchable encryption algorithms (SEA), (ii) limited support for the full range of database operations, and (iii) the near absence of specific design and optimizations for built-in TEE embedded databases. With BiTDB, all database operations process plaintext inside the TEE instead of relying on complicated SEAs to retrieve ciphertext. To enable BiTDB to handle database files in the Rich Execution Environment (REE) as local files, we extend the TEE OS with generic file I/O libraries. We then contribute three critical optimizations that significantly reduce redundant memory and file operations between TEE and REE, so that BiTDB achieves better system performance and availability in embedded systems. Finally, we have implemented a prototype system based on OP-TEE and SQLite on several typical platforms, including virtualized and hardware environments. TPC-H tests show that BiTDB achieves, on average, 85% of the original database performance while guaranteeing data confidentiality and integrity.

embedded system application. In other words, an application with an embedded database can handle data directly via a Data Manipulation Language (DML) without extra database services. An embedded database usually stores all of its data and metadata in one file (or a few files) to facilitate data management and cross-platform migration. For example, SQLite is a popular single-file database distributed as an amalgamation of only two files, sqlite3.h and sqlite3.c, which can be conveniently compiled into user applications to perform rich database operations such as CREATE, INSERT, SELECT, UPDATE, and DELETE.
However, for availability and efficiency reasons, all data in embedded databases is processed and stored in plaintext, which can easily be leaked to malicious users or malware that have compromised the system. This causes privacy leaks if the database contains sensitive data. For example, a smartphone collects personal data every day, such as step counts, places visited, home and business locations, phone usage habits, call history, message history, contact lists, and payment history. Once attackers have compromised the system, they can read the private data from the database file or inspect memory to obtain the data being processed.
To address database security issues in embedded systems, much of the related research concentrates on ciphertext retrieval and storage to achieve data confidentiality and integrity. As representatives of secure databases, CryptDB [1] and similar systems [2] implement ciphertext-oriented data storage and operations via an encryption proxy that uses various searchable encryption algorithms (SEA), such as Deterministic Encryption (DET) [3] and Order-Preserving Encryption (OPE) [4] (see Section VII for a detailed discussion of typical secure database schemes). However, developing a variety of SEAs to cover increasingly complicated application requirements and database operations is an endless and almost impossible task. Given limited computing resources, deploying many SEAs for databases on embedded systems also significantly increases the computational burden, which can violate real-time requirements and even make the system unavailable. Moreover, some recent work adopts a Trusted Execution Environment (TEE) to protect critical database components [5], [6], [7], [8], [9], such as the encryption proxy, from system vulnerabilities and attackers. Unfortunately, because they lack specific design and optimizations, these TEE-assisted schemes offer neither high performance nor good usability in embedded systems.
Our Contributions: To satisfy increasingly complicated application scenarios and ensure the security of data operations in embedded systems, we propose BiTDB (Built-in TEE Database), a secure embedded database. In BiTDB, we thoroughly explore the potential of the built-in TEE database scheme. Considering the real-time and availability requirements of embedded systems, we propose key design decisions and optimizations that enable the migrated database to avoid redundant world switches. Furthermore, because data is processed in plaintext inside the TEE, BiTDB covers all database operations without complicated SEAs. Finally, we have implemented a prototype system by combining OP-TEE (the most prevalent open-source TEE project) with SQLite (one of the most popular lightweight, cross-platform databases) to demonstrate the feasibility, efficiency, and availability of our design. The main contributions of this paper are as follows:
• We propose a built-in TEE database scheme for embedded systems in which every trusted application (TA) within the TEE can manage data by integrating database libraries into its code rather than relying on an extra database system. Thus, within the isolated environment provided by the TEE, data can be securely manipulated in plaintext, and all database operations can be covered without complicated SEAs.
• To enable the in-TEE database to handle database files in the Rich Execution Environment (REE) directly, we extend the TEE operating system (OS) with generic file I/O interfaces so that file system calls within database libraries can manipulate external files as local ones.
• To achieve better performance and availability for the in-TEE database, we propose three primary optimizations for cross-REE-and-TEE interactions: One Allocation Multiple Use (OAMU), Combination of Related File Operations (CRFO), and In-Shared-Memory Page Cache (ISMPC). These optimizations mitigate redundant memory operations and largely compensate for the performance gap between external storage and TEE memory.
• Based on OP-TEE and SQLite, we have implemented our prototype system on four typical embedded platforms, including hardware and virtualization environments, to demonstrate the generality, feasibility, and availability of BiTDB.
This paper is organized as follows. Section II gives a brief introduction to embedded databases and TEE. Section III clarifies the threat model and security assumptions. Section IV details the design of BiTDB. Section V describes the essentials of implementing the key components and functions of BiTDB. Section VI first introduces the experimental setup, then presents and discusses the system performance evaluation, followed by a security analysis of our scheme. Section VII discusses typical types of secure database schemes and related research. Finally, Section VIII concludes the paper.

II. PRELIMINARY KNOWLEDGE
1) Embedded DBMS. An Embedded Database Management System (embedded DBMS), hereafter called an embedded database, is a database specifically designed to be deeply integrated into embedded systems. An embedded database is built into the application as a library instead of being shipped as standalone software such as a separate database server. Thus, embedded applications that link database libraries can inherently handle data using a DML such as SQL. Furthermore, because computational resources are highly restricted, an embedded database must be small and optimized to minimize its impact on system resources and performance (e.g., a small memory footprint and high execution efficiency).
To satisfy various application requirements, developers have proposed many embedded database products, such as SQLite [10], Berkeley DB [11], Mongo Realm [12], and LMDB [13], covering relational, NoSQL, and key-value stores. We adopt SQLite (one of the most popular lightweight relational embedded databases) as the object of our research and implementation; without loss of generality, it can serve as a reference for designing and implementing similar systems.
2) TEE. A TEE establishes an isolated world that runs in parallel with a general-purpose operating system (GPOS). The OS running in the isolated world is usually referred to as the secure OS or the TEE OS, such as Huawei iTrustee [14], Samsung KNOX [15], and the open-source OP-TEE [16], while the GPOS, such as Linux, Windows, or Android, is known as the rich OS or the untrusted OS. A TEE aims to protect sensitive data and code against privileged attacks from a potentially compromised rich OS. ARM TrustZone and Intel SGX are currently the best-known TEE technologies; both combine hardware and software mechanisms to protect sensitive assets.
As the most popular TEE in embedded systems, TrustZone divides the system into two independent worlds: the secure world (where the secure OS is deployed) and the normal world (where the rich OS runs). To process sensitive code and data, a user implements a TA in the secure OS, which interacts with a client application (CA) in the rich OS through Secure Monitor Calls (SMC). Because the processor executes in only one world at a time and normal-world software cannot access secure-world resources, an attacker who has compromised the rich OS, even with root privileges, cannot steal or tamper with sensitive code and data in the TEE.

A. Threat Model
Because the normal world exposes rich services and interfaces that enlarge the attack surface, and lacks sufficient security protection, attackers can compromise an embedded device by exploiting software and hardware vulnerabilities and obtain permission to operate on the file system, including reading, writing, and modifying database files. Moreover, an attacker can even obtain higher privileges (e.g., root permission in the rich OS) and control the execution of an application by accessing the code and data segments of the running process via memory-oriented attacks, such as memory corruption attacks (MCA) [17], [18], [19], [20]. We summarize the threats to the embedded database system in two aspects: confidentiality and integrity.
1) Threats to confidentiality. An attacker who controls the file system can read plaintext data from the database file. Even if the database file is encrypted, so that the attacker cannot obtain plaintext from the file itself, the attacker can access process memory to obtain the plaintext data during decryption. Hence, we focus on the following two attacks.
Drag Attack is a malicious behavior that scans database files and exports (dumps) their data. This attack mainly targets database files stored in plaintext.
SQL Injection Attack occurs when an application fails to validate its input data, causing the database system to execute unauthorized operations.
2) Threats to integrity. Once attackers control the file system, besides reading data, they can also write and modify data in the database file, which damages data integrity. For encrypted database files, attackers can still tamper with the file, for example by modifying its unencrypted parts or corrupting the encrypted data.
Moreover, the completeness of query results must also be considered. When service providers are dishonest, the returned query results may be incomplete or contain extra records, which is also unacceptable to users.

B. Assumptions
First, we assume that attacks on the rich OS cannot be avoided; attackers can even gain administrator privileges that allow them to read data or files directly from memory or external storage. Second, attackers mainly aim to steal or tamper with data rather than damage or paralyze the entire system. Third, we assume that the TEE, including its internal software and the peripherals connected to it, is trusted and tamper-proof. Finally, we do not consider TEE side-channel attacks, since these vulnerabilities are platform-dependent [21] and can be avoided by adopting more secure TEEs as needed, such as an FPGA-based TEE [22].

IV. SYSTEM DESIGN
This section first presents the design goals derived from security, efficiency, and availability requirements. It then introduces the system architecture and the workflow for handling fundamental database operations. Finally, it discusses how data confidentiality and integrity are achieved in the proposed system, as well as the design of key management.

A. Design Goals
G1 Data Security. Confidentiality must be protected during data input/output, processing, and storage so that sensitive information cannot be leaked to attackers. Integrity is also necessary for data storage, to prevent attackers from tampering with the data.
G2 Performance Requirement. BiTDB should run as fast as the original database does in REE (before migration) to guarantee the system's availability.
G3 Small Trusted Computing Base (TCB). A small TCB is necessary for both auditability and maintainability. To this end, BiTDB should introduce into the TEE only the libraries and components essential to the fundamental functions of the database.
G4 Small Memory Footprint. Since the TEE of an embedded system can access only limited memory resources compared with REE, the database should occupy as little of the system memory as possible.
G5 Design Generality.The proposed design and implementation should be easily adapted to most TEE-supported platforms without significant modifications to the original systems.
The following subsections explain how the system design achieves goals G1 to G5.

B. An Overview of BiTDB Architecture
As mentioned among the main contributions in Section I, to realize a built-in TEE database, we first extend the secure OS with a generic file I/O library for handling database files from the TEE side. Second, we supplement the original database with data security operations, such as encryption/decryption and integrity protection. Finally, we reorganize the original database code together with the newly added components and migrate it into TrustZone as a built-in library that TAs can call.
Fig. 1 shows an overview of the BiTDB architecture. Since the database file (DBF) is stored in untrusted storage in REE, the generic file I/O is implemented on both the REE and TEE sides: the tee_fs_syscall library provides the system call implementation in the trusted OS kernel, and the teec_fs_io library provides the generic file I/O extension for tee_supplicant. tee_supplicant, also known as the TEE client (TC), is an agent running in REE that performs operations on the rich OS on behalf of the TEE. The BiTDB library is the reorganized database code migrated to the secure OS, including an encryption/decryption module. Thus, a TA can use the database functions by including the BiTDB library during development. To simplify key management, BiTDB does not store the data encryption key (DEK); instead, the DEK is derived from the Hardware Unique Key (HUK, a read-only key burnt into the hardware by the manufacturer) at each system boot.
When developing a CA with a secure database in REE, a TA containing BiTDB must be implemented in the TEE to perform secure database manipulations under the control of that CA. Section IV-C describes the workflow by which BiTDB achieves secure fundamental data operations and storage.
Given the BiTDB architecture, the main challenges of our design are enabling the migrated database to handle external database files as local ones and bringing its performance as close as possible to its pre-migration level. Hence, database migration, generic built-in TEE file I/O interfaces, and file I/O performance optimization are the key points in realizing such a built-in TEE system, and they can also be generalized to other embedded databases and platforms. Furthermore, for SQLite-like database systems, our database and secure OS extensions can be reused directly with minor code changes. Therefore, BiTDB satisfies design goal G5.

C. System Workflow
INSERT: New data to be inserted into the database mainly comes from two sources: peripherals and user input. For the former, a CA invokes a TA to collect data directly from secure peripherals connected to the TEE. For example, in a temperature-monitoring device, a CA periodically invokes the TA to collect real-time temperature readings and insert them into the database. For the latter, secure user input can be achieved by adopting secure input interfaces (SII) implemented in the TEE, such as Schrodintext [23]. Note that although the SII is presented to the user, it runs in the secure OS, so an attacker cannot compromise it to obtain plaintext data.
SELECT: To achieve secure data retrieval, a CA invokes the TA to generate and execute SELECT statements by sending predefined REE-side database operation commands (RDOC, proposed in Section IV-B of [5]). RDOCs are a set of commands composed of a few keywords and symbols, and each command maps to a specific SQL statement. For example, the RDOC SELECT 100 can be defined to map to the SQL statement SELECT * FROM t LIMIT 100. When a CA sends an RDOC to a TA, the command is restored to a regular SQL statement inside the TEE according to the RDOC mapping policy. Since an RDOC exposes no sensitive information about the database, such as the table structure or sensitive data, an attacker cannot obtain any valuable information from it.
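To make the RDOC mapping concrete, the C sketch below shows one way a TA could restore RDOCs into SQL. The rule table, the extra COUNT command, and the function rdoc_to_sql are hypothetical illustrations, not the mapping policy of [5]; the point is that the SQL text (and hence the table name) exists only inside the TEE, and anything that does not match a rule is rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical RDOC mapping policy, kept inside the TA. */
struct rdoc_rule {
    const char *keyword;   /* command as sent by the CA */
    const char *sql;       /* SQL template; "%u" receives a numeric argument */
    int takes_arg;         /* 1 if the command carries one unsigned argument */
};

static const struct rdoc_rule RDOC_RULES[] = {
    { "SELECT", "SELECT * FROM t LIMIT %u", 1 },
    { "COUNT",  "SELECT COUNT(*) FROM t",   0 },   /* hypothetical extra rule */
};

/* Restores an RDOC such as "SELECT 100" into SQL. Returns 0 on success and
 * -1 for unknown or malformed commands (the TA then refuses to execute). */
static int rdoc_to_sql(const char *rdoc, char *out, size_t out_len) {
    assert(rdoc != NULL && out != NULL);
    for (size_t i = 0; i < sizeof RDOC_RULES / sizeof RDOC_RULES[0]; i++) {
        const struct rdoc_rule *r = &RDOC_RULES[i];
        size_t kw = strlen(r->keyword);
        if (strncmp(rdoc, r->keyword, kw) != 0)
            continue;
        const char *rest = rdoc + kw;
        int n;
        if (r->takes_arg) {
            unsigned arg;
            int used = 0;
            /* Require exactly "<keyword> <number>" with nothing trailing. */
            if (rest[0] != ' ' || sscanf(rest, " %u%n", &arg, &used) != 1 || rest[used] != '\0')
                return -1;
            n = snprintf(out, out_len, r->sql, arg);
        } else {
            if (rest[0] != '\0')
                return -1;
            n = snprintf(out, out_len, "%s", r->sql);
        }
        return (n < 0 || (size_t)n >= out_len) ? -1 : 0;   /* reject truncation */
    }
    return -1;   /* no rule matched */
}
```

With this table, rdoc_to_sql("SELECT 100", buf, sizeof buf) produces SELECT * FROM t LIMIT 100, while an arbitrary string such as DROP TABLE t is rejected; restricting the CA to a closed command set can also narrow the SQL injection surface.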
UPDATE: This operation can be considered as a combination of SELECT and INSERT.When a TA executes the UPDATE operation, it first retrieves the matched results from the database according to the query conditions.After modifying the requested columns' values, the new data will be re-inserted into the table.
DELETE: This operation can be considered as deleting data after finishing a conditional query.

D. Data Security
This section details the data confidentiality and integrity design of the BiTDB architecture, which satisfies design goal G1. Fig. 2 shows how data security is achieved. The database engine and the cryptographic operations (the sensitive operations) reside in the TEE, shown as pink rectangles, whereas the light-blue framed areas contain components located in REE, which face security threats. In our design, the page is the fundamental unit of encryption and integrity verification, since the page is usually the minimal data management unit in the files of embedded databases such as SQLite and BerkeleyDB. The following explains how data confidentiality and integrity are achieved.
1) Data Confidentiality: As mentioned in Section IV-B, to avoid depending on various SEAs for complicated query statements, we have migrated the database engine into the TEE so that it operates directly on plaintext data. Besides, we must also guarantee the confidentiality and integrity of data outside the secure OS. As shown in Fig. 2, when the database engine executes a query, the pages containing the results are read from the encrypted database file into the Page Cache, which resides in memory shared between REE and TEE and holds frequently used pages. With this cache, BiTDB can serve requested pages from memory whenever possible instead of locating them in the database file. This design effectively reduces the overhead of frequent file I/O calls, which satisfies design goal G2, and is discussed in detail in Section V-D. The pages are then fetched into the TEE for decryption and integrity verification (IV), while the cache keeps copies of pages according to its cache policy. To manage integrity information, we have designed a Hash Searching Table (HST) in the TEE, detailed in the following subsection. Finally, the database engine handles the plaintext data directly. For example, in Fig. 2, $cpage_3$ (the prefix c indicates a ciphertext page) is first read into the Page Cache. Second, $cpage_3$ is transferred to the TEE for decryption and IV. In this way, $cpage_3$ is restored to $ppage_3$ (the plaintext page), which the database engine can manipulate just as it would in REE.
For INSERT/UPDATE operations, the affected pages are re-encrypted and their integrity information is updated. If copies of these pages exist in the Page Cache, they are replaced with the newly encrypted versions. Finally, the up-to-date encrypted pages are written to the database file. In Fig. 2, the operations on $ppage_m$ and $cpage_m$ illustrate this scenario.
Note that, following this procedure, every page exists only in ciphertext in REE, so sensitive data is not leaked in the untrusted normal world, while plaintext pages remain inside the TEE, isolated from threats in REE. Hence, our design achieves data confidentiality.
2) Data Integrity: To realize secure and efficient page integrity management, we introduce an HST in the TEE that manages the hash values of all database pages. As shown in Fig. 2, the HST is an in-TEE vector that supports element access by page index; when the device is halted, it is encrypted and stored in REE (as $f_{HST}$). For a page with index $k$, the position of its entry in the HST is given by $I_{HST}(k)$, where the function $I_{HST}(x)$ maps a page index $x$ to an HST index, and the hash value $h_k$ of the $k$th page can be read directly from that position. This lookup takes $O(1)$ time, satisfying design goal G2.
When a page is read from the database file, it is decrypted in TEE memory, and its hash value is recomputed and compared with the reference value kept in the HST to determine whether the page has been tampered with. For example, in Fig. 2, the hash value of page 3 is recomputed as $H(Dec(cpage_3))$ and compared with the reference value $h_3$ in the HST, where $H(x)$ and $Dec(y)$ denote the hash and decryption functions, respectively. Conversely, when the database is modified, e.g., by INSERT or UPDATE operations, the hash values of the affected pages are recomputed and written to the HST; for example, the HST is updated with the new value $h_m = H(ppage_m)$. Moreover, since the HST is maintained in the TEE and stored in ciphertext, it is difficult for an attacker to tamper with or forge it.
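The read and write paths described above can be sketched in C as follows. This is a minimal illustration under stated assumptions: the HST is a flat in-TEE array with $I_{HST}(k) = k$, and the functions H and toy_xcrypt are deliberately trivial placeholders (64-bit FNV-1a and a byte-wise XOR) standing in for SHA256/Poly1305 and ChaCha20/AES, so they provide no real security. The page size, table capacity, and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITDB_PAGE_SIZE 4096
#define HST_MAX_PAGES   64          /* illustrative capacity */

/* In-TEE HST; here I_HST(k) = k, so page k's reference digest h_k is hst[k]. */
static uint64_t hst[HST_MAX_PAGES];

/* Placeholder for H(): 64-bit FNV-1a. NOT cryptographic; stands in for SHA256/Poly1305. */
static uint64_t H(const uint8_t *p, size_t n) {
    uint64_t h = 14695981039346656037ULL;               /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;                          /* FNV prime */
    }
    return h;
}

/* Placeholder for Enc()/Dec(): XOR with a key- and page-dependent byte pattern.
 * NOT secure; stands in for ChaCha20/AES. Because it is XOR, Enc == Dec. */
static void toy_xcrypt(uint8_t *page, uint32_t k, uint8_t key) {
    for (size_t i = 0; i < BITDB_PAGE_SIZE; i++)
        page[i] ^= (uint8_t)(key + k + i);
}

/* Write path (INSERT/UPDATE): h_k = H(ppage_k) goes into the HST, then the
 * page is encrypted before it leaves the TEE. */
static void bitdb_write_page(uint32_t k, const uint8_t *ppage, uint8_t *cpage, uint8_t key) {
    assert(k < HST_MAX_PAGES);
    hst[k] = H(ppage, BITDB_PAGE_SIZE);
    memcpy(cpage, ppage, BITDB_PAGE_SIZE);
    toy_xcrypt(cpage, k, key);
}

/* Read path: ppage_k = Dec(cpage_k), accepted only if H(ppage_k) == h_k.
 * (A real MAC comparison should also be constant-time.) Returns 0 or -1. */
static int bitdb_read_page(uint32_t k, const uint8_t *cpage, uint8_t *ppage, uint8_t key) {
    assert(k < HST_MAX_PAGES);
    memcpy(ppage, cpage, BITDB_PAGE_SIZE);
    toy_xcrypt(ppage, k, key);
    if (H(ppage, BITDB_PAGE_SIZE) != hst[k]) {
        memset(ppage, 0, BITDB_PAGE_SIZE);   /* never hand tampered data to the engine */
        return -1;
    }
    return 0;
}
```

Flipping even one byte of a stored ciphertext page changes the decrypted page, so the recomputed digest no longer matches $h_k$ and the read is refused.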
Note that, in line with design goal G4, the components introduced for data security and performance, such as the Page Cache and the HST, occupy only a small portion of RAM compared with the entire memory. The quantified memory occupation of the HST in our prototype system is given in Section V-C.

E. Key Management
To simplify the discussion of key management in BiTDB, we concentrate on the full-lifecycle management of the DEK used to encrypt the database pages and the HST.
Key Generation. As shown in Fig. 1, the HUK, burnt into the hardware, serves as the root key for generating the DEK. During TA initialization, platform-dependent information (PDI), such as the CPU ID, the motherboard serial number, and the OS version, is collected and combined with the HUK to generate the DEK. Equation (1) gives the computation of the DEK, where HMAC denotes the Hash-based Message Authentication Code function and Hash indicates the selected hash algorithm.

$$DEK = \mathrm{HMAC}_{\mathit{Hash}}(HUK \,\|\, PDI) \tag{1}$$
Thus, the DEK is platform-dependent and becomes invalid once the PDI is modified without authorization. Note that the HUK can only be accessed in the TEE, which is isolated from threats in REE.
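Equation (1) does not state which input keys the HMAC. The sketch below assumes one plausible reading: the HUK is the HMAC key and the PDI string is the message, with SHA-256 as Hash. It bundles a compact SHA-256 (FIPS 180-4) and HMAC (RFC 2104) so that it compiles on its own; inside OP-TEE one would more likely call the GlobalPlatform TEE Internal Core API MAC functions (TEE_MACInit, TEE_MACUpdate, TEE_MACComputeFinal) than hand-rolled code. The PDI encoding and derive_dek are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal SHA-256 (FIPS 180-4) so the sketch has no external dependency. */
typedef struct { uint32_t h[8]; uint8_t buf[64]; size_t used; uint64_t total; } sha256_ctx;

static const uint32_t K256[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2};

#define ROR32(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

static void sha256_block(uint32_t h[8], const uint8_t p[64]) {
    uint32_t w[64], v[8];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 | (uint32_t)p[4*i+2] << 8 | p[4*i+3];
    for (int i = 16; i < 64; i++)
        w[i] = w[i-16] + (ROR32(w[i-15], 7) ^ ROR32(w[i-15], 18) ^ (w[i-15] >> 3))
             + w[i-7] + (ROR32(w[i-2], 17) ^ ROR32(w[i-2], 19) ^ (w[i-2] >> 10));
    memcpy(v, h, sizeof v);                        /* v[0..7] = a..h */
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = v[7] + (ROR32(v[4], 6) ^ ROR32(v[4], 11) ^ ROR32(v[4], 25))
                    + ((v[4] & v[5]) ^ (~v[4] & v[6])) + K256[i] + w[i];
        uint32_t t2 = (ROR32(v[0], 2) ^ ROR32(v[0], 13) ^ ROR32(v[0], 22))
                    + ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
        memmove(v + 1, v, 7 * sizeof v[0]);        /* h=g, g=f, ..., b=a */
        v[4] += t1;                                /* e = d + t1 */
        v[0] = t1 + t2;                            /* a = t1 + t2 */
    }
    for (int i = 0; i < 8; i++) h[i] += v[i];
}

static void sha256_init(sha256_ctx *c) {
    static const uint32_t iv[8] = {0x6a09e667,0xbb67ae85,0x3c6ef372,0xa54ff53a,
                                   0x510e527f,0x9b05688c,0x1f83d9ab,0x5be0cd19};
    memcpy(c->h, iv, sizeof iv);
    c->used = 0;
    c->total = 0;
}

static void sha256_update(sha256_ctx *c, const uint8_t *p, size_t n) {
    c->total += n;
    for (size_t i = 0; i < n; i++) {
        c->buf[c->used++] = p[i];
        if (c->used == 64) { sha256_block(c->h, c->buf); c->used = 0; }
    }
}

static void sha256_final(sha256_ctx *c, uint8_t out[32]) {
    uint64_t bits = c->total * 8;                  /* message length before padding */
    uint8_t b = 0x80, len[8];
    sha256_update(c, &b, 1);
    b = 0x00;
    while (c->used != 56) sha256_update(c, &b, 1);
    for (int i = 0; i < 8; i++) len[i] = (uint8_t)(bits >> (56 - 8 * i));
    sha256_update(c, len, 8);
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 4; j++) out[4*i+j] = (uint8_t)(c->h[i] >> (24 - 8 * j));
}

/* HMAC (RFC 2104) instantiated with SHA-256. */
static void hmac_sha256(const uint8_t *key, size_t klen,
                        const uint8_t *msg, size_t mlen, uint8_t out[32]) {
    uint8_t k0[64] = {0}, pad[64], inner[32];
    sha256_ctx c;
    assert(key != NULL || klen == 0);
    if (klen > 64) { sha256_init(&c); sha256_update(&c, key, klen); sha256_final(&c, k0); }
    else if (klen > 0) memcpy(k0, key, klen);
    for (int i = 0; i < 64; i++) pad[i] = (uint8_t)(k0[i] ^ 0x36);
    sha256_init(&c); sha256_update(&c, pad, 64); sha256_update(&c, msg, mlen); sha256_final(&c, inner);
    for (int i = 0; i < 64; i++) pad[i] = (uint8_t)(k0[i] ^ 0x5c);
    sha256_init(&c); sha256_update(&c, pad, 64); sha256_update(&c, inner, 32); sha256_final(&c, out);
}

/* Hypothetical DEK derivation for Eq. (1): HUK as HMAC key, PDI as message. */
static void derive_dek(const uint8_t *huk, size_t huk_len,
                       const uint8_t *pdi, size_t pdi_len, uint8_t dek[32]) {
    hmac_sha256(huk, huk_len, pdi, pdi_len, dek);
}
```

Because the PDI enters the MAC, changing any PDI field (for example the OS version) yields a different DEK, which is the platform binding described above.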
Key Utilization. While the system is running, the DEK is kept in the TEE and used by the TA to encrypt and decrypt database pages and the HST.
Key Update. For an encrypted database, updating the DEK is nearly impractical, since all encrypted pages have to be decrypted and re-encrypted with the new key. This procedure imposes a non-negligible computational burden on the hardware and affects availability. However, if a normal system upgrade changes the PDI, the DEK must be updated with the new PDI. In this case, the old key is not destroyed until all database pages have been re-encrypted with the new key.
Key Storage. To ensure that the DEK is tightly bound to the platform and cannot be reused by attackers, it is regenerated at every boot rather than stored in external storage.
Key Destruction. Once the DEK has been deprecated and destroyed, the encrypted data can no longer be recovered. Moreover, as noted under Key Update, replacing the DEK imposes a large computational burden on the system; hence, key destruction is not considered in our design.
Finally, since key management relies only on the HUK burnt into the chip, without deploying other software/hardware peripherals or occupying extra storage for keys, this design satisfies design goal G3.

V. ESSENTIALS IN IMPLEMENTING BITDB
Following the design of BiTDB, and without loss of generality, we have implemented a prototype system based on OP-TEE and SQLite to demonstrate the feasibility of our design. This section discusses in detail the implementation of several significant system components and performance optimizations, including generic file I/O for the TEE, the data confidentiality and integrity modules, and the optimizations of memory utilization and file operations.

A. Generic File I/O for TEE
As per our design in Section IV-B, to enable BiTDB to handle external database files as local ones, the secure OS must provide generic file system calls with which the database engine performs fundamental operations on the database file. However, OP-TEE (the selected secure OS) only provides simple encrypted-write and decrypted-read operations rather than complete file system calls such as open, creat, close, lseek, stat, access, and link/unlink, which are vital for maintaining the database file. Therefore, we have extended OP-TEE with a generic file I/O module composed of a complete set of standard file system calls. This module contains two parts: tee_fs_io, built into the secure OS, and teec_fs_io, located in the REE-side tee_supplicant.
• tee_fs_io: TEE-side file I/O interfaces that do not implement the actual functionality but provide standard file system call interfaces to upper-layer applications (TAs). Thus, the migrated database libraries can call these interfaces directly without modifying their file access code.
• teec_fs_io: Proxy interfaces built into tee_supplicant (the agent of OP-TEE), which only forward file operation requests and their data from tee_fs_io to the rich OS's native file I/O system calls.
To illustrate in detail the calling relationships between the system's native libraries and the proposed generic file I/O system calls, Fig. 3 shows the calling chain generated when the database invokes fstat() through the platform-independent function osFstat. Functions on a pink background belong to the tee_fs_io interfaces, whereas those on a green background belong to the teec_fs_io interfaces. The level to which each code file belongs is also given, such as TA, OP-TEE OS libutee, and OP-TEE OS kernel.
As mentioned above, in the TEE the file system call interfaces are only declared in the libraries at different levels of the secure OS. For example, tzvfs_ifce.c provides file I/O interfaces for the in-TA database engine library, BiTDB.c. These interfaces are further registered in the assembly code utee_syscall_a64.S, which declares the file operations to be performed in REE and connects them to the corresponding interfaces defined in tee_tzvfs.c at the TEE OS core level. The other components in the TEE only pass file operation requests down to lower-level components until the requests reach the GPOS via SMC, through which the tee_supplicant service in REE is invoked by Remote Procedure Calls (RPC). Finally, tee_supplicant completes the file operation by invoking ree_tzvfs_fstat(), which calls the rich OS's native system call fstat.
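The forwarding chain can be imitated in a single process to show its shape. In the C sketch below, the message layout struct tzvfs_msg, the opcode names, and the functions tzvfs_open, tzvfs_fstat, tzvfs_rpc, and ree_tzvfs_dispatch are hypothetical; tzvfs_rpc replaces the real SMC and RPC hop with a plain function call, so only the division of labor is illustrated: the TEE side marshals POSIX-shaped arguments into a message, and the REE side executes the native system call and copies the results back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical request placed in REE/TEE shared memory. */
enum tzvfs_op { TZVFS_OPEN, TZVFS_FSTAT };

struct tzvfs_msg {
    enum tzvfs_op op;
    int fd;              /* in:  target descriptor (FSTAT) */
    int flags;           /* in:  open(2) flags (OPEN) */
    char path[256];      /* in:  file path (OPEN) */
    int64_t size;        /* out: st_size (FSTAT) */
    uint32_t mode;       /* out: st_mode (FSTAT) */
    int ret;             /* out: system call result, -1 on error */
};

/* REE side, in the role of teec_fs_io / ree_tzvfs_*(): run the native call. */
static void ree_tzvfs_dispatch(struct tzvfs_msg *m) {
    struct stat st;
    switch (m->op) {
    case TZVFS_OPEN:
        m->ret = open(m->path, m->flags, 0600);
        break;
    case TZVFS_FSTAT:
        m->ret = fstat(m->fd, &st);
        if (m->ret == 0) {
            m->size = (int64_t)st.st_size;
            m->mode = (uint32_t)st.st_mode;
        }
        break;
    default:
        m->ret = -1;
    }
}

/* Stand-in for the SMC + RPC hop to tee_supplicant; here simply a call. */
static void tzvfs_rpc(struct tzvfs_msg *m) { ree_tzvfs_dispatch(m); }

/* TEE side, in the role of tee_fs_io: POSIX-shaped entry points for the engine. */
static int tzvfs_open(const char *path, int flags) {
    struct tzvfs_msg m = {0};
    assert(path != NULL);
    size_t n = strlen(path);
    if (n >= sizeof m.path)
        return -1;
    m.op = TZVFS_OPEN;
    m.flags = flags;
    memcpy(m.path, path, n + 1);
    tzvfs_rpc(&m);
    return m.ret;
}

static int tzvfs_fstat(int fd, int64_t *size, uint32_t *mode) {
    struct tzvfs_msg m = {0};
    m.op = TZVFS_FSTAT;
    m.fd = fd;
    tzvfs_rpc(&m);
    if (m.ret == 0) { *size = m.size; *mode = m.mode; }
    return m.ret;
}
```

Because the TEE-side functions keep POSIX-like signatures, a database's OS abstraction layer (such as SQLite's osFstat) can be pointed at them without changing its file access logic.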

B. Data Encryption/Decryption
As mentioned in Section IV-D, since the database file is usually organized in pages, the page is the primary encryption unit. In our prototype system, the default page size of BiTDB is 4 KB, and each page is encrypted into a ciphertext page with a symmetric block or stream cipher before being sent to REE. To adapt to various platforms and application requirements, the cryptographic algorithm is designed as a Plug-and-Play (PnP) module that allows flexible algorithm configuration. In BiTDB, we adopt AES-128/256 and ChaCha20 [24] as the data encryption algorithms. The literature and our experiments show that ChaCha20 is more lightweight and faster than AES [25], and thus more suitable for embedded systems. Before the encrypted pages are written to the database file, they are buffered in the In-Shared-Memory Page Cache (ISMPC, detailed in Section V-D3) to accelerate subsequent database accesses from the TEE.
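For reference, the sketch below applies ChaCha20 (RFC 8439) to one 4 KB page in place. The nonce layout is our assumption, not something the text specifies: because a stream cipher must never reuse a (key, nonce) pair, a page rewritten under the same DEK needs a fresh nonce, so the sketch combines the page number with a hypothetical per-page write counter that would have to be persisted (for example next to the page digest). The block counter is simply the index of each 64-byte block within the page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITDB_PAGE_SIZE 4096
#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ChaCha20 quarter round (RFC 8439, Section 2.1). */
static void chacha_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
    *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
}

static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Produces one 64-byte keystream block (RFC 8439, Section 2.3). */
static void chacha20_block(const uint8_t key[32], uint32_t counter,
                           const uint8_t nonce[12], uint8_t out[64]) {
    uint32_t s[16], x[16];
    s[0] = 0x61707865; s[1] = 0x3320646e; s[2] = 0x79622d32; s[3] = 0x6b206574;
    for (int i = 0; i < 8; i++) s[4 + i] = load32_le(key + 4 * i);
    s[12] = counter;
    for (int i = 0; i < 3; i++) s[13 + i] = load32_le(nonce + 4 * i);
    memcpy(x, s, sizeof x);
    for (int r = 0; r < 10; r++) {                 /* 10 double rounds = 20 rounds */
        chacha_qr(&x[0], &x[4], &x[8],  &x[12]);   /* column rounds */
        chacha_qr(&x[1], &x[5], &x[9],  &x[13]);
        chacha_qr(&x[2], &x[6], &x[10], &x[14]);
        chacha_qr(&x[3], &x[7], &x[11], &x[15]);
        chacha_qr(&x[0], &x[5], &x[10], &x[15]);   /* diagonal rounds */
        chacha_qr(&x[1], &x[6], &x[11], &x[12]);
        chacha_qr(&x[2], &x[7], &x[8],  &x[13]);
        chacha_qr(&x[3], &x[4], &x[9],  &x[14]);
    }
    for (int i = 0; i < 16; i++) {
        uint32_t v = x[i] + s[i];
        for (int j = 0; j < 4; j++) out[4 * i + j] = (uint8_t)(v >> (8 * j));
    }
}

/* Hypothetical nonce layout: page number (4 bytes, LE) || write version (8 bytes, LE). */
static void bitdb_page_nonce(uint32_t pgno, uint64_t version, uint8_t nonce[12]) {
    for (int i = 0; i < 4; i++) nonce[i] = (uint8_t)(pgno >> (8 * i));
    for (int i = 0; i < 8; i++) nonce[4 + i] = (uint8_t)(version >> (8 * i));
}

/* Encrypts or decrypts one page in place; XOR with the keystream is its own inverse. */
static void bitdb_page_xcrypt(const uint8_t key[32], uint32_t pgno, uint64_t version,
                              uint8_t page[BITDB_PAGE_SIZE]) {
    uint8_t nonce[12], ks[64];
    assert(BITDB_PAGE_SIZE % 64 == 0);
    bitdb_page_nonce(pgno, version, nonce);
    for (uint32_t blk = 0; blk < BITDB_PAGE_SIZE / 64; blk++) {
        chacha20_block(key, blk, nonce, ks);       /* counter = 64-byte block index in page */
        for (int i = 0; i < 64; i++) page[64 * blk + i] ^= ks[i];
    }
}
```

Decryption is the same call with the same page number and version. ChaCha20 alone provides no integrity, which is why the HST (or a Poly1305 tag) is still needed.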

C. Data Integrity
According to the data integrity design in Section IV-D2, the HST stores the digests of all database pages, which are used to verify and update page integrity information rapidly. For ease of maintenance and use, the HST is implemented as a contiguous memory region (denoted $M_{HST}$) containing $m$ sections. Each section holds the digest of the corresponding page, and the offset of a section relative to the base address of $M_{HST}$ is determined by the page number. Therefore, given a page number $p$, the address of its digest in the HST is
$$Addr_p = Addr_{HST} + p \times size_{digest} \tag{2}$$
where $Addr_p$ denotes the address of page $p$'s digest in the HST, $Addr_{HST}$ is the address of $M_{HST}$ in TEE memory, and $size_{digest}$ is the digest size in bytes. In our implementation, SHA256 or Poly1305 [25], [26], [27] is adopted as the digest algorithm; a SHA256 digest is 256 bits long, whereas a Poly1305 tag is 128 bits long. Table I shows the memory occupied by the HST at different database scales. Since a TA can obtain a heap of about 16 to 32 MB, the HST occupies only a small part of TEE memory, which meets design goal G4.
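The address computation and the HST footprint are simple arithmetic; the C sketch below spells them out, assuming digests are laid out contiguously so that $Addr_p = Addr_{HST} + p \times size_{digest}$ and that the HST holds one digest per page. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addr_p = Addr_HST + p * size_digest: O(1) lookup of page p's digest. */
static const uint8_t *hst_digest_addr(const uint8_t *addr_hst, uint32_t p, size_t size_digest) {
    return addr_hst + (size_t)p * size_digest;
}

/* Bytes needed for an HST holding one digest per page of a db_bytes-sized file. */
static uint64_t hst_bytes(uint64_t db_bytes, uint32_t page_size, uint32_t size_digest) {
    assert(page_size > 0);
    uint64_t pages = (db_bytes + page_size - 1) / page_size;   /* round up to whole pages */
    return pages * size_digest;
}
```

For example, a 100 MiB database with 4 KiB pages has 25,600 pages, so hst_bytes gives 819,200 bytes (800 KiB) with SHA256 digests and 409,600 bytes (400 KiB) with Poly1305 tags. These figures are our own illustrative arithmetic, not the measured values of Table I.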
The HST is initialized when the TA containing the database engine accesses the database file for the first time, and it is maintained by the TA at runtime. When the system is about to halt, the HST and its digest are encrypted and stored in external storage in REE for confidentiality and integrity. Conversely, at system startup, the TA first reads the HST file into the TEE, decrypts it, and verifies its integrity by recalculating the HST's digest and comparing it with the reference value included in the HST file. The TA then allocates TEE memory to hold the HST structure. Moreover, to accommodate a growing database, the HST can easily be extended with memory management functions such as bgetr() of the BGET library [28].
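Growing the HST when new pages appear can be sketched as follows. Here realloc stands in for BGET's bgetr() mentioned above, the table doubles its capacity to amortize reallocation cost, and newly added slots are zero-filled to mark pages without a recorded digest. The struct hst layout and hst_ensure are hypothetical; persistence (encrypting the table and its digest at shutdown) is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HST_DIGEST_SIZE 32    /* SHA256; 16 for Poly1305 */

struct hst {
    uint8_t *base;      /* M_HST: contiguous array of digests */
    uint32_t npages;    /* number of page slots currently allocated */
};

/* Makes sure page `pgno` has a slot, doubling the table as needed.
 * Returns 0 on success, -1 if the table cannot grow (old table stays valid). */
static int hst_ensure(struct hst *t, uint32_t pgno) {
    assert(t != NULL);
    if (pgno < t->npages)
        return 0;
    if (pgno > UINT32_MAX / 2)
        return -1;                                   /* avoid overflow when doubling */
    uint32_t n = t->npages ? t->npages : 16;
    while (n <= pgno)
        n *= 2;
    uint8_t *p = realloc(t->base, (size_t)n * HST_DIGEST_SIZE);
    if (p == NULL)
        return -1;
    memset(p + (size_t)t->npages * HST_DIGEST_SIZE, 0,
           (size_t)(n - t->npages) * HST_DIGEST_SIZE); /* new slots: no digest yet */
    t->base = p;
    t->npages = n;
    return 0;
}
```

Existing digests are preserved across growth, so lookups by page number remain valid after the table is extended.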

D. BiTDB Performance Optimization
To achieve better coordination between the TEE (including the TEE OS and its REE-side agent) and BiTDB, we have optimized memory utilization and the cross-world interaction mechanism to meet design goals G2 and G4 through three primary optimizations, detailed as follows.
1) One-Allocation-Multiple-Use (OAMU): According to the system design (Fig. 1) and Section V-A, we have extended the secure OS with generic file I/O system calls to enable direct access to files in REE from the TEE. However, since TEE and REE exchange data through shared memory to complete these file system calls, a piece of shared memory has to be allocated for each call and released afterwards. With frequent file system calls, such repeated allocation and release incurs non-negligible time overhead. Hence, to prevent the performance loss caused by excessive memory operations, and given the single-threaded task model of the in-TEE SQLite, a piece of shared memory is allocated once during database initialization and reused by all subsequent file system calls (as shown in Fig. 4). This OAMU policy reduces the time overhead by 18.08% on average (Section VI-B compares the time overhead of the three proposed optimizations).
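The OAMU idea, allocating once and reusing, can be sketched in C. shm_alloc and shm_free are hypothetical wrappers that use malloc/free in place of real TEE shared-memory registration, and the counter shm_alloc_calls exists only to make the saving visible: a hundred staged writes cost one allocation instead of a hundred. As the text notes, sharing one buffer is only safe because the in-TEE SQLite runs single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for shared-memory allocation (malloc/free here). */
static unsigned shm_alloc_calls;
static void *shm_alloc(size_t n) { shm_alloc_calls++; return malloc(n); }
static void shm_free(void *p) { free(p); }

/* OAMU: one shared buffer allocated at database initialization and reused by
 * every file-system call (safe under the single-threaded in-TEE SQLite model). */
struct oamu_shm { void *buf; size_t cap; };

static int oamu_init(struct oamu_shm *s, size_t cap) {
    s->buf = shm_alloc(cap);
    s->cap = s->buf ? cap : 0;
    return s->buf ? 0 : -1;
}

/* Returns the shared buffer for a request of `len` bytes, or NULL if the
 * request exceeds the pre-allocated capacity (the caller would then split it). */
static void *oamu_get(struct oamu_shm *s, size_t len) {
    assert(s != NULL && s->buf != NULL);
    return len <= s->cap ? s->buf : NULL;
}

static void oamu_fini(struct oamu_shm *s) {
    shm_free(s->buf);
    s->buf = NULL;
    s->cap = 0;
}

/* Hypothetical write call: stages data in the reused buffer; forwarding the
 * request to tee_supplicant is omitted. Returns bytes staged or -1. */
static long tzvfs_stage_write(struct oamu_shm *s, const void *data, size_t len) {
    void *dst = oamu_get(s, len);
    if (dst == NULL)
        return -1;
    memcpy(dst, data, len);
    return (long)len;
}
```

The capacity chosen at initialization bounds the largest single transfer, so it would typically be sized to at least one database page.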
2) Combination of Related File Operations (CRFO): Since BiTDB has to access the in-REE database file frequently when executing SQL commands, a large number of file-system calls can be invoked between REE and TEE. This causes a surge of world switches that significantly degrades system performance. To address this issue, we combine related file operations to eliminate such redundant calls. For example, an lseek() call (used to set the file offset) always precedes a read() or write() call; hence, lseek() is related to read() and write() and can be combined with them in one interaction. Fig. 5 shows such a scenario. Therefore, for every TEE-to-REE data transfer, we encapsulate the related operations and their data into one package and send it to the shared memory of REE. The agent (e.g., tee_supplicant) can then complete these file operations at once, so that only one world switch is invoked. In Section VI-B, Fig. 7 shows that this optimization improves performance by about 8.85%.
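The following sketch illustrates CRFO under simplifying assumptions. The REE database file is simulated by an in-memory array, the REE agent is an ordinary function, and a "world switch" is just a counter increment. The names fs_req_t, ree_agent_handle, tee_pread, and tee_pwrite are hypothetical; they are not OP-TEE or tee_supplicant APIs. Each request carries the offset that a separate lseek() would otherwise set, so one round trip replaces two.

```c
#include <stdint.h>
#include <string.h>

#define FILE_CAP 65536   /* size of the simulated REE database file */

enum fs_op { FS_READ, FS_WRITE };

/* Hypothetical combined request placed in shared memory: the lseek()
 * offset travels together with the read()/write() it precedes. */
typedef struct {
    enum fs_op op;
    uint64_t   offset;      /* would have required a separate lseek() */
    uint32_t   len;
    uint8_t    data[4096];  /* payload for writes, result for reads */
    int32_t    ret;
} fs_req_t;

static uint8_t  ree_file[FILE_CAP];  /* simulated REE-side file */
static unsigned world_switches;      /* simulated TEE<->REE transitions */

/* REE agent handling one combined request: seek and transfer together. */
static void ree_agent_handle(fs_req_t *req) {
    if (req->len > sizeof req->data || req->offset + req->len > FILE_CAP) {
        req->ret = -1;
        return;
    }
    if (req->op == FS_WRITE)
        memcpy(ree_file + req->offset, req->data, req->len);
    else
        memcpy(req->data, ree_file + req->offset, req->len);
    req->ret = (int32_t)req->len;
}

/* TEE side: one call costs one world switch instead of two. */
static int32_t tee_pread(fs_req_t *req, uint64_t off, uint32_t len) {
    req->op = FS_READ;
    req->offset = off;
    req->len = len;
    world_switches++;
    ree_agent_handle(req);
    return req->ret;
}

static int32_t tee_pwrite(fs_req_t *req, uint64_t off, const void *buf, uint32_t len) {
    if (len > sizeof req->data) return -1;
    req->op = FS_WRITE;
    req->offset = off;
    req->len = len;
    memcpy(req->data, buf, len);
    world_switches++;
    ree_agent_handle(req);
    return req->ret;
}
```

Without combination, each page access would cost two transitions (lseek() followed by read() or write()). In this sketch, a write followed by a read of the same page costs two in total.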
3) In-Shared-Memory Page Cache (ISMPC): The two optimizations above distinctly improve system performance by eliminating unnecessary memory operations and world switches. However, locating and reading/writing the queried pages in the database file still incurs a significant time overhead because file I/O on external storage is slow. To address the speed imbalance between the in-memory database and the external file system, we implement a Page Cache in shared memory that temporarily holds the most frequently used pages during database operation. When handling SQL commands, the database engine can then request pages from the cache rather than from the database file and obtain them rapidly. As shown in Fig. 6, the Page Cache consists of two parts: an in-shared-memory cache located in REE and an in-TEE Page Cache Manager (PCM). The cache is a fixed-size memory region (the size can be reconfigured according to system capacity and application requirements), which the PCM maintains based on pre-defined policies, such as Least Recently Used (LRU) and Most Recently Used (MRU).
In Listing 1, the structure _shared_mem_cache holds an entire database page of fixed size PAGE_SIZE (e.g., the default page size in SQLite is 4 KB), whereas the array shared_mem_cache represents the Page Cache in shared memory, which has MAX_CACHE_NUM cache slots. Moreover, the array page_location records each database page's offset in the Page Cache (ALL_PAGE_NUM indicates the total number of pages in the database); hence, if page_location[i] holds an invalid value (e.g., -1), page i is not yet cached. Thus, a valid offset in page_location determines the cached page's address ($Addr_{cp}$) in shared_mem_cache via (3), where i is the page index in the database file. This calculation has only O(1) time complexity.
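The sketch below models the Listing 1 structures and the PCM lookup described here and in the following paragraph. Equation (3) is not reproduced in this text. The address computation below, the base of shared_mem_cache plus page_location[i] times the slot size, is our reading of it. The reverse map slot_page, the LRU timestamps, the toy sizes, and the simulated file read are assumptions added for illustration, and all function names are hypothetical. Decryption inside TEE is not shown.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE     4096   /* SQLite default page size */
#define MAX_CACHE_NUM 8      /* cache slots (toy value; configurable) */
#define ALL_PAGE_NUM  64     /* pages in the toy database file */
#define INVALID_SLOT  (-1)

struct _shared_mem_cache { uint8_t data[PAGE_SIZE]; };  /* one ciphertext page */

static struct _shared_mem_cache shared_mem_cache[MAX_CACHE_NUM]; /* shared memory */
static int      page_location[ALL_PAGE_NUM];   /* page index -> slot, or -1 */
static int      slot_page[MAX_CACHE_NUM];      /* slot -> page index (assumed) */
static uint64_t slot_last_use[MAX_CACHE_NUM];  /* LRU bookkeeping (assumed) */
static uint64_t clock_tick;
static unsigned file_reads;                    /* slow external-storage reads */

/* Stand-in for reading an encrypted page from the REE database file. */
static void read_page_from_file(int page, uint8_t *dst) {
    file_reads++;
    memset(dst, page & 0xFF, PAGE_SIZE);
}

static void pcm_init(void) {
    for (int i = 0; i < ALL_PAGE_NUM; i++) page_location[i] = INVALID_SLOT;
    for (int s = 0; s < MAX_CACHE_NUM; s++) {
        slot_page[s] = INVALID_SLOT;
        slot_last_use[s] = 0;
    }
    clock_tick = 0;
    file_reads = 0;
}

/* O(1) address of cached page i: base + page_location[i] * slot size. */
static uint8_t *cached_page_addr(int i) {
    return (uint8_t *)shared_mem_cache +
           (size_t)page_location[i] * sizeof(struct _shared_mem_cache);
}

/* Return a free slot, otherwise the least recently used one. */
static int pcm_pick_victim(void) {
    int victim = 0;
    for (int s = 0; s < MAX_CACHE_NUM; s++) {
        if (slot_page[s] == INVALID_SLOT) return s;
        if (slot_last_use[s] < slot_last_use[victim]) victim = s;
    }
    return victim;
}

/* PCM lookup: hit -> return the cached copy; miss -> fetch from the file,
 * evicting a slot if needed. The TA would copy the page into TEE and decrypt it. */
static uint8_t *pcm_get_page(int i) {
    if (i < 0 || i >= ALL_PAGE_NUM) return NULL;
    if (page_location[i] == INVALID_SLOT) {
        int s = pcm_pick_victim();
        if (slot_page[s] != INVALID_SLOT) page_location[slot_page[s]] = INVALID_SLOT;
        read_page_from_file(i, shared_mem_cache[s].data);
        slot_page[s] = i;
        page_location[i] = s;
    }
    slot_last_use[page_location[i]] = ++clock_tick;
    return cached_page_addr(i);
}
```

A repeated pcm_get_page() for the same page does not increase file_reads. When the cache is full, the page touched least recently is evicted. Implementing an MRU policy would only change the comparison in pcm_pick_victim().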
When the database engine handles queries, the PCM checks whether the cache contains pages holding the result records. If so, these pages (in ciphertext) are read directly into TEE for decryption and further processing. If no matching page is found in the cache, the page is fetched from the database file into shared memory to update the cache and is then transferred to the TA running BiTDB. Thus, if the database engine frequently executes a few fixed SQL commands, the corresponding pages are likely to be available in the cache. Moreover, although the database engine usually has a built-in cache for performance, the limited in-TEE memory constrains the capacity of this in-database cache. The in-shared-memory cache makes up for this shortcoming and avoids redundant, slow file system operations, thus significantly improving the performance of BiTDB. In Section VI-B, Fig. 7 shows that introducing the Page Cache improves performance by around 6.61%.

TABLE II CONFIGURATION OF THE SELECTED PLATFORMS FOR EXPERIMENTS

A. Experiment Setup
We have implemented BiTDB based on SQLite 3.39.4 and adopted OP-TEE 3.4.0 as the secure OS running in TrustZone. To demonstrate the feasibility and availability of BiTDB, we use several typical embedded platforms, including hardware and virtualization environments, for performance evaluation. Table II presents the configuration of the selected platforms used in our experiments.
Our experiments mainly evaluate BiTDB in three aspects. First, we use the TPC-H test to illustrate the effectiveness of our optimizations in adapting BiTDB to the secure OS. Second, we conduct the TPC-H test with BiTDB on four typical platforms to demonstrate the feasibility, generality, and availability of our system in these TEE-supported environments. Finally, we analyze BiTDB in terms of security.

B. Performance Evaluations
In this part, we first evaluate BiTDB on a Raspberry Pi 3B board to show the effectiveness of the three optimizations on file I/O performance between TEE and REE. Then, we use four typical platforms to evaluate BiTDB and demonstrate the generality, feasibility, and availability of our design (note that all the optimizations are enabled in this experiment).
1) Performance Comparison Among Three Optimizations: Fig. 7 compares the three optimizations on a 61.7 MB TPC-H test database file. Following the queries in the TPC-H test, the x-axis is divided into 22 groups, each containing five bars that indicate the time consumed to execute the query under different configurations: REE (baseline), TEE (baseline, no optimization), TEE-O (OAMU only), TEE-OC (OAMU+CRFO), and TEE-OCI (OAMU+CRFO+ISMPC). The polylines show the ratio of the time consumed by the database executing queries in REE to that consumed by BiTDB in TEE under different optimizations, which reflects how closely the optimized in-TEE BiTDB approaches $DB_{REE}$. Note that we abbreviate the original database (SQLite) running in REE as $DB_{REE}$ to facilitate our discussion. $T_{REE}$, $T_{TEE}$, and $T_{TEE-x}$ denote the time consumed to finish a query by the database in REE, in TEE, and in TEE with optimization x, respectively. If $T_{REE}/T_{TEE-x} < 1.0$, BiTDB with optimization x is less efficient than $DB_{REE}$, whereas $T_{REE}/T_{TEE-x} > 1.0$ indicates that BiTDB is more efficient than $DB_{REE}$. $T_{REE}/T_{TEE-x} \approx 1.0$ means that BiTDB performs nearly the same as $DB_{REE}$.
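A small sketch of the metrics used in this evaluation. speed_ratio() is the polyline value $T_{REE}/T_{TEE-x}$, and ratio_class() applies the "< 1, > 1, ≈ 1" reading above with an assumed tolerance band. perf_improvement() computes the relative change of the ratio with respect to a reference ratio. Equation (4) is not reproduced in this text, so this form is inferred from the reported numbers: it reproduces the 66.19%, 18.93%, and 167.32% improvements given for $qr_{11}$, $qr_{16}$, and $qr_{17}$. All function names are hypothetical.

```c
/* Polyline value R = T_REE / T_TEE-x (times in any common unit). */
static double speed_ratio(double t_ree, double t_tee_x) {
    return t_ree / t_tee_x;
}

/* -1: BiTDB less efficient than DB_REE, 1: more efficient, 0: about the same.
 * The tolerance band around 1.0 is an assumption, not from the paper. */
static int ratio_class(double r, double tol) {
    if (r < 1.0 - tol) return -1;
    if (r > 1.0 + tol) return 1;
    return 0;
}

/* Relative improvement of an optimized ratio over a reference ratio:
 * (r_opt - r_base) / r_base, equivalently T_TEE / T_TEE-x - 1. */
static double perf_improvement(double r_base, double r_opt) {
    return (r_opt - r_base) / r_base;
}
```

For example, perf_improvement(0.6613, 1.099) evaluates to about 0.662, the $PI_{OCI}$ value reported for $qr_{11}$.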
We take $qr_{11}$, $qr_{16}$, and $qr_{17}$ as examples to demonstrate the performance improvement of the in-TEE database. Table III lists the related indicators, including the total number of database pages ($NP_{DB}$), the number of pages each query processes in the experiments ($NP_{proc}$), and the time ratios of executing the three queries. Although our optimizations minimize redundant world switches, their effect is still limited for queries that involve a large number of database page operations; we therefore use the number of pages processed by a query to illustrate the problem. In addition, we use (4), $PI_{OCI} = \frac{T_{REE}/T_{TEE-OCI} - T_{REE}/T_{TEE}}{T_{REE}/T_{TEE}}$, to denote the performance improvement. For $qr_{11}$, the baseline value ($T_{REE}/T_{TEE}$) is 66.13%, which improves to 109.9% after applying the three optimizations ($PI_{OCI}$ = 66.19%). $T_{REE}/T_{TEE-OCI} > 1$ indicates that the in-TEE database performs better than $DB_{REE}$. It also implies that query $qr_{11}$ involves plenty of page and file operations that can be significantly improved (about 2873 pages are processed), and that almost all the TEE resources can be utilized by BiTDB rather than being shared by multiple processes in REE. In contrast to $qr_{11}$ and $qr_{17}$, $qr_{16}$ processes the fewest pages (about 466); hence, less file I/O and fewer page encryptions are involved during query execution, and BiTDB can concentrate on in-memory data operations rather than frequently moving pages between REE and TEE. Therefore, for this query, BiTDB clearly outperforms $DB_{REE}$ even without optimization ($T_{REE}/T_{TEE}$ = 130.1%). With the OAMU and CRFO optimizations, $T_{REE}/T_{TEE-OC}$ becomes 154.73%, an increase of about 18.93% over the baseline ($T_{REE}/T_{TEE}$ = 130.1%). Hence, when few pages are processed, the file I/O optimizations bring smaller improvements to $qr_{16}$ than to $qr_{11}$ and $qr_{17}$ (applying ISMPC brings no improvement at all). Finally, since $qr_{17}$ involves the largest number of pages (about 1,950,940 pages are processed, and a page can be accessed repeatedly), the three optimizations yield a performance improvement of about 167.32% (from 23.07% to 61.67%). This is the largest improvement among the three queries and shows that the effect of the optimizations is closely related to the number of pages processed.

Fig. 9. Current mainstream approaches to protect data privacy for databases.

TABLE IV LINES OF CODE (LOC) FOR BITDB TCB
2) Performance Comparison Among Different Platforms: To demonstrate the generality (G5) of BiTDB, besides the Raspberry Pi 3B, we also tested BiTDB on three other popular embedded platforms, HiKey, HiKey960, and QEMU (ARMv8), which are detailed in Table II. As illustrated in Fig. 8, the efficiency of executing TPC-H queries is significantly affected by platform performance. Note that although the QEMU-based ARMv8 virtual machine runs on a general-purpose computing platform, it is configured with a dual-core CPU at 1.0 GHz and about 1 GB RAM; hence, it takes the most time to finish each query. In contrast, HiKey960 has two CPUs (each with four cores) at 2.4 GHz and 1.8 GHz, respectively, and 3 GB RAM; hence, it performs best among these platforms. Moreover, Raspberry Pi 3B and HiKey have nearly equivalent hardware configurations, and they perform similarly at each test point.

3) TCB and Footprint Evaluations: Table IV shows that, to implement BiTDB, we added a total of 243,657 LoC to TEE, where sqlite.c and the TEE file system it relies on are the dominant components. In addition, since TrustZone does not natively support the ChaCha20-Poly1305 algorithm, we added the related cryptography libraries to BiTDB. Finally, the TA file size is 507 KB; since the TA RAM is usually 16 MB or more, the TA occupies at most 3.09% of it, which meets design goal G3.
In terms of memory usage, the default cache size of SQLite is 2000 pages, and each page is 4 KB in size. SQLite also requires 1000 KB of memory to assist in sorting query results. Therefore, as shown in Table V, BiTDB occupies 13,760 KB of TEE memory (including the maximum size of the HST in Table I). On our four experimental platforms, the total RAM ranges from 1024 MB to 3072 MB, of which BiTDB takes up only 0.44% to 1.31%; thus, BiTDB meets design goal G4.
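The footprint percentages in this subsection follow from simple arithmetic, sketched below. The inputs are taken from this section: 2000 cached pages of 4 KB, the 13,760 KB total from Table V, 1024 MB and 3072 MB of RAM, and a 507 KB TA against a 16 MB TA RAM. The helper names are hypothetical.

```c
/* Memory footprint arithmetic (all sizes in KB). */

/* SQLite page cache: number of cached pages times page size. */
static double page_cache_kb(unsigned pages, unsigned page_kb) {
    return (double)pages * page_kb;
}

/* Share of a RAM budget taken by a component, in percent. */
static double occupancy_pct(double used_kb, double ram_kb) {
    return 100.0 * used_kb / ram_kb;
}
```

page_cache_kb(2000, 4) gives 8,000 KB. Adding the 1,000 KB sorting buffer yields 9,000 KB, and the rest of the 13,760 KB total comes from the other items listed in Table V. occupancy_pct(13760, 1024 * 1024) is about 1.31, occupancy_pct(13760, 3072 * 1024) is about 0.44, and occupancy_pct(507, 16 * 1024) is about 3.09.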

C. Security Analysis
In this section, we discuss how BiTDB achieves both confidentiality and integrity against the threats described in Section III.
1) Confidentiality: Under our threat model, the attacker can compromise the rich OS and obtain or tamper with data in memory and storage. We therefore need to establish confidentiality in each fundamental data processing stage: collection, transferring, computation, display, and storage.
Collection. Since data source peripherals, such as GPS, are connected to TEE via secure drivers, an attacker in REE cannot intercept, peek at, or modify the collected data.
Transferring. Since our research targets stand-alone devices, we focus only on data transfer between REE and TEE. To compensate for the performance cost of data interaction between REE and TEE, we implement the in-shared-memory Page Cache in REE. However, all pages transferred from TEE to the Page Cache are in ciphertext, so an attacker cannot obtain any sensitive plaintext from shared memory. In addition, to protect sensitive information about the database when BiTDB is invoked from REE, a CA emits an RDOC [5] to trigger query execution in the in-TEE database. This avoids disclosing sensitive database information through SQL statements.
Computation. Here, Computation is a general concept that refers to various data operations. Although all database operations are performed on plaintext, they are protected by TEE and isolated from threats in REE. Therefore, attackers in REE can neither obtain the data being processed nor interfere with the data operations.
Display. To realize secure data display to users, we develop secure device drivers for display peripherals (e.g., a UART LCD screen) connected to TEE. Thus, plaintext input/output operations can be performed in TEE without interference from REE-side attackers. When query results need to be displayed to users, the plaintext data are transferred directly to, and shown on, the TEE-side display device.
Storage. When BiTDB writes to the database, it first encrypts the sensitive data and then transfers the corresponding pages to the in-shared-memory Page Cache, from which they are eventually written to the database files. Hence, storage confidentiality is guaranteed even though the data reside in the untrusted normal world. Furthermore, the DEK is derived from the HUK and device information that can only be accessed in TEE; therefore, an REE-side attacker has no means to steal the key.
Based on these protections, BiTDB can also resist the Drag Attack and the SQL Injection Attack.
Defense against Drag Attack. Since all pages of the database files are encrypted with the DEK, attackers who obtain the REE-side database files cannot decrypt the data. Moreover, attackers cannot steal the plaintext data displayed on the TEE-side device.
Defense against SQL Injection Attack. As described for the transferring stage, all SQL statements are generated within TEE, so attackers cannot send illegal requests to BiTDB.
2) Integrity: As discussed above, while protecting the confidentiality of data and its operations, TEE also guarantees their integrity. Here, we mainly discuss how data integrity is ensured when data is read from or written to database files, since these files reside in the untrusted normal world, which is exposed to various security threats. As mentioned in Sections IV-D2 and V-C, we design and implement the HST to maintain the hash values of all pages of the database file. When BiTDB starts, it loads the encrypted HST into TEE and restores it in TEE memory. During system execution, the HST resides in TEE memory, so attackers cannot tamper with it. Moreover, any illegal modification to the HST or database files results in a mismatch between the hash value newly calculated from the current decrypted page and the corresponding standard value $h_{page}$ stored in the HST. Before the system halts, the HST is encrypted and stored in REE, so that attackers cannot forge or tamper with it.
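A minimal sketch of the per-page integrity check. It assumes that the HST is an in-TEE array of 64-bit digests indexed by page number and that the page has already been decrypted. FNV-1a is only a placeholder for the cryptographic hash a real deployment would use. verify_page and update_hst are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096  /* SQLite default page size */

/* FNV-1a: placeholder ONLY; it is not a security hash. */
static uint64_t page_hash(const uint8_t *page, size_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t k = 0; k < len; k++) {
        h ^= page[k];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Check a freshly decrypted page i against its standard value in the HST.
 * Returns 1 on match, 0 on mismatch or out-of-range index. */
static int verify_page(const uint64_t *hst, size_t num_pages, size_t i,
                       const uint8_t *plain) {
    if (i >= num_pages) return 0;
    return page_hash(plain, PAGE_SIZE) == hst[i];
}

/* After a legitimate in-TEE write, refresh the HST entry before the page
 * is encrypted and handed to the shared-memory Page Cache. */
static void update_hst(uint64_t *hst, size_t i, const uint8_t *plain) {
    hst[i] = page_hash(plain, PAGE_SIZE);
}
```

Any change to a page made outside TEE, such as a single flipped byte, makes verify_page() return 0 for that page on its next read.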
Moreover, given the integrity of the database files, and since the database engine of BiTDB is fully protected by TEE, attackers or dishonest service providers cannot tamper with data processing. Therefore, BiTDB can ensure the correctness and completeness of the retrieved data.

VII. RELATED WORK
To address data privacy leakage in databases, current mainstream research on securing database systems can be classified into two categories, schemes without TEE and schemes with TEE, as shown in Fig. 9. Each category contains two primary types: Ciphertext Data Storage (CDS) and Ciphertext Data Computation & Storage (CDCS) for the schemes without TEE, and TEE-Assisted Data Computation Protection (TADCP) + CDS and In-TEE Database (ITD) + CDS for the schemes with TEE. The following describes each type of secure database scheme in detail.

Ciphertext Data Storage (CDS).
With CDS, all sensitive data are encrypted before being stored in the database file, so an attacker cannot obtain plaintext data from the database file. However, this approach protects only stored data, since the encrypted data must be restored to plaintext before use [35]. In this case, an attacker can easily obtain plaintext data from memory via memory-oriented attacks, such as MCAs [17], [18], [19], [20]. Fig. 9(A) shows such a scheme. The encryption/decryption module (EDM) is a component within the database engine that encrypts sensitive data before it is written to storage and decrypts it after it is read. Most database products support CDS, including MySQL [36], SQL Server [35], [37], Oracle [38], SQLCipher [32], SEE [33], etc. Note that although DBStore [34] claims to be a TrustZone-backed database, its database engine is located in a Genode security partition (sandbox) rather than in a real TEE. In this case, DBStore's engine is not protected by TrustZone; instead, it merely uses TrustZone to encrypt/decrypt data securely. Hence, we classify DBStore as a CDS-type database.

Ciphertext Data Computation & Storage (CDCS).
To address the security issue of processing plaintext data in memory under the CDS scheme, CDCS (shown in Fig. 9(B)) has been proposed, in which ciphered data are processed directly in memory. Here, Computation is a general concept covering various data operations. To realize ciphertext data computation, several SEAs have been proposed, such as DET [3], OPE [4], privacy-preserving frameworks [39], [40], and Homomorphic Encryption (HE) [41]. However, these algorithms cannot cover the unbounded variety of database operations, and some are inefficient or even unusable on embedded systems, such as Fully Homomorphic Encryption (FHE) [42], [43]. Moreover, an attacker can still obtain plaintext data during the encryption/decryption processes by breaking into memory. As the figure shows, a Proxy is placed between the user and the database engine; it encrypts the user input data and decrypts the retrieved data. Therefore, although data is processed in ciphertext, the proxy becomes a vulnerable point that attackers can compromise via memory-oriented attacks. For instance, CryptDB [1] and Arx [2] use this architecture.
TEE-Assisted Data Computation Protection (TADCP) + CDS. To protect the vulnerable proxy, TEE is adopted to provide an isolated runtime environment for it, such that an attacker cannot break into its memory to obtain or tamper with plaintext data. ARM TrustZone and Intel SGX are the two mainstream TEE solutions. Fig. 9(C) shows the TADCP + CDS scheme. With a secure connection between the user and TEE (e.g., Schrodintext [23], a TEE-protected input component), data collection security can be guaranteed. Smaug [5] is a TEE-assisted database implemented by Lu et al. Similar systems include StealthDB [7], FE-in-GaussDB [6], ASD [8], etc.
In-TEE Database (ITD) + CDS. Although the TADCP + CDS scheme can protect the proxy from being compromised by strong adversaries, like CDCS it still relies on a complete set of SEAs to support various database operations on ciphered data. Implementing these SEAs makes building and deploying such secure database systems significantly harder. Realizing SEAs that support different SQL predicates (e.g., LIKE, ORDER BY, and GROUP BY) and aggregate functions (e.g., SUM, AVG, and COUNT) is complicated. In addition, these SEAs can leak private information during database operations, threatening data confidentiality and integrity. Therefore, in the ITD + CDS scheme (Fig. 9(D)), the entire database is migrated into TEE to operate on plaintext data. This removes the need to implement and deploy various complicated SEAs and covers 100% of database operations. Some existing built-in TEE databases [29], [30], [31] are based on Intel SGX, a technology targeting PCs and servers, which have richer memory and system resources. However, due to the limited resources of embedded-system TEEs and the requirement for a small TCB, ensuring the efficiency and availability of the migrated database is a very challenging issue in system design and implementation.

TABLE VII DESIGN GOAL (DG) COMPARISON OF TYPICAL TEE-ASSISTED SECURE DATABASES WITH BITDB
To compare system features, Table VI evaluates the characteristics of BiTDB and typical secure databases in terms of security, usability, and generality. The table shows that most secure databases are not designed for embedded systems. They use Intel SGX, which is mostly available on PC/server processors, and some cross-platform database projects do not use TEE to ensure security at all. In terms of encryption granularity, most column-level-encryption-based databases need to develop extra, complicated SEAs to accommodate various query requirements, which can impose a considerable computational burden on embedded systems. According to our evaluation results, as a built-in TEE database based on page-level encryption, BiTDB is more suitable for embedded systems and supports more comprehensive functions. Based on the design goals proposed in Section IV-A, Table VII compares representative TEE-assisted secure databases with BiTDB.

VIII. CONCLUSION
To eliminate the availability dilemmas that secure embedded databases face, we propose BiTDB, a built-in TEE database. These dilemmas include continuously developing complicated SEAs to satisfy various application requirements, lacking complete database functions, and significant overheads caused by operations across REE and TEE. With BiTDB, all data and data operations are processed in TEE in plaintext, while data are stored in REE in ciphertext. Because all data operations are performed on plaintext, complicated SEAs are unnecessary. To enable BiTDB to manipulate database files in REE as local files, we extend the TEE OS with a set of generic file I/O interfaces. Furthermore, to improve availability, we present three critical optimizations for database page operations between REE and TEE, which significantly reduce redundant cross-world switches during database operations. In addition, we implement a prototype system on an RPi 3B board, as well as on three other typical platforms, to demonstrate the feasibility and generality of our design.
The evaluation results indicate that, while ensuring the security of data operations and storage, BiTDB performs similarly to the original (pre-migration) database. Finally, with its complete implementation and evaluation, BiTDB can serve as a useful reference for realizing similar systems. In future work, we will show that BiTDB has a limited impact on the TEE attack surface.

Jianfeng Ma (Member, IEEE) received the BS degree in mathematics from Shaanxi Normal University, China, in 1985, and the MS and PhD degrees in computer software and communications engineering from Xidian University, China, in 1988 and 1995, respectively. He is currently a professor with the School of Cyber Engineering, Xidian University, China. His current research interests include distributed systems, computer networks, and information and network security.

Fig. 4. Schematic diagram of replacing frequent memory operations with maintaining a single memory piece during inter-TA-and-TC data transferring.

Fig. 6. Schematic of the relationship among the Page Cache and the relevant components in REE and TEE.

Yulong Shen (Member, IEEE) received the BS and MS degrees in computer science and the PhD degree in cryptography from Xidian University, Xi'an, China, in 2002, 2005, and 2008, respectively. He is currently a professor with the School of Computer Science and Technology, Xidian University. His research interests include wireless network security and cloud computing security.

TABLE I SIZE OF THE HST UNDER DIFFERENT DATABASE SCALES

TABLE III VALUES OF THE KEY INDICATORS RELATED TO THE TPC-H TEST ON QUERIES $qr_{11}$, $qr_{16}$, AND $qr_{17}$

TABLE V SIZE OF TEE MEMORY OCCUPIED BY BITDB (IN KB)

TABLE VI FEATURE COMPARISON OF TYPICAL SECURE DATABASES WITH BITDB