HyperDbg: Reinventing Hardware-Assisted Debugging

Software analysis, debugging, and reverse engineering have a crucial impact in today's software industry. Efficient and stealthy debuggers are especially relevant for malware analysis. However, existing debugging platforms fail to provide a transparent, effective, and high-performance low-level debugger due to their detectable fingerprints, complexity, and implementation restrictions. In this paper, we present a new hypervisor-assisted debugger for high-performance and stealthy debugging of user and kernel applications. To accomplish this, HyperDbg relies on state-of-the-art hardware features available in today's CPUs, such as VT-x and Extended Page Table (EPT). In contrast to other widely used existing debuggers, we design HyperDbg using a custom hypervisor, making it independent of OS functionality or API. We propose hardware-based instruction-level emulation and OS-level API hooking via extended page tables to increase the stealthiness. Our results of the dynamic analysis of 10,853 malware samples show that HyperDbg's stealthiness allows debugging on average 22% and 26% more samples than WinDbg and x64dbg, respectively. Moreover, in contrast to existing debuggers, HyperDbg is not detected by any of the 13 tested packers and protectors. We improve the performance over other debuggers by deploying a VMX-compatible script engine, eliminating unnecessary context switches. Our experiment on three concrete debugging scenarios shows that compared to WinDbg as the only kernel debugger, HyperDbg performs step-in, conditional breaks, and syscall recording 2.98x, 1319x, and 2018x faster, respectively. We finally show real-world applications, such as a 0-day analysis, structure reconstruction for reverse engineering, software performance analysis, and code-coverage analysis. For the extended version of this paper, which includes additional appendices, refer to https://arxiv.org/abs/2207.05676.


INTRODUCTION
Debuggers are an essential element in software development and analysis that are actively employed by computer engineers to improve efficiency, detect security flaws, and fix bugs in software programs. Debuggers are also a valuable tool for software reverse engineering and malware analysis. A series of commercial and open-source debuggers offer convenient features to address such needs [19,22,76,89,98]. Given the outstanding growth in the sophistication and complexity of evasion and obfuscation methods, powerful debuggers are needed to analyze, detect, and understand malware.
Modern binary executables, armed with packing [101], evasion [29], and hardware-assisted techniques [49,74], employ a series of methods that involve anti-virtualization [5], anti-emulation [55], as well as side-effect and footprint detection [1] (e.g., calls to specific OS APIs) to impede debugging. Despite many valuable efforts in the community toward transparent and effective analysis methods [26,27,73,82,100,102], currently available debugging tools struggle to counter modern protected programs and malware. These tools lack elaborate kernel-side components to offer deep scrutiny for reverse-engineering purposes. A comprehensive analysis of 4 million malware samples shows that 88% are equipped with anti-reversing, and 81% with anti-debugging or anti-virtualization techniques [9]. Utilizing OS APIs [11] or leveraging ring-0 options [35] leads to artifacts and leakages that high-privilege malware can detect.
All these complications have recently attracted the attention of researchers to integrate the debugging infrastructure deeper into the hardware. As a result, solutions based on bare metal [50,96,102], hypervisor-level (VT-x) [26,31,53,97], System Management Mode (SMM) [102], or even Intel Memory Management Engine (MME) [30] are used to minimize the leakage of the debugger's presence. This increases the transparency of the debugger and thus its stealthiness. While these lower-level realizations of debugging mechanisms increase transparency, they suffer from severe performance degradation. Although sub-kernel deployment [103] of debugging, monitoring, and software analysis tools can offer a powerful platform for use cases such as analyzing evasive malware, previously proposed sub-kernel debuggers fail to provide rich debugging functionality, as they have been either discontinued [20,31], developed for purely academic purposes [31], or have not been through the thorough development and testing required for dealing with real-world applications and scenarios [103]. Moreover, the availability of the source code for such tools is still known to be a requirement in the community.
In this paper, we propose HyperDbg, a hypervisor-based (ring -1) debugger designed to use modern hardware technologies to provide new features to the reverse-engineering community. It operates on top of Windows by virtualizing an already running system using Intel VT-x. As a primary goal, HyperDbg strives to be as stealthy and OS-independent as possible. HyperDbg avoids using any operating-system APIs and software debugging mechanisms. Instead, it extensively uses processor features such as the Second Layer Page Table, i.e., Extended Page Tables (EPT), to monitor both kernel and user execution.
Avoiding OS-based debugging APIs increases the transparency against classic anti-debugging methods. Moreover, by directly relying on hardware features, HyperDbg is hard to detect with time-delta methods that detect the presence of hypervisors, e.g., by detecting the overhead of traps into the hypervisor [65,77]. Such hardware-enabled features also allow HyperDbg to offer various state-of-the-art functions such as hidden hooks, which are as fast as current inline hooks but also offer stealth debugging. HyperDbg supports Hardware Debug Register simulation to break on read and write accesses to a specific location while remaining entirely invisible to both the OS kernel and the programs. Moreover, such hardware-assisted features make it possible for HyperDbg to eliminate limitations previously imposed by Hardware Debug Registers in size and count [102]. We evaluate the transparency extensively against anti-debugging, anti-virtualization, and anti-hypervisor methods, as well as packer software. HyperDbg was not detected by any of the 13 tested packers and protectors. No other existing debugger achieves this level of stealthiness: other debuggers are detected on average by 44% of the packers and protectors, and none is detected by fewer than 3. We demonstrate the applicability of transparent debugging on 10,853 malware samples. Our results show that HyperDbg successfully analyzes 22% and 26% more malware samples compared to WinDbg and x64dbg, respectively. We also describe an existing 0-day vulnerability in the Windows 10 kernel, rediscovered during our experiments, that was successfully analyzed using HyperDbg's transparent mode.
For high-performance debugging, HyperDbg uses a VMX-root-compatible script engine that executes the entire debugging functionality in kernel mode, enabling complex debugging functionality. Our script engine eliminates any user-to-kernel-mode interaction, making any OS-level API obsolete while providing a substantial gain in debugging performance. We evaluate the improved debugging performance in three concrete debugging scenarios: stepping, conditional breaks, and syscall recording. Compared to the state-of-the-art debugger WinDbg, HyperDbg is 2.98, 1319, and 2018 times faster, respectively.
We show that the unique design of HyperDbg enables use cases beyond classical debugging scenarios. We describe how the proposed debugger enables transparent debugging of I/O devices, analyzes the performance of software, and provides means for code coverage usable for (kernel) fuzzing. Finally, our analysis of a Windows 10 0-day in a kernel-mode bootkit malware shows that HyperDbg is mature enough for real-world malware analysis.
Outline. The remainder of this paper is organized as follows. In Section 2, we provide required background information. Section 3 presents the design, and Section 4 the architecture of HyperDbg. Section 5.2 introduces the script engine, and Section 6 the transparency mode of HyperDbg. Section 7 provides the transparency and performance evaluations. Section 8 describes additional use cases. Section 9 discusses related work, and Section 11 concludes the paper.

TECHNICAL BACKGROUND
In this section, we survey the technical background knowledge needed to describe the design of the proposed debugger. We briefly review the structure and features of modern debuggers, hypervisors, and the main hardware capabilities provided by Intel, on top of which HyperDbg is implemented.

Modern Debuggers
Debugging is fundamentally defined as the process of examining and analyzing a software program to understand or locate code snippets that are unsatisfactory in terms of functionality, performance, or security [2,52]. To provide the desired functionality, a debugger should offer multiple mechanisms. Stepping through the source code or assembly, memory inspection and modification, as well as breakpoint definition are vital features in commodity debuggers. From the reverse-engineering and malware-analysis perspective, debuggers generally fall into two categories: user-mode and kernel-mode debuggers [35]. User-mode debuggers provide the basic functionality to analyze a user-mode process. They are simple to implement and easy to use, and they give the user a convenient and isolated environment. x64dbg [98], Ollydbg [76], and Immunity Debugger [44] are well-known examples of user-mode debuggers. Kernel-level debuggers run in kernel mode, which grants them higher privileges in terms of register and memory access during the program's execution. WinDbg [19] and GDB [22] are famous examples of kernel debuggers that are widely used for reverse engineering and malware analysis [1]. With advances in malware evasion techniques [5], researchers have been showing interest in virtualization, simulation, and hardware-assisted debugging methods [54] that can offer a more transparent environment for code analysis and low-level modification of the execution flow [73,102].

Instruction Set Architecture (ISA) Extensions
In this section, we briefly describe the Intel VT-x, Intel EPT, and Intel TSX ISA extensions employed in the proposed hypervisor-level debugger. Note that HyperDbg in its current form only supports Intel processors and is built on Intel technologies and terminology. However, similar hardware features exist for both AMD and ARM processors and can be exploited likewise. Further description is provided in Appendix F.
Intel Virtualization Technology (VT-x). Intel VT-x (formerly known as Vanderpool) is the hardware virtualization technology provided by Intel for IA-32 processors to simplify virtualization and increase the performance of VMMs [71]. VT-x introduces new data structures and instructions to the ISA [32] and enables processors to act as if there were several independent processors, allowing multiple operating systems to run simultaneously on the same machine.

Intel Extended Page Table (EPT).
Intel VT-x technology comes with a hardware-assisted Memory Management Unit (MMU) and the implementation of Second Level Address Translation (SLAT), known as Extended Page Table (EPT). By translating the Guest Physical Address (GPA) to the Host Physical Address (HPA) on the CPU level [92], EPT eliminates the overhead associated with software-managed shadow page tables [43]. In Intel's design, each CPU core can use a separate EPT, which allows for multiple independent accesses from different OSs concurrently.
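The two translation stages described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions: both tables are flat dictionaries from page number to frame number (real tables are multi-level), and the names `translate` and `guest_virtual_to_host_physical` are hypothetical, not part of HyperDbg.

```python
PAGE_SIZE = 4096  # 4 KB pages

def translate(address, table):
    """Map an address through a flat page-number -> frame-number table."""
    frame = table[address // PAGE_SIZE]
    return frame * PAGE_SIZE + address % PAGE_SIZE

def guest_virtual_to_host_physical(gva, guest_page_table, ept):
    """Stage 1 (guest-controlled): GVA -> GPA. Stage 2 (EPT): GPA -> HPA."""
    gpa = translate(gva, guest_page_table)
    return translate(gpa, ept)
```

Because the second stage is owned by the hypervisor, the guest cannot observe or alter how its physical addresses are backed, which is the property the later EPT-based hooks build on.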

Intel Transactional Synchronization Extensions (TSX).
Intel TSX is the product name for a set of x86 instruction set extensions, including Restricted Transactional Memory (RTM), which adds support for the declaration of hardware transactions. Instructions inside such a transaction either all succeed or are rolled back altogether in case any error or abort occurs during the transaction, e.g., an interrupt from the OS [46,88]. In this paper, by using the term Intel TSX, we refer to RTM specifically.

Terminology
As the implementation here is based on Intel processors and the target OS is Microsoft Windows, we describe the low-level design of our system based on Intel and Windows terminologies.
Hypervisor. A hypervisor (also known as a virtual machine monitor or VMM) is software that makes virtualization possible by virtually sharing resources such as memory and processor [15,21,79]. It abstracts guest machines and the operating system from the actual hardware and runs virtual machines (VMs).
Interrupt Request Level (IRQL). An Interrupt Request Level (IRQL) is a hardware-independent mechanism that Windows uses to prioritize interrupts and code. Code running at a higher IRQL preempts a thread or interrupt running at a lower IRQL [18].
Different IRQLs serve different purposes. DIRQL is for interrupt service routines (ISRs) of hardware and external devices; DISPATCH_LEVEL is used for the scheduler, DPCs, and code protected by spinlocks; APC_LEVEL is for asynchronous procedure call (APC) routines; and PASSIVE_LEVEL is for user code, dispatch routines, and PnP routines. Appendix C gives a complete description of the Intel-specific terms used, which are also briefly tabulated in Table 4.

HIGH-LEVEL OVERVIEW
This section provides a brief high-level description of the design of HyperDbg and its building blocks. Here, we describe how the proposed debugging functionalities are implemented through a high-level abstraction and propose three debugging operation modes.

High-level Debugging Flow
At a high level, like other debuggers, HyperDbg is designed to perform analysis within a target system referred to as the Guest. The source debugging instructions are usually sent from an external system known as the debugger Host. Figure 1 illustrates a high-level overview of HyperDbg's sub-systems and execution flow. As shown, the debugger is an end-to-end framework, connecting the guest and the host systems by a communication interface (e.g., serial). While the core building blocks are all deployed at the hypervisor level on the guest side, the host side provides a CLI to the user and deploys an assembler/disassembler as well as a front-end engine for the debugging functionalities. Multiple debugging sub-systems are deployed in the VMX-root mode of the guest system, where they directly utilize hardware features (e.g., EPT) for their functionality. As shown in Figure 1, the debugging commands are taken by the host, where they are (dis)assembled and parsed through the script engine (step 1). Then, the commands are sent via a communication channel to the guest. These commands are interpreted by the script engine's back-end at the guest's hypervisor level. Based on the requested debugging routine, any user-mode or kernel-mode debuggee program code can be targeted on the guest side with direct access to the execution flow (step 4). The sequence of commands and functionalities is executed based on an event-triggered routine (Section 3.2) according to each sub-system (step 2). Finally, the sub-system functionalities utilize hardware-based features (e.g., EPT) to execute their operation (step 3). We describe the deployment of each sub-system in detail in Section 4.
Note that in the practical debugging procedure, bi-directional communication is required between the host and guest. However, as shown in the figure, with the use of the script engine, HyperDbg can confine the communication to an automated routine within the guest kernel mode if necessary.

Event-Triggered Interface
To facilitate the debugging routines, we control the usage of the underlying functions and building blocks by an abstracted concept referred to as an Event in HyperDbg. Subsequently, we define Conditions and Actions that are used in the sub-system procedures for debugging.

3.2.1 Events. An Event is the occurrence of an incident that is of interest to the debugger. This comprises a wide range of activities, from a specific system call (Syscall) that the debugger is set to monitor to an access to a particular memory address. HyperDbg can be configured to perform arbitrarily defined actions upon the occurrence of each event. A list of the supported events provided by HyperDbg is presented in Table 3 in Appendix A.

Actions.
Once an event is triggered, HyperDbg can invoke specific functionalities known as actions. HyperDbg provides three types of action: Break, Script, and Custom Codes. The Break action is the conventional feature of classic debuggers where all processing cores are paused until the debugger gives further permission. The Script action allows viewing and modifying parameters, registers, and memory contents without breaking into the debugger. It also permits creating logs and running code in the kernel space. The Custom Codes action provides the ability to run custom assembly code whenever a specific event is triggered.

Conditions.
Conditions are specific circumstances that can be defined by the user in the form of logical expressions to constrain the execution of an event. This, in turn, allows for the definition of conditional events, where an event is triggered only if an expression evaluates to true.
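The relationship between events, conditions, and actions can be sketched as a small dispatcher. This is a minimal illustration, not HyperDbg's implementation: the `Event` class, the `dispatch` function, and the dictionary-based context are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Event:
    kind: str                                   # e.g. "syscall" or "memory-read"
    condition: Callable[[dict], bool]           # user-defined logical expression
    actions: List[Callable[[dict], None]] = field(default_factory=list)

def dispatch(events, kind, context):
    """Run the actions of every registered event whose kind matches and
    whose condition evaluates to true; return how many events fired."""
    fired = 0
    for event in events:
        if event.kind == kind and event.condition(context):
            for action in event.actions:
                action(context)
            fired += 1
    return fired
```

A conditional event that logs only one particular system call number would register a condition such as `lambda ctx: ctx["rax"] == 0x55` together with a logging action; all other system calls pass through without running any action.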

Operating Modes
Based on different applicability, HyperDbg provides two modes of operation described as follows.
3.3.1 VMI Mode. Virtual Machine Introspection (VMI) Mode is intended for regular user-application debugging and local kernel-mode debugging. Although it offers a conventional debugging experience by providing access to all HyperDbg features (including debugging, halting, and stepping user-mode applications) in an out-of-the-box fashion, kernel-mode breaking into the debugger and stepping are limited. VMI mode also allows scripts and custom codes in both user mode and kernel mode for local or remote debugging.

Debugger Mode. Debugger Mode is a powerful operating mode that allows for connecting to the kernel and halting the system to step in and step over through kernel and user instructions. Here, debugging connectivity is carried out with a serial cable or a virtual serial device.

Transparent Mode. Both modes can be used in Transparent Mode, which offers stealth debugging by attempting to conceal HyperDbg's presence on timing and micro-architectural levels. While the adversarial dynamic between malware generators and anti-malware producers is a never-ending process and this mode does not guarantee 100% transparency, it makes it substantially more challenging for anti-debugging and anti-hypervisor methods to detect the debugger. It is worth noting that HyperDbg is already immune to high-level anti-debugging methods that rely on API-specific methods to detect debugging environments (e.g., self-debugging binaries). The transparency methodology is described in Section 6 and is thoroughly evaluated in Section 7.1.

BACK-END ARCHITECTURE
This section explores the architectural design of HyperDbg on a sub-system level. We describe the challenges and shortcomings of the existing methods and debuggers for each sub-system. Then, by describing the underlying detailed implementation of the core sub-systems, we propose HyperDbg's approach to address each of these challenges.

Stepping Subsystem
In this section, we investigate the stepping mechanism used in conventional debuggers and its shortcomings with regard to delivering a true line-by-line stepping procedure. We discuss the solutions offered in HyperDbg as a VMX-root mode debugger to provide different stepping mechanisms and address these issues.

Step-in.
Step-in offers the conventional step functionality available in commodity debuggers (e.g., WinDbg [19], GDB [22]) by setting the RFLAGS trap flag to make the system stop after execution of a single instruction. This allows the debugger to read and modify the content of the registers and the memory by following a trap flag in the kernel.
Challenge. Conventional stepping mechanisms cannot guarantee a line-by-line stepping procedure, as all other CPU cores and processes may execute their routines, and interrupts can drastically alter a program's execution flow. Figure 2a shows an example of step-in where the execution flow is disrupted by an interruption before the #DB exception. A naive solution would mask all external interrupts by clearing the Interrupt Flag in RFLAGS. However, intercepting or preventing the interrupts can easily break the OS semantics. HyperDbg introduces the instrumentation step-in to provide a guaranteed stepping mechanism in the debugging routine.

Approach. Considering the shortcomings of the conventional Step-in mechanism, HyperDbg introduces an instrumentation Step-in mechanism by employing the Monitor Trap Flag (MTF), a feature that works similarly to the Trap Flag (TF) in RFLAGS but appears transparent to the guest. Moreover, Non-Maskable Interrupts (NMIs) are used to ensure that the execution is done on a single core while other cores are halted. This method entirely overcomes the disruptions caused by inevitable interrupts.

Instrumentation Step-in. To the best of our knowledge, HyperDbg is the first debugger to address the issue by presenting a guaranteed stepping method. According to Figure 2c, after executing the target instruction, a VM-exit is triggered (as an MTF has been previously set). Doing so guarantees that only the succeeding instruction is executed in the debugging guest. To achieve this, HyperDbg continues on only one core and disables interrupts on the same core (ignoring external interrupts by setting the external-interrupt exiting bit in the VMCS) to offer fine-grained stepping. This method provides the user with the unique ability to instrument routines from user mode to kernel mode and from kernel mode to user mode, which is not possible through other kernel debuggers (e.g., WinDbg). As an example, whenever user-mode code executes a SYSCALL instruction, HyperDbg allows the user to follow the instructions directly into the kernel and execute the next instruction in kernel mode (the SYSCALL handler). Similarly, if a page fault occurs in the middle of a user-mode application, the debugger moves into the kernel-mode page-fault handler. Kernel-mode to user-mode migration is also handled by HyperDbg, e.g., executing a SYSRET or IRET returns the debugger to user mode from kernel mode.

4.1.3 Step-over. The step-over mechanism in HyperDbg is very similar to the conventional Step-in, except for the call instruction, where the debugger sends the length of the call instruction to the debuggee and, instead of setting the Trap Flag, sets a Hardware Debug Register to the instruction after the call. Therefore, when the call is finished, the Hardware Debug Register is triggered, and the debugger is notified about the next instruction. Since other threads/cores might also trigger the Hardware Debug Register (as all the threads/cores continue during the stepping), HyperDbg ignores such #DBs from other Thread IDs/Process IDs and re-sets the debug register until reaching the correct execution context and target thread that is supposed to trigger the Hardware Debug Register. Figure 2b shows an overview of the step-over mechanism in HyperDbg, where upon inspection of a call instruction, a debug breakpoint exception (#DB) is thrown for the next instruction.
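The step-over decision logic can be summarized in a short sketch. It models only the control decisions described above, with hypothetical function names and string return values chosen for illustration; it does not program real debug registers.

```python
def step_over_target(rip, instruction_length, is_call):
    """Return the address to arm in a hardware debug register (the
    instruction after the call), or None when a plain single step suffices."""
    return rip + instruction_length if is_call else None

def handle_debug_exception(hit_address, thread_id, armed_address, target_thread):
    """Decide what to do with a #DB raised while a step-over is pending."""
    if hit_address != armed_address:
        return "ignore"   # unrelated debug exception
    if thread_id != target_thread:
        return "rearm"    # another thread hit the address: re-set and continue
    return "break"        # the stepped thread returned from the call
```

For a five-byte call at 0x1000, the debug register would be armed at 0x1005, and only a hit from the thread being stepped hands control back to the debugger.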

Hooking Subsystem
Hooking in the context of debugging is the act of intercepting an arbitrary event (e.g., execution of a breakpoint on a particular address), running specific commands, and turning the execution flow back to the conventional routine at the entry point of the event.
Challenge. Existing hooking systems in commodity debuggers implement direct memory access, which user-mode software can easily check and detect. The integrity of memory can effortlessly be verified as well. This leaves the possibility of debugging detection for evasive malware. Moreover, Hardware Debug Registers used to record memory content in the debugging process are fixed in number and size, limiting hooking performance.
4.2.1 SYSCALL and SYSRET Hooks. HyperDbg implements hooking functionality by triggering an undefined opcode exception (#UD) (by clearing the SCE bit in the Extended Feature Enable Register, i.e., IA32_EFER) and checking for the originating cause of the exception. The user can execute arbitrary scripts and set hooks for arbitrary system calls through the OS (SYSCALL) or any return of the execution flow from a system call (SYSRET). During a user-to-kernel or kernel-to-user emulation, the debugger can monitor, execute, or modify the system context before the actual execution of the instructions. HyperDbg provides the following approach for its novel hooking capabilities.
Approach. HyperDbg allows the user to monitor and manipulate memory accesses while remaining transparent by providing two EPT hooking mechanisms that reveal an unmodified version of the target page to the application. This methodology delivers an entirely transparent memory hook via EPT. Furthermore, we emulate Debug Registers to increase address traceability, surpassing the previous limitations.

EPT Hidden Hook.
We propose EPT-level hooks that are not visible to the user-mode program or the operating system when attempting to read the hooked address. The first type of hidden breakpoints in HyperDbg are Classic EPT Hooks, which are achieved by injecting a #BP (0xcc) into the target machine's memory to cause a trap upon an attempt from the guest to execute the target memory address. The second variation of hidden hooks utilizes Detours-Style Hooks [10] (Inline EPT Hooks), which change the execution path by jumping to the patched instructions and returning the execution flow to the regular routine after the callback. While the latter approach has some flexibility constraints (e.g., limitations on the usage of the script engine, the range of hookable addresses, and the number of hooks in a page table), avoiding the costly VM-exit operation makes for a substantially faster hooking mechanism.
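The observable effect of a hidden breakpoint, where reads see the original bytes while instruction fetches see the patched byte, can be modeled as a page with two views. The sketch below is a simplified model under that assumption; the `HookedPage` class is hypothetical, and the actual EPT entry manipulation that switches between the views is not shown.

```python
BREAKPOINT_OPCODE = 0xCC  # INT3

class HookedPage:
    """Toy model of a page whose read view and execute view differ."""

    def __init__(self, original: bytes):
        self.read_view = bytes(original)       # what data reads observe
        self.exec_view = bytearray(original)   # what instruction fetches observe

    def hook(self, offset):
        """Place a breakpoint that is visible only to instruction fetches."""
        self.exec_view[offset] = BREAKPOINT_OPCODE

    def read(self, offset):
        return self.read_view[offset]

    def fetch(self, offset):
        return self.exec_view[offset]
```

An integrity check that reads or hashes the page therefore still sees the unmodified code, while executing the hooked address traps.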

Limitless Simulating of Debug Register (monitor).
EPT hooking also allows for monitoring any read/write to any range of addresses by causing an event trigger to emulate Hardware Debug Registers capability while eliminating its limitations on the number and lengths of trackable addresses [13].
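Because EPT permissions apply to whole pages, monitoring an arbitrary byte range implies protecting every page that overlaps the range and then filtering the resulting traps by address. The following sketch shows that arithmetic; the 4 KB page size and the idea of filtering inside the trap handler are assumptions made for illustration, and both function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KB EPT granularity

def pages_to_protect(start, end):
    """Page numbers covering the inclusive byte range [start, end]."""
    return list(range(start // PAGE_SIZE, end // PAGE_SIZE + 1))

def should_trigger(access_address, access_length, start, end):
    """True if the accessed bytes overlap the monitored range [start, end]."""
    last_byte = access_address + access_length - 1
    return access_address <= end and last_byte >= start
```

Accesses that fault on a protected page but fall outside the monitored range would simply be allowed to proceed without raising an event.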

Memory Access in VMX-root Mode
Implementation of safe memory access is one of the challenging parts of designing a hypervisor-level debugger, as there are many scenarios that can lead to a system halt or an exception (e.g., access to paged-out [40] pages in VMX-root mode [47], and access to user-space memory from VMX-root mode) that cannot be addressed using readily available primitive instructions (e.g., mov).
Challenge. Safe memory access at the VMX level is extremely complicated, as it is often handled by the OS. This often results in performance overheads and footprints in conventional debuggers. However, safe and efficient memory access is necessary for many use cases such as malware analysis.
Approach. We propose a series of methodologies to address the complications of VMX memory management, described as follows.

Discovering Page-table Entries. The conventional method in HyperDbg to detect a valid page is checking for the presence of a valid page-table entry (with the present bit set) for its target address. This method requires traversing the page tables to carry out the discovery process. As an alternative method, we make use of Intel TSX. TSX suppresses exceptions/faults without any switch between user and kernel modes. This ability is leveraged in HyperDbg to check the validity of a page by checking the successful execution of a transaction involving the target address. A similar approach has been used by Schwarz et al. [86] to check if an address in SGX is mapped. This method can be carried out using only a few instructions (Listing 1); however, as not all processors support this capability, HyperDbg automatically checks for the processor's support of this feature and switches to the former method if necessary. Our experiments show that TSX-based page discovery for user-mode debugging is roughly three orders of magnitude faster, since normal traversing requires the requests to be forwarded to the user for a validity check. However, in kernel-mode applications, the method incurs a 40% slow-down due to the domination of cycles introduced by RTM routines.

    ...
    JMP Return
    MOV RAX, 0
Return:
    RETN                ; Return the result

Listing 1: Using Intel TSX to detect address validity.
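The conventional page-table traversal mentioned above can be sketched as follows. The index extraction follows standard x86-64 4-level paging (9 index bits per level above a 12-bit page offset, present flag in bit 0). The nested-dictionary table layout and the function names are hypothetical, and large pages are deliberately ignored to keep the sketch short.

```python
PRESENT = 1  # bit 0 of a paging-structure entry

def table_indices(virtual_address):
    """Indices into the PML4, PDPT, PD, and PT for a 4-level walk."""
    return (
        (virtual_address >> 39) & 0x1FF,
        (virtual_address >> 30) & 0x1FF,
        (virtual_address >> 21) & 0x1FF,
        (virtual_address >> 12) & 0x1FF,
    )

def is_mapped(root, virtual_address):
    """Walk a toy table of the form {index: {"flags": int, "next": {...}}}
    and report whether every level has its present bit set."""
    table = root
    for index in table_indices(virtual_address):
        entry = table.get(index)
        if entry is None or not (entry["flags"] & PRESENT):
            return False
        table = entry.get("next", {})
    return True
```

The walk touches four entries per lookup, which is the traversal cost that the TSX-based probe avoids when the processor supports it.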

Retrieving a Page by Injecting Page Fault (#PF).
If a page is absent, HyperDbg injects a page fault into the debuggee (setting the CR2 register to the target virtual address) to request that VMX non-root mode bring the page back from the hard disk into RAM when it is resumed. While this method is not applicable in some scenarios (e.g., at DISPATCH_LEVEL, where paging is not available), it can be useful in many others (e.g., upon execution of a SYSCALL or SYSRET, where the system is guaranteed to be at PASSIVE_LEVEL).
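The decision of when page-fault injection is usable can be sketched as below. The IRQL values 0 to 2 and the page-fault vector 14 are the standard Windows and x86 values; the `VirtualCpu` class, the `request_page` function, and its string results are hypothetical and only model the bookkeeping, not an actual VM-entry event injection.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2
PAGE_FAULT_VECTOR = 14  # #PF
PAGE_SIZE = 4096

class VirtualCpu:
    def __init__(self, irql):
        self.irql = irql
        self.cr2 = None             # faulting address reported to the guest
        self.pending_vector = None  # event to deliver when the guest resumes

def request_page(vcpu, virtual_address, resident_pages):
    """Queue a #PF for a non-resident page if the guest can service it."""
    if virtual_address // PAGE_SIZE in resident_pages:
        return "resident"
    if vcpu.irql >= DISPATCH_LEVEL:
        return "unavailable"        # paging cannot be serviced at this IRQL
    vcpu.cr2 = virtual_address
    vcpu.pending_vector = PAGE_FAULT_VECTOR
    return "injected"
```

In this model the guest's own page-fault handler does the work of paging the data in, so the hypervisor never has to perform disk I/O itself.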

VMX-root Mode Compatible Message Tracing. Sending a message from VMX-root mode to VMX non-root mode is a challenging part of hypervisor design due to various limitations on accessing paged-pool buffers in VMX-root mode. Notably, most NT functions are not compatible with arbitrary IRQLs, as they might access buffers that reside in paged-pool memory. To send commands and messages from VMX-root mode to the user-mode application or the debugger, HyperDbg provides a custom VMX-root mode compatible message tracing mechanism. This mechanism operates on the non-paged pool, and its memory is visible in VMX-root mode. By deploying specialized messaging buffers, we ensure that the messages are only sent when paging is safely accessible in kernel mode. The details of this mechanism are thoroughly discussed in [83].

4.3.4 Reading and Writing Memory. Due to the various safety considerations surrounding direct access to a user-space address from VMX-root mode, HyperDbg is designed not to access the memory directly but to use a virtual addressing method that reserves a page-table entry and maps the desired user-mode physical address to a kernel-mode virtual address to enable safe memory read/write access. Furthermore, the write-enable bit in the PTE eliminates the need to check the writability of the target address.
4.3.5 Pre-allocated Pools. Given that most of HyperDbg's routines operate in VMX-root mode, HyperDbg makes use of pre-allocated pools to address the conventionally impossible [47] task of allocating memory in VMX-root mode. These pools (when divided into 4KB granularity) provide the resources necessary for EPT hooks. HyperDbg's memory manager routines periodically check for any deallocation/replacement of memory pools needed in VMX-root mode and perform them when the debuggee is in VMX non-root mode.
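The split between consuming buffers where allocation is impossible and replenishing them where it is allowed can be sketched as a small pool. The `PreallocatedPool` class and its method names are hypothetical; the sketch only illustrates the deferred-replenishment pattern described above.

```python
class PreallocatedPool:
    """Buffers are handed out without allocating and refilled later."""

    def __init__(self, count, size=4096):
        self.size = size
        self.free = [bytearray(size) for _ in range(count)]
        self.refill_requests = 0

    def take(self):
        """Models VMX-root mode: never allocates, may return None."""
        if not self.free:
            return None
        self.refill_requests += 1
        return self.free.pop()

    def replenish(self):
        """Models VMX non-root mode, where normal allocation is safe."""
        while self.refill_requests:
            self.free.append(bytearray(self.size))
            self.refill_requests -= 1
```

A caller in the restricted context must handle an empty pool gracefully, since the refill only happens the next time the periodic routine runs in the unrestricted context.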

FRONT-END ARCHITECTURE
In the following section, we explore the intermediary components of HyperDbg that connect the back-end VMX-root mode sub-systems with the host machine, as well as the user-interface functionality. Specifically, we describe guest-host communication and the kernel-level script engine. Although the core functionality of the proposed script engine operates in the guest side's VMX-root mode, we regard all non-VMX-root sub-modules in our framework as front-end here.


Communicating and Task Appliance
Using the Windows API for network data transmission in a debugger is impractical because interrupts are unavailable in VMX-root mode (which forces communication into polling mode) and because extra implementation is needed, as Windows uses different device stacks at different IRQLs for networking. Owing to these challenges, HyperDbg utilizes serial ports for data transfers, as this simplifies many aspects of design and usability and enables the use of polling mode. Figure 3 shows a general overview of the HyperDbg communication routine. In addition to serial communication, which is the conventional mode of communication in HyperDbg, KDNet functionality [68], which is publicly available from the Windows SDK [4], has also been incorporated in HyperDbg as an alternative and more modern means of communication.
5.1.1 Sending Data over Serial. Following the connection initialization between a serial device and its corresponding serial port, a connection to the target device can be established by providing the COM argument. HyperDbg supports up to four different serial ports at a time. Furthermore, halting a debuggee is performed by sending an interrupt signal using the interrupt mode of the serial device, which eliminates the need for polling-mode checks while the debuggee is running. The interrupt to the user-mode application of the debuggee is passed down into kernel mode, where eventually a VMCALL is invoked to put the debuggee into the paused state in VMX-root mode and await further commands (packets) from the debugger.

Communication between Cores.
When an event is triggered, HyperDbg checks for a corresponding action. In the case of a break action, it halts every other core in VMX-root mode by sending Non-Maskable Interrupts (NMIs) [24], which cause each core to spin on a spinlock, invoke a VM-exit, and await further commands from the debugger. Otherwise, it executes the custom code/script without notifying the other cores.

Kernel-level Script Engine
Modern-day debuggers fall short in providing a high-performance and highly customizable scripting framework. Striving to address this gap and faced with the lack of support for direct access to memory in VMX-root mode, we designed a VMX-enabled script engine from scratch. As illustrated in Figure 4a, the script engine comprises a back-end (that uses LL(1) and LALR(1) parsers for maximum efficiency) and a front-end that uses a MASM-style syntax with C keywords (e.g., if, else, for) and an easily customizable grammar. The user-supplied scripts are delivered to the front-end host, scanned via a lexer, and parsed into an Intermediate Representation (IR), which is sent in a buffer over the serial interface into the guest's kernel VMX-root mode for execution. Afterward, a buffer is gradually filled with the execution results and transmitted back to the host. This approach offers substantial performance improvement compared to the conventional bidirectional method used in commodity debuggers (where commands and scripts are sent and parsed line by line) by sending the entirety of the script into VMX-root mode, and the response back into user mode, in a unidirectional flow.
As illustrated in Figure 4b, it is also possible to set a script as the action of an event. In this scenario, the parsed IR script is stored in the VMX-root kernel once, and when its corresponding event is triggered, the IR is executed locally, thus improving the execution performance of the script engine. A sample script with a detailed description of the example is provided in Appendix E.
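The "parse once, execute on every event" idea can be sketched with a toy stack machine. The IR format, the opcode names, and the helper names (`compile_equals`, `run_ir`, `EventAction`) below are illustrative assumptions only; they are not HyperDbg's actual IR or API.

```python
def compile_equals(register, constant):
    """'Host side': turn the check `register == constant` into a tiny IR.
    This is the only place where any translation work happens."""
    return [("load_reg", register), ("push", constant), ("eq", None)]


def run_ir(ir, registers):
    """'Guest side': evaluate the stored IR on a stack machine.
    No lexing or parsing is needed at this point."""
    stack = []
    for op, arg in ir:
        if op == "load_reg":
            stack.append(registers[arg])
        elif op == "push":
            stack.append(arg)
        elif op == "eq":
            right, left = stack.pop(), stack.pop()
            stack.append(left == right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


class EventAction:
    """An event whose action is a pre-compiled script."""

    def __init__(self, ir):
        self.ir = ir      # stored once, when the event is registered
        self.hits = 0

    def on_event(self, registers):
        # Called every time the event triggers; evaluated locally.
        if run_ir(self.ir, registers):
            self.hits += 1
            return True
        return False
```

For example, `EventAction(compile_equals("rax", 0x55))` can be triggered any number of times with different register snapshots, while the translation into IR is paid only once.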

TRANSPARENCY ANALYSIS
In this section, we investigate the side effects and overhead created by HyperDbg that could potentially be exploited for detection. We further analyze different levels of transparency using malware anti-debugging methods. Furthermore, we propose a statistical approach for hardening HyperDbg against timing side-channel attacks targeting sub-OS intercepting entities.

Hypervisor Detection Methods and Mitigations
Detection of sub-OS third-party programs (e.g., hypervisors) is carried out by querying for a set of indicative footprints, such as registry keys, system calls (e.g., to discover running processes and loaded drivers), and instructions [94] (e.g., CPUID, IDT, LDT). HyperDbg counters these endeavors by intercepting the attempt, forcing a VM-exit, and emulating the corresponding return values with those of a normal, non-virtualized environment in the VM-exit handler. Table 1 provides a comprehensive overview of these methods. More sophisticated hypervisor/VM detection methods exploit timing side channels. The key idea is that certain instructions (e.g., CPUID, GETSEC, INVD, XSETBV) cause a VM-exit when executed. If the target program is running in a VM, this results in a longer execution time than on bare metal, which can be detected by timing measurements. Listing 2 shows an example of such attacks. In the following, we describe the mechanisms in HyperDbg to counter these detection methods.

Timing Transparency in HyperDbg
HyperDbg's transparent mode offers a solution for hiding the virtualization timing leakage by identifying VM-detecting sequences and replacing the timing values with those of a non-virtualized system. To the best of our knowledge, HyperDbg is the first debugger to offer a practical means to modify timing footprints used by analyzer software to detect virtualized environments. By using a statistical model of the execution time, extensive timing profiling is executed prior to the launch of the VMM module in order to emulate timestamps as close to the normal operating condition of the guest OS as possible. We use a two-term Gaussian Distribution as a regressor function, as our experiments indicate that it can be a good fit for modeling the execution times of such nature. Figure 5 shows the Probability Distribution Function (PDF) of our measurements by running 10k executions of the sequence in Listing 2, with and without HyperDbg enabled. These values can be derived and the statistical parameters can be recorded for emulation purposes.
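To make the statistical model concrete, the following sketch evaluates a two-term Gaussian curve and draws emulated cycle counts from it. Several things here are assumptions rather than details from our implementation: the parametrization f(x) = a1·exp(-((x-b1)/c1)²) + a2·exp(-((x-b2)/c2)²), the interpretation of the fitted curve as a two-component mixture for sampling, and the numeric values in `PARAMS`, which are placeholders and not measured data.

```python
import math
import random

# Hypothetical fit parameters (a1, b1, c1, a2, b2, c2); not measured values.
PARAMS = (0.8, 120.0, 6.0, 0.2, 150.0, 10.0)


def two_term_gaussian(x, a1, b1, c1, a2, b2, c2):
    """Value of the two-term Gaussian curve at x."""
    return (a1 * math.exp(-((x - b1) / c1) ** 2)
            + a2 * math.exp(-((x - b2) / c2) ** 2))


def sample_cycles(params, rng):
    """Draw one emulated cycle count, treating the curve as a mixture.

    Each term a*exp(-((x-b)/c)^2) has area a*c*sqrt(pi), so the mixture
    weights are proportional to a*c, and each component has standard
    deviation c/sqrt(2).
    """
    a1, b1, c1, a2, b2, c2 = params
    w1 = a1 * c1
    w2 = a2 * c2
    if rng.random() < w1 / (w1 + w2):
        mean, width = b1, c1
    else:
        mean, width = b2, c2
    value = rng.gauss(mean, width / math.sqrt(2))
    return max(1, round(value))  # a cycle count is a positive integer
```

A hypervisor using such a model would record the fitted parameters during profiling and later substitute a sampled value for the measured one whenever a timing-based detection sequence is identified.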
HyperDbg currently covers the VM timing leakage with two methods: 1) adjusting the MSR that keeps track of the CPU's time, referred to as IA32_TIME_STAMP_COUNTER, and 2) emulating the results of the RDTSC and RDTSCP instructions, which provide the means for user-level applications to access the CPU timing values. Each of these methods comes with its own set of advantages and drawbacks. The former approach does not require a VM-exit to function, which simplifies the solution and allows for covering more complex VM-detection sequences, but it might increase instability and interfere with the normal functionality of the system, as other applications also rely on this mechanism for their timing measurements. In contrast, the latter method (Figure 6) does not interfere with the inner workings of the system, but requiring a VM-exit adds a layer of complexity, as emulating the timing values expected by an examiner program armed with sophisticated patterns for VM-detection would require extra considerations.
The transparency function can be used on a process or a list of executables, as global emulation of timing instructions would most likely disturb primary functionalities of the system (our experiments show disturbances in the screen driver, as well as audio output performance when a global emulation is implemented).

Alternative Timing Attack Methods
In addition to the methods explained in Section 6.2, HyperDbg can impede various forms of timing attacks used by malware to expose the presence of a debugger. Since HyperDbg operates at the hypervisor level, it can safeguard against other timing attacks that utilize alternative timing resources, such as shared ticks and Hardware Performance Monitor Counters (RDPMC), both of which HyperDbg's Transparent Mode has built-in support to defend against. Although timing-thread attacks are among the more challenging attacks to defend against, it is possible to detect such attempts using any known timing resources using HyperDbg. HyperDbg is actively adding methods that safeguard against new techniques used by malware for debugger detection.

EVALUATION
In this section, we thoroughly evaluate HyperDbg's transparency and performance in different scenarios.

Transparency Evaluation
We evaluate the transparency mode of HyperDbg using two of the best-known tools that offer stress-testing for anti-debugging and protection methods, pafish [3] and al-khaser [70].
In accordance with our expectations, the first method, which involves updating the IA32_TIME_STAMP_COUNTER, interferes with the primary functions of the system and causes screen flickering during our experiments. In contrast, the second method (emulation) was able to successfully pass these tools when enabling emulation for the anti-debugging testing software. As an extension of our transparency analysis, we separately evaluate HyperDbg against common anti-debugging methods and commercial off-the-shelf packers/protectors.

Evaluation Configuration.
In our experiments, we analyzed 10,853 malware samples in different categories derived from a malware database [93]. Each of these malware samples is executed in HyperDbg's normal and transparent mode in the VMI Mode as well as x64dbg (user-mode debugger) and WinDbg (kernel-mode debugger) in Microsoft Windows 10 20H1 for comparison. We employ a client/server paradigm to distribute the samples among the client systems that execute the binaries and record logs from the executions. Each client asks the server for its corresponding sample over a simple HTTP application. We use two approaches to restore the system: a rebootless, Barebox-based method [50] and an automated system restoration method based on Windows System Restore.

Barebox-based Approach
We first attempt to run the malware under a setup that aims to rebootlessly restore the system using a set of Barebox-based methods to decrease the restoration time and improve performance. After fetching the malware from the server, the client globally disables interrupts and continues the main thread on a single core. While this can impose some performance penalties and communication issues with external devices, by removing the context switch to other processes, it simplifies the system restoration process and allows for observation of the effects of malware on the system exclusively, as it prevents unnecessary modifications to the structures unrelated to the subject binary.
Next, the automation program loads the binary and locates its entry point by capturing the page fault that follows in the initialization phase of the execution of a binary in Windows. A hardware breakpoint is then applied at this address (entry point), which allows HyperDbg to be notified at the beginning of the program's execution, once Windows is done with the initialization process of executing the binary. HyperDbg uses this to trigger the mechanism used for rebootless restoration of the contents of the memory.
To increase the performance of the testing process by limiting the restoration only to the modified pages, we make a snapshot of the clean installation of Windows by disabling writing on pages, which is accomplished by clearing every write bit on EPT pages. This results in an EPT violation for a write request. We use the handler of this violation to make a clone of the pages that the malware attempts to write and then release the write lock to allow the binary to continue with its normal execution. While at the beginning of the execution, this can decrease the performance and sometimes make the system unresponsive, the performance improves as the execution progresses. To mitigate any resulting artifacts, HyperDbg is set to run in Transparent Mode.
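The clone-on-first-write scheme can be modeled in a few lines. This is a simplified simulation under stated assumptions: a dictionary stands in for guest memory, a set stands in for the EPT write bits, and the names (`WriteProtectedMemory`, `restore`) are hypothetical rather than taken from HyperDbg.

```python
class WriteProtectedMemory:
    """Toy model of a snapshot that clones pages on their first write."""

    def __init__(self, pages):
        self.pages = dict(pages)   # page number -> contents
        self.writable = set()      # pages whose 'write bit' is set
        self.clones = {}           # original contents of modified pages

    def write(self, page, data):
        if page not in self.writable:
            # Models the EPT violation raised by a write-protected page.
            self._on_write_violation(page)
        self.pages[page] = data

    def _on_write_violation(self, page):
        # Keep a copy of the clean page, then release the write lock so
        # that later writes to this page proceed without interception.
        self.clones[page] = self.pages[page]
        self.writable.add(page)

    def restore(self):
        # Only the pages that were actually modified are rolled back.
        for page, original in self.clones.items():
            self.pages[page] = original
        restored = len(self.clones)
        self.clones.clear()
        self.writable.clear()      # re-arm write protection
        return restored
```

The model also shows why the slowdown is front-loaded: each page costs one interception on its first write, and subsequent writes to the same page are free.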
Once the execution is finished, and the logs are saved using the VMX-root Mode Compatible Message Tracing Mechanism (Section 4.3.3), the interrupts are re-enabled, and our master program updates the order of runs and proceeds to restore the system. To restore the information on the volatile memory, we replace the pages that were cloned during the execution process back to their original state, while for disk writes, we use Shadow copy [69] combined with a mini-filter driver that monitors the modification/creation/deletion of the memory/registry to restore the disk to its initial state from the clean installation. This process takes around 7 minutes per malware sample.
If the system cannot be restored with this approach, we resort to the second approach, which requires a reboot and is based on Windows Restore Point. These cases include restoration of the system for bare metal executions, rootkit/bootkit samples, and cases where the execution encounters an error. A malware sample is classified as rootkit/bootkit using the tags provided by the malware providers and by detecting any attempt to load drivers. The success of the execution is measured using a Win32 API call [67].
Windows System Restore Approach
As an alternative approach, we rely on the Windows System Restore functionality. As there is no command-line tool for restoring snapshots, we implement a small tool to use the GUI tool automatically. This method takes about 25-35 minutes to restore the system, based on the state of the system and the modifications made by the malware.
Overall, the testing process takes about 468 hours (168 hours testing malware with HyperDbg and 100 hours each for testing bare-metal, x64dbg, and WinDbg). We use 10 systems for this test.
Table 1 describes the common anti-debugging and anti-virtualization methods [1,102], and HyperDbg's countermeasures to impede detection. Each of these methods is applied separately in HyperDbg's Debugger mode, activating the suitable countermeasure to verify the transparency of the proposed debugger. Furthermore, for an end-to-end transparency analysis, all the mitigation techniques are activated. We employ a combination of rebootless and reboot-based approaches for system restoration after the execution of each malware sample.
We observe that a relatively large percentage of the samples detect the debugging environment in WinDbg and x64dbg and change their behavior accordingly to conceal their malicious behavior. Considering WinDbg as the baseline debugger, Figure 8a reports the percentage of successfully executed malware samples where the debugger is attached. For this experiment, we measure the success rate of the execution of malware samples by carefully logging the syscall sequence in the target system by hooking the syscalls (changing IA32_LSTAR). As shown in Figure 8a, HyperDbg's Transparent mode increases the transparency surface by 22% compared to WinDbg, executing malware samples in all four categories while remaining undetected. This is because HyperDbg operates at the hypervisor level, which minimizes the footprints that anti-debugging/VM methods in malware use to detect a debugging environment. We manually investigated malware samples that detect the presence of HyperDbg and reverse engineered the binaries to inspect their inner workings. We could attribute the detection of HyperDbg to two main factors: the employment of hypervisor-specific techniques that lead to the unsuccessful execution of malware in already virtualized environments, and the utilization of methods that detect the absence of PatchGuard or Driver Signature Enforcement (DSE) in the system. These methods can be countered by adding support for nested virtualization (cf. Section 10) and obtaining a valid driver signature. However, this does not imply a claim of full invisibility upon the addition of said improvements. For example, HyperDbg is a new and open-source tool, and it is possible for malware producers to focus on the stealthiness methods employed in this tool and find new methods to counter those efforts and reveal its presence.

Syscall Malware Analysis.
The transparent mode of this debugger offers the advantage of stealthily monitoring malware's execution. Figure 7 shows the syscall execution flow of a sample malware on HyperDbg. For a high-level comparison, it is possible to extract the execution flow and the divergence point of evasive malware here. One can execute the malware on a bare-metal system with no debugging present rather than in HyperDbg's transparent mode for monitoring purposes. We attach a kernel-mode debugger (WinDbg) to the malware, execute the malware, and record the syscall execution flow. We use a simple script written for HyperDbg's script engine to trace the SYSCALLs in the execution flow (see Appendix E). As Figure 7 depicts, the execution flow of the malware does not follow a similar behavior in the different environments. From this simple analysis, we can conclude that this specific malware uses anti-debugging methods. To conceal its malicious intention, an entirely different (and most likely safe) execution path is chosen within the malware code when a debugger is detected. The same approach is used to measure whether the malware samples are running or not, as shown in Figure 8a.
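Extracting the divergence point from two recorded syscall flows amounts to finding the first position at which the sequences differ. The sketch below assumes each trace is a plain list of syscall names; the function name `divergence_point` and the two example traces are made up for illustration and are not taken from Figure 7.

```python
def divergence_point(trace_a, trace_b):
    """Return the index of the first differing syscall, or None if the
    two traces are identical."""
    for index, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return index
    if len(trace_a) != len(trace_b):
        # One trace is a strict prefix of the other.
        return min(len(trace_a), len(trace_b))
    return None


# Hypothetical traces: one recorded transparently, one under a debugger
# that the sample detects.
transparent = ["NtOpenFile", "NtQueryInformationProcess",
               "NtCreateFile", "NtWriteFile"]
detected = ["NtOpenFile", "NtQueryInformationProcess",
            "NtTerminateProcess"]
```

Here `divergence_point(transparent, detected)` yields index 2, the first syscall issued after the (hypothetical) debugger check, which is where a manual analysis would start.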

Evaluation by Packers and Protectors
Testing. We test HyperDbg with binaries processed with packers and protectors. These binaries are tested on different debuggers as well as HyperDbg in both regular debugging and transparent mode debugging. Table 2 shows the results of attaching and debugging these protected binaries.

Performance Evaluation
In terms of performance, we analyze HyperDbg in three debugging scenarios, which are discussed in the following. For our performance evaluations, we used a machine with an Intel Core i7-6820HQ with 16GB of main memory, running Windows 10 20H1.

Performance Analysis of Scenario 1: Step-in. Single stepping is one of the most fundamental functionalities of a debugger and has been carefully optimized in HyperDbg to be as fast as possible. To evaluate the performance of this functionality, we considered 100 sets of 65,536 predefined instructions (a particular application). HyperDbg was able to instrument the instruction sets on average in 6 minutes and 51 seconds (μ = 411 seconds) with a standard deviation of σ = 28.3 seconds. It took WinDbg on average 1,221 seconds with a standard deviation of σ = 118.4 seconds to perform the same function on the same instruction sets. Thus, HyperDbg takes about 2.97 times less time on average to execute the same analysis compared to WinDbg's instrumentation. For a fair comparison with WinDbg, we also used
An essential part of analyzing binaries is conditional breakpoints, which HyperDbg aims to implement robustly and considerably faster than almost all currently available debuggers.
To evaluate the performance of HyperDbg on conditional breakpoints, we set one on the frequently used nt!ExAllocatePoolWithTag function, checking whether the RAX register contains a specific value. As the performance metric, we count the number of times the condition was checked within 5 minutes for both HyperDbg and WinDbg. To mitigate timing noise in our results, we repeat the experiments 50 times. As the baseline of the performance, WinDbg checks on average 6,941 conditions. In the same time, HyperDbg checks on average 9,153,731 conditions with its classic implementation of EPT Hook and on average 23,214,792 conditions with its detours-style EPT Hook.
These results show that HyperDbg's script engine achieves average speedups of 1,319-fold and 3,345-fold compared to WinDbg with the classic EPT Hook and the detours-style EPT Hook, respectively. This significant speed gain comes from the fact that, by design, HyperDbg checks and evaluates scripts directly in the kernel and VMX-root mode and does not need the assistance of user mode to this end. Thus, in contrast to WinDbg, nothing is transferred to the debugger during the execution of the script.
This difference is also visible in the system's overall performance during the execution of the benchmarks on the debugger. In WinDbg, the system slows down to the point that it appears to have come to a halt, since not even the most basic computations, such as cursor movements, are properly processed. In HyperDbg's case, even though the system is still slow, it remains usable. Therefore, other tasks can still be performed on the system, which allows alteration and addition of new conditional breakpoints while the test is performed.

Performance Analysis of Scenario 3: Analyzing Syscalls. Setting breakpoints on syscalls is another scenario that can be used for evaluating the performance of HyperDbg.
Generally speaking, it is not possible to set a breakpoint on syscall-handler routines in other debuggers like WinDbg. However, it is possible to trace system calls by setting breakpoints on functions responsible for dispatching the SYSCALL numbers. In HyperDbg, it is possible to set breakpoints on syscall-handler routines and to emulate system calls. For the performance evaluation, we perform 50 experiments, each lasting 300 seconds. WinDbg executes 2,559 syscalls, while in the same time HyperDbg executes 5,166,430. Hence, HyperDbg is on average ∼2018x more efficient than WinDbg in tracing syscall routines.
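The speedup figures reported for the three scenarios follow directly from the raw measurements. The short computation below reproduces them; the dictionary keys are labels chosen for this sketch.

```python
def ratio(larger, smaller):
    """Plain ratio of two measurements."""
    return larger / smaller


ratios = {
    # Scenario 1: WinDbg time / HyperDbg time (seconds).
    "step_in": ratio(1221, 411),
    # Scenario 2: conditions checked by HyperDbg / by WinDbg in 5 minutes.
    "conditional_classic": ratio(9_153_731, 6_941),
    "conditional_detours": ratio(23_214_792, 6_941),
    # Scenario 3: syscalls traced by HyperDbg / by WinDbg in 300 seconds.
    "syscalls": ratio(5_166_430, 2_559),
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```

Rounded, these give 2.97x for step-in and 1,319x and 3,345x for the two EPT hook variants. The syscall ratio evaluates to roughly 2018.9, so the reported ∼2018x corresponds to the truncated value.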

APPLICATIONS
With the privileged access level and the newly-presented APIs, HyperDbg can be used in many applications.

Debugging Devices
HyperDbg supports the general functionality of any other debugger, i.e., pausing and stepping through instructions, reading and writing memory, reading and modifying registers, and putting breakpoints anywhere in the program. In addition, it offers many other events to ease the debugging process.
One of the unique capabilities of HyperDbg is its ability to debug the communications of the system with external devices. The user can monitor each x86 I/O port separately for port mapped I/O (PMIO) devices and use EPT to monitor Memory Mapped I/O (MMIO) devices. Since I/O instructions and EPT modifications are treated as events in HyperDbg, the user is able to monitor the executions of IN and OUT instructions and create separate logs. Moreover, it is also possible to modify the registers in the script engine and thereby deliver the modified values to the operating system. In addition to debugging Port Mapped I/O and Memory Mapped I/O, HyperDbg is also capable of notifying the user about interrupts from external devices. For example, HyperDbg can be configured to intercept any particular interrupt from an external device (e.g., a PS/2 keyboard) and allow the user to halt Windows to investigate the device in case it occurs, or simply ignore the interrupt and allow the operating system to continue normally.

Fuzzing
One of the main problems of kernel fuzzing is the fact that every invalid value causes a kernel error and thus a BSOD. HyperDbg can avoid these errors by handling them even before the OS is notified and help fuzzing (e.g., by measuring the code coverage). As HyperDbg resides on a more privileged ring than the kernel, it can intercept code-level exceptions that lead to the crash of the system (application/OS) (e.g., page faults or division by zero) and discard the crash before calling the OS error handling routines.
HyperDbg's script engine provides the possibility to execute brute-force tests for a target program using simple scripts. For instance, one can re-execute a target function arbitrarily often, each time with different parameters.
The proposed instrumentation step-in procedure in HyperDbg forces the system to only run the specific process without switching to other processes. Consequently, the CPU only executes the targeted code during the fuzzing process and returns the program flow to the initial state of fuzzing if any crash appears. Using the script engine, it is then possible to prepare the CPU for the next stage of fuzzing with new parameters in an entirely automated mechanism. This method results in a fine-grained approach to fuzz both user and kernel programs with high-performance execution.

Malware Analysis
As another essential application, HyperDbg features a transparent debugging tool that can be used for evasive malware analysis. Given the unique toolset of HyperDbg, online malware analysis is armed with a high-performance run-time script engine, which makes the process effective and substantially faster. We describe a simple and transparent syscall malware analysis using HyperDbg in Section 7.1.3. In the following, we survey the applicability of HyperDbg to a Windows vulnerability.

Analysis of a Vulnerability: A Case Study. During our experiments, we rediscover a full-kernel mode Bootkit known as Pitou [14], to which the latest Windows versions are still vulnerable. We briefly describe Pitou as a case study analyzed by HyperDbg.
Pitou is able to attack the victim system by bypassing the user access control and performing privilege escalation, which enables it to infect the Master Boot Record (MBR). This allows it to inject its kernel payload at the time of Windows startup without facing any resistance from the Kernel Mode Code Signing (KMCS) policy. Pitou is then able to take control of the lowest-level components of the OS (e.g., the Windows network driver, NDIS) and utilize VM-level code that is not executable natively on Windows to obfuscate itself from conventional disassemblers, which makes it much more difficult to analyze. To the date of writing, it is still able to infect systems running the latest version of Microsoft Windows with a 0-day local privilege escalation.
Pitou also employs advanced anti-debugging and anti-sandboxing techniques that look for any traces of execution in a non-native execution environment by performing inspections on the Windows registry, kernel modules, disk devices, BIOS memory, and measurement of CPU ticks using RDTSC. These methods have been shown to be updated by the creator of the malware over time. In our tests, the malware detects the debugger environment with some of the most well-known and widely used debuggers like WinDbg, x64dbg, and Ollydbg [76]. It deviates from its normal behavior on every other debugger. However, with HyperDbg's transparent mode, we successfully execute the malware and perform an extensive dynamic analysis to reverse-engineer its execution flow.

Digital Forensics & Incident Response (DFIR).
HyperDbg can be used extensively in DFIR to detect signs of attacks. For instance, the script engine of HyperDbg can be utilized for developing a pre-built plugin to monitor the top abused APIs/syscalls under user-specified conditions and on any subset of the processes (e.g., critical system processes only), allowing the inspector to balance the conciseness and thoroughness of the logs based on their preference. Additionally, HyperDbg is capable of classifying the APIs into different categories of attacks (e.g., code injection, keylogging, or discovery) and transmitting the results over TCP/Named Pipe/File using Event Forwarding.

Attempt to Exploit Detection.
HyperDbg can be used to detect many exploitation techniques. Often, exploits modify a special structure as the final payload, such as the token of a process [17,46].
In this example, HyperDbg can be used to monitor the TOKEN structure and detect any access (or more precisely, any write) to this structure. After that, this abnormal behavior can be traced back to reach the initial phase of the exploit and reveal its method.
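The monitoring step can be pictured as a write watch over the address range occupied by the structure. The following model is a simplification under stated assumptions: the class `WriteWatch`, its methods, and the addresses used with it are hypothetical, and a real deployment would rely on EPT write interception rather than explicit calls.

```python
class WriteWatch:
    """Toy model of a write monitor over one address range."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.log = []  # (source, address, length) of each observed write

    def covers(self, address, length=1):
        # True when [address, address + length) overlaps the watched range.
        return address < self.base + self.size and self.base < address + length

    def on_write(self, address, length, source):
        # Called for every intercepted write; records only relevant ones.
        if self.covers(address, length):
            self.log.append((source, address, length))
            return True
        return False
```

The recorded `source` field (for instance, a process identifier or an instruction pointer) is what allows the abnormal write to be traced back toward the initial phase of the exploit.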

Software Performance Analysis
HyperDbg can be used for performance and security analysis in software development and testing. For example, the highly optimized methods available in HyperDbg can be utilized for intercepting events such as page faults, with marginally superior performance compared to alternative means and methods used in a user-mode analysis tool [7,8,42].
HyperDbg can detect page faults in both the operating system and applications. In previous works, Shadow Paging, Page Tracking, and Pseudo-paging methods were used to detect page faults [42]. Detecting page faults is beneficial in the evaluation of applications that opt to improve their performance by minimizing the frequency of page faults. HyperDbg can detect page faults by exploiting exception bitmaps and exposing them as an event. Using this method, HyperDbg can provide the exact address of the fault area (CR2) for further investigation. This method is transparent to the operating system and does not change its semantics.

RELATED WORK
Developing a debugger and low-level software analyzer has been regarded as a crucial topic for the computer community due to its impact and applicability in a wide range of scientific research and industrial products. The implications can be generally categorized into two main groups: 1) Hardware-level malware analysis and 2) System isolation, monitoring, and sandboxing.
Over the past decade, many researchers have proposed several debugging methods based on the ring -1 (sub-OS level) infrastructure to address these issues. However, in terms of transparency level, applicability, performance, and generalization, these tools fail to present a suitable solution for the community. HyperDbg is an open-source, general hardware-assisted debugger that aims to provide researchers and computer engineers with a tool to help resolve the aforementioned issues.
Hardware-level malware analysis. Malware developers have managed to develop many strategies and techniques that allow them to escape from almost every form of detection methodology, including virtualization, debugging, and emulation techniques. Anti-debugging and anti-virtualization techniques used in early malware [12] employ numerous evasion methods to hide or reduce malicious activities. These anti-detection methods are analyzed comprehensively in the recent study by Galloro et al. [34], where over 92 classes of evasive techniques executed by modern malware are examined.
Furthermore, hardware-based artifacts such as processor's cache actualization [80], scheduling leakage in simultaneous multithreading (SMT) [63], as well as timing side effects [77] of monitoring facilities can be observed by evasive malware.
As thoroughly discussed by Garfinkel et al. [36], achieving full transparency against malware running in a virtualized environment is extremely challenging. Considering all of these issues, previous work proposed frameworks such as Apate [87] to hide debugging procedures from malware. Likewise, other work proposes resilient malware detectors against evasive malware using hardware features [45,90]. Leon et al. [54] study the possibility of utilizing hypervisors to detect, deactivate, and analyze evasive malware by employing low-level processor features.
Unlike previous solutions, which merely focus on transparency rather than functionality, our approach to malware analysis in HyperDbg provides a richly equipped debugging facility by pushing the deployment of more complex functionalities deeper into the hypervisor. This approach not only provides transparency but also gains significant performance and rich functionality, all together in a single framework, making HyperDbg applicable for real-world malware analysis.
System isolation, tracing, and sandboxing. Due to the increasing complexity of malware evasion techniques, researchers have recently evolved the environment from VM-based sandboxes such as CWSandbox [95] and Cuckoo sandbox [33] to bare-metal sandboxes like BareBox [50] and BareCloud [51] to minimize the leakage of the virtualization environment. Pioneered by Ether [27] as the first hypervisor-based analyzer with a higher transparency level, Malt [102] and Ninja [73] target Intel's SMM and Arm's TrustZone to present hardware-level debugging and process tracing as well as sandboxing, primarily aiming at malware debugging. Although transparent to some level, all these works provide simple functionalities and low-speed tracing, making them unsuitable for deep and dynamic code analysis. HyperDbg addresses these shortcomings by providing real-time user-specified debugging functionalities using a VMX-based script engine. Furthermore, even though the hardware overhead is negligible for most previous solutions, the total debugging execution flow is prolonged due to the continuous ring transitions needed to perform dynamic code analysis. This drawback is fundamentally solved in HyperDbg's design.
Though designed as a debugger, HyperDbg delivers high-level transparency for low-level sandboxing and isolation. Moreover, its architectural design and VMX-enabled script engine provide accurate and fast process tracking of arbitrary binaries.
Feature comparison among existing debuggers. A comprehensive feature comparison is given in Table 5 in Appendix D.

DISCUSSIONS AND LIMITATIONS
Transparency. Several timing attacks for detecting HyperDbg are impeded. Still, by relying on external timing resources such as NTP, a binary can leverage measurement methods beyond the domain of the local system to detect HyperDbg's timing adjustments. These methods for debugger detection can potentially be mitigated in a case-specific analysis of a binary; however, the development of a general solution for this set of approaches is considered out of scope for this paper.
Stability. Direct emulation of system-wide general-purpose mechanisms (such as modifying MSRs, cf. Section 6.2) might interfere with the normal functionality of other applications that use timing measurements. Thus, we recommend using the second method (i.e., emulation of the RDTSC and RDTSCP instructions) for already up-and-running test-bed environments. Furthermore, the Barebox-based restoration process requires manual restorations in some scenarios where Windows uses Asynchronous Procedure Calls (APC), as the inter-processor communication is disabled in such test cases [16]. We plan to develop a general testing framework and release it publicly as part of the HyperDbg project.
Performance. Although faster than similar debuggers, our experiments show cases where HyperDbg faces slowdowns due to an excessive number of VM-exits. To minimize the overhead, unnecessary VM-exits can be reduced, and emulation of the system at the early stage of VM-exits can be avoided to improve the overall performance of the system. For instance, the user might specify a particular core on which to apply the events.
Future Work. In future releases, we intend to make a UEFI-based module for HyperDbg, which permits HyperDbg to run compatibly with Windows Virtualization-based Security (VBS). We also plan to add support for nested virtualization in HyperDbg to support the execution of nested guest VMs inside the virtualized environment of HyperDbg.

CONCLUSION
With the expanding hardware support in modern processors, it is now more crucial than ever to employ hardware-assisted techniques in software debugging. Common software debugging solutions rely on traditional OS-dependent APIs for code functionality analysis, vulnerability detection, and reverse engineering. Modern packed software and evasive malware employ sophisticated anti-debugging methods to hide their primary functionality and withstand reverse engineering attempts with widely used debugging solutions. This paper presents HyperDbg, an open-source hypervisor-level debugging tool designed with transparency and performance in mind. HyperDbg exploits Intel VT-x and Intel EPT to provide multiple new debugging modules that are useful for fuzzing, malware analysis, and reverse engineering. We propose a novel VMX-level script engine in HyperDbg's core, which gives unmatched debugging performance that is useful for software fuzzing, as transfers between user mode and kernel mode (in either direction) are entirely avoided. Our evaluation shows a high level of stealth in code analysis against malware classes and unprecedented performance in debugging functionality among the available kernel debuggers. Finally, HyperDbg is designed to be modular and scalable for convenient usage in both academia and industry.

B PROCESS/PROCESSOR/EXECUTION MODE SWITCH
In this section, we describe the architecture related to switching between processes, processors, and different modes of execution in HyperDbg.
B.1 Detecting Execution-mode Changes (Kernel-mode to User-mode)
Detecting changes to the operating mode is performed via the same mechanism used in the i command in HyperDbg. The proper method for implementing this functionality would be checking the CS register, fetching the GDT, and checking the Long Mode flag. However, since the CS values for wow64 and native code are set to constants across all versions of Windows, the CS register check is sufficient to determine a mode switch.
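The CS-only check described above can be sketched as follows. This is a minimal illustration rather than HyperDbg's code: the selector constants (0x10, 0x23, 0x33) are the values commonly used by 64-bit Windows and are an assumption here, since the text does not list them, and the names classify_cs and mode_changed are hypothetical.

```c
#include <stdint.h>

/* Assumed Windows x64 code-segment selectors (not stated in the text). */
#define SEL_KERNEL_CODE64 0x10u /* ring 0, long mode */
#define SEL_USER_CODE32   0x23u /* ring 3, compatibility mode (wow64) */
#define SEL_USER_CODE64   0x33u /* ring 3, long mode */

typedef enum {
    MODE_KERNEL64,
    MODE_USER32,
    MODE_USER64,
    MODE_UNKNOWN
} exec_mode_t;

/* Classify the guest execution mode from the CS selector alone,
 * without fetching the GDT or checking the Long Mode flag. */
exec_mode_t classify_cs(uint16_t cs)
{
    switch (cs) {
    case SEL_KERNEL_CODE64: return MODE_KERNEL64;
    case SEL_USER_CODE32:   return MODE_USER32;
    case SEL_USER_CODE64:   return MODE_USER64;
    default:                return MODE_UNKNOWN;
    }
}

/* Report a mode switch when two consecutive CS samples classify differently. */
int mode_changed(uint16_t prev_cs, uint16_t cur_cs)
{
    return classify_cs(prev_cs) != classify_cs(cur_cs);
}
```

Because the comparison needs only the selector value, it can run on every step without any memory access to guest descriptor tables.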

B.2 Switching to a New Processor/Process
HyperDbg uses a straightforward mechanism to switch between cores. Each core has its own spinlock on which it waits in VMX-root mode. By unlocking the spinlock assigned to the new core and setting the spinlock of the current core, the current core enters a waiting state while the new core is released. Consequently, HyperDbg calls the command handler from the new core, making the new core responsible for getting commands. Note that HyperDbg is designed to have a single core for getting commands at any given time. Moreover, switching to a new process is performed by monitoring changes to the CR3 register. Each time Windows changes the memory layout of any process, CR3 is changed, and HyperDbg checks whether or not Windows has switched to the target process. If the memory layout has changed and the target process is now being executed by Windows, HyperDbg halts the debuggee again and waits for commands from the debugger.
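The CR3-based process check can be illustrated with a small sketch. It assumes that the target process is identified by the page-table base held in bits 51:12 of CR3 and that the remaining bits (flags or a PCID in the low 12 bits, and the no-invalidate hint in bit 63 of a written value) should be ignored. The function names and the action enum are hypothetical and do not come from HyperDbg.

```c
#include <stdint.h>

/* Bits 51:12 of CR3 hold the physical base of the top-level page table. */
#define CR3_BASE_MASK 0x000FFFFFFFFFF000ull

typedef enum { ACTION_CONTINUE, ACTION_HALT } cr3_action_t;

/* Compare only the page-table base, ignoring the flag/PCID bits and bit 63. */
int is_target_process(uint64_t new_cr3, uint64_t target_cr3)
{
    return (new_cr3 & CR3_BASE_MASK) == (target_cr3 & CR3_BASE_MASK);
}

/* Decide what to do when the guest loads a new CR3 value:
 * halt the debuggee if the target process is being scheduled. */
cr3_action_t on_cr3_write(uint64_t new_cr3, uint64_t target_cr3)
{
    return is_target_process(new_cr3, target_cr3) ? ACTION_HALT
                                                  : ACTION_CONTINUE;
}
```

Masking before comparing avoids missing the target when the same address space is loaded with different low-order bits.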

B.3 Getting Debugging Events: #BPs and #DBs
HyperDbg uses the exception bitmap of the VMCS to get notified of breakpoints (#BP) and debug exceptions (#DB) and to halt the other cores. HyperDbg is the first to be notified of a debugging event, which means that it is notified even earlier than the operating system. Consequently, we design the system not to notify user-mode applications or kernel-mode (OS) entities of debugging events, so all breakpoint events are handled by HyperDbg.
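As a rough sketch of the mechanism above: the exception bitmap is a 32-bit VMCS field in which bit n requests a VM-exit for exception vector n, with #DB at vector 1 and #BP at vector 3. The helper names below are hypothetical, and the bitmap is modeled as a plain integer rather than being written to a real VMCS.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_DB 1u /* #DB, debug exception */
#define VECTOR_BP 3u /* #BP, breakpoint (INT3) */

/* Request a VM-exit for the given exception vector. */
uint32_t bitmap_intercept(uint32_t bitmap, unsigned vector)
{
    assert(vector < 32u); /* the bitmap covers vectors 0..31 only */
    return bitmap | (1u << vector);
}

/* Check whether the given vector currently causes a VM-exit. */
int bitmap_is_intercepted(uint32_t bitmap, unsigned vector)
{
    assert(vector < 32u);
    return (int)((bitmap >> vector) & 1u);
}

/* Add #DB and #BP interception on top of whatever is already set. */
uint32_t debugger_exception_bitmap(uint32_t current)
{
    current = bitmap_intercept(current, VECTOR_DB);
    current = bitmap_intercept(current, VECTOR_BP);
    return current;
}
```

In a hypervisor, the resulting value would be written to the exception bitmap field of each core's VMCS.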

B.4 Spinning on Spinlocks
Spinning the cores is a primary technique in HyperDbg's functionality. We study the challenges in this context. Suppose a function requires a spinlock (e.g., it merely guards a buffer that is to be accessed) on a single-core processor. The function raises the IRQL to DISPATCH_LEVEL. Here, the Windows scheduler cannot interrupt the function until it releases the spinlock and lowers the IRQL to PASSIVE_LEVEL or APC_LEVEL. If a VM-exit occurs during the execution of the function, the operation mode moves to VMX-root. (A VM-exit can be interpreted as behaving like a HIGH_IRQL interrupt.) In the case where a user accesses the buffer in VMX-root mode, two scenarios are possible:
• The first scenario is to wait on a spinlock that was previously acquired by a thread in VMX non-root mode. In this scenario, a deadlock occurs and the core spins forever.
• Alternatively, it is also possible to enter the function without looking at the lock (while another thread enters the function at the same time), which would result in a corrupted buffer and invalid data.
Windows also imposes another limitation, as cores must not wait on a spinlock when the IRQL is higher than DISPATCH_LEVEL. This stems from the fact that Windows raises the IRQL to DISPATCH_LEVEL (2) when a spinlock is acquired. In this case, Windows performs the workload, releases the spinlock, and lowers the IRQL back afterwards.
Looking at the corresponding Windows functions (e.g., KeAcquireSpinLock and KeReleaseSpinLock), the IRQL is passed as an argument. In KeAcquireSpinLock, Windows saves the current IRQL to the parameter supplied by the caller and then raises the IRQL to DISPATCH_LEVEL. After the function is finished with the shared data, it calls KeReleaseSpinLock and passes the old IRQL to it. Finally, KeReleaseSpinLock unsets the lock bit and restores the old IRQL (lowering the IRQL).
Unfortunately, Windows spinlocks rely on IRQLs, which are not meaningful when VMX-root mode is in action. This makes it very complicated to use such functions in this mode. Hence, to implement spinlocks for HyperDbg functionalities such as multi-core message tracing, we design a custom VMX-root compatible spinlock.
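A spinlock that does not depend on IRQLs can be built from a single atomic flag. The following is a minimal sketch using C11 atomics, not HyperDbg's actual implementation; the type and function names are hypothetical, and a production version would likely add a PAUSE hint and back-off inside the wait loop.

```c
#include <stdatomic.h>

/* A lock that relies only on an atomic flag, with no IRQL bookkeeping. */
typedef struct {
    atomic_flag locked;
} vmx_spinlock_t;

#define VMX_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if the lock was acquired, 0 if it was already held. */
int vmx_spinlock_try_lock(vmx_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. */
void vmx_spinlock_lock(vmx_spinlock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire)) {
        /* busy-wait until the holder releases the lock */
    }
}

/* Release the lock so that a waiting core can proceed. */
void vmx_spinlock_unlock(vmx_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The same primitive also fits the per-core waiting scheme of Appendix B.2: a core parks itself by spinning on its own lock and is released when another core unlocks it.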

B.5 MTFs Disadvantages
By setting the monitor trap flag (MTF), it is not guaranteed that the next executed instruction is the targeted instruction. If a sudden interrupt arrives first, the next targeted instruction in the debugged program is not executed, since the interrupt handler's instructions are executed first. One way to address this issue is to set a VM-exit on exceptions (Exception Bitmap) and external interrupts. However, this resolution is not optimal, as it might cause system inconsistency by blocking interrupts. HyperDbg is able to resolve this issue using an instrumentation stepping process.
The following Listing illustrates the set/unset of MTF in an execution sequence.

B.6 Debugger Pausing
Typically, there are two scenarios in which the kernel debugger is paused. In the first, a breakpoint is triggered by a break request from an event or the script engine. In this context, if the user is in kernel mode, a VMCALL occurs and the subsequent chain of events is handled accordingly. If the user is already in VMX-root mode, other cores should be notified to prevent a system-level crash. Operating in VMX-root mode is similar to HIGH_IRQL. In VMX-root mode, all interrupts are masked because of the IF bit in RFLAGS.
The other scenario is a request from the user (for instance, an interruption by pressing CTRL+C). There, a packet is sent to pause the debugger. In this method, the debugger processes the packet in user mode, invokes an IOCTL, and executes a VMCALL, which transfers execution from kernel mode to VMX-root mode.

B.7 Continuation on a Single Core
One of the exclusive features of HyperDbg is the ability to continue execution on one core while the other cores are in a halt state. We use this mechanism in our instrumentation step-in command to guarantee that no other cores (threads) get the chance to execute. The fundamental basis of this mechanism is to ensure that the target core is not interrupted during debugging.
There are two approaches that HyperDbg uses to prevent a target core from getting interrupted (e.g. by clock interrupt or keyboard interrupts).
• First, the RFLAGS.IF bit of the guest can be unset so the interrupts are masked.
• Second, the pin-based External-Interrupt Exiting bit can be set so that all external interrupts cause VM-exits, allowing the interrupts to be simply ignored in the VM-exit handler.
The former method is faster and avoids unnecessary VM-exits. However, for several considerations described in the following, the second method is preferred in HyperDbg.
In contrast to the first approach, it is much safer not to change the guest's registers. As an example, if a page-fault, SYSCALL, or an invalid operation such as division-by-zero occurs, execution is directed to the kernel and the guest's RFLAGS is saved by the processor. Therefore, extra operations are required to locate the user-mode RFLAGS (searching the stack for exceptions and the R11 register for SYSCALLs), because the RFLAGS that was previously saved in user mode has the IF bit cleared. If this task is ignored, RFLAGS is restored without checking the IF bit every time the guest continues and performs a context switch. In this case, because the bit was unset by the hypervisor, the core becomes uninterruptible, as the OS cannot regain execution (e.g., using the clock interrupt). Consequently, after a delayed bug check, Windows realizes that the target core behaves abnormally and returns an error. Moreover, changing the guest's RFLAGS is also incompatible with instructions like CLI and STI. More importantly, considering the side effects, the guest is able to detect HyperDbg's tampering by using the PUSHF instruction and checking the IF bit in RFLAGS.
All of the issues investigated regarding changing RFLAGS, in addition to the fact that using the pin-based External-Interrupt Exiting bit is completely transparent to kernel mode and user mode, have led us to employ the second method in HyperDbg.
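The second approach can be sketched with a small model. Here the pin-based VM-execution controls are represented by a plain integer in which bit 0 is External-Interrupt Exiting, and the guest's RFLAGS (IF is bit 9) is never modified. The structure and function names are hypothetical, and a real handler would also have to deal with interrupt acknowledgement and later re-injection, which this sketch leaves out.

```c
#include <stdint.h>

#define PIN_BASED_EXTERNAL_INTERRUPT_EXITING (1u << 0)
#define RFLAGS_IF (1ull << 9)

/* Simplified per-core state; the fields stand in for VMCS fields. */
typedef struct {
    uint32_t pin_based_controls;
    uint64_t guest_rflags;       /* left untouched by this approach */
    unsigned ignored_interrupts; /* interrupts swallowed while stepping */
} vcpu_model_t;

/* Make every external interrupt cause a VM-exit on this core. */
void enter_single_core_mode(vcpu_model_t *vcpu)
{
    vcpu->pin_based_controls |= PIN_BASED_EXTERNAL_INTERRUPT_EXITING;
}

/* Restore normal interrupt delivery to the guest. */
void leave_single_core_mode(vcpu_model_t *vcpu)
{
    vcpu->pin_based_controls &= ~PIN_BASED_EXTERNAL_INTERRUPT_EXITING;
}

/* Returns 1 if the interrupt is intercepted and ignored by the hypervisor,
 * 0 if it would be delivered to the guest as usual. */
int on_external_interrupt(vcpu_model_t *vcpu)
{
    if (vcpu->pin_based_controls & PIN_BASED_EXTERNAL_INTERRUPT_EXITING) {
        vcpu->ignored_interrupts++;
        return 1;
    }
    return 0;
}
```

Since guest_rflags keeps its IF bit, a guest that inspects RFLAGS with PUSHF observes no change, which is the transparency argument made above.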

C FULL DESCRIPTIONS OF THE USED TERMS
C.1 VM-entries, VM-exits (VMX-root and VMX non-root)
VT-x introduces two new modes of operation: VMX-root operation and VMX non-root operation. Software running in VMX-root mode has higher privileges and has access to certain instructions that are not available in VMX non-root operation, regardless of the privilege level [81]. The core of HyperDbg runs mainly in VMX-root mode, while guests (the operating system's kernel and applications) are executed in VMX non-root mode. With the definition of these two modes of operation, VT-x consequently defines two new transitions: VM-entry, which is a transition from VMX-root operation to VMX non-root operation, and VM-exit, which performs the opposite.

C.2 Virtual Machine Control Structure (VMCS)
To control the guest features, we have to set some properties in the Virtual Machine Control Structure (VMCS). The VMCS is a hardware-defined structure that controls the behavior and settings of each guest virtual machine (VM). Such a data structure exists running on the processors. Processes running at a higher IRQL will preempt a thread or interrupt running at a lower IRQL [18].
The list below shows different routines and the IRQL at which each of them runs:

E HYPERDBG'S SCRIPT ENGINE
!syscall 0x55 pid 0x14c0 script {
    if (@rcx == 0x27 && @rdx == 0x47) {
        printf("Syscall triggered: %x in process id: %x\nThe third param: %llx\nThe fourth param: %llx\n", @rax, $pid, @r8, @r9);
        pause();
    }
}
In the above example, a syscall event is configured to trigger exclusively for syscalls specific to the process (pid = 0x14c0), which will execute the target script in VMX-root mode.
Considering that Windows uses the fastcall calling convention for its syscalls, we know that the parameters are stored in RCX, RDX, R8, R9, and on the stack. The example script checks whether the first parameter to the syscall (RCX register) is equal to 0x27 and the second parameter (RDX register) is equal to 0x47. If these conditions are met, a message is printed, which generates a log from the third (R8 register) and fourth (R9 register) parameters. Finally, the pause() function is used to pause the debuggee and give control to the debugger.
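For readers more familiar with C than with the script language, the logic of the example script corresponds roughly to the function below. This is only an illustrative equivalent and not how the script engine is implemented: the register snapshot structure, the function name, and the convention of returning 1 to request a pause are all assumptions made for this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the guest registers at the time of the event. */
typedef struct {
    uint64_t rax, rcx, rdx, r8, r9;
} guest_regs_t;

/* Mirrors the script: log and request a pause only when the first two
 * syscall parameters match. Returns 1 to pause the debuggee, 0 to continue. */
int syscall_filter(const guest_regs_t *regs, uint32_t pid,
                   char *out, size_t out_len)
{
    if (regs->rcx == 0x27 && regs->rdx == 0x47) {
        snprintf(out, out_len,
                 "Syscall triggered: %llx in process id: %x\n"
                 "The third param: %llx\nThe fourth param: %llx\n",
                 (unsigned long long)regs->rax, (unsigned)pid,
                 (unsigned long long)regs->r8,
                 (unsigned long long)regs->r9);
        return 1;
    }
    return 0; /* condition not met: nothing is logged */
}
```

The point of evaluating such a condition in VMX-root mode is that non-matching syscalls return to the guest immediately, without a round trip to the debugger.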

F HARDWARE FEATURES ON ARM AND AMD
ARM processors also contain virtualization extensions that provide hardware means for hypervisors to virtualize the CPU, permitting multiple OSes to run on the same machine. ARM processors support Second Level Address Translation (SLAT) in the form of Stage-2 page tables provided by a Stage-2 MMU. Similarly, AMD supports virtualization through AMD-V technology. SLAT is implemented in AMD processors through Rapid Virtualization Indexing (RVI), also called Nested Page Tables (NPT).
With a similar approach, it is most likely possible to investigate and implement the methodologies presented in this work for AMD- or ARM-based hardware-assisted debuggers.

G APPLICATION: REVERSE ENGINEERING G.1 Automatic Symbol Reconstruction
One of the main goals of HyperDbg is to provide a reverse engineering and dynamic binary analysis tool. Here, we describe a simple example of reverse engineering an application. HyperDbg provides a functionality that maps virtual memory to a C/C++ data type (e.g., enums and structures). This is extremely useful, as OS-level PDB linkers can be dynamically translated to structures for testing and reverse engineering. For instance, a user can detect the exact location of values in a specific function used by an OS process and modify them accordingly. This is easily done using simple scripting. The following listing shows how the memory content of a target process can be presented in a structured form. For instance, here _PROCESS_CREATION_INFO is dereferenced and could be modified.
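The idea of viewing raw memory through a typed structure can be sketched in plain C. The structure below is purely hypothetical (it is not a Windows type and does not come from a PDB file), and the example assumes natural alignment, so that the fields land at offsets 0, 4, and 8. It copies raw bytes into the structure and prints each field with its offset, in the spirit of the structured output described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout used only for this illustration. */
typedef struct {
    uint32_t id;
    uint8_t  priority_class;
    uint8_t  reserved[3];
    uint64_t security_port;
} demo_record_t;

/* Interpret a raw memory buffer as a demo_record_t.
 * memcpy avoids alignment and aliasing problems of a pointer cast. */
demo_record_t view_as_record(const uint8_t *raw)
{
    demo_record_t rec;
    memcpy(&rec, raw, sizeof rec);
    return rec;
}

/* Print each field together with its offset inside the structure. */
void print_record(const demo_record_t *rec)
{
    printf("+0x%04zx uint32_t id : 0x%x\n",
           offsetof(demo_record_t, id), (unsigned)rec->id);
    printf("+0x%04zx uint8_t priority_class : 0x%x\n",
           offsetof(demo_record_t, priority_class),
           (unsigned)rec->priority_class);
    printf("+0x%04zx uint64_t security_port : 0x%llx\n",
           offsetof(demo_record_t, security_port),
           (unsigned long long)rec->security_port);
}
```

A debugger performs the same mapping with type information recovered from symbol files instead of a compile-time definition.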

Figure 1: High-level overview of HyperDbg's sub-systems and execution flow
The i command: Instrumentation Stepping Approach in HyperDbg

; Use Intel TSX to suppress any
; page-fault in VMX-root mode
XBEGIN $+xxx                 ; End of TSX
MOV RAX, Dword PTR:[RCX]     ; Access the target memory address,
XEND                         ; End of TSX
MOV RAX, 1

Figure 3: The overall view of the communication in HyperDbg

rdtscp    ; get the current time clock
cpuid     ; Execute a serialization instruction (VM-exit)
rdtscp    ; Delta Timing
Listing 2: The timing measurement code by forcing VM-exit

Figure 7: Syscalls executed in a malware using HyperDbg and WinDbg

/* Set the monitor trap flag */
void HvSetMonitorTrapFlag(BOOLEAN Set) {
    unsigned long CpuBasedVmExecControls = 0;
    // Read the previous flag
    __vmx_vmread(
Listing 3: MTF Set/Unset in an example execution sequence

B.6 Debugger Pausing

• DIRQL: Interrupt Service Routines (ISRs) of hardware and external devices
• DISPATCH_LEVEL: Scheduler, DPCs, and codes protected by a spinlock
• APC_LEVEL: Asynchronous Procedure Calls (APC) routines
• PASSIVE_LEVEL: User code, dispatch routines, and PnP routines

D FEATURE COMPARISON AMONG EXISTING DEBUGGERS

+0x05b7 uint8_t PriorityClass : 0x2
+0x05b8 void* SecurityPort : (null)
+0x05c0 _SE_AUDIT_PROCESS_CREATION_INFO ... : ffff948c'c25fa2a0
+0x05c8 _LIST_ENTRY JobLinks ... : [ 00000000'00000000 - 00000000'00000000 ]
Listing 5: Converting PDB references to structures for reverse engineering

Table 1: Anti-Debugging and Anti-VM exercises and mitigation in HyperDbg


Table 2: Evaluation and comparison of HyperDbg for integrated software via packers/protectors

Table 3: The list of the supported events in HyperDbg

Table 5 illustrates a comparison among the existing debuggers and HyperDbg.