Satisfiability Attack-Resilient Camouflaged Multiple Multivariable Logic-in-Memory Exploiting 3D NAND Flash Array

Logic-in-memory implementations have attracted significant attention recently for energy-efficient in-situ processing of big data in this era of IoT. However, emerging memory technologies such as RRAMs, PCMs, STT-MRAMs, etc. are still immature and exhibit significant spatial and temporal variations, which limit the yield and the size of crossbar arrays available for implementing logic functions. Considering the technological maturity, ultra-high density and ultra-low cost of 3D NAND flash memory, in this work, we propose a novel methodology to exploit 3D NAND flash memory for realizing any logic function in sum-of-products (SOP) form with ≤177 literals/inputs and $\le 2^{14}$ minterms in parallel. Moreover, all the logic functions realized using the proposed technique appear the same at the layout level, which gives the logic-in-memory implementation utilizing the 3D NAND flash memory an innate camouflaging property and an inherent immunity against security vulnerabilities in the semiconductor supply chain. We have also evaluated the resiliency of the proposed technique against reverse engineering attacks such as SAT attacks, ATPG attacks and brute force attacks on ISCAS’85 and ISCAS’89 benchmark circuits. Our results indicate that the proposed logic-in-memory implementation facilitates complete obfuscation of the logic function without introducing any area overhead and exhibits a strong resiliency against reverse engineering.


Bhogi Satya Swaroop, Member, IEEE, Ayush Saxena, and Shubham Sahay, Senior Member, IEEE

Index Terms: 3D NAND flash, logic-in-memory, reverse engineering, IC camouflaging, SAT attack.

I. INTRODUCTION
THE unprecedented growth in interconnected cyber-physical systems in this era of the Internet-of-things (IoT) and rapid advancements in emerging sectors such as social media, personalized health care, finance, online education, etc. have led to a surge in asynchronous data. Analyzing and processing such big data with high energy efficiency is a challenge for conventional von Neumann computing systems. The physical isolation of the memory and processing units, which operate at different speeds, degrades the performance of the von Neumann computing primitives owing to the large data transfer. This von Neumann bottleneck not only increases the energy dissipation but also reduces the computing speed.
Recently, compact and highly energy-efficient processing-in-memory primitives exploiting co-located storage and computational blocks were introduced to minimize the data transfer and circumvent this von Neumann bottleneck. Since logic gates are used in almost every aspect of processing and computation, implementation of logic gates within the memory block may accelerate the computations significantly while drastically reducing the energy consumption and area.

A. Prior Logic-in-Memory Implementations
Recently, logic-in-memory implementations exploiting volatile SRAM [1] and DRAM [2] cells were proposed. While a NOR gate was realized with inputs stored in two 8T SRAM cells, the NAND gate implementation requires a time-constrained read operation and skewed inverters [1]. Moreover, an IMPLY logic (IMP) gate was also realized using a voltage divider implementation exploiting an 8T SRAM cell with additional voltage sources and skewed inverters [1]. Furthermore, NAND and NOR logic gates were also realized using an 8+T SRAM cell (9 transistors) with an appropriately sized sense amplifier [1]. Although these universal gate implementations appear promising, complex logic gates such as XOR are derived from these basic gates, which results in increased latency and energy dissipation and reduced throughput. Furthermore, the utilization of 8T and 8+T SRAM cells, skewed inverters and additional voltage sources leads to a large area overhead and complex routing.
Moreover, bitwise OR and AND logic was implemented in [2] with inputs stored in two rows of a 1T-1C DRAM array. However, the output of these logic operations overwrites the input data stored in the DRAM cells, necessitating redundant storage of the input data and leading to an increase in the area and energy dissipation. Furthermore, the input/output data cannot be retrieved after power-off since SRAM and DRAM are volatile memories.
Recently, several in-memory logic implementations based on compact emerging non-volatile memories (NVM) such as memristors [3], [4], RRAM [5], PCM [6], etc. were proposed. Although the standby power dissipation (which dominates the energy landscape for CMOS designs) is eliminated in NVM-based logic implementations, they can only support limited fan-ins (typically 2) per stage. Moreover, the emerging NVM technology is not mature and exhibits large spatial and temporal variations, a complex fabrication process and limited yield despite the recent developments in the integration of emerging NVMs with the CMOS technology [3], [4], [5], [6].
Furthermore, a 2D charge trap flash cell-based logic-in-memory implementation was recently proposed in [7]. Different 2-input logic gates were realized in [7] utilizing a sequential 3-step algorithm, which significantly reduces the speed. Furthermore, a novel reverse read technique was utilized for logic evaluation, which is not compatible with the conventional read procedure of the 3D NAND flash array.

B. 3D NAND Flash Memory
Recently, 3D NAND flash memory has emerged as the mainstream technology for data storage ranging from small USB drives to solid state drives in data warehouses and gigantic cloud storage. Considering the high technological maturity, ultra-high density and ultra-low cost of 3D NAND flash memory [8], it becomes imperative to analyze its potential for logic-in-memory implementations. Although the 3D NAND flash memory array has been exploited for implementation of in-memory vector-by-matrix multiplication accelerators [9], [10], [11], [12], [13], to the best of our knowledge, its application for logic implementation has not been explored so far.
To this end, in this work, we propose a methodology to implement logic functions within the mature and ultra-dense commercial 3D NAND flash memory array. Any Boolean logic in sum-of-products (SOP) form can be implemented by encoding inputs as bit line (BL) voltages and threshold voltages of the 3D NAND flash cells in the strings and sensing the output accumulated on the input capacitance of the sense amplifier. Such an implementation enables realization of logic functions with ≤177 literals/inputs and $\le 2^{14}$ minterms in parallel considering the state-of-the-art 3D NAND flash memory array with 176 word line (WL) layers and a page size of 16 KB. This is highly advantageous since realizing such high fan-in logic using CMOS technology not only requires a large number of complementary MOSFETs and a multi-stage design but also results in large latency, area overhead, high power dissipation and routing complexity. Moreover, the throughput of the proposed implementation is significantly higher than that of the logic gate implementations utilizing other NVMs [3], [4], [5], [6]. Using an experimentally calibrated behavioral compact model for 3D NAND flash memory [14], with the aid of HSPICE simulations, we have implemented the basic logic gates (AND, OR, NAND, NOR, XOR and XNOR) and performed extensive analysis of their delay and energy dissipation. Furthermore, a novel system-level reconfigurable architecture for fast and energy-efficient implementation of logic functions using pre-programmed 3D NAND flash cells and a content addressable memory (CAM) has also been proposed. Although standalone CAMs utilizing flash memory cells have already been demonstrated [30], [31], the application of a 3D NAND flash-based CAM for searching the pre-programmed logic planes to perform fast and energy-efficient logic-in-memory computations is proposed here for the first time.

C. Reverse Engineering and IC Camouflaging
The integrated circuit (IC) manufacturing process involves a large number of steps including chip design, manufacturing, testing, packaging and supply of the final product. The entire supply chain is dispersed across the globe to reduce cost and design time. Furthermore, owing to the economic burden of sustaining foundries, many semiconductor giants have gone fabless, increasing their dependence on the global supply chain. Recently, the semiconductor supply chain has been exposed to several security vulnerabilities such as reverse engineering, IC piracy, IP piracy, trojan insertion, counterfeiting, etc. due to the involvement of malicious parties in the supply chain. Reverse engineering (RE) [15], [16] involves de-packaging the IC and delayering and imaging it layer by layer to extract the gate-level netlist. It is considered the most serious threat to the semiconductor industry since the design technology and IP used in the IC can be identified and the functionality may be inferred. Once the design is identified, the malicious attacker may counterfeit or manufacture the same product and supply it at a reduced cost.
Recently, several techniques such as IC camouflaging, logic locking, split manufacturing, etc. have been proposed to thwart RE attacks. IC camouflaging [17] is a hardware obfuscation technique where the designers introduce dummy elements/contacts in different logic gates to realize the same physical layout. Although the addition of dummy contacts yields a similar layout for NAND, NOR, and XOR gates in [18], which may deceive the reverse engineer, it also leads to an increased area overhead.
The proposed logic-in-memory technique using the ultra-dense 3D NAND flash memory array exhibits an innate camouflaging property whereby all the realized logic functions with ≤177 literals/inputs and $\le 2^{14}$ minterms appear similar at the layout level. Hence, it provides an inherent strong resilience against reverse engineering techniques such as SAT attacks, ATPG attacks and brute force attacks without additional area overhead. Moreover, compared to the prior IC camouflaging techniques, which may camouflage only NAND/NOR/AND/OR gates, the proposed implementation provides an efficient means to camouflage all the logic functions and is less vulnerable to the SAT attacks performed on ISCAS'85 [19] and ISCAS'89 [20] benchmark circuits.
The manuscript is organized as follows: the structure and operating principle of 3D NAND flash memory are described in section II. The logic-in-memory implementation methodology based on the 3D NAND flash array is discussed in section III and the performance metrics are evaluated in section IV. The reconfigurable system-level architecture to further enhance the performance of the logic-in-memory implementation using the 3D NAND flash array is described in section V. The camouflaging property of the proposed 3D NAND flash-based logic-in-memory implementation and its resiliency against the SAT attack, ATPG attack and brute-force attack are analyzed in section VI, and the conclusions are drawn in section VII.

TABLE I PARAMETERS USED FOR 3D NAND FLASH MEMORY ARRAY

II. STRUCTURE AND OPERATING PRINCIPLE
The three-dimensional view of the 3D NAND flash memory array [22] is shown in Fig. 1(a). The cylindrical pillars (strings) consist of Macaroni-body, vertical-channel charge-trap (CT) devices with oxide/nitride/oxide (O/N/O) in the gate stack. The flash cells are located at the intersection of the cylindrical pillars and the word line (WL) plates. Bit lines (BL) and bit selector transistor lines (BSL) are used to access a particular string and the WL is used to select a particular flash cell in that string. All the strings share the same common source line (CSL). The exact dimensions of the 3D NAND flash cell shown in Fig. 1 are given in Table I.
The schematic view of flash cell array considering the behavioral compact modeling approach used in [14] is shown in Fig. 2. The BL and BSL are orthogonal to each other and are used for selecting a particular string within the array.WL is implemented as a metal plate to select a particular layer of the array.Each TLC flash cell considered in this work can exhibit eight different threshold voltages.The threshold voltage of each cell can be programmed to the desired state using incremental step pulse programming/erase (ISPP/E) technique [23].
To implement any Boolean logic, the logic function is first converted into its SOP form and the inputs are applied as bit line voltages and threshold voltages of the flash cells. A high threshold voltage corresponding to the extreme programmed state (111) is treated as logic '0' and a low threshold voltage corresponding to the erased state (000) is treated as logic '1'. A voltage of 0.5 V on the BL represents logic '1', and the BL is grounded to implement logic '0'. One literal of a minterm is applied on the BL and the remaining literals of that minterm are encoded as threshold voltages of the flash cells in that string as shown in Fig. 2. Therefore, one string is used to encode one minterm of the Boolean logic expressed in the SOP form. To perform computation, a read voltage ($V_{read}$ = 2.5 V) is applied to the WLs of the flash cells encoding the inputs, and a pass voltage ($V_{pass}$ = 8 V) is applied to the remaining WLs to reduce the series resistance of the flash cells along the string. The current flowing through the string depends on the applied bit line voltage and the threshold voltage state of the programmed flash cells and encodes the output as the voltage accumulated on the input capacitance of the sense amplifier ($C_{CSL}$). A high voltage (0.5 V) on $C_{CSL}$ is considered as output logic '1' and a low voltage (0 V) on $C_{CSL}$ is considered as output logic '0'. The voltage on $C_{CSL}$ can then be fed as the input of the subsequent stages for cascading of logic gates.
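The encoding described above can be sketched behaviorally as follows. This is a minimal Boolean abstraction, not a circuit simulation: it assumes that a cell conducts whenever the read voltage exceeds its threshold voltage, that the two threshold states are 0 V and 3.5 V (the values quoted later for the CAM cells), and that the first literal of every product term is the one driven on the BL. The helper names (`literal_value`, `string_conducts`, `evaluate_sop`) are hypothetical and chosen only for illustration.

```python
# Behavioral sketch (not a circuit simulation) of SOP evaluation on NAND strings.
V_READ = 2.5      # read voltage on WLs of programmed cells (V), from the text
VT_LOW = 0.0      # erased state, encodes logic '1' (value assumed)
VT_HIGH = 3.5     # programmed state, encodes logic '0' (value assumed)
V_BL_HIGH = 0.5   # bit-line voltage that represents logic '1' (V)


def literal_value(assignment, literal):
    """literal = (name, negated); returns the Boolean value of that literal."""
    name, negated = literal
    return (not assignment[name]) if negated else bool(assignment[name])


def string_conducts(assignment, term):
    """One string encodes one product term.

    The first literal drives the BL; the remaining literals are stored
    as threshold voltages of the cells in the string.
    """
    bl_literal, *cell_literals = term
    v_bl = V_BL_HIGH if literal_value(assignment, bl_literal) else 0.0
    thresholds = [VT_LOW if literal_value(assignment, lit) else VT_HIGH
                  for lit in cell_literals]
    # Series string: current flows only if the BL is high and every read cell turns on.
    return v_bl > 0.0 and all(V_READ > vt for vt in thresholds)


def evaluate_sop(assignment, terms):
    """All strings share C_CSL, so any conducting string charges it (logical OR)."""
    return int(any(string_conducts(assignment, t) for t in terms))


# XOR = A'B + AB' occupies two strings.
XOR_TERMS = [[("A", True), ("B", False)], [("A", False), ("B", True)]]
xor_table = [evaluate_sop({"A": a, "B": b}, XOR_TERMS)
             for a in (0, 1) for b in (0, 1)]
```

In this abstraction the series connection of cells inside a string plays the role of AND, and the shared sense capacitance plays the role of OR, which is why one string per product term suffices.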

III. LOGIC-IN-MEMORY IMPLEMENTATION
For a proof-of-concept demonstration of the logic-in-memory implementation utilizing the 3D NAND flash memory array, we have designed two-input universal gates such as NAND and NOR and other useful logic functions such as OR, AND, XOR and XNOR using the proposed approach. The compact forms of these implementations (where BSL and SSL are omitted for simplicity) are shown in Fig. 3. From Fig. 3, we observe that logic gate implementations with only one minterm such as N-input AND, NOR, etc. require only one string of the 3D NAND flash memory array. However, logic gate implementations of N-bit NAND, OR, XOR, XNOR, etc. which consist of N minterms in their SOP expression need N strings of the 3D NAND flash memory array. Considering the ultra-high density of the commercial 3D NAND flash memories with ≤176 WL layers and a page size exceeding 16 KB, we can realize logic functions with ≤177 inputs (literals) and ∼$2^{14}$ minterms in parallel. Realizing logic functions with such high fan-ins (inputs) using a single-stage static CMOS design technique not only requires a large number of complementary MOSFETs (at least N pMOSFETs and N nMOSFETs for N fan-ins) but also results in a significantly high delay, large area and high dynamic power dissipation. Furthermore, even input signal routing is a complex task while realizing such CMOS circuits with large fan-ins. Therefore, logic functions with a large number of inputs are typically implemented using multi-stage CMOS circuits with large latency. However, the proposed approach facilitates a simple, energy-efficient and single-stage implementation of even complex logic functions with large fan-ins or minterms with a high throughput. Furthermore, the proposed logic-in-memory implementation is reconfigurable, i.e., different logic functions can be realized by reprogramming the flash cells within a string. Moreover, the proposed approach can also be extended to implement multiplexer (MUX) based logic designs.

IV. PERFORMANCE EVALUATION
To analyze the efficacy of the proposed logic-in-memory implementation utilizing the 3D NAND flash memory array, we have performed a rigorous analysis of the performance metrics such as energy and delay for the different logic gates designed in section III using the experimentally calibrated behavioral compact model of 3D NAND flash [14] with the aid of HSPICE [24] simulations. The behavioral compact model in [14] utilizes the BSIM-CMG 110.0.0 model to mimic the cell behavior and the model parameters have been tuned to reproduce the experimental characteristics of a 3D NAND flash string with 10 WLs [25], [26]. Moreover, the model also effectively captures the parasitic capacitances pertaining to the string in the 3D NAND flash architecture [14]. Furthermore, the model takes into account the voltage drop across the unselected cells in the pass mode while reading the string current and the location-dependent characteristics of the flash cells within a string [14]. It may be noted that the main aim of this work is to provide a novel methodology for realizing logic-in-memory implementations and a unique system-level reconfigurable architecture with an innate IC camouflaging property rather than to report the exact values of the string current. Furthermore, the exact analysis of the reliability and transmission line losses in the proposed implementation is an important topic for future work. The methodology utilized for estimating the performance of the proposed logic-in-memory implementation utilizing the 3D NAND flash memory array is described in detail in the following sub-sections.

A. Energy Calculation
For performing logic-in-memory operations utilizing 3D NAND flash, first, the inputs have to be encoded as threshold voltages of the flash cells in the string.To encode the inputs as threshold voltage, we need to apply a programming pulse which leads to the write energy dissipation.
For a threshold voltage ($V_t$) shift of 3.5 V (Fig. 1(c)), the programming voltage ($V_{pgm}$) required is 20 V [23]. The write energy ($E_{write}$) can then be calculated using equation (1) in terms of q, the change in the amount of charge stored in the nitride layer. This charge is given by equation (2) in terms of A, the curved surface area, and $C_q$, the capacitance per unit area between the charge trap nitride layer and the active polysilicon layer. $C_q$ can be derived as in equation (3) [23], where $\varepsilon_{ox}$ is the permittivity of the tunnel oxide layer, $\varepsilon_{SiN}$ is the permittivity of the charge trap nitride layer, $T_{TOX}$ is the thickness of the tunnel oxide and $T_{SiN}$ is the thickness of the charge trap nitride layer (Fig. 1(b)). The write energy then follows from equations (1)-(3).
Once the inputs are encoded as the threshold voltages of the flash cells in the 3D NAND string, the output of the logic gate is obtained by applying a read voltage ($V_{WL}$ = 2.5 V) on the WLs corresponding to the programmed cells, a pass voltage ($V_{pass}$ = 8 V) to the remaining cells in the string and a bit line voltage corresponding to one input of the logic gate ($V_{BL}$ = 0.5 V for logic '1' and $V_{BL}$ = 0 V for logic '0'). The string current during this read process gets accumulated as the voltage on the input capacitance of the sense amplifier ($C_{CSL}$ = 60 fF) and a read energy given by $E_{read} = C_{CSL} \times V_{BL}^2 = 30$ fJ is consumed in the process. The write energy ($E_{write}$), the read energy ($E_{read}$) and the total energy dissipated by the widely used two-input logic gates for different input combinations are reported in Table II. As can be observed from Table II, the worst-case energy consumption of all the logic functions realized using the proposed methodology is the same. Moreover, the worst-case (assuming all inputs of a single string need to be programmed) energy dissipation of an N-input logic function implemented utilizing this scheme can be obtained as $(N-1) \times E_{write} + E_{read}$. For a 177-input logic-in-memory implementation exploiting a 3D NAND flash array with 176 layers, the worst-case energy dissipation is 58.32 fJ, which is significantly lower than that of a multi-stage CMOS implementation.
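The worst-case expression above is simple enough to check numerically. The sketch below evaluates $(N-1) \times E_{write} + E_{read}$ and back-computes the per-cell write energy implied by the two figures quoted in the text (30 fJ read energy and 58.32 fJ total for N = 177). That per-cell value is an inference from those two numbers, not a figure stated in the text, and the function name is hypothetical.

```python
def worst_case_energy_fj(n_inputs, e_write_fj, e_read_fj):
    """Worst case: N-1 cells are programmed once and the string is read once."""
    return (n_inputs - 1) * e_write_fj + e_read_fj


E_READ_FJ = 30.0        # read energy quoted in the text
E_TOTAL_177_FJ = 58.32  # worst-case total quoted for a 177-input function

# Per-cell write energy implied by the two quoted figures (an inference).
implied_e_write_fj = (E_TOTAL_177_FJ - E_READ_FJ) / (177 - 1)

if __name__ == "__main__":
    print(f"implied E_write per cell: {implied_e_write_fj:.4f} fJ")
    for n in (2, 16, 64, 177):
        total = worst_case_energy_fj(n, implied_e_write_fj, E_READ_FJ)
        print(f"N = {n:3d}: {total:.2f} fJ")
```

Under these figures the read term dominates for small N, and the write term only becomes comparable to it as N approaches the full string length.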

B. Delay Calculation
The delay of the proposed logic-in-memory implementation utilizing 3D NAND flash memory depends significantly on the critical input (applied on the bit line BL). The delay is obtained as the temporal difference between 50% of the critical input voltage and 50% of the output voltage accumulated on $C_{CSL}$.
We have performed an extensive analysis of the delay of different logic gates considering a read voltage $V_{read}$ = 2.5 V for the programmed WLs and a pass voltage $V_{pass}$ = 8 V for the unprogrammed WLs for 3D NAND flash strings with 16, 64, 128 and 176 WL layers (or flash cells in a string). Moreover, the time period of the critical input BL pulse is taken as 10 µs considering the typical random read time ($t_R$) of commercial 3D NAND flash memory [27], [28], [29]. The worst-case delays for the logic-in-memory implementation utilizing a 3D NAND flash array with 16, 64, 128, and 176 WLs in the string for critical input pulse widths of 10 µs, 100 µs and 1 ms are listed in Table III. The worst-case delay increases with the number of WLs in the string. It may be noted that, following the approach used in [3] and [4], we have decoupled the programming time, i.e., the time required to encode the inputs as the threshold voltages of the flash cells (typically 2 ms per input), from the read time required for performing the in-memory logic operation while analyzing the delay.
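The 50%-to-50% delay definition used above can be sketched on sampled waveforms as follows. The helper names and the example waveform are illustrative assumptions (the numbers are made up and are not simulation results from this work); the crossing time is found by linear interpolation between samples.

```python
def crossing_time(times, values, level):
    """First time the waveform rises through `level` (linear interpolation)."""
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 < level <= v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, v_in_high=0.5, v_out_high=0.5):
    """Delay between the 50% point of the BL input and the 50% point of C_CSL."""
    t_in = crossing_time(times, v_in, 0.5 * v_in_high)
    t_out = crossing_time(times, v_out, 0.5 * v_out_high)
    return t_out - t_in


# Synthetic example only: time in ns, voltages in V.
t_ns = [0.0, 1.0, 2.0, 3.0, 4.0]
v_bl = [0.0, 0.5, 0.5, 0.5, 0.5]
v_csl = [0.0, 0.0, 0.1, 0.3, 0.5]
example_delay_ns = propagation_delay(t_ns, v_bl, v_csl)
```

The same function applies unchanged to waveforms exported from a circuit simulator, provided both traces share one time axis.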

V. RECONFIGURABLE SYSTEM-LEVEL ARCHITECTURE
We also propose a novel reconfigurable system-level architecture for ultra-fast implementation of different logic gates with different input combinations as shown in Fig. 4. In the proposed architecture, we exploit the ultra-high density of 3D NAND flash memory and pre-program different logic gates with different input combinations in different strings across the 3D NAND flash array in the form of logic gate planes (Fig. 4) and use a content addressable memory (CAM) (which can also be implemented using a 3D NAND flash [30], [31]) to search the strings encoding the logic function to be performed.
The inputs and the logic function to be performed are encoded and given as a search query vector to the CAM. The CAM stores the information regarding the location of the logic gate planes performing a particular logic operation on a set of inputs in the pre-programmed 3D NAND flash array. The flash cells within the different strings encoding a logic function are programmed in the 3D NAND flash array such that the output of the logic function is obtained as the voltage on $C_{CSL}$ when the input literals are applied on the bit lines (BLs) and bit select lines (BSLs) during the read process. A read voltage $V_{read}$ = 2.5 V is applied to the WL layer in which the flash cells are pre-programmed according to the logic operation and a pass voltage of $V_{pass}$ = 8 V is applied to the remaining WLs to obtain the output. For instance, as shown in Fig. 5, we can realize the widely used logic functions in our proposed architecture by applying the inputs on BLs and BSLs and pre-programming the threshold voltages of the flash cells in different strings such that each string encodes one minterm of the logic function. Moreover, we can reconfigure the same 3D NAND flash array for different logic operations by re-programming the threshold voltages of the flash cells in different strings.
We have performed extensive HSPICE [24] simulations of the proposed 3D NAND flash logic-in-memory implementation with a CAM based reconfigurable architecture using the behavioral compact model described in [14].To analyze the performance enhancement in terms of speed, we provide different combinations of search query and measure the (worst case) delay while obtaining the match/mismatch result from the proposed system level reconfigurable architecture.
For ease of understanding and proof-of-concept demonstration, we programmed the 3D NAND flash array with 16 different 2-input logic gates.Each logic gate was programmed in a particular WL layer and 4 BLs were used to represent the different input combinations (00, 01, 10, 11) corresponding to that logic gate.Therefore, for storing all possible input combinations corresponding to 16 different 2-input logic gates, 64 BLs (16 logic functions × 4 combinations for each logic function) and 16 WLs (one each for 16 logic functions) are required.Now, for selecting a logic gate with a particular input combination, the search query vector should consist of 10 bits out of which the first 6 bits can be used to select the BL corresponding to the input combination (out of 64 BLs) and the next 4 bits can be used to select the WL (out of 16 WLs) encoding the logic gate.
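The 10-bit query format described above can be illustrated with a short sketch. The text fixes only the field widths (6 bits for the BL, 4 bits for the WL); the specific mapping used here, with BL index = 4 × gate index + input combination and WL index = gate index, is an assumption made for illustration, as are the function names.

```python
N_GATES = 16   # distinct 2-input logic functions, one WL layer each
N_COMBOS = 4   # input combinations 00, 01, 10, 11, one BL each per gate
BL_BITS = 6    # 64 BLs in total
WL_BITS = 4    # 16 WLs in total


def encode_query(gate_index, a, b):
    """Build the 10-bit search query: 6 BL-select bits followed by 4 WL-select bits."""
    if not 0 <= gate_index < N_GATES:
        raise ValueError("gate_index must be in 0..15")
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("inputs must be 0 or 1")
    combo = (a << 1) | b
    bl = gate_index * N_COMBOS + combo   # assumed layout of gates across BLs
    wl = gate_index                      # assumed: one WL layer per gate
    return format(bl, f"0{BL_BITS}b") + format(wl, f"0{WL_BITS}b")


def decode_query(query):
    """Split a 10-bit query string back into (BL index, WL index)."""
    if len(query) != BL_BITS + WL_BITS:
        raise ValueError("query must be 10 bits long")
    return int(query[:BL_BITS], 2), int(query[BL_BITS:], 2)


example_query = encode_query(5, 1, 0)
```

Any other one-to-one assignment of gates and input combinations to BLs and WLs would serve equally well, as long as the CAM contents are programmed with the same convention.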
Next, we utilize a 3D NAND flash-based CAM [30] to search for a particular input combination of a logic gate within the pre-programmed flash array. Each CAM cell consists of two flash cells in adjacent strings corresponding to the same WL. To store a particular bit of information, the flash cells are programmed in a complementary fashion as shown in Fig. 6. While consecutive flash cells programmed to $V_t$ = 0 V (CTM$_1$) and $V_t$ = 3.5 V (CTM$_2$), respectively, represent a logic '1', the combination of flash cells programmed to $V_t$ = 3.5 V (CTM$_3$) and $V_t$ = 0 V (CTM$_4$), respectively, represents a logic '0'. Similarly, while searching for the stored data in the CAM, the query vector bits need to be applied as BL voltages to the consecutive strings in a complementary fashion, i.e., consecutive BL voltages $BL_0$ = 0 V and $BL_1$ = 0.5 V represent logic '1' and $BL_0$ = 0.5 V and $BL_1$ = 0 V represent logic '0'. In this scheme, for a mismatch between the query vector and the stored data, a substantial string current charges the CSL capacitance to 0.5 V, whereas no string current flows when a match is found and the potential of the CSL remains at 0 V. Since the ultra-dense 3D NAND flash memory array facilitates pre-programming of almost all the logic functions, the mismatch case is expected to be rare. Therefore, this negative logic implementation, where only the (rare) mismatch case leads to power dissipation, saves a significant amount of energy.

Fig. 6. CAM implementation exploiting 3D NAND flash array [30].
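The complementary store/search convention described for the CAM can be checked with a small behavioral sketch. It assumes that a string conducts only when its BL is at 0.5 V and its cell threshold lies below the read voltage; the use of the 2.5 V read voltage in the CAM is an assumption carried over from the logic array, and the function names are hypothetical.

```python
V_READ = 2.5                 # assumed read voltage on the searched WL (V)
VT_LOW, VT_HIGH = 0.0, 3.5   # threshold states quoted for the CAM cells (V)
V_BL_HIGH = 0.5              # high bit-line voltage (V)


def store_bit(bit):
    """Thresholds of the two adjacent cells forming one CAM cell."""
    return (VT_LOW, VT_HIGH) if bit else (VT_HIGH, VT_LOW)


def query_bit(bit):
    """Complementary BL voltages (BL0, BL1) used to search for `bit`."""
    return (0.0, V_BL_HIGH) if bit else (V_BL_HIGH, 0.0)


def cell_mismatch(stored_bit, searched_bit):
    """A string conducts (mismatch) if its BL is high and its cell is on."""
    thresholds = store_bit(stored_bit)
    bl_voltages = query_bit(searched_bit)
    return any(v_bl > 0.0 and V_READ > vt
               for v_bl, vt in zip(bl_voltages, thresholds))


def cam_search(stored_word, query_word):
    """CSL voltage after the search: 0.5 V on any mismatch, 0 V on a full match."""
    mismatch = any(cell_mismatch(s, q) for s, q in zip(stored_word, query_word))
    return V_BL_HIGH if mismatch else 0.0


match_voltage = cam_search([1, 0, 1, 1], [1, 0, 1, 1])
mismatch_voltage = cam_search([1, 0, 1, 1], [1, 1, 1, 1])
```

In this model a matching word draws no string current at all, which corresponds to the negative-logic energy saving noted in the text.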
For the representative case of the logic-in-memory implementation of 16 different 2-input logic gates utilizing this proposed architecture, the HSPICE [24] simulations indicate a worst-case delay of 19 ns between the application of the query vector to the 3D NAND flash-based CAM and the match/mismatch result considering a state-of-the-art 176-layer 3D NAND flash array. Although the typical programming delay of a 3D NAND flash cell is ∼2 ms [3], [4], the ultra-dense CAM cell needs to be re-programmed only if there is a mismatch, which is a rare case.
Although the 3D NAND flash cells exhibit a limited endurance, the lifetime of the proposed logic-in-memory implementation exploiting the 3D NAND flash memory can be improved by utilizing effective architectural-level solutions such as ENDURER [32], which leverage the ultra-high density and employ different flash cells across the array for different switching cycles. Moreover, in the system-level reconfigurable architecture, there is no need to write into the cells unless the logic to be implemented is not already pre-programmed in the array (the mismatch case, which is rare). Such an approach can significantly increase the speed of the 3D NAND flash-based logic-in-memory implementations.

VI. IMMUNITY AGAINST REVERSE ENGINEERING ATTACKS
As discussed in the introduction section, reverse engineering, which involves de-packaging and delayering of the IC and subsequent imaging of different layers to extract the information regarding the underlying design, poses a serious challenge to semiconductor design companies. Although layout-level IC camouflaging techniques [18] increase the resilience against RE attacks, they offer a limited number of gates with similar layout (NAND, NOR, XOR) and introduce a significant delay, power consumption and area overhead, which restricts widespread (100%) obfuscation while satisfying the design constraints. Therefore, there is an urgent need for a camouflaging technique which does not introduce significant delay/power/area overhead and provides complete (100%) logic obfuscation for enhanced immunity against RE attacks. Since all the logic functions with ≤177 literals/inputs and $\le 2^{14}$ minterms appear similar at the layout level in the proposed logic-in-memory implementation utilizing the 3D NAND flash array, our implementation may provide an efficient means to thwart RE attacks without introducing any overheads. The innate camouflaging property of the proposed logic-in-memory implementation arises from the capability to re-program flash cells without changing the physical layout. In this work, for the first time, we have provided an effective means to exploit this reconfigurable property of flash cells to realize logic functions with multiple inputs in sum-of-products (SOP) form by encoding inputs as the threshold voltages of flash cells within a string. Moreover, the system-level architecture comprising a 3D NAND flash array-based CAM and a pre-programmed 3D NAND flash array with different logic planes in different WL layers also appears the same at the layout level. Therefore, the proposed methodology to implement logic functions within the 3D NAND flash memory and the novel system-level architecture facilitate the exploitation of the reconfigurable property of the flash cells to realize a camouflaged logic design technique. Moreover, the 3D NAND flash cells can be programmed after fabrication, which eliminates the possibility of detection of the implemented logic function by malicious attackers, the foundry or reverse engineers.
To analyze the potential of the proposed logic-in-memory implementation utilizing 3D NAND flash as a logic camouflaging technique, we have performed various reverse engineering attacks on different ISCAS'85 [19] and ISCAS'89 [20] benchmark circuits with logic gates replaced by the camouflaged logic gates utilizing the ultra-dense 3D NAND flash memory.

A. Resilience to Satisfiability (SAT) Attacks
The efficiency of a circuit camouflaging technique is evaluated by analyzing its resiliency against SAT attacks [33].The inherent assumptions while performing the SAT attacks are: the reverse engineer has access to (a) functional chip and (b) RE tools which facilitate de-packaging, delayering, imaging of the layout and extraction of gate level netlist.Moreover, (c) the RE attacker can differentiate between the layouts of camouflaged and non-camouflaged gates and (d) the attacker also possesses the list of functions that can be implemented using the camouflaged cell.
To identify the function of all camouflaged gates in the circuit, the reverse engineer extracts all the input/output patterns from a functional IC and applies these patterns on another functional IC while analyzing the outputs. The critical step while de-camouflaging the IC using SAT attacks is to generate input patterns that help in resolving the functionality of the camouflaged gate using the least number of iterations. This is achieved utilizing two basic principles of VLSI design for test (DFT) used for analysis of manufacturing defects in ICs [34], [35]: justification and sensitization. While justification refers to the process of controlling one or more inputs of the gate to obtain a certain output, sensitization involves application of inputs to the gate in such a way that the output is controlled only by the input to be sensitized. For instance, application of '0' as one input to an AND gate justifies its output to '0', while application of '1' as an input to the AND gate forces the output to depend only on the other input, sensitizing the gate to that input.
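The AND-gate example above can be checked mechanically. The following is a minimal Python sketch, not part of the original work; the helper names `is_justified` and `is_sensitized` are hypothetical and introduced only for illustration.

```python
def and_gate(a: int, b: int) -> int:
    """Two-input AND gate on 0/1 values."""
    return a & b


def is_justified(gate, fixed_value: int, target: int) -> bool:
    """True if holding the first input at fixed_value forces the output
    to target regardless of the second input (justification)."""
    return all(gate(fixed_value, other) == target for other in (0, 1))


def is_sensitized(gate, fixed_value: int) -> bool:
    """True if, with the first input held at fixed_value, the output
    changes when the second input changes (sensitization)."""
    return gate(fixed_value, 0) != gate(fixed_value, 1)


# '0' on one input justifies the AND output to '0'.
print(is_justified(and_gate, fixed_value=0, target=0))
# '1' on one input makes the AND output follow the other input.
print(is_sensitized(and_gate, fixed_value=1))
```

The same two predicates can be applied to any two-input Boolean function, which is how an attacker would select patterns that discriminate between candidate gate functions.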
To analyze the efficacy of the proposed implementation for IC camouflaging, we have utilized the (fast) incremental SAT solver tool [33], [36]. Although the proposed camouflaging technique is capable of realizing any logic function with ≤177 literals/inputs and $\le 2^{14}$ minterms without introducing any area overhead in the layout, the incremental SAT solver tool supports analysis of camouflaged gates with only two inputs. For logic gates with only two inputs, the total number of distinguishable truth tables (functions) which may be realized is 16. Considering this limitation of the tool, we restricted the number of logic functions that may be realized using the camouflaged gate to 16. The incremental SAT solver utilizes a mux-based modelling approach for the camouflaged gate as shown in Fig. 7. In the mux-based modelling approach, the control vectors ($P_i$, $P_{i+1}$, ...) dictate the logic function realized by the camouflaged gate. The SAT solver tries to identify the control vectors which yield correct input/output patterns (obtained via the RE of a functional IC) in an iterative manner to identify the logic function realized by the camouflaged gates [33], [36]. We used the incremental SAT solver on ISCAS'85 [19] and ISCAS'89 [20] benchmark circuits with all the logic gates replaced by the proposed logic gate implementation utilizing 3D NAND flash memory. Although the SAT solver was able to resolve the functionality of the six camouflaged gates and break the simple c17 circuit of the ISCAS'85 benchmark circuits [19] within 39 s, it was not able to break (decode) the larger circuits such as the c432 and c499 benchmarks even after running the tool with an inexhaustive resource space for more than 15 hours. This clearly indicates the significant increase in the immunity of the benchmark circuits with a large number of camouflaged logic gates against SAT attacks. Moreover, we have compared the resiliency of the proposed implementation against de-camouflaging using SAT attacks on ISCAS'85 benchmark circuits with recent camouflaging techniques in Table IV. The inherent camouflaging of the proposed logic-in-memory architecture utilizing ultra-dense 3D NAND flash memory exhibits a significantly reduced vulnerability to SAT attacks as compared to the recent obfuscation techniques based on CMOS [18] and 2D hetero-structures [37].
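To make the mux-based model concrete, the sketch below replaces one gate of the c17 netlist by a two-input gate whose four control bits act as its truth table, and then searches for control vectors consistent with the input/output behaviour of a reference circuit. It is a toy illustration under stated assumptions: it uses exhaustive enumeration rather than the incremental SAT solver of [33], [36]; the truth-table encoding of the control bits, the choice of camouflaged gate (net 16) and all function names are hypothetical.

```python
from itertools import product

# Assumed encoding: key[2*a + b] is the gate output for inputs (a, b),
# so the 4 control bits select one of the 16 two-input functions.
NAND_KEY = (1, 1, 1, 0)


def mux_gate(key, a, b):
    """Mux-style camouflaged gate: inputs (a, b) select one control bit."""
    return key[2 * a + b]


def nand(a, b):
    return 1 - (a & b)


def c17(inputs, key):
    """Standard c17 netlist with the gate driving net 16 camouflaged."""
    n1, n2, n3, n6, n7 = inputs
    n10 = nand(n1, n3)
    n11 = nand(n3, n6)
    n16 = mux_gate(key, n2, n11)  # camouflaged gate (a NAND in the real c17)
    n19 = nand(n11, n7)
    return nand(n10, n16), nand(n16, n19)  # primary outputs 22 and 23


def consistent_keys():
    """Keep every control vector that reproduces the oracle on all patterns."""
    patterns = list(product((0, 1), repeat=5))
    # The oracle stands in for the functional IC available to the attacker.
    oracle = {p: c17(p, NAND_KEY) for p in patterns}
    return [
        key
        for key in product((0, 1), repeat=4)
        if all(c17(p, key) == oracle[p] for p in patterns)
    ]


survivors = consistent_keys()
print(len(survivors), survivors)
```

With a single camouflaged gate only 16 keys need to be tried; the search space grows as $16^k$ for $k$ camouflaged gates, which is why a solver-based attack is used in practice and why full camouflaging of larger benchmarks becomes hard to break.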
Although we have replaced all the logic gates of the ISCAS'85 benchmark circuits with the mux-based camouflaged gates while performing SAT attacks, the area overhead introduced by the other obfuscation techniques restricts such an approach while designing logic circuits. To reduce the area overhead, only 5% of the logic gates were camouflaged in the CMOS-based camouflaging technique [18] and the silicon nanowire (SiNW) [38] based obfuscation technique. Although the 2D hetero-structure based camouflaging technique [37] may be used to camouflage all the logic gates owing to the significantly reduced area overhead, the camouflaged gates can only implement NAND/NOR/AND/OR gates. On the other hand, the inherent camouflaging technique offered by the proposed logic-in-memory implementation using 3D NAND flash memory allows realization of all the logic gates without introducing area overhead, facilitating complete obfuscation and alleviating the need for hybrid designs. Moreover, the proposed implementation exhibits the same delay and energy dissipation for all the logic operations. Therefore, it is also expected to show high robustness against side-channel attacks.

B. Resilience to ATPG Attacks
The automatic test pattern generation (ATPG) tool [39] relies on 'activation' and 'propagation' of faults. For efficient detection of single fault models such as stuck-at-fault utilizing ATPG attacks, there should be a clean path between the inputs and the outputs, i.e., the gate being attacked should be connected to either the primary inputs or the primary outputs through standard logic gates. Recently, the ATPG tools have also been exploited for de-camouflaging obfuscated circuits [40]. During the ATPG attack, the IC is considered as a black box and the outputs corresponding to different input patterns generated by the ATPG are analyzed. Each camouflaged gate is replaced by a standard logic gate and the test patterns are generated according to the expected functionalities. For instance, for a 2-input camouflaged gate which may exhibit 16 different logic functions, the total number of possible input patterns is 64 ($2^2 \times 16$). Furthermore, if 'm' camouflaged gates are connected in series, they are replaced by a single dummy gate with 'm+1' inputs. Moreover, if each camouflaged gate exhibits 'N' different logic functions, the total number of input patterns to be generated increases exponentially to $2^{m+1} \times N^m$. Since the proposed logic-in-memory implementation exploiting ultra-dense 3D NAND flash memory can realize any logic function with ≤177 literals/inputs and $\le 2^{14}$ minterms, the number of test patterns required for the ATPG attack would be significantly larger. Moreover, the proposed technique facilitates full (100%) camouflaging without introducing any significant overhead and mitigates the possibility of having clean paths for activation and propagation of faults. Therefore, the ATPG tool failed to de-camouflage even the smallest c17 benchmark circuit, indicating the strong resilience of the proposed technique against ATPG attacks.
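The growth of the pattern count $2^{m+1} \times N^m$ quoted above can be tabulated with a few lines of Python. This is only an illustrative sketch; the function name `atpg_patterns` is hypothetical, and N = 16 corresponds to the two-input case considered in this work.

```python
def atpg_patterns(m: int, n_functions: int) -> int:
    """Input patterns for m series-connected camouflaged gates, each with
    n_functions candidate functions: 2**(m + 1) * n_functions**m."""
    return 2 ** (m + 1) * n_functions ** m


# m = 1 reproduces the single 2-input gate case: 2^2 * 16 = 64 patterns.
for m in (1, 2, 4, 8):
    print(m, atpg_patterns(m, 16))
```

Each additional series-connected camouflaged gate multiplies the count by 2N, i.e., by 32 for N = 16.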

C. Resilience to Brute Force Attack
Brute-force attack, which relies on enumeration of all possible combinations of logic gates to realize a particular function, is considered as the ultimate solution to de-camouflage an obfuscated circuit. For the two-input gates in the ISCAS'85 benchmark circuits, the proposed implementation may realize at least 16 different functionalities. Even for the smallest c17 benchmark circuit, which consists of 6 gates, the number of possible logic combinations is huge ($16^6 = 16{,}777{,}216$). Furthermore, for complex benchmark circuits such as c7552 consisting of 2362 gates, the number of possible logic combinations increases exponentially ($16^{2362}$). This makes the brute force attack computationally complex, resource intensive and time consuming. Therefore, the attacker may not be able to resolve the camouflaged gates in a reasonable time with limited resources. Furthermore, since the proposed implementation facilitates realization of more than 16 logic functions (any logic function with ≤177 literals/inputs and $\le 2^{14}$ minterms), the number of possible combinations is expected to be significantly larger, increasing its robustness against the brute force attack.
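The size of the brute-force search space quoted above follows directly from $16^{g}$ for $g$ camouflaged two-input gates. The short Python sketch below evaluates it for the two gate counts given in the text; the function names are hypothetical and the gate counts are taken as stated above.

```python
import math


def search_space(num_gates: int, functions_per_gate: int = 16) -> int:
    """Number of candidate netlists when every gate is camouflaged."""
    return functions_per_gate ** num_gates


def decimal_digits(num_gates: int, functions_per_gate: int = 16) -> int:
    """Number of decimal digits of the search-space size."""
    return math.floor(num_gates * math.log10(functions_per_gate)) + 1


print(search_space(6))        # c17 with 6 gates
print(decimal_digits(2362))   # digits in 16**2362 for the 2362-gate case
```

Reporting the digit count rather than the value itself keeps the larger case readable: the number of combinations for the 2362-gate circuit has thousands of decimal digits.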

VII. CONCLUSION
In this work, we have proposed a novel architecture for realizing multiple multi-variable logic-in-memory utilizing a 3D NAND flash memory array. The proposed implementation is compact, highly energy-efficient and enables realization of any logic function with ≤177 literals/inputs and $\le 2^{14}$ minterms. Since the proposed methodology does not need any modification in the 3D NAND flash architecture and utilizes the conventional read process for logic evaluation, it is compatible with the commercial 3D NAND flash memory arrays, and the 3D NAND flash cells can also be utilized as storage elements. We have also proposed a novel reconfigurable system level architecture exploiting a CAM to further improve the efficiency of the logic-in-memory implementation. Moreover, we have also demonstrated an innate camouflaging technique utilizing the proposed architecture which facilitates complete obfuscation of logic gates and benchmark circuits without introducing any area overhead while exhibiting a strong resilience to SAT attacks, ATPG attacks and brute force attacks. We believe that this work is an important step in the direction of exploiting ultra-dense 3D NAND flash memories for secure processing-in-memory architectures which are immune against reverse engineering. Our results may provide the incentive for experimental demonstration of 3D NAND flash memory based multiple multi-input logic-in-memory primitives.

Fig. 1. (a) Bird's eye view of the gate-all-around (GAA) charge trap (CT) Macaroni body 3D NAND flash memory array, (b) cross-sectional view of the 3D NAND flash cell and (c) string current characteristics of the 3D NAND cell located at WL 9 for extreme programmed state (111) and the erased state (000).

Fig. 2. Schematic representation of the 3D NAND flash memory array used for implementation of logic gates utilizing the behavioral compact modeling approach proposed in [14].

Fig. 4. System level architecture for improving the speed.

Fig. 5. The logic plane schematic for the pre-programmed 3D NAND flash array used in the system level architecture proposed to increase the speed of 3D NAND flash-based logic-in-memory implementation.

Fig. 7. (a) c17 circuit of the ISCAS'85 benchmark circuits. The G3 gate is replaced by a mux-based model of the camouflaged gate with the internal structure shown in (b).

TABLE II ENERGY CONSUMED BY DIFFERENT LOGIC GATES

... and the structural parameter values from Table I [14], the write energy is obtained as $E_{write}$ = 0.16 fJ for programming one input.

TABLE III DELAY FOR DIFFERENT WL LAYERS AND PULSE DURATION