Deep Neural Oracles for Short-window Optimized Compressed Sensing of Biosignals



I. INTRODUCTION
COMPRESSED Sensing (CS) is a relatively new paradigm for the acquisition/sampling of signals that violates the intuition behind the Shannon sampling theorem [1]-[3]. In fact, CS theory states that, under surprisingly broad conditions, it is possible to reconstruct certain signals or images using far fewer samples or measurements than traditional methods require. To enable this, CS relies on two concepts: sparsity, which pertains to the signals of interest, and incoherence, which pertains to the measurement/acquisition/sampling methods. Sparsity expresses the idea that many natural signals have a very parsimonious representation when expressed in an appropriate sparsity basis. Incoherence expresses the idea that a reduced number of acquisitions of a waveform with a sparse representation in an appropriate basis, when made in a domain that is incoherent with that basis, suffices to capture the entire signal information.
Based on these concepts, it has been possible to devise sampling/measurement protocols [4], [5] which capture the information content while requiring a number of measurements comparable to the number of non-zero coefficients of the signal of interest when expressed with respect to its appropriate sparsity basis. Consequently, the most significant feature of these sampling procedures is that they allow a sensor to capture the information content of a signal without acquiring its entire profile, thus performing acquisition and compression at the same time. In other words, CS is a very simple and efficient procedure to sample sparse signals at a reduced rate, using far fewer resources than the standard sampling required for A/D conversion.
Of particular interest is the fact that many signals in biomedical applications enjoy the sparsity property and can therefore be efficiently acquired using CS, i.e., by using less energy, less time and/or fewer samples. For example, this has been demonstrated for Electrocardiographic (ECG), Electromyographic (EMG) [6] and Electroencephalographic (EEG) [7] signals, which paved the way to the adoption of CS for the efficient acquisition of biosignals in Body Area Network (BAN) nodes [8], [9]. The same has been shown for waveforms acquired through magnetic resonance imaging (MRI) [10], where CS yields the very important result of accelerating the overall MRI acquisition [11].
All these advantages in the acquisition phase are balanced by the increase in complexity of signal reconstruction with respect to the simple low-pass filtering needed in a standard D/A conversion. In fact, reconstruction in a CS framework boils down to solving the problem (which is also fundamental in a number of heterogeneous applications) of recovering an n-dimensional sparse signal x from a set of m measurements y that represent the output of the CS under-sampling encoding, i.e., with m < n. More specifically, one needs to find the sparsest n-dimensional vector x among the infinite solutions of the ill-posed system y = L(x), where L : R^n → R^m is a linear dimensionality-reduction operator; regrettably, this is an NP-hard problem. Yet, thanks to [12], the solution can be obtained by solving a minimization problem, called Basis Pursuit (BP), using linear programming. In other words, the result in [12] is fundamental since it allows obtaining a solution of the BP problem in polynomial time, thus making the use of CS practical. Still, the computational resources needed by the numerical algorithm solving BP may be so demanding as to make its solution practically unfeasible in low-complexity nodes, like a typical BAN gateway. To cope with this, several dedicated BP/Basis Pursuit DeNoising (BPDN) solvers have been proposed, such as the Spectral Projected Gradient for L1 Minimization (SPGL1) [13] and the Generalized Approximate Message Passing (GAMP) [14]. Alternative solutions rely on the observation that the main issue in the computation of x is not finding a generic solution of y = L(x), but finding the sparse one. Starting from this, further computational cost reduction can be achieved by generating solutions that iteratively adjust their sparsity at each step. Different heuristics may be used to promote sparsity and give rise to different methods, such as the Orthogonal Matching Pursuit (OMP) [15] and the Compressive Sampling Matching Pursuit (CoSaMP) [16].
More recently, it has been demonstrated that additional advantages in terms of smaller computational complexity or improved quality of the reconstructed signal can be obtained by adopting a (Deep) Neural Network (DNN) for reconstruction [17]-[24]. More specifically, in [21] the authors have shown a probabilistic relation between CS and a stacked denoising autoencoder (SDA) implemented using a 3-layer, simply structured neural network. Once properly trained, the SDA has been capable of directly recovering a sparse image from its linear (or mildly non-linear) measurements, and has offered, in some cases, advantages in terms of quality of the reconstructed images with respect to the most common greedy reconstruction algorithms. A similar approach employing fully-connected DNNs can be found in [22], where CS has been applied to videos and the proposed approach enables fast recovery of video frames at significantly improved reconstruction quality. In [23] the authors have proposed a DNN, called ISTA-Net and inspired by the Iterative Shrinkage-Thresholding Algorithm (ISTA) [25], designed with the aim of optimizing the solution of BP to reconstruct compressed images. Another deep learning framework (referred to as BW-NQ-DNN) applied to CS acquisition/reconstruction of neural signals has been presented in [24], where three networks are jointly optimized to perform a binary measurement matrix multiplication, a non-uniform quantization and the reconstruction, respectively. Despite the advantage shown in terms of quality of reconstruction, this approach has the drawback of the complexity of the resulting system, in particular of the DNN used for the non-uniform signal re-quantization.
The aim of this work is to propose an innovative use of DNNs in a CS-based acquisition/reconstruction framework. More specifically, our network is not used, as is the case in all the above-mentioned literature, to directly reconstruct the input signal, but only to provide a divination of the support of the input signal, i.e., of the positions of the non-null components of the original signal when it is expressed along the sparsity basis. Our approach not only improves reconstruction quality with respect to standard techniques, but also introduces a self-assessment capability that allows estimating on the fly the quality of the reconstruction. Furthermore, with our approach signals can be successfully reconstructed even when they have been sampled using CS over very short acquisition windows. This is important since it allows further reducing the complexity of the acquisition stage and/or an aggressive mixed-signal implementation of the acquisition stage.
To the best of our knowledge, this is the first work proposing to use a DNN for support identification, and one of the few proposing to use a DNN to improve the reconstruction of CS-sampled signals other than images.
The rest of the paper is organized as follows. Section II introduces some basic concepts of CS. In Section III the choice of n is analyzed, with pros and cons for the two considered classes of signals, ECGs and EEGs. Section IV recaps standard and oracle-based CS decoders, while in Section V the adopted figures of merit are defined. Section VI introduces the DNN architecture that is the main building block of the proposed CS decoder described in Section VII. The same section includes performance analysis and comparisons with other standard CS decoders. The self-assessment capability is the topic of Section VIII, while Section IX reports a computational analysis for both the encoder and the decoder. Finally, we draw our conclusions.

II. COMPRESSED SENSING BASICS
Let us refer to the scheme in Figure 1 and assume to work by chopping input waveforms into subsequent windows, each of which is represented by a set of its samples x = (x_0, ..., x_{n−1}) collected at the Nyquist rate, seen as a vector x ∈ R^n. CS hinges on the assumption that x is κ-sparse, i.e., in the simplest possible setting, that an orthonormal matrix S exists (whose columns are the vectors of the sparsity basis) such that, when we express x = Sξ, the vector ξ = (ξ_0, ..., ξ_{n−1}) does not contain more than κ < n non-zero entries.
The fact that x depends only on a number of scalars that is less than its sheer dimensionality hints at the possibility of compressing it. CS does this by applying a linear operator L_A : R^n → R^m depending on the acquisition (or encoding) matrix A ∈ R^{m×n}, with m < n, defined in such a way that x ∈ R^n can be retrieved from y = L_A(x) ∈ R^m. The ratio n/m is the compression ratio and will be indicated by CR.
It can be intuitively accepted that the larger κ is, the larger the m needed to guarantee that x can be retrieved from y, and thus the lower the achievable CR. This relationship is asymptotically identified by CS theory as m = O(κ log(n/κ)) [2]. In finite and practical cases, one may often aim at using an m value proportional to κ, though the most elementary, worst-case theoretical guarantees fail for m < 2κ. In fact, though y = L_A(ξ) has an infinite number of counterimages, the first prerequisite for recoverability is that, when we add the κ-sparsity prior, only one of them survives. Hence, given any two κ-sparse vectors ξ1 and ξ2, it cannot be that y = L_A(ξ1) and y = L_A(ξ2), i.e., L_A(ξ1 − ξ2) must be non-zero. Hence, ξ1 − ξ2 cannot be in the kernel of L_A. Since, in the worst case, ξ1 − ξ2 is 2κ-sparse, the only way of guaranteeing this is that L_A, when restricted to any 2κ-dimensional coordinate subspace of R^n, is a maximum-rank operator. Clearly, if L_A : R^n → R^m with m < 2κ this is not possible and, whenever the worst-case scenario is hit, the sparsity prior is no longer able to guarantee signal recovery. In practice, though worst-case scenarios seldom appear, classical reconstruction algorithms fail before the limit m = 2κ is reached.
Clearly, compression by L_A must be coupled with a signal reconstruction stage R_A : R^m → R^n such that, ideally, x = R_A(L_A(x)). In practice, the chain of the encoding and decoding steps is a lossy process and x̂ = R_A(L_A(x)) is only an approximation of x.

III. THE LENGTH OF ACQUISITION WINDOWS
The class of linear operators L_A that can be effectively paired with a decoder R_A is extremely large. Most notably, if A is an instance of a matrix whose entries are independent zero-average and unit-variance Gaussian random variables, then L_A(x) = Ax is known to work [1], [2], [26] with very high probability. Yet, if the matrix A± is defined as A±_{j,k} = sign(A_{j,k}), then L_A(x) = A±x is also known to work with very high probability [27]. In the following, we will focus on L_A(x) = A±x, as this makes the computation of L_A(x) multiplierless and is thus the best option for very low-resource implementations of the encoder stage.
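As a concrete illustration of the multiplierless encoding, the following NumPy sketch (with hypothetical sizes) checks that the antipodal product A±x reduces to a sequence of signed accumulations of the input samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 16                       # window length and measurements (CR = 4)

A = rng.standard_normal((m, n))     # Gaussian encoding matrix
A_pm = np.sign(A)                   # antipodal version, entries in {-1, +1}

x = rng.standard_normal(n)          # one acquisition window at Nyquist rate

y_dense = A_pm @ x                  # dense matrix-vector product ...

# ... equals a multiplierless sequence of additions and subtractions
y = np.array([x[A_pm[j] > 0].sum() - x[A_pm[j] < 0].sum() for j in range(m)])

assert np.allclose(y, y_dense)
```

In hardware, each y_j is thus computed by one accumulator that adds or subtracts samples according to the stored sign bits.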
Actually, the literature shows that there is plenty of room for optimizing A [28]-[31], and suitably designed matrices are able to increase compression considerably with respect to naive random instances.
Clearly, this paves the way to applications in all those settings in which the computational complexity of compression must be kept at bay, e.g., in BANs for which reduced computation and compression before transmission are essential to fit within a tight resource budget.
It is worth stressing that, to best express its potential in reducing computational complexity at the encoder, CS should consider the shortest possible acquisition windows.
To understand why, consider the processing of N given samples. They may be partitioned into N/n contiguous and non-overlapping time windows, each with n samples. The operator L_A can be applied to each window, entailing a number of operations O(n·m). The total number of operations to process the N samples is O(n·m·N/n) = O(n·N/CR).
Yet, CR is fixed at a level sufficient to reconstruct the original n-dimensional signal x from the m measurements y with a quality that is deemed acceptable. Hence, at a given CR and N, the computational complexity increases linearly with n, i.e., with the length of the individual time windows.
Another aspect that has to be considered is the signal reconstruction latency. Even assuming that R_A(y) is an instantaneous operation, the reconstructed signal is recovered with a delay of up to n time steps, since y is available with a delay of up to n time steps. Of course, the smaller n, the lower the reconstruction latency.
Beyond these high-level reasons, short windows may benefit the implementation of the encoder also at a more physical level.
In purely digital realizations [32]-[34], the samples come from a conventional Analog-to-Digital converter and the encoder is implemented as a sequence of additions and subtractions depending on the entries of A±. In this case, not only the computation time but also the memory needed to store A± shrinks when n (and m) gets smaller.
In mixed-mode realizations (i.e., in the design of Analog-to-Information converters based on CS) [8], [9], [35]-[37], y = A±x is computed component-wise as y_j = Σ_{k=0}^{n−1} A±_{j,k} x_k, i.e., by accumulating the signal samples in the analog domain. This implies an analog storage to hold the intermediate sum value. Yet, independently of the actual implementation and technology, the approach is doomed to suffer from leakage and disturbance [9], [38]. These phenomena degrade the stored value over time and their effect increases with the hold time and the number of sums. Hence, the lower n, the shorter the time and the smaller the number of operations needed to compute y_j, and therefore the smaller the degradation incurred before conversion into digital words occurs.
Regrettably, gaining all the advantages connected with the reduction of n is not straightforward. In fact, real-world signals are such that, when n shrinks, the ratio κ/n tends to increase. As κ affects m, any reduction of n tends to impair the compression ratio.
To get a quantitative feeling for these trends, we show in Figure 2 the normalized sparsity κ/n for different values of n observed in the classes of ECG and EEG signals. Instances are obtained according to the synthetic generators described in the Appendix. Moreover, for both classes of bio-signals the considered sparsity basis S is a family of orthogonal wavelet functions [39]. In more detail, we select the Symmlet-6 family as the sparsity basis for ECG signals [9], while our choice for the EEG case is the Daubechies-4 family [40].
The value of κ is computed as the lowest number of entries of the sparse representation that includes 99.5% of the energy in 99% of the ECG instances, and 95% of the energy in 99% of the EEG instances. From the figure it is clear that the smaller n, the larger the (normalized) sparsity, and therefore the lower the attainable CR ensuring a target reconstruction quality.
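The per-class κ just described can be computed with a short routine. The sketch below (hypothetical helper names, toy random data in place of the actual wavelet coefficients of ECG/EEG instances) implements the two nested criteria, energy fraction per instance and percentile over the ensemble:

```python
import numpy as np

def kappa_of_instance(xi, energy_frac):
    """Smallest number of largest-magnitude entries of xi whose
    cumulative energy reaches energy_frac of the total energy."""
    e = np.sort(xi.astype(float) ** 2)[::-1]       # entry energies, descending
    cum = np.cumsum(e) / e.sum()
    return int(np.searchsorted(cum, energy_frac) + 1)

def kappa_of_class(Xi, energy_frac, instance_frac):
    """kappa valid for a fraction instance_frac of the ensemble:
    the instance_frac-percentile of the per-instance values."""
    ks = [kappa_of_instance(xi, energy_frac) for xi in Xi]
    return int(np.ceil(np.percentile(ks, 100 * instance_frac)))

rng = np.random.default_rng(1)
# toy ensemble of approximately sparse coefficient vectors (not real biosignals)
Xi = rng.standard_normal((1000, 64)) * (rng.random((1000, 64)) < 0.2)
print(kappa_of_class(Xi, 0.995, 0.99))
```

For the ECG case of the paper one would feed the Symmlet-6 coefficients of the instances with energy_frac = 0.995 and instance_frac = 0.99.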
The above considerations reveal that there is a multi-faceted trade-off linking computational/implementation complexity, reconstruction quality and compression. The joint design of the encoder and of the oracle-based decoder we propose tries to address such a trade-off better than conventional approaches.

IV. SIGNAL RECOVERY AND SUPPORT ORACLES
To better formalize sparsity and its consequences, recall that x = Sξ and that not more than κ entries of ξ are non-null. The positions of the non-zero entries of ξ identify the so-called support supp ξ, which we represent by means of the binary vector s ∈ {0,1}^n such that s_j = 1 if ξ_j ≠ 0 and s_j = 0 otherwise. Binary n-dimensional vectors can be used to index a generic n-dimensional vector v, so that v|s is the subvector of v collecting only the entries v_j such that s_j = 1. We will use binary n-dimensional vectors also to index matrices M with n columns, so that M|s is the submatrix of M containing only the columns whose index j is such that s_j = 1.
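In NumPy terms, this indexing convention is just boolean masking; a minimal sketch:

```python
import numpy as np

n = 8
xi = np.array([0.0, 1.3, 0.0, 0.0, -0.7, 0.0, 0.0, 2.1])  # a 3-sparse vector

s = (xi != 0).astype(int)          # binary support vector: s_j = 1 iff xi_j != 0

xi_s = xi[s == 1]                  # v|s : subvector of the supported entries
M = np.arange(3 * n).reshape(3, n) # any matrix with n columns
M_s = M[:, s == 1]                 # M|s : submatrix of the supported columns

assert s.sum() == 3
assert xi_s.tolist() == [1.3, -0.7, 2.1]
assert M_s.shape == (3, 3)
```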
With this notation, κ-sparsity is equivalent to saying that x is efficiently represented by two pieces of information, namely the n-dimensional binary vector s and the real vector ξ|s, whose dimensionality does not exceed κ.
Sparsity is fundamental in the decoding process going from y back to x. In fact, since m < n, the mapping y = A±Sξ from ξ to y is non-injective. Hence, any given measurement vector y corresponds to an infinite number of possible ξ. Yet, if A is properly designed, only one of the counterimages of y is κ-sparse and can be found by relatively simple algorithmic means.
As shown in Figure 1, a decoder recovers both s and ξ|s. Among the many methods proposed in the literature, the most classical approach is BPDN, which recovers both pieces of information simultaneously by solving the optimization problem

ξ̂ = arg min_ξ ||ξ||_1  subject to  ||y − A±Sξ||_2 ≤ τ     (1)

where x̂ = Sξ̂ is the reconstructed signal, ||v||_p indicates the p-norm of the generic vector v, and τ ≥ 0 accounts for the possible presence of disturbances in the computation of y by relaxing the constraint y = A±Sξ that would hold in the noiseless case. The noiseless case itself, corresponding to the simpler BP problem, can of course be tackled by setting τ = 0. Though implicitly performed, support identification is an important ingredient in BP and BPDN and is embedded in the 1-norm used in the objective function which, among all the possible ξ satisfying the constraint, selects the one with the largest number of zero entries. This is so true that changing the 1-norm in the merit function would completely spoil reconstruction, while changing the 2-norm in the constraint usually still gives sensible results.
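For the noiseless case (τ = 0), BP can indeed be cast as a linear program via the standard split ξ = u − v with u, v ≥ 0. The sketch below is a toy check of this reduction using SciPy's general-purpose `linprog` (not one of the dedicated solvers cited above), with the sparsity basis taken as the identity for brevity:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, m, k = 32, 20, 3                         # ambient size, measurements, sparsity

A = np.sign(rng.standard_normal((m, n)))    # antipodal encoding matrix
xi = np.zeros(n)
xi[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
y = A @ xi                                  # noiseless measurements

# BP:  min ||xi||_1  s.t.  A xi = y,  with xi = u - v and u, v >= 0
c = np.ones(2 * n)                          # sum(u) + sum(v) = ||xi||_1 at optimum
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=[(0, None)] * (2 * n))
xi_hat = res.x[:n] - res.x[n:]

assert res.success
print(np.linalg.norm(xi_hat - xi))          # small when BP succeeds
```

With m well above the worst-case limit, the LP typically returns the true sparse vector to solver precision.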
We here consider a different approach in which support identification is performed by an oracle looking at the vector y and divining s. Once s is known, one may note that y = A±Sξ is equivalent to y = A±S|s ξ|s to estimate ξ|s.
The oracle is based on a Deep Neural Network (DNN) trained on signals with the same statistical features as those to be acquired. Training involves also the matrix A, so that encoder and decoder are jointly optimized to improve support identification and thus reconstruction performance.

V. PERFORMANCE INDEXES
The encoder-decoder chain may simultaneously perform more than one useful operation on the signal (see, e.g., [41]-[44] for its use as an encryption stage), of which compression is surely the most obvious, as m < n. The compression performance of the encoder-decoder chain is easily assessed by the compression ratio n/m.
Yet, such a compression is in general lossy, and some degradation appears yielding x̂ ≠ x. The closer x̂ is to x, the better the encoder-decoder chain; this can be assessed by means of the Reconstruction Signal-to-Noise Ratio (RSNR) defined as

RSNR = ( ||x||_2 / ||x − x̂||_2 )_dB

where, for any scalar a, the a_dB notation is equivalent to 20 log10(a). RSNR can be used to define two ensemble-level performance figures, computed starting from a set x(t) (for t = 0, ..., T−1) of signal instances recovered as x̂(t). The first is the Average RSNR (ARSNR), i.e., the RSNR averaged over the T instances, while the second is the Probability of Correct Reconstruction (PCR), estimated as

PCR = #{t : RSNR of the t-th instance ≥ RSNR_min} / T

where # counts the number of elements in the set and RSNR_min is the minimum RSNR level that is considered sufficient for a correct reconstruction.

VI. THE DNN-BASED SUPPORT ORACLE
Though L_A(x) = A±x in the forward pass, to prevent the sign function from interrupting error backpropagation, in the backward pass we assume ∇_A L_A(x) = ∇_A(Ax). With this, since A±_{j,k} = sign(A_{j,k}) for every j and k, training acts on the continuous-valued parameters A whose sign is used in the feedforward computation.
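This trick is the well-known straight-through style of gradient estimation. A minimal NumPy sketch of one such training step (squared-error loss and hypothetical sizes; the actual model is the TensorFlow/Keras one described in the following):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 16, 6
A = rng.standard_normal((m, n))     # continuous trainable parameters
x = rng.standard_normal(n)          # input window
t = rng.standard_normal(m)          # some target for the measurements

# forward pass: the encoder really applies the antipodal matrix sign(A)
y = np.sign(A) @ x
loss = ((y - t) ** 2).sum()

# backward pass: straight-through, i.e., gradients as if y = A @ x,
# so dLoss/dA = outer(dLoss/dy, x); sign() is skipped in the chain rule
g_y = 2 * (y - t)
g_A = np.outer(g_y, x)

A -= 0.1 * g_A                      # the update moves the *continuous* A;
                                    # only its sign is used at inference time
assert g_A.shape == A.shape
```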
Using the methods specified in the Appendix, we generate a dataset composed of 8×10^5 signal instances for both the ECG and the EEG case. Each dataset is split into 80% for training (training set) and 20% for performance assessment (validation set).
All models proposed in this paper are implemented and trained by means of the TensorFlow framework [45] with the help of the high-level API provided by Keras [46].
Training is performed with stochastic gradient descent, where each gradient step is computed on a mini-batch of 30 signal instances and the initial learning rate is 0.1.
To appreciate the complexity of the networks we propose, the one for n = 64 and with m ranging in [16, 40] contains from 32128 to 36736 parameters and in our examples is trained for 500 epochs. The network for n = 128 and with m ranging in [24, 64] contains from 124672 to 140032 parameters and in our examples is trained for 1000 epochs.

VII. TRAINED CS WITH SUPPORT ORACLE
The trained oracle can be exploited in the definition of the decoder reported in Figure 3. We compute o = N_C(y) and, given a certain threshold o_min ∈ [0, 1], we estimate s with the binary vector ŝ ∈ {0,1}^n such that ŝ_j = 1 if o_j ≥ o_min and ŝ_j = 0 otherwise. Starting from ŝ, we finally estimate

ξ̂|ŝ = (A±S|ŝ)† y     (2)

where (·)† indicates the Moore-Penrose pseudo-inverse, which is needed since the number of ones in ŝ is in the order of κ < m and the matrix A±S|ŝ is thus a tall matrix with more rows than columns. The two estimates ŝ and ξ̂|ŝ define the recovered signal x̂. Decoder operation depends on the value of o_min, which is set by a further training phase in which each vector in the training set is encoded and decoded for different values of o_min. The o_min yielding the highest ARSNR is selected. We name our approach Trained CS with Support Oracle (TCSSO) to summarize its main features.
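Given the oracle output, the decoding step is a plain masked pseudo-inverse solve. A NumPy sketch of the pipeline follows, with a stand-in for the oracle output o = N_C(y) built from the true support, just to exercise the decoding path:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 64, 32, 16

S = np.linalg.qr(rng.standard_normal((n, n)))[0]   # an orthonormal sparsity basis
A = np.sign(rng.standard_normal((m, n)))           # antipodal encoding matrix

supp = rng.choice(n, k, replace=False)             # true support
xi = np.zeros(n); xi[supp] = rng.standard_normal(k)
y = A @ S @ xi                                     # measurements of x = S xi

o = np.zeros(n); o[supp] = 1.0                     # stand-in for o = N_C(y)
o_min = 0.5
s_hat = o >= o_min                                 # thresholded support estimate

B = (A @ S)[:, s_hat]                              # A S|s_hat: tall, m x k
xi_hat = np.zeros(n)
xi_hat[s_hat] = np.linalg.pinv(B) @ y              # Moore-Penrose solve
x_hat = S @ xi_hat

assert np.linalg.norm(S @ xi - x_hat) < 1e-8       # exact when the support is right
```

When the divined support is correct and noise is absent, the pseudo-inverse recovers ξ|s exactly, as the final assertion confirms.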
We compare the performance of TCSSO with that of some well-known methods.
Since TCSSO simultaneously adapts encoder and decoder, we pair some classical signal recovery algorithms with an established technique for the optimization of the matrix A that is able to cope with the antipodality constraint on its entries. As decoders, we consider the lightweight OMP [15], BP (which in all the tested cases has proven better than BPDN), and GAMP [14].
Matrix optimization is performed by rakeness maximization [30], [31], which we have verified to yield much better results than the classical independent assignment of ±1 to each of the entries of A±.
We evaluate ARSNR and PCR by Monte Carlo simulations using the samples of the validation set for both the ECG and EEG cases, with superimposed noise equivalent to an Intrinsic Signal-to-Noise Ratio ISNR = 60 dB. The results are reported in Figure 4 for the n = 64 and κ = 16 case. In all plots, the number of measurements sweeps from m = 40 down to m = 16, thus spanning compression ratios from CR = 1.6 up to CR = 4.
TCSSO clearly outperforms all the other techniques and makes it possible to work at compression ratios much larger than those commonly achievable, while still requiring a very limited computational effort since n = 64.
Figure 5 shows how the situation changes when n increases from 64 to 128. Performance is reported only in terms of ARSNR and increases with n, as we are analyzing more data per chunk. The performance gap between TCSSO and the best of the classical methods also increases.

VIII. DECODER SELF-ASSESSMENT
The TCSSO architecture described in the previous section can be extended by exploiting a property that stems from the fact that s is estimated separately from ξ|s.
In fact, assume that no noise is present and that the size and content of A± are such that y = A±Sξ is satisfied by one and only one κ-sparse ξ, i.e., that recovery of the true signal is theoretically possible. If the oracle is successful in divining the support, then ŝ = s and y = A±S|s ξ|s implies that y ∈ span(A±S|ŝ), where, for any matrix M, span(M) is the subspace generated by the linear combinations of its columns. This has a twofold consequence: i) (2) computes ξ̂|ŝ = ξ|s; ii) if ξ̂ is mapped back we have A±Sξ̂ = y.
Yet, if the oracle fails, then ŝ ≠ s and, since ξ is the unique κ-sparse solution of y = A±Sξ, then y ∉ span(A±S|ŝ). This has a twofold consequence: i) (2) no longer recovers ξ|s; ii) mapping ξ̂ back yields a vector different from y. Clearly, the decoder cannot check the correctness of ξ̂, as the true ξ is unknown. Yet, it may map ξ̂ back to the measurement domain, obtaining ŷ = A±S|ŝ (A±S|ŝ)† y = A±x̂, which could be different from y. As a result, ||y − ŷ||_2 is most naturally linked to decoder failure and grants a useful self-assessment capability. In particular, one may monitor the quantity

RMNR = ||y||_2 / ||y − ŷ||_2

that is, the Reconstruction Measurements-to-Noise Ratio, and declare that the oracle, and thus the TCSSO decoder, has succeeded when RMNR ≥ RMNR_min for a certain threshold.

This situation can be exemplified in the small-dimensional case n = 4, κ = 2 and m = 3. Since κ = 2, the instances of the original signal ξ ∈ R^4 may have at most two non-null components and thus lie on the union of all the possible coordinate planes in R^4. We may indicate one of those planes as c_{j,k}, where j and k are the indexes of the non-null coordinates of its points. The matrix A± is such that it maps each of those 6 coordinate planes into a plane in R^3 that can be distinguished from the others. This is exemplified in Figure 6, on the left of which we draw the 6 planes ι_{j,k} ⊂ R^3 that are the images through A±S of the coordinate planes c_{j,k} ⊂ R^4. Note that, due to dimensionality reduction, the images are not pairwise orthogonal. Yet, recovery is theoretically possible as no two images ι_{j,k} and ι_{j',k'} are the same, and thus a sufficiently clever algorithm can establish the support by looking at the measurement vector y.
By computing (2), the vector y is mapped back to a ξ̂ lying on the wrongly divined plane, which is therefore different from ξ. Though only approximately, the same holds in the noisy case, and this gives an idea of why the difference between y and ŷ assesses the correctness of the divined ŝ, i.e., the quality of the reconstruction x̂.
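The mechanism can be checked numerically. The sketch below (identity sparsity basis for brevity, and RMNR expressed in dB by analogy with RSNR, which is an assumption since the text defines it as a plain ratio) contrasts a correctly divined support with a wrong one:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, k = 64, 32, 16
A = np.sign(rng.standard_normal((m, n)))   # antipodal encoding, identity basis

supp = rng.choice(n, k, replace=False)
xi = np.zeros(n); xi[supp] = rng.standard_normal(k)
y = A @ xi

def rmnr_db(y, s_hat):
    """RMNR = ||y|| / ||y - y_hat|| in dB, with y_hat the projection
    of y onto span(A|s_hat) obtained via pseudo-inversion."""
    B = A[:, s_hat]
    y_hat = B @ (np.linalg.pinv(B) @ y)
    return 20 * np.log10(np.linalg.norm(y) / np.linalg.norm(y - y_hat))

good = np.zeros(n, bool); good[supp] = True                       # s_hat = s
bad = np.zeros(n, bool); bad[rng.choice(n, k, replace=False)] = True  # wrong guess

assert rmnr_db(y, good) > rmnr_db(y, bad)   # success shows up as a high RMNR
```

With the correct support, y lies in span(A|ŝ) and the residual is at the level of numerical noise; with a wrong support, a substantial residual remains.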
As an example of the underlying mechanism, Figure 7 reports some Monte Carlo evidence on the relationship between RMNR and RSNR for the ECG signals in three different configurations. In Figure 7a no noise is present and m = 32 = 2κ; in Figure 7b ISNR = 60 dB and m = 32 = 2κ, whereas in Figure 7c no noise is present, but m = 24 < 2κ.
The two-dimensional plots show an estimation of the joint probability, conditioned either on the positive events, i.e., the support has been correctly identified (ŝ_j ≥ s_j for all j = 0, ..., n−1, orange points), or on the negative events, i.e., at least one entry in the support is neglected (ŝ_j < s_j for at least one j = 0, ..., n−1, blue points). Darker colors stand for higher densities.
The one-dimensional plots at the bottom of the figure report the error probabilities of a self-assessment procedure that declares a positive event whenever RMNR ≥ RMNR_min and a negative event otherwise. As the threshold RMNR_min increases, the probability of a false positive decreases, since only very high-RMNR reconstructions are declared correct. On the contrary, the probability of a false negative increases, since for larger RMNR_min even good reconstructions can be declared incorrect.
The ideal conditions in Figure 7a result in perfect self-assessment capabilities. When noise is added, as in Figure 7b, positive and negative cases get mixed but remain identifiable by looking at the RMNR.
Though no noise is present in Figure 7c, the fact that m < 2κ makes the number of measurements insufficient, in general, for signal reconstruction, as there is no guarantee that only one κ-sparse signal ξ corresponds to the given y through A±S.
Hence, more than one support corresponding to the measurements may exist. In these conditions, it may happen that the oracle divines a support that includes the true one (more than κ outputs of the network are larger than o_min) as well as components of other possible supports. In this case the oracle is not missing the support (orange points in the lower-right cluster in the scatter plot of Figure 7c), but pseudo-inversion spreads the reconstruction over all the available components, thus failing to reconstruct the signal. It may also happen that the oracle divines a support different from the true one. In this case the oracle is wrong (blue points in the lower-right cluster in the scatter plot of Figure 7c) and pseudo-inversion identifies a sparse signal that is not the true one. Both cases give rise to points for which RMNR is very high but RSNR is very low and, no matter how high RMNR_min is, the probability of a false positive does not vanish. Luckily enough, the above cases are the ones breaking worst-case guarantees and happen quite rarely: in our 1.6×10^5-instance validation set, for n = 32, κ = 16 and m = 24, the oracle divines a support in excess of the true one only 109 times, and a support different from the true one only 6 times. The statistics commonly used to assess performance remain substantially unaltered by these failures, which are undetectable by looking at the RMNR.

(Figure 7 caption: in all cases n = 64 and κ = 16. Orange dots correspond to cases in which ŝ includes all the components of s, while blue dots correspond to ŝ failing to identify some components of s. Above and to the right of the scatter plots are logarithmic histograms estimating the probability density of RMNR and RSNR. In (a) m = 32 = 2κ and ISNR = ∞. In (b) m = 32 = 2κ and ISNR = 60 dB. In (c) m = 24 < 2κ and ISNR = ∞.)
In general, the value of RMNR_min can be decided once o_min is set, by a further pass over the training set. This allows estimating the false-positive and false-negative curves as in Figure 7 and using them as criteria.
In the following we will set RMNR_min to the largest value for which the false-negative probability is negligible.
Whenever a failure is detected, the decoder may take different actions, whose effectiveness depends on the final application.
The exploration of all the possibilities of the resulting two-level decoder is beyond the scope of this paper, but it can be easily recognized that quite a few options are available, such as: i) raise a warning and mark the current window as potentially incorrect; ii) feed the warning back to the encoder and require further information to correct the reconstruction (thus lowering the CR for this instance); iii) trigger another decoder on the same measurement vector, hoping that this will improve reconstruction; iv) any combination of the above. As a partial, non-optimized example, whose only aim is to show that some information can still be extracted from the measurements, when the first-attempt TCSSO decoder fails we trigger GAMP as a second-wind decoder.
Figure 8 plots the probability that GAMP yields an RSNR larger than that given by TCSSO when applied to the instances that the latter marks as incorrectly recovered (RMNR < RMNR_min), as a function of CR for the n = 64, κ = 16 case. A second-wind decoding is useful when such a probability is larger than 50%, i.e., approximately for CR ≤ 2.

IX. COMPUTATIONAL REQUIREMENTS
As noted previously, CS-based lossy compression methods result in a multi-faceted trade-off between compression ratio, reconstruction quality, and computational complexity. In this section we give further detail on the last aspect, distinguishing what is required at the encoder (which we want to minimize) and at the decoder (which we require to be no worse than the needs of classical recovery methods). In all cases we refer to the computational burden per processed sample, i.e., we divide the number of operations by the number of samples n contained in the processed window.

A. Encoder
The complexity of the encoder was briefly introduced in Section III as one of the leading design criteria. The number of signed accumulations (AC) is nm = n²CR^{-1}, thus yielding nCR^{-1} AC/sample. Beyond time complexity, the memory footprint is dominated by the storage of the matrix A±, and requires a number of entries equal to nm = n²CR^{-1}. In principle, matrix entries are bits. Yet, microcontroller-based implementations may favor 1-byte-per-entry or even 4-bytes-per-entry solutions. In fact, in some architectures the alignment of entries at word boundaries ensures better performance both in terms of speed and energy (see, e.g., [47]); this is why we express the memory footprint as the number of entries in A±.
From the blue curves in Figure 5, one sees that a higher n results in better reconstruction performance for the same CR; there is thus a trade-off between encoder complexity and window length.
Table I compares the increase in quality against the increase in complexity and memory footprint when going from n = 64 to n = 128, with CR ranging from 2 to 4. At high CR levels, the quality increase (e.g., with CR = 4, +18.0 dB for ECG and +11.5 dB for EEG) may be worth the ×2 in computational effort and the ×4 in memory footprint. Yet, at lower values of CR, the increase in resource needs is not justified by the limited performance gain: for CR = 2, resource requirements increase as before but one only gains +1.7 dB in the ECG case and +0.1 dB in the EEG case.

B. Decoder
In CS-based schemes, decoding is computationally much more intensive than encoding. We may evaluate the complexity of the TCSSO decoder by counting the number of Multiply-and-Accumulate (MAC) operations needed to compute x̂, disregarding the training phase.
The oracle N_C is composed of an input layer with m nodes, 3 fully connected hidden layers with 2n, 2n and n nodes, and a final fully connected output layer with n nodes. The layer-by-layer count of MACs required for a forward pass thus totals 2nm + 4n² + 2n² + n² = 2nm + 7n² per window. After support estimation, further calculations are needed to compute x̂. In particular, one needs to compute the Moore-Penrose pseudo-inverse of B = A±S|ŝ, i.e., of a matrix with m rows and a number c of columns, κ ≤ c ≤ n, with c ≈ κ being the most frequent case.
Though pseudo-inversion has optimized implementations, its typical complexity is equivalent to the computation of its analytical formula that, in our case, is B⊺(BB⊺)⁻¹. Considering that BB⊺ is an m × m matrix whose inversion entails 2m³ MACs, we arrive at an estimated total of m(2m² + mκ + m + κ) MACs for the typical c = κ case. The per-sample complexity is (2n/CR² + κ/CR + κ/n + 1/CR) · (n/CR) MAC/sample.
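The magnitude-recovery step can be sketched as follows, with a randomly drawn (and here assumed exact) support standing in for the oracle's estimate, and a Gaussian matrix standing in for the product of sensing matrix and sparsity basis:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, kappa = 64, 32, 16

A = rng.standard_normal((m, n))        # stand-in for A±S
s = np.zeros(n, dtype=bool)
s[rng.choice(n, size=kappa, replace=False)] = True
xi = np.zeros(n)
xi[s] = rng.standard_normal(kappa)     # κ-sparse coefficient vector
y = A @ xi                             # noiseless measurements

# Restrict to the estimated support and apply the Moore-Penrose
# pseudo-inverse to recover the non-zero magnitudes.
B = A[:, s]                            # m x κ, the typical c = κ case
xi_hat = np.zeros(n)
xi_hat[s] = np.linalg.pinv(B) @ y
```

With a correct support and m > κ noiseless measurements, the least-squares fit recovers the magnitudes exactly.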
We may compare the complexity of TCSSO decoding with that of OMP, which is known to be one of the simplest and most lightweight approaches. We consider the standard implementation of OMP as described in [48]. OMP begins with a number of iterations, each of which entails some linear algebra and sorting whose aim is to add one component to the support. Focusing on matrix products and their cost in number of MACs, the j-th step has a complexity equal to nm + 2m(j − 1) + 2m + jm. In the most economical implementation, it is possible to fix the number of iterations to κ and spare the checks needed to terminate the loop. This yields a total of at least 3κm/2 + 3κ²m/2 + κnm MACs. After that, OMP computes the pseudo-inverse of a matrix of the same size as B = A±S|ŝ in TCSSO.
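For reference, a compact (numpy-based, not optimized for the MAC counts above) rendition of this standard OMP loop with the iteration count fixed to κ:

```python
import numpy as np

def omp(A, y, kappa):
    """Greedy OMP: at each step add the column most correlated with the
    residual, then re-fit the magnitudes by least squares on the support."""
    n = A.shape[1]
    support, residual, coef = [], y.copy(), None
    for _ in range(kappa):                        # fixed κ iterations
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        B = A[:, support]
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        residual = y - B @ coef
    xi_hat = np.zeros(n)
    xi_hat[support] = coef
    return xi_hat
```

The final least-squares fit on the full support plays the role of the concluding pseudo-inversion mentioned in the text.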
The total complexity of the iterative part is therefore (3/2 + 3κ/2 + n) · (κ/CR) MAC/sample, and must be compared with the computational effort required by the oracle, which is (2/CR + 7)n MAC/sample.
Though the different contributions to the computational complexity computed above have quite dissimilar asymptotic behaviors, their magnitudes in the small-n cases can be appreciated only by numerical evaluation. As an example, for n = 64, κ = 16 and CR = 2 (one of our ECG cases) the first part of OMP entails some 716 MAC/sample, the oracle in TCSSO requires some 512 MAC/sample, while the common pseudo-inversion amounts to 1304 MAC/sample. As a further, somewhat opposite, example, for n = 128, κ = 26 and CR = 4 (one of our EEG cases) the first part of OMP entails some 1095 MAC/sample, the oracle in TCSSO requires some 960 MAC/sample, while the common pseudo-inversion amounts to 735 MAC/sample.
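These per-sample figures follow directly from the formulas derived above (pure arithmetic; the quoted values match up to rounding):

```python
def omp_iterations_macs(n, kappa, CR):
    # (3/2 + 3κ/2 + n) · κ/CR MAC/sample for the iterative part of OMP
    return (3 / 2 + 3 * kappa / 2 + n) * kappa / CR

def oracle_macs(n, CR):
    # (2/CR + 7) · n MAC/sample for one forward pass of the oracle
    return (2 / CR + 7) * n

def pseudo_inverse_macs(n, kappa, CR):
    # m(2m^2 + mκ + m + κ)/n MAC/sample with m = n/CR, typical c = κ case
    m = n // CR
    return m * (2 * m**2 + m * kappa + m + kappa) / n

# ECG case (n=64, κ=16, CR=2):  about 716, 512 and 1304 MAC/sample
# EEG case (n=128, κ=26, CR=4): about 1095, 960 and 735 MAC/sample
```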
In both cases the complexity of TCSSO and that of OMP are comparable, showing that, although TCSSO allows the implementation of extremely lightweight encoders, the decoder does not have to compensate by increasing its computational requirements with respect to conventional decoders.

X. CONCLUSION
We propose a CS decoder that, starting from the compressed measurements, first guesses which components are non-zero in the sparse signal to recover, and then computes their magnitudes.
Support guessing is provided by a suitable DNN-based oracle that proves extremely accurate, especially when trained together with the encoding matrix.
The resulting decoder largely outperforms classical approaches, even when they are paired with one of the most effective adaptation policies for the encoding matrix. This makes it possible to apply CS to signal windows containing a limited number of samples. The adoption of short windows is extremely beneficial along many directions, one of the most remarkable being the computational complexity of the encoder. Yet, short windows are usually out of the reach of classical CS mechanisms, as the sparsity assumption on which they hinge tends to fail when the dimensionality of the waveform to compress decreases. Hence, our proposal allows the implementation of extremely low-complexity encoders that still feature remarkable compression capabilities.
Furthermore, the separation between support guessing and magnitude calculation allows our decoder to detect cases in which the reconstruction may be affected by significant errors, thus paving the way, for example, to additional processing that further increases the reconstruction performance.
We demonstrated the effectiveness of this novel approach on realistic ECG and EEG signals, for which compression ratios in excess of 2 can be reached with a computational burden not exceeding 32 signed sums per sample.

APPENDIX GENERATION OF ECG AND EEG DATASETS
Due to the large number of signal instances generally needed to train a neural network, in both the ECG and the EEG case we used MATLAB code to generate synthetic instances of the two classes of signals.
As mentioned in Section III, ECGs exhibit sparsity with respect to the orthonormal set of vectors representing the Symmlet-6 wavelet family transformation. Here, κ is set to 16 for n = 64 and to 24 for n = 128. For the EEG signals, the vectors ξ are sparse with respect to the basis representing the Daubechies-4 wavelet transformation, where κ = {16, 26} matches n = {64, 128}.
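The κ values above refer to best κ-term approximation in the wavelet domain, i.e., keeping only the κ largest-magnitude coefficients. A generic numpy sketch, with a random vector standing in for actual wavelet coefficients:

```python
import numpy as np

def best_k_term(xi, kappa):
    """Keep the κ largest-magnitude coefficients, zero the rest."""
    out = np.zeros_like(xi)
    idx = np.argsort(np.abs(xi))[-kappa:]   # indices of the κ largest entries
    out[idx] = xi[idx]
    return out

rng = np.random.default_rng(4)
coeffs = rng.standard_normal(64)       # stand-in for Symmlet-6 coefficients
sparse = best_k_term(coeffs, 16)       # κ = 16 as in the n = 64 ECG case
```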

A. ECG
The synthetic ECG generator is thoroughly discussed in [49]. Signals are generated as noiseless waveforms. The noisy cases are obtained by superimposing additive white Gaussian noise whose power is such that the intrinsic SNR (ISNR) is 60 dB.
The setup is the same as detailed in [30]. The heart-beat rate is randomly set using a uniform distribution between 60 beat/min and 100 beat/min. We generate chunks of 2 s at a 256 sample/s sampling frequency, which are split into windows of n subsequent samples. For both the n = 64 and n = 128 cases we generate 8 × 10⁵ input vectors x, so that the corresponding total numbers of signal chunks are 10⁵ and 2 × 10⁵, respectively. These input vectors are randomly split into a training set and a test set, where the latter contains 20% of the total number of vectors x.
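The chunk-to-window bookkeeping can be sketched as follows (chunk counts are scaled down; the window sizes and the 80/20 split follow the text):

```python
import numpy as np

rng = np.random.default_rng(2)
fs, chunk_duration = 256, 2            # sample/s and seconds per chunk
chunk_len = fs * chunk_duration        # 512 samples per generated chunk

for n, n_chunks in [(64, 100), (128, 200)]:   # scaled-down chunk counts
    chunks = rng.standard_normal((n_chunks, chunk_len))
    windows = chunks.reshape(-1, n)    # each chunk yields 512/n windows
    perm = rng.permutation(len(windows))
    n_test = len(windows) // 5         # 20% test, 80% training
    test_set = windows[perm[:n_test]]
    train_set = windows[perm[n_test:]]
```

Note that halving n doubles the number of windows per chunk, which is why the same total of 8 × 10⁵ windows corresponds to 10⁵ chunks for n = 64 and 2 × 10⁵ for n = 128 in the full-scale dataset.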

B. EEG
The detailed description of the code generating the synthetic EEG signals can be found in [50]. The generator emulates event-related brain potentials, modeling an evoked potential as the series of a positive and a negative peak occurring at a fixed time relative to the event. The peaks are added to uncorrelated background noise, whose power is set to a level such that the resulting signal is very similar to an EEG signal measured by a real scalp electrode. Though the software is able to generate all channels of a multi-electrode EEG according to the standard 10-20 system, we focus on the "Fz" electrode, since it is in proximity of (but not exactly on top of) the simulated source of the stimulus. The sampling rate is set to 1024 sample/s with a stimulus frequency of 1 Hz.
We generate tracks corresponding to 50 different patients by starting from the parameters used in [50] and adding a uniformly distributed random offset to each of them. The ranges of the offsets for the positive peaks are ±16 samples for the position of the peak, ±0.05 Hz for the peak frequency, and ±1 for the peak amplitude. The ranges of the random offsets for the negative peaks are ±26 samples for the position of the peak, ±1 Hz for the peak frequency, and ±4 for the peak amplitude.
The signal length for each patient is such that the total number of n-sample windows is 8 × 10⁵. After that, 20% of the signal instances for each patient are randomly selected for the test set, while the remaining 80% is used for the training phase.

Fig. 1 .
Fig. 1. General scheme of an encoder-decoder pair based on CS. In the decoder we distinguish the estimation of the signal support s from the estimation of the non-zero coefficients ξ|s. Classical decoders perform both estimations simultaneously. Our approach first estimates s and then ξ|s.

Fig. 2 .
Fig. 2. The effect of reducing n on the normalized sparsity κ/n in the two examples of a synthetic ECG and a synthetic EEG signal.

Fig. 3.
Fig. 3. Trained CS with support oracle: block scheme including self-assessment capability. The fully connected DNN on the left is N_C ∘ L_A with an output layer o ∈ [0, 1]ⁿ. The estimated support ŝ is such that ŝ_j = 1 if o_j ≥ o_min and ŝ_j = 0 otherwise. The estimated support is employed in the signal reconstruction, where the reconstructed signal x̂ is also the input of the self-assessment block.

Fig. 5 .
Fig. 5. Performance in terms of ARSNR as a function of the compression ratio for both TCSSO and the best observed traditional approach (RAK + BP). Results are for n = 64 as well as n = 128, and for ECG (a) and EEG (b) signals.

Fig. 6 .
Fig. 6. The mechanism granting self-assessment capabilities to decoders based on a support oracle.

Fig. 7 .
Fig. 7. The relationship between RSNR and RMNR for ECG signals. In all cases n = 64 and κ = 16. Orange dots correspond to cases in which ŝ includes all the components of s, while blue dots correspond to ŝ failing to identify some components of s. Above and to the right of the scatter plots are logarithmic histograms estimating the probability density of RMNR and RSNR. In (a) m = 32 = 2κ and ISNR = ∞. In (b) m = 32 = 2κ and ISNR = 60 dB. In (c) m = 24 < 2κ and ISNR = ∞.

TABLE I. Performance improvement in terms of ARSNR, computational overhead in terms of AC/sample, and increase in memory footprint for the sensing matrix (number of entries of A±) with n going from 64 to 128. Results are for TCSSO in ECG and EEG cases with compression ratio (CR) ranging from 2 to 4.