Denoising scheme based on singular-value decomposition for one-dimensional spectra and its application in precision storage-ring mass spectrometry

This work concerns noise reduction for one-dimensional spectra in the case that the signal is corrupted by an additive white noise. The proposed method starts by mapping the noisy spectrum to a partial circulant matrix. By virtue of the singular-value decomposition of the matrix, the components belonging to the signal are determined by inspecting the total variations of the left singular vectors. Afterwards, a smoothed spectrum is reconstructed from the low-rank approximation of the matrix consisting of the signal components only. The denoising performance of the proposed method is shown to be highly competitive with that of existing non-parametric methods, including moving average, wavelet shrinkage, and total variation. Furthermore, its applicable scenarios in precision storage-ring mass spectrometry are demonstrated to be rather diverse and appealing.


I. INTRODUCTION
Noise is ubiquitous in the outcome of any physical experiment, owing to statistical fluctuations and various uncontrollable disturbances affecting the measuring system. It is superposed on the measured signal and obscures the inference of the underlying ground truth. To reduce the noise, usually a batch of independent measurements under the same conditions is performed for averaging, since the noise variance scales inversely with the number of averages. However, the accumulated statistics may be insufficient to bring down the noise due to pragmatic constraints, or the measurement may even be a one-shot instance. Under such circumstances, de-noising the data in the subsequent analysis is the only available mitigation.
It is quite common that measurement results are encoded in one-dimensional spectra, such as the energy spectrum of a β-radioactive isotope, the frequency response of a notch filter, or the hourly temperature readings of a thermometer. Mathematically speaking, such a spectrum is an ordered sequence of real-valued numbers, where usually the signal is smooth while the noise overlays complex irregularities like kinks and wiggles. In this sense, smoothing can be used interchangeably with de-noising in the context of one-dimensional data analysis.
Despite the voluminous literature on data smoothing techniques, they fall in general into two categories, namely parametric and non-parametric. A parametric method starts with a data model built from a priori knowledge of the signal and the noise. Such knowledge can be a theoretical comprehension of the measuring system, or merely an educated guess. In contrast, a non-parametric method makes minimal assumptions about the noise and then lets the data speak for themselves. With fewer restrictions, non-parametric methods thus offer greater versatility and applicability.
As an example in the non-parametric category, moving average is probably the most intuitive method [1]. The smoothed sequence is formed by replacing every noisy datum with the (weighted) mean of the data in its neighborhood. This is essentially equivalent to convolution smoothing [2], where the noisy sequence is convolved with a smoothing kernel such as binomial, Gaussian, or exponential [3]. In the same vein, the neighborhood for averaging can be extended to encompass data with similar values, which has inspired two other methods: the bilateral filter [4] and non-local means [5].
When viewed from a different perspective, convolution smoothing is in fact low-pass filtering in the Fourier-transformed domain [6], where the smoothing kernel is the impulse response of the filter. Normally the signal is band-limited whereas the noise covers the whole domain, hence a low-pass filter can effectively reject most of the noise. Based on the same Fourier transform, another method named Fourier thresholding takes a different approach to data de-noising [7]. It abandons sinusoidal components whose transform coefficients are below a given threshold, as they more likely belong to the noise.
Like Fourier transform, wavelet transform is another powerful tool in digital signal processing [8], which decomposes an input sequence into localized oscillatory components. The Fourier-based data de-noising methods can likewise be migrated to the wavelet-transformed domain. As a counterpart of the Fourier thresholding, wavelet shrinkage manipulates wavelet coefficients according to a specific criterion to smooth a noisy sequence [9,10].
Lastly, it is worth noting a special mainstream method in the non-parametric category: total variation regularization [11], whose underlying framework differs from those mentioned above. The idea is to minimize the total variation of the smoothed sequence, which is the ℓ1-norm of its first derivative, while preserving data fidelity as the regularizing condition. Since the ℓ1- rather than the ℓ2-norm is adopted, this method is good at retaining discontinuities in the noisy sequence [12], but it also suffers from a staircase effect in the smoothed one [13].
By contrast, curve fitting of a pre-defined signal model by least-squares regression is a typical data de-noising example in the parametric category. The fitting can treat the noisy sequence as a whole or in partitions, where the latter gives rise to locally weighted regression [14] and the Savitzky-Golay filter [15]. If additionally a noise model is defined, data de-noising can be achieved in a more sophisticated manner. Examples are the Wiener filter [16] and the Kalman filter [17], which are both rooted in statistical inference and estimation.
In the work presented here, we propose a non-parametric one-dimensional data de-noising method based on Singular-Value Decomposition (SVD). The SVD is a well-developed matrix factorization technique [18], which can be used to realize the idea of principal component analysis [19]. Its applications are prevalent in digital signal processing, especially in two-dimensional image-related works such as coding, watermarking, and de-noising [20][21][22][23]. With the aid of a bijective map from a sequence to a matrix, the SVD also plays a role in one-dimensional data analysis, such as speech recognition [24], fetal electrocardiogram extraction [25], mass spectrometry [26], and fault diagnosis in mechanical systems [27][28][29][30]. In particular for data de-noising, a noisy sequence is decomposed by SVD, and a smoothed sequence is reconstructed from a selected subset of the components.

II. METHOD
Let us consider a noise-corrupted sequence x of length N, which consists of the signal s and the noise n:

x_i = s_i + n_i, 0 ≤ i < N. (1)

Here, we have made no assumptions about the noise other than being additive, zero-mean, and white. The proposed data de-noising method mainly includes three steps: (1) Embed the sequence into a matrix. (2) Apply SVD to the matrix and determine the signal components. (3) Reconstruct a smoothed sequence from the selected components. In the following, each step will be elaborated.

A. Matrix selection
In the literature there are mainly three approaches to embed the sequence x into a matrix. Probably the simplest one is to evenly partition x into segments, which are then stacked to form a matrix [25,27,31]. This approach is mostly suitable for a quasi-periodic sequence, where each segment roughly contains one or multiple periods. Another approach is to build a time-frequency representation of x [32], which can be computationally expensive. As a matter of fact, the most frequently adopted approach is to construct a Hankel matrix [24, 26, 28-30, 33, 34], which balances algorithmic performance and computational cost.
Specifically, the constructed Hankel matrix H of size M × (N − M + 1) reads

H_{ij} = x_{i+j}, 0 ≤ i < M, 0 ≤ j ≤ N − M, (2)

i.e., every anti-diagonal of H is constant. Apparently, the choice of the row number M will somehow influence the smoothing effect. It has been shown that M should best be N/2 or (N + 1)/2, depending on the parity of N [29]. That is to say, H should optimally be (almost) square.
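The Hankel embedding of Eq. (2) can be sketched in a few lines of NumPy (an illustrative helper whose name is ours, not taken from the published implementation):

```python
import numpy as np

def hankel_embed(x, M):
    """Map a length-N sequence to the M x (N - M + 1) Hankel matrix of
    Eq. (2), whose anti-diagonals are constant: H[i, j] = x[i + j]."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # row i is the sliding window x[i : i + N - M + 1]
    return np.stack([x[i : i + N - M + 1] for i in range(M)])
```

For x = (0, 1, 2, 3, 4, 5) and M = 3 this yields a 3 × 4 matrix in which the central elements repeat more often than the lateral ones — precisely the drawback discussed next.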
However, a drawback of the mapping in Eq. (2), which is often overlooked, is that the elements of x are not put on an equal footing, i.e., the central elements occur more times in H than the lateral ones. Numerical experiments have shown that, by construction of H, the signal details close to the sequence ends can be badly recovered after de-noising. Fig. 1 illustrates such a phenomenon with a synthetic sequence, which is the superposition of a sinc signal and normal noise. The sequence of length N = 1000 is embedded into H with a row number M = 500. It is evident that the de-noised sequence (green line in Fig. 1) strongly deviates from the ground truth (orange line in Fig. 1) at both ends, where the largest deviation amounts to 10.5%. What is worse, the deviation is so large that the de-noised sequence is notably off the noisy one, which indicates the inability to respect lateral information when de-noising with H.
To overcome this limitation, we thus propose to construct a partial circulant matrix C of size M × N with M ≤ N:

C_{ij} = x_{(i+j) mod N}, 0 ≤ i < M, 0 ≤ j < N. (3)

Every row of C is a cyclic left-shift of the one above it by one element, such that every x_i occurs exactly M times. By virtue of C with the same row number M = 500, the de-noised sequence, shown with the blue line in Fig. 1, resembles the ground truth much more closely, and the overall deviation is within 2.3%. Moreover, the de-noised sequence tends to return to the ground truth at both ends, in contrast to the previous tendency resulting from de-noising with H.
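The partial circulant embedding of Eq. (3) is equally compact in NumPy (again an illustrative sketch; the helper name is ours):

```python
import numpy as np

def circulant_embed(x, M):
    """Map a length-N sequence to the M x N partial circulant matrix of
    Eq. (3): C[i, j] = x[(i + j) mod N], i.e., row i is x cyclically
    left-shifted by i elements, so every element occurs exactly M times."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    idx = (np.arange(N)[None, :] + np.arange(M)[:, None]) % N
    return x[idx]
```

The fancy-indexing construction avoids an explicit Python loop, so building C costs a single O(MN) gather.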

B. Low-rank approximation
For the matrix C defined in Eq. (3), the Eckart-Young theorem asserts that it can be decomposed into (at most) M rank-one matrices [35]:

C = Σ_{i=1}^{M} σ_i u_i v_iᵀ, (4)

where the singular values σ_i are non-negative and ordered non-increasingly; the left singular vectors u_i ∈ R^{M×1} are orthonormal; and similarly for the right singular vectors v_i ∈ R^{N×1}. When expressed in matrix notation, Eq. (4) leads to the canonical form of the SVD:

C = U Σ Vᵀ, (5)

where Σ is an M × M diagonal matrix composed of the σ_i as its non-zero elements; U is an M × M matrix composed of the u_i as column vectors, while V is an N × M matrix composed of the v_i.
The physical interpretation of the σ_i² is that each describes the amount of energy distributed in the corresponding component. Specifically, the total energy of x is proportional to the squared Frobenius norm of C:

‖C‖_F² = M Σ_{i=0}^{N−1} x_i². (6)

Meanwhile, by Eq. (5), ‖C‖_F² can also be written as

‖C‖_F² = Σ_{i=1}^{M} σ_i². (7)

Since the σ_i are non-negative and non-increasing, by selecting the k largest singular values and setting the remaining ones to zero, the resultant matrix is the best rank-k approximation of C in terms of minimizing the approximation error energy [35]. This sets the theoretical basis for the data de-noising method proposed here. In practice, most of the singular values are close to zero [see Fig. 2(l-a) for illustration], which can be attributed to the noise. Therefore, C can be well approximated with a few predominant signal components only.
That said, one question remains to be answered: which index k separates the signal components from the noise ones? Despite various solutions in the literature, it is certainly impossible to agree on a universal approach that fits every realistic situation and fulfills every practical requirement.
Nevertheless, a majority of the methods choose to look into the intrinsic structure of the singular values, or variants thereof like the singular entropy [29,36], to search for the "elbow point" in Fig. 2(l-a). To name a few, one can inspect the first-order difference [29,32], the first-order quotient [27,30], or simply select the intersection of two extrapolated lines by eye [31]. All these methods depend on extra deciding parameters, which are usually specified ad hoc and vary from case to case. Consequently, it is rather difficult to implement them in an automated manner to de-noise a large batch of sequences. What is worse, the smoothed sequence is prone to subjective bias, and thus may vary from person to person.
A more advanced method seeks to approximate C with a low-rank matrix regularized by its nuclear norm [23,37]. To select an appropriate regularization parameter, this method relies on various unbiased risk estimators according to prior knowledge of the noise statistics. Other sophisticated methods based on neural networks and dictionary learning require considerable training data, and often involve many iterations to solve non-linear minimization problems [25,33]. Apparently, those methods are computationally rather intensive.
Here, we propose to tackle this challenge by inspecting the left singular vectors. As can be seen in Fig. 2(l-c), the first three, belonging to the signal, are quite smooth, whereas the noise vectors are very wiggly. Moreover, there exists a drastic behavioral change at the critical index, which separates the signal from the noise. This observation is further confirmed by Figs. 2(c-c) and 2(r-c), where the signal and the noise are inspected individually. To quantify this impression, a quantity named Normalized Mean Total Variation (NMTV) is defined. For a given sequence y of length L, the NMTV is calculated as

V = [1/(L − 1)] Σ_{i=1}^{L−1} |y_i − y_{i−1}| / (y_max − y_min). (8)

Obviously, V is between zero and one, where zero means no variation and one corresponds to a zigzag pattern alternating between the extremes. A larger NMTV indicates a more oscillatory sequence; hence the NMTV is able to describe the smoothness of the sequence. Since the noise is usually wiggly, its SVD components can be discriminated by thresholding on the NMTVs of the u_i. Numerical experiments have revealed that a universal threshold of 0.1 should be suitable in most, if not all, cases.
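The NMTV and the thresholding rule can be sketched as follows (assuming NumPy; the helper names and the stop-at-first-failure convention are ours):

```python
import numpy as np

def nmtv(y):
    """Normalized Mean Total Variation: mean absolute first difference
    divided by the sequence's range; 0 for a flat sequence, 1 for a
    zigzag alternating between the extremes."""
    y = np.asarray(y, dtype=float)
    rng = y.max() - y.min()
    if rng == 0.0:
        return 0.0
    return np.abs(np.diff(y)).mean() / rng

def select_rank(U, threshold=0.1):
    """Count the leading left singular vectors (columns of U, ordered by
    singular value) whose NMTV stays below the threshold; those are
    taken as the signal components."""
    k = 0
    for i in range(U.shape[1]):
        if nmtv(U[:, i]) < threshold:
            k += 1
        else:
            break
    return k
```

Since the components are ordered by decreasing singular value, stopping at the first wiggly vector implements the "drastic behavioral change at the critical index" described above.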

C. Signal reconstruction
Having selected the first k SVD components, the low-rank approximation of C is

C_k = Σ_{i=1}^{k} σ_i u_i v_iᵀ. (9)

In general, C_k is no longer partially circulant. Consequently, cyclic anti-diagonal averaging is applied to reconstruct the smoothed sequence x̂ from C_k:

x̂_j = (1/M) Σ_{i=0}^{M−1} [C_k]_{i, (j−i) mod N}, 0 ≤ j < N. (10)
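The cyclic anti-diagonal averaging amounts to undoing each row's cyclic shift and averaging over the rows; a NumPy sketch (the helper name is ours):

```python
import numpy as np

def cyclic_antidiagonal_average(Ck):
    """Invert the partial circulant embedding: element (i, j) of the
    rank-k approximant estimates x[(i + j) mod N], so the M estimates of
    every position are averaged."""
    M, N = Ck.shape
    xhat = np.zeros(N)
    for i in range(M):
        # undo the cyclic left-shift of row i before accumulating
        xhat += np.roll(Ck[i], i)
    return xhat / M
```

Applied to an exact partial circulant matrix, the averaging recovers the original sequence identically, which makes the embedding and its inverse a lossless round trip.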

D. Practical considerations
So far, the choice of M has not been addressed. It is nevertheless worth noting that when M = N, i.e., C is completely circulant, the method proposed here is equivalent to Fourier thresholding [7]. Since any circulant matrix is diagonalizable by the unitary discrete Fourier transform matrix, its singular values obtained by SVD are in fact the moduli of the Fourier coefficients of x [38]. However, selecting a larger M does not necessarily result in a better smoothing effect, as the signal energy is diluted over excessive singular values. A relatively small M is not much help either, due to an insufficient number of elements for the cyclic anti-diagonal averaging. Although in practice the choice of M may be subject to the specific purpose of the de-noising problem, one can generally decide on an optimal M which minimizes the NMTV of the de-noised sequence to attain the smoothest result. Numerical experiments have suggested an empirical rule to achieve an optimal de-noising effect: 0.1N < M < 0.4N.
Sometimes, a ringing artifact is noticeable in the smoothed data, in particular when the signal has a significant gap between its two ends (Fig. 3). This is owing to the same reason as the Gibbs phenomenon in Fourier transform [39]. As a cure for the artifact, a linear trend, if present, in the signal should be removed to close the gap before de-noising. Whether the de-trending is practically necessary is judged by a comparison between the signal gap Δ and the noise standard deviation σ.
Unfortunately, neither of them is known, and hence both must be estimated from x. Despite the wiggles at both ends of x, the respective signal end-values can be recovered to some extent by averaging a few neighboring data, based on which Δ is estimated. Moreover, σ can be estimated from the noise singular values. By recalling Eqs. (1), (6), and (7), it can be written that

M Σ_{i=0}^{N−1} (s_i + n_i)² = Σ_{i=1}^{M} σ_i², (11)

where, due to the cancellation among the zero-mean n_i, the summation of the cross terms on the left side contributes rather little. Therefore, σ can be approximated from the noise components as

σ ≈ [Σ_{i=k+1}^{M} σ_i² / (MN)]^{1/2}. (12)

Should σ turn out to be less than Δ after de-noising, a linear trend is significantly present and must be removed. However, it may sometimes happen that for the de-trended yet noisy sequence, σ is still less than Δ, which suggests that the estimate of the latter by averaging is strongly biased by the signal itself. Under such circumstances, it is advisable to reduce the number of elements for averaging and redo the de-trending. This adaptive de-trending process will surely terminate, in the worst case when the averaging number becomes one.
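The estimate of Eq. (12) is a one-liner once the singular values are at hand; a sketch assuming NumPy (the function name is ours):

```python
import numpy as np

def estimate_noise_std(s_vals, k, M, N):
    """Eq. (12): the discarded singular values carry (almost) pure noise
    energy, so M * N * sigma^2 is approximated by the sum of their
    squares."""
    tail = np.asarray(s_vals, dtype=float)[k:]
    return np.sqrt(np.sum(tail ** 2) / (M * N))
```

For a pure-noise sequence (k = 0) the estimate reduces exactly to the root-mean-square of the sequence, since the total singular-value energy equals the Frobenius norm of the embedding.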
In summary, the algorithm of the proposed data de-noising method is organized as follows.
1. Begin with a noisy sequence x and a row number M.

2. Construct a partial circulant matrix C by Eq. (3), then apply SVD to it.

3. Determine the number k of signal components by thresholding on the NMTVs of the left singular vectors.

4. Estimate the signal gap Δ by averaging a few end-data in x, as well as the noise standard deviation σ by Eq. (12).

5. If σ < Δ, remove the linear trend from x and restart from step 2; otherwise, reconstruct the smoothed sequence x̂ from the first k components by Eqs. (9) and (10).

III. EXAMPLES
To illustrate the smoothing effect of the proposed method, four synthetic signals of the same length N = 1000 are adopted for the test. They are intentionally selected to embody different data traits that are frequently seen in practice. Specifically, their synthetic formulas are listed below, with 0 ≤ i < 1000.
where erfc(·) denotes the complementary error function.

multiple peaks on a ramp:
s_i = e^{−(0.02i − 6)²} + [2 + (0.04i − 28)²]^{−1} + e^{−0.001i}, (15)

which contains a Gaussian peak and a Lorentzian peak.

chaotic dynamics: It is well known that a Duffing oscillator will undergo chaos under certain conditions [40]. Here, we adopt a parametrization of the Duffing equation of the form

ÿ + δẏ + αy + βy³ = γ cos(ωt), (16)

where y denotes the oscillation amplitude, the dotted versions denote its derivatives with respect to the time t, and the coefficients are chosen such that the motion is chaotic. With the initial values (y, ẏ)|₀ = (1, 0), the differential equation is numerically solved by the fourth-order Runge-Kutta method in the interval 0 ≤ t < 50 with a resolution of 0.05.
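A chaotic Duffing trace of the stated length can be generated with a classical fourth-order Runge-Kutta stepper. The coefficients below (δ = 0.3, γ = 0.5, ω = 1.2 with a double-well restoring force) are illustrative stand-ins known to produce chaos; the paper's actual parameter values are not reproduced here:

```python
import numpy as np

def duffing_rk4(delta=0.3, gamma=0.5, omega=1.2, h=0.05, n=1000):
    """Integrate y'' + delta*y' - y + y^3 = gamma*cos(omega*t) by the
    classical 4th-order Runge-Kutta method from (y, y')(0) = (1, 0),
    returning n samples of y at spacing h (here 0 <= t < 50)."""
    def f(t, u):
        y, v = u
        return np.array([v, -delta * v + y - y ** 3
                         + gamma * np.cos(omega * t)])
    u = np.array([1.0, 0.0])
    out = np.empty(n)
    for i in range(n):
        out[i] = u[0]
        t = i * h
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h / 2 * k1)
        k3 = f(t + h / 2, u + h / 2 * k2)
        k4 = f(t + h, u + h * k3)
        u = u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return out
```

The 1000-sample output then serves as the ground-truth signal to be contaminated with noise.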
The synthetic signals are then contaminated by adding, respectively, two types of noise, namely uniform and normal, to produce noisy sequences. The amount of contamination is controlled by the Signal-to-Noise Ratio (SNR), ranging from 0 to 20 dB, which is defined as

R (dB) = 10 log₁₀ (⟨s_i²⟩ / ⟨n_i²⟩), (17)

where ⟨·⟩ denotes the arithmetic mean. It is clear that a smaller R will lead to a more wiggly sequence. A fixed row number M = 250 is selected for all the smoothing tests. The de-noising goodness is assessed by the Normalized Root-Mean-Square Deviation (NRMSD) of the smoothed sequence x̂ from the signal s, which is defined as

D = ⟨(x̂_i − s_i)²⟩^{1/2} / (s_max − s_min). (18)

A smaller D indicates a greater power to reveal the ground truth.
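The test harness reduces to two small utilities, sketched here with NumPy under our own names and with the NRMSD normalized by the signal's range, as in the definition above:

```python
import numpy as np

def add_noise(s, snr_db, rng=None):
    """Contaminate signal s with white normal noise scaled so that
    10*log10(mean(s^2) / mean(n^2)) equals the target SNR in dB."""
    if rng is None:
        rng = np.random.default_rng()
    n = rng.normal(size=len(s))
    n *= np.sqrt(np.mean(np.square(s)) / np.mean(np.square(n))
                 / 10 ** (snr_db / 10))
    return s + n

def nrmsd(xhat, s):
    """RMS deviation of the smoothed sequence from the ground truth,
    normalized by the signal's range."""
    return np.sqrt(np.mean((xhat - s) ** 2)) / (np.max(s) - np.min(s))
```

Sweeping `snr_db` from 0 to 20 and tabulating `nrmsd` for each noise type reproduces the structure of the comparison in Figs. 4-7.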
The test results are shown in Figs. 4-7. It can be seen that the proposed method is well able to smooth the noisy sequences, regardless of the underlying signals, even when the SNR is as low as 0 dB. The achieved NRMSDs are quite similar for both types of noise, although the normal noise usually results in a slightly larger NRMSD, which may be owing to its larger kurtosis.

IV. APPLICATIONS
Owing to its non-parametric nature, the method presented here promises to find applications in a vast range of data de-noising scenarios. Furthermore, by allowing for a manual selection of the approximation rank, the proposed method can serve as a powerful data analysis technique beyond de-noising. In what follows, three data analysis challenges in precision storage-ring mass spectrometry are presented to showcase its diverse applications in reality [41].

A. Pulse timing
In isochronous mass spectrometry [42], a relativistic charged particle circulates periodically in a vacuum environment due to the confinement by magnetic fields [43]. A time-of-flight detector consisting of an ultra-thin carbon foil is placed in the vacuum chamber to register every passage of the particle, which is signaled by a sharp voltage pulse [44]. Fig. 8 shows a zoomed example of such a pulse. A precise determination of the pulse onset is critically important for a high-precision (< 1 ppm) mass measurement [45][46][47]. Unfortunately, due to thermal noise and quantization error, in particular for weak pulses, the pulse leading edge is distorted, which limits the timing accuracy. Furthermore, the voltage uncertainty in the pulse samples also degrades the timing precision, since the leading edge can only have a non-zero fall time, which is the duration for the pulse to dive from 90% to 10% of its full depth. In general, a smaller fall time results in a better timing precision. The timing accuracy, in turn, can be improved by employing a competent data de-noising technique. With the same pulse of length N = 401 in Fig. 8, we have compared four de-noising methods of the non-parametric type, namely the one proposed here, moving average, wavelet shrinkage, and total variation. The NMTVs of their de-noised data, shown in Table I, are controlled to be almost the same to ensure comparable smoothness. Afterwards, the fall time is measured for each smoothed sequence, except for the total variation, since the staircase effect precludes a clear definition of the falling interval. In what follows, the technical implementation details are presented. For the proposed method, a row number M = 105 is determined. As for the moving average, a Gaussian kernel of length 45 and standard deviation 9 is employed:

w_j ∝ e^{−j²/(2·9²)}, −22 ≤ j ≤ 22.

When applying the wavelet shrinkage, the Symlets-4 wavelet is adopted to perform the wavelet decomposition.
Afterwards, the wavelet coefficients are thresholded at 3.406 using the non-negative garrote scheme [48,49]. The wavelet transform is carried out in Python with the PyWavelets package [50]. Lastly, the total variation de-noising is based on the algorithm for one-dimensional data in Ref. [51], and is computed with the C code available in Ref. [52]; its regularization parameter is selected to be 2.89. The de-noised results of these four methods are shown in Fig. 8, and the obtained fall times are listed in Table I. It is found that the proposed method produces the smallest fall time, which can in principle lead to the best timing precision. In contrast, the wide kernel of the moving average not only leaves apparent vacancies in the de-noised data at both ends, as pointed out by the arrows in Fig. 8(b), but also smears out the leading edge. The wavelet shrinkage superfluously causes a leading bump and trailing spikes, as pointed out by the arrows in Fig. 8(c), where the former also degrades the fall time. As for the total variation, the disturbing staircase effect, as shown in Fig. 8(d), makes it improper for accurate and precise pulse timing. It can therefore be concluded that by means of the proposed method, the timing of weak pulses becomes feasible with satisfactory accuracy and precision, thus relaxing the saturation constraint of the detector [53].
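The paper does not spell out its exact timing algorithm, but a plausible fall-time measurement for a negative-going pulse, using linear interpolation between samples, can be sketched as follows (a hypothetical helper with a flat pre-trigger baseline assumed):

```python
import numpy as np

def fall_time(v, dt=1.0):
    """Leading-edge fall time of a negative-going pulse: the time spent
    between 90% and 10% of the full pulse depth, located by linear
    interpolation on the samples preceding the pulse minimum."""
    v = np.asarray(v, dtype=float)
    base = np.median(v[: max(4, len(v) // 10)])  # assumed flat baseline
    depth = base - v.min()
    lo, hi = base - 0.9 * depth, base - 0.1 * depth  # 90% / 10% levels
    edge = v[: v.argmin() + 1]
    def cross(level):
        # first sample index where the edge dives below `level`
        i = np.argmax(edge < level)
        # linear interpolation between samples i-1 and i
        return i - 1 + (edge[i - 1] - level) / (edge[i - 1] - edge[i])
    return (cross(lo) - cross(hi)) * dt
```

Run on a smoothed pulse, such a helper makes the Table I comparison reproducible: the steeper the recovered leading edge, the smaller the returned fall time.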

B. Pre-whitening
In Schottky mass spectrometry [54], a dedicated resonator is employed to detect a revolving particle in the storage ring by sensing its electromagnetic radiation [55]. Owing to the periodic motion of the particle, the power spectral density of its Schottky signal detected by the resonator peaks at each harmonic of the particle's revolution frequency [56]. Among those harmonics, the one in the resonant range of the resonator is band-pass filtered and mixed down to the base band for data acquisition [57]. Fig. 9 shows an example of a peak harmonic on top of a resonant background. In certain cases, the integral peak area is of experimental interest, from which the particle number can be inferred [56,58]. This is a prior step for, e.g., the measurement of the particle's lifetime [59], or the calibration of the particle's position [60]. Therefore, a proper subtraction of the background is of experimental importance.
Although the Lorentzian shape of the background is well understood [61], it is improper to directly fit the spectrum, as the peak will strongly bias the result (see blue line in Fig. 9). In contrast, the method proposed here, with a few adaptations, can elegantly eliminate the peak interference. First recall that the Fourier coefficients of any noise are zero-mean circular complex Gaussian [62]. Consequently, the power spectral density of the noise, which is the squared modulus of the coefficients, obeys a chi-squared distribution with 2 degrees of freedom. Its probability density function is in fact a negative exponential, which happens to be the same distribution as that of speckle noise [63]. Since speckle noise is usually modeled as multiplicative [64], it is analogous to conclude that the noise in the Schottky spectrum is also multiplicative.
Therefore, a logarithmic transform should first be applied to the spectrum so as to use the additive model in Eq. (1). The transformed spectrum of length N = 1639 is then embedded into a partial circulant matrix of M = 290 rows. This time only the first two components are taken to reconstruct the background. As can be seen in Fig. 9, the resultant approximation (red line) is in excellent agreement with the reference (green line), which is obtained by fitting a "bald" spectrum recorded when the particles are purged from the storage ring.
The proposed method hence opens up the possibility to subtract the resonant background in situ. It is no longer necessary to interrupt an ongoing experiment just for the reference measurements, which increases the effective on-target beam time. This is in particular useful in case the yields of the aimed particles are extremely low.

[Fig. 10: Weak peak detection in a Schottky spectrum by the proposed method. (a) The red line is the de-noised spectrum; the green line, obtained by averaging 100 similar spectra, provides a reference. All local maxima in the red spectrum are marked with blue crosses, which also indicate their prominences (vertical extent) and widths (horizontal extent). The detected peaks are labelled, in descending order of PWR, as 1, 2, and 3. (b) The PWRs of all eight peak candidates, among which the largest three are considered real peaks associated with particles.]

C. Peak detection
Another common challenge in Schottky mass spectrometry is the detection of weak peaks. Fig. 10(a) shows an example of such a case. Apart from the predominant peak in the center, it is impossible to unambiguously detect other peaks in the noisy spectrum. A common solution is to continuously collect a number of spectra and average them to improve the SNR. For example, the green line in Fig. 10(a) is obtained from 100-fold averaging, which can barely reveal two weaker peaks on the left side. An obvious drawback of this approach is the degradation of the time resolution, which, in this case, is 100 times worse. It is thus unfavorable for time-resolved Schottky mass spectrometry [65], and in particular for single-particle decay spectroscopy [66]. Even worse, sometimes it is unfeasible to collect sufficient spectra, because, e.g., the aimed particle may have already decayed [67].
By using the proposed method, the noisy spectrum of length N = 1639 is first log-transformed and then embedded into a partial circulant matrix of M = 430 rows. A smoothed spectrum is reconstructed from the first 7 SVD components, which is shown as the red line in Fig. 10(a). Peaks are searched for among the local maxima of the smoothed spectrum as candidates. For each candidate, its prominence and width are subsequently measured, where the prominence is the relative height from the summit to the highest adjacent valley, and the width is the horizontal span at half prominence [see blue crosses in Fig. 10(a) for illustration]. Afterwards, the Prominence-to-Width Ratio (PWR), shown in Fig. 10(b), is calculated for each candidate to characterize its significance. Based on this parameter, in total three peaks are decisively identified with the largest PWRs and labelled 1, 2, and 3 in Fig. 10(a). The remaining candidates are considered as mere bumps due to their insignificant PWRs. It is clear in Fig. 10(a) that the detected peaks align perfectly with those in the averaged spectrum, moreover with a 100-fold improvement in time resolving power.
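The text does not state which peak-finding implementation was used; SciPy's `find_peaks`, however, provides the same prominence and half-prominence width definitions, so the PWR ranking can be sketched as follows (the function name and candidate cut are ours):

```python
import numpy as np
from scipy.signal import find_peaks

def detect_peaks(spectrum, n_peaks=3):
    """Rank the local maxima of a smoothed spectrum by their
    Prominence-to-Width Ratio (PWR) and return the n_peaks strongest
    candidates, mirroring the selection criterion described above."""
    # width is measured at half prominence (rel_height=0.5, the default)
    peaks, props = find_peaks(spectrum, prominence=0, width=0.5)
    pwr = props["prominences"] / props["widths"]
    order = np.argsort(pwr)[::-1]
    return peaks[order[:n_peaks]], pwr[order[:n_peaks]]
```

On a spectrum with several well-separated peaks of equal width, the PWR ordering simply follows the peak heights, which matches the labelling convention of Fig. 10(a).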
Once the locations of the weak peaks are coarsely determined from the smoothed spectrum, they can subsequently be passed to a dedicated peak-finding algorithm as the required prior knowledge to refine the result [68]. What is more, the proposed method can be deployed in a real-time task to fast-track particles in a dynamic process such as orbital electron capture [69] or bound-state β⁻ decay [70] on an event-by-event basis.

V. CONCLUSION
An SVD-based one-dimensional data de-noising scheme has been proposed. Owing to its non-parametric nature, it works for any additive noise, as well as for multiplicative noise after a logarithmic transform. By the construction of a partial circulant matrix, the proposed method well balances computational simplicity and smoothing efficiency. The proposed NMTV criterion for selecting the optimal approximation rank allows for integration into automated processes, which is a compelling feature for certain applications in reality. With real-world data, the proposed method has proved its strong competitiveness against other non-parametric de-noising methods such as moving average, wavelet shrinkage, and total variation. Furthermore, a few interesting applications of the method beyond de-noising were also addressed, and its competence has been demonstrated with cases in precision storage-ring mass spectrometry. Lastly, a Python implementation of the proposed method is provided and placed in the public domain [71].