Open EDFA gain spectrum dataset and its applications in data-driven EDFA gain modeling

Optical networks satisfy high bandwidth and low latency requirements for telecommunication networks and data center interconnection. To improve network resource utilization, machine learning (ML) is used to accurately model optical amplifiers such as erbium-doped fiber amplifiers (EDFAs), which impact end-to-end system performance such as quality of transmission. However, a comprehensive measurement dataset is required for ML to accurately predict an EDFA's wavelength-dependent gain. We present an open dataset consisting of 202,752 gain spectrum measurements collected from 16 commercial-grade reconfigurable optical add–drop multiplexer (ROADM) booster and pre-amplifier EDFAs under varying gain settings and diverse channel-loading configurations over 2,785 hours in total, with a total dataset size of 3.1 GB. With this EDFA dataset, we implement component-level deep-neural-network-based EDFA models and use transfer learning (TL) to transfer the EDFA model among the 16 ROADM EDFAs, achieving less than 0.18/0.24 dB mean absolute error for booster/pre-amplifier gain prediction using only 0.5% of the full target training set. We also show that TL reduces the EDFA data collection requirements for a new gain setting or a different type of EDFA on the same ROADM.


INTRODUCTION
Telecommunication networks and cloud infrastructure rely on amplified optical networks to deliver high data rates over metro and long-haul distances. Reconfigurable optical add-drop multiplexers (ROADMs) are used to add and drop signals within such networks, and erbium-doped fiber amplifiers (EDFAs), sometimes in tandem with Raman amplifiers, are used to overcome node and link losses. The EDFA output power is typically the main determinant of the signal launch power, and the EDFA noise figure sets the accumulated amplifier noise levels, which impact end-to-end system performance metrics such as the optical signal-to-noise ratio (OSNR) and other quality of transmission (QoT) measures [1]. However, characterizing the gain spectrum of an EDFA is challenging as it depends on many factors such as the internal hardware architecture, gain setting, channel-loading configuration, and input power levels. For these reasons, vendors are motivated to treat the wavelength-dependent gain of EDFAs as a variable quantity accounted for through margin allocations. Therefore, better characterization of amplifier gain is of interest to achieving low-margin systems.
Recent work has focused on developing accurate models for the wavelength-dependent gain profiles of optical amplifiers like Raman amplifiers [2,3] and EDFAs [4,5], which can be further used for effective prediction of the optical power spectrum evolution [6] and QoT estimation [7,8]. It has been shown that machine learning (ML) models, such as those based on deep neural networks (DNNs), can achieve prediction accuracy primarily limited by the measurement resolution if the model is trained on large EDFA gain spectrum measurement datasets. However, such prior work is built on datasets collected from very few EDFAs and usually only considers a limited set of channel-loading configurations and/or input power levels. Moreover, these datasets are not publicly available, which makes it challenging to compare different EDFA models using the same baseline, as the measurement resolutions and methods may differ from one experiment to the next.
Although the DNN-based EDFA gain model can achieve high gain spectrum prediction accuracy, it requires collecting a comprehensive set of EDFA gain spectrum measurements for each EDFA. For example, collecting such a dataset for a single EDFA covering different gain settings and diverse channel-loading configurations can consume up to 51 h [9]. A promising solution to overcome this challenge is to apply transfer learning (TL) [10,11], which is an ML technique that allows for building a new target model based on a pre-trained source model that shares similar model knowledge using very few data samples collected from the target domain.
In this paper, we make two key contributions aiming to address these challenges. First, we present an open dataset of the gain spectrum measurements including 16 EDFAs within 8 commercial-grade Lumentum ROADM-20 units deployed in the city-scale PAWR COSMOS testbed [12]. We consider EDFAs within the ROADM units targeting metro networks, which typically consist of more ROADM amplifiers than inline amplifiers compared to long-haul networks. The dataset includes measurements collected from eight booster EDFAs, each with three gain settings, and eight pre-amplifier EDFAs, each with five gain settings. For each EDFA at a given gain setting, 3,168 gain spectrum measurements are collected with a set of diverse channel-loading configurations and varying input power levels. Importantly, all data is collected using the built-in photodiodes (PDs) and optical channel monitors (OCMs) of each ROADM unit without relying on any external measurement equipment, thereby providing the equivalent of in situ characterization results. The comprehensive 202,752 EDFA gain spectrum measurement dataset collected from 16 EDFAs over 2,785 hours is shared publicly and available at [9]. We believe that this dataset can serve as an open resource for researchers to evaluate and compare different ML-based EDFA models. It can also be potentially integrated with emulators such as mininet-optical [13] and planning tools such as GNPy [1].
Second, we investigate the use of TL-based EDFA gain models and show that, using only 0.5% of the new data collected from the target EDFA (13 measurements), the transferred target model can achieve similar gain prediction accuracy compared to the source model with the full training set (2,732 measurements). We demonstrate three different scenarios that can benefit from TL with a largely reduced EDFA data collection process: (i) TL between EDFAs of the same type, (ii) TL between different EDFA gain settings, and (iii) TL between different EDFA types. For TL between EDFAs of the same type, we achieve an average median absolute error of 0.08 dB for booster amplifiers and 0.10 dB for pre-amplifiers. For TL between different EDFA gain settings, a mean absolute error (MAE) of 0.16 dB is achieved, averaged over two gain settings transferred to and tested on another gain setting. For TL between different EDFA types, an MAE of 0.16 dB is achieved. Based on these evaluation results, TL-based EDFA gain models can reduce the data measurement time by roughly 200× without sacrificing model prediction accuracy.
The rest of the paper is organized as follows. We review related work in Section 2. We present the EDFA gain spectrum measurement setup and analysis of the collected dataset in Sections 3 and 4. Using the collected dataset, we present the DNN- and TL-based EDFA gain spectrum models in Sections 5 and 6, and conclude in Section 7.

RELATED WORK

A. Traditional EDFA Models
EDFAs in terrestrial wavelength-division multiplexed (WDM) systems with channel add-drop multiplexing use automatic gain control (AGC) to maintain a target gain, which controls the total power gain rather than the gain of individual channels. For example, if the target gain setting is 18 dB, the actual gain spectrum can have a fluctuation of ±0.5 dB across different wavelength channels. Moreover, the channel gain deviates from the target gain under different channel loading, input power level, and gain settings, which can be characterized by a physical model given by Eq. (1) [14], where $G_{TC}$ and $G_M$ are the target gain and mean gain, respectively. $g_m(\lambda_i)$ is the original channel gain in the $i$th wavelength channel at $\lambda_i$, before the new input power $P_j$, the corresponding residual ripple $g_j$, and the tilt $t_j$ are applied to the $j$th wavelength channel at $\lambda_j$. The noise includes five different factors: the total input noise $N_I$, total amplifier input-referred noise $N_R$, amplifier AGC noise compensation factor $N_C$, average incident noise gain ripple $g_I$, and input-referred noise gain ripple $g_R$. However, fully characterizing such factors is challenging due to many practical reasons.
In practice, the gain variations described in Eq. (1) follow a center of mass weighting of the channel powers by their wavelength-dependent gain functions. Based on this, another well-known model is the center of mass (CM) model, which uses simple measurements to predict the EDFA gain spectrum and, for equal channel powers, is given by Eq. (2), where $g_{\text{wdm}}(\lambda_i)$ and $g_{\text{single}}(\lambda_j)$ are the gain of the $i$th wavelength channel under WDM and single-channel-loading configurations, respectively. Equation (2) is usually accurate for the two extreme cases of one channel and all channels turned on, and approximates the gain spectral behavior for other loadings, which can vary significantly for complex multi-stage amplifiers and due to effects such as spectral hole burning [15].

B. ML-Based EDFA Models
Recent research has also focused on using ML to better characterize the wavelength-dependent gain spectrum of EDFAs. In particular, a DNN-based EDFA gain model was proposed in [4], where individual sub-models are used to predict the EDFA output power for random channel configurations under one single gain and one tilt setting. The measurements were collected using built-in OCMs and PDs, making it the first example of an in situ monitor-based model. Another DNN-based EDFA model that predicts the partial-fill EDFA gain profile with high accuracy was proposed in [5]; it was trained using a dataset consisting of 50,000 measurements taken with a high-resolution optical spectrum analyzer (OSA). The prediction was related to the WDM measurements, as the output of the model was the power difference between fully loaded (WDM) and partially loaded (arbitrary) channel power. You et al. [16] consider optical signal-to-noise ratio (OSNR) prediction using EDFA models that use two different models to predict gain profile and noise figure separately, with an additional OSA for data collection.
Although individual EDFA models have been well investigated, there are still limitations regarding the use of the ML-based approach. First, training ML-based EDFA models requires large amounts of data that are time consuming to collect in real-world scenarios, especially in already deployed networks. Second, an EDFA model only applies to the EDFA it was trained on, and data recollection and model retraining are required for new EDFAs. Zhu et al. [17] proposed a hybrid ML-based EDFA model that combines analytical and ML-based models to reduce the training dataset size and training time. The results showed that the number of training samples was reduced by 46%, or the training time by 20%, when feeding the output of an analytical model into the ML model. This method reduced data collection time but still trained each EDFA model individually. In addition, Da Ros et al. [18] showed that a pre-trained ML-based EDFA model can be directly extended to multiple physical devices of the same make, with low prediction error, by training an EDFA model using measurements collected from multiple EDFAs with a benchtop OSA. However, this approach assumes that EDFA gain profiles of the same make are highly similar, as it uses a single model to predict multiple EDFA gain profiles.
ML-based EDFA models were also integrated with multispan optical transmission systems for QoT prediction. In Yankov et al. [19], a generalized EDFA model trained on separately collected gain spectrum measurements using an OSA is utilized to predict the OSNR across 8 channels in a 3-span link. In Kamel et al. [20], OSNR prediction in a 20-span link with 40 channels using characterized inline EDFAs is demonstrated, without considering the model generalization to different topologies. In Wang et al. [6], individually characterized component-level EDFA models were applied to 5-span ROADM systems with 95 channels and 10 EDFAs, where each model was trained using measurements collected using built-in OCMs and photodiodes of the ROADM units.
Compared to this prior work, our study focuses on creating an EDFA open dataset and component-level EDFA modeling. We collected gain profile measurements on 16 commercial-grade Lumentum EDFA devices with different channel-loading configurations, input power levels, and gain settings. We report the gain profile difference for EDFAs of the same make and show gain profile variations across a long time period. In addition to the EDFA dataset, we also show that measurement time can be largely reduced with TL for EDFAs of the same make. In particular, TL can be used between different EDFA devices of the same type, different gain settings on the same EDFA device, and different EDFA types on the same ROADM. A portion of this paper is an expansion of our recent work [21].

EDFA GAIN SPECTRUM MEASUREMENT SETUP AND DATA COLLECTION
We now describe the EDFA gain spectrum measurement setup using the COSMOS testbed and the data collection pipeline.

A. PAWR COSMOS Testbed
The PAWR COSMOS testbed is a city-scale optical-wireless programmable testbed being deployed in Manhattan, New York City, to support advanced optical and wireless experiments [22]. A more detailed description about COSMOS' programmable optical network and the supported applications can be found in [12]. In particular, the testbed consists of one Calient S320 320 × 320 space switch, one Dicon 16 × 16 space switch, 8 commercial-grade Lumentum ROADM units, one customized comb source, various lengths of fiber spools, and a dark fiber network between Columbia University, the colocation facility at 32 Avenue of the Americas (32 AoA), and the City College of New York (CCNY), some of which is shown in Fig. 1. Using the space switching and WDM switching capabilities, different topologies in the optical physical layer that emulate varying metro networks can be constructed [6,23].

B. EDFA Gain Spectrum Measurement Setup
We characterize the gain spectrum of 16 EDFAs of two types: booster (B) and pre-amplifier (P), as part of eight commercial-grade Lumentum ROADM-20 units. Figure 1 shows a block diagram of the Lumentum ROADM-20 unit and the measurement setup of a device under test (DUT) EDFA. Each ROADM unit consists of one MUX wavelength-selective switch (WSS), one DEMUX WSS, one booster EDFA (at line out), and one pre-amplifier EDFA (at line in). Each ROADM is also equipped with total power and channel power monitoring capabilities using the built-in PDs and OCMs with a power measurement resolution of 0.01 and 0.1 dB, respectively. We use a comb source to generate a set of 95 × 50 GHz WDM channels in the C-band following the ITU DWDM 50 GHz grid specification [24].
Figure 1 shows the booster and pre-amplifier EDFA measurement topology. With a DUT booster EDFA, the output of the comb source is connected to an add port of the MUX WSS, which applies the channel-loading configuration, adjusts the power level in each loaded channel, and generates a flat input power spectrum to the DUT EDFA. The output of the DUT booster is terminated. Similarly, with a DUT pre-amplifier EDFA, the output of the comb source is first connected to the pre-amplifier EDFA and DEMUX WSS of an auxiliary ROADM, whose DEMUX WSS applies the channel-loading configuration, adjusts the power level in each loaded channel, and generates a flat output power spectrum at the input of the DUT pre-amplifier EDFA. The output of the DUT pre-amplifier EDFA is terminated by the following DEMUX WSS. The wavelength-dependent gain spectrum of each EDFA, denoted by $g(\lambda_i)$, can be characterized by its input power spectrum, $S_{\text{in}}(\lambda_i)$, and output power spectrum, $S_{\text{out}}(\lambda_i)$, i.e., $g(\lambda_i) = S_{\text{out}}(\lambda_i) - S_{\text{in}}(\lambda_i)$ for $i = 1, \ldots, 95$ (gain in dB, with both spectra expressed in dBm), with $\lambda_1$ = 1,529.16 nm (196.050 THz) and $\lambda_{95}$ = 1,566.72 nm (191.350 THz).
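As a minimal sketch of the quantities described above, the following Python snippet builds the 95-channel, 50 GHz grid from the stated end frequencies and computes the per-channel gain from input and output power spectra. The function names are hypothetical, and the snippet assumes that the spectra are given in dBm, so that the gain in dB is a simple difference; it is not the authors' processing code.

```python
# Sketch: 50 GHz channel grid and per-channel gain, assuming dBm/dB units.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def channel_frequencies_thz(num_channels=95, first_thz=196.050, spacing_thz=0.050):
    """Centre frequencies from channel 1 (highest frequency) to channel 95."""
    return [first_thz - i * spacing_thz for i in range(num_channels)]

def wavelength_nm(freq_thz):
    """Convert a frequency in THz to a vacuum wavelength in nm (lambda = c / f)."""
    return SPEED_OF_LIGHT_M_PER_S / (freq_thz * 1e12) * 1e9

def gain_spectrum_db(s_in_dbm, s_out_dbm):
    """Per-channel gain in dB; None marks a channel with no power reading."""
    return [
        None if p_in is None or p_out is None else p_out - p_in
        for p_in, p_out in zip(s_in_dbm, s_out_dbm)
    ]

freqs = channel_frequencies_thz()
print(f"{wavelength_nm(freqs[0]):.2f} nm, {wavelength_nm(freqs[-1]):.2f} nm")
# prints "1529.16 nm, 1566.72 nm"
```

The printed end wavelengths agree with the values of the first and last channels quoted in the text.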
We use the Network Configuration Protocol (NETCONF) with Yet Another Next Generation (YANG) data modeling language to control and collect data from each Lumentum ROADM unit. For example, we use the add-connection command to apply the channel-loading configuration, whose input parameters include the MUX/DEMUX WSS module, connection index, start/end frequencies, attenuation, input/output ports, and channel block status. After waiting for a certain amount of time for the optical system to stabilize, we use the monitored-channels and monitored-connections commands to obtain the input and output channel power spectrum measurements. EDFA-related information, such as the gain setting and gain tilt, can be retrieved via the edfas command. The collected EDFA input/output power spectrum, together with other system information, is stored in machine-actionable json files, which we describe in Section 3.D.

C. Channel Loading Configurations
One main challenge associated with the data collection process is the large number of channel-loading configurations, which can affect the wavelength-dependent gain, $g(\lambda_i)$, of each EDFA. With 95 × 50 GHz channels, each of which can be switched ON/OFF, it is infeasible to measure all $2^{95}$ configurations, let alone at all possible input channel power levels. To address these challenges, we carefully design five sets of diverse channel-loading configurations (see Fig. 2) with different numbers of channels $n$:

D. Collected Dataset
We consider a target gain of $g_B \in \{15, 18, 21\}$ dB and $g_P \in \{15, 18, 21, 24, 27\}$ dB for each booster and pre-amplifier EDFA, respectively, in the high-gain mode. We consider 0 dB gain tilt for all EDFAs targeting metro networks, where Raman tilt is less significant and thus 0 dB gain tilt is a simple and low-cost option that can be selected by different service providers and system vendors. For each of the 16 EDFAs at a given gain setting, a total of 3,168 measurements are collected where, for each channel-loading configuration, we also collected repeated measurements with varying EDFA input power levels for comprehensiveness, as summarized in Table 1. In particular, each measurement is stored in the machine-actionable json COSMOS EDFA format (see Listing 1 for the structure of the captured measurement data), which includes (i) the input and output power spectrum of the EDFA measured by the OCM, $S_{\text{in}}(\lambda_i)$ and $S_{\text{out}}(\lambda_i)$, from which $g(\lambda_i)$ can be derived; (ii) the total input and output power of the EDFA measured by the PD, $P_{\text{in}}$ and $P_{\text{out}}$; (iii) auxiliary information such as the EDFA gain setting, channel-loading configuration, and WSS attenuation setting.
The EDFA gain profile measurement can be time-consuming, mainly due to the time it takes to set the WSS attenuation values (0.85 s) and channel-loading configuration (3 s), and to fetch the OCM/PD readings (6 s). To guarantee that the OCM power readings are reliable, extra waiting time is applied depending on channel-loading conditions. On average, each measurement lasts ∼41 s and ∼58 s for the booster and pre-amplifier EDFA, respectively. Overall, the collected dataset, with a size of 3.1 GB, includes a total of 202,752 gain spectrum measurements across 16 EDFAs collected over 2,785 hours.
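The total number of measurements follows directly from the numbers given in this section (eight ROADM units, three booster and five pre-amplifier gain settings, and 3,168 measurements per EDFA and gain setting). The short Python sketch below reproduces this arithmetic; the variable and function names are illustrative only.

```python
# Sketch: reproduce the dataset size from the per-EDFA measurement counts.
MEASUREMENTS_PER_GAIN_SETTING = 3168
NUM_ROADM_UNITS = 8  # one booster and one pre-amplifier EDFA per unit
GAIN_SETTINGS_DB = {
    "booster": [15, 18, 21],
    "pre-amplifier": [15, 18, 21, 24, 27],
}

def dataset_counts():
    """Number of gain spectrum measurements per EDFA type and in total."""
    counts = {
        edfa_type: NUM_ROADM_UNITS * len(settings) * MEASUREMENTS_PER_GAIN_SETTING
        for edfa_type, settings in GAIN_SETTINGS_DB.items()
    }
    counts["total"] = sum(counts.values())
    return counts

print(dataset_counts())
# prints {'booster': 76032, 'pre-amplifier': 126720, 'total': 202752}
```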

EDFA GAIN SPECTRUM MEASUREMENT RESULTS
We now provide a quantitative overview of the collected EDFA gain profile measurement dataset. For each EDFA with a given channel-loading configuration, we focus on (i) the total input-output power relationship, $(P_{\text{in}}, P_{\text{out}})$; (ii) the gain ripple as a function of the wavelength, given by $\Delta g(\lambda_i) = g(\lambda_i) - g_0, \forall i$, where $g_0$ is the EDFA gain setting, i.e., we consider gain ripple normalized to the target gain instead of with zero mean; (iii) the peak-to-peak gain ripple, given by $\max_i \{\Delta g(\lambda_i)\} - \min_i \{\Delta g(\lambda_i)\}$ across the loaded wavelength channels.
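The gain ripple and peak-to-peak gain ripple defined above can be computed in a few lines. The following Python sketch assumes that the gain spectrum is available as a list of per-channel gains in dB together with a binary list marking the loaded channels; both function names and the example values are hypothetical.

```python
# Sketch: gain ripple relative to the target gain, over loaded channels only.
def gain_ripple_db(gain_db, target_gain_db, loaded):
    """Ripple g(lambda_i) - g0 for every loaded channel (loaded[i] is 1 or 0)."""
    return [g - target_gain_db for g, on in zip(gain_db, loaded) if on]

def peak_to_peak_db(ripple_db):
    """Difference between the largest and smallest ripple values."""
    return max(ripple_db) - min(ripple_db)

# Made-up four-channel example with an 18 dB gain setting; channel 3 is unloaded.
ripple = gain_ripple_db([18.3, 17.8, 0.0, 18.1], 18.0, [1, 1, 0, 1])
print([round(r, 2) for r in ripple], round(peak_to_peak_db(ripple), 2))
# prints [0.3, -0.2, 0.1] 0.5
```

Restricting the computation to loaded channels keeps unloaded channels, which carry no meaningful gain reading, from dominating the peak-to-peak value.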
Figure 3 shows the total input-output power relationship, $(P_{\text{in}}, P_{\text{out}})$, of the collected EDFA gain spectrum measurement dataset obtained using the built-in PDs (see Fig. 1). The input-output power measurements are also overlaid on the EDFA gain mask (in the high-gain mode) with the corresponding operation/alarm range of each EDFA specified by the vendor (Lumentum). In particular, different curves represent the measured $(P_{\text{in}}, P_{\text{out}})$ values under different gain settings across all EDFAs of the same type. It can be seen that the collected dataset covers a significant portion of the high gain range for both the booster EDFA (13.4-23.4 dB) and pre-amplifier EDFA (14.8-29.8 dB). Overall, most of the measurement results exhibit a linear input-output power relationship, except for scenarios where the EDFA output power is close to the lower limit of the operation range, e.g., $P_{\text{in}} < -20$ dBm. These measurements are important since the built-in OCMs can be ensured to maintain the 0.1 dB measurement resolution when the EDFA operates within the operation range, and may provide alarming and fault detection when the EDFA operates outside the operation range but inside the alarm range.
Through analysis of the gain ripple spectrum of individual EDFAs under different gain settings and channel-loading configurations, a better understanding of the wavelength-dependent gain spectrum among all tested EDFAs can be derived. Figure 4 shows examples of the measured gain ripple spectrum, $\Delta g(\lambda_i)$, for all 16 EDFAs at 18 dB gain setting, under the full (WDM) and single-channel-loading configurations. The gain ripple is normalized to the target gain (instead of with zero mean) and the different gain profiles across EDFAs can be clearly visualized. The solid lines represent the mean gain ripple averaged across all measurements, while the shaded areas represent the full range of the measured gain ripple, including the minimum and maximum values. It can be seen that different types of EDFA (booster or pre-amplifier), or different EDFA devices of the same type (e.g., booster EDFA of two ROADM units), have different gain ripple spectra and that the gain ripple spectrum of each EDFA also depends on the channel-loading configurations (WDM or single channels).
In addition, Fig. 5 shows the measured gain ripple spectra of the EDFAs in ROADM 1 and ROADM 5 across all considered gain settings and under the full (WDM) and single-channel-loading configurations. It can be seen that with the full (WDM) channel-loading configuration, the gain ripple spectrum for the booster EDFA is similar across the 15/18/21 dB gain settings, but different from that of the pre-amplifier EDFA across the same gain settings, especially in channels with longer wavelengths. Similarly, the gain ripple spectrum for the pre-amplifier EDFA is similar across the 15/18/21/24/27 dB gain settings. Overall, it can be observed that the gain ripple profile for each EDFA depends on many factors including the gain setting, channel-loading configurations, and input power level. To obtain overall statistics of the gain ripple spectrum for different EDFA devices, types, and gain settings, Fig. 6 shows the mean peak-to-peak gain ripple averaged across all channel-loading configurations for each EDFA at a given gain setting. In particular, each entry represents the mean peak-to-peak gain ripple for one EDFA at a given gain setting, averaged across 3,168 gain spectrum measurements. The results show that the mean peak-to-peak gain ripple is within a range of 0.5-0.9 dB but varies across EDFA types, devices, and gain settings.
We also evaluate the variation of the EDFA gain spectrum across a long time period, which is important due to factors such as the potential aging of the hardware. In particular, 10 months after completion of the initial dataset collection, we re-collected the gain spectrum measurements for all 16 EDFAs at the 18 dB gain setting and under the same channel-loading configurations (see Table 1). Figure 7 shows the gain ripple spectrum for the EDFAs in ROADMs 1 and 5 under the WDM (full) channel-loading configurations from the first and second measurement rounds using solid and dashed lines, respectively. It can be seen that the difference in the gain ripple spectrum across the measurements spanning 10 months is minimal, i.e., the difference in the mean gain ripple is below 0.2 dB. We also analyze the mean/95th-percentile/maximum absolute difference in the gain spectrum measurements spanning 10 months for individual booster and pre-amplifier EDFAs, and the results are shown in Fig. 8. Note that the 95th-percentile and maximum absolute differences have a resolution of 0.1 dB, due to the 0.1 dB measurement resolution of the built-in OCMs. Overall, the mean difference in the gain spectrum measurements spanning 10 months is less than 0.1 dB, while the 95th-percentile difference is within 0.3 dB. We would like to note that more measurements are ongoing, with the aim to provide a comprehensive characterization of the EDFA gain spectrum across different devices and time spans.

DNN-BASED EDFA GAIN SPECTRUM MODEL
In this section, we present a DNN-based EDFA model for characterizing the wavelength-dependent gain spectrum using the collected dataset, and compare it against the CM model.

A. Architecture of the DNN-Based EDFA Model
Figure 9 shows the DNN model architecture, which consists of an input layer, four hidden layers with 256/128/128/128 neurons, and an output layer, where the weights are initialized using Kaiming initialization. The input features to the DNN model include the EDFA gain setting ($g_0$), total input and output power ($P_{\text{in}}$ and $P_{\text{out}}$), input power spectrum ($S_{\text{in}}(\lambda_i)$), and a binary vector indicating the channel-loading configuration, denoted by $\mathbf{c} = [c_i]_{i=1}^{95}$, with $c_i = 1$ if the $i$th channel is loaded and $c_i = 0$ otherwise. The output layer predicts the EDFA gain spectrum, $g(\lambda_i)$, corresponding to the input parameters. For the input and hidden layers, we apply batch normalization and use the exponential linear unit (ELU) activation function. We consider the following loss function based on the mean squared error (MSE) of the predicted and ground truth gain spectrum profile across the loaded channels:
$$\mathcal{L} = \frac{1}{\sum_{i=1}^{95} c_i} \sum_{i=1}^{95} c_i \left[ g_{\text{pred}}(\lambda_i) - g_{\text{meas}}(\lambda_i) \right]^2, \tag{4}$$
where $g_{\text{pred}}(\lambda_i)$ and $g_{\text{meas}}(\lambda_i)$ denote the predicted and measured gain in the $i$th wavelength channel, respectively. A component-level DNN model is trained for each EDFA with the same setting: a gradient clipping threshold of 3.0 and a learning rate of 0.001 over 600 epochs. We use all the EDFA gain spectrum measurements under three gain settings of 15/18/21 dB to train and test the performance of the DNN-based EDFA gain model. Note that although there are two additional gain settings for the pre-amplifier (24/27 dB), we only choose the dataset corresponding to the 15/18/21 dB gain settings to keep a consistent size of the dataset used for the DNN model training and testing across the booster and pre-amplifier EDFA types. For each gain setting, we split the EDFA gain measurement dataset into the training/test sets with a split ratio of 0.86/0.14: 2,732 gain spectrum measurements are used as the training set, and the remaining 436 gain spectrum measurements are used as the test set. Specifically, the test set includes 20% of the Fixed Goalpost (216 measurements) and 20% of the Random Baseline (220 measurements) gain spectrum measurements, which represent a diverse set of channel-loading configurations with randomly selected channels and groups of close-by channels (see Table 2).
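To make the loss over loaded channels concrete, the following pure-Python sketch evaluates an MSE that only counts channels whose entry in the binary channel-loading vector is 1. Normalizing by the number of loaded channels is an assumption of this sketch, and `masked_mse` is a hypothetical name; a training implementation would normally express the same computation with a deep learning framework's tensor operations.

```python
# Sketch: MSE between predicted and measured gain, restricted to loaded channels.
def masked_mse(g_pred_db, g_meas_db, loaded):
    """Mean squared error over channels with loaded[i] == 1 (assumed normalization)."""
    num_loaded = sum(loaded)
    if num_loaded == 0:
        raise ValueError("at least one channel must be loaded")
    squared_error = sum(
        c * (pred - meas) ** 2
        for pred, meas, c in zip(g_pred_db, g_meas_db, loaded)
    )
    return squared_error / num_loaded

# Made-up three-channel example; the unloaded third channel does not contribute.
loss = masked_mse([18.2, 17.9, 5.0], [18.0, 18.0, 0.0], [1, 1, 0])
print(round(loss, 4))
# prints 0.025
```

Masking matters because the gain of an unloaded channel is undefined, so any value the network outputs there should not be penalized.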

B. Performance of the DNN-Based EDFA Gain Model
We now show the performance of the developed DNN-based model and compare it with the CM model [Eq. (2)]. Figure 10 shows the mean absolute error (MAE) and standard deviation of the EDFA gain spectrum predicted by the component-level DNN and CM models, across eight booster and pre-amplifier EDFAs using test sets with different channel-loading configurations (random and goalpost). Specifically, across the eight ... Figure 11 shows the mean and standard deviation of the positive/negative gain prediction errors when keeping the sign, i.e., separately calculated across the errors with the ± signs. The results show a similar trend whereby the DNN model outperforms the CM model in terms of prediction accuracy. Overall, the DNN-based model suffers from slightly larger prediction errors for the goalpost test set compared to the random test set. To visualize how the prediction errors distribute across the frequencies, Fig. 12 shows the mean error of the positive/negative (+/−) gain prediction in each wavelength channel, across all eight EDFAs of each type and both the random and goalpost test sets. It can be observed that the per-channel mean errors are similar across the wavelength channels on both booster and pre-amplifier EDFAs.
Figure 13 shows the cumulative distribution function (CDF) of the absolute error of gain spectrum prediction across all EDFAs of the same type, under both the DNN and CM models. The results show that for booster EDFAs, the DNN model is able to achieve a median gain prediction error of 0.05/0.06 dB for the random/goalpost test set, which is significantly smaller than that achieved by the CM model (0.21/0.21 dB for the random/goalpost test set). In terms of the tail performance, the DNN model is able to achieve a 95th-percentile gain prediction error of 0.13/0.15 dB for the random/goalpost test set, which is again much smaller than that achieved by the CM model (0.58/0.58 dB for the random/goalpost test set). The results for the pre-amplifiers show similar trends when comparing the performance achieved by the DNN and CM models. In addition, Table 3 shows the maximum absolute errors of the gain spectrum prediction achieved by the DNN and CM models. For both test sets, the maximum absolute errors of the DNN model are smaller than those achieved by the CM model. The maximum absolute errors achieved by the DNN and CM models across all test sets and EDFA types are 0.89 and 1.14 dB, respectively. Such a maximum error achieved by the model can be used to provide insights into the system margin design.
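The error statistics used in this section (mean, median, 95th-percentile, and maximum absolute error) can be sketched as follows. The nearest-rank percentile definition and the function names are assumptions of this example; the evaluation in the paper may use a different percentile convention.

```python
# Sketch: summary statistics of the absolute gain prediction error.
import math
import statistics

def nearest_rank_percentile(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def error_summary(g_pred_db, g_meas_db):
    """MAE, median, 95th-percentile, and maximum absolute error in dB."""
    abs_err = [abs(p - m) for p, m in zip(g_pred_db, g_meas_db)]
    return {
        "mae": statistics.fmean(abs_err),
        "median": statistics.median(abs_err),
        "p95": nearest_rank_percentile(abs_err, 95),
        "max": max(abs_err),
    }

# Made-up example: three perfect predictions and one that is off by 2 dB.
print(error_summary([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
# prints {'mae': 0.5, 'median': 0.0, 'p95': 2.0, 'max': 2.0}
```

The example illustrates why the median and the tail statistics are reported separately: a single outlier barely moves the median but dominates the 95th-percentile and maximum errors.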

TL-BASED EDFA GAIN SPECTRUM MODEL
TL is an ML method that applies knowledge from a pre-trained model to a new but similar problem. In this section, we show that TL can be used for gain spectrum modeling across different EDFAs with minimal data collection.

A. TL Model and Target Dataset Selection
We apply the following TL procedure to transfer a DNN-based source model to a target model. First, the input layer and all four hidden layers of the DNN (see Fig. 9), which are treated as the feature extractor of the DNN model, are frozen, and the weights of the output layer are reinitialized using Kaiming initialization. Then, the DNN model is re-trained using the same MSE loss function given by Eq. (4) with a step size of 0.05 over 150 epochs. Finally, all layers are unfrozen and fine-tuned with a step size of 0.001 over 20 epochs, while the batch normalization parameters are kept unchanged.
Using the dataset and DNN-based EDFA model described above, we first investigate, for a given (pre-trained) source model, how much new data is needed from a target EDFA. We consider all cases where each booster/pre-amplifier EDFA serves as the source model, which is then transferred to each of the seven other booster/pre-amplifier EDFAs using different sizes of target EDFA datasets. Let $N_{\text{src}}$ and $N_{\text{tgt}}$ denote the number of gain spectrum measurements at each gain setting used to train the source model and construct the target model for TL, respectively. We consider $N_{\text{src}}$ = 2,732 gain spectrum measurements from the source EDFA and different numbers of measurements, $N_{\text{tgt}}$, from the target EDFA. Figure 14 shows the MAE and standard deviation of the EDFA gain spectrum prediction accuracy averaged across all possible source-target model pairs for the random and goalpost test sets, with varying ratios of $N_{\text{tgt}}/N_{\text{src}}$. The results show that the average EDFA gain prediction accuracy achieved by the target model with a target-source data size ratio of $N_{\text{tgt}}/N_{\text{src}}$ = 0.5% outperforms that achieved with $N_{\text{tgt}}/N_{\text{src}}$ = 0.2%, but is comparable to that achieved with $N_{\text{tgt}}/N_{\text{src}}$ = 1.5%. Therefore, we empirically select $N_{\text{tgt}}$ = 13 with $N_{\text{tgt}}/N_{\text{src}}$ = 0.5% in the rest of the evaluations, which reduces the target data size by roughly 200× while achieving an MAE of <0.2 dB across all EDFAs. Below, we evaluate the performance of TL-based EDFA models in three scenarios.
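The relationship between the selected target dataset size, the quoted 0.5% ratio, and the roughly 200× reduction can be checked with the short sketch below, which uses only the 13 and 2,732 measurement counts stated above; the function names are illustrative.

```python
# Sketch: target/source data ratio and the resulting data reduction factor.
N_SRC = 2732  # measurements per gain setting used to train the source model
N_TGT = 13    # measurements per gain setting used for transfer learning

def target_ratio_percent(n_tgt, n_src):
    """Target-to-source dataset size ratio in percent."""
    return 100.0 * n_tgt / n_src

def reduction_factor(n_tgt, n_src):
    """How many times fewer target measurements are needed."""
    return n_src / n_tgt

print(f"{target_ratio_percent(N_TGT, N_SRC):.2f}% of the source data, "
      f"{reduction_factor(N_TGT, N_SRC):.0f}x fewer measurements")
# prints "0.48% of the source data, 210x fewer measurements"
```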

B. TL between EDFAs of the Same Type
Figure 15 shows the MAE matrices (in dB) across eight EDFAs of the same type (booster or pre-amplifier) under the random and goalpost test sets, with three gain settings (15/18/21 dB) and a target data size of N tgt = 13 (N tgt /N src = 0.5%). In each MAE matrix, (i) entry (i, i), i = 1, ..., 8, corresponds to the component-level DNN-based EDFA model (i.e., without TL), and (ii) entry (i, j), j ≠ i, corresponds to the transferred EDFA model with the ith and jth EDFA being the source and target model, respectively. In the ith row of the MAE matrix, entry (i, i) is always smaller than entry (i, j), ∀ j ≠ i. This shows that the TL-based model always achieves a slightly larger gain spectrum prediction error than the DNN-based model without TL, which is expected given the limited number of new measurements used for deriving the target model.
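The row-wise property of the MAE matrix described above can be checked mechanically. The following sketch uses a hypothetical 3 × 3 matrix whose values are illustrative only and are not taken from Fig. 15; the function names are likewise assumptions.

```python
def diagonal_is_row_minimum(mae):
    """True if every entry (i, i) is strictly smaller than every (i, j) with j != i."""
    return all(
        row[i] < row[j]
        for i, row in enumerate(mae)
        for j in range(len(row))
        if j != i
    )

def off_diagonal_range(mae):
    """Return (min, max) MAE over the transferred models, i.e., entries with j != i."""
    vals = [v for i, row in enumerate(mae) for j, v in enumerate(row) if j != i]
    return min(vals), max(vals)

# Hypothetical MAE matrix in dB: rows are source EDFAs, columns are target EDFAs.
mae = [
    [0.05, 0.08, 0.10],
    [0.09, 0.06, 0.11],
    [0.12, 0.07, 0.05],
]
```

Here `off_diagonal_range` corresponds to the kind of MAE interval reported for the TL-based models, while the diagonal holds the models trained without TL.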
Comparing TL performance on the two test sets, the TL-based model for booster EDFAs achieves an MAE of 0.06-0.12 dB and 0.08-0.18 dB on the random and goalpost test sets, respectively. Similarly, for pre-amplifier EDFAs, the TL-based model achieves an MAE of 0.09-0.18 dB and 0.12-0.24 dB on the random and goalpost test sets, respectively. In particular, TL achieves better average gain prediction accuracy for booster EDFAs than for pre-amplifier EDFAs, and suffers from lower accuracy under goalpost channel-loading configurations, exhibiting a similar trend to the performance of the component-level DNN model presented in Section 5. We expect that the performance of the target model can be further improved by including (a small number of) gain measurements under the random/goalpost channel-loading configurations in the target data. Overall, the MAE achieved by the target booster/pre-amplifier model is within 0.18/0.24 dB across all the test sets.
In addition, Fig. 13 shows the CDF of the absolute prediction error achieved by the TL-based models compared to the DNN-based models. The results show that for booster EDFAs, the TL-based model achieves a median absolute error of 0.06/0.09 dB on the random/goalpost test sets, which is (slightly) worse than that achieved by the DNN-based model but outperforms the CM models. Similar trends have been observed for pre-amplifier EDFAs. In terms of the tail performance, the 95th-percentile absolute errors for booster/pre-amplifier EDFAs achieved by the TL-based model prediction are 0.22/0.28 dB and 0.32/0.50 dB for the random and goalpost test sets, respectively (see Fig. 13). However, TL does not perform well in terms of the maximum absolute error (which can be a few dB) due to the limited new data collected from the target EDFA. Improving the prediction accuracy of TL-based EDFA gain models, especially with goalpost channel-loading configurations, is a subject of our future research.
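The median, 95th-percentile, and maximum absolute errors discussed above are all read off the same empirical error distribution. A minimal sketch follows, assuming a nearest-rank percentile definition (the exact percentile convention used for Fig. 13 is not stated here) and hypothetical gain values in dB.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def error_summary(predicted, measured):
    """Summarize absolute gain prediction errors (all values in dB)."""
    abs_err = [abs(p - m) for p, m in zip(predicted, measured)]
    return {
        "median": percentile(abs_err, 50),
        "p95": percentile(abs_err, 95),
        "max": max(abs_err),
    }

# Hypothetical example: 19 small errors and one outlier of 2 dB.
measured = [18.0] * 20
predicted = [18.0 + 0.01 * k for k in range(1, 20)] + [20.0]
summary = error_summary(predicted, measured)
```

The single outlier leaves the median and 95th percentile small but dominates the maximum, which illustrates why the maximum absolute error can reach a few dB even when the bulk of the distribution stays low.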

C. TL between Gain Settings of the Same EDFA
Figure 16 shows the MAE and standard deviation of the booster EDFA gain spectrum prediction accuracy achieved by the TL-based model trained using one or two source gain settings, and then transferred to the target model using additional measurements from a new target gain setting. For example, "15 & 18 → 21" means that the source model is trained using EDFA gain spectrum measurements with 15 and 18 dB gain settings, and then transferred to the target model using measurements with a 21 dB gain setting. The results show that the TL approach using a single source gain setting can result in an MAE of up to 0.8 and 1.0 dB under the random and goalpost test sets, respectively. These MAE values under the random and goalpost test sets can be further reduced to 0.21 and 0.19 dB with the additional domain knowledge from the measurements under a second gain setting of 15 dB. In addition, the standard deviation of the gain spectrum prediction error is reduced with the additional domain knowledge. Overall, the MAE across all source/target gain combinations achieved by the TL-based models with two gain settings is 0.16 dB. Similar MAE performance is observed for TL-based models constructed for the pre-amplifier EDFAs.
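The source/target notation can be made concrete by enumerating the possible combinations of the three gain settings. The sketch below assumes that every combination of one or two source settings with a remaining target setting is of interest; the function names are hypothetical.

```python
from itertools import combinations

GAIN_SETTINGS_DB = (15, 18, 21)  # the three gain settings used in the evaluation

def tl_scenarios(gains=GAIN_SETTINGS_DB, n_sources=1):
    """Enumerate (source gain settings, target gain setting) pairs for one EDFA."""
    scenarios = []
    for sources in combinations(gains, n_sources):
        for target in gains:
            if target not in sources:
                scenarios.append((sources, target))
    return scenarios

def label(sources, target):
    """Format a scenario in the notation used in the text, e.g. '15 & 18 → 21'."""
    return " & ".join(str(g) for g in sources) + f" → {target}"

single_source = [label(s, t) for s, t in tl_scenarios(n_sources=1)]
two_sources = [label(s, t) for s, t in tl_scenarios(n_sources=2)]
```

With three gain settings this gives six single-source scenarios and three two-source scenarios.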

D. TL between EDFA Types
So far, we have considered TL between EDFAs of the same type (booster or pre-amplifier); another way to ease the target EDFA data collection process is to apply TL across different EDFA types. Figure 17 shows the MAE and standard deviation of the EDFA gain spectrum prediction accuracy, averaged across eight ROADMs, when transferring from a source booster model with three gain settings to a target pre-amplifier model (B → P) or vice versa (P → B) on the same ROADM, compared to the DNN-based model without TL (B → B and P → P). The MAE achieved by the target model is within 0.21 dB in all cases, with an average MAE of 0.16 dB, and TL introduces an MAE degradation of only 0.06/0.10 dB and 0.10/0.13 dB for the booster/pre-amplifier EDFA compared to that achieved by the source model under the random and goalpost test sets, respectively.

Fig. 17. TL between different EDFA types (B, booster; P, pre-amplifier) on the same ROADM.

Fig. 1. (Left) COSMOS optical data center with ROADM devices. (Right) Block diagram of the Lumentum ROADM-20 unit and the measurement setup for the DUT booster/pre-amplifier EDFA.

Fig. 3. Input-output power of the collected EDFA gain spectrum measurements overlaid on the EDFA gain masks (high gain mode).

Fig. 6. Mean peak-to-peak gain ripple across eight booster and eight pre-amplifier EDFAs with different gain settings.

Fig. 7. Example gain ripple spectrum measurements of the booster and pre-amplifier EDFAs at 18 dB gain setting and under WDM channel-loading configurations, spanning 10 months.

Fig. 8. Mean, 95th-percentile, and maximum values of the absolute difference in the two rounds of EDFA gain ripple spectrum measurements spanning 10 months.

Fig. 9. Architecture of the DNN model used for EDFA gain prediction. Transfer learning reinitializes the orange output layer before retraining.

Fig. 10. MAE of the gain spectrum prediction error achieved by the component-level DNN and CM EDFA models across eight boosters and eight pre-amplifiers on two test sets.

Fig. 11. Mean errors of the positive/negative (+/−) gain prediction achieved by the DNN and CM EDFA models across eight boosters and eight pre-amplifiers on two test sets.

Fig. 12. Mean errors of the positive/negative (+/−) gain prediction on each wavelength channel achieved by the DNN model across eight boosters and eight pre-amplifiers on two test sets.

Fig. 13. CDF of absolute errors on component DNN and CM EDFA models across eight booster and eight pre-amplifier EDFAs.
(i) N tgt = 5 gain spectra under fully loaded channel configurations, with N tgt /N src = 0.2%; (ii) N tgt = 13 gain spectra under fully loaded and half loaded channel configurations, with N tgt /N src = 0.5%; (iii) N tgt = 41 gain spectra under fully/half/single/double loaded channel configurations, with N tgt /N src = 1.5%.

Fig. 14. MAE of the EDFA gain spectrum prediction achieved by the TL-based EDFA gain model with varying target to source data size ratios, N tgt /N src.

Fig. 15. MAE matrix (in dB) of ML-based EDFA gain spectrum prediction averaged across the random and goalpost test sets, where entry (i, i) corresponds to the DNN-based EDFA model (without TL), and entry (i, j), i ≠ j, represents the TL-based model trained on the ith source EDFA and transferred to the jth target EDFA.

Fig. 16. MAE of the EDFA gain spectrum prediction accuracy using TL from one (left) or two (right) source gain settings to another target gain setting on the same booster EDFA.
APPENDIX A: EXAMPLE EDFA GAIN SPECTRUM MEASUREMENT DATASET IN JSON FORMAT
Listing 1. Structure Outline for JSON-Based Dataset Files

Table 1. Summary of the Measurements for Each EDFA (Booster or Pre-amplifier) under a Given Gain Setting

Table 2. Dataset Split for Training (Including Both the DNN- and TL-Based EDFA Gain Models) and Test at Each EDFA Gain Setting

Table 3. Maximum Absolute Error for EDFA Gain Spectrum Prediction Achieved by the CM and DNN Models

Funding. National Science Foundation (1827923, 2029295, 2112562, 2211944); Science Foundation Ireland (#13/RC/2077_P2); Google; International Business Machines Corporation.

addressing interdisciplinary challenges for smart cities, sustainability, and digital equity.

Tingjun Chen is an Assistant Professor of Electrical and Computer Engineering and Computer Science at Duke University. He received the B.Eng. degree in electronic engineering from Tsinghua University in 2014, the Ph.D. degree in electrical engineering from Columbia University in 2020, and was a Postdoctoral Associate with Yale University from 2020 to 2021. His research interests are in the area of networking and communications, with a specific focus on next-generation wireless networks and Internet-of-Things systems. He received the Google Research Scholars Award, the IBM Academic Award, the Facebook Fellowship, the Wei Family Private Foundation Fellowship, the Columbia Engineering Morton B. Friedman Memorial Prize for Excellence, the Columbia University Eli Jury Award and Armstrong Memorial Award, the ACM SIGMOBILE Doctoral Dissertation Award Runner-Up, and the ACM CoNEXT'16 Best Paper Award.