A State-Variable Approach to Submarine Links Capacity Optimization

—We consider the capacity optimization of subma- rine links when including a realistic model of the gain-ﬂattened constant-pump erbium doped ﬁber ampliﬁers (EDFA). While Perin et al. [1] numerically attacked this optimization for Constant-Gain (CG) ampliﬁed links, we extend the analysis also to more realistic submarine constant power-spectral-density (CPSD) links. As in [1], we concentrate on a single spatial mode of a spatial division multiplexed (SDM) link at low EDFA pump power P p , and thus consider only the impairments of ampliﬁed spontaneous emission noise. Here we adopt a novel semi-analytical approach which consists of ﬁxing the inversion x 1 of the ﬁrst EDFA (the state-variable of the link) and analytically ﬁnding capacity C ( x 1 ) by searching over the x 1 − feasible input wavelength division multiplexed (WDM) PSD distributions. Then the optimum inversion x 1 that maximizes C ( x 1 ) is numerically obtained. This approach enables us to get both approximate (for CG links) and exact (for CPSD links) capacity-maximizing WDM input distributions, which vary inversely with the EDFA gain proﬁle. For CG links the optimal WDM allocation is called the gain-shaped water-ﬁlling. Other practical allocations are analyzed, such as the signal to noise ratio equalizing allocation (CSNR), and the constant input power (CIP) allocation which uses a ﬂat WDM distribution. We ﬁnd that, for typical submarine span attenuations around 10dB and when the link works at the optimal inversion x 1 , CIP and CSNR achieve essentially the same capacity as the optimal allocation. At sufﬁciently large pump P p 30 the optimal inversion x 1 is such that the EDFA gain at 1538nm equals the span attenuation, for EDFA emission and absorption as in [1]. When span attenuations increase to 20dB, then we start seeing an advantage of the optimal allocation. Another key ﬁnding is that optimized CG and CPSD links behave roughly the same, with a slightly superior capacity for CPSD.


I. INTRODUCTION
S ubmarine systems are fundamentally power constrained.
The quest for energy efficiency has become critical, and new industrial cable designs have arisen, leveraging space division multiplexing (SDM) with either fiber bundles combined with pump-farming solutions at the industrial level [2], [3], or investigations of multi-core fiber solutions at the research level [4]. For each spatial-mode, line optimization has also been rethought, with suggestions to optimize amplifiers bandwidth [1], gain shaping filters [5], [6], or Manuscript received xxxxxxxxx xx, xxxx; accepted xxxxxxxxx xx, xxxx. Date of publication xxxxxxxxx xx, xxxx; date of current version xxxxxxxxx xx, xxxx.
A. Bononi  the wavelength division multiplexing (WDM) input power spectral density (PSD) [1], [7], also known as pre-emphasis, most often based on direct numerical or machine-learning optimization.
In particular, Perin et al. [1] numerically studied how the capacity of submarine single-mode fiber links with endspan gain-flattened, fixed-pump erbium-doped fiber amplifiers (EDFA) scales with the EDFA pump power. The endto-end pump requirements of SDM links based on fiber bundles are then inferred from the single-mode fiber results. In [1], the detailed EDFA physics were first introduced into the capacity-maximizing design of submarine pumppower constrained links. The innovative concept was that of bringing the bandwidth-signal to noise ratio (SNR) trade-off of the fixed-pump EDFA into the link design. The results in [1] were obtained by a fully numerical optimization, which limits the understanding of the underlying general concepts.
In this paper, we use a novel semi-analytical approach to both shed light into the results of [1], and extend them to cope with today's submarine links where at all spans the WDM signal PSD, the EDFAs, and their (fixed) gain shaping filters (GSF) are nominally identical. Luckily, these latter links are simpler to analyze, and we present them first. Then we attack the link in [1] and derive results quite close to those in [1], thus both confirming their results and validating our novel method.
The first key enabler of our approach is the use of the classical Saleh EDFA model [8]- [10] extended to include self-saturation by amplified spontaneous emission (ASE) [11]. The Saleh model without ASE self-saturation [8] was also used in [1] to provide the starting conditions for the global optimization search. The reader may review the EDFA model in Appendix 1, whose exposition is tailored to the developments in the main text.
The second and fundamental enabler of our approach is a strategy that allows us to break the complex capacity optimization problem into a set of simpler problems amenable to an analytical solution. Here is the idea. As in [1], our goal is to maximize the single-mode fiber capacity at a given per-EDFA pump power P p . For each P p , our original approach is based on assuming knowledge of the inversion x 1 at the first EDFA (the "state-variable" [10] of EDFA 1 and thus of the entire link) and analytically finding the capacity-achieving WDM distribution and the capacity value at x 1 . The dependence on all physical link parameters is summarized by the sole state variable x 1 . We finally scan the whole x 1 inversion range and numerically obtain the maximum link capacity. Once maximum capacity versus P p is available, it is possible to carry out optimizations for power-constrained submarine SDM systems [1]. Figure 1. (a) M -span amplified line with span attenuation A > 1 and endspan amplification. At each span k, an EDFA with inversion x k is followed by a GSF, for a total gain at frequency f j of G j (x k ) = Aχ j (x k ), where χ j ≤ 1 is the span droop. In CPSD mode, the GSF and thus χ j are tailored such that the span input flux Q j equals the span output flux at each frequency. (b) block diagram of a span. δQ j = F j ∆f is the ASE equivalent EDFA-input flux at frequency f j over bandwidth ∆f . (c) span block-diagram equivalent to (b), obtained by "factoring out" the attenuation, i.e., moving the A multiplier upstream of the addition operator.
At the low per-EDFA pump powers envisaged for submarine SDM, only ASE matters and nonlinear effects can be disregarded [1], [12]. The single-mode fiber is thus a wavelength-parallel additive Gaussian noise channel. If we had a signal power constraint and the received noise power did not depend on the WDM allocation, the capacityachieving distribution would be the well-known classical water-filling (CW) [13]. Alas, EDFA gain and noise do depend on input power, and the constraint is here on the per-EDFA pump power. For the links considered in [1] we will find an x 1 -dependent quasi-optimal input distribution that we call the gain-shaped water-filling (GW) since the sum of signal and noise is not flat as in CW, but shaped as the inverse EDFA gain. For more realistic submarine links with constant PSD at the input/output of each EDFA, which we label constant-PSD (CPSD) links, we manage to get instead a recursive algorithm yielding the optimal WDM PSD and thus capacity. Our treatment of the CPSD links is an extension of the generalized droop model [14] to the wavelength-dependent amplifier gain. This paper is an extension of the conference paper [15], which first introduced the GW allocation for CG links. The rest of the paper is organized as follows: Section II introduces the link model. Section III analyzes the CPSD link. Section IV analyzes the CG link. Finally Section V concludes the paper.

II. SINGLE-MODE FIBER LINK MODEL
As depicted in Fig. 1(a), we consider the transmission of an WDM signal with along a link of M end-amplified single-mode fiber spans. The span attenuation is A > 1. Each end-span amplifier consists of an EDFA followed by a Gain-Shaping Filter (GSF) which may either flatten or properly shape the amplifier gain over the amplifier bandwidth.
As described in Appendix 1, the useful bandwidth B k of the k-th gain-shaped amplifier in the link is defined as the frequency size of the set B k = {f j : G j (x k ) ≥ A}, i.e., the frequency range over which the EDFA gain exceeds or equals the attenuation. In this study, the amplifier bandwidth is quantized in multiples of the bin size ∆f , so that B k = N c (x k )∆f , where N c (x k ) is the number of non-overlapping slots covering B k . WDM channels may be allocated in all or part of the N c bins within the bandwidth.
The EDFA length , the absorption α and emission g * parameters (see Appendix 1), and the pump power P p are the same at all EDFAs. As in [14], Fig. 1(b) shows the power-transfer block diagram of the generic span k, where δQ j (x k ) is the ASE equivalent EDFA-input flux (ph/s) and G j (x k ) = Aχ j (x k ) is the shaped-amplifier power gain (EDFA+GSF: G j = G j G SF,j ) at all bins j centered at frequency f j and of width ∆f , belonging to the set B k , with χ j (x k ) ≤ 1 the (span) droop [14]. The droop is thus the net span-k power gain at bin j. The shaped total gain is assumed to be ideally zero out of band. Fig. 1(c) shows a more convenient representation of the span block diagram, where the span loss is "factored out" [14]. δQ j A is the equivalent span-input ASE noise generated at the end-span EDFA.
We will study the following two link types: 1) Constant Power Spectral Density (CPSD) link: at all spans k and frequencies f j in the bandwidth, the shapedamplifier gain G j (x k ) = Aχ j has a span-independent, frequency-dependent droop χ j ≤ 1 adjusted such that the span-output photon flux Q j (or equivalently the power P j ) at f j equals the span-input (i.e., the transmitted (TX)) flux. Hence the whole M -span line attenuates the input WDM channel at f j by χ M j . Such a stabilization of the photon flux profile is achieved by suitable GSFs. Hence the signal+cumulated-ASE PSD at the input/output of every EDFA is preserved. Since all EDFAs have the same pump, input PSD, and physical parameters, they all work at the same inversion x, with identical (frequency-dependent) gain and noise figure. Of course, CPSD implies a constant total output power (COP) at each amplifier, as in [14], but COP does not necessarily imply CPSD. CPSD operation requires an input-PSD-dependent GSF, identical at all amplifiers. We will show numerically that for typical submarine links the combined EDFA+GSF amplifier for CPSD operation, over the range where WDM channels are allocated, is very close to a perfectly flat filter with gain equal to A (to within a tiny fraction of a dB), and has infinite attenuation where no power is allocated.
The CPSD case is studied first since all EDFAs are identical and more analytical insight is possible. CPSD links with a flat input PSD are a typical choice in today's submarine systems, hence of great practical importance.
2) Constant Gain (CG) link: This is the case considered in ref. [1]. It is less realistic for submarine systems, but better approximates the operation of COP terrestrial links. Here at all spans k and frequency bins in the bandwidth range B k the span droop is unity: χ j (x k ) = 1, hence the shaped-amplifier gain G is perfectly flat and exactly equals the span attenuation A at all frequencies of the amplifier bandwidth. This implies that amplifier+fiber is a unit-gain block, so that the whole line has unit gain, the transmitted WDM power profile (i.e., the TX signal PSD) flows unchanged down the line, while ASE adds to it and increases down the line, so that the EDFAs inversion decreases down the line for constant-pump EDFAs. Thus the (mostly fixed and thus cheap) GSFs need to be all different, a serious practical drawback. The next sections detail the analysis in the two cases.
III. CPSD LINK In this case every EDFA in the line has the same input pump flux Q p (i.e., same power P p ) and the same input signal fluxes {Q j /A} Nc j=1 , thus (per Saleh equation (26), see Appendix 1) the same inversion x, hence the same EDFA gain profile G j (x) at all frequencies f j = c/λ j (c = speed of light, λ j the j-th wavelength) and the same frequencydependent GSF power transfer function: The overall amplifier gain profile is G j G j (x)G SF,j (x). With these ideal GSFs there is no need of adjustable gain equalizers regularly spaced along the link, which instead real links have.
As in [14], the droop χ j at each frequency bin j is found by imposing that the flux Q j (power P j ) entering the generic span shown in Fig. 1(b), or its equivalent form Fig. 1(c), be equal to the flux/power out of the gain-shaped amplifier: where (see Appendix 1) the equivalent EDFA-input ASE flux at f j is δQ j = F j ∆f , where F j is the EDFA noise figure at the j-th frequency bin f j . Thus we find [14]: where is the total signal to injected noise ratio at f j at each EDFA input. Note from (2) that when SN R 1j 1 then χ j → 1; conversely, when SN R 1j 1 then χ j → 0. The received (RX) signal flux after M spans at f j is (3). The SNR at f j is thus, using (5),(3): which is the generalized droop (GD) formula [14] at the j-th frequency, and we note that every input flux vector Q = {Q j } Nc j=1 is associated with a droop vector χ = {χ j (Q j )} Nc j=1 that ensures the CPSD behavior of the link, as per (2).
The quantity we wish to maximize for this WDM-parallel additive Gaussian noise channel with our signal flux (or power) allocation is the achievable information rate (AIR) per 2-polarization spatial mode at the fixed (Q p , x) [1]: with an implementation penalty 0 < Γ < 1 known as SNR gap to capacity 1 . As in [1], in all calculations we will assume Γ = 0.79 (-1dB). 1 When Γ = 1, AIR coincides with capacity.

A. AIR optimization
Our strategy is to fix the value of the (common) inversion x, and among all input flux vectors Q compatible with x (or feasible) we look for the one that maximizes AIR. The set of feasible flux vectors (which are those that solve Saleh equation for the given x and pump Q p ) is a convex simplex, with bounded maximum power. Luckily, in this case the optimum can be found analytically as follows.
We wish to maximize with respect to (w.r.t.) the unknown vector Q the function AIR(Q p , x, Q) in (7), subject to the positivity constraints for Q and Saleh feasibility constraint: where K, given by eq. (27) in Appendix 1, is the useful pump flux. Before delving into the optimization, let's consider first the AIR of some popular sub-optimal CPSD flux allocations. In the analysis, it is useful to decompose the flux vector as Q = Qs, where the scalar total flux is Q Nc j=1 Q j , and the split vector is s = {s j } Nc j=1 with s j Q j /Q ≥ 0. The split vector is thus a probability mass function (PMF). For any split, the total feasible flux Q is found from Saleh equation (8),(27) as: and is itself a function of the split vector. 1) Constant-SNR allocation (CSNR) : CSNR has been for a long time the standard WDM allocation in submarine links [16]. Let's first search for the TX flux vector that makes the received SN R RX j in (6) equal at all channels j. It should make the ratio SN R 1j ∝ Qj δQj equal for all j, i.e., Q j Qs j ∝ F j . Since s is a PMF, we then must have for all j = 1, ..., N c Hence using (9), for all j = 1, ..., N c the CSNR feasible fluxes are explicitly: which means the WDM signal and the ASE have the same frequency/wavelength power profile. Since SN R 1j is identical for all j, from (3),(4), (11) the droops at all f j are all equal to: which means that the CSNR allocation produces a perfectly flat droop (=net span gain). The equalized RX SNR is then: [14]. The CSNR AIR is finally obtained from (7). 2) Constant-input-power allocation (CIP) : CIP is another practical allocation often used in submarine links due to the simplicity by which one can tailor and monitor the needed GSFs that keep a nominally flat WDM spectrum along the line (a special case of CPSD). In the CIP case the input WDM powers (not the fluxes!) are all equal. The power vector is P = P c 1 with P c the per-channel TX power, and 1 the vector of all ones. So the fluxes are , and the split vector thus has entries Given the above splits, the total flux Q is again given by (9). Hence for all j = 1, ..., N c the CIP fluxes are explicitly which correctly yield a constant per-channel power P c = hf j Q j . Since SN R 1j is j-dependent, then from (3) the droop is also j-dependent: The first equality shows that the non-flatness of the span gain χ j is negligible whenever SN R 1j 1 for all j. The second equality shows that the χ j profile is the inverse of that of ASE power. The CIP received SNR and AIR are finally computed as usual per (6), (7).

3) Optimal allocation (OPT):
We now tackle the AIRmaximizing signal allocation. We form the Lagrangian: is the Lagrange multiplier and AIR is given by (7), and then set the derivative w.r.t. all Q k , k = 1, ..., N c , equal to zero. After some algebra we get: where we defined the key function which goes to 1 M Γ as χ → 1. Fig. 2 shows the normalized g(χ)M Γ versus the single-span ASE-corrupted SN R 1 ≡ (χ −1 − 1) −1 in dB scale for some values of parameters M and Γ. Now, setting ∂L ∂Q k = 0 yields: where θ 2∆f ΓM λ ln (2) . The unknown parameter θ is found by summing the solution (17) over all k and plugging into Saleh's constraint: . Plugging back into (17) we finally get the optimal (OPT) fluxes for all k = 1, ..., N c as:  (18) is a contraction map) always worked for us: where our starting guess Q (0) was the CIP solution (14) at the same (Q p , x) values. What kind of flux profile vs. frequency do we obtain from (18)? If for simplicity we assume OPT flux is allocated only over the N b bins where g(χ) is above 1/10 of its maximum, and the {g(χ k )} on such bins are all equal, then from (18) Hence the OPT allocation has a frequency profile as the inverse EDFA gain, as in the GW allocation that we will tackle later. Although we have no analytical proof, in numerical experiments the above solution is always unique and corresponds to a maximum.

B. CPSD link: numerical results
We analyze a CPSD link, with the same parameters as in [1]: M = 287 spans with attenuation A = 9.5dB, in the typical range of submarine links.
For an EDFA length = 6.27m, and pump powers Fig. 3(a) shows AIR vs. x for three different power allocation policies: OPT (solid), CIP (dashdotted), CSNR (dashed). We observe that AIR versus x grows identically for all three policies up to a maximum at inversion x o roughly equal to ∼ = 0.63 at all pumps ≥ 30mW. Beyond the maximum the OPT policy has larger and larger AIR w.r.t. the other policies (which behave almost the same). Reason of the existence of an optimum inversion is the bandwidth-SNR trade-off at play for the AIR (7) when increasing inversion. To highlight this trade-off, Fig. 3(b) shows both bandwidth B and RX SNR (averaged over the WDM channels) versus inversion x. Bandwidth vs. x is seen to increase monotonically independently of the allocation, as discussed in Appendix 1. The optimal x o turns out to be very close to the "knee" of the B vs. x curve (circle in Fig. 3(b)). As explained in Appendix 1, such a "knee" occurs when the EDFA gain trough at 1538nm is tangent from above to the A(dB) horizontal line. SNR instead has normally a decreasing behavior versus x. Reason is that, at fixed pump, increasing x implies more gain and thus smaller feasible WDM signals, which implies (normally) a decreasing SNR, even though noise figure (hence ASE) also decreases as x increases. We see that the SNR curves for CIP and CSNR allocations are monotonically decreasing and almost coincident at all x, while the OPT SNR coincides with that of the other allocations up to and a little beyond x 0 (which justifies the common AIR growth for all policies at small x), and then OPT starts to have a much larger SNR than the sub-optimal policies, and thus much larger AIR.
The key observation from 3(a) is that at submarine-typical span loss (e.g. A = 9.5dB) and working at the optimal inversion, where SNR is still "sufficiently" large, the OPT AIR is almost the same as that of the other policies, a fact already known for high-SNR channels where classical waterfilling is optimal [17]. The consequence is that, if pump is large enough and the line inversion is optimally chosen, there is little scope for allocation optimization, and the most practical allocation should be selected [7].
For the CIP allocation, Fig. 4 reports the same AIR values as in Fig. 3(a), but now plotted vs. the total input power N c P c for flat CIP allocation, which depends on inversion as per eq. (14). This figure is reminiscent of similar curves with Kerr nonlinearity [19, Fig. 10], although here the maximum is due to the above-described SNR-bandwidth trade-off. At the optimal inversion x o = 0.63 and at pump P p = 60mW, Fig. 5(a) plots both the EDFA gain (blue solid) and the EDFA+GSF gain (symbols, whose details are given in the inset) versus wavelength for OPT ('o'), CIP ('+'), CSNR ('*'). We see that the equalized gain profiles vs. wavelength (hence the net span gain (i.e., droop) profiles) for the three allocation policies differ by less than 0.006 dB. The only truly flat one is that of CSNR, and the χ k profile of CIP is the inverse of the ASE power, as already noted in (12), (15). All three droop profiles are thus practically flat and the GSF has essentially the profile of the inverse EDFA gain, as per (1). Given the manufacturing tolerances of the order of a fraction of dB, it is not possible to precisely engineer the three (ideally different) GSFs for the three policies, which is the main reason of the insertion of extra flattening filters placed after a block of spans, whose main task is to restore the TX input PSD. Fig. 5(b) plots the PSD at each EDFA input versus wavelength for the OPT, CSNR, and CIP allocations. We note that the input PSD for CSNR has the same wavelength profile as the ASE noise, the CIP allocation is flat, while the OPT PSD is shaped as the inverse of the EDFA gain, as noted below (18).
Finally, for M = 287 spans and A = 9.5dB, Fig. 6 shows (crosses) the x-maximized AIR of the CPSD link with OPT allocation versus pump power. Here the EDFA length was optimized at each pump, as explained in the next section. The remaining curves, to be derived later, refer to the CG link in [1]. The figure anticipates the slight AIR superiority (at equal EDFA pump) of the optimized CPSD link over the CG link, mostly visible at the lowest pumps below 50mW, i.e., those of most interest for SDM submarine links [18].
1) Optimal inversion and EDFA length: Now the question is: what is the optimal EDFA inversion-length pair (x o , o )?
We already observed that, at large-enough pump and at typical submarine span attenuations, maximum AIR is normally achieved at an optimal inversion close to the knee of the B vs. x curve, where the gain trough at λ T =1538 nm is tangent from above to the attenuation line and bandwidth is a compact interval. Let α T , g T be the absorption and emission coefficients at λ T . Imposing G(λ T ) = A and using (Appendix 1): G T = exp( ((α T + g * T )x − α T )) yields the following relation between x o , o : Fig. 7(a) shows AIR vs. inversion x at various EDFA lengths , where we see that = 5.41m yields the largest top AIR (hence the value 6.27m reported in [1] and used above is close to optimum). Fig. 7(b) reports as circles the optimal pairs (x o , ) from each in left figure, and shows in the insets the corresponding EDFA gain vs. wavelength. Solid line is the theoretical curve (19). The three circles right on top of theory are for lengths = [4.41, 5.41, 6.41]m which show a cuspid at the top of the AIR vs x curve and have tangency of the gain trough at 1538nm with the attenuation line. The two edge circles refer to lengths = [3.41, 7.41]m at which tangency is not exact, so that circles are slightly off theory and the cuspid is not visible in the AIR curve.
From the bottom inset in Fig. 7(b) we find that the gain averaged over wavelengths is G = [10.3, 10.6, 11.7] at lengths = [4.41, 5.41, 6.41]m. Hence among the lengths that cause trough tangency, we see that the AIR-maximizing length = 5.41m (blue) is not exactly (but close to) the one with the smallest G, i.e., the one closest to A, hence the one that minimizes the photon flux wasted by the GSF, i.e., the most pump-photon efficient.
We next look at the performance of a M =63-span link with a typical terrestrial attenuation A = 20dB, which corresponds to roughly a link of half the length of the 287span link with 9.5 dB attenuation. Fig. 3(a), Fig. 8 shows AIR (Tb/s) vs. x at various pumps, but now for A = 20dB and the sub-figures refer to two different EDFA lengths = 6.27m and = 9.27m. We see that compared to the A = 9.5dB case, at A = 20dB a wider gap between OPT and the other allocations is obtained even at the optimal inversion, especially at the lowest pumps. At EDFA = 9.27m the gap is minimum since that length is optimal. At = 6.27m the inversion needs to be sharply increased right into the range where OPT is much more efficient than the other allocations, and the gap is thus maximum.

As in
2) Optimal span length: To straddle a certain total distance, we can either use a few widely spaced high-pump amplifiers, or many narrowly-spaced low-pump amplifiers, and the question is: what is the best number of amplifiers to minimize the required total line pump power? For concreteness, we analyze again our 14350 = 7 · 41 · 5 · 5 · 2 km CPSD link and look for the best among the following 6 combinations: 1) M=287*2=574 spans of 25km each Using our reduced-complexity extended-Saleh EDFA model, instead of the full EDFA model used in [1], the search is much faster. Fig. 9 shows the largest AIR (Tb/s) (obtained through EDFA length optimization) versus total link pump power M P p (W) for the 6 above CPSD systems (as shown in [1], nonlinearities are negligible below ∼ 287 · 100mW 30 W, i.e. over the shown total pump range). AIR is shown for OPT allocation, but is almost identical for both CSNR and CIP allocations. We see that the AIR curve initially improves as the span length decreases (because ASE decreases), but after an optimum span length the AIR curves worsen again because too many amplifiers consume too much of the available total pump power. The 50km/span link (A = 9.5dB) yields the largest AIR per pump watt and is thus the most energy-efficient, with the 41km being just slightly inferior (this is consistent with results in [1]). Clearly, not only power efficiency but also cost must be considered in the final link design [3], [12].

IV. CONSTANT-GAIN (CG) LINK
We next attack the more difficult case considered in [1]. Here the EDFA+GFF amplifier has a flat gain that exactly equals the attenuation over all channels in the bandwidth set B, hence the flattened amplifier (signal) gain is constant at all amplifiers in the link (CG operation). Thus the net span gain (i.e., the droop) is unity at all channels. Hence every span has the equivalent block diagram in Fig. 1(c) with χ = 1. Thus in the CG scenario the RX signal flux after M spans at the j-th frequency bin is Q SIG,RX j = Q j , the same as the transmitted one, while the RX ASE flux is Q ASE,RX j = A M k=1 F j (x k )∆f , so the ASE flux increases down the line. Thus EDFAs down the line see increasing input fluxes at the same pump, hence their inversions decrease: Since bandwidth is monotonically increasing with inversion, then the EDFA bandwidth decreases down the line and thus the bandwidth of the whole line is set by that of the last EDFA: Now, the inversions down the line are a function of the chosen input signal fluxes {Q j } and of the inversion x 1 of the first EDFA (which is taken as a known quantity). TX fluxes Q = {Q j } Nc j=1 must be feasible for the given pump Q p and inversion x 1 , i.e., they must satisfy Saleh equation (8) at the first EDFA. The inversions x k of the subsequent EDFAs are recursively found from the known x 1 and the Q vector by solving the Saleh equation at the k-th EDFA, for all k = 2, 3, ..., M . Fluxes Q depend on the flux allocation strategy, as detailed later.
The RX SNR Q SIG,RX j /Q ASE,RX j at frequency j is (20) and the AIR per (2-polar.) spatial mode at fixed Q p and inversion x ≡ x 1 of the first EDFA is still given by eq.
The flux vector Q to be used in each EDFA Saleh equations is specified depending on the allocation strategy, as detailed next.

A. Gain-shaped Waterfilling allocation (GW)
Define the inversion vector as x = {x 1 , x 2 , ..., x M }, and from (20) define the noise vector as for j = 1, ..., N c (x M ). Then the optimal Q maximizes the AIR subject to the Saleh constraint (8) at the first EDFA. The problem differs from that of CPSD links in two respects: 1) we have an SNR where the noise flux N j depends on the inversions only. Alas, the inversions in turn depend on Q through Saleh balance equation at each EDFA; 2) the total link bandwidth B(x M ) and hence N c (x M ) are not independent of the actual fluxes Q as it was in the CPSD case.
Finding the exact AIR-maximizing Q (as we did in the CPSD case) is ruled out, since the feedback dependence of the inversions {x 2 , ..., x M } and N c (x M ) on the fluxes Q is too complicated to be analytically treated. However, if we ignore the dependence of the inversions on Q, we can find a simple analytical solution as described next, which turns out to be "quasi-optimal".
When the inversion vector x is fixed, and so is N c (x M ), Appendix 2 derives the optimal fluxes Q Q GW (x) that maximize AIR (22) subject to constraint (8), whose entries j = 1, .., N c are where θ/(G j − 1) is the inverse-gain-shaped water-level, and theta a parameter to be found from Saleh equation at EDFA 1 (30). We call this flux allocation the (inverse-)Gainshaped waterfilling (GW), in analogy to the well-known classical water-filling (CW) where the water-level is flat. Fig. 10 gives an example of both GW and classical water-filling (CW) signal power allocations for a CG line, M = 287 span, A = 9.5dB, and EDFAs with length = 6.27m and pump P p = 60mW, working at inversion x 1 = 0.75. The solid curve is the non-flat water-level hf j θ/(G j (x 1 ) − 1), the dark shaded level corresponds to the noise power hf N j (mW), and the GW signal power S j = hf j Q j (light shaded) is allocated between the noise level and the water-level curve, as per (23). For CW instead the water-level is the flat dashed line.
The inversion x is next iteratively found by algorithm 1. By exhaustive search over the feasible input simplex, we verified that the sub-optimal GW practically coincides with the optimal allocation at all but the largest inversions. GW will be validated also by comparison with the results in [1]. In Appendix 2 we also consider the classical water-filling allocation (CW) with a flat water level. The inversions vector x in the CW case is found by the same iterative algorithm as above, when updating Q (n) = Q CW (x (n−1) ) by using CW fluxes (31).
For CG links, we next outline the two sub-optimal fluxallocation strategies CSNR and CIP already considered in the CPSD case.

B. CSNR allocation
To equalize the SNR (20), the ratio Qj M k=1 Fj (x k ) must be equal for all j, i.e., Q j ≡ θs j ∝ M k=1 F j (x k ) using the power split s j Q j /θ (θ is here the total TX signal flux 2 ) and since s is a PMF we then must have for all which explicitly depends on the inversion vector x. These are the splits used in EDFA-1 Saleh equation (8) that allow us to get the total flux θ from (9). So we now have the CSNR flux vector Q CSN R (x) for given x. The inversions vector x is found by the same iterative algorithm as above, when updating Q (n) = Q CSN R (x (n−1) ) . At convergence, we get the SN R RX from (20) and the AIR from (7).

C. CIP allocation
As in the CPSD case, the CIP split vector entries {s j } are obtained from (13) and they do not depend on inversions x. With these, the total flux θ is obtained from (9) and only depends on the known x 1 . So we have directly the CIP flux vector Q CIP (x 1 ). The rest of the inversions vector x is found by just one run of step 2 of the standard iterative algorithm. We finally get the SN R RX from (20) and the AIR from (7).

D. CG link: numerical results
Very similar numerical results to those obtained for the CPSD links are obtained for CG links, and we next present them. However note that the shaping filters in the CG link need to be matched to the corresponding inversion and are thus all different. We present it for sanity checks with the seminal work in [1]. Fig. 11(a) shows the achievable information rate AIR (Tb/s) (solid: GW; dash-dot: CW; dash: CSNR; dot: CIP) versus inversion x at various pump powers from 15 to 180mW. We see that all allocation policies yield essentially the same AIR (differences below 2%) up to a little beyond the optimal inversion x 1 , which occurs close to the "knee" of the corresponding B vs. x 1 curve, cfr. Fig. 13(b). The global AIR vs x 1 behavior is quite similar to that of the corresponding CPSD link, shown in Fig. 11(b) (same curves as Fig. 3(a) with added GW curve).
We note that the CPSD link has always a slightly larger AIR than the CG link at same pump, and the differences become more noticeable at lower pumps. Also note that for the CPSD link we added the GW allocation (solid) as described in Appendix 4, and GW turns out to be coincident with the optimal allocation (solid with circles) up to large inversions.
For a global comparison between optimized CPSD and CG links, please refer to Fig. 6, which shows (circles) the x 1maximized AIR of the CG link with GW allocation versus pump power. Here the EDFA length was optimized at each pump and for each allocation. The solid lines are the AIR results for the same link from [1, Fig.4(a)], both without (red) and with nonlinear effects (blue), which shows that at P p below 100mW nonlinear effects are negligible [1]. With only ASE, we see that our GW AIR is practically coinciding with the optimal values found in [1].
For the same link, Fig. 12 shows the signal powers at every EDFA input at various pumps and at the optimal inversions, where symbols are our CG GW values and solid curves are the results in [1, Fig. 2a]. We see that we are able to get similar curves. Note that our bandwidth calculation considers all bins with EDFA gain above attenuation, without any smoothing of the bandwidth edge bins, while the work in [1], in order to perform particle swarm maximization, introduces artificial smoothing, which is evident at the bandwidth edges in Fig. 12.

V. SUMMARY AND CONCLUSIONS
We started our exploration of this topic when trying to reproduce the fully-numerical CG link results in the seminal paper [1]. Key to our novel approach was the use of the Saleh EDFA model enriched with ASE self-saturation (Appendix 1), and the understanding that knowledge of the state variable of the first EDFA [10] and thus of the entire link allows breaking the whole optimization problem into a series of much simpler optimizations, amenable to analytical solution. We thus discovered the GW allocation (Appendix 2) as the quasi-optimal allocation in CG links [15], which is a new twist in optical communications of the classical capacity-achieving water-filling allocation in additive Gaussian noise channels [13]. Today's submarine links, however, are not CG links, but rather CPSD links with CSNR or CIP input WDM PSDs. The PSD is preserved along the line by (nominally) identical GSFs at all EDFAs. Their role is not exactly gain flattening, but rather PSD enforcing. In this paper we explored variations of the CPSD link with non-flat input PSDs. Our analytical approach to the study of CPSD links is an extension of the GD model [14] to the case where the amplifier channel gains are not all equal. Using the GD model we discovered the optimal PSD, eq. (18), that maximizes capacity (more correctly the AIR, since we introduce a "gap to capacity" Γ), whose analytical formulation formally does not resemble the GW allocation. However, we proved that the GW allocation in CPSD links provides essentially the same AIR as the optimal allocation, up to almost the largest inversions (Cfr. Fig. 11(b)). The same thing happens in CG links.
The main findings from this study are: 1) optimized GC and CPSD links behave roughly the same, with a slightly larger optimal AIR for CPSD links (Cfr Fig. 6); 2) with pumps ≥30mW and typical submarine span attenuations in the order of 10dB, sensible allocations such as the flat one (CIP) or the SNR-equalizing allocation (CSNR) achieve AIR values to within a few percent of the optimal AIR (Cfr Fig. 11). There is thus no point in using non-flat allocations [7]. However, when span losses are increased to 20dB (typical of terrestrial systems), the optimal allocation provides sizable AIR improvements, especially at lower pumps or when EDFA length is sub-optimal (Cfr Fig. 8); 3) at submarine span losses, the optimal EDFA inversion (for absorption and emission coefficients as in [1, Fig. 7]) is very close to the one that makes the EDFA gain at 1538nm equal to the span attenuation (Cfr. Fig. 7); 4) the fixed gain shaping filters to implement CPSD for different PSD allocations differ from the inverse EDFA gain profile by a fraction of a dB (Cfr Fig. 5). Since it is unfeasible to implement them with such tolerances, then periodically spaced adjustable gain/PSD equalization is needed along the link; 5) power-efficiency optimal span attenuations are in the range 8-9.5 dB, corresponding to span lengths 40-50km (Cfr. Fig. 9), in agreement with [1], [3].
An interesting question is how to possibly estimate the inversion x of a black-box commercial amplifier just from input/output measurements, and thus reconnect that estimate to the picture presented in this paper. Some recent black-box EDFA models point in that direction [20], [21], but the problem remains open. the fluorescence r τ , with τ the fluorescence time, iii) the forward and backward ASE flux Q F +B ASE , which causes ASE self-saturation, and is analytically included as [11]: with n sp,j = g * j x (g * j +αj )x−αj the spontaneous emission factor [22], and ∆f the ASE resolution bandwidth over bins j = 1, ..., L tiling the significant wavelength range. The first multiplier 2 gives equal strength to forward and backward ASE, since (28) is obtained in the assumption of a spatially uniform inversion at value x [11].
EDFA Noise Figure : The EDFA noise figure is defined for all frequency bins j as [23] F j = 2n sp,j Gj −1 Gj . The equivalent-input (forward) ASE flux to the EDFA at λ j is F j ∆ν.
Amplifier Bandwidth: In Fig. 13(a) we report the Gain versus wavelength at various inversions, where we also show a horizontal line at level A =9.5dB, i.e., the span attenuation considered in [1]. The gain-flattened amplifiers in the main text consist of an EDFA followed by an ideal gain flattening filter that slices off all gain in excess of the span attenuation A and blocks all wavelengths whose EDFA gain is below A. N c WDM signals of bandwidth ∆f and spacing ∆f completely occupy the available amplifier bandwidth B. Such a bandwidth (THz) is plotted in Fig. 13(b) versus inversion x for attenuation A both 9.5 and 20dB. For A = 9.5dB, we note that B is zero at all inversions below the cutoff value x = 0.585 which is the largest x for which the gain curve is fully below A (see the thick purple curve in Fig. 13(a)). Above cutoff, B increases vs. x initially fast, and then more slowly after a knee (at x = 0.63 for A = 9.5dB), which from Fig. 13(a) (thick blue curve) corresponds the smallest x for which the EDFA gain trough at 1538 nm equals A, so that the bandwidth comes from a compact wavelength set, instead of disjoint segments as at lower inversions. Here is why the slope of B vs. x is large at the knee and becomes abruptly small after the knee. The bandwidth change dB due to a change dx can be expressed as dB = I i=1 df i , where df i is the change in bandwidth at the ith gain intersection point f i with the attenuation line, i = 1, .., I. By multiplying and dividing by the induced gain change dG dB i = 10 log 10 (e) (α j + g * j )dx, we write dB = I i=1 10 log 10 (e) (α i + g * i )/ dG dB i dfi . Whenever the derivative dG dB i /df i is small (as at 1538nm and x = 0.63), then dB is large. Conversely when the G dB vs. f curve is steep at the intersection points, then dB is small, as for all x above the knee.  (21) are fixed, and λ > 0 is the Lagrange multiplier. We now set the derivative of the Lagrangian w.r.t. all Q k , k = 1, ..., N c , equal to zero:

Now,
∂Qj ∂Q k = δ jk is a Kronecker delta, i.e., 1 if j = k and zero else. Thus we get: 0 = 2∆f ln (2) 1 N k +Q k − λ(G k − 1),hence (N k + Q k )(G k − 1) = θ, with constant θ 2∆f ln(2)λ , and thus get the signal fluxes for all k = 1, .., N c as: Q k = θ G k −1 − N k , with a k−dependent inverse-gain-shaped water-level θ k θ G k −1 equal to the sum of signal and noise fluxes on frequency bin k. The Lagrange optimization fails when some of the signal fluxes are negative. In this case we must include the flux non-negativity in the constraints and solve the resulting Karush-Kuhn-Tucker (KKT) problem [25]. As in classical water-filling, we find that the correct KKT solution for all k = 1, ..., N c is [13, Sec. 9.4]: where x + max(x, 0). The expression for the water-level parameter θ is found by forcing compliance with the Saleh constraint (8) at EDFA 1: Contrary to the classical water-filling for the parallel additive Gaussian channel, the water-level θ k θ G k −1 is now k-dependent, and water-filling is not in the powers but in the photon fluxes. In solving the above nonlinear equation in θ, as a starting guess we may choose the lower bound θ LB = AK(Qp,x1)+ Nc k=1 N k (G k −1) N ch ≤ θ that we get in absence of (.) + .

Classical Water-filling (CW)
If instead of condition (29) we impose then we obtain the classical water-filling allocation (although using fluxes instead of power) with flat water level θ, obtained by solving Saleh equation at EDFA 1: As a starting guess we may use the lower-bound value θ LB = AK(Qp,x1)+ Nc k=1 N k (G k −1) Nc k=1 (G k −1) .

APPENDIX 3: GW ALLOCATION IN CPSD LINK
We can also impose allocations such as GW and CW (See Appendix 2) in the CPSD scenario. Here we explain GW.
The RX SNR at every WDM frequency f j is ΓSN R j Q j /N j , where from (7) the TX-equivalent ASE noise is and lim Qj →0 N j (x, Q j ) = ∞. Using this expression in the calculations in Appendix 2 we get that the GW allocated fluxes correspond to equation (29), where the "waterlevel" parameter θ satisfies eq. (30). Such last equation is nonlinear in the unknown flux vector Q. To solve for Q , we can use the following recursion. If we have guesses Q i−1 and θ i−1 at epoch i = 1, 2, ...., then from (29) we update all entries k = 1, .., N ch of Q i as: Q k,i = ( θi−1 G k (x)−1 − N k (x, Q k,i−1 )) + , then update N k (x, Q k,i ) per (33), and finally update θ i as the solution of: Nc(x) k=1 (θ i − N k (x, Q k,i )(G k (x) − 1)) + = AK(Q p , x 1 ), as per (30). In our Matlab software we use as starting guess Q 0 the CIP solution (11) at the same (Q p , x) values, and use as a starting guess: is the number of channels where N k is finite.