Evaluation of Irradiance Variability Adjustments for Subhourly Clipping Correction

Subhourly changes in solar irradiance can lead to energy models being biased high if realistic distributions of irradiance values are not reflected in the resource data and model. This is particularly true in solar facility designs with high inverter loading ratios (ILRs). When resource data of sufficient temporal and spatial resolution are not available for a site, synthetic variability can be added to the available data in an attempt to address this issue. In this work, we demonstrate the use of anonymized commercial resource datasets with synthetic variability and compare results with previous estimates of model bias due to inverter clipping and increasing ILR.


I. INTRODUCTION
In previous work [1], we showed that typical photovoltaic (PV) system performance modeling workflows based on hourly resource datasets suffer from an overprediction bias that increases with increasing inverter loading ratio (ILR). This ILR-dependent model bias was shown to be consistent with the bias expected from subhourly clipping, a normal behavior of real PV systems that hourly simulations are inherently incapable of modeling directly (although adjustment-based workarounds have been explored [2]-[5]). Direct simulation of this effect is possible with the combination of high-frequency resource datasets and performance modeling tools capable of making use of them. Subhourly simulation tools have been available for some time in the form of various open-source [6], [7] and proprietary software packages, but high-frequency resource datasets have historically been limited to the few locations with ground-based resource monitoring stations.
In more recent years, enabled by advancements in weather satellite imagery, the "native" time resolution of many satellite-based resource datasets has improved to 5-minute intervals [8]. While this represents a significant improvement, it does not offer a complete solution for direct modeling of subhourly clipping. One problem is that even intervals as short as 5 minutes do not fully capture the fastest components of real-world irradiance fluctuation, which in many climates require 1-minute or even sub-minute data to represent accurately. Another limitation is that the current models for producing satellite-based irradiance datasets do not account for complex secondary phenomena such as cloud edge enhancement and therefore underestimate the upper tail of the distribution of irradiance data [1].
To address these issues, statistical or stochastic adjustments are sometimes applied after the primary irradiance modeling process to generate plausible irradiance signals with higher time resolution and more desirable variability characteristics. Existing methods include the use of Markov chains with transition probability tables [9], implemented in the modeling software PV*SOL [10] with resource data from Meteonorm [11]. Markov chains are also used in a subhourly resource data product from Solargis [12]. Random noise based on regional parameters adds variability and clear-sky exceedance to a SolarAnywhere product [13], [14].
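To illustrate the general idea of the Markov-chain approach described in [9] (this is a toy sketch, not any vendor's actual implementation; the state bins, transition matrix, and clear-sky value below are all invented for illustration), a downscaler can sample 1-minute clear-sky-index states from a transition probability table:

```python
import numpy as np

# Toy Markov-chain downscaler: synthesize a 1-minute clear-sky index (kt)
# trace from a transition probability table. The 3-state table below
# (clear / partly cloudy / overcast) is illustrative only; real products
# fit such tables per climate and per hourly condition.
rng = np.random.default_rng(42)

states = np.array([1.0, 0.6, 0.2])        # representative kt per state
P = np.array([[0.90, 0.08, 0.02],          # transition probabilities;
              [0.10, 0.80, 0.10],          # each row sums to 1
              [0.02, 0.08, 0.90]])

def downscale_hour(start_state: int, n_steps: int = 60) -> np.ndarray:
    """Sample a 1-minute clear-sky-index trace for one hour."""
    s = start_state
    trace = np.empty(n_steps)
    for i in range(n_steps):
        trace[i] = states[s]
        s = rng.choice(3, p=P[s])          # draw the next state
    return trace

kt_1min = downscale_hour(start_state=0)    # start in the "clear" state
ghi_clear = 800.0                          # assumed clear-sky GHI, W/m^2
ghi_1min = kt_1min * ghi_clear             # synthetic 1-minute GHI
```

The synthetic trace preserves the hourly mean condition approximately while introducing minute-scale fluctuation of the kind the primary irradiance model cannot produce.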
However, it remains to be seen whether these variability adjustments successfully address the limitations mentioned previously. In this work, we evaluate the suitability of five such datasets for the purpose of directly simulating the effect of subhourly clipping.

II. METHODS
To evaluate some of these enhanced resource datasets with synthetic variability adjustments, we use the six PV systems from [1], with specifications shown in Table I. We modeled the plants with NREL SAM, version 2022.11.21 [6], matching plant specifications as closely as possible with the models and equipment available in SAM, using default options otherwise, and using the Perez transposition model with DNI and GHI as inputs. For each site we used five commercially available enhanced-resolution datasets. In each case, we first modeled a year of plant operation using the enhanced resource dataset at its native time resolution (e.g., 1 or 5 minutes). Next, we averaged the resource dataset to 60-minute intervals and modeled it again. Finally, we compared the two resulting annual energy values. This is analogous to the "clip, then average" and "average, then clip" bias described in [15]. We refer to this as simulation bias:

bias_sim = (E_sim,AtC - E_sim,CtA) / E_sim,CtA

where E_sim,AtC is the total energy modeled in SAM using resource data that was first averaged to 60-minute intervals and E_sim,CtA is the energy from the same model using the resource data at its native resolution. We then increased the number of strings behind the inverter in the model, and in some cases modified the inverter model, to evaluate ILRs of 1.2 to 2.0 in increments of 0.1.
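The averaging step and the bias computation can be sketched as follows (a minimal numpy illustration; the irradiance series and annual energy values are placeholders, since the actual simulations were run in SAM):

```python
import numpy as np

# Average a native 1-minute irradiance series to 60-minute intervals
# (the "AtC" input), keeping the native series for the "CtA" run.
ghi_native = np.random.default_rng(0).uniform(0, 1000, size=24 * 60)  # 1 day, W/m^2
ghi_hourly = ghi_native.reshape(-1, 60).mean(axis=1)   # 24 hourly means

def simulation_bias(e_sim_atc: float, e_sim_cta: float) -> float:
    """Relative bias of the hourly-averaged run vs. the native-resolution run."""
    return (e_sim_atc - e_sim_cta) / e_sim_cta

# Placeholder annual energies (MWh) from the two SAM runs:
print(simulation_bias(e_sim_atc=2050.0, e_sim_cta=2000.0))  # -> 0.025
```

A positive value indicates that the hourly-averaged resource data produce more modeled energy than the native-resolution data, i.e., the hourly workflow is biased high.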
We note that the enhanced resource datasets we used are not from the same calendar year as the previous analyses; however, based on [18], we expect interannual variability in subhourly clipping error to be small relative to the magnitude of the error.
As a reference against which to compare the simulation bias, we calculated an empirical bias using real power measurements from an inverter. We used the same methods as in [1], except that we did not apply rescaling coefficients in the current work, because the mismatched calendar years between the measured power and the enhanced resource datasets make tuning to the modeled output impractical. The empirical approach of using measured power from low-ILR systems lets us estimate subhourly clipping bias as it varies with ILR, without risk of model error and while controlling for effects other than ILR. Empirical subhourly clipping bias is calculated similarly to simulation bias, using the "average, then clip" vs. "clip, then average" method:

bias_emp = (E_meas,AtC - E_meas,CtA) / E_meas,CtA

where E_meas,AtC is the total energy calculated by averaging measured data to 60-minute intervals and then artificially clipping (analogous to what a conventional hourly simulation model would calculate) and E_meas,CtA is the energy calculated by first clipping at 1-minute intervals and then averaging to 60 minutes (analogous to what a real system does). For each system, this bias is evaluated at ILRs of 1.2 through 2.0 in increments of 0.1.
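The empirical calculation can be sketched as follows (a hedged numpy illustration on a synthetic per-unit power series; the clipping threshold stands in for the inverter AC rating implied by a given ILR, and the values are invented):

```python
import numpy as np

# Empirical clipping bias: "average, then clip" vs. "clip, then average"
# on a synthetic 1-minute AC-side power series (per-unit, illustrative).
rng = np.random.default_rng(1)
p_1min = rng.uniform(0.0, 1.2, size=6 * 60)    # 6 h of 1-minute power
p_limit = 1.0                                   # inverter clipping threshold

# Average to 60 minutes, then clip (what an hourly model effectively does):
e_atc = np.minimum(p_1min.reshape(-1, 60).mean(axis=1), p_limit).sum()

# Clip each minute, then average (what the real inverter does):
e_cta = np.minimum(p_1min, p_limit).reshape(-1, 60).mean(axis=1).sum()

empirical_bias = (e_atc - e_cta) / e_cta        # >= 0, since clipping is concave
```

Because the clipping function is concave, averaging before clipping can only preserve or increase energy relative to clipping before averaging, so the bias is nonnegative by construction.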

III. RESULTS
Simulated bias results for five datasets (V1-V5) are shown in Fig. 1, along with empirical biases calculated in the same way as in [1]. Tabular data are given in Table II. For V1, across all sites except NREL, simulated bias is higher than empirical bias at lower ILR values and lower than empirical bias at higher ILR values. For V2, simulated bias is very low across all sites and ILR values, and V2 bias is lower than V1 bias for all sites and ILR values, with one exception (the NIST site at 2.0 ILR, where the two are approximately equal).
For V3-V5 there is a broad range of results: V3 is significantly higher than the other datasets and the empirical bias, V4 is similar to V2, and V5 is generally higher than V2 but lower than the empirical bias.

IV. CONCLUSION
Our analysis of five enhanced resource datasets across five PV systems indicates a broad range of results for subhourly model bias, with some trends evident. All of the tested subhourly datasets, modeled at all ILRs, yield lower energy than the hourly model, which is the expected general behavior. However, most datasets tend to underestimate subhourly losses and the resulting bias at higher ILRs. Exceptions were V3, which notably overestimated bias at all ILRs and most sites, and V1, which tended to slightly overestimate bias at ILRs up to 1.3-1.5, both with the exception of the NREL site. Models using V2 show bias similar to the empirical bias at low ILR for most projects but significantly underpredict bias at high ILR. Models using V4 underpredict bias at all ILRs greater than 1.2-1.3. Models using V5 also underpredict bias for all but the lowest ILRs but are generally closer to the empirical bias than V4.
Regarding the sites analyzed, NREL was an exception in a number of areas. As previously noted in [1], the NREL site had unexplained performance issues, which could account for some differences there. However, because its empirical and simulated biases are approximately equal to or lower than those of the other sites, the higher Solar Variability Zone (SVZ) [17] classification does not necessarily indicate higher bias in hourly modeling.
Based on the sites and datasets evaluated here, the authors recommend that users seek validation of synthetic variability datasets. Ideal validation would include multiple relevant sites and ILRs. In support of validating at multiple sites, for example, V5 matches the empirical data well at the NREL site but appears to overestimate significantly at the other sites. In support of validating at multiple ILRs, V1 and V5 appear to match well for several sites at ILRs of 1.4-1.5, but they underestimate at high ILRs and often overestimate at low ILRs.
Regarding the applicability of this work, we recognize that some datasets may be intended for purposes such as ramp-rate analysis for battery sizing and grid integration studies, which we did not examine here. Specific to subhourly model bias, we note that inverter-level effects are the primary focus here; phenomena like plant-level clipping with inverter AC overbuilds are a separate consideration. Future work could include comparing enhanced resource datasets with comparable standard datasets from the same source (rather than averaging to hourly intervals ourselves) and exploring the reasons for the different bias results among the datasets. We also expect the vendor algorithms to evolve significantly over the next few years, such that reevaluation will be needed.

Fig. 1. Comparison of the ILR-dependent simulated and empirical biases for five sites, using between two and five enhanced resource datasets with synthetic variability adjustments, anonymously denoted V1-V5. See Table II for tabular data.

TABLE II. Tabular bias results. See Fig. 1 for a graphical version.