S1 Text.
Supporting information text. This file includes: (1) Query terms used for Google Trends, in Table A; (2) Characteristics of each of the 20 chosen cities, in Table B; (3) Additional information and illustrations of the binary prediction task and of improvements using historical seasonality, in Table C and Figs A, B and C; (4) Performance of the baseline approach, a linear regression that uses only the most recently available data points as features, in Table D; (5) Violin plots of performance across all cities, broken down by base model (LASSO or Random Forest), feature set (AR, GT, AR+GT, AR+GT+W), and delay in the receipt of epidemiological information, in Figs D1 and D2; (6) Appendix: Plots of comparative LASSO and Random Forest performance for all 20 cities, across all feature sets and AR lags.
(ZIP)