figshare

Code and data for reproducing the results in the original paper of DML-Geo

Version 2 2025-12-03, 15:03
Version 1 2025-12-03, 07:51
online resource
posted on 2025-12-03, 15:03 authored by Pengfei CHEN
<p dir="ltr">This asset provides all the code and data needed to reproduce the results (figures and statistics) in the original DML-Geo paper.</p>
<h2>Main Files:</h2>
<p dir="ltr"><b>main.ipynb</b>: The main notebook, which generates all the figures and data presented in the paper.</p>
<p dir="ltr"><b>data_generator.py</b>: Generates synthetic datasets for validating the performance of the different models.</p>
<p dir="ltr"><b>dml_models.py</b>: Implementations of the Double Machine Learning (DML) variants used in this study.</p>
<p dir="ltr"><b>ridge_gwr.py</b>: Implementation of a modified Geographically Weighted Regression (GWR) with ridge regression.</p>
<p dir="ltr"><b>ridge_sel_bw.py</b>: Implementation of a modified bandwidth selector for GWR with ridge regression.</p>
<p dir="ltr"><b>scenario_manager.py</b>: Functions for creating simulation scenarios.</p>
<p dir="ltr"><b>utility.py</b>: Functions for testing spatial causal effects with different models, plus placebo tests for inference.</p>
<h2>`data` folder:</h2>
<p dir="ltr"><b>grf_example.csv</b>: Raw data for dataset 1, used to compare GeoShapley, Causal Forest and DML-Geo.</p>
<p dir="ltr"><b>rslt.pkl</b>: A pickled Python object storing the geoshapley-based explainer for dataset 1.</p>
<p dir="ltr"><b>seattle_raw_filtered.csv</b>: Processed data for the 'house price dataset' in King County mentioned in the study.</p>
<p dir="ltr"><b>Depression.csv</b>: Raw data for the 'mental health dataset' in South Carolina mentioned in the study.</p>
<p dir="ltr"><b>simulation_res.csv</b>: Results of the simulation experiments across 118 scenarios.</p>
<p dir="ltr"><b>SouthCarolina.geojson</b>: Geospatial boundary data for South Carolina.</p>
<p dir="ltr"><b>King_council_legdst_area.geojson</b>: Geospatial boundary data for King County.</p>
<h2>`output` folder:</h2>
<p dir="ltr">This folder stores the result files generated by running the models.</p>
<p dir="ltr"><b>res_depression/</b>: Contains results from DML models (e.g., DML-CATE, DML-GAM, DML-Geo, DML-GWR) run on the Depression.csv dataset.</p>
<p dir="ltr"><b>res_seattle_raw/</b>: Contains results from DML models (e.g., DML-CATE, DML-GAM, DML-Geo, DML-GWR) run on the seattle_raw_filtered.csv dataset.</p>
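Before opening main.ipynb, it can help to confirm that the downloaded `data` folder is complete. The sketch below is not part of the asset. It uses only the Python standard library, and the helper names `load_asset` and `inventory` are hypothetical. It assumes the files sit in a local `data` directory under the names listed in the description and that the CSV files are UTF-8 encoded. Unpickling rslt.pkl will most likely also require the geoshapley package to be installed, which is an assumption rather than something the description states.

```python
import csv
import json
import pickle
from pathlib import Path


def load_asset(path):
    """Load one file, choosing a reader from its extension (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Return rows as a list of dicts keyed by the header row.
        # "utf-8-sig" also tolerates a leading byte-order mark.
        with path.open(newline="", encoding="utf-8-sig") as fh:
            return list(csv.DictReader(fh))
    if suffix in {".geojson", ".json"}:
        # GeoJSON is plain JSON, so the json module can parse it.
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)
    if suffix == ".pkl":
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with path.open("rb") as fh:
            return pickle.load(fh)
    raise ValueError(f"unsupported file type: {path.name}")


def inventory(data_dir, expected):
    """Map each expected file name to True/False depending on whether it exists."""
    data_dir = Path(data_dir)
    return {name: (data_dir / name).is_file() for name in expected}


# File names as listed in the description of the `data` folder.
EXPECTED_DATA_FILES = [
    "grf_example.csv",
    "rslt.pkl",
    "seattle_raw_filtered.csv",
    "Depression.csv",
    "simulation_res.csv",
    "SouthCarolina.geojson",
    "King_council_legdst_area.geojson",
]

if __name__ == "__main__":
    # Assumes the script is run from the directory that contains `data`.
    for name, present in inventory("data", EXPECTED_DATA_FILES).items():
        print(f"{name}: {'found' if present else 'missing'}")
```

For analysis work, pandas or geopandas would be the more usual way to read these files. The standard-library readers are used here only so the check runs without extra dependencies.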
