USGS Surface Water Data Scraper

2017-09-14T17:04:09Z (GMT) by Sean Hardison
These two scripts pull all of the water quality and flow data from the USGS "Current Conditions" page for a given state. First build a list of URLs with the URL script, then use the second script to pull the data from each URL. The scraper is configured to use the maximum available date range for each variable of interest (e.g. discharge, conductivity), so file sizes can range from very small to very large. If you find that your data is being cut off, extend the "timeout" parameter in the URL request call (currently set to 199 s) so that the full extent of the data can load.

The USGS prefers that automated data retrieval take place between 12 am and 6 am. Please adhere to these guidelines or your connection may be blocked. To facilitate friendly scraping, I've added a start time parameter to the second script. Set this variable ('then') to the time you want the scraper to start running.

Another common issue is the parameter code dictionary. The USGS maintains a large list of parameters, each assigned a 5-digit code. If a code is not present in the dictionary (i.e. dictionary 'd'), the URL is saved to Broken_links.csv and no data is collected for it. To remedy this, open those URLs in your browser, find the new variables and codes, and add them to the dictionary.
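
The start time and parameter code checks described above could be sketched roughly as follows. This is not the code from the scripts themselves, only a minimal illustration in Python. The names 'then', 'd' and Broken_links.csv come from the description; the helper functions, the CSV column names and the three-entry dictionary are assumptions made for the example.

```python
import csv
import datetime as dt

# Hypothetical excerpt of the parameter code dictionary 'd'.
# The real dictionary in the script is much larger.
d = {
    "00060": "Discharge",
    "00095": "Specific conductance",
    "00010": "Water temperature",
}


def seconds_until(then, now=None):
    """Return the seconds from `now` until the next occurrence of clock time `then`."""
    if now is None:
        now = dt.datetime.now()
    target = dt.datetime.combine(now.date(), then)
    if target < now:
        # The start time has already passed today, so wait for tomorrow's.
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


def split_by_known_codes(urls_with_codes, codes):
    """Split (url, code) pairs into those with a known code and those without."""
    usable, broken = [], []
    for url, code in urls_with_codes:
        if code in codes:
            usable.append((url, code))
        else:
            broken.append((url, code))
    return usable, broken


def write_broken_links(broken, path="Broken_links.csv"):
    """Save the skipped (url, code) pairs so the codes can be looked up later."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "parameter_code"])  # assumed column names
        writer.writerows(broken)
```

With 'then' set to dt.time(0, 0), calling time.sleep(seconds_until(then)) before the first request would hold the scraper until midnight. Any pairs returned in the broken list are the ones whose codes need to be added to 'd' before rerunning.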