How to get historical weather for any city with BigQuery?

BigQuery has NOAA gsod data loaded as a public dataset - since 1929: https://www.reddit.com/r/bigquery/comments/2ts9wo/noaa_gsod_weather_data_loaded_into_bigquery/

How to get historical data for any city?

+9
source share
4 answers

Update 2019: for convenience

SELECT * FROM 'fh-bigquery.weather_gsod.all' WHERE name='SAN FRANCISCO INTERNATIONAL A' ORDER BY date DESC 

Updated daily - or report here if it is not.

For example, to get the hottest days for San Francisco stations since 1980:

 SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) active_until FROM 'fh-bigquery.weather_gsod.all' WHERE name LIKE 'SAN FRANC%' AND date > '1980-01-01' GROUP BY 1,2 ORDER BY active_until DESC 

enter image description here

Please note that this request was processed only 28 MB thanks to the cluster table.

And similarly, but instead of using the station name, I will use the location and a table grouped by location:

 WITH city AS (SELECT ST_GEOGPOINT(-122.465, 37.807)) SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) station_until FROM 'fh-bigquery.weather_gsod.all_geoclustered' WHERE EXTRACT(YEAR FROM date) > 1980 AND ST_DISTANCE(point_gis, (SELECT * FROM city)) < 40000 GROUP BY name, state HAVING EXTRACT(YEAR FROM station_until)>2018 ORDER BY ST_DISTANCE(ANY_VALUE(point_gis), (SELECT * FROM city)) LIMIT 5 

enter image description here


2017 Update: Standard SQL and the latest tables:

 SELECT TIMESTAMP(CONCAT(year,'-',mo,'-',da)) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp FROM 'bigquery-public-data.noaa_gsod.gsod2016' WHERE stn='722540' AND wban='13904' GROUP BY 1 ORDER BY day 

An additional example to show the coldest days in Chicago over this decade:

 #standardSQL SELECT year, FORMAT('%s%s',mo,da) day ,min FROM 'fh-bigquery.weather_gsod.stations' a JOIN 'bigquery-public-data.noaa_gsod.gsod201*' b ON a.usaf=b.stn AND a.wban=b.wban WHERE name='CHICAGO/O HARE ARPT' AND min!=9999.9 AND mo<'03' ORDER BY 1,2 

To get historical weather for any city, we first need to find which station reports in that city. The table [fh-bigquery:weather_gsod.stations] contains the names of well-known stations, their status (if it is located in the USA), country and other details.

Thus, to find all stations in Austin, Texas, we will use the following query:

 SELECT state, name, lat, lon FROM [fh-bigquery:weather_gsod.stations] WHERE country='US' AND state='TX' AND name CONTAINS 'AUST' LIMIT 10 

enter image description here

This approach has 2 problems that need to be addressed:

  • Not all known stations are present in this table - I need to get an updated version of this file. So don’t give up if you cannot find the station you are looking for here.
  • Not every station found in this file works every year, so we need to find stations that have data for the year we are looking for.

To solve the second problem, we need to combine the station table with the actual data that we are looking for. The following query looks for stations around Austin, and column c shows how many days in 2015 contain actual data:

 SELECT state, name, FIRST(a.wban) wban, FIRST(a.stn) stn, COUNT(*) c, INTEGER(SUM(IF(prcp=99.99,0,prcp))) rain, FIRST(lat) lat, FIRST(lon) long FROM [fh-bigquery:weather_gsod.gsod2015] a JOIN [fh-bigquery:weather_gsod.stations] b ON a.wban=b.wban AND a.stn=b.usaf WHERE country='US' AND state='TX' AND name CONTAINS 'AUST' GROUP BY 1,2 LIMIT 10 

enter image description here

It's good! We found 4 stations with data for Austin during 2015.

Please note that we had to relate to the β€œrain” in a special way: when the station does not track rain, instead of null it marks it as 99.99. Our query filters these values.

Now that we know the stn and wban numbers for these stations, we can select any of them and visualize the results:

 SELECT TIMESTAMP('2015'+mo+da) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp FROM [fh-bigquery:weather_gsod.gsod2015] WHERE stn='722540' AND wban='13904' GROUP BY 1 ORDER BY day 

enter image description here

+12
source

Now, the official NOAA dataset on BigQuery is in addition to a blog post describing it .

An example of obtaining minimum temperatures on August 15, 2016:

 SELECT name, value/10 AS min_temperature, latitude, longitude FROM [bigquery-public-data:ghcn_d.ghcnd_stations] AS stn JOIN [bigquery-public-data:ghcn_d.ghcnd_2016] AS wx ON wx.id = stn.id WHERE wx.element = 'TMIN' AND wx.qflag IS NULL AND STRING(wx.date) = '2016-08-15' 

What returns:

enter image description here

+2
source

Thanks for pulling out the data and making it publicly available. Here is BigQuery, which returns the total rainfall in 2014 for each station in Texas:

 SELECT FIRST(name) AS station_name, stn, SUM(prcp) AS annual_precip FROM [fh-bigquery:weather_gsod.gsod2014] gsod JOIN [fh-bigquery:weather_gsod.stations] stations ON gsod.wban=stations.wban AND gsod.stn=stations.usaf WHERE state='TX' AND prcp != 99.99 GROUP BY stn 

which returns: table of results

Pulling the number of rainy days in each place and sorting the results based on this:

 SELECT FIRST(name) AS station_name, stn, SUM(prcp) AS annual_precip, COUNT(prcp) AS rainy_days FROM [fh-bigquery:weather_gsod.gsod2014] gsod JOIN [fh-bigquery:weather_gsod.stations] stations ON gsod.wban=stations.wban AND gsod.stn=stations.usaf WHERE state='TX' AND prcp != 99.99 AND prcp > 0 GROUP BY stn ORDER BY rainy_days DESC 

appears these results .

+1
source

Using the station name is unreliable. In addition, it is difficult to use geospatial query using the new features of large queries, since the boundaries of cities do not have clear shapes (such as a circle or a rectangular rectangle).

Therefore, the best solution I found for your problem is to use reverse geocoding, requesting the Google Maps API to get the address, state, city and county for each station using latitude / longitude coordinates.

Here is the resulting CSV ( StationNumber,Lat,Lon,Address,State,City,County,Zip ) for the USA (you will notice that 98% of the stations exist there): https://gist.github.com/orcaman/a3e23c47489705dff93aace2e35f57d3

Here is the code in case you want to re-launch it at stations outside the United States (Golang): https://gist.github.com/orcaman/8de55f14f1c70ef5b0c124cf2fb7d9d1

0
source

All Articles