This tutorial shows how to join your observational data with environmental data.

Your observational data is expected to have the key columns `latitude`, `longitude` and `date` (and any other non-key columns). The environmental data is expected to have `latitude` and `longitude` (and any other non-key columns).

The output table contains your observational data (all of its rows), and for each row we find the nearest location (by geo distance) in the environmental data.
We use Google BigQuery as the underlying engine to perform this join.
Make sure you have uploaded your observational data file to Wildflow: go to the Datasets tab, click the + button, and drag and drop your file there.
For now, only the CSV format is supported, and timestamps must be in ISO date-time format (or any format Google BigQuery supports), e.g. `2000-10-31 23:30`.
Please specify the file name and make sure all the columns have correct names. For now, the file name should start with `raw.` (it is the name of the dataset/collection of tables in BigQuery).
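For example, an accepted CSV could look like the sample below. The values are made up; the important parts are the column names and the ISO-style timestamps:

```
latitude,longitude,date,depth_m
-0.6713,-90.5069,2000-10-31 23:30,12.4
-0.6598,-90.4871,2000-10-31 23:45,15.1
```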
Proceed to upload your table.
If you need your dataset to be available in Wildflow, please contact us via WhatsApp or Discord and provide the name of the dataset, e.g.: https://data.marine.copernicus.eu/product/OCEANCOLOUR_GLO_BGC_L4_MY_009_104
Navigate to the Workflows tab and select the tables you need to join: first choose your observational data (e.g. the trails of your shark), then choose the large environmental dataset (such as chlorophyll levels). You can select as many environmental tables as you want.
Now you should be able to see the SQL code generated to join these tables:
Feel free to save it for future experiment reproducibility. You can run this in the Google BigQuery console.
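As a rough illustration of what such a query does, here is a heavily simplified sketch of a nearest-location join. The table and column names (`raw.shark_trails`, `raw.chlorophyll`, `chlorophyll`) are illustrative, and the SQL Wildflow actually generates will differ (it is typically more optimized than a plain cross join over the whole environmental table):

```sql
-- Simplified sketch: attach the nearest environmental reading to every
-- observational row. Table and column names are illustrative.
SELECT * EXCEPT (rn)
FROM (
  SELECT
    obs.*,
    env.chlorophyll,
    ST_DISTANCE(
      ST_GEOGPOINT(obs.longitude, obs.latitude),
      ST_GEOGPOINT(env.longitude, env.latitude)
    ) AS distance_m,
    -- Rank environmental points by distance for each observational row
    -- (assumes latitude/longitude/date uniquely identify an observation).
    ROW_NUMBER() OVER (
      PARTITION BY obs.latitude, obs.longitude, obs.date
      ORDER BY ST_DISTANCE(
        ST_GEOGPOINT(obs.longitude, obs.latitude),
        ST_GEOGPOINT(env.longitude, env.latitude)
      )
    ) AS rn
  FROM raw.shark_trails AS obs
  CROSS JOIN raw.chlorophyll AS env
)
WHERE rn = 1;
```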
When you continue, the query will be executed:
You can see the most recent jobs here: Jobs.
And it should finish within a few minutes:
Now you should be able to see a new table with merged observational and environmental data.
Yay! 🐳
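If you want a quick sanity check, you can preview a few rows of the new table from the BigQuery console. The table name below is just an example of what the merged output might be called:

```sql
-- Preview a few rows of the merged table (name is illustrative).
SELECT *
FROM raw.shark_trails_with_chlorophyll
LIMIT 10;
```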
If you have a timestamp format that is not supported by BigQuery (e.g. your file has `06/05/18 10:14` but you need `2018-05-06 10:14`), you can run an ad-hoc query to create `raw.test_table` from `gs://bucket-name/uploaded/Table_2024-04-10T16:59:21.145Z.csv`.
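A minimal sketch of one way to do this is shown below. It assumes the CSV has `latitude`, `longitude` and `date` columns and that `date` holds strings like `06/05/18 10:14` (day/month/two-digit year); adjust the column list and the format string to match your file:

```sql
-- 1) Expose the uploaded CSV as an external table.
--    Column names and types are assumptions; adjust them to your file.
CREATE OR REPLACE EXTERNAL TABLE raw.test_table_ext (
  latitude FLOAT64,
  longitude FLOAT64,
  date STRING
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://bucket-name/uploaded/Table_2024-04-10T16:59:21.145Z.csv'],
  skip_leading_rows = 1
);

-- 2) Materialise raw.test_table with the timestamp parsed into a TIMESTAMP,
--    e.g. '06/05/18 10:14' -> 2018-05-06 10:14:00 UTC.
CREATE OR REPLACE TABLE raw.test_table AS
SELECT
  * EXCEPT (date),
  PARSE_TIMESTAMP('%d/%m/%y %H:%M', date) AS date
FROM raw.test_table_ext;
```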