This tutorial introduces the basics of importing data in Colab and loading it with pandas and GeoPandas.

⚠️ This writing is a work in progress. The functions work. ⚠️

Please read everything on the main page, including the disclaimer, before continuing.

About this Tutorial:

What's inside?

The Tutorial

In this notebook, the basics of data-intake are introduced.

  • Data will be imported using Colab terminal commands and then loaded into Python's pandas.
  • Geospatial data will be imported from Esri and then loaded into GeoPandas.
  • A variety of data formats will be imported.
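Because several formats are involved, it can help to decide up front which reader a given source needs. The sketch below is a hypothetical helper and not part of the Intake class: the names guess_format, pick_reader and READERS are assumptions made for illustration. It takes the format from an Esri-style f= query parameter when one is present and otherwise falls back to the file extension. It assumes URLs or POSIX-style paths.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping from a format hint to the reader that would handle it.
READERS = {
    "csv": "pandas.read_csv",
    "geojson": "geopandas.read_file",
    "pgeojson": "geopandas.read_file",
    "shp": "geopandas.read_file",
}


def guess_format(source: str) -> str:
    """Return a lowercase format hint for a path or URL, or '' if none is found."""
    parsed = urlparse(source)
    # Esri REST queries name their output format in the `f` query parameter.
    f_values = parse_qs(parsed.query).get("f")
    if f_values:
        return f_values[0].lower()
    # Otherwise fall back to the file extension of the path.
    return PurePosixPath(parsed.path).suffix.lstrip(".").lower()


def pick_reader(source: str) -> str:
    """Name the reader for `source`, raising ValueError for unknown formats."""
    fmt = guess_format(source)
    if fmt not in READERS:
        raise ValueError(f"Unrecognized data format: {fmt!r}")
    return READERS[fmt]


print(pick_reader("Hhpov.csv"))  # pandas.read_csv
```

Returning the reader's name rather than calling it keeps the sketch free of third-party imports; a real loader would call the function instead.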

Objectives

By the end of this tutorial users should have an understanding of:

  • Importing data with pandas and geopandas
  • Querying data from Esri
  • Retrieving data programmatically
  • This module assumes the data needs no handling prior to intake
  • Loading data in a variety of formats
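The Esri queries used in this tutorial all share the same query string (where=1%3D1, outFields=*, returnGeometry=true, f=pgeojson). As a minimal sketch, such a URL could be assembled from its parts with the standard library. The function name build_esri_query_url and its parameter names are hypothetical; only the four query parameters come from the URLs shown in this tutorial.

```python
from urllib.parse import urlencode


def build_esri_query_url(layer_url: str, where: str = "1=1", out_fields: str = "*",
                         return_geometry: bool = True, fmt: str = "pgeojson") -> str:
    """Assemble a FeatureServer layer query URL from its parts."""
    params = {
        "where": where,
        "outFields": out_fields,
        "returnGeometry": "true" if return_geometry else "false",
        "f": fmt,
    }
    # safe="*" leaves the wildcard unescaped; "=" in the where clause becomes %3D.
    return f"{layer_url.rstrip('/')}/query?{urlencode(params, safe='*')}"


layer = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhpov/FeatureServer/0"
print(build_esri_query_url(layer))
```

With the defaults, the printed URL matches the Hhpov query URL used later in this tutorial.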

Background

Importing Data with Colabs:

Instructions: Read all text and execute all code in order.

How XYZ :

  • TODO

If you would like to ...

For this next example to work, we need to import some hypothetical CSV files.

Try It! Go ahead and try running the cell below.
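As a minimal sketch of loading a CSV with pandas, the cell below reads a small in-memory file. The CSV text is a hypothetical stand-in for a downloaded file; its two rows reuse values from the Hhpov table shown later in this tutorial.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """CSA2010,hhpov18,hhpov19
Belair-Edison,22.83,22.53
Canton,2.05,2.22
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (2, 3)
print(df.dtypes)  # one text column and two float columns
```

Passing a file path or URL in place of the StringIO object works the same way.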

Advanced

# Otherwise this tool assumes shp or pgeojson files have geom='geometry', in_crs=2248. 
# Depending on interactivity the values should be 
# coerce fillna(-1321321321321325)
# Returns 

class Intake

Intake()

u = Intake
rdf = Intake.getData('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
rdf.head(1)
Getting Data From:  https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson
ModuleNotFoundError: No module named 'dataplay'
KeyboardInterrupt

The run captured here did not succeed. Inside getData, the line "from dataplay import geoms" raised ModuleNotFoundError, the function fell through to its interactive "Error: Try Again?" prompt, and the prompt was then stopped with a KeyboardInterrupt. The dataplay package must be importable in your environment for this cell to return data.

Here we can save the data so that it may be used in later tutorials.

rdf
False
# .to_csv(string+'.csv', encoding="utf-8", index=False, quoting=csv.QUOTE_ALL)
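The commented-out to_csv call above can be tried on a small frame. This is a minimal sketch: the two-row DataFrame and the temporary output path are assumptions for illustration, while the to_csv arguments are the ones from the comment.

```python
import csv
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame standing in for the downloaded data.
df = pd.DataFrame({"CSA2010": ["Belair-Edison", "Canton"], "hhpov19": [22.53, 2.22]})

# Write to a temporary folder; in Colab the file would appear under the 'Files' tab.
out_path = os.path.join(tempfile.mkdtemp(), "Hhpov.csv")
df.to_csv(out_path, encoding="utf-8", index=False, quoting=csv.QUOTE_ALL)

# QUOTE_ALL wraps every field, including the header names, in double quotes.
with open(out_path, encoding="utf-8") as fh:
    first_line = fh.readline().strip()
print(first_line)  # "CSA2010","hhpov19"

# Reading the file back recovers the same columns.
reloaded = pd.read_csv(out_path)
print(list(reloaded.columns))
```

Quoting every field is a defensive choice for values that contain commas, such as street addresses.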

Download data by:

  • Clicking the 'Files' tab in the left-hand menu of this screen. Locate your file in the file explorer that appears directly under the 'Files' tab once clicked. Right-click the file and select the 'download' option from the dropdown.

You can upload this data into the next tutorial in one of two ways.

1) Upload the saved file to Google Drive and connect to your Drive path.

OR

2) Download the dataset as directed above, navigate to the next tutorial's page, and upload the file using the 'upload' button in the 'Files' tab in the left-hand menu of that screen. The next tutorial will teach you how to load this data so that it may be mapped.

Here are some examples:

Using Esri and the Geoms handler directly:

from dataplay import geoms
geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
geoloom_gdf = geoms.readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)
# Drop records that came back without a geometry
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
geoloom_gdf.head(1)
OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments POINT_X POINT_Y GlobalID geometry
0 1 Artists & Resources None Joe Test 123 Market Pl, B... -8.53e+06 4.76e+06 e59b4931-e0c8-4d... POINT (-76.60661...

Again but with the Intake class:

u = Intake
Geoloom_Crowd, rcol = u.getAndCheck('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
Geoloom_Crowd.head(1)
OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments POINT_X POINT_Y GlobalID geometry
0 1 Artists & Resources None Joe Test 123 Market Pl, B... -8.53e+06 4.76e+06 e59b4931-e0c8-4d... POINT (-76.60661...

The getAndCheck function is useful for checking that a required field is present.

Hhpov, rcol = u.getAndCheck('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson', 'hhpov19', True)
Hhpov = Hhpov[['CSA2010', 'hhpov15', 'hhpov16', 'hhpov17', 'hhpov18', 'hhpov19']]
# Hhpov.to_csv('Hhpov.csv')
Hhpov.head()
Getting Data From:  https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson
   CSA2010              hhpov15  hhpov16  hhpov17  hhpov18  hhpov19
0  Allendale/Irving...    24.15    21.28    20.70    23.00    19.18
1  Beechfield/Ten H...    11.17    11.59    10.47    10.90     8.82
2  Belair-Edison          18.61    19.59    20.27    22.83    22.53
3  Brooklyn/Curtis ...    28.36    26.33    24.21    21.54    24.60
4  Canton                  3.00     2.26     3.66     2.05     2.22
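The required-field check can be sketched in a few lines of pandas. This is not the getAndCheck implementation, whose internals are not shown here. The helper find_column is a hypothetical name, and the case-insensitive matching is an assumption made for illustration.

```python
import pandas as pd


def find_column(df: pd.DataFrame, wanted: str):
    """Return the actual column name matching `wanted` (ignoring case), or None."""
    lookup = {str(col).lower(): col for col in df.columns}
    return lookup.get(wanted.lower())


# Hypothetical one-row frame with a subset of the Hhpov columns.
df = pd.DataFrame({"CSA2010": ["Canton"], "hhpov18": [2.05], "hhpov19": [2.22]})

print(find_column(df, "hhpov19"))  # hhpov19
print(find_column(df, "HHPOV19"))  # hhpov19
print(find_column(df, "hhpov20"))  # None
```

Returning the matched name rather than a boolean lets the caller use it directly for selection, for example df[find_column(df, "hhpov19")].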

We could also retrieve from a file.

u = Intake
# rdf = u.getData('Hhpov.csv')
rdf.head()
   Unnamed: 0  CSA2010              hhpov15  hhpov16  hhpov17  hhpov18  hhpov19
0           0  Allendale/Irving...    24.15    21.28    20.70    23.00    19.18
1           1  Beechfield/Ten H...    11.17    11.59    10.47    10.90     8.82
2           2  Belair-Edison          18.61    19.59    20.27    22.83    22.53
3           3  Brooklyn/Curtis ...    28.36    26.33    24.21    21.54    24.60
4           4  Canton                  3.00     2.26     3.66     2.05     2.22
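The "Unnamed: 0" column in the output above is what pandas produces when a CSV was written with its index and then read back without declaring it. The sketch below reproduces this on a hypothetical two-row frame and shows two common ways to avoid it.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the Hhpov data.
df = pd.DataFrame({"CSA2010": ["Belair-Edison", "Canton"], "hhpov19": [22.53, 2.22]})

# By default to_csv also writes the index, under an empty header name.
with_index = df.to_csv()
naive = pd.read_csv(io.StringIO(with_index))
print(list(naive.columns))  # ['Unnamed: 0', 'CSA2010', 'hhpov19']

# Option 1: declare the first column as the index when reading.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(restored.columns))  # ['CSA2010', 'hhpov19']

# Option 2: do not write the index in the first place.
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(clean.columns))  # ['CSA2010', 'hhpov19']
```

Option 2 is usually simpler when the index is just the default row number, as it is here.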