
Creating spatial data analytics dashboards in Cartoframes

With Carto’s strength in spatial science and location intelligence, and the easy access to data science packages in Python, Carto’s new project ‘Cartoframes’ has a lot of potential for building excellent mapping dashboards for data-hungry workflows.

Below is a quick tutorial I have made which will hopefully help new users figure out how to use it. It is in no way comprehensive, and there are probably some pieces missing, but it should be enough to get started! The tutorial covers some of the elements of creating a ‘live’ weather data dashboard for New South Wales in Australia.

What is Cartoframes? (from https://github.com/CartoDB/cartoframes)
A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

Python data analysis workflows often rely on the de facto standards pandas and Jupyter notebooks. Integrating CARTO into this workflow saves data scientists time and energy by not having to export datasets as files or retain multiple copies of the data. Instead, CARTOframes gives the ability to communicate reproducible analysis while providing the ability to gain from CARTO’s services like hosted, dynamic or static maps and Data Observatory augmentation.

Features

Write pandas DataFrames to CARTO tables
Read CARTO tables and queries into pandas DataFrames
Create customizable, interactive CARTO maps in a Jupyter notebook
Interact with CARTO’s Data Observatory
Use CARTO’s spatially-enabled database for analysis

Step 1 – Install libraries
Install all of the relevant libraries. I’m using Canopy, which provides Python 2.7 and 3.5, with easy installation and updates via a graphical package manager covering over 450 pre-built and tested scientific and analytic Python packages from the Enthought Python Distribution. These include NumPy, Pandas, SciPy, matplotlib, scikit-learn, and Jupyter / IPython. You can get Canopy for free here.

Once installed, open the console and install the packages:
pip install cartoframes
pip install pandas

Step 2 – Import libraries

In a new Jupyter notebook, start by importing the libraries in the first block. These are the ones you’ll generally need (though you can go to town with other numerical / statistical packages here!):

import cartoframes
import pandas as pd
import numpy as np

Step 3 – Set up a Carto account and register for an API key

Start by going to Carto.com and signing up through the prompts.

Once you have signed up, in the top-right of your home page there should be a settings toggle which shows you:

View your public profile
Your account
Your API keys
Close session

Click on ‘Your API keys’ and copy what shows up on the next page. It should be a long string of text, looking something like this:

31b453f27c085747acc6a51a9e5717beae254ced

Step 4 – Connecting to your Carto account in Python
Try the following line of code in your next Jupyter code block, where xxxxxxxxxxxx is your new API key. This key allows Cartoframes to communicate directly with the data in your Carto account.

Where it says ‘oclock’ you should put your own username.

cc = cartoframes.CartoContext(base_url='https://oclock.carto.com',api_key='xxxxxxxxxxxx')

When you run this code and call ‘cc’ it should provide you with a message such as this:
<cartoframes.context.CartoContext at 0x1ea3fa2c518>

This means that cartoframes has successfully accessed your Carto account and you can call ‘cc’ to reference accessing this account from now on. Make sure you keep your API key safe!
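One small tip for keeping the key out of your notebook (a suggestion on my part, not something Cartoframes requires): store it in an environment variable and read it with os.environ. A minimal sketch, assuming you have exported a variable called CARTO_API_KEY beforehand:

import os
import cartoframes

# CARTO_API_KEY is assumed to be set in your shell, e.g.
#   export CARTO_API_KEY=xxxxxxxxxxxx
# Replace 'oclock' with your own username, as above.
api_key = os.environ['CARTO_API_KEY']
cc = cartoframes.CartoContext(base_url='https://oclock.carto.com', api_key=api_key)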

Step 5 – Upload some data to Carto
For this task, I downloaded the shapefile components of weather locations from the Australian Bureau of Meteorology. This is all of the spatial files (.shp, .shx, .dbf etc.) for IDM00013 from:
ftp://ftp.bom.gov.au/anon/home/adfd/spatial/

These are all the files prefixed by IDM00013 and suffixed by .dbf, .prj, .sbn, .sbx, .shp, .shx and .shp.xml. Carto will need them all in a single .zip file before you upload them.
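If you would rather do the zipping in Python than by hand, a minimal sketch (assuming the downloaded IDM00013 files sit in your current working directory) could look like this:

import glob
import zipfile

# Bundle every IDM00013.* component file into one zip ready for the Carto uploader.
with zipfile.ZipFile('idm00013.zip', 'w') as zf:
    for path in glob.glob('IDM00013*'):
        zf.write(path)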

The metadata for this dataset can be found here:

IDM00013 – point places (precis, fire, marine)
http://reg.bom.gov.au/catalogue/spatialdata.pdf

Once you have downloaded these, you can upload the zipped shapefile to Carto. It should give you a series of geolocated dots covering all of Australia, with many attributes as described in the metadata above. I called the dataset ‘idm00013’.
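If you want to sanity-check the attributes locally before uploading (entirely optional, and geopandas is not otherwise part of this tutorial), something along these lines would do it:

import geopandas as gpd

# Read the unzipped shapefile and preview the attribute table described in the metadata.
points = gpd.read_file('IDM00013.shp')
print(points.head())
print(list(points.columns))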

Step 6 – Read the data in Jupyter
Let’s test if everything is working. The following should display a dataframe of all of the aspatial information stored in each weather location:

carto_df = cc.read('idm00013')
carto_df

The following should give you a list of all of the variables available to you to access and change:
list(carto_df.columns.values)

Step 7 – Making a map
Now for the exciting bit – creating a Carto map inside the Jupyter notebook.
Here I’ve picked the elevation column with a brown colour scheme; try:


from cartoframes import Layer, BaseMap, styling
cc.map(layers=[BaseMap('light'),
               Layer('idm00013',
                     color={'column': 'elevation', 'scheme': styling.brwnYl(7)},
                     size=5)],
       interactive=True)

The following map should display, with light brown showing weather points at low elevations and darker brown showing points at higher elevations.

Extension – Accessing and parsing a live data feed

The code below retrieves the latest weather forecasts for the weekend ahead from the Bureau of Meteorology’s API. It is stored in a dataframe ‘df’.

Watch the indentation – it matters in Python!

import xml.etree.ElementTree as ET
import pandas as pd
import urllib.request

# Fetch the NSW precis forecast XML from the BOM FTP server
req = urllib.request.Request('ftp://ftp.bom.gov.au/anon/gen/fwo/IDN11060.xml')
with urllib.request.urlopen(req) as response:
    xml_data = response.read()

# Walk the XML tree: forecast -> area -> forecast-period -> element
list_dict = []
root = ET.XML(xml_data)
for forecast_block in root.findall('forecast'):
    for area in forecast_block:
        for forecast in area:
            min_temp = ''
            max_temp = ''
            aac_id = area.get('aac')
            forecast_date = forecast.get('start-time-local')
            for element in forecast:
                if element.get('type') == 'air_temperature_minimum':
                    min_temp = element.text
                elif element.get('type') == 'air_temperature_maximum':
                    max_temp = element.text
            list_dict.append({'aac': aac_id, 'forecast_date': forecast_date,
                              'low_temp': min_temp, 'max_temp': max_temp})

df = pd.DataFrame(list_dict)
df

Extension Part 1 – Joining in a live data source

We now want to join the geographical data from the first exercise with this live data feed.
This is done with a ‘left’ join, so we keep all of the weather forecast records and add the geographic data to them.

merged_data = pd.merge(df, carto_df, on='aac', how='left')
merged_data

Extension Part 2 – Selecting some data

Now we filter the records down to one particular day’s forecast (you will need to change the date here to a current forecast date).
The filtered data is then written to a new dataset in Carto called ‘merged_weathermap’.

one_forecast = merged_data[merged_data['forecast_date']=='2018-01-16T00:00:00+11:00']
cc.write(one_forecast, 'merged_weathermap',overwrite=True)
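Rather than hard-coding the date, you could build the filter string from today’s date. A small sketch, assuming the feed keeps using the ISO timestamp format (with the +11:00 offset) shown above:

from datetime import date

# Build a timestamp matching the feed's format for today's forecast.
# Note the '+11:00' offset is hard-coded and will differ outside NSW daylight-saving time.
today = date.today().strftime('%Y-%m-%dT00:00:00+11:00')
one_forecast = merged_data[merged_data['forecast_date'] == today]
cc.write(one_forecast, 'merged_weathermap', overwrite=True)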

Extension Part 3 – Putting it all together

Now let’s add the data from the weather feed API to a Cartoframes map. The following reads in the ‘merged_weathermap’ dataset we just created and colours each weather point in New South Wales by its maximum forecast temperature, with pink showing high temperatures and blue showing lower temperatures.

from cartoframes import Layer, BaseMap, styling
cc.map(layers=[BaseMap('light'),
               Layer('merged_weathermap',
                     color={'column': 'max_temp', 'scheme': styling.tropic(10)},
                     size=10)],
       interactive=True)

That’s it! From here, with a bit of extra work and some scripts that continuously ping the APIs, we are only a few steps away from creating live dashboards that integrate other statistical and mathematical packages, even machine learning.
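As a rough illustration of that idea (my sketch only, assuming the fetch-and-parse code above has been wrapped in a hypothetical function called fetch_forecasts, and that cc and carto_df are already defined in earlier cells), a simple refresh script might look like this:

import time
from datetime import date
import pandas as pd

def refresh_dashboard():
    # fetch_forecasts() is a hypothetical wrapper around the XML download and
    # parsing shown earlier, returning the forecast DataFrame.
    df = fetch_forecasts()
    merged_data = pd.merge(df, carto_df, on='aac', how='left')
    today = date.today().strftime('%Y-%m-%dT00:00:00+11:00')
    one_forecast = merged_data[merged_data['forecast_date'] == today]
    cc.write(one_forecast, 'merged_weathermap', overwrite=True)

# Re-publish the dataset once an hour
while True:
    refresh_dashboard()
    time.sleep(3600)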

Looking forward to seeing developments in this space and if you have any feedback or ideas let me know!

For more information on Cartoframes have a look at their documentation.


Measuring accessibility – on the 30 minute city


One of the recent projects I’ve been involved in at Arup has been developing spatial, analytical tools to understand transport accessibility. In particular, this is to do with destination-based accessibility: rather than assessing how well a city delivers transport at particular points (regardless of where they lead), we looked at how well it delivers access to all other places in the city. In particular, we were looking at places that are important to creating liveable environments, such as education, parks, healthcare and our jobs.

For me, this topic built well on research I had done in 2015 (see ‘Where to From Here? A Modelling Methodology for Measuring Land-Use and Public Transport Accessibility in Melbourne’), which assessed destination-based accessibility within transport modelling software, restricted to travel zones. This time there were some major improvements to the method, mostly from moving from a software shell to raw code, and from using much more disaggregate units of analysis.

We assessed Greater Sydney at a 300m x 300m grid level, producing over a million travel time isochrones for driving (including traffic), public transport and walking to assign accessibility values to liveability variables in approximately 120,000 small cells in the city. In a nutshell, our toolkit involved a bit of OpenTripPlanner, Python, Amazon Web Services and FME, all using Open Data sources. This means the method is highly reproducible for other cities, and applicable to the same city with a different network (which could be used to evaluate transport network changes or alternate land use scenarios). A web map has been produced to showcase some of the work done in this space so far, exploring what the ’30 minute city’ means for Sydney:

30minutecity.arup.digital


It is certainly exciting to see the potential of this thinking and method being applied to both Sydney and other cities. Accessibility and its impact on individual opportunities is often overlooked and undervalued in many forms of transport analysis. With the increasing richness of data becoming available from government and other Open Data sources, combined with open analytical and visual methods like these, it is encouraging and clear that these analyses can potentially produce insight towards tackling some of our growing issues in Australian cities, such as housing affordability, transport disadvantage and sustainability.


Dissertation research


My research interests earlier this year began in understanding what drives certain behaviour in the urban environment – specifically, how changes to transport systems affect people’s choice to use public transit. It’s been a long haul, but today marks the completion of this project, titled ‘The use of automatically-collected transport data in the spatiotemporal analysis and visualisation of policy change: a case study of the San Francisco Municipal Railway’. Here I will provide a bit of an overview for those who might be interested.

There is a wide array of technologies within our urban environment that are constantly collecting data. Many of these systems are still in relative infancy; however, those in transit systems have been among the earliest to be implemented and mature. While many of these sensors, particularly in transit, are used for real-time decision making, there is an increasing interest in using this data for retrospective analyses. Studies over long periods provide an improved means of understanding changes in behaviour, as they can potentially draw on enough data to achieve statistically sound results.

The dissertation focuses on utilising data that has been collected throughout the public transport network of the City of San Francisco. San Francisco is innovative in its approach towards collecting data to improve its public transit network. In particular, it has been collecting detailed data on its bus network over the past five years. This data includes automatic-vehicle-location (AVL) and automatic-passenger-counting (APC) data. The vehicle location data, which many people are exposed to every day through ‘next bus’ information on signs or on their mobile devices, can provide very detailed information on the transport system.

When archived over long periods of time, it can draw a picture of how the system has been performing and how it has changed, from a stop-level, route-level and system-wide perspective. There are many performance variables that can be deduced from these samples of vehicle location – including how often the bus arrives on time, the speed of the bus and the waiting time for passengers. While this gives us an indication of the vehicular performance of the fleet, we can also gain an understanding of passenger experience and, from ridership changes, how this experience affected passenger behaviour.


This is achieved through APC data, which people may be less aware of. This data is collected through laser beams above the doors of the bus, which count how many people entered and exited the vehicle based on how the beams were broken. Counting data could also come from sources such as fare-collection boxes, where you touch on your Oyster card or Myki.

The dissertation uses these large datasets to assess how both people and vehicles changed over time throughout the city. During the research, it was decided to focus on one particular policy where a large change was hypothesized to have occurred in the system. A number of changes were identified, while a number of variables also appeared inelastic to the policy change.

As part of this dissertation, two specific visualisation and analysis techniques were employed. Firstly, this involved the development of an interactive visualisation tool which enabled rapid exploratory analysis of the data. Secondly, two networks were created to represent the data – one where all consecutive bus stations were linked, and the other where all bus stops with a shared route were linked. With these networks, a number of graph-theoretic attributes were tested, weighted with the data at different time periods – including degree, betweenness centrality and efficiency.
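To give a flavour of the second technique (a minimal sketch only, using networkx on a toy network rather than the actual San Francisco data), the graph measures mentioned above can be computed like this:

import networkx as nx

# Toy network: nodes are bus stops, edges link consecutive stops,
# weighted here with illustrative passenger volumes.
G = nx.Graph()
G.add_weighted_edges_from([
    ('stop_a', 'stop_b', 120),
    ('stop_b', 'stop_c', 300),
    ('stop_c', 'stop_d', 90),
    ('stop_b', 'stop_d', 150),
])

print(dict(G.degree()))                               # degree of each stop
print(nx.betweenness_centrality(G, weight='weight'))  # betweenness centrality
print(nx.global_efficiency(G))                        # network efficiency (unweighted)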

Throughout the process it was found that the policy changes and their effects could be portrayed in the automatically collected data, though not without challenges and potential inaccuracies arising from its size and sampling methods. Further research in this field could involve utilising similar tools and methods to understand change at a smaller time scale, such as the system response after a tube strike or natural disaster. If you are interested, pieces of this work feature in the publications listed below, and the full dissertation will possibly be available in several months, after marking, at the CASA, UCL library.


Lock, O. 2014. The use of automatically-collected transport data in the spatiotemporal analysis and visualisation of policy change: a case study of the San Francisco Municipal Railway. Diss. University College London (UCL). London, United Kingdom.

Erhardt, G. D., Lock, O., Arcaute, E. & Batty, M. 2014. A Big Data Mashing Tool for Measuring Transit System Performance. The Big Data and Urban Informatics Workshop. Chicago, Illinois, USA.

