
Bus flows in Processing


This visualisation was created as part of a short exercise for a digital visualisation module of my current studies.

It was coded in Processing. The challenge of this task was treating each bus as an individual object (agent) and creating a global timer that dictates not only where each object appears but also when it appears there.
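The original was written in Processing and is not reproduced here. As a hypothetical illustration of the same idea, the Python sketch below gives each bus its own timetable of (time, x, y) stops, and a single global clock asks every bus where it should be at that moment. The class and function names are illustrative, and linear interpolation between stops is an assumption.

```python
from bisect import bisect_right


class Bus:
    """One bus treated as an independent agent with its own timetable."""

    def __init__(self, bus_id, stops):
        # stops: list of (time_in_seconds, x, y); sorted so times increase
        self.bus_id = bus_id
        self.stops = sorted(stops)
        self.times = [t for t, _, _ in self.stops]

    def position(self, now):
        """Return (x, y) at global time `now`, or None if the bus is not running."""
        if now < self.times[0] or now > self.times[-1]:
            return None
        i = bisect_right(self.times, now)  # first stop strictly after `now`
        if i == len(self.times):           # exactly at the final stop
            _, x, y = self.stops[-1]
            return (x, y)
        t0, x0, y0 = self.stops[i - 1]
        t1, x1, y1 = self.stops[i]
        f = (now - t0) / (t1 - t0)         # fraction of the way to the next stop
        return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


def frame_positions(buses, now):
    """Positions of every bus running at global time `now` (one animation frame)."""
    frame = {}
    for bus in buses:
        pos = bus.position(now)
        if pos is not None:
            frame[bus.bus_id] = pos
    return frame


buses = [Bus("A", [(0, 0.0, 0.0), (60, 10.0, 0.0)]),
         Bus("B", [(30, 5.0, 5.0), (90, 5.0, 15.0)])]
for now in range(0, 91, 30):  # the global clock advances once per frame
    print(now, frame_positions(buses, now))
```

Keeping the clock outside the Bus objects means every agent is asked about the same instant, so buses appear and disappear in step with their own timetables.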

Visualising this movement was also a challenge: without trails showing each bus's previous movement, the display just looked like a twinkling field of unrelated dots. Giving each bus a unique colour was an alternative idea, but it resulted in a degree of cognitive overload. The final version instead uses very slight differences in the shade of blue for each bus. This technique seems particularly effective on the edges of the city, and becomes less effective where a high density of buses cluster near each other.
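A hedged Python sketch of the two ideas described above (again, not the original Processing code): a fixed-length trail per bus whose older points fade out, and a per-bus colour that varies only slightly around a blue hue. The names `Trail` and `bus_shade` are hypothetical, and the hue, saturation, brightness and spread values are illustrative assumptions.

```python
import colorsys
from collections import deque


class Trail:
    """Keep the last `length` positions of one bus so its recent path can be drawn."""

    def __init__(self, length=20):
        self.points = deque(maxlen=length)  # oldest points drop off automatically

    def add(self, xy):
        self.points.append(xy)

    def faded(self):
        """Return (x, y, alpha) tuples; older points get a lower alpha."""
        n = len(self.points)
        return [(x, y, (i + 1) / n) for i, (x, y) in enumerate(self.points)]


def bus_shade(index, n_buses, hue=0.6, spread=0.08):
    """Give bus `index` a slightly different shade of blue as 0-255 RGB.

    hue=0.6 lies in the blue part of the HSV wheel; `spread` (assumed) keeps
    the variation between buses small so no single bus dominates visually.
    """
    offset = (index / max(n_buses - 1, 1) - 0.5) * spread
    r, g, b = colorsys.hsv_to_rgb(hue + offset, 0.8, 0.9)
    return round(r * 255), round(g * 255), round(b * 255)


trail = Trail(length=3)
for point in [(0, 0), (1, 1), (2, 2), (3, 3)]:
    trail.add(point)
print(trail.faded())
print([bus_shade(i, 5) for i in range(5)])
```

A small spread keeps every bus recognisably "a bus" while still letting neighbouring trails be told apart, which matches the compromise described above.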

Another interesting experiment was drawing a uniquely coloured line as each bus progresses along its route.


Gentrification in London continued…


Observational analysis of changes to the social structure and housing markets of areas in inner-city London led Glass (1964) to coin the term ‘gentrification’, describing a phenomenon in which the social character of a district is transformed through the displacement of its working-class inhabitants. Subsequent research has broadened the concept to include factors such as the physical regeneration of housing stock and industrial areas.

Above is an example of the output of a function I have been developing in R to support quantitative research in urban planning. The function provides a tool for analysing weighted combinations of different variables and how they have changed over time, which makes it suitable for identifying phenomena such as gentrification that have no clear single identifier.

The map aims to show concentrations of areas that experienced a significantly high change in social class, dwelling stock and property value (dark blue) over the ten-year period between 2001 and 2011. A number of clear clusters appear in the inner east and outer east of the city.
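The R function itself is not shown in this post. As a hedged illustration of the general idea only, the Python sketch below measures each variable's percentage change per area, standardises each change as a z-score so that variables on different scales can be combined, and takes a weighted sum. The function names, the choice of percentage change, z-score standardisation, the weights and the toy numbers are all assumptions, not the author's method or census data.

```python
from statistics import mean, pstdev


def pct_change(before, after):
    """Percentage change per area between two years ({area: value} dicts)."""
    return {a: (after[a] - before[a]) / before[a] * 100 for a in before}


def zscores(values):
    """Standardise {area: value} so variables on different scales can be combined."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    return {a: (v - mu) / sd if sd else 0.0 for a, v in values.items()}


def composite_index(variables, weights):
    """Weighted sum of standardised changes per area.

    variables: {name: (values_2001, values_2011)}
    weights:   {name: weight}, e.g. giving property value more influence
    """
    scored = {n: zscores(pct_change(*years)) for n, years in variables.items()}
    areas = next(iter(scored.values()))
    return {a: sum(weights[n] * scored[n][a] for n in scored) for a in areas}


# Toy, made-up numbers for three hypothetical areas (not census data).
variables = {
    "property_value": ({"A": 100, "B": 100, "C": 100}, {"A": 200, "B": 120, "C": 110}),
    "professional_share": ({"A": 20, "B": 20, "C": 20}, {"A": 30, "B": 21, "C": 20}),
}
weights = {"property_value": 0.6, "professional_share": 0.4}
index = composite_index(variables, weights)
print(sorted(index, key=index.get, reverse=True))  # areas ranked by combined change
```

Areas scoring above a chosen threshold on such an index would then be candidates for the "significantly high change" category; a further variable such as dwelling stock would simply be another entry in `variables`.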

Further information about this function and analysis of results to come.


Data mining with XPath


At the recent CASA Hackathon I was part of a team, led by Steven Gray, that developed a Social Radar (originally planned as a Google Glass app), which retrieves data from Twitter and produces maps. Steve has produced an ergonomic and very useful presentation on how to retrieve these feeds here.

While working on this project I thought that it would be useful for the radar to display what events are currently on in the city.

One useful way to retrieve data when there is no API, or you have no time to learn an API, is by using XPath.

In the simplest terms, XPath is a way to select specified nodes within an XML (or HTML) document.

For example, if you have the following code:

<div class="name">
  <firstname>John</firstname>
  <surname>Smith</surname>
</div>

You could query the ‘firstname’ element (for example with the XPath //div[@class='name']/firstname) and it would return ‘John’.
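A minimal Python sketch of that query, using the standard library's xml.etree.ElementTree. Note that ElementTree supports only a limited subset of XPath and needs well-formed markup; lxml, used in the script later in this post, supports full XPath 1.0 and copes with real-world HTML. The `<body>` wrapper and the `first_name` helper are illustrative additions.

```python
import xml.etree.ElementTree as ET

# The snippet above, wrapped in a <body> element so the <div> is a descendant.
PAGE = """<body>
  <div class="name">
    <firstname>John</firstname>
    <surname>Smith</surname>
  </div>
</body>"""


def first_name(markup):
    root = ET.fromstring(markup)
    # Find a descendant <div> whose class attribute is "name", then its <firstname> child.
    node = root.find(".//div[@class='name']/firstname")
    return node.text if node is not None else None


print(first_name(PAGE))
```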

One can write a script in a language such as Python to automatically retrieve data from webpages, to either display it live or collect it over time.

Google Drive has made such tasks very simple; here is a short tutorial.

Using Google Drive

1. Create a Spreadsheet in Google Drive
2. In Sheet1, fill the first two column headings (A1 and B1) with ‘Source’ and ‘XPath’ respectively.
3. In the Source column (A2), enter the URL of a website containing data you would like to retrieve (e.g. TimeOut Events: http://www.timeout.com/london/search?_source=global&profile=london&_dd=&page_zone=events&keyword=&section=events&on=today&locationText=&_section_search=events ).
4. In the XPath column (B2), enter the XPath query (e.g. //div[@class='topSection']/h3 ). This is the hardest bit; reading this might help.
5. Create Sheet2 and use the importxml function to run the XPath query, e.g. =IMPORTXML(Sheet1!A2, Sheet1!B2)
6. Once you have tried a few of these feeds you can publish the document to the web, or geocode the results if you have chosen a dataset with addresses (such as this TimeOut dataset). Geo for Google Docs is quite useful for this, particularly if you use TileMill/Mapbox. Google Fusion Tables also has good mapping capabilities.
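To see what an XPath like the one in step 4 selects, here is a hedged Python sketch that imitates the spreadsheet workflow offline. `import_xml` is a hypothetical helper, not Google's function: it runs on a well-formed string with ElementTree's limited XPath support rather than fetching a live page, and the sample markup is a made-up, simplified structure (the real TimeOut page may differ).

```python
import xml.etree.ElementTree as ET


def import_xml(markup, xpath):
    """Return the text of every node matching `xpath`, one entry per row."""
    root = ET.fromstring(markup)
    if xpath.startswith("//"):
        # ElementTree paths are relative to an element, so "//x" becomes ".//x".
        xpath = "." + xpath
    return ["".join(node.itertext()).strip() for node in root.findall(xpath)]


# Hypothetical, simplified page structure.
SAMPLE = """<html><body>
  <div class="topSection"><h3>Event one</h3></div>
  <div class="topSection"><h3>Event <b>two</b></h3></div>
  <div class="other"><h3>Not an event</h3></div>
</body></html>"""

for row in import_xml(SAMPLE, "//div[@class='topSection']/h3"):
    print(row)
```

Only the `<h3>` elements inside `topSection` divs are returned, one per row, which is the same shape of result the spreadsheet fills down Sheet2.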

Click here for an example of one of these documents.

Using Python

One limitation of the method above is that it allows a maximum of 50 XPath queries per document (last time I checked). If you are interested in harvesting a large dataset (for example, parsing a real estate website for all houses currently on sale in a city, or collecting listings continuously over time), you can use XPath in Python instead. The following script was produced at the event to retrieve data from TimeOut, clean it, geocode it and write it to a CSV file.

import csv
import json
import urllib.parse
import urllib.request

import lxml.html

# Search results with page_size=1, so every results page holds a single event.
SEARCH_URL = ("http://www.timeout.com/london/search?language=en_GB&profile=london"
              "&order=popularity&page={page}&page_size=1&source=admin&type=event"
              "&on=today&_section_search=&section=")
# Google's Geocoding API now requires an API key; YOUR_API_KEY is a placeholder.
GEOCODE_URL = ("https://maps.googleapis.com/maps/api/geocode/json"
               "?address={address}&region=uk&key=YOUR_API_KEY")


def clean(text):
    """Collapse runs of whitespace (newlines, tabs, indents) into single spaces."""
    return " ".join(text.split())


def geocode(address):
    """Return (lat, lng) of the first geocoding result, or (None, None)."""
    url = GEOCODE_URL.format(address=urllib.parse.quote(address))
    with urllib.request.urlopen(url) as resp:
        results = json.loads(resp.read().decode("utf-8")).get("results", [])
    if not results:
        return None, None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


invalid = 0  # consecutive results pages that could not be parsed
with open("timeout-london.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["title", "time", "venue", "location", "lat", "lon"])
    print("title;time;venue;location;lat;lon")
    for num in range(1, 500):
        with urllib.request.urlopen(SEARCH_URL.format(page=num)) as page:
            doc = lxml.html.document_fromstring(page.read())
        try:
            # Event title: the link inside the <h3>.
            title = clean(doc.xpath("//h3/a")[0].text_content())
            # The second <h4> holds "time | venue | area".
            parts = doc.xpath("//h4")[1].text_content().split("|")
            time, loc_name, loc_area = (clean(p) for p in parts[:3])
        except (IndexError, ValueError):
            invalid += 1
            if invalid >= 5:
                break  # stop after 5 consecutive empty or malformed pages
            continue
        invalid = 0
        lat, lon = geocode(loc_name + ", " + loc_area + ", London")
        row = [title, time, loc_name, loc_area, lat, lon]
        writer.writerow(row)
        print(";".join(str(value) for value in row))

The output looks something like this, with lat/long columns suitable for mapping.

New Year's Eve 2013 Firework Display;Tue Dec 31;EDF Energy London Eye;Waterloo;51.5033;-0.11475
Christmas at Kew Gardens 2013;Until Sat Jan 4 2014;Kew Gardens;Kew, Surrey;51.4782342;-0.2984129
Carnaby Christmas 2013: The Year of the Robin;Until Mon Jan 6 2014;Carnaby Street;Soho;51.5148445;-0.1413416
The Book of Mormon;Until Sat Apr 5 2014;Prince of Wales Theatre;Leicester Square;51.51121389999999;-0.1198244
Coriolanus;Until Sat Feb 8 2014;Donmar Warehouse;Leicester Square;51.51121389999999;-0.1198244
Mojo;Until Sat Feb 8 2014;Harold Pinter Theatre;Trafalgar Square;51.51121389999999;-0.1198244
Winter Lights at Canary Wharf 2013;Tue Dec 31 - Sat Jan 25 2014;Canary Wharf;Docklands;51.50755299999999;-0.024526
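Mapping tools such as TileMill and MapBox accept GeoJSON, so one hedged way to get from this semicolon-separated output to a map layer is a short conversion like the one below. `to_geojson` is a hypothetical helper, and the two input lines are copied from the output above.

```python
import json

LINES = """New Year's Eve 2013 Firework Display;Tue Dec 31;EDF Energy London Eye;Waterloo;51.5033;-0.11475
Christmas at Kew Gardens 2013;Until Sat Jan 4 2014;Kew Gardens;Kew, Surrey;51.4782342;-0.2984129"""


def to_geojson(lines):
    """Turn 'title;time;venue;location;lat;lon' lines into a GeoJSON FeatureCollection."""
    features = []
    for line in lines.strip().splitlines():
        title, time, venue, location, lat, lon = line.split(";")
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": {"title": title, "time": time,
                           "venue": venue, "location": location},
        })
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(to_geojson(LINES), indent=2))
```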

I hope this code and these examples prove useful for anyone trying to retrieve data from difficult webpages where no feeds are available. If you would like to adapt this code for other webpages, feel free to contact me or try it yourself. Here is a quick example:

(Example social event map using code from this tutorial; data: TimeOut London, map: MapBox)