
Data mining with XPath


At the recent CASA Hackathon I was part of a team, led by Steven Gray, that developed a Social Radar (originally conceived as a Google Glass app), retrieving data from Twitter and producing maps. Steve has produced a clear and very useful presentation on how to retrieve these feeds here.

While working on this project I thought that it would be useful for the radar to display what events are currently on in the city.

One useful way to retrieve data when there is no API, or you have no time to learn an API, is by using XPath.

Put simply, XPath is a way to retrieve specified nodes from XML (or HTML) pages.

For example, if you have the following code:

<div class="name">
  <firstname>John</firstname>
  <surname>Smith</surname>
</div>

You could query the ‘firstname’ element and it would return ‘John’.

One can write a script in a language such as Python to automatically retrieve data from webpages, to either display it live or collect it over time.
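As a quick illustration, here is a minimal sketch in Python using the standard library's xml.etree module, which supports a useful subset of XPath (the lxml library used later in this post supports the full language):

```python
import xml.etree.ElementTree as ET

# The snippet from above, as well-formed XML
snippet = '<div class="name"><firstname>John</firstname><surname>Smith</surname></div>'

root = ET.fromstring(snippet)
# './/firstname' selects any <firstname> node below the root
firstname = root.find('.//firstname').text
print(firstname)  # John
```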

Google Drive has made it very simple to undertake such tasks, and here I will post a short tutorial.

Using Google Drive

1. Create a Spreadsheet in Google Drive
2. In Sheet1 fill the first two column headings (A1, B1) with ‘Source’ and ‘XPath’ respectively.
3. In the Source column (A2), put the URL of a website with some data you would like to retrieve (e.g. TimeOut Events : http://www.timeout.com/london/search?_source=global&profile=london&_dd=&page_zone=events&keyword=&section=events&on=today&locationText=&_section_search=events )
4. In the XPath column (B2), place the XPath query (e.g. //div[@class='topSection']/h3 ). This is the hardest bit – reading this might help.
5. Create Sheet2, and use the importxml function to run the XPath query, e.g. =importxml(Sheet1!A2, Sheet1!B2)
6. Once you have tried a few of these feeds you can publish the document to the web, or geocode the results if you have chosen a dataset with addresses (such as this TimeOut dataset). Geo for Google Docs is quite useful for this, particularly if you use TileMill/Mapbox. Google Fusion tables also has good mapping capabilities.
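Writing the XPath in step 4 is easier if you can test it quickly outside the spreadsheet first. A minimal sketch, again using Python's built-in xml.etree module, against an invented HTML fragment that mirrors the listing markup (the real page structure may differ):

```python
import xml.etree.ElementTree as ET

# Invented fragment mirroring the event-listing markup in step 4
page = """
<html><body>
  <div class="topSection">
    <h3>New Year's Eve Fireworks</h3>
  </div>
  <div class="topSection">
    <h3>Christmas at Kew Gardens</h3>
  </div>
</body></html>
"""

root = ET.fromstring(page)
# The same query as in step 4: //div[@class='topSection']/h3
titles = [h3.text for h3 in root.findall(".//div[@class='topSection']/h3")]
print(titles)  # a list of the two event titles matched by the query
```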

Click here for an example of one of these documents.

Using Python

One limitation of the method above is that it is capped at a maximum of 50 XPath queries (last time I checked) per document. If you are interested in harvesting a large dataset (for example, parsing through a real estate website for all houses currently on sale in a city, or collecting data continuously over time), you can use XPath from Python instead. The following script was produced at the event to retrieve data from TimeOut, clean it, geocode it and write it to a CSV file.

import csv
import json
import urllib
import urllib2

import lxml.html

with open('timeout-london.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=';', quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['title', 'time', 'venue', 'location', 'lat', 'lon'])

    invalid = 0  # consecutive pages with no usable event
    for num in range(1, 500):
        if invalid >= 5:  # assume we have run out of results
            break

        url = ("http://www.timeout.com/london/search?"
               "language=en_GB&profile=london&order=popularity"
               "&page=" + str(num) + "&page_size=1&source=admin"
               "&type=event&on=today&_section_search=&section=")
        page = urllib.urlopen(url)
        doc = lxml.html.document_fromstring(page.read())

        titles = doc.xpath("//h3/a")
        details = doc.xpath("//h4")
        if not titles or len(details) < 2:
            invalid += 1
            continue

        title = titles[0].text_content()
        # The second <h4> holds "time | venue | area", pipe-separated
        breakdown = details[1].text_content().split('|')
        if len(breakdown) < 3:
            invalid += 1
            continue
        invalid = 0
        time = ' '.join(breakdown[0].split())      # collapse whitespace
        loc_name = ' '.join(breakdown[1].split())
        loc_area = ' '.join(breakdown[2].split())

        # Geocode "<venue>, <area>, London" with the Google Geocoding API
        address = urllib2.quote(loc_name + ', ' + loc_area + ', London')
        geocode_url = ("http://maps.googleapis.com/maps/api/geocode/json"
                       "?address=%s&sensor=false&region=uk" % address)
        response = json.loads(urllib2.urlopen(geocode_url).read())
        if not response['results']:
            continue
        location = response['results'][0]['geometry']['location']

        row = [title, time, loc_name, loc_area,
               location['lat'], location['lng']]
        writer.writerow(row)
        print ';'.join(str(field) for field in row)

The output looks something like this, with lat/long columns suitable for mapping.

New Year's Eve 2013 Firework Display;Tue Dec 31;EDF Energy London Eye;Waterloo;51.5033;-0.11475
Christmas at Kew Gardens 2013;Until Sat Jan 4 2014;Kew Gardens;Kew, Surrey;51.4782342;-0.2984129
Carnaby Christmas 2013: The Year of the Robin;Until Mon Jan 6 2014;Carnaby Street;Soho;51.5148445;-0.1413416
The Book of Mormon;Until Sat Apr 5 2014;Prince of Wales Theatre;Leicester Square;51.51121389999999;-0.1198244
Coriolanus;Until Sat Feb 8 2014;Donmar Warehouse;Leicester Square;51.51121389999999;-0.1198244
Mojo;Until Sat Feb 8 2014;Harold Pinter Theatre;Trafalgar Square;51.51121389999999;-0.1198244
Winter Lights at Canary Wharf 2013;Tue Dec 31 - Sat Jan 25 2014;Canary Wharf;Docklands;51.50755299999999;-0.024526

I hope this code and these examples prove useful for anyone trying to retrieve data from difficult pages on the web where no feeds are available. If you would like to adapt this code for other webpages, feel free to contact me or try it yourself. Quick example:

(example Social Event map using code from this tutorial ; data: TimeOut London, map: MapBox)

Re.Work Cities

Today I was fortunate enough to have attended a RE.WORK Cities conference at the stunning Tobacco Dock in London.

It was a great opportunity to meet people from fields as diverse as architecture, economics, data security, programming, planning and social work, all with a shared, positive interest in making our cities better.

The conference covered topics as diverse as urban mobility, the internet of things, sensors, urban interaction design, synthetic biology and 3D printing.

While there were many highlights, some topics in particular stood out.

1) Getting there & shared mobility

Part of the conference included a voucher to use Uber. At first glance Uber seems like a regular taxi service: one rings up, asks for a taxi, and the taxi comes and takes you from A to B. However, experiencing it first-hand makes you realise the difference. The whole process was seamless.

One simply opens the app: your location is automatically detected, the fare is estimated once you enter a destination, and all nearby drivers are mapped. Once ‘Go’ was pressed, I only had to wait a few minutes until an SMS arrived informing me of the sleek, black car waiting outside. The journey was minimal fuss – just a confirmation of the destination, with the fare charged automatically upon arrival.

This experience set the tone for the rest of the day. From this well-integrated process it was clear that some of the topics mentioned, such as driverless cars incorporated with car-sharing schemes, are not impossible future scenarios: they are quite achievable with only minor modifications to the user-friendly, yet clearly comprehensive, dispatch system offered by this application.

2) Heal-able materials

Erik Schlangen‘s self-healing asphalt and concrete could not only reduce maintenance cost and time but, implemented on a much larger scale, reduce transport disruption and the cost of repairing buildings (such as affordable housing).


Source : BBC

3) Drones

Earlier this year I was exposed to drone delivery through this news article.

China grounds world’s first CAKE DRONES over fears they might fall on someone’s head as novelty delivery service goes from sweet to sour

While this use of drones was a bit tongue-in-cheek, drones were brought up a few times in the conference.

For example, for the creation of parametric structures and for assisted navigation.

4) Changing nature of work / Future Londoners


Globalisation and the Internet are changing the traditional notions of work and work hours. Work-life patterns are changing and no longer fit the same paradigms. The Future Londoners project, for example, imagines some of these future citizens and how they live and work.

5) New approaches to sustainability

Collecting the carbon pollution and using it as a resource for cities – creating structures from thin air.

There were many more new ideas and projects that I would like to write about in future – hopefully a broadcast of the event comes out to be shared.

Introductory Post

City Informatics, this blog, will share research and thoughts on many fields – from urban planning, design and architecture to human-computer interaction, spatial information and analysis to social media and transport systems. While these are diverse topics this blog will draw upon their intersections – at people, place and technology.

My interest in city planning developed through studying the city where I grew up, Melbourne. Melbourne is an interesting case; the world continuously anoints it one of the most liveable cities, yet the public ceaselessly criticises all aspects of its transport infrastructure.

I have studied Urban Design & Planning and Informatics at the University of Melbourne. Through this I developed a strong interest in solving urban problems – food security, climate change, overpopulation – with a continuous lens towards sustainability. The interaction between people and technology also drives me – particularly the information broadcast from these interactions, which decision-makers can use to manage urban problems, and which smart citizens can use to solve them individually and collectively.

I now live and study in one of the world’s most global cities, London, working towards a Master of Research in Advanced Spatial Analysis and Visualisation. This is at CASA – an innovative and exciting lab combining the skills of computer scientists, mathematicians, urban planners and architects to use and develop the latest visualisation and modelling technologies, providing knowledge and insights for tackling these urban issues.

For example, see this data visualisation of train journeys in London, Pulse of the City, from Jon Reades on Vimeo.

Through this blog I would like to share and discuss my thoughts on this field with readers, as well as use it as a platform to share and discuss others’ thoughts and work that I find interesting or useful. Where possible I would also like to provide tips for undertaking this kind of work, as many times I have found very valuable information on blogs – for free!

If you would like to get in contact, please email me at oliverclock@gmail.com or tweet me at @oc_lock.