Exploring Arlington's Bikeometer Data
Arlington, Virginia has a great Bikeometer on the Custis Trail in Rosslyn. This tall digital sign always catches my attention when I ride by. It displays the number of bicycles that pass the sign daily, monthly, and year to date. I discovered that this data is also available online, so I decided to use my Python skills to import it, clean it up, and do some basic analysis.
According to Arlington's website, the Rosslyn Bikeometer was the first Eco-Totem on the East Coast when it launched on April 1, 2014. It sits on westbound Lee Highway near N Lynn Street in Rosslyn and uses four pavement loops for bicycle-only detection, displaying counts in real time with direction detection. You can find out more about it at http://www.bikearlington.com/pages/biking-in-arlington/counting-bikes-to-plan-for-bikes/about-the-counters/.
My first step in this project was to start this Python 3.5 Jupyter Notebook. Then I imported the libraries I planned to use for this project:
import pandas as pd
import requests
from xml.etree import ElementTree
import numpy as np
Then I created an API query to get daily totals from the Rosslyn Bikeometer (counter number 28) for bicycles traveling in the inbound direction, from April 1, 2014 to January 1, 2017. For this first look at the data, I decided to pull only the daily totals. The following code uses Requests to fetch that URL and save the response to a local file, xml_data.xml. Arlington provides the details of their API at http://www.bikearlington.com/tasks/sites/bike/assets/File/Regional_bikearlington_webservices.pdf.
url = "http://webservices.commuterpage.com/counters.cfc?wsdl&method=GetCountInDateRange&counterid=28&startDate=04/01/2014&endDate=01/01/2017&direction=I&mode=B&interval=d"
xmldata = requests.get(url)
with open('xml_data.xml', 'w') as xmlfile:
    xmlfile.write(xmldata.text)
xml_data = 'xml_data.xml'
I took the XML data file and parsed it with ElementTree.
document = ElementTree.parse(xml_data)
After parsing the XML file, I pulled out the "count" data using document.find().
cnt = document.find('count')
Then you can extract the data using the attributes of each count element. These include count, date, direction, and mode. For now, I only need count and date.
cnt.attrib['count']
cnt.attrib['date']
cnt.attrib['direction']
date = []
count = []
for day in document.findall('count'):
    date.append(day.attrib['date'])
    count.append(day.attrib['count'])
dfbikes = pd.DataFrame({'date' : date, 'count': count})
dfbikes['date'] = pd.to_datetime(dfbikes.date)
dfbikes.head()
The above dataframe can then be merged with other data for each of the dates listed. Making the date field the index speeds up this process, because pandas can then align rows on date automatically.
dfbikes = dfbikes.set_index("date")
dfbikes.head()
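To illustrate that index alignment with toy numbers (not the Bikeometer feed), two frames that share a DatetimeIndex line up row by row when concatenated:

```python
import pandas as pd

# Two small frames sharing a DatetimeIndex (made-up values for illustration)
idx = pd.to_datetime(["2014-04-01", "2014-04-02", "2014-04-03"])
counts = pd.DataFrame({"count": [120, 95, 143]}, index=idx)
temps = pd.DataFrame({"temperature": [55.0, 61.0, 48.0]}, index=idx)

# Columns are joined by matching dates in the shared index
combined = pd.concat([counts, temps], axis=1)
print(combined)
```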
Adding in Weather Data
People typically consider weather conditions before deciding to take a bike trip. Thankfully, Arlington County also provides easy access to their local weather data via API. The API details are provided below.
Source: http://webservices.commuterpage.com/weatherdata.cfc?wsdl&method=GetCodes
<weather_codes>
<code name="dewpoint" binary="No" cumulative="No" units="°F"/>
<code name="fog" binary="Yes" cumulative="No" units=""/>
<code name="hail" binary="Yes" cumulative="No" units=""/>
<code name="rain" binary="Yes" cumulative="No" units=""/>
<code name="snow" binary="Yes" cumulative="No" units=""/>
<code name="thunder" binary="Yes" cumulative="No" units=""/>
<code name="tornado" binary="Yes" cumulative="No" units=""/>
<code name="humidity" binary="No" cumulative="No" units="%"/>
<code name="pressure" binary="No" cumulative="No" units="inHg"/>
<code name="temperature" binary="No" cumulative="No" units="°F"/>
<code name="visibility" binary="No" cumulative="No" units="miles"/>
<code name="windspeed" binary="No" cumulative="No" units="mph"/>
<code name="precipitation" binary="No" cumulative="Yes" units="inches"/>
</weather_codes>
Description: Returns the weather data based on the requested weather code and the date.

Request Fields:
• dates: string value, required field. A range of dates to retrieve the weather data; acceptable format yyyymmdd-yyyymmdd.
• code: string value, required field. Supply one of the available weather codes from the GetCodes response.
• function: string value, optional. Function to perform on the data (avg, sum, min, max); defaults to avg.
• hours: string value, optional. Hours in 24-hour format (0-23); may be a comma-delimited list, hyphen-separated range, or the special keywords "all" or "daytime".
• dayLimiter: numeric value, optional. Day of the week: Sun to Sat – 1 to 7 respectively, weekends – 8, Sat & Sun – 9, All – 0. Defaults to All (0).

Example: http://webservices.commuterpage.com/weatherdata.cfc?wsdl&method=GetData&dates=20110301-20110305&code=temperature
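Those request fields can also be assembled programmatically. The helper below is my own convenience sketch (build_weather_url is a hypothetical name, not part of Arlington's API) that fills in the documented parameters:

```python
# Hypothetical helper (not part of Arlington's API) that builds a
# GetData request URL from the documented request fields.
BASE = "http://webservices.commuterpage.com/weatherdata.cfc"

def build_weather_url(dates, code, function="avg", hours="all", day_limiter=0):
    """dates: a 'yyyymmdd-yyyymmdd' range; code: one of the weather codes above."""
    return (BASE
            + "?wsdl&method=GetData"
            + "&dates=" + dates
            + "&code=" + code
            + "&function=" + function
            + "&hours=" + str(hours)
            + "&dayLimiter=" + str(day_limiter))

print(build_weather_url("20140401-20170101", "temperature"))
```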
weatherapi_url = "http://webservices.commuterpage.com/weatherdata.cfc?wsdl&method=GetData&dates=20140401-20170101&code=temperature"
xmldata = requests.get(weatherapi_url)
with open('xml_temps.xml', 'w') as xmlfile:
    xmlfile.write(xmldata.text)
xml_temps = 'xml_temps.xml'
document = ElementTree.parse(xml_temps)
date = []
temp = []
for day in document.findall('dataset/date'):
    date.append(day.attrib['value'])
for day in document.findall('dataset/date/data'):
    temp.append(day.attrib['value'])
dftemperature = pd.DataFrame({'date' : date, 'temperature' : temp})
dftemperature['date'] = pd.to_datetime(dftemperature.date)
dftemperature = dftemperature.set_index("date")
dftemperature.head()
pd.concat([dfbikes, dftemperature], axis=1).head()
Add in wind and precipitation
weatherapi_url = "http://webservices.commuterpage.com/weatherdata.cfc?wsdl&method=GetData&dates=20140401-20170101&code=windspeed"
xmldata = requests.get(weatherapi_url)
with open('xml_wind.xml', 'w') as xmlfile:
    xmlfile.write(xmldata.text)
xml_wind = 'xml_wind.xml'
document = ElementTree.parse(xml_wind)
date = []
wind = []
for day in document.findall('dataset/date'):
    date.append(day.attrib['value'])
for day in document.findall('dataset/date/data'):
    wind.append(day.attrib['value'])
dfwind = pd.DataFrame({'date' : date, 'windspeed' : wind})
dfwind['date'] = pd.to_datetime(dfwind.date)
dfwind = dfwind.set_index("date")
dfwind.head()
weatherapi_url = "http://webservices.commuterpage.com/weatherdata.cfc?wsdl&method=GetData&dates=20140401-20170101&code=precipitation"
xmldata = requests.get(weatherapi_url)
with open('xml_precip.xml', 'w') as xmlfile:
    xmlfile.write(xmldata.text)
xml_precip = 'xml_precip.xml'
document = ElementTree.parse(xml_precip)
date = []
precip = []
for day in document.findall('dataset/date'):
    date.append(day.attrib['value'])
for day in document.findall('dataset/date/data'):
    # 'T' marks trace precipitation; treat it as zero
    if day.attrib['value'] != 'T':
        precip.append(day.attrib['value'])
    else:
        precip.append(0.0)
dfprecip = pd.DataFrame({'date' : date, 'precipitation' : precip})
dfprecip['date'] = pd.to_datetime(dfprecip.date)
dfprecip = dfprecip.set_index("date")
dfprecip.head()
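The temperature, wind, and precipitation cells above repeat the same download-and-parse steps, so they could be factored into one helper. Below is a sketch of the parsing half, applied to a made-up XML snippet that mirrors the API's dataset/date/data shape (the root element name here is my own guess):

```python
import pandas as pd
from xml.etree import ElementTree

def weather_to_frame(xml_text, column):
    """Parse the API's dataset/date/data layout into a date-indexed frame.
    'T' (trace precipitation) is treated as 0.0, as in the cells above."""
    root = ElementTree.fromstring(xml_text)
    dates, values = [], []
    for day in root.findall('dataset/date'):
        dates.append(day.attrib['value'])
        raw = day.find('data').attrib['value']
        values.append(0.0 if raw == 'T' else float(raw))
    df = pd.DataFrame({'date': dates, column: values})
    df['date'] = pd.to_datetime(df.date)
    return df.set_index('date')

# Tiny made-up sample mirroring the API's structure
sample = """<response><dataset>
  <date value="2014-04-01"><data value="0.12"/></date>
  <date value="2014-04-02"><data value="T"/></date>
</dataset></response>"""
print(weather_to_frame(sample, 'precipitation'))
```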
bikedataframe = pd.concat([dfbikes, dftemperature, dfwind, dfprecip], axis=1)
bikedataframe.head()
bikedataframe.index[0].weekday_name
bikedataframe["dayofweek"] = bikedataframe.index.dayofweek
bikedataframe = bikedataframe.astype(float)
bikedataframe['weekday'] = np.where(bikedataframe['dayofweek'] < 5, 1, 0)
bikedataframe.head(14)
bikedataframe.mean()
As you can see from the mean data above, the average number of inbound bike trips per day was 426. It would then be interesting to see which days of the week had higher or lower than average bike usage.
bikedataframe.groupby('dayofweek').mean()
It looks like Monday (dayofweek = 0) through Thursday (dayofweek = 3) have higher than average usage, and Saturday (dayofweek = 5) has the lowest average number of bike trips. It seems there might be a significant difference between weekday and weekend bike usage on the Custis Trail.
week_versus_end = bikedataframe.groupby('weekday')
week_versus_end['count'].mean()
The average daily weekday number of inbound bike trips was 458 versus just 347 per day on the weekends.
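One way to check whether a gap like that could be due to chance is a permutation test. The sketch below uses synthetic counts centered on the two means above (not the real trail data) so it stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily counts standing in for the real data:
# weekdays centered near 458, weekends near 347 (the means above)
weekday_counts = rng.normal(458, 120, size=400)
weekend_counts = rng.normal(347, 120, size=160)

observed = weekday_counts.mean() - weekend_counts.mean()

# Permutation test: shuffle the weekday/weekend labels many times
# and see how often a gap at least this large appears by chance
pooled = np.concatenate([weekday_counts, weekend_counts])
n_weekday = len(weekday_counts)
gaps = []
for _ in range(2000):
    rng.shuffle(pooled)
    gaps.append(pooled[:n_weekday].mean() - pooled[n_weekday:].mean())
p_value = np.mean(np.array(gaps) >= observed)
print(observed, p_value)
```

With a gap this large relative to the day-to-day spread, the shuffled labels essentially never reproduce it.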
week_versus_end.min()
week_versus_end.max()
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
bp = week_versus_end.boxplot(column = 'count', return_type='axes')
The boxplot on the left clearly shows that on average there are fewer bike trips on weekend days than on weekdays. However, some weekend-day outliers are nearly as high as the weekday extreme usage days. There seems to be a daily maximum capacity for the trail of around 1,000 inbound trips, which makes me think some of these extremely high outliers are associated with events.
My next steps will be to explore these extreme outliers and dig further into the temperature, wind, and precipitation data. Ultimately, I want to build a model that can predict daily bike trips on this part of the Custis Trail.
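As a starting point for that outlier hunt, one common rule flags days above Q3 + 1.5 × IQR (the same fence the boxplot whiskers use). The sketch below applies it to toy counts with a couple of event-like spikes (hypothetical numbers, not the trail data):

```python
import pandas as pd

# Toy daily counts (hypothetical) with two event-like spikes
counts = pd.Series([420, 455, 398, 510, 980, 430, 405, 1020, 445],
                   index=pd.date_range("2016-06-01", periods=9))

# IQR fence: anything above Q3 + 1.5 * (Q3 - Q1) is an outlier candidate
q1, q3 = counts.quantile(0.25), counts.quantile(0.75)
threshold = q3 + 1.5 * (q3 - q1)

outliers = counts[counts > threshold]
print(outliers)
```

The dates that survive this filter would be the ones worth cross-checking against local event calendars.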