In my last blog post, I talked about the practically unlimited source of data you can access through web scraping. Most of the time, though, this data is very unstructured and requires a good bit of clean-up time.
Alternatively, you could connect to free public Application Programming Interfaces (APIs), which also open up all sorts of new data sources. Once you have a bit of understanding of what an API is, they aren’t that difficult to use, so it’s worth taking the time to learn! In this week’s blog, I’d like to show how you can connect to an API, download data from it and parse through it using a couple of tools I recently came across in Alteryx.
What is an API?
Basically, imagine you have Computer 1 and Computer 2. Computer 1 wants some data. Computer 2 has this data. What form of communication can they use to exchange this data? Through an API! Let’s explain this in a little more detail.
APIs work using ‘requests’ and ‘responses.’ Computer 1 asks for data by sending a request, most commonly a GET request. The place the request is sent to, and where the data resource lives, is called an endpoint, which is essentially the end of Computer 2’s communication channel. Most of the time, an endpoint is identified by the URL of a server or service, better known as its web address. So when Computer 1 sends a request to an endpoint URL on Computer 2, it receives a ‘response’, which is the data being sent back to Computer 1.
After researching and reading up a little more about APIs, I became curious about how to use them to gain access to archived data. After a little digging within the Alteryx and Information Lab community, I quickly found my answer!
For this example, I want to get the location coordinates for all of the train stations in Ireland. I can easily download this information by connecting to Irish Rail’s Realtime API. The information page for the various Irish Rail API URL endpoints can be found here:
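To make the request and response idea concrete, here is a minimal Python sketch. It is not part of the Alteryx workflow: it starts a tiny local server to play the role of Computer 2, and the `/stations` endpoint path and the XML payload are made up for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical payload that "Computer 2" serves; not real Irish Rail data.
PAYLOAD = b"<stations><station>Example</station></stations>"


class StationHandler(BaseHTTPRequestHandler):
    """Plays the role of Computer 2: answers GET requests at one endpoint."""

    def do_GET(self):
        # The endpoint path is an assumption invented for this sketch.
        if self.path == "/stations":
            self.send_response(200)
            self.send_header("Content-Type", "text/xml")
            self.send_header("Content-Length", str(len(PAYLOAD)))
            self.end_headers()
            self.wfile.write(PAYLOAD)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def get(url):
    """Plays the role of Computer 1: send a GET request, return (status, body)."""
    opener = build_opener(ProxyHandler({}))  # ignore any system proxy settings
    with opener.open(url, timeout=10) as response:
        return response.status, response.read().decode("utf-8")


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), StationHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return get(f"http://127.0.0.1:{port}/stations")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo()
    print(status, body)
```

With a real public API you would only need the `get` part, pointed at the endpoint URL the provider documents.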
Using Alteryx to Parse an XML file from Irish Rail Realtime API
For this example I just want to gather the coordinates of all the train stations in Ireland using the API endpoint below:
When we click the link above we get a “tree view” display of the XML file here. Although it may be a little hard to read, there is a logic to it. A file with the XML file extension is an Extensible Markup Language (XML) file. These are really just plain text files that use custom tags to describe the structure and other features of the document. Here we can see the tags wrapping each of the values we want, highlighted in red, such as station name, latitude, longitude, station code and station ID.
At first glance you might be frantically wondering what sort of complex regular expression (regex) you will need to parse all this data in Alteryx and make it amenable to analysis. Whilst regex is an option for parsing XML data into individual fields (which you can learn more about from Nikita’s video lesson here), we can also use a much simpler tool in Alteryx: the XML Parse tool.
Before we apply this tool, we must first read the data into Alteryx by dropping the URL link into a Text Input tool and naming the field URL, in keeping with good naming practice. In addition, reading in and downloading data from an API can be a long process, depending on how quickly the API responds. So whenever you are retrieving data from a web API, it’s a good idea to add a Block Until Done tool after the Text Input tool. This tool ensures that the queried API data is not parsed or processed further downstream until ALL of the records have been successfully read into the workflow.
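If it helps to see the tag structure outside the browser, here is a small Python sketch that walks a made-up XML sample. The tag names and values are assumptions for illustration and will not match the real Irish Rail feed exactly.

```python
import xml.etree.ElementTree as ET

# Made-up sample; tag names and values are assumptions, not the real feed.
SAMPLE_XML = """<stations>
  <station>
    <name>Station A</name>
    <latitude>53.1</latitude>
    <longitude>-6.2</longitude>
    <code>STA</code>
    <id>1</id>
  </station>
  <station>
    <name>Station B</name>
    <latitude>52.3</latitude>
    <longitude>-8.4</longitude>
    <code>STB</code>
    <id>2</id>
  </station>
</stations>"""


def describe(xml_text):
    """Return, for each record under the root, its (tag, value) pairs."""
    root = ET.fromstring(xml_text)
    return [[(child.tag, child.text) for child in record] for record in root]


if __name__ == "__main__":
    for record in describe(SAMPLE_XML):
        print(record)
```

Each inner list is one station, and each pair is a tag with the value it wraps, which is the same structure the tree view shows.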
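For comparison, here is a rough Python sketch of the two approaches on a made-up sample (the tag names are assumptions, not the real feed). This is only an analogy for what happens in Alteryx, not how the XML Parse tool is implemented.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample; tag names and values are assumptions for illustration.
SAMPLE_XML = (
    "<stations>"
    "<station><name>Station A</name>"
    "<latitude>53.1</latitude><longitude>-6.2</longitude></station>"
    "<station><name>Station B</name>"
    "<latitude>52.3</latitude><longitude>-8.4</longitude></station>"
    "</stations>"
)


def names_with_regex(xml_text):
    """Regex route: one pattern per field, and it knows nothing about nesting."""
    return re.findall(r"<name>(.*?)</name>", xml_text)


def rows_with_parser(xml_text):
    """Parser route: each <station> element becomes one row of named fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for station in root.iter("station"):
        rows.append(
            {
                "name": station.findtext("name"),
                "latitude": float(station.findtext("latitude")),
                "longitude": float(station.findtext("longitude")),
            }
        )
    return rows


if __name__ == "__main__":
    print(names_with_regex(SAMPLE_XML))
    print(rows_with_parser(SAMPLE_XML))
```

The regex version needs a new pattern for every field, while the parser version reads the fields straight from the document structure, which is why a dedicated XML tool tends to be the simpler choice.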
Final Workflow and Output
To see how this overall process should appear, here’s an image of the final workflow below.
Below is the final output file containing the coordinates for all 157 train stations across Ireland.
The CSV file of this data can be downloaded here if anyone is interested in doing some spatial analysis on the Irish transportation system in Tableau! If you have any ideas, we’d love for you to share them with us on Twitter.
See if you can find some interesting API data and play around with it in Alteryx!
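If you ever want to produce a similar CSV outside Alteryx, a minimal Python sketch could look like the one below. The rows and column names are made up for illustration and are not the actual output of the workflow.

```python
import csv
import io

# Made-up rows; the column names are assumptions, not the workflow's output.
ROWS = [
    {"name": "Station A", "latitude": 53.1, "longitude": -6.2},
    {"name": "Station B", "latitude": 52.3, "longitude": -8.4},
]


def rows_to_csv(rows):
    """Serialise a list of station dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["name", "latitude", "longitude"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(ROWS))
```

A file with a name column plus latitude and longitude columns is the kind of layout Tableau can usually plot on a map.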