To start hacking data and metadata (that’s data about data – now we’re getting meta), we’ll begin by isolating files on the web.
You might be familiar with file paths – the trail to a file’s location. Each stop along that trail is a “directory.” If you aren’t familiar with these terms, it’s easiest to think of directories as folders on your computer.
In this example, I’m highlighting a file – a PDF – called “trailMap.” This file is in my “NICAR pres” folder, which is in my Desktop folder, and so on. You can see the file path at the bottom of the window, tracing the file all the way back to “Macintosh HD,” which is my hard drive. So writing out its file path would be:
Macintosh HD/Users/samanthasunne/Desktop/NICAR pres/trailMap.pdf
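If you’d rather let a computer read the trail, here is a minimal sketch in Python that splits the file path above into its folders, file name and file type. It uses the standard `pathlib` module; the variable names are just for illustration.

```python
from pathlib import PurePosixPath

# The example file path from above, written exactly as shown.
path = PurePosixPath("Macintosh HD/Users/samanthasunne/Desktop/NICAR pres/trailMap.pdf")

# Every stop on the trail, from the hard drive down to the file itself.
for part in path.parts:
    print(part)

print("File name:", path.name)                   # trailMap.pdf
print("File type:", path.suffix)                 # .pdf
print("Containing folder:", path.parent.name)    # NICAR pres
```

`PurePosixPath` only inspects the text of the path, so this works even if the file doesn’t exist on your computer.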
Files on the web follow the same pattern. In fact, that’s what URLs are. Let’s find the URL (a.k.a. web address or file path) of this image:
To find the file path, right-click on the picture and select “Open Image in New Tab.” Then we look at the URL in the address bar:
What do we learn from this web address? Let’s read from right to left:
- File type. This file ends in .jpg, meaning it’s a JPEG, a type of image file. If it were a video file, it might be a .mov or an .mp4; if it were audio, it might be a .wav or an .mp3. If you aren’t familiar with the file type at the end of the file path, just google it.
- File name. Whoever uploaded this photo (me) named it “capitol-300×237.” We can guess it’s a photo of a capitol with a size of 300 pixels by 237 pixels.
- File path. This text: “wp-content/uploads/2012/07/” indicates the series of folders that contain the file. The “2012/07” folders suggest it was uploaded in July 2012.
- Domain and content management system. The umbrella for all these folders is the domain, “www.samanthasunne.com,” which is pretty simple. Others can be more complicated. Facebook images, for example, are stored on “scontent-b-iad.xx.fbcdn.net.” Even then, the “fb” in there hints that the host is related to Facebook. The “wp-” at the beginning of my content folder hints that I’m using WordPress as my content management system (CMS). A content management system is just what it sounds like: a program that helps people organize and publish content.
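The same right-to-left reading can be sketched in Python. This is a rough illustration, not a definitive tool: the full URL below is pieced together from the parts described above, and the “http” scheme and the plain “x” in the file name are my assumptions. The function name `describe_url` is made up for this example, and the size and WordPress checks are only guesses based on naming habits.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed URL, assembled from the pieces described above.
# The "http" scheme and the plain "x" in the file name are guesses.
URL = "http://www.samanthasunne.com/wp-content/uploads/2012/07/capitol-300x237.jpg"


def describe_url(url):
    """Break a file URL into the clues we read off by eye."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    # Folders between the domain and the file, skipping the leading "/".
    folders = [p for p in path.parent.parts if p != "/"]
    info = {
        "file_type": path.suffix.lstrip(".").lower(),
        "file_name": path.stem,
        "folders": folders,
        "host": parts.netloc,
        # A "wp-" folder is only a hint of WordPress, not proof.
        "looks_like_wordpress": bool(folders) and folders[0].startswith("wp-"),
        "size_guess": None,
    }
    # A name ending in WIDTHxHEIGHT often records the image size in pixels.
    match = re.search(r"(\d+)x(\d+)$", path.stem)
    if match:
        info["size_guess"] = (int(match.group(1)), int(match.group(2)))
    return info


for key, value in describe_url(URL).items():
    print(f"{key}: {value}")
```

Everything here comes from the text of the URL alone, so treat the results as leads to check rather than facts.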
So we’ve learned a little bit about this photo we just found on the web. We know what kind of file it is, what it’s called, how big it is, when it was uploaded, what folders it’s in and what CMS the uploader was using.
Perhaps most importantly, we can now download the image and keep a copy of our own. Go back to the tab where the image is open and select File > Save Page As. Once you’ve downloaded the capitol pic to your computer, we can move on to STEP TWO.
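If you end up grabbing a lot of files, the download can also be scripted. Here is a minimal sketch using only Python’s standard library; the function names `local_name` and `save_file` are made up for this example, and it assumes the URL points straight at a file you’re allowed to download.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def local_name(url):
    """Pick a local file name from the last piece of the URL's path."""
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    # Fall back to a generic name if the URL doesn't end in a file.
    return name or "download"


def save_file(url, folder="."):
    """Download the file at `url` into `folder` and return where it was saved."""
    target = Path(folder) / local_name(url)
    with urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

To use it, paste in the address you copied from the address bar, for example `save_file("PASTE-THE-IMAGE-URL-HERE")`. The file lands in the folder you ran the script from unless you pass a different `folder`.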