Lazy Mineral Mashed Leaves
What would you do if your boss told you to produce web pages for 1,500 television and radio programs each day, in multiple languages and character sets, with a staff of a handful of people? What if you needed to publish web content for every band and the songs they record, updated each day? How about web pages for each animal species and its habitat, inclusive of its endangerment status, when your organization doesn’t have that information? The development team at the British Broadcasting Corporation (BBC) faced all three challenges at once during a period of budget cuts. Very soon we’ll show you how they solved all three using Linked Data.

Linked Data makes the World Wide Web into a global database that we call the Web of Data. Developers can query Linked Data from multiple sources at once and combine it on the fly, something difficult or impossible to do with traditional data-management technologies. Imagine being able to gather any data you require in a
single step! Linked Data can get you there. We know this may seem impossible, and it is with traditional techniques, but we’ll demonstrate how it works.

In this chapter, assuming that you have a basic familiarity with fundamental web technologies such as HTML, URIs, and HTTP, we introduce you to Linked Data, place it in context, outline its principles, and show you how to use it by walking you through your first Linked Data application.

We may reference resources that you don’t instantly recognize, such as MusicBrainz, the open music encyclopedia. Don’t worry. We provide URLs to help you gather the context you’ll need to be productive.

Linked Data defined

The World Wide Web is full of data. Data is published in formats such as PDF, TIFF, CSV, Excel spreadsheets, embedded tables in Word documents, and many forms of plain text. These files are linked to and from HTML and other documents. They are, in a sense, data that you can link to. But this kind of data has a limitation: it’s formatted for human consumption. It often requires a specialized utility to read it. It’s not easy for automated processes to access, search, or reuse this data. Further processing by people is generally required for this data to be incorporated into new projects or allow someone to base decisions on it.

We’d rather have a universal way for anyone to read and reuse data on the Web. You don’t want to just link to the files that data comes in; you want data that you can link into. You want your data to link to related data. You want to foster reuse between people who may never meet.

This book will introduce you to a new way to consume, reuse, and publish data on the Web so that it may be reused by automated processes on either side of enterprise firewalls. The way to do this is called Linked Data. The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web using international standards of the World Wide Web Consortium.
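The difference between data you can link to and data you can link into can be made concrete with a small sketch. The Python below is an illustration only: the CSV content, the `example.org` URIs, and the `link_into` helper are hypothetical and do not come from any real data set. It takes rows that live inside a file and gives each one its own URI, so that another data set can point at an individual record rather than at the file as a whole.

```python
import csv
import io

# Hypothetical CSV content, as it might sit in a file you can only link *to*.
CSV_TEXT = """id,name,habitat
lion,Lion,savanna
puffin,Atlantic puffin,sea-cliffs
"""

# Hypothetical base URI; a real publisher would use a domain it controls.
BASE = "http://example.org/species/"


def link_into(csv_text, base):
    """Give every row its own URI so other data can reference it directly."""
    records = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        uri = base + row["id"]
        records[uri] = {"name": row["name"], "habitat": row["habitat"]}
    return records


records = link_into(CSV_TEXT, BASE)

# A second, independent data set can now refer to one record by its URI.
sightings = [{"observer": "Alice", "species": BASE + "puffin"}]
for s in sightings:
    print(s["observer"], "saw", records[s["species"]]["name"])
```

The second data set never needs to know which file or row the record came from; the URI alone is enough to find it. This is only the identification half of the story, and the publishing standards that make such URIs resolvable on the Web are not shown here.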
You already know some of the techniques we use for Linked Data because you understand HTTP, URIs, and hyperlinks. You want to publish data on the Web, and you use URIs to identify data elements and the relationships between them. You can use those URIs to hyperlink between data elements the way web pages are hyperlinked. Linked Data is just data but it’s on the Web and structured the way the Web is structured. These ideas are collected into the Linked Data principles, described in more detail in section 1.4.

The more principles you adhere to, the better your Linked Data. The 5-Star scoring system of Linked Data is:

The 5-Star system is cumulative. Each additional star presumes that your data meets the criteria for the previous steps. Linked Data developers pride themselves on creating 5-Star Linked Data. Anything less gives you more work to do, perhaps in converting data into 5-Star format, creating additional links, or trying to convince your data sources to create better data. By creating 5-Star Linked Data, you’re making the world a nicer place.

The World Wide Web Consortium (W3C) defines standards for the Web, including an open data model and several formats for that model. This chapter will introduce you to the Resource Description Framework (RDF), which is used for the best quality Linked Data.

A paraphrasing of the 5-Star system is given on the W3C coffee mug shown in figure 1.1.

Figure 1.1 The 5-Star Linked Data mug, available from cafepress.com. The mug may be ordered either with the “Open” and “Open license” labels or without them; Linked Open Data uses both, Linked Data uses neither. Sales of the mugs benefit the W3C.

Linked Data (pardon the repeated definition) is a set of techniques for the publication of data on the Web using standard formats and interfaces. We also call data that conforms to those techniques Linked Data.

For example, much of the content in Wikipedia can be thought of as structured data. A Wikipedia article may have a box in the upper right of the page with information like names, dates, places, and links to other content. The DBpedia project (http://dbpedia.org) extracts this structured data from Wikipedia articles and puts it on the Web. Once the data is published in accordance with the Linked Data principles, it is Linked Data and may be used by others who have access to the data.
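One way to picture what it means to use URIs for both data elements and the relationships between them is as a set of three-part statements: a subject, a relationship, and a value. The sketch below uses plain Python tuples rather than an RDF library. The DBpedia resource URIs follow DBpedia's naming pattern and the label URI is the standard RDF Schema one, but the facts themselves, the `example.org` relationship, and the `describe` helper are assumptions made up for illustration.

```python
# Each statement is a (subject, relationship, value) tuple.
# URIs identify things and relationships; plain strings are literal values.
BBC = "http://dbpedia.org/resource/BBC"
LONDON = "http://dbpedia.org/resource/London"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical relationship URI, invented for this sketch.
BASED_IN = "http://example.org/vocab/basedIn"

# Two illustrative sources that happen to describe the same resources.
source_a = {
    (BBC, LABEL, "BBC"),
    (BBC, BASED_IN, LONDON),
}
source_b = {
    (LONDON, LABEL, "London"),
    (BBC, BASED_IN, LONDON),  # duplicate statement, harmless in a set
}

# Because both sources share URIs, combining them is just a set union.
combined = source_a | source_b


def describe(uri, statements):
    """Return every (relationship, value) pair recorded about a URI."""
    return sorted((p, o) for s, p, o in statements if s == uri)


# Follow the link from the BBC to the thing it points at.
for relationship, value in describe(BBC, combined):
    if value.startswith("http://"):
        print(relationship, "->", describe(value, combined))
```

The point of the sketch is that no schema mapping was needed to merge the two sources: the shared URI for London is what lets a statement from one source lead to a statement from the other.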
Another useful feature of Linked Data is that it’s self-documenting. You can immediately figure out what a term means by resolving it on the Web. This makes Linked Data a wonderful new technique for data sharing.

So, imagine that you’re gathering data from a variety of sources to perform an analysis or make a mash-up. You could grab data from DBpedia and other Linked Data from elsewhere on the Web and throw it all together to make the data set you want.

The rest of this chapter presents programming ideas from the public Web, but don’t let that convince you that all Linked Data needs to be public. Linked Data techniques are also widely deployed behind enterprise firewalls on private networks. Everything you learn in this book can be deployed with public or private data or a mix of the two.

What Linked Data won’t do for you

You might be wondering whether Linked Data is too good to be true. It’s not. Linked Data is built on the Web