1.5 The economic motivation of datafication/dataveillance: web 2.0, big data, alienated intelligence
As speculative as Kelly’s analysis is, it is important in that it recognizes the pivotal role that tracking plays in the development and use of networked digital technologies. Viktor Mayer-Schonberger and Kenneth Cukier describe this emerging paradigm of transforming activity, whether social, geographical, biological, or economic, into quantifiable data for tracking, analyzing, and predicting as “datafication.” Datafication has become such an important profit strategy for digital companies that Michael Palmer’s 2006 observation that “data is the new oil” continues to be widely repeated. As media scholar Jose van Dijck observes, “The digital transformation of sociality spawned an industry that builds its prowess on the value of data and metadata—automated logs showing who communicated with whom, from which location, and for how long. Metadata—not too long ago considered worthless byproducts of platform-mediated services—have gradually been turned into treasured resources that can ostensibly be mined, enriched, and repurposed into precious products” (“Datafication” 199). Van Dijck notes that metadata have become a type of “currency used to pay for online services” that social media companies “monetize” by “repackaging and selling them to advertisers or data companies” (200-201).
The human use of data is of course nothing new: practices of collecting information, as manifested in libraries, censuses, surveys, and so forth, date back thousands of years in some cases. Nor, obviously, is data a new feature of computers or networks: the word was first used in a computational sense in 1946 (OED), and data is an inherent feature of the modern computer. Palmer’s catchphrase, however, refers to the capitalistic use of a new modality of data, often called “big data”: an approach to knowledge production that makes predictions by finding correlations in data sets vastly larger than those humans or even computers could handle before the 21st century. While scientists have long speculated on the research potential of analyzing massive data sets, only recently have advancements in data storage and computing power, along with the near-ubiquity of receptor devices, made big data a viable approach for scientific inquiry. Scientific processes that would have taken a decade in the nineties or even the early 2000s, such as sequencing a genome or collecting astronomy data, now take only several days. Viktor Mayer-Schonberger and Kenneth Cukier cite one example in astronomy that gives a sense of the scale of these advancements:
When the Sloan Digital Sky Survey began in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. By 2010 the survey’s archive teemed with a whopping 140 terabytes of information. But a successor, the Large Synoptic Survey Telescope in Chile, due to come on stream in 2016, will acquire that quantity of data every five days. (7)
As evidenced by these early endeavors, the development of big data resources and practices was initially driven by the scientific community. However, as the business strategies of web companies shifted in the wake of the dot-com crash, big data practices soon became one of the primary interests of Internet companies such as Google and Facebook, which could leverage these techniques to sell targeted ads and to improve their own products, making them both more attractive to users and more strategically valuable to the companies. The medium in which these activities were carried out also happened to be far and away the most effective tool for gathering data, given that data regarding every single aspect of network activity can be easily collected without any required effort or consent on the part of users.
These strategies were perhaps most famously articulated by Tim O’Reilly in his 2005 post “What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software.” The burst of the dot-com bubble in 2001 made it clear that web businesses would need to develop new strategies if they were to survive. O’Reilly popularized the concept of “Web 2.0” as a way to revive enthusiasm for the web and redirect its course with new business models and design patterns. Web 2.0 was unlike the previous generation of web businesses and technologies in numerous ways, but most importantly, it shifted the core means of value creation to the collection and use of user data. In some cases, that involved enabling users to have more input or flexibility with the code of the software, but of most importance was the passive collection of user data. In the post, O’Reilly writes, “One of the key lessons of the Web 2.0 era is this: Users add value. But only a small percentage of users will go to the trouble of adding value to your application via explicit means. Therefore, Web 2.0 companies set inclusive defaults for aggregating user data and building value as a side-effect of ordinary use of the application. As noted above, they build systems that get better the more people use them.” O’Reilly also notes that “real time monitoring of user behavior to see just which new features are used, and how they are used, thus becomes another required core competency” of Web 2.0 platforms.
User data isn’t collected merely for the purpose of improving superficial features of the platform; it constitutes the core value of the platform, as it can be used to sell targeted ads and further develop business intelligence. O’Reilly observes, “The race is on to own certain classes of core data: location, identity, calendaring of public events, product identifiers and namespaces. In many cases, where there is significant cost to create the data, there may be an opportunity for an Intel Inside style play, with a single source for the data. In others, the winner will be the company that first reaches critical mass via user aggregation, and turns that aggregated data into a system service.” The economic incentive to capture user data is so great that companies such as Google and Facebook provide their services for free, and continue to expand their free services into new domains such as higher education, workplace communications, event management, and so forth.
The ease with which networked digital activity can be tracked and analyzed contributed to the increasing sophistication of big data methods. Mayer-Schonberger and Cukier describe this general shift in the valuing of data as one in which “data was no longer regarded as static or stale, whose usefulness was finished once the purpose for which it was collected was achieved” (5). Whereas the sciences began to understand the increased scientific value of reusing data, along with the processes, protocols, and resources needed to do so, Internet companies recognized data as a “raw material of business, a vital economic input, used to create a new form of economic value,” which could be “cleverly reused to become a fountain of innovation and new services” (5). Because their medium enabled them to easily collect data on users, Internet companies were well positioned to take advantage of this type of research.
Mayer-Schonberger and Cukier write, “Because Internet companies could collect vast troves of data and had a burning financial incentive to make sense of them, they became the leading users of the latest processing technologies, superseding offline companies that had, in some cases, decades more experience.” Big data tactics, many made possible by everyday use of the web, have been used to generate insights on problems spanning private and public sectors and nearly all areas of research. As one data evangelist notes:
For many companies, their data infrastructure is still a cost center nowadays and should become a profit center by using the data to improve everything, day by day. Companies must begin treating data as an enterprisewide corporate asset while also managing the data locally within business units.

This enables sharing of data about products and customers – which provides opportunities to up sell, cross sell, improve customer service and retention rates. By using internal data in combination with external data, there is a huge opportunity for every company in the world to create new products and services across lines of business. (Toonders)
Roger Clarke coined the term “dataveillance” in 1988 to describe “the systematic use of personal data systems in the investigation or monitoring of the actions or communications of one or more persons.” As Clarke himself points out, neither surveillance in general nor consumer surveillance in particular was a new phenomenon. Big data enthusiasts, for example, are fond of referring to Richard Millar Devens’ use of the term “business intelligence” in 1865 to describe a banker’s practice of methodically collecting and analyzing information related to his business as a means of gaining a competitive advantage. Bruce Schneier, a cryptographer and specialist in digital privacy, also reminds us that companies were methodically tracking customers long before the arrival of the networked information economy, using tools such as loyalty cards and direct marketing, and consolidating customer data gleaned from credit bureaus and public records. However, Clarke points out that dataveillance is different in that it enables organizations to cheaply and granularly track, analyze, cross-check, and indefinitely store the entirety of user behavior, that is, all the activities users conduct in a computer environment the organization controls. Van Dijck further points out that data generated by dataveillance can easily be stored for future analysis, for as yet undetermined purposes, and merged with other data sets.
Dataveillance — the monitoring of citizens on the basis of their online data — differs from surveillance on at least one important account: whereas surveillance presumes monitoring for specific purposes, dataveillance entails the continuous tracking of (meta)data for unstated preset purposes. Therefore, dataveillance goes well beyond the proposition of scrutinizing individuals as it penetrates every fiber of the social fabric (Andrejevic 2012: 86). (van Dijck, “Datafication” 205)
With the increase in both the general Internet user population and the value of its data in the early 2000s, dataveillance became a much more profitable enterprise than it had been previously. Companies could use its techniques to monitor, study, and instrumentalize user activity with great ease and virtually no effort or conscious consent on the part of the user. John Cheney-Lippold describes how this dataveillance produces “algorithmic identities” of users: a “new algorithmic identity, an identity formation that works through mathematical algorithms to infer categories of identity on otherwise anonymous beings” (165). Algorithmic identity “uses statistical commonality models to determine one’s gender, class, or race in an automatic manner at the same time as it defines the actual meaning of gender, class, or race themselves. Ultimately, it moves the practice of identification into an entirely digital, and thus measurable, plane” (165). These identities are also subject to analysis by additional algorithms that attempt to “make sense of that data.”
Algorithmic identities, to speak in Cheney-Lippold’s language, are composed of both explicit data, consciously produced by the user, and implicit data, which describes particular aspects of the user’s behavior when producing or interacting with explicit data and whose parameters are set by the owner of the information environment she is using, be it a platform, app, or webpage. Explicitly, a user produces data in the form of content she intends to communicate to others, such as posting on social media, uploading a video, sending an email, or creating a connection with another user through likes, follows, friending, favoriting, and so forth. Implicitly, she also generates data related to the meta-circumstances of producing or interacting with content, such as the device she is using, her geographical coordinates, the time of day, her browsing habits, and other circumstantial aspects of her activity and environment.
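To make the distinction concrete, the kind of record a platform might log for a single user action can be sketched as follows. This is a hypothetical illustration only; the field names and structure are assumptions for the sake of example, not any actual platform’s schema.

```python
# Hypothetical sketch of an event record a platform might log.
# "Explicit" data is the content the user intends to communicate;
# "implicit" data is the metadata generated as a side effect of the act.
event = {
    "explicit": {
        "action": "post",
        "text": "Excited for the weekend!",
    },
    "implicit": {
        "user_id": "u-102938",                # pseudonymous account identifier
        "timestamp": "2016-05-14T21:03:52Z",  # time of day of the activity
        "device": "iPhone 6 / iOS 9.3",       # hardware and OS in use
        "geolocation": (40.7128, -74.0060),   # approximate coordinates
        "referrer": "newsfeed",               # how the user arrived at the action
        "session_length_s": 418,              # how long she has been browsing
    },
}

# Both halves are stored together: the profile built from them draws as
# much on the circumstances of the act as on the content itself.
print(sorted(event["implicit"].keys()))
```

Note that the implicit half is both larger and entirely machine-generated: the user composes only the two explicit fields, while the rest is captured as a byproduct of her use of the environment.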
It is important to note that the user need not actually produce content for implicit data to be collected. Engaging with these platforms in any way, sometimes even merely by having an app installed on one’s phone, enables these companies to collect data. Indeed, data pertaining to the content she reads, the time she takes to read it, the links she clicks, the character and activity of her network, or even her geographical movement while carrying a phone with an app installed can all be collected by digital platforms and apps. Schneier gives examples of how some web platforms and smartphone apps collect these implicit forms of user data, sometimes even collecting data that extends far beyond the user’s interaction with the platform or app itself. In 2013, Jay Z and Samsung released an app package of his album Magna Carta Holy Grail that was essentially spyware in that it could view “all accounts on the phone, track the phone’s location, and track who the user was talking to on the phone” (X). Likewise, the popular Angry Birds mobile game app, first released in 2009, tracks the user’s location even when she is not playing the game. While the user may view the content of her explicit data as more representative of her character, both explicit and implicit forms of data are grouped together to form a profile of the user, which can be analyzed in multiple ways to better predict her behavior.
The massive amounts of explicit and implicit data produced by users on networks constitute a major portion of the intelligence of the networked world brain. This intelligence, however, can be said to be alienated from the majority of users who produce it, in that it becomes the private intelligence of corporations who use it to further their own interests, arguably without public appreciation of the full scope of its instrumentalization. As van Dijck observes, “It is far from transparent how Facebook and other platforms utilize their data to influence traffic and monetize engineered streams of information” (2013, 12). While not all types of web activity alienate users from the intelligence they are helping produce, the networked status quo is strongly characterized by an industry-wide trend to monetize user data. One would have to avoid the bulk of popular sites, services, software, and even hardware to protect one’s data from being harvested by corporate entities. The monetization of user data constitutes the core business strategy of the top three websites in the world: Google, YouTube, and Facebook. It also underlies one of the most pervasive business models across web platforms: a 2013 study found that 98% of web platforms were for-profit, with only 2% nonprofit, and that 88% relied on targeted advertising as a primary means of generating revenue (Jin 2013). For Google and Facebook, the first and third biggest Internet companies in terms of revenue, that revenue in 2015 amounted to $74.54 billion and $17.93 billion respectively. These facts strongly suggest that the free availability of digital platforms and services that directly enable or indirectly support networked activity is in large part reliant upon and shaped by companies whose key business goal is the collection and analysis of user data.