Q: Just how large is 170 billion tweets?
A: Enough to stump even the Library of Congress.
In a recent blog post, the Library of Congress announced that it had archived 170 billion tweets from the years 2006 to 2010 in its Twitter archive. Given Twitter’s reputation as the foremost pop culture diary, this archive represents a swath of information valuable to researchers of all kinds. Sociologists, political scientists, and even nutritionists (admit it, you have posted about your lunch) can gain insight into broader trends in American society. Never before has so much candid, personal information been preserved and centralized in one place, meaning future applications of this information could be both remarkable and slightly embarrassing.
The Twitter archive remains off the shelves
However, the Twitter archive remains off the library shelves, so to speak. The Library of Congress noted that it processes 500 million tweets a day, which works out to roughly 182 billion tweets a year (500 million × 365 days), meaning that the current archive of 170 billion tweets could double in size within a year. Even though each tweet is at most 140 characters long, making this Twitter archive both readily available and quickly searchable remains the primary challenge for the Library of Congress today. Simply put, the sheer size of the Twitter archive makes search painfully slow, especially by today’s standards: a single query of the 2006-2010 archive can take upwards of a full day to complete. As anyone who has had to research anything can attest, that is not a practical solution. So, even though it has received over 400 research requests, the Library of Congress has withheld access to the Twitter archive until a workable solution is implemented.
The Library of Congress’ information paradox
Thus, the Library of Congress’ current situation illustrates the “Information Paradox.” Simply put, there is so much information available to researchers that it is actually a detriment to their studies. The sheer size of the archive makes it nearly impossible to navigate, meaning that the information has essentially buried itself. By prioritizing a solution to this problem, the Library of Congress hopes to resolve this paradox and make the Twitter archive readily available. After all, if there is one thing libraries pride themselves on, it is their ability to support the distribution of information to the public.