Since 2011, the HathiTrust Research Center (HTRC) has been developing tools for applying text and data mining methods to the HathiTrust collection. Until now, these tools have been available mostly for the portion of the collection that is out of copyright, with access to copyrighted items restricted. Now, with the development of a landmark HathiTrust policy and an updated release of HTRC Analytics, the complete 16.7-million-item HathiTrust corpus, including items protected by copyright, is available for non-consumptive research such as data mining and computational analysis.
This work, several years in the making, realizes a primary goal of HathiTrust: to facilitate the widest possible lawful research and educational uses of the HathiTrust collection.
For more details on this milestone, read the complete HathiTrust blog post here.