News from Jun 20, 2012
The Web Data Commons project extracts structured data from several billion HTML pages and has performed a large-scale analysis of how structured data is embedded in Web pages today. We used the Amazon Web Services infrastructure to procure the computing resources required for the project.
Because the Web Data Commons project demonstrates how very large corpora of Web pages can be analyzed in the cloud at reasonable cost, Amazon has invited Hannes Mühleisen to present the project at this year's Amazon Web Services Summit in Berlin. The slides from Hannes's talk are available online.