Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading pipe assembly, you can quickly create specialized web mining applications that are optimized for a particular use case.

To get started, see the Getting Started page and the list of resources (mailing list, bug database, source code, etc.).

Bixo is an open source project released under the MIT License.

Note: If you are looking for the Bi(x)o command-line tool, that is a separate project with its own home page.