Sunday, January 02, 2011

Douglas Reed Cutting is an advocate and creator of open-source search technology



Douglas Reed Cutting is an advocate and creator of open-source search technology. He originated Lucene and, with Mike Cafarella, Nutch, both open-source search technology projects which are now managed through the Apache Software Foundation. Prior to developing Lucene, Doug held search technology positions at Excite, Apple Inc. and Xerox PARC. Lucene, a search indexer, and Nutch, a spider or crawler, are the two key components of an open-source general search platform, which first crawls the Web for content, and then structures it into a searchable index. Cutting's leadership of these two projects extended the concepts and capabilities of general open-source software projects such as Linux and MySQL into the important vertical domain of search. While it is difficult to track the total number of installations of these platforms, public announcements of the use of Lucene and its direct descendant Solr by various venture-backed startups indicate a significant level of adoption. Perhaps the most significant deployment of Lucene is Wikipedia, where it powers search for the entire site.[1]
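To make the crawl-then-index pipeline described above concrete, here is a minimal sketch in Python. It does not use Lucene's or Nutch's APIs; the in-memory `PAGES` dictionary, the page names, and the functions `crawl`, `build_index` and `search` are all hypothetical stand-ins chosen for illustration, and a real crawler would fetch pages over HTTP rather than read them from a dictionary.

```python
from collections import defaultdict, deque

# Hypothetical toy "web": url -> (page text, outgoing links).
PAGES = {
    "a": ("open source search", ["b"]),
    "b": ("search index for the web", ["a", "c"]),
    "c": ("web crawler", []),
    "d": ("unreachable page about search", []),
}

def crawl(pages, start):
    """Follow links breadth-first from `start` and return {url: text}."""
    seen = {start}
    queue = deque([start])
    fetched = {}
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        fetched[url] = text
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

def build_index(docs):
    """Build an inverted index: term -> set of urls containing that term."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    """Return the urls that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = crawl(PAGES, "a")      # the crawler's role (Nutch, in the text above)
index = build_index(docs)     # the indexer's role (Lucene, in the text above)
hits = search(index, "web search")
```

Page `d` never appears in the results because no crawled page links to it, which illustrates why the crawl step determines what the index can contain.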
In December 2004, Google Labs published a paper on the MapReduce algorithm, which allows very large-scale computations to be trivially parallelized across large clusters of servers. Cutting, realizing the importance of this paper to extending Lucene into the realm of extremely large (web-scale) search problems, created the open-source Hadoop framework, which allows applications based on the MapReduce paradigm to run on large clusters of commodity hardware. Cutting was an employee of Yahoo!, where he led the Hadoop project full-time; he has since moved on to Cloudera.[2]
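The MapReduce paradigm mentioned above can be illustrated with a small single-process sketch of the classic word-count example. This is not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical, and everything runs sequentially here, whereas a real framework would spread the map and reduce calls across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)

def map_reduce(documents):
    """Run the three phases one after another in a single process."""
    pairs = []
    for document in documents:
        # Each map call depends only on its own document,
        # so these calls could run on different machines.
        pairs.extend(map_phase(document))
    groups = shuffle(pairs)
    # Each reduce call depends only on its own key,
    # so these calls could also run in parallel.
    return dict(reduce_phase(key, values) for key, values in groups.items())

counts = map_reduce(["open source search", "open source crawler"])
```

Because no map call needs another document and no reduce call needs another key, the work divides cleanly across a cluster, which is the property the paragraph above refers to.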
In July 2009, Doug Cutting was elected to the board of directors of the Apache Software Foundation.






Doug Cutting





Doug's Blog

Doug Cutting has been working in the field of information retrieval for over fifteen years.

Beginning in 1988, he spent five years at Xerox's Palo Alto Research Center (PARC) developing novel approaches to information access. These included a high-performance retrieval engine, several innovative search paradigms, advanced linguistic analysis methods, and high-quality text summarization algorithms. This work resulted in seven publications and six issued patents. Some of these technologies are now marketed by Inxight.

In 1993 he moved to Apple's Advanced Technology Group (ATG). There he developed a state-of-the-art retrieval engine code-named V-Twin. This engine was to be a part of the Copland operating system, automatically indexing the content of all files as they were created so that the entire file system could be efficiently searched at any time. Copland was cancelled, but V-Twin has been used in several other Apple products.

In April of 1996, Doug left Apple and joined Excite where he took over development of the core search technology. This included growing Excite's web index from two million to fifty million pages; substantially optimizing Excite's search performance; adding phrase-searching capabilities; and creating a thesaurus-like feature which suggests related terms to add to queries.

In the fall of 1997 he reduced his commitment at Excite to part-time so that he could write Lucene, an efficient, full-featured text search engine written in Java. In early 1998 he returned to Excite full-time for two more years. Lucene sat on the shelf for much of that time, and was made open-source in the spring of 2000.

Doug now works as chief architect and president of Nutch, a nascent effort to implement an open-source web search engine, which aims to provide a transparent alternative to commercial web search engines. The specific purposes for which this corporation is organized are scientific and educational in nature: namely, to promote public access to search technology without commercial bias by:
* Providing free high-quality search software and its source code to the public; and
* Facilitating ongoing research and development of search technology in a public forum.

Doug also serves on Nutch's board of directors, together with Mitch Kapor, Tim O'Reilly, Peter Savich (Overture Research), Raymie Stata (UCSC), and Graham Spencer (Digital Consumer).
