org.apache.nutch.crawl
Class CrawlDbFilter

java.lang.Object
  extended by org.apache.nutch.crawl.CrawlDbFilter
All Implemented Interfaces:
Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>

public class CrawlDbFilter
extends Object
implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>

This class provides a way to separate the URL normalization and filtering steps from the rest of CrawlDb manipulation code.
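As a rough illustration of that separation, the sketch below models the per-record step in plain Java, with no Hadoop or Nutch dependencies. It assumes, as the map signature suggests, that the URL key may be rewritten while the CrawlDatum value passes through unchanged, and that normalization runs before filtering. FilterStepSketch, apply and the toy lowercase and http-only rules are hypothetical, and a String stands in for CrawlDatum.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical, dependency-free sketch of the per-URL step: normalize the key,
 * then filter it, and keep the record only if a URL survives.
 * Plain String values stand in for CrawlDatum.
 */
final class FilterStepSketch {

    /** Returns the rewritten URL, or null if the record should be dropped. */
    static String apply(String url,
                        UnaryOperator<String> normalizer, // stand-in for URL normalizers (assumption)
                        UnaryOperator<String> filter) {   // stand-in for URL filters; null result = reject
        String normalized = normalizer.apply(url);
        return normalized == null ? null : filter.apply(normalized);
    }

    public static void main(String[] args) {
        // Toy rules, purely illustrative: lowercase the URL, reject anything that is not http(s).
        UnaryOperator<String> normalizer = u -> u.toLowerCase();
        UnaryOperator<String> filter =
                u -> u.startsWith("http://") || u.startsWith("https://") ? u : null;

        Map<String, String> crawlDb = new LinkedHashMap<>();
        crawlDb.put("HTTP://Example.com/A", "datum-1");
        crawlDb.put("ftp://example.com/file", "datum-2");

        Map<String, String> out = new LinkedHashMap<>();
        crawlDb.forEach((url, datum) -> {
            String kept = apply(url, normalizer, filter);
            if (kept != null) {
                out.put(kept, datum); // the value is carried over unchanged
            }
        });
        System.out.println(out); // {http://example.com/a=datum-1}
    }
}
```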

Author:
Andrzej Bialecki

Field Summary
static org.apache.commons.logging.Log LOG
static String URL_FILTERING
static String URL_NORMALIZING
static String URL_NORMALIZING_SCOPE
 
Constructor Summary
CrawlDbFilter()
 
Method Summary
 void close()
 void configure(org.apache.hadoop.mapred.JobConf job)
 void map(org.apache.hadoop.io.Text key, CrawlDatum value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output, org.apache.hadoop.mapred.Reporter reporter)
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

URL_FILTERING

public static final String URL_FILTERING
See Also:
Constant Field Values

URL_NORMALIZING

public static final String URL_NORMALIZING
See Also:
Constant Field Values

URL_NORMALIZING_SCOPE

public static final String URL_NORMALIZING_SCOPE
See Also:
Constant Field Values

LOG

public static final org.apache.commons.logging.Log LOG

Constructor Detail

CrawlDbFilter

public CrawlDbFilter()

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable

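A minimal sketch of how configure might turn the URL_FILTERING, URL_NORMALIZING and URL_NORMALIZING_SCOPE fields into per-task switches. A Map<String, String> stands in for JobConf, and ConfigureSketch, DEFAULT_SCOPE and the key strings are assumptions (the actual values are listed under Constant Field Values). It also assumes both switches default to off and that the scope is only read when normalization is enabled.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of reading the three configuration fields.
 * Map<String, String> stands in for JobConf; all key strings are assumed values.
 */
final class ConfigureSketch {
    static final String URL_FILTERING = "crawldb.url.filters";                   // assumed value
    static final String URL_NORMALIZING = "crawldb.url.normalizers";             // assumed value
    static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope"; // assumed value
    static final String DEFAULT_SCOPE = "crawldb";                               // assumed default scope

    boolean urlFiltering;
    boolean urlNormalizing;
    String scope;

    /** Roughly like JobConf.getBoolean(name, false): a missing key means "disabled". */
    private static boolean getBoolean(Map<String, String> conf, String name) {
        return Boolean.parseBoolean(conf.get(name)); // parseBoolean(null) is false
    }

    void configure(Map<String, String> conf) {
        urlFiltering = getBoolean(conf, URL_FILTERING);
        urlNormalizing = getBoolean(conf, URL_NORMALIZING);
        // The scope only matters when normalization is switched on.
        scope = urlNormalizing ? conf.getOrDefault(URL_NORMALIZING_SCOPE, DEFAULT_SCOPE) : null;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(URL_NORMALIZING, "true");
        ConfigureSketch s = new ConfigureSketch();
        s.configure(conf);
        System.out.println(s.urlFiltering + " " + s.urlNormalizing + " " + s.scope); // false true crawldb
    }
}
```

On the driver side, a job would enable the steps by setting the same keys to "true" before submission; leaving them unset keeps this mapper a pass-through in the sketch above.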
close

public void close()
Specified by:
close in interface Closeable

map

public void map(org.apache.hadoop.io.Text key,
                CrawlDatum value,
                org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output,
                org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
Specified by:
map in interface org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>
Throws:
IOException
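The map contract can be sketched without Hadoop as follows. BiConsumer stands in for OutputCollector, String for Text, and Object for CrawlDatum; MapSketch and its toy rules are hypothetical. The sketch assumes a null normalizer or filter means that step is disabled, a null result means the URL was rejected, and an exception in either step skips the record instead of failing the task. Whether CrawlDbFilter behaves exactly this way is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.UnaryOperator;

/**
 * Hypothetical, Hadoop-free sketch of the map step.
 * BiConsumer stands in for OutputCollector<Text, CrawlDatum>.
 */
final class MapSketch {
    private final UnaryOperator<String> normalizer; // null when normalization is disabled
    private final UnaryOperator<String> filter;     // null when filtering is disabled; returns null to reject

    MapSketch(UnaryOperator<String> normalizer, UnaryOperator<String> filter) {
        this.normalizer = normalizer;
        this.filter = filter;
    }

    void map(String key, Object value, BiConsumer<String, Object> output) {
        String url = key;
        if (normalizer != null) {
            try {
                url = normalizer.apply(url);
            } catch (RuntimeException e) {
                url = null; // assumed: a failing normalizer skips the record
            }
        }
        if (url != null && filter != null) {
            try {
                url = filter.apply(url);
            } catch (RuntimeException e) {
                url = null; // assumed: a failing filter skips the record
            }
        }
        if (url != null) {
            output.accept(url, value); // rewritten key, untouched value
        }
    }

    public static void main(String[] args) {
        MapSketch m = new MapSketch(
                u -> u.replaceAll("#.*$", ""),            // toy normalizer: strip the fragment
                u -> u.contains("/private/") ? null : u); // toy filter: reject /private/ paths
        List<String> collected = new ArrayList<>();
        BiConsumer<String, Object> out = (k, v) -> collected.add(k + " -> " + v);
        m.map("http://example.com/page#top", "datum-1", out);
        m.map("http://example.com/private/x", "datum-2", out);
        System.out.println(collected); // [http://example.com/page -> datum-1]
    }
}
```

Because only the key changes, two entries whose URLs normalize to the same string are emitted under one key, so any merging is left to whatever reduce step the enclosing job defines.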


Copyright © 2006 The Apache Software Foundation