org.apache.nutch.crawl
Class Generator.CrawlDbUpdater

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by org.apache.nutch.crawl.Generator.CrawlDbUpdater
All Implemented Interfaces:
Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,CrawlDatum>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>
Enclosing class:
Generator

public static class Generator.CrawlDbUpdater
extends org.apache.hadoop.mapred.MapReduceBase
implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,CrawlDatum>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>

Updates the CrawlDb so that the next generate run does not select the same URLs again.
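
The idea in the description above can be sketched in plain Java, without Hadoop or Nutch on the classpath. This is a conceptual illustration only, not the actual CrawlDbUpdater implementation: the class name CrawlDbUpdateSketch, the methods update and eligible, and the use of a URL-to-timestamp map in place of CrawlDatum records are all assumptions made for the example. It assumes that a URL selected by a generate run is marked with a non-zero generate time, and that a marked URL is skipped by the next run.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: models the CrawlDb as a map from URL to the time
// (in milliseconds) at which the URL was last generated; 0 means "never".
public class CrawlDbUpdateSketch {

    /**
     * Merges the generate times of freshly generated URLs into a copy of
     * the CrawlDb. URLs that are not already in the CrawlDb are ignored
     * (an assumption of this sketch).
     */
    public static Map<String, Long> update(Map<String, Long> crawlDb,
                                           Map<String, Long> generated) {
        Map<String, Long> updated = new HashMap<>(crawlDb);
        for (Map.Entry<String, Long> e : generated.entrySet()) {
            // Keep the most recent generate time for each known URL.
            updated.computeIfPresent(e.getKey(),
                    (url, old) -> Math.max(old, e.getValue()));
        }
        return updated;
    }

    /** A known URL is eligible for the next generate run only if unmarked. */
    public static boolean eligible(Map<String, Long> crawlDb, String url) {
        Long generateTime = crawlDb.get(url);
        return generateTime != null && generateTime == 0L;
    }

    public static void main(String[] args) {
        Map<String, Long> crawlDb = new HashMap<>();
        crawlDb.put("http://a.example/", 0L);
        crawlDb.put("http://b.example/", 0L);

        // Pretend a generate run selected only the first URL at time 1000.
        Map<String, Long> generated = new HashMap<>();
        generated.put("http://a.example/", 1000L);

        Map<String, Long> updated = update(crawlDb, generated);
        System.out.println("a eligible: " + eligible(updated, "http://a.example/")); // false
        System.out.println("b eligible: " + eligible(updated, "http://b.example/")); // true
    }
}
```

In the real class, the grouping by URL is presumably performed by the MapReduce shuffle between map and reduce, with Text keys and CrawlDatum values as shown in the method signatures on this page.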


Constructor Summary
Generator.CrawlDbUpdater()
           
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf job)
           
 void map(org.apache.hadoop.io.WritableComparable key, org.apache.hadoop.io.Writable value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output, org.apache.hadoop.mapred.Reporter reporter)
           
 void reduce(org.apache.hadoop.io.Text key, Iterator<CrawlDatum> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output, org.apache.hadoop.mapred.Reporter reporter)
           
 
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.io.Closeable
close
 

Constructor Detail

Generator.CrawlDbUpdater

public Generator.CrawlDbUpdater()

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class org.apache.hadoop.mapred.MapReduceBase

map

public void map(org.apache.hadoop.io.WritableComparable key,
                org.apache.hadoop.io.Writable value,
                org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output,
                org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
Specified by:
map in interface org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,CrawlDatum>
Throws:
IOException

reduce

public void reduce(org.apache.hadoop.io.Text key,
                   Iterator<CrawlDatum> values,
                   org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output,
                   org.apache.hadoop.mapred.Reporter reporter)
            throws IOException
Specified by:
reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>
Throws:
IOException


Copyright © 2006 The Apache Software Foundation