de.fuberlin.wiwiss.marbles.loading
Class DereferencerBatch

java.lang.Object
  extended by de.fuberlin.wiwiss.marbles.loading.DereferencerBatch
All Implemented Interfaces:
DereferencingListener

public class DereferencerBatch
extends java.lang.Object
implements DereferencingListener

Starting from a single URL, the DereferencerBatch handles the nested retrieval of data: it follows known predicates in the retrieved data and processes the retrieval results with data providers.
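The nested retrieval described above can be pictured as a breadth-first traversal bounded by a step limit. The following is a minimal sketch under stated assumptions, not the Marbles implementation: the class BoundedCrawlSketch, its crawl method, and the in-memory link map (standing in for HTTP retrieval and predicate following) are all hypothetical, and treating the step limit as inclusive is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of the Marbles API.
final class BoundedCrawlSketch {

    /** A URL paired with its distance (in steps) from the start URL. */
    private static final class Task {
        final String url;
        final int step;

        Task(String url, int step) {
            this.url = url;
            this.step = step;
        }
    }

    /**
     * Visits URLs breadth-first, starting at step 0, and returns them in
     * the order they were retrieved. The map plays the role of "links found
     * in retrieved data". Assumption: maxSteps is an inclusive limit.
     */
    static List<String> crawl(Map<String, List<String>> links, String start, int maxSteps) {
        Set<String> retrieved = new LinkedHashSet<>();
        Deque<Task> queue = new ArrayDeque<>();
        queue.add(new Task(start, 0));
        while (!queue.isEmpty()) {
            Task task = queue.poll();
            // Skip URLs that are too far away or were already retrieved.
            if (task.step > maxSteps || !retrieved.add(task.url)) {
                continue;
            }
            // Every link found in the "retrieved data" is one step further out.
            for (String target : links.getOrDefault(task.url, Collections.<String>emptyList())) {
                queue.add(new Task(target, task.step + 1));
            }
        }
        return new ArrayList<>(retrieved);
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b", "c"));
        links.put("b", Arrays.asList("d"));
        System.out.println(crawl(links, "a", 1));
    }
}
```

In this sketch the step limit plays the role that maxSteps appears to play in the constructor; the real class works asynchronously through a task queue rather than in a single loop.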

Author:
Christian Becker

Constructor Summary
DereferencerBatch(CacheController cacheController, DereferencingTaskQueue uriQueue, java.util.Collection<DataProvider> dataProviders, org.openrdf.model.Resource mainResource, int maxSteps)
          Constructs a new DereferencerBatch
 
Method Summary
 void dereferenced(DereferencingResult result)
          Called by DereferencerThread once data has been retrieved.
 java.util.List<org.apache.commons.httpclient.URI> getRetrievedURLs()
           
 boolean hasPending()
          Determines whether any requests are pending
 boolean hasPending(int maxLevel)
          Determines whether requests are pending below a specified step level
 void loadURL(org.apache.commons.httpclient.URI url, int step, int redirectCount, boolean forceReload)
          Loads URL if not yet loaded
 void processLinks(int step, org.openrdf.model.Resource... contexts)
          Identifies known links from loaded data and submits them to loadURL(URI, int, int, boolean)
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DereferencerBatch

public DereferencerBatch(CacheController cacheController,
                         DereferencingTaskQueue uriQueue,
                         java.util.Collection<DataProvider> dataProviders,
                         org.openrdf.model.Resource mainResource,
                         int maxSteps)
Constructs a new DereferencerBatch

Parameters:
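cacheController - the cache controller into which retrieved data is inserted
uriQueue - the task queue that receives dereferencing requests
dataProviders - the data providers used to process retrieval results
mainResource - the focal resource from which the step distance is measured
maxSteps - the maximum number of steps to follow from the focal resource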
Method Detail

loadURL

public void loadURL(org.apache.commons.httpclient.URI url,
                    int step,
                    int redirectCount,
                    boolean forceReload)
             throws org.apache.commons.httpclient.URIException
Loads the URL if it has not been loaded yet.

Parameters:
url - The URL to load
step - The distance from the focal resource
redirectCount - The number of redirects performed in the course of this individual request
forceReload - Set this to true if the URL should be loaded even if a valid copy is already in the cache
Throws:
org.apache.commons.httpclient.URIException
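The "load only if not yet loaded" rule, together with the forceReload override, can be sketched as follows. This is an illustration under assumptions, not the actual method body: LoadOnceSketch is a hypothetical class, URLs are plain strings, the limits maxSteps and maxRedirects are assumed to be inclusive, and the boolean return value exists only to make the decision observable (the documented method returns void).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of the Marbles API.
final class LoadOnceSketch {
    private final Set<String> known = new HashSet<>();      // URLs already requested
    private final List<String> queued = new ArrayList<>();  // stand-in for the task queue
    private final int maxSteps;
    private final int maxRedirects;                          // assumed limit, not in the source

    LoadOnceSketch(int maxSteps, int maxRedirects) {
        this.maxSteps = maxSteps;
        this.maxRedirects = maxRedirects;
    }

    /** Queues the URL and returns true, or returns false if the request is skipped. */
    boolean loadURL(String url, int step, int redirectCount, boolean forceReload) {
        // Refuse requests that are too far out or part of an overlong redirect chain.
        if (step > maxSteps || redirectCount > maxRedirects) {
            return false;
        }
        boolean firstTime = known.add(url);
        // A URL seen before is requested again only when a reload is forced.
        if (!firstTime && !forceReload) {
            return false;
        }
        queued.add(url);
        return true;
    }

    List<String> queued() {
        return queued;
    }

    public static void main(String[] args) {
        LoadOnceSketch batch = new LoadOnceSketch(2, 3);
        System.out.println(batch.loadURL("http://example.org/a", 0, 0, false));
        System.out.println(batch.loadURL("http://example.org/a", 0, 0, false));
    }
}
```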

hasPending

public boolean hasPending(int maxLevel)
Determines whether requests are pending below a specified step level

Parameters:
maxLevel - Maximum step level to consider
Returns:
true, if requests are pending

hasPending

public boolean hasPending()
Determines whether any requests are pending

Returns:
true, if requests are pending
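One way to picture the two hasPending overloads is a registry that maps each outstanding request to its step level. The sketch below is hypothetical (the class PendingSketch and its submitted and completed methods are not part of the documented API), and it assumes that maxLevel is inclusive, which the description above does not settle.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of the Marbles API.
final class PendingSketch {
    // Outstanding requests, keyed by URL, with the step level of each.
    private final Map<String, Integer> pending = new HashMap<>();

    /** Records a request as outstanding. */
    synchronized void submitted(String url, int step) {
        pending.put(url, step);
    }

    /** Marks a request as finished. */
    synchronized void completed(String url) {
        pending.remove(url);
    }

    /** True if any request is still outstanding. */
    synchronized boolean hasPending() {
        return !pending.isEmpty();
    }

    /** True if a request at step level maxLevel or lower is outstanding (inclusive by assumption). */
    synchronized boolean hasPending(int maxLevel) {
        for (int step : pending.values()) {
            if (step <= maxLevel) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PendingSketch registry = new PendingSketch();
        registry.submitted("http://example.org/far", 3);
        System.out.println(registry.hasPending() + " " + registry.hasPending(2));
    }
}
```

A caller could use the level-limited variant to stop waiting once all requests close to the focal resource have completed, while more distant requests continue in the background.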

dereferenced

public void dereferenced(DereferencingResult result)
Called by DereferencerThread once data has been retrieved. Inserts the data into the cache, processes redirects, and follows known links for the retrieved URL using processLinks(int, Resource...).

Specified by:
dereferenced in interface DereferencingListener
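The three responsibilities named above (cache insertion, redirect handling, link following) can be sketched as a single callback. Everything in this example is a hypothetical stand-in: the Result class only loosely mirrors DereferencingResult, whose fields are not documented here, and the choices to keep the step unchanged on a redirect and to process links at step + 1 are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of the Marbles API.
final class DereferencedSketch {

    /** Simplified retrieval result; redirectTarget is null unless a redirect occurred. */
    static final class Result {
        final String url;
        final int step;
        final int redirectCount;
        final String redirectTarget;
        final String data;

        Result(String url, int step, int redirectCount, String redirectTarget, String data) {
            this.url = url;
            this.step = step;
            this.redirectCount = redirectCount;
            this.redirectTarget = redirectTarget;
            this.data = data;
        }
    }

    final Map<String, String> cache = new HashMap<>();       // stand-in for the cache
    final List<String> requested = new ArrayList<>();         // follow-up load requests
    final List<Integer> linkSteps = new ArrayList<>();        // steps passed on to link processing

    void dereferenced(Result result) {
        if (result.redirectTarget != null) {
            // Assumption: a redirect stays at the same step but counts one more redirect.
            requested.add(result.redirectTarget + " step=" + result.step
                    + " redirects=" + (result.redirectCount + 1));
            return;
        }
        cache.put(result.url, result.data);
        // Assumption: links found in this data are one step further from the focal resource.
        linkSteps.add(result.step + 1);
    }

    public static void main(String[] args) {
        DereferencedSketch listener = new DereferencedSketch();
        listener.dereferenced(new Result("http://example.org/a", 0, 0, "http://example.org/a.rdf", null));
        listener.dereferenced(new Result("http://example.org/a.rdf", 0, 1, null, "<rdf/>"));
        System.out.println(listener.requested + " " + listener.cache.keySet());
    }
}
```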

processLinks

public void processLinks(int step,
                         org.openrdf.model.Resource... contexts)
Identifies known links from loaded data and submits them to loadURL(URI, int, int, boolean)

Parameters:
step - Current step level
contexts - Contexts that are to be considered to find links
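Identifying "known links" amounts to filtering statements by predicate and collecting the objects that are URLs. The sketch below illustrates that idea with a plain Triple class instead of the OpenRDF model. The class LinkPredicateSketch is hypothetical, and the choice of rdfs:seeAlso and owl:sameAs as the known predicates is an assumption, since this page does not list the predicates that are actually followed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of the Marbles API.
final class LinkPredicateSketch {

    /** Minimal subject-predicate-object statement. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Assumed set of link predicates; the real list is not given in this documentation.
    static final Set<String> KNOWN_PREDICATES = new HashSet<>(Arrays.asList(
            "http://www.w3.org/2000/01/rdf-schema#seeAlso",
            "http://www.w3.org/2002/07/owl#sameAs"));

    /** Returns the distinct HTTP(S) objects of statements that use a known predicate. */
    static List<String> findLinks(List<Triple> triples) {
        List<String> links = new ArrayList<>();
        for (Triple t : triples) {
            boolean isLink = KNOWN_PREDICATES.contains(t.predicate)
                    && (t.object.startsWith("http://") || t.object.startsWith("https://"));
            if (isLink && !links.contains(t.object)) {
                links.add(t.object);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<Triple> data = Arrays.asList(
                new Triple("http://example.org/me",
                        "http://www.w3.org/2002/07/owl#sameAs", "http://example.com/me"),
                new Triple("http://example.org/me",
                        "http://xmlns.com/foaf/0.1/name", "Alice"));
        System.out.println(findLinks(data));
    }
}
```

Each URL returned by such a filter would then be handed to loadURL(URI, int, int, boolean) with the current step level.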

getRetrievedURLs

public java.util.List<org.apache.commons.httpclient.URI> getRetrievedURLs()