What are the best practices for combining analyzers in Lucene?

I have a situation where I'm using a StandardAnalyzer in Lucene to index text strings as follows:
    public void indexText(String suffix, boolean includeStopWords) {
        StandardAnalyzer analyzer = null;

        if (includeStopWords) {
            analyzer = new StandardAnalyzer(Version.LUCENE_30);
        } else {
            // Stop words to exclude.
            Set<String> stopWords = (Set<String>) Stop_Word_Listener.getStopWords();
            analyzer = new StandardAnalyzer(Version.LUCENE_30, stopWords);
        }

        try {
            // Index the text.
            Directory index = new RAMDirectory();
            IndexWriter w = new IndexWriter(index, analyzer, true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            this.addTextToIndex(w, this.getTextToIndex());
            w.close();

            // Read the index.
            IndexReader ir = IndexReader.open(index);
            Text_TermVectorMapper ttvm = new Text_TermVectorMapper();
            int docId = 0;
            ir.getTermFreqVector(docId, PropertiesFile.getProperty(text), ttvm);

            // Set the output.
            this.setWordFrequencies(ttvm.getWordFrequencies());
            ir.close();
        } catch (Exception ex) {
            logger.error("Error message\n", ex);
        }
    }

    private void addTextToIndex(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field(text, value, Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.YES));
        w.addDocument(doc);
    }
This works well, but I would now like to combine it with stemming using a SnowballAnalyzer.
The class also has two instance variables, shown in the constructor below:
    public Text_Indexer(String textToIndex) {
        this.textToIndex = textToIndex;
        this.wordFrequencies = new HashMap<String, Integer>();
    }
Can anyone tell me how best to achieve this with the code above?

Thanks,

Mr Morgan.
Lucene provides the org.apache.lucene.analysis.Analyzer base class, which you can use if you want to write your own analyzer. You can check out the org.apache.lucene.analysis.standard.StandardAnalyzer class, which extends Analyzer, as an example.
Then, in YourAnalyzer, you chain StandardAnalyzer and SnowballAnalyzer together using the filters those analyzers use internally, like this:
    TokenStream result = new StandardFilter(tokenStream);
    result = new SnowballFilter(result, "English"); // SnowballFilter takes a stemmer language name
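Put together, a custom analyzer might look like the sketch below. This is written against the Lucene 3.0 API; the class name StandardSnowballAnalyzer and the "English" stemmer name are assumptions, and the stop-word handling mirrors the question's includeStopWords logic:

```java
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical analyzer chaining the Standard and Snowball filters.
public class StandardSnowballAnalyzer extends Analyzer {

    private final Set<?> stopWords; // may be null to skip stop-word removal

    public StandardSnowballAnalyzer(Set<?> stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Same pipeline StandardAnalyzer builds internally...
        TokenStream result = new StandardTokenizer(Version.LUCENE_30, reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        if (stopWords != null) {
            result = new StopFilter(true, result, stopWords);
        }
        // ...plus the stemming step SnowballAnalyzer would add.
        return new SnowballFilter(result, "English");
    }
}
```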
Then, in your existing code, you can construct the IndexWriter with your own analyzer implementation that chains the Standard and Snowball filters.
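For example, the IndexWriter construction in the question's indexText method would change to something like this (StandardSnowballAnalyzer is a placeholder name for your custom analyzer class):

```java
// Swap the StandardAnalyzer for the custom chained analyzer;
// the rest of the indexing code stays the same.
Directory index = new RAMDirectory();
Analyzer analyzer = new StandardSnowballAnalyzer(stopWords);
IndexWriter w = new IndexWriter(index, analyzer, true,
        IndexWriter.MaxFieldLength.UNLIMITED);
```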
Totally off-topic: I suppose you'll eventually need to set up a custom way of handling requests. That is already implemented inside Solr.
First, write your own search component by extending SearchComponent and defining it in solrconfig.xml, like this:
    <searchComponent name="YourQueryComponent" class="org.apache.solr.handler.component.YourQueryComponent"/>
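A minimal skeleton for such a component might look like the following. The class and package names are placeholders, and the exact set of SolrInfoMBean methods you must implement varies between Solr versions, so treat this as a sketch rather than a drop-in class:

```java
package org.apache.solr.handler.component;

import java.io.IOException;

// Hypothetical custom search component (Solr 1.4-era API).
public class YourQueryComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Inspect or rewrite the incoming request parameters here.
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Execute the (possibly modified) query and add results to the response.
    }

    public String getDescription() {
        return "Custom query component";
    }

    public String getSource() {
        return "";
    }

    public String getSourceId() {
        return "";
    }

    public String getVersion() {
        return "1.0";
    }
}
```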
Then write your search handler (request handler) by extending SearchHandler, and define it in solrconfig.xml as well:
    <requestHandler name="YourRequestHandlerName" class="org.apache.solr.handler.component.YourRequestHandler" default="true">
      <!-- Default values for query parameters -->
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">1000</int>
        <str name="fl">*</str>
        <str name="version">2.1</str>
      </lst>
      <arr name="components">
        <str>YourQueryComponent</str>
        <str>facet</str>
        <str>mlt</str>
        <str>highlight</str>
        <str>stats</str>
        <str>debug</str>
      </arr>
    </requestHandler>
Then, when you send a query URL to Solr, include the additional parameter qt=YourRequestHandlerName (e.g. ...?q=foo&qt=YourRequestHandlerName), which results in your request handler being used for that request.