What are the best practices for combining analyzers in Lucene?


I have a situation where I'm using a StandardAnalyzer in Lucene to index text strings as follows:

public void indexText(String suffix, boolean includeStopWords) {
    StandardAnalyzer analyzer = null;

    if (includeStopWords) {
        analyzer = new StandardAnalyzer(Version.LUCENE_30);
    }
    else {
        // Exclude the stop words.
        Set<String> stopWords = (Set<String>) Stop_Word_Listener.getStopWords();
        analyzer = new StandardAnalyzer(Version.LUCENE_30, stopWords);
    }

    try {
        // Index the text.
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        this.addTextToIndex(w, this.getTextToIndex());
        w.close();

        // Read the index.
        IndexReader ir = IndexReader.open(index);
        Text_TermVectorMapper ttvm = new Text_TermVectorMapper();

        int docId = 0;

        ir.getTermFreqVector(docId, propertiesFile.getProperty(TEXT), ttvm);

        // Set the output.
        this.setWordFrequencies(ttvm.getWordFrequencies());
        ir.close();
    }
    catch (Exception ex) {
        logger.error("Error message\n", ex);
    }
}

private void addTextToIndex(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    doc.add(new Field(TEXT, value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
    w.addDocument(doc);
}

This works well, but now I'd like to combine it with stemming using a SnowballAnalyzer as well.

This class also has two instance variables, shown in the constructor below:

public Text_Indexer(String textToIndex) {
    this.textToIndex = textToIndex;
    this.wordFrequencies = new HashMap<String, Integer>();
}

Can anyone tell me how best to achieve this with the code above?

Thanks,

Mr Morgan.

Lucene provides the org.apache.lucene.analysis.Analyzer base class, which you can use if you want to write your own analyzer.
As an example, you can check out the org.apache.lucene.analysis.standard.StandardAnalyzer class, which extends Analyzer.

Then, in YourAnalyzer, you chain the StandardAnalyzer and SnowballAnalyzer together using the filters those analyzers use, like this:

TokenStream result = new StandardFilter(tokenStream);
result = new SnowballFilter(result, "English"); // SnowballFilter takes the stemmer name, not a stop set

Then, in your existing code, you'll be able to construct the IndexWriter with your own analyzer implementation that chains the Standard and Snowball filters.
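A minimal sketch of such an analyzer for Lucene 3.0 is shown below. The class name StemmingAnalyzer is hypothetical, and the filter chain (tokenize, standard filter, lowercase, stop words, Snowball stemming) is one common ordering, not the only valid one; it assumes English stemming and requires the lucene-core and lucene-snowball jars on the classpath.

```java
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical analyzer that chains StandardTokenizer with Snowball stemming.
public class StemmingAnalyzer extends Analyzer {

    private final Set<?> stopWords;

    public StemmingAnalyzer(Set<?> stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Tokenize, normalize, drop stop words, then stem.
        TokenStream result = new StandardTokenizer(Version.LUCENE_30, reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(true, result, stopWords);
        result = new SnowballFilter(result, "English");
        return result;
    }
}
```

In the question's indexText method, this analyzer would simply replace the StandardAnalyzer when constructing the IndexWriter.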

Totally off-topic:
I suppose you'll also need to set up a custom way of handling requests. That is already implemented inside Solr.

First, write your own search component by extending SearchComponent, and define it in solrconfig.xml like this:

<searchComponent name="YourQueryComponent" class="org.apache.solr.handler.component.YourQueryComponent"/>
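A skeleton of such a component might look like the following. YourQueryComponent is a placeholder name, and the exact set of abstract methods to override varies between Solr versions, so treat this as a sketch rather than a drop-in class; it requires the Solr jars to compile.

```java
import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Hypothetical skeleton of a custom Solr SearchComponent.
public class YourQueryComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Inspect or rewrite the incoming request parameters here,
        // before any component has executed a search.
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Run the query (or post-process results) and add output
        // to the response via rb.rsp.
    }

    @Override
    public String getDescription() {
        return "Custom query component";
    }

    @Override
    public String getSource() {
        return null;
    }
}
```

Once compiled onto Solr's classpath, the searchComponent definition above wires it in by class name.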

Then write your search handler (request handler) by extending SearchHandler, and define it in solrconfig.xml as well:

<requestHandler name="YourRequestHandlerName" class="org.apache.solr.handler.component.YourRequestHandler" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">1000</int>
        <str name="fl">*</str>
        <str name="version">2.1</str>
    </lst>
    <arr name="components">
        <str>YourQueryComponent</str>
        <str>facet</str>
        <str>mlt</str>
        <str>highlight</str>
        <str>stats</str>
        <str>debug</str>
    </arr>
</requestHandler>

Then, when you send a URL query to Solr, include the additional parameter qt=YourRequestHandlerName, which will result in your request handler being used for that request.
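For example, assuming a Solr instance running on localhost:8983 (the default port; host, port, and field name are illustrative), such a request could be issued as:

```shell
# Query Solr, selecting the custom request handler via the qt parameter.
curl "http://localhost:8983/solr/select?q=text:lucene&qt=YourRequestHandlerName"
```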

More on SearchComponents.
More on RequestHandlers.

