//SearchToHTML copyright (c) 1999-2000 David Faden. All rights reserved.
//The applet and code are distributed as linkware...
//If you use this applet or a variant on its code,
//you must include a link to The Gilbert Post, 
//http://www.geocities.com/Athens/Parthenon/1911/
//on your site.
//
//The Gilbert Post and David Faden take no responsibility
//for anything bad that happens as a result of using this applet.
//Please send reports of problems to gilbertnews@hotmail.com, anyway, though.

import java.awt.*;
import java.applet.*;
import java.io.*;
import java.net.*;
import java.util.*;

//These classes are designed to work with Netscape 3.x+ and so use JDK 1.0
//SearchToHTML also requires that the user have JavaScript turned on.
//I suggest writing the applet tag with JavaScript.

//
// 5/31/1999 fixed a bug in showPage(URL url) (discovered by Dave Langers) that 
// caused the page to be loaded in both
// the target and top (default) windows
// Also, fixed a bug that might cause the progress bar to show 100% (after an errored 
// SearchThread called foundNoMatch(int i)) while files were still being searched.
//
// 6/2/1999 fixed a very stupid bug (discovered by Dave Langers)
// on my part that made it impossible for queries
// containing uppercase characters to ever be found -- the scanned lines were
// converted to lowercase while the keywords/phrases were not
//
// 6/4/1999 renamed to AdvSiteSearcher to differentiate from older SiteSearcher
// Plan to release one more version of SiteSearcher using SearchSieves
//
// 6/9/1999 made many major revisions: renamed SearchThread to DocSearcher because
// it is no longer a subclass of Thread (it instead implements Runnable), polished
// the use of SearchSieves, began tentative support for caching, added the ability
// to demand exact matches, and to ignore text between less-than and greater-than
// signs (probably HTML)
//
// 6/11/1999 began work on a stripped down version of AdvSiteSearcher
// designed to be much smaller (hopefully a little faster) and to output HTML
// via JavaScript (but without using direct communication; no com.netscape.JSObject needed)
//
// 6/12/1999 renamed it to SearchToHTML...fixed a "bug" that caused the applet to 
// ignore the user defined height and width.
// Also finished converting AdvSiteSearcher to SearchToHTML, uncommented SearchSieve.reset(),
// Renamed the new specialized version of DocSearcher, HDocSearcher...added ability to capture
// the context of a match, the title (of an HTML doc), and the closest anchor to a match.
//
// 7/28/1999 added text changing parameters...mostly for internationalization...
// Also, gave up and cut out the code that was supposed to draw a componentless progressbar.
//
// 4/12/2000 fixed a "Y2K" bug reported by several alert users...  I am not sure what 
// I was thinking when I wrote the portion of code calling Date.getYear()... Perhaps that it
// returns the decade? Anyway, in reality, getYear() returns the number of years
// since 1900. Files with modification dates beyond 1999 were listed with dates greater than
// 99 (100 for 2000).
// Note: the whole Date class is deprecated in JDK 1.1
// The code actually changed is found in HDocSearcher.java.
// 
// 4/12/2000 added code that causes the HDocSearcher's runner Thread to wait
// when it is not "doing anything." This should be more efficient than in the
// previous incarnation, where runner would sleep, then periodically wake up to 
// see if there was anything to search.
// 
// 5/3/2000 Patrick Fourneret reported that escape sequences were visible in
// the titles of his search results.  Originally, I actually considered this a
// feature - making sure that the title would display correctly on all browsers.
// After having this "feature" pointed out though, I see that it is actually a 
// problem. The fix was simple - I simply stopped calling makeHTMLSafe on the title.
// I also corrected the spelling of "exclude."  I fixed this several months ago,
// but apparently I ended up switching to an older code base somehow. 
//
// Jul 12 2000 I added a kludgy method to HDocSearcher which will
// finish extracting the title from a document even if a match is
// found within the title. I had been reminded of this behavior several
// times before, but it was Danny Narayan's complaint that spurred me to action.
// See HDocSearcher.finishTitle(StringBuffer).
//
// Jul 12 2000 changed the names of SearchToHTML's methods foundMatch and 
// foundTitle to receiveMatch and receiveTitle respectively. The former names
// seemed unfortunately confusing. Added a new method boolean hasTitle(int index)
// to SearchToHTML which returns whether the title of the document at index 
// has been found.
//
// Jul 12 2000 corrected an embarrassing error in ReadMe.html - most of the
// text was recycled from SiteSearcher's ReadMe. Unfortunately, some portions
// of the text that don't apply to SearchToHTML made it through. 
//
// Jul 13 2000 I seem to be writing a lot broken sentences in this bug log.
// But that okay.
// Changed the name of the method "foundNoMatch" to "receiveNoMatch." Again,
// I think that the former name was misleading. Added two new parameters to
// deal with expanded context capabilities: leadingContextLength and
// trailingContextLength - leadingContextLength is very misleadingly named.
// I will probably change it tomorrow. The new parameters I was alluding to
// are "leadingcontextlength" and "trailingcontextlength". Not yet documented! 
// I added a new method to HDocSearcher.java: appendTrailingContext(StringBuffer)
// and changed HDocSearcher's constructor in connection with the new trailing context
// stuff.
//
// Jul 14 2000 Added two new parameters: "xhtml_chkbx_checked" and "exact_chkbx_checked"
// Setting each of these to true will initially "check" the corresponding checkbox
// in the applet's user interface. (This worked well with the LineSearcherApplet.)
// Sorry, I've forgotten who suggested this.
//
// Jul 15 2000 Fixed "bugs" in HDocSearcher.java that would cause an 
// ArrayIndexOutOfBoundsException to be thrown if leadingContextLength==0. Until a few
// edits ago, I had required that this value be greater than zero, so the code's
// assumption had been a safe one. 
//
// Jul 19 2000 Cleaned up and updated the documentation.
//
// Jul 24 2000 Added a new parameter "max_num_matches" - no more than max_num_matches
// documents will be returned as matches to a search. The default value is
// the total number of documents. This parameter was suggested by Danny Narayan.
// Uncovered another bug: boolean searching was actually always false while a search was
// underway because search() called stopAllSearches() _after_ setting searching to true
// and stopAllSearches sets searching to false. I should scuttle this code and get on with
// the next generation of applets.
//
// Jul 25 2000 Fixed a bug in appendTrailingContext(StringBuffer). The fix required that
// the method not append directly from the input stream to the context (this was the source
// of the problem) so I renamed appendTrailingContext(StringBuffer) to getTrailingContext().
//
// August 17, 2000 Modified the SearchSieve class - see SearchSieve.java for details.
// Updated and made compliant the ReadMe.

public class SearchToHTML extends Applet {
 private HDocSearcher[] workers;
 private String[] urls;
 private String[] pageinfo;//size, last modified
 private String[] titles;
 private int numreported=0; 
 private int numWorkers=0;
 
 /**
  * The number of matches reported for the current search.
  */
 private int numOfMatches=0;
 
 /**
  * The maximum number of matches that will be reported.
  */
 private int maxNumOfMatches;
 
 private Button b;
 private Checkbox HTMLbox,Exactbox;
 private TextField searchbox;
 private String target;
 private URL docbase;
 private boolean displayMessage;
 private String message;//Message to be displayed in applet
 //to let the user know what's happening before the GUI is finished being set up
 private static final String searchTokenSeparators="\" \t\r\n,";
 private Insets insets=new Insets(2,2,2,2);
 private StringBuffer results=new StringBuffer();
 private String resultspage;
 private boolean waitForAll;
 private String searchbase=null;
 private boolean searching=false;
 
 /**
  * The text for the button to start a search.
  */
 private String search_btn_txt;
 
 /**
  * The text for the button to stop a search.
  */
 private String stop_btn_txt;
 
 /**
  * Initialize the applet.
  * <br>
  * Read and parse parameters, give meaningful values
  * to class variables, and set up the user interface.
  */
 public void init() {
   //first initialize the variables
   docbase=getDocumentBase();//assign the field instead of shadowing it with a local
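   //Keep an explicit port if there is one (getPort() returns -1 when the URL has none).
   int port=docbase.getPort();
   searchbase=docbase.getProtocol()+"://"+docbase.getHost()+(port>=0?":"+port:"")+docbase.getFile();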
   if(searchbase.lastIndexOf('.')>searchbase.lastIndexOf('/')) {
     searchbase=searchbase.substring(0,searchbase.lastIndexOf('/')+1);
   }
   
   int leadingContextLength=15;
   String contextLenStr=getParameter("contextsize");
   if (contextLenStr==null)
       contextLenStr=getParameter("leadingcontextlength");
   if(contextLenStr!=null) {
     try {
       leadingContextLength=Integer.parseInt(contextLenStr);
       if (leadingContextLength<0) 
           leadingContextLength=0;
     }
     catch(NumberFormatException nfe) {
       System.out.println("  Problem with contextsize/leadingcontextlength parameter.");
       nfe.printStackTrace();
     }
   }
   
   int trailingContextLength=0; //Keep default behavior of previous incarnations.
   String trailingLenStr = getParameter("trailingcontextlength");
   if (trailingLenStr!=null) {
       try {
           trailingContextLength=Integer.parseInt(trailingLenStr);
           if (trailingContextLength<0)
               trailingContextLength=0;
       }
       catch (NumberFormatException nfe) {
           System.out.println("  Invalid value for trailingcontextlength parameter.");
           nfe.printStackTrace();
       }
   }
   
   String files=getParameter("files");
   if (files!=null) {
     StringTokenizer st=new StringTokenizer(files,"\n\r \t,",false);
     int num=st.countTokens();
     urls=new String[num];
     workers=new HDocSearcher[num];
     pageinfo=new String[num];
     titles=new String[num];
     numWorkers=num;
     String maxNumStr = getParameter("max_num_matches");
     if (maxNumStr==null)
        maxNumOfMatches = numWorkers;
     else {
        try {
            maxNumOfMatches = Integer.parseInt(maxNumStr);
            if (maxNumOfMatches>numWorkers)
                maxNumOfMatches = numWorkers;
            //XXX! Allow ridiculous, negative numbers.
            //Why would someone want to do this? Who knows?
        }
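         catch (NumberFormatException nfe) {
             //Fall back to the default; otherwise maxNumOfMatches would stay 0
             //and every search would stop at its first match.
             maxNumOfMatches = numWorkers;
             System.err.println("Invalid value for \"max_num_matches\" parameter");
             nfe.printStackTrace();
         }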
     }
     String currToken;
     URL cURL=null;
     for(int i=0;i<num;i++) {
        currToken=st.nextToken();
        pageinfo[i]="";
        titles[i]="";
        urls[i]=new String(currToken);
        try {
          cURL=new URL(docbase,currToken);
          workers[i]=new HDocSearcher(this,cURL,i,leadingContextLength, trailingContextLength);
        }
        catch(MalformedURLException mued) {
          urls[i]="";
          cURL=null;
          //XXX! waste an Object
          //This needs to change!
          workers[i]=new HDocSearcher(this,cURL,i,0,0);
          workers[i].setErrored();
          System.out.println(mued);
        }
     }
   }
   else { 
     displayMessage=true; 
     System.out.println("SearchToHTML Applet can\'t start");
     System.out.println("Missing required parameter: files"); 
     message="Can\'t continue: missing the \"files\" parameter.";
     repaint();
     return;
   }
   target=getParameter("target");
   if (target==null) 
       target="_top";
   resultspage=getParameter("resultspage");
   if (resultspage==null) 
       resultspage="searchresults.html";
   waitForAll=("true".equalsIgnoreCase(getParameter("waitforall")));
   if ("_top".equals(target) || "_self".equals(target)) 
       waitForAll=true;
       
   //Set up GUI
   
   //Parameters to allow control over the text in the applet
   //  Though this needed ability is very easy to implement, I'm still faced with
   //  the dilemma of what to name the parameters. Perhaps this is a sign of my insanity,
   //  but I worry about whether to name them for their functionality (like "search_btn_txt") 
   //  or their English versions (like "Search_en")...for now, functionality:
   //  search_btn_txt
   //  stop_btn_txt
   //  xhtml_chkbx_txt
   //  exact_chkbx_txt
   //  searchbox_label_txt
   search_btn_txt=getParameter("search_btn_txt","Search");
   stop_btn_txt=getParameter("stop_btn_txt","Stop");
   String xhtml_chkbx_txt=getParameter("xhtml_chkbx_txt","Exclude HTML");
   String exact_chkbx_txt=getParameter("exact_chkbx_txt","Exact matches only");
   String searchbox_label_txt=getParameter("searchbox_label_txt","Search for:");
   
   //get color parameters
   Color color=null;
   if((color=getColor(getParameter("bgcolor")))!=null) setBackground(color);
   else setBackground(Color.gray);
   if((color=getColor(getParameter("fgcolor")))!=null) setForeground(color);
   else setForeground(Color.black);
   //Lots of Panels
   setLayout(new GridLayout(2,1));//searchbox,checkboxes
   Panel ptop=new Panel();
   ptop.setLayout(new BorderLayout());
   Panel p=new Panel();
   p.add(new Label(searchbox_label_txt,Label.RIGHT));
   p.add(searchbox=new TextField(20));
   if((color=getColor(getParameter("searchboxbgcolor")))!=null) {
     searchbox.setBackground(color);
   }
   else searchbox.setBackground(Color.white);
   if((color=getColor(getParameter("searchboxfgcolor")))!=null) {
     searchbox.setForeground(color);
   }
   else searchbox.setForeground(Color.black);
   String initsearchwrds=getParameter("startwords");
   if(initsearchwrds!=null) searchbox.setText(initsearchwrds);
   //Color buttonbgcolor,buttonfgcolor;
   p.add(b=new Button(search_btn_txt));
   if((color=getColor(getParameter("buttonbgcolor")))!=null) {
     b.setBackground(color);
   }
   if((color=getColor(getParameter("buttonfgcolor")))!=null) {
     b.setForeground(color);
   }
   ptop.add("Center",p);
   add(ptop);
   Panel p2=new Panel();
   p2.setLayout(new FlowLayout(FlowLayout.CENTER));
   p2.add(HTMLbox=new Checkbox(xhtml_chkbx_txt));
   p2.add(Exactbox=new Checkbox(exact_chkbx_txt));
   if((color=getColor(getParameter("checkboxbgcolor")))!=null) {
     HTMLbox.setBackground(color);
     Exactbox.setBackground(color);
   }
   //else let default thing happen
   if((color=getColor(getParameter("checkboxfgcolor")))!=null) {
     HTMLbox.setForeground(color);
     Exactbox.setForeground(color);
   }
   
   //Check the checkboxes...
   if ("true".equalsIgnoreCase(getParameter("xhtml_chkbx_checked")))
       HTMLbox.setState(true);
   if ("true".equalsIgnoreCase(getParameter("exact_chkbx_checked")))
       Exactbox.setState(true);
   
   //else let default happen
   add(p2);
   searchbox.requestFocus();
   validate();
 }
 
 public static Color getColor(String s) {
   if(s==null) return null;
   s=s.toLowerCase();
   if(s.startsWith("#")) {
     if(s.length()!=7) return null;
     else {
       try {
         int num=Integer.parseInt(s.substring(1,7),16);//parse a hex. string to dec.
         return new Color(num);
       }
       catch(NumberFormatException e) {
         return null;
       }
     }
   }
   else if("black".equals(s)) return Color.black;
   else if("blue".equals(s)) return Color.blue;
   else if("darkblue".equals(s)) return Color.blue.darker().darker().darker();
   else if("lightblue".equals(s)) return new Color(173,216,230);//Color.blue.brighter() is still plain blue
   else if("cyan".equals(s)) return Color.cyan;
   else if("darkgray".equals(s)) return Color.darkGray;
   else if("lightgray".equals(s)) return Color.lightGray;
   else if("green".equals(s)) return Color.green;
   else if("gray".equals(s)) return Color.gray;
   else if("magenta".equals(s)) return Color.magenta;
   else if("orange".equals(s)) return Color.orange;
   else if("pink".equals(s)) return Color.pink;
   else if("red".equals(s)) return Color.red;
   else if("white".equals(s)) return Color.white;
   else if("yellow".equals(s)) return Color.yellow;
   else return Color.getColor(s);
 }
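 //Usage sketch for getColor; the inputs are arbitrary examples, not values
 //taken from any real page:
 //  getColor("#FF8000") -> new Color(0xff8000), i.e. red=255, green=128, blue=0
 //  getColor("Red")     -> Color.red (names are matched case-insensitively)
 //  getColor("#fff")    -> null (only the six-digit "#rrggbb" form is accepted)
 //  getColor(null)      -> null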
 
 //Maybe this should be changed to getParameter(String,Object)
 //so it can handle all of our needs?
 //The second argument is named "alt" because "default" is a
 //Java reserved word (used in switch statements) and cannot be used as a name.
 public String getParameter(String name, String alt) {
   String val=getParameter(name);
   if(val!=null) return val;
   return alt;
 }
 
 //public so that HDocSearcher can use it...
 public final static String makeHTMLSafe(String track) {
   StringBuffer train=new StringBuffer();
   char c;
   for(int i=0;i<track.length();i++) {
      c=track.charAt(i);
      if(c=='<') train.append("&lt;");
      else if(c=='>') train.append("&gt;");
      else if(c=='&') train.append("&#38;");
      else train.append(c);
   }
   return train.toString();
 }
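 //Usage sketch for makeHTMLSafe (the input string is an arbitrary example):
 //  String safe=SearchToHTML.makeHTMLSafe("a<b & c>d");
 //  //safe now holds  a&lt;b &#38; c&gt;d
 //Quotation marks are passed through unchanged, so the result is suitable
 //as element content but not inside an attribute value.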
 
 //public so that HDocSearcher can use it...
 public final static String replaceChar(String track,char out,String in) {
   StringBuffer train=new StringBuffer();
   char c;
   for(int i=0;i<track.length();i++) {
      c=track.charAt(i);
      if(c==out) train.append(in);
      else train.append(c);
   }
   return train.toString();
 }
 
 public Insets insets() {
   return insets;
 }
 
 public void paint(Graphics g) {
   //Don't worry too much about making this pretty...
   //It's just to let the viewer (probably a disappointed webmaster)
   //know that there's something (probably bad) going on.
   if (displayMessage) 
       g.drawString(message,3,size().height/2);
 }
 
 public void reset() {
   stopAllSearches();
   numreported=0;
 }
 
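 public void stopAllSearches() {
   searching=false;
   //workers is null if init() gave up because the "files" parameter was missing
   if (workers==null)
       return;
   for (int i=0;i<workers.length;i++) 
       workers[i].stopSearch();
 }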
 
 //Append with following format:  URL:title:anchor:pageinfo:context,[next entry]
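 //
 //A worked example, with a made-up file name, title and context: searching for
 //  cat  with both checkboxes unchecked and the default resultspage, and one match
 //in "a.html" whose title is "Home", with an empty anchor and empty page info,
 //should build roughly this string before searchbase is prepended:
 //
 //  searchresults.html?cat,nna.html:Home:::the%20cat%20sat,
 //
 //The header built by search() is the query, a comma, and two y/n flags (exact,
 //exclude HTML); no separator follows the flags. Every field goes through
 //URLEncoder.encode, so a ':' or ',' inside a field arrives as %3A or %2C
 //and cannot be mistaken for a delimiter.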
 public synchronized void receiveMatch(int i,String anchor,String context) {
   //XXX!
   if (!searching)
       return;
   numreported++;
   numOfMatches++;
   results.append(URLEncoder.encode(urls[i])+":"+URLEncoder.encode(titles[i])+
":"+URLEncoder.encode(anchor)+":"+URLEncoder.encode(makeHTMLSafe(pageinfo[i]))+":"+
URLEncoder.encode(context)+",");
   if (numreported>=numWorkers || numOfMatches>=maxNumOfMatches || !waitForAll) {
     try {
       String tempstr=results.toString();
       tempstr=replaceChar(tempstr,'+',"%20");
       URL tempURL=new URL(searchbase+tempstr);
       getAppletContext().showDocument(tempURL,target);
     }
     catch (MalformedURLException muei3) {
         muei3.printStackTrace();
     }
   }
   if (numreported>=numWorkers || numOfMatches>=maxNumOfMatches) {
     if (!search_btn_txt.equals(b.getLabel())) 
         b.setLabel(search_btn_txt);
     searching=false;
     //XXX!
     stopAllSearches();
   }
 }
 
 /**
  * Receive the title of the document at index.
  *
  * @param title The title.
  * @param index The index of the document.
  */
 public final void receiveTitle(String title, int index) {
    //XXX! Allows the title to be overwritten by successive calls.
    titles[index]=title;
 }
 
 /**
  * Have we got the title of the document at index?
  *
  * @param index A document index.
  * @return Whether the title has been received.
  */
 public final boolean hasTitle(int index) {
    //Compare by content rather than by reference: an empty title built at
    //run time is not the same Object as the literal "" assigned in init().
    return titles[index]!=null && titles[index].length()>0;
 }
 
 //An HDocSearcher that is errored can only ever call this once
 //because its errored flag is set
 public synchronized void receiveNoMatch(int i) {
   if(workers[i].isErrored()) {
     numWorkers--;
   }
   else 
      numreported++;
   if (!searching)
       return;
   if(numreported>=numWorkers) {
     searching=false;
     try{ 
       URL tempURL=new URL(searchbase+replaceChar(results.toString(),'+',"%20"));
       if(!search_btn_txt.equals(b.getLabel())) b.setLabel(search_btn_txt);
       getAppletContext().showDocument(tempURL,target);
     }
     catch(MalformedURLException muei3) {System.out.println(muei3);}
   }
   repaint();
 }
 
 //called after an HDocSearcher opens a connection for the
 //first time and can get some extra info like last modified date
 //and file size
 public void addInfo(int i,String info) {
   pageinfo[i]=info;
 }
 
 public boolean action(Event evt, Object arg) {
   if(evt.target == searchbox) {
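     //Set the label first, as in the button branch below, so that a search
     //which finishes quickly cannot have its label reset overwritten.
     b.setLabel(stop_btn_txt);
     search(searchbox.getText(),Exactbox.getState(),HTMLbox.getState());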
   }
   else if(evt.target == b) {
     if(search_btn_txt.equals(b.getLabel())) {
       b.setLabel(stop_btn_txt);
       search(searchbox.getText(),Exactbox.getState(),HTMLbox.getState());
     }
     else {
       b.setLabel(search_btn_txt);
       stopAllSearches();
       reset();
     }
   }
   return true;
 }
 
 protected void search(String s,boolean bexact,boolean cutHTML) {
   numOfMatches=0;
   repaint();
   reset();
   searching=true;
   results.setLength(0);
   results.append(resultspage);
   results.append('?');
   results.append(URLEncoder.encode(makeHTMLSafe(s)));//Decode with JavaScript unescape(s)...
   results.append(',');
   if(bexact) results.append('y');
   else results.append('n');
   if(cutHTML) results.append('y');
   else results.append('n');
   //results.append(',');
   StringTokenizer st=new StringTokenizer(s,searchTokenSeparators,true);
   Vector tempv=new Vector();
   String currWord="";//this probably should be a StringBuffer
   String currToken="";
   boolean insideQuote=false;
   while(st.hasMoreTokens()) {
     currToken=st.nextToken();
     if(searchTokenSeparators.indexOf(currToken)!=-1) {
       if("\"".equals(currToken)) {
         insideQuote=!insideQuote;
         if (insideQuote) 
             currWord="";
         else {
           if (currWord.length()>0) 
               tempv.addElement((new String(currWord)).toLowerCase());
           currWord="";
         }
       }
       else if (insideQuote) 
           currWord+=currToken;
     }
     else if (!insideQuote) {
       tempv.addElement((new String(currToken)).toLowerCase());
       currWord="";
     }
     else if (insideQuote) {
       currWord+=currToken;
     }
   }
   if (currWord.length()>0) tempv.addElement((new String(currWord)).toLowerCase());
   String[] ss=new String[tempv.size()];
   tempv.copyInto(ss);
   int workerLen = workers.length;
   for (int i2=0;i2<workerLen;i2++) 
       workers[i2].searchFor(ss,bexact,cutHTML);
 }
 
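 public void stop() {
   //don't tie up CPU time when we're hidden or about to die
   //workers is null if init() gave up because the "files" parameter was missing
   if (workers==null)
       return;
   stopAllSearches();
   for (int i=0;i<workers.length;i++) 
       workers[i].stopRunning();
 }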
 
}
