Source Code Library
Google Highlighting

Introduction

:: An Electronics in Meccano article with search terms highlighted ::

If you do a search on Google, you will see a link to Google's cache next to each search result it returns.

When you click the link, Google displays the resulting webpage with your search terms highlighted so you can quickly see if the page is what you are looking for.

The ASP script described on this page was written to implement similar search result highlighting on my Electronics in Meccano (EiM) website.

When someone's search on a search engine results in a link to an EiM article or message board entry, the EiM website highlights their search terms in an appropriate colour. The search result highlighting also appears when the internal EiM search engine is used.

Features

  » Easily configurable highlight colour and font using HTML styles.
  » Can recognise search terms from any search engine if its 'signature' has been defined.
  » Filters 'stopwords' such as 'the' and 'and'.
  » Parses HTML code drawn from a file and only searches its text, leaving HTML commands intact.

Explaining the Code

The rest of this article describes how the search highlighting code works, using the Electronics in Meccano website 'article.asp' page as an example. To aid readability, extracts from the full script are given - you will find the complete script at the end of the page, and you can download the script as part of the Source Code Library zip file below.

 

source_code_library.zip (6kb)

Download Source Code Library

This zip file contains the Search Terms Highlighting script and all the other scripts on Surtell.com.

Note that many of the code lines are too long to fit on one line, so I have split them into two or more lines using the _ line-continuation character. The code will run as shown, but you can put these lines of code back onto one line if you wish.

Before you go any further, I would suggest that you view an example search by following the instructions below:

  » Click here to open the EiM Search page.
  » In the 'Look for' box enter '555 astable circuit' and then click 'Go'.
  » Click on the result entitled 'The 555 Astable Circuit' to view this article.
  » You should see your search terms highlighted.

Retrieving the Search Terms

To determine whether the user has reached the article.asp page from a search engine, we first need to retrieve the Referer information using the HTTP_REFERER server variable.

referer = LCase(request.serverVariables("HTTP_REFERER"))

The Referer string contains the URL of the last page the user visited, including any query parameters. So, if the user reached our page from a Google search, the Referer might be 'http://www.google.com?q=555+astable+circuit'. This string can be parsed to determine if the referer is a search engine, and if so, what the search terms were.

To find out the name of the search engine, we need to see if the website name 'google' is in the Referer string. Once we know that the search engine is Google, we can then look for the query, which starts after the 'q='. The search terms are all the text after 'q=', up to the end of the string or to the next '&', which indicates another parameter in the query.

The 'google' and 'q=' are the 'signatures' of the Google search engine. If a search is done on Google, the word 'google' will always be in the URL, as will be the 'q' parameter.
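
The parsing described above can be sketched as a small function. This is only an illustration and not part of the original script: the name GetQueryValue is hypothetical, and unlike a plain instr() search for 'q=' it compares the whole parameter name, so a parameter such as 'aq' is not mistaken for 'q'.

```vbscript
' Hypothetical helper (not in the original script): returns the raw,
' still URL-encoded value of one query parameter, or "" if it is absent.
Function GetQueryValue(url, fieldName)
    Dim queryStart, pairs, i, eqPos
    GetQueryValue = ""
    queryStart = InStr(url, "?")
    If queryStart = 0 Then Exit Function

    ' Split everything after the "?" into name=value pairs
    pairs = Split(Mid(url, queryStart + 1), "&")
    For i = 0 To UBound(pairs)
        eqPos = InStr(pairs(i), "=")
        If eqPos > 0 Then
            ' Compare the whole parameter name, so "aq" never matches "q"
            If LCase(Left(pairs(i), eqPos - 1)) = LCase(fieldName) Then
                GetQueryValue = Mid(pairs(i), eqPos + 1)
                Exit Function
            End If
        End If
    Next
End Function
```

For the example Referer above, GetQueryValue(referer, "q") would return '555+astable+circuit', which still needs decoding before it can be split into words.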

So that any number of search engines can be identified, the signatures for multiple search engines are placed in arrays as shown. Most of the popular search engines are there, and you could add more if you wish, including the signature of your own internal site search engine if you have one (like the 'eleinmec' signature shown).

dim searchEngineSignature(8), searchEngineQueryField(8)

searchEngineSignature(0) = "eleinmec"  : searchEngineQueryField(0) = "yider"
searchEngineSignature(1) = "google"    : searchEngineQueryField(1) = "q"
searchEngineSignature(2) = "msn"       : searchEngineQueryField(2) = "q"
searchEngineSignature(3) = "yahoo"     : searchEngineQueryField(3) = "p"
searchEngineSignature(4) = "netscape"  : searchEngineQueryField(4) = "query"
searchEngineSignature(5) = "aol"       : searchEngineQueryField(5) = "query"
searchEngineSignature(6) = "ask"       : searchEngineQueryField(6) = "q"
searchEngineSignature(7) = "altavista" : searchEngineQueryField(7) = "q"
searchEngineSignature(8) = "looksmart" : searchEngineQueryField(8) = "qt"
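
As a sketch of how one more search engine could be added, the arrays need to grow by one element before the new entry is assigned. The engine name 'examplesearch' and its 'terms' parameter are placeholders invented for this illustration, not a real signature.

```vbscript
' Raise the upper bound from 8 to 9 to make room for one more entry
dim searchEngineSignature(9), searchEngineQueryField(9)

' ... entries 0 to 8 exactly as listed above ...

' Hypothetical engine: both strings are placeholders, not a real signature
searchEngineSignature(9) = "examplesearch" : searchEngineQueryField(9) = "terms"
```

It is probably safest not to dimension more elements than are filled in: an unused element holds an empty signature, and instr() reports a match for an empty search string in any non-empty Referer.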

The code below checks the Referer to see if any of the search engine signatures can be found. If one is, the search terms are extracted and each word ends up as an element in the searchTerms array.

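Note the use of the URLDecode function, which converts percent-encoded sequences in the Referer back to the characters they represent. For example, a double-quote character " appears in the Referer as %22. URLDecode also turns each '+' back into a space, which is what allows the terms to be split into separate words. Any double quotes are then removed, so a quoted phrase is treated as individual words.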

for searchEngineIndex = 0 to UBound(searchEngineSignature)
  if instr(referer, searchEngineSignature(searchEngineIndex)) > 0 _
    and instr(referer, searchEngineQueryField(searchEngineIndex) & "=") > 0 then
    'Referer has been found - extract search terms
    referer = mid(referer, instr(referer, searchEngineQueryField(searchEngineIndex) & "=") + _
      len(searchEngineQueryField(searchEngineIndex)) + 1)
    if instr(referer, "&") then
      referer = left(referer, instr(referer, "&") - 1)
    end if
    referer = replace(URLDecode(referer), """", "")
    
    searchTerms = split(referer, " ")
    exit for
  end if
next
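
The URLDecode function in the complete script works from a fixed table of escapes. As a hedged alternative, the sketch below decodes any %XX escape by converting its two hex digits. The names DecodeQueryText and IsHexPair are hypothetical, and the sketch assumes each escape stands for a single-byte character, so multi-byte UTF-8 sequences are not reassembled.

```vbscript
' Hypothetical alternative to URLDecode: decodes "+" and any %XX escape.
' Assumes single-byte characters; multi-byte UTF-8 is not reassembled.
Function DecodeQueryText(encoded)
    Dim i, ch, hexPair, result
    result = ""
    i = 1
    Do While i <= Len(encoded)
        ch = Mid(encoded, i, 1)
        If ch = "+" Then
            ' A literal "+" in a query string stands for a space
            result = result & " "
        ElseIf ch = "%" And i + 2 <= Len(encoded) Then
            hexPair = Mid(encoded, i + 1, 2)
            If IsHexPair(hexPair) Then
                ' Convert the two hex digits to a character code
                result = result & Chr(CLng("&H" & hexPair))
                i = i + 2
            Else
                ' Not a valid escape: keep the "%" as it is
                result = result & ch
            End If
        Else
            result = result & ch
        End If
        i = i + 1
    Loop
    DecodeQueryText = result
End Function

' True only if text is exactly two hexadecimal digits (either case)
Function IsHexPair(text)
    Dim j
    IsHexPair = (Len(text) = 2)
    For j = 1 To Len(text)
        If InStr("0123456789abcdefABCDEF", Mid(text, j, 1)) = 0 Then IsHexPair = False
    Next
End Function
```

Because each character is examined once, an encoded plus sign (%2B) stays a '+' rather than becoming a space, and lower-case hex digits, as found in a lower-cased Referer, are accepted.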

Filtering Stopwords

Now that the search terms are in the searchTerms array, we could go straight ahead and highlight them in the HTML document. However, if the search query was "where can I buy the 555 astable circuit in the UK?" then a large part of the page would be highlighted, since words like 'where' and 'the' are common. This would be irritating for the user, so we first need to filter out these 'stopwords'.

I have stored a long list of stopwords in the stopWords variable. These are taken from four separate sources and so should be comprehensive. The code below finds any stopwords in the searchTerms array and simply replaces them with an empty string.

stopWords = " a able about above according ... ... yourself yourselves you've z zero "

if not isEmpty(searchTerms) then
  for searchTermsIndex = 0 to UBound(searchTerms)
    if instr(stopWords, " " & searchTerms(searchTermsIndex) & " ") then
      searchTerms(searchTermsIndex) = ""
    end if
  next
end if
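
The filter above leaves empty strings in the array where stopwords used to be. As an optional variation, and only a sketch, the hypothetical function RemoveStopWords below returns a new array containing just the terms that survive. It assumes the same space-delimited stopWords format, with a space before the first word and after the last.

```vbscript
' Hypothetical helper (not in the original script): returns a new array
' holding only the terms that are non-empty and not in stopWords.
Function RemoveStopWords(terms, stopWords)
    Dim i, kept
    kept = ""
    For i = 0 To UBound(terms)
        If Len(terms(i)) > 0 Then
            ' Surrounding spaces make sure only whole words match
            If InStr(stopWords, " " & terms(i) & " ") = 0 Then
                If Len(kept) > 0 Then kept = kept & " "
                kept = kept & terms(i)
            End If
        End If
    Next
    ' Split("") gives an empty array, so UBound is -1 when nothing is left
    RemoveStopWords = Split(kept, " ")
End Function
```

It could be called as searchTerms = RemoveStopWords(searchTerms, stopWords). If every term is a stopword the result is an empty array, so a caller might also check that UBound(searchTerms) is at least 0 before displaying anything.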

Displaying the Search Terms

Before the article HTML is parsed, the user is reminded of their search terms above the article. Note the 'Remove Highlighting' link back to the same page: when this link is clicked and the page reloads, the Referer is the article page itself rather than a search results page, so no highlighting occurs.

At this point the hiliteStart and hiliteEnd variables are defined. These contain the HTML code which will make the words on the page look highlighted. Like Google's cache highlighting, I have used the <b> tag and set a background colour so that highlighted text appears in bold with a light green background. Depending on your site's content, you might also want to add a 'color:#000000' parameter to the style to make sure the highlighted text is black.

hiliteStart = "<b style=""background:#ECFFF0"">"
hiliteEnd   = "</b>"

if not isEmpty(searchTerms) then
  response.write "<p class=""small""><b style=""color:#303090"">" & _
    "Your search terms have been highlighted: </b>"
  for searchTermsIndex = 0 to UBound(searchTerms)
    'Skip stopwords, which were blanked out above
    if len(searchTerms(searchTermsIndex)) then
      response.write hiliteStart & _
        server.htmlEncode(searchTerms(searchTermsIndex)) & hiliteEnd & " "
    end if
  next
  response.write "&nbsp;&nbsp;&nbsp;&nbsp;" & _
    "<a href=""article.asp?" & ArticleID & """>Remove highlighting</a></p>"
end if
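
The optional text colour mentioned before this listing could be added to the style as follows. This is just one possible variant of the two highlight variables, with the colour value taken from the suggestion in the text.

```vbscript
' Variant of the highlight tags with an explicit text colour,
' so highlighted words stay black on the light green background
hiliteStart = "<b style=""background:#ECFFF0;color:#000000"">"
hiliteEnd   = "</b>"
```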

Highlighting the Search Terms

Finally, the article HTML can be parsed and the search terms highlighted. In reality the article HTML is read from a file into the variable articleText; however, in the example below it is simply a few lines of HTML.

The code checks all the text in articleText for the search terms and applies the highlighting. It leaves any HTML tags that it finds intact.

articleText = "<p>Here is my example article text about the 555 astable circuit " & _
  "with some of it in <b>bold</b> and some of it<br>on the next line.</p>"

if not isEmpty(searchTerms) then
  for searchTermsIndex = 0 to UBound(searchTerms)
    if len(searchTerms(searchTermsIndex)) then
      'Parse for each search term
      textStart = 0
      textEnd   = 0

      do while textEnd <= len(articleText)
        'Extract a piece of text
        textStart = instr(textEnd + 1, articleText, ">") + 1
        textEnd   = instr(textStart, articleText, "<")
        if textEnd = 0 then
          textEnd = len(articleText) + 1
        end if

        'Replace occurrences of the searchTerm in the extracted text
        if len(mid(articleText, textStart, textEnd - textStart)) then
          searchStart = 0
          do
            searchStart = instr(searchStart + 1, _
              mid(articleText, textStart, textEnd - textStart), _
              searchTerms(searchTermsIndex), vbTextCompare)
            if searchStart then
              articleText = left(articleText, textStart + searchStart - 2) & hiliteStart & _
                mid(articleText, textStart + searchStart - 1, _
                len(searchTerms(searchTermsIndex))) & hiliteEnd & _
                mid(articleText, textStart + searchStart + _
                len(searchTerms(searchTermsIndex)) - 1)
              'Resume just after the inserted highlight (instr starts at searchStart + 1)
              searchStart = searchStart + _
                len(searchTerms(searchTermsIndex) & hiliteStart & hiliteEnd) - 1
              textEnd = textEnd + len(hiliteStart & hiliteEnd)
            end if
          loop while searchStart
        end if
      loop
    end if
  next
end if

Note that articleText must start with an HTML tag, otherwise the highlighting might not be accurate: the parser looks for text after a '>' character, so any text in front of the first tag can be missed. If you know that articleText is just plain text, you can simply concatenate a tag to the start of it to ensure that the highlighting will work correctly, as shown below. The tag does not even have to be valid HTML!

articleText = "<x>" & articleText

Now all that is needed is to write articleText to the browser and the script is finished.

response.write articleText 

The Complete Script

The script below includes two test statements, each marked with a '***' comment, that set up example HTML in articleText and Google as the Referer. You will need to remove them when you are satisfied that the script is working correctly.

<%
option explicit

'************************************
'* SEARCH TERMS HIGHLIGHTING SCRIPT *
'************************************

'Written by Tim Surtell and downloaded from www.surtell.com
'(C) 2004 Tim Surtell with the exception of
'the URLDecode function (C) Yvom Snippet & Code Library
'This code may be freely distributed providing that the credit above is retained

'Define variables
dim articleText
dim referer, searchTerms, searchTermsIndex
dim textStart, textEnd, searchStart, hiliteStart, hiliteEnd
dim searchEngineSignature(8), searchEngineQueryField(8), searchEngineIndex, stopWords

'Set search engine query highlight variables and stop words
hiliteStart = "<b style=""background:#ECFFF0"">"
hiliteEnd   = "</b>"
stopWords  = " a able about above according accordingly across actually after " & _
  "afterwards again against ain't all allow allows almost alone along already " & _
  "also although always am among amongst an and another any anybody anyhow " & _
  "anyone anything anyway anyways anywhere apart appear appreciate " & _
  "appropriate approximately are area areas aren't around as a's aside ask " & _
  "asked asking asks associated at available away awfully b back backed " & _
  "backing backs be became because become becomes becoming been before " & _
  "beforehand began behind being beings believe below beside besides best " & _
  "better between beyond big both brief but by c came can cannot cant can't " & _
  "case cases cause causes certain certainly changes clear clearly c'mon co " & _
  "com come comes concerning consequently consider considering contain " & _
  "containing contains corresponding could couldn't course c's currently d " & _
  "de definitely described despite did didn't differ different differently " & _
  "do does doesn't doing done don't down downed downing downs downwards due " & _
  "during e each early edu eg eight either else elsewhere en end ended " & _
  "ending ends enough entirely especially et etc even evenly ever every " & _
  "everybody everyone everything everywhere ex exactly example except f " & _
  "face faces fact facts far felt few fifth find finds first five followed " & _
  "following follows for former formerly forth found four from full fully " & _
  "further furthered furthering furthermore furthers g gave general " & _
  "generally get gets getting give given gives giving go goes going gone " & _
  "good goods got gotten great greater greatest greetings group grouped " & _
  "grouping groups h had hadn't happens hardly has hasn't have haven't " & _
  "having he hello help hence her here hereafter hereby herein here's " & _
  "hereupon hers herself he's hi high higher highest him himself his hither " & _
  "hopefully how howbeit however i i'd ie if ignored i'll i'm immediate " & _
  "important in inasmuch inc indeed indicate indicated indicates inner " & _
  "insofar instead interest interested interesting interests into inward is " & _
  "isn't it it'd it'll its it's itself i've j just k keep keeps kept kg kind " & _
  "km knew know known knows l la large largely last lately later latest " & _
  "latter latterly least less lest let lets let's like liked likely little " & _
  "long longer longest look looking looks ltd m made mainly make making man " & _
  "many may maybe me mean meanwhile member members men merely might min ml " & _
  "mm more moreover most mostly mr mrs much must my myself n name namely nd " & _
  "near nearly necessary need needed needing needs neither never " & _
  "nevertheless new newer newest next nine no nobody non none noone nor " & _
  "normally not nothing novel now nowhere number numbers o obtain obtained " & _
  "obviously of off often oh ok okay old older oldest on once one ones only " & _
  "onto open opened opening opens or order ordered ordering orders other " & _
  "others otherwise ought our ours ourselves out outside over overall own " & _
  "p part parted particular particularly parting parts per perhaps place " & _
  "placed places please plus point pointed pointing points possible present " & _
  "presented presenting presents presumably previously probably problem " & _
  "problems provides put puts q que quite qv r rather rd re really " & _
  "reasonably regarding regardless regards relatively respectively resulted " & _
  "resulting right room rooms s said same saw say saying says second " & _
  "secondly seconds see seeing seem seemed seeming seems seen sees self " & _
  "selves sensible sent serious seriously seven several shall she should " & _
  "shouldn't show showed showing shown shows side sides significant " & _
  "significantly since six small smaller smallest so some somebody somehow " & _
  "someone something sometime sometimes somewhat somewhere soon sorry " & _
  "specified specify specifying state states still sub such suggest sup sure " & _
  "t take taken tell tends th than thank thanks thanx that thats that's the " & _
  "their theirs them themselves then thence there thereafter thereby " & _
  "therefore therein theres there's thereupon these they they'd they'll " & _
  "they're they've thing things think thinks third this thorough thoroughly " & _
  "those though thought thoughts three through throughout thru thus to today " & _
  "together too took toward towards tried tries truly try trying t's turn " & _
  "turned turning turns twice two u un und under unfortunately unless " & _
  "unlikely until unto up upon us use used useful uses using usually v value " & _
  "various very via viz vs w want wanted wanting wants was wasn't way ways " & _
  "we we'd welcome well we'll wells went were we're weren't we've what " & _
  "whatever what's when whence whenever where whereafter whereas whereby " & _
  "wherein where's whereupon wherever whether which while whither who " & _
  "whoever whole whom who's whose why will willing wish with within without " & _
  "wonder won't work worked working works would wouldn't www x y year " & _
  "years yes yet you you'd you'll young younger youngest your you're yours " & _
  "yourself yourselves you've z zero "

'Define search engine query signatures
searchEngineSignature(0)  = "eleinmec"   : searchEngineQueryField(0) = "yider"
searchEngineSignature(1)  = "google"     : searchEngineQueryField(1) = "q"
searchEngineSignature(2)  = "msn"        : searchEngineQueryField(2) = "q"
searchEngineSignature(3)  = "yahoo"      : searchEngineQueryField(3) = "p"
searchEngineSignature(4)  = "netscape"   : searchEngineQueryField(4) = "query"
searchEngineSignature(5)  = "aol"        : searchEngineQueryField(5) = "query"
searchEngineSignature(6)  = "ask"        : searchEngineQueryField(6) = "q"
searchEngineSignature(7)  = "altavista"  : searchEngineQueryField(7) = "q"
searchEngineSignature(8)  = "looksmart"  : searchEngineQueryField(8) = "qt"

'Check referer to see if there are search engine search terms
referer = LCase(request.serverVariables("HTTP_REFERER"))

'*** Set test referer - remove after testing!
referer = "http://www.google.co.uk?q=555+astable+circuit"

for searchEngineIndex = 0 to UBound(searchEngineSignature)
  if instr(referer, searchEngineSignature(searchEngineIndex)) > 0 _
    and instr(referer, searchEngineQueryField(searchEngineIndex) & "=") > 0 then
    'Referer has been found - extract search terms
    referer = mid(referer, instr(referer, searchEngineQueryField(searchEngineIndex) & "=") _
      + len(searchEngineQueryField(searchEngineIndex)) + 1)
    if instr(referer, "&") then
      referer = left(referer, instr(referer, "&") - 1)
    end if
    referer = replace(URLDecode(referer), """", "")

    searchTerms = split(referer, " ")
    exit for
  end if
next

'Check search terms for stop words
if not isEmpty(searchTerms) then
  for searchTermsIndex = 0 to UBound(searchTerms)
    if instr(stopWords, " " & searchTerms(searchTermsIndex) & " ") then
      searchTerms(searchTermsIndex) = ""
    end if
  next
end if

'If there has been a search, display search terms
if not isEmpty(searchTerms) then
  response.write "<p class=""small""><b style=""color:#303090"">" & _
    "Your search terms have been highlighted: </b>"
  for searchTermsIndex = 0 to UBound(searchTerms)
    'Skip stopwords, which were blanked out above
    if len(searchTerms(searchTermsIndex)) then
      response.write hiliteStart & _
        server.htmlEncode(searchTerms(searchTermsIndex)) & hiliteEnd & " "
    end if
  next
  response.write "&nbsp;&nbsp;&nbsp;&nbsp;" & _
    "<a href=""thispage.asp"">Remove highlighting</a></p>"
end if

'*** Set test articleText - remove after testing!
articleText = "<p>Here is my example article text about the 555 astable circuit " & _
  "with some of it in <b>bold</b> and some of it<br>on the next line.</p>"

'Highlight any search terms by parsing HTML for text content
if not isEmpty(searchTerms) then
  for searchTermsIndex = 0 to UBound(searchTerms)
    if len(searchTerms(searchTermsIndex)) then
      'Parse for each search term
      textStart = 0
      textEnd   = 0
      do while textEnd <= len(articleText)
        'Extract a piece of text
        textStart = instr(textEnd + 1, articleText, ">") + 1
        textEnd   = instr(textStart, articleText, "<")

        if textEnd = 0 then
          textEnd = len(articleText) + 1
        end if

        'Replace occurrences of the searchTerm in the extracted text
        if len(mid(articleText, textStart, textEnd - textStart)) then
          searchStart = 0
          do
            searchStart = instr(searchStart + 1, _
              mid(articleText, textStart, textEnd - textStart), _
              searchTerms(searchTermsIndex), vbTextCompare)
            if searchStart then
              articleText = left(articleText, textStart + searchStart - 2) & hiliteStart & _
                mid(articleText, textStart + searchStart - 1, _
                len(searchTerms(searchTermsIndex))) & hiliteEnd & _
                mid(articleText, textStart + searchStart + _
                len(searchTerms(searchTermsIndex)) - 1)
              'Resume just after the inserted highlight (instr starts at searchStart + 1)
              searchStart = searchStart + _
                len(searchTerms(searchTermsIndex) & hiliteStart & hiliteEnd) - 1
              textEnd = textEnd + len(hiliteStart & hiliteEnd)
            end if
          loop while searchStart
        end if
      loop
    end if
  next
end if

response.write articleText
 
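Function URLDecode(strString)
    'Decode "+" to a space first, so that an encoded plus sign (%2B) is kept as "+"
    strString = Replace(strString, "+", " ")
    'The Referer was lower-cased earlier, so match the hex digits with vbTextCompare
    strString = Replace(strString, "%2F", "/", 1, -1, vbTextCompare)
    strString = Replace(strString, "%7C", "|", 1, -1, vbTextCompare)
    strString = Replace(strString, "%3F", "?", 1, -1, vbTextCompare)
    strString = Replace(strString, "%21", "!", 1, -1, vbTextCompare)
    strString = Replace(strString, "%40", "@", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5C", "\", 1, -1, vbTextCompare)
    strString = Replace(strString, "%23", "#", 1, -1, vbTextCompare)
    strString = Replace(strString, "%24", "$", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5E", "^", 1, -1, vbTextCompare)
    strString = Replace(strString, "%26", "&", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2A", "*", 1, -1, vbTextCompare)
    strString = Replace(strString, "%28", "(", 1, -1, vbTextCompare)
    strString = Replace(strString, "%29", ")", 1, -1, vbTextCompare)
    strString = Replace(strString, "%7D", "}", 1, -1, vbTextCompare)
    strString = Replace(strString, "%3A", ":", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2C", ",", 1, -1, vbTextCompare)
    strString = Replace(strString, "%7B", "{", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2B", "+", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2E", ".", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2D", "-", 1, -1, vbTextCompare)
    strString = Replace(strString, "%7E", "~", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5B", "[", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5F", "_", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5D", "]", 1, -1, vbTextCompare)
    strString = Replace(strString, "%60", "`", 1, -1, vbTextCompare)
    strString = Replace(strString, "%3D", "=", 1, -1, vbTextCompare)
    strString = Replace(strString, "%27", "'", 1, -1, vbTextCompare)
    strString = Replace(strString, "%22", Chr(34), 1, -1, vbTextCompare)
    'Decode "%25" last, so that the "%" it produces cannot start a second escape
    strString = Replace(strString, "%25", "%", 1, -1, vbTextCompare)
    URLDecode = strString
End Function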
%>


Written on 27th January 2004

 
© 2001 - 2017 Tim Surtell