Explaining the code
The rest of this article describes how the search highlighting code works, using the Electronics in Meccano website ‘article.asp’ page as an example. To aid readability, only extracts from the full script are given — you will find the complete script at the end of the article.
Note that the code lines are frequently too long to fit on one line, so I have sometimes split them into two or more lines using the _ character. The code will run as it is, but you can put these lines of code back onto one line if you wish.
Before you go any further, I would suggest that you view an example search by following the instructions below:
- Click here to open the Electronics in Meccano website’s search page.
- In the ‘Look for’ box enter ‘555 astable circuit’ and then click on the ‘Go’ button.
- Click on the result entitled ‘The 555 Astable Circuit’ to view this article.
- You should see your search terms highlighted.
Retrieving the search terms
To determine whether the user has reached the article.asp page from a search engine, we first need to retrieve the Referer information using the HTTP_REFERER server variable.
referer = LCase(request.serverVariables("HTTP_REFERER"))
The Referer string contains the URL of the last page the user visited, including any query parameters. So, if the user reached our page from a Google search, the Referer might be ‘http://www.google.com?q=555+astable+circuit’. This string can be parsed to determine if the referer is a search engine, and if so, what the search terms were.
To find out the name of the search engine, we need to see if the website name ‘google’ is in the Referer string. Once we know that the search engine is Google, we can look for the query, which starts after the ‘q=’. The search terms are all the text after ‘q=’ up to the end of the string, or up to the next ‘&’, which indicates another parameter in the query.
The ‘google’ and ‘q=’ are the ‘signatures’ of the Google search engine. If a search is done on Google, the word ‘google’ will always be in the URL, as will be the ‘q’ parameter.
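As a sketch of the parsing just described, the helper below pulls the raw value of one query parameter out of a URL. The function name getQueryValue and its arguments are hypothetical and are not part of the original script. It assumes the query string starts at the first ‘?’ and that parameters are separated by ‘&’. Because it compares whole parameter names, a field such as ‘aq=’ is not mistaken for ‘q=’.

```vbscript
'Hypothetical helper (not part of the original script): returns the raw,
'still-encoded value of the named query parameter, or "" if it is absent.
function getQueryValue(url, fieldName)
    dim queryStart, pairs, pairIndex, prefix
    getQueryValue = ""
    queryStart = instr(url, "?")
    if queryStart = 0 then exit function
    prefix = fieldName & "="
    'Split the query string into its name=value pairs
    pairs = split(mid(url, queryStart + 1), "&")
    for pairIndex = 0 to UBound(pairs)
        'Only accept a pair whose name matches the whole field name
        if left(pairs(pairIndex), len(prefix)) = prefix then
            getQueryValue = mid(pairs(pairIndex), len(prefix) + 1)
            exit function
        end if
    next
end function
```

For example, getQueryValue("http://www.google.com?q=555+astable+circuit", "q") returns the string 555+astable+circuit, which still needs to be decoded before it is split into words.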
So that any number of search engines can be identified, the signatures for multiple search engines are placed in arrays as shown. Most of the popular search engines are there, and you could add more if you wish, including the signature of your own website’s search engine if you have one (like the ‘eleinmec’ signature shown).
dim searchEngineSignature(8), searchEngineQueryField(8)
searchEngineSignature(0) = "eleinmec" : searchEngineQueryField(0) = "yider"
searchEngineSignature(1) = "google" : searchEngineQueryField(1) = "q"
searchEngineSignature(2) = "msn" : searchEngineQueryField(2) = "q"
searchEngineSignature(3) = "yahoo" : searchEngineQueryField(3) = "p"
searchEngineSignature(4) = "netscape" : searchEngineQueryField(4) = "query"
searchEngineSignature(5) = "aol" : searchEngineQueryField(5) = "query"
searchEngineSignature(6) = "ask" : searchEngineQueryField(6) = "q"
searchEngineSignature(7) = "altavista" : searchEngineQueryField(7) = "q"
searchEngineSignature(8) = "looksmart" : searchEngineQueryField(8) = "qt"
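If you add engines often, keeping the two dim sizes in step by hand can become tedious. One possible variation, shown here only as a sketch, builds both lists with the Array function so that UBound always reflects the number of entries. The ‘bing’ entry is an illustrative addition and is not in the original script. These two declarations would replace the fixed-size dim statement rather than sit alongside it.

```vbscript
'Sketch only: replaces the fixed-size dim statement used in the article.
'The "bing" entry is an illustrative addition, not part of the original list.
dim searchEngineSignature, searchEngineQueryField
searchEngineSignature = Array("eleinmec", "google", "msn", "yahoo", _
    "netscape", "aol", "ask", "altavista", "looksmart", "bing")
searchEngineQueryField = Array("yider", "q", "q", "p", _
    "query", "query", "q", "q", "qt", "q")
```

A loop that runs from 0 to UBound(searchEngineSignature) works unchanged with arrays built this way, provided the two lists are kept the same length.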
The code below checks the Referer to see if any of the search engine signatures can be found. If one is found, the search terms are extracted and each word ends up as an element in the searchTerms array.
Note the use of the URLDecode function, which converts percent-encoded sequences in the Referer back to the characters they represent. For example, a double-quote character ‘"’ in the search query appears in the Referer as ‘%22’.
for searchEngineIndex = 0 to UBound(searchEngineSignature)
    if instr(referer, searchEngineSignature(searchEngineIndex)) > 0 and instr(referer, searchEngineQueryField(searchEngineIndex) & "=") > 0 then
        'Referer has been found - extract search terms
        referer = mid(referer, instr(referer, searchEngineQueryField(searchEngineIndex) & "=") + len(searchEngineQueryField(searchEngineIndex)) + 1)
        if instr(referer, "&") then
            referer = left(referer, instr(referer, "&") - 1)
        end if
        referer = replace(URLDecode(referer), """", "")
        searchTerms = split(referer, " ")
        exit for
    end if
next
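URLDecode, listed in the complete script, handles a fixed list of characters. As a possible alternative, the sketch below decodes ‘+’ and any %XX sequence in a single pass, so nothing is decoded twice and the case of the hex digits does not matter. The names decodeQueryValue and isHexPair are hypothetical and are not part of the original script. The sketch assumes single-byte characters, so multi-byte UTF-8 sequences are not reassembled.

```vbscript
'Hypothetical alternative to URLDecode (not part of the original script)
function decodeQueryValue(encoded)
    dim position, currentChar, hexDigits, decoded
    decoded = ""
    position = 1
    do while position <= len(encoded)
        currentChar = mid(encoded, position, 1)
        if currentChar = "+" then
            'A plus sign in a query string stands for a space
            decoded = decoded & " "
        elseif currentChar = "%" and position + 2 <= len(encoded) then
            hexDigits = mid(encoded, position + 1, 2)
            if isHexPair(hexDigits) then
                'Convert the two hex digits to the character they encode
                decoded = decoded & chr(cint("&H" & hexDigits))
                position = position + 2
            else
                'Not a valid escape - keep the percent sign as it is
                decoded = decoded & currentChar
            end if
        else
            decoded = decoded & currentChar
        end if
        position = position + 1
    loop
    decodeQueryValue = decoded
end function

'Returns true if text is exactly two hexadecimal digits (either case)
function isHexPair(text)
    dim digitIndex
    isHexPair = (len(text) = 2)
    for digitIndex = 1 to len(text)
        if instr(1, "0123456789abcdef", mid(text, digitIndex, 1), vbTextCompare) = 0 then
            isHexPair = false
        end if
    next
end function
```

With this sketch, decodeQueryValue("555+astable%2bcircuit") gives the text 555 astable+circuit, with the encoded plus sign kept as a plus rather than turned into a space.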
Filtering stopwords
Now that the search terms are in the searchTerms array, they could be highlighted in the HTML document straight away. However, if the search query was ‘where can I buy the 555 astable circuit in the UK?’ then a fair amount of the page would be highlighted, since words like ‘where’ and ‘the’ are common. This would be irritating for the visitor, so these ‘stopwords’ need to be filtered out first.
I have stored a list of stopwords in the stopWords variable. These are taken from three separate sources and so should be comprehensive. Every word in the list, including the first and the last, has a space on either side so that only whole words are matched. The code below finds any stopwords in the searchTerms array and simply replaces them with an empty string.
stopWords = " a able about above according ... ... yourself yourselves you've z zero "
if not isEmpty(searchTerms) then
    for searchTermsIndex = 0 to UBound(searchTerms)
        if instr(stopWords, " " & searchTerms(searchTermsIndex) & " ") then
            searchTerms(searchTermsIndex) = ""
        end if
    next
end if
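Because stopwords are blanked rather than removed, the array can end up containing empty elements. If that is inconvenient, one option is to rebuild the array without them. The helper below is a hypothetical sketch (removeEmptyTerms is not part of the original script). It assumes that no term contains a space, which holds here because the terms were produced by splitting on spaces.

```vbscript
'Hypothetical helper (not part of the original script): returns a new
'array containing only the non-empty terms, in their original order.
function removeEmptyTerms(terms)
    dim joined
    'Join the terms back into one string; blank terms leave extra spaces
    joined = trim(join(terms, " "))
    'Collapse any run of spaces into a single space
    do while instr(joined, "  ") > 0
        joined = replace(joined, "  ", " ")
    loop
    'Splitting an empty string gives an array with UBound = -1
    removeEmptyTerms = split(joined, " ")
end function
```

It could be called as searchTerms = removeEmptyTerms(searchTerms) once the stopwords have been blanked. If every term was a stopword, the result has a UBound of -1, so a loop from 0 to UBound(searchTerms) simply does not run.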
Displaying the search terms
Before the article HTML is parsed, the visitor is reminded of their search terms above the article. Note the ‘Remove Highlighting’ link back to the same page — if this link is clicked, the Referer will change and highlighting will therefore not occur.
At this point the hiliteStart and hiliteEnd variables are defined. These contain the HTML code which will make the words on the page look highlighted. I have used the <b> tag and set a background colour so that highlighted text appears in bold with a light green background. Depending on your website’s content, you might also want to add a ‘color:#000000’ property to the style to make sure the highlighted text is black.
hiliteStart = "<b style=""background:#ECFFF0"">"
hiliteEnd = "</b>"
if not isEmpty(searchTerms) then
    response.write "<p class=""small""><b style=""color:#303090"">" & "Your search terms have been highlighted: </b>"
    for searchTermsIndex = 0 to UBound(searchTerms)
        'Skip any stopwords that were blanked out earlier
        if len(searchTerms(searchTermsIndex)) then
            response.write hiliteStart & server.htmlEncode(searchTerms(searchTermsIndex)) & hiliteEnd & " "
        end if
    next
    response.write " " & "<a href=""article.asp?" & ArticleID & """>Remove highlighting</a></p>"
end if
Highlighting the search terms
Finally, the article HTML can be parsed and the search terms highlighted. In reality the article HTML is read from a file into the variable articleText; however, in the example below it is simply a few lines of HTML.
The code checks for the search terms and applies the highlighting, leaving any HTML tags that it finds intact.
articleText = "<p>Here is my example article text about the 555 astable circuit with some of it in <b>bold</b> and some of it<br>on the next line.</p>"
if not isEmpty(searchTerms) then
    for searchTermsIndex = 0 to UBound(searchTerms)
        if len(searchTerms(searchTermsIndex)) then
            'Parse for each search term
            textStart = 0
            textEnd = 0
            do while textEnd <= len(articleText)
                'Extract a piece of text
                textStart = instr(textEnd + 1, articleText, ">") + 1
                textEnd = instr(textStart, articleText, "<")
                if textEnd = 0 then
                    textEnd = len(articleText) + 1
                end if
                'Replace occurrences of the searchTerm in the extracted text
                if len(mid(articleText, textStart, textEnd - textStart)) then
                    searchStart = 0
                    do
                        searchStart = instr(searchStart + 1, mid(articleText, textStart, textEnd - textStart), searchTerms(searchTermsIndex), vbTextCompare)
                        if searchStart then
                            articleText = left(articleText, textStart + searchStart - 2) & hiliteStart & mid(articleText, textStart + searchStart - 1, len(searchTerms(searchTermsIndex))) & hiliteEnd & mid(articleText, textStart + searchStart + len(searchTerms(searchTermsIndex)) - 1)
                            'Resume at the first character after hiliteEnd so that an immediately following match is not skipped
                            searchStart = searchStart + len(searchTerms(searchTermsIndex) & hiliteStart & hiliteEnd) - 1
                            textEnd = textEnd + len(hiliteStart & hiliteEnd)
                        end if
                    loop while searchStart
                end if
            loop
        end if
    next
end if
Note that articleText must start with an HTML tag, otherwise the highlighting might not be accurate: the parser only examines text that follows a ‘>’ character, so any text before the first tag is skipped. If you know that articleText is just plain text, you can simply concatenate a tag to the start of it before the highlighting code runs, as shown below. The tag does not even have to be valid HTML!
articleText = "<x>" & articleText
Now all that is needed is to return the text in articleText to the browser.
response.write articleText
The complete script
The complete script below includes two lines which set example HTML in articleText and Google as the Referer for testing purposes. You will need to remove these lines when you are satisfied that the script is working correctly.
<%
option explicit
'************************************
'* SEARCH TERMS HIGHLIGHTING SCRIPT *
'************************************
'Written by Tim Surtell and downloaded from www.surtell.com
'(C) 2004 Tim Surtell with the exception of
'the URLDecode function (C) Yvom Snippet & Code Library
'This code may be freely distributed providing that the credit above is retained
'Define variables
dim articleText
dim referer, searchTerms, searchTermsIndex
dim textStart, textEnd, searchStart, hiliteStart, hiliteEnd
dim searchEngineSignature(8), searchEngineQueryField(8), searchEngineIndex, stopWords
'Set search engine query highlight variables and stop words
hiliteStart = "<b style=""background:#ECFFF0"">"
hiliteEnd = "</b>"
stopWords = " a able about above according accordingly across actually after " & _
"afterwards again against ain't all allow allows almost alone along already " & _
"also although always am among amongst an and another any anybody anyhow " & _
"anyone anything anyway anyways anywhere apart appear appreciate " & _
"appropriate approximately are area areas aren't around as a's aside ask " & _
"asked asking asks associated at available away awfully b back backed " & _
"backing backs be became because become becomes becoming been before " & _
"beforehand began behind being beings believe below beside besides best " & _
"better between beyond big both brief but by c came can cannot cant can't " & _
"case cases cause causes certain certainly changes clear clearly c'mon co " & _
"com come comes concerning consequently consider considering contain " & _
"containing contains corresponding could couldn't course c's currently d " & _
"de definitely described despite did didn't differ different differently " & _
"do does doesn't doing done don't down downed downing downs downwards due " & _
"during e each early edu eg eight either else elsewhere en end ended " & _
"ending ends enough entirely especially et etc even evenly ever every " & _
"everybody everyone everything everywhere ex exactly example except f " & _
"face faces fact facts far felt few fifth find finds first five followed " & _
"following follows for former formerly forth found four from full fully " & _
"further furthered furthering furthermore furthers g gave general " & _
"generally get gets getting give given gives giving go goes going gone " & _
"good goods got gotten great greater greatest greetings group grouped " & _
"grouping groups h had hadn't happens hardly has hasn't have haven't " & _
"having he hello help hence her here hereafter hereby herein here's " & _
"hereupon hers herself he's hi high higher highest him himself his hither " & _
"hopefully how howbeit however i i'd ie if ignored i'll i'm immediate " & _
"important in inasmuch inc indeed indicate indicated indicates inner " & _
"insofar instead interest interested interesting interests into inward is " & _
"isn't it it'd it'll its it's itself i've j just k keep keeps kept kg kind " & _
"km knew know known knows l la large largely last lately later latest " & _
"latter latterly least less lest let lets let's like liked likely little " & _
"long longer longest look looking looks ltd m made mainly make making man " & _
"many may maybe me mean meanwhile member members men merely might min ml " & _
"mm more moreover most mostly mr mrs much must my myself n name namely nd " & _
"near nearly necessary need needed needing needs neither never " & _
"nevertheless new newer newest next nine no nobody non none noone nor " & _
"normally not nothing novel now nowhere number numbers o obtain obtained " & _
"obviously of off often oh ok okay old older oldest on once one ones only " & _
"onto open opened opening opens or order ordered ordering orders other " & _
"others otherwise ought our ours ourselves out outside over overall own " & _
"p part parted particular particularly parting parts per perhaps place " & _
"placed places please plus point pointed pointing points possible present " & _
"presented presenting presents presumably previously probably problem " & _
"problems provides put puts q que quite qv r rather rd re really " & _
"reasonably regarding regardless regards relatively respectively resulted " & _
"resulting right room rooms s said same saw say saying says second " & _
"secondly seconds see seeing seem seemed seeming seems seen sees self " & _
"selves sensible sent serious seriously seven several shall she should " & _
"shouldn't show showed showing shown shows side sides significant " & _
"significantly since six small smaller smallest so some somebody somehow " & _
"someone something sometime sometimes somewhat somewhere soon sorry " & _
"specified specify specifying state states still sub such suggest sup sure " & _
"t take taken tell tends th than thank thanks thanx that thats that's the " & _
"their theirs them themselves then thence there thereafter thereby " & _
"therefore therein theres there's thereupon these they they'd they'll " & _
"they're they've thing things think thinks third this thorough thoroughly " & _
"those though thought thoughts three through throughout thru thus to today " & _
"together too took toward towards tried tries truly try trying t's turn " & _
"turned turning turns twice two u un und under unfortunately unless " & _
"unlikely until unto up upon us use used useful uses using usually v value " & _
"various very via viz vs w want wanted wanting wants was wasn't way ways " & _
"we we'd welcome well we'll wells went were we're weren't we've what " & _
"whatever what's when whence whenever where whereafter whereas whereby " & _
"wherein where's whereupon wherever whether which while whither who " & _
"whoever whole whom who's whose why will willing wish with within without " & _
"wonder won't work worked working works would wouldn't www x y year " & _
"years yes yet you you'd you'll young younger youngest your you're yours " & _
"yourself yourselves you've z zero "
'Define search engine query signatures
searchEngineSignature(0) = "eleinmec" : searchEngineQueryField(0) = "yider"
searchEngineSignature(1) = "google" : searchEngineQueryField(1) = "q"
searchEngineSignature(2) = "msn" : searchEngineQueryField(2) = "q"
searchEngineSignature(3) = "yahoo" : searchEngineQueryField(3) = "p"
searchEngineSignature(4) = "netscape" : searchEngineQueryField(4) = "query"
searchEngineSignature(5) = "aol" : searchEngineQueryField(5) = "query"
searchEngineSignature(6) = "ask" : searchEngineQueryField(6) = "q"
searchEngineSignature(7) = "altavista" : searchEngineQueryField(7) = "q"
searchEngineSignature(8) = "looksmart" : searchEngineQueryField(8) = "qt"
'Check referer to see if there are search engine search terms
referer = LCase(request.serverVariables("HTTP_REFERER"))
'*** Set test referer - remove after testing!
referer = "http://www.google.co.uk?q=555+astable+circuit"
for searchEngineIndex = 0 to UBound(searchEngineSignature)
    if instr(referer, searchEngineSignature(searchEngineIndex)) > 0 and instr(referer, searchEngineQueryField(searchEngineIndex) & "=") > 0 then
        'Referer has been found - extract search terms
        referer = mid(referer, instr(referer, searchEngineQueryField(searchEngineIndex) & "=") + len(searchEngineQueryField(searchEngineIndex)) + 1)
        if instr(referer, "&") then
            referer = left(referer, instr(referer, "&") - 1)
        end if
        referer = replace(URLDecode(referer), """", "")
        searchTerms = split(referer, " ")
        exit for
    end if
next
'Check search terms for stop words
if not isEmpty(searchTerms) then
    for searchTermsIndex = 0 to UBound(searchTerms)
        if instr(stopWords, " " & searchTerms(searchTermsIndex) & " ") then
            searchTerms(searchTermsIndex) = ""
        end if
    next
end if
'If there has been a search, display search terms
if not isEmpty(searchTerms) then
    response.write "<p class=""small""><b style=""color:#303090"">" & "Your search terms have been highlighted: </b>"
    for searchTermsIndex = 0 to UBound(searchTerms)
        'Skip any stopwords that were blanked out earlier
        if len(searchTerms(searchTermsIndex)) then
            response.write hiliteStart & server.htmlEncode(searchTerms(searchTermsIndex)) & hiliteEnd & " "
        end if
    next
    response.write " " & "<a href=""thispage.asp"">Remove highlighting</a></p>"
end if
'*** Set test articleText - remove after testing!
articleText = "<p>Here is my example article text about the 555 astable circuit " & "with some of it in <b>bold</b> and some of it<br>on the next line.</p>"
'Highlight any search terms by parsing HTML for text content
if not isEmpty(searchTerms) then
    for searchTermsIndex = 0 to UBound(searchTerms)
        if len(searchTerms(searchTermsIndex)) then
            'Parse for each search term
            textStart = 0
            textEnd = 0
            do while textEnd <= len(articleText)
                'Extract a piece of text
                textStart = instr(textEnd + 1, articleText, ">") + 1
                textEnd = instr(textStart, articleText, "<")
                if textEnd = 0 then
                    textEnd = len(articleText) + 1
                end if
                'Replace occurrences of the searchTerm in the extracted text
                if len(mid(articleText, textStart, textEnd - textStart)) then
                    searchStart = 0
                    do
                        searchStart = instr(searchStart + 1, mid(articleText, textStart, textEnd - textStart), searchTerms(searchTermsIndex), vbTextCompare)
                        if searchStart then
                            articleText = left(articleText, textStart + searchStart - 2) & hiliteStart & mid(articleText, textStart + searchStart - 1, len(searchTerms(searchTermsIndex))) & hiliteEnd & mid(articleText, textStart + searchStart + len(searchTerms(searchTermsIndex)) - 1)
                            'Resume at the first character after hiliteEnd so that an immediately following match is not skipped
                            searchStart = searchStart + len(searchTerms(searchTermsIndex) & hiliteStart & hiliteEnd) - 1
                            textEnd = textEnd + len(hiliteStart & hiliteEnd)
                        end if
                    loop while searchStart
                end if
            loop
        end if
    next
end if
response.write articleText
Function URLDecode(strString)
    'Convert "+" to a space first so that an encoded plus sign (%2B) survives as "+"
    strString = Replace(strString, "+", " ")
    'The Referer has been lowercased, so codes containing hex letters are matched case-insensitively
    strString = Replace(strString, "%2F", "/", 1, -1, vbTextCompare)
    strString = Replace(strString, "%7C", "|", 1, -1, vbTextCompare)
    strString = Replace(strString, "%3F", "?", 1, -1, vbTextCompare)
    strString = Replace(strString, "%21", "!")
    strString = Replace(strString, "%40", "@")
    strString = Replace(strString, "%5C", "\", 1, -1, vbTextCompare)
    strString = Replace(strString, "%23", "#")
    strString = Replace(strString, "%24", "$")
    strString = Replace(strString, "%5E", "^", 1, -1, vbTextCompare)
    strString = Replace(strString, "%26", "&")
    strString = Replace(strString, "%2A", "*", 1, -1, vbTextCompare)
    strString = Replace(strString, "%28", "(")
    strString = Replace(strString, "%29", ")")
    strString = Replace(strString, "%7D", "}", 1, -1, vbTextCompare)
    strString = Replace(strString, "%3A", ":", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2C", ",", 1, -1, vbTextCompare)
    strString = Replace(strString, "%7B", "{", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2B", "+", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2E", ".", 1, -1, vbTextCompare)
    strString = Replace(strString, "%2D", "-", 1, -1, vbTextCompare)
    strString = Replace(strString, "%7E", "~", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5B", "[", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5F", "_", 1, -1, vbTextCompare)
    strString = Replace(strString, "%5D", "]", 1, -1, vbTextCompare)
    strString = Replace(strString, "%60", "`")
    strString = Replace(strString, "%3D", "=", 1, -1, vbTextCompare)
    strString = Replace(strString, "%27", "'")
    strString = Replace(strString, "%22", Chr(34))
    'Decode the percent sign last so that nothing is decoded twice
    strString = Replace(strString, "%25", "%")
    URLDecode = strString
End Function
%>