The Internet is a powerful tool for research, containing information about law, business, government, science, medicine and many other things. There is a Web site for just about anything anyone would want to know. Unfortunately, with the proliferation of Web sites today, any search may return hundreds of results, many of which are not relevant and many of which are not reliable.
This article will discuss some advanced search techniques in Google that can improve the relevance of search results. It will also examine ways to determine the reliability of a Web site.
Popular search engines like Google do not support the kind of precision search techniques found in familiar legal research tools like Westlaw and Lexis; however, they do provide some advanced searching tools that can improve the relevance of search results.
A basic search on Google returns sites that contain all of the search terms entered. These results are displayed in batches of 10 results, ranked in a way that should place the most relevant results at the top, based on the order and proximity of the search terms and the prominence and frequency of those terms in the result site. Google ranking also takes into account the number of other Web sites that link to a site, an indication of reliability. This basic result set is sufficient for many searches.
Google's advanced search feature, available by clicking Advanced Search next to the search box on Google's home page, allows fine-tuning of these results for more precision if the basic search returns too many irrelevant results.
The Find Results area at the top of the advanced search page allows simple definitions of the relation between words in the search. This is not as precise as Westlaw's or Lexis' connectors, but it allows simple "and," "or," "and not" and exact-phrase searching.
The Find Results area also allows setting the Google search to return 20, 30, 50 or even 100 results per page instead of the usual 10. This means less clicking through the result list if the search is likely to return many marginally relevant results.
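The Find Results fields can also be expressed directly in the search box. The following is a minimal sketch in Python, not Google's own code: it assumes the widely used typed equivalents (double quotes for an exact phrase, OR between alternatives, a leading minus sign for "and not") and assumes that a URL parameter named num sets the results per page, which Google may not honor. The function names and the sample terms are hypothetical.

```python
from urllib.parse import urlencode


def build_query(all_words=(), exact_phrase="", any_words=(), none_words=()):
    """Combine the four Find Results fields into one query string."""
    parts = list(all_words)                            # "and": every term must appear
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')              # exact phrase in double quotes
    if any_words:
        parts.append(" OR ".join(any_words))           # "or": at least one of these
    parts.extend(f"-{word}" for word in none_words)    # "and not": exclude the term
    return " ".join(parts)


def search_url(query, results_per_page=10):
    """Build a search URL; `num` is an assumed results-per-page parameter."""
    params = {"q": query, "num": results_per_page}
    return "https://www.google.com/search?" + urlencode(params)


query = build_query(
    all_words=["ethics"],
    exact_phrase="conflict of interest",
    any_words=["attorney", "lawyer"],
    none_words=["advertisement"],
)
print(query)
print(search_url(query, results_per_page=50))
```

Typing the printed query into the ordinary search box should behave much like filling in the corresponding fields on the advanced search page.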
Google can search for terms in a specific Web site. Most Web sites have search capabilities, but some sites do not, and some sites' built-in searches are not sufficient. The Domain feature on the advanced search page can limit search results to pages in a particular domain (Web site), effectively creating a Google search of the site. For example, the Delaware State Bar Association's Web site contains ethics opinions, but has no means to search them. Google's advanced search can search for the phrase "conflict of interest" and limit search results to the domain dsba.org. This search currently returns 40 results, including many Delaware ethics opinions and several magazine articles and other pages.
This site-search feature can also exclude results from a particular site. For example, when searching for objective information about a company, it might be useful to exclude information from the company's own Web site. This can be done by choosing the word "Don't" from the pull-down menu next to Domain, so that it says, "Don't return results from the site or domain."
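The Domain option corresponds to a typed operator, site:, which can be added to any search. The sketch below builds both forms of the query: one limited to a domain and one excluding it. The function name site_query is hypothetical, and "Example Corp" and example.com are placeholders rather than a real company.

```python
def site_query(terms, domain, exclude=False):
    """Restrict a query to one domain, or exclude that domain from the results."""
    operator = "-site:" if exclude else "site:"
    return f"{terms} {operator}{domain}"


# Search only the Delaware State Bar Association's site.
print(site_query('"conflict of interest"', "dsba.org"))

# Search everywhere except the company's own site (placeholder names).
print(site_query('"Example Corp"', "example.com", exclude=True))
```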
There are several other useful tools available in Google's advanced search page, and it is worth experimenting with them. One other particularly useful trick, not found on the advanced search page but documented under Advanced Search Tips, is Google's "fill in the blanks" feature. If one or more asterisks are included in a search, Google will find pages that fill in the blanks, answering simple questions. For example, for the search "the prime minister of France is *," Google's first result in a recent search was an answer from a reliable source: François Fillon, found in the CIA's Web site. Other results will be Web pages that contain this phrase with the asterisk filled in.
Google now has several specialized databases that can provide better targeted results. Most of these databases can use advanced search techniques similar to the ones above to provide further search refinement.
From a legal perspective, one of the most useful of these databases is Google U.S. Government, which searches only federal and state government Web sites. The first result of a search for 1040 is the current 1040 form from the Internal Revenue Service's Web site; in the main Web search, on the other hand, the first result is a page owned by a company that makes tax software. Be aware that the Google U.S. Government search page has two search buttons: Search Government Sites and Search the Web. The Search Government Sites button must be clicked to restrict the search appropriately; clicking the Search the Web button or pressing the Enter key will return the same results as a normal Google search.
Google News searches current news from a wide variety of sources around the world. Be aware, however, that Google News sometimes places unreliable sources alongside respected sources like The New York Times and CNN. The source should be considered carefully when using this database.
For intellectual property practitioners, Google recently added a patent search for more than 7 million patent documents, including some very old ones that might otherwise be hard to find. This database is still considered to be "in beta" (under development), so there are still kinks to be worked out of it, but it looks very promising.
For the historically minded, Google Books searches an enormous and rapidly growing collection of scanned books. Most of the books in this collection are in the public domain, and therefore quite old, but they can be useful for a historical perspective. Many of these books can be downloaded in PDF.
Google Groups collects Usenet newsgroup archives from 1981 to the present. These archives are merely message boards; however, messages can be useful to show what was known or how words were used at a certain time. For example, the trademark litigation between Microsoft and Lindows turned on whether the word "windows" was used generically for software before Microsoft Windows was released. Newsgroup messages from an earlier date could demonstrate such generic use.
There are many other specialized databases on Google, and others are frequently added. A list of databases can be found by clicking the More link at the top of the Google home page.
CONSIDER THE SOURCE
These Google techniques will find relevant Web sites, but relevant sites are not necessarily reliable sites. A site may look polished and professional and may seem persuasive, yet it could have been written by a prankster, a person with an ax to grind or simply a person who didn't have all the facts.
The best way to determine a Web site's reliability is to consider the source and to determine who is responsible for the information and who considers the information to be reliable.
Official government Web sites are easily identified. If the domain name (the first part of the Web address) ends in .gov, .mil or .state.xx.us (where xx is a two-letter postal code), then the site is owned by a federal or state government or agency, and the information posted there is effectively a government document.
Domain names that end in .edu can only be owned by bona fide educational institutions, but the information on the site could be written by the institution, a professor or a student. Information about the institution found on such sites can generally be considered reliable; other information is only as reliable as the person who posted it.
Other domain names, such as those ending in .com, .org or .net, can be registered by anyone and indicate nothing about the site's owner. Registrars do not check trademark ownership when a name is registered, and a trademark holder cannot necessarily take a name away from a registrant who has a good-faith reason for using it. For example, the domain nissan.com is not owned by Nissan Motor Co. Ltd.; it is owned by a man named Uzi Nissan.
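The domain-suffix rules above can be applied mechanically as a first pass. The following is a rough sketch, not an authoritative classifier: the function name, the category labels and the sample addresses are illustrative, and the result says only who is allowed to register the name, not whether a given page is accurate.

```python
import re
from urllib.parse import urlparse

# Matches state government names such as state.de.us or www.state.de.us.
STATE_PATTERN = re.compile(r"(^|\.)state\.[a-z]{2}\.us$")


def classify_site(url):
    """Return a rough ownership category based only on the domain suffix."""
    # urlparse needs a leading // to recognize a bare host name.
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    host = host.lower()
    if host.endswith((".gov", ".mil")) or STATE_PATTERN.search(host):
        return "government"
    if host.endswith(".edu"):
        return "educational institution"
    return "unrestricted: identify the owner"


for address in ["https://www.irs.gov", "www.state.de.us", "nissan.com"]:
    print(address, "->", classify_site(address))
```

A result of "unrestricted" is the signal to go on and look up the owner of the domain.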
At one time, the mere existence of a Web site with a registered domain name indicated reliability. Domain registration and Web hosting were expensive, so the existence of a Web site with a registered domain showed that the site owner was serious enough to spend money. Today, it is possible to register a domain and obtain hosting for less than $10 a year, so registration alone indicates nothing about the site's reliability.
However, if the site has a registered domain name, it is possible to identify the owner, which should give some indication of the reliability of the site. For example, a page on the site www.hcch.net contains information about the Hague Convention on Service Abroad. It looks reliable, but who is responsible for it? To identify the owner, one must first identify the registrar, then look up the contact information through the registrar.
InterNIC provides a means to look up the registrar for most domains. InterNIC is the official Web site of the Internet Corp. for Assigned Names and Numbers (ICANN), the organization responsible for managing and coordinating domain names and accrediting the domain name registrars. On its WhoIs page, registrations can be looked up by domain name, in this case hcch.net.
Sometimes, InterNIC's search gives complete contact information for the domain owner, completing the task. More commonly, however, it shows only the registrar, and one must go to the registrar's site for registrant contact information. For the domain hcch.net, InterNIC shows only the registrar. The referral URL provided by the InterNIC search is the registrar's site, where registrant contact information can be found. For the domain hcch.net, the registrar's site is www.networksolutions.com.
Every registrar has its own WhoIs search, usually linked from the home page of the registrar's site and usually labeled "WhoIs" (one word). On the registrar's site, WhoIs provides a means to look up contact information for the registrant (owner), technical contact and administrative contact for the domain. For hcch.net, the registrant is the Hague Conference on Private International Law, certainly a reliable source of information about Hague Conventions! This site can safely be considered reliable.
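The same two-step lookup can be scripted, because WhoIs is also a simple text protocol served on TCP port 43. The sketch below is offered under several assumptions: that whois.verisign-grs.com (the registry server for .com and .net names) is an acceptable starting point in place of InterNIC's Web form, and that the registry's reply labels the referral with the words "Registrar WHOIS Server". All function names are hypothetical. Many registrars now withhold registrant contact details, so the second reply may be less complete than the one described above.

```python
import re
import socket


def whois_query(server, domain, timeout=10):
    """Send one WhoIs request (plain text over TCP port 43) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:               # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_registrar_server(registry_reply):
    """Pull the registrar's WhoIs host out of the registry's reply, if present."""
    match = re.search(r"^\s*Registrar WHOIS Server:\s*(\S+)", registry_reply,
                      re.IGNORECASE | re.MULTILINE)
    return match.group(1) if match else None


def lookup_owner(domain, registry_server="whois.verisign-grs.com"):
    """Step 1: ask the registry for the registrar. Step 2: ask the registrar."""
    registry_reply = whois_query(registry_server, domain)
    registrar_server = find_registrar_server(registry_reply)
    if registrar_server is None:
        return registry_reply          # no referral found; return what we have
    return whois_query(registrar_server, domain)


# Usage (requires network access):
#     print(lookup_owner("hcch.net"))
```

The registrant's name still has to be read and judged by a person; the script only saves the clicking.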
WHO LINKS TO THE DOMAIN?
Sometimes, the name of the owner alone is not sufficient to determine whether a site is reliable. In that case, the next best way to assess reliability is to find out who links to the site.
Fortunately, Google's advanced search provides a means to search for sites that link to a particular site. Under Page-Specific Search, one of the options is Links, which will search for other Web sites that link to a specific page. It is best to search both with and without the "www," because this may return different results. The search should be performed for both the Web site and the specific page.
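The four searches recommended above (site and page, each with and without "www") are easy to generate in one pass. This sketch assumes that the Links option corresponds to a typed link: operator, which Google may no longer support; the function name and the page path in the example are hypothetical.

```python
def link_queries(page_url):
    """Return link: queries for the site and the page, with and without www."""
    address = page_url.split("//", 1)[-1].rstrip("/")   # drop http:// or https://
    if address.startswith("www."):
        address = address[len("www."):]
    site = address.split("/", 1)[0]                      # host name only
    targets = [site, "www." + site]
    if address != site:                                  # a specific page was given
        targets += [address, "www." + address]
    return ["link:" + target for target in targets]


for query in link_queries("http://www.hcch.net/some/page.html"):
    print(query)
```

Each printed line is one search to run; the result lists can then be compared and reviewed by hand.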
This result list should be checked carefully: Linking to a site does not necessarily constitute an endorsement. Web sites sometimes link to another site when criticizing it. The Web site providing the link may be unreliable. The ownership of the linked site may have changed since the link was created.
For example, several years ago the respected source www.FindLaw.com had a link for Icelandic law that actually went to a pornographic Web site. The original Icelandic law site's domain registration expired, and a pornographer purchased the domain name. The link remained on FindLaw for several days after the ownership change.
Note also that searching for links may not return all links to a particular site. For example, the currency conversion sites www.XE.com and www.OANDA.com are recommended by the Internal Revenue Service's Web site, but these links do not appear in a Google search for sites that link to them.
In sum, the Internet provides a wealth of information, but with so much information available, it can be difficult to find precisely the right information. Once the right information is found, it is difficult to be sure whether that information is reliable. Google's advanced searching techniques and targeted databases make it easier to find what one is looking for. Once the information is found, researching the ownership of the site and the sites that link to that site can indicate whether the site is reliable.
Tracey R. Rich is the library and information services administrator for Young Conaway Stargatt & Taylor in Wilmington, Del.