Searching the Web: New Domains for Inquiry

Bertram Bruce
University of Illinois at Urbana-Champaign
United States




This column is reprinted from the Technology department of the Journal of Adolescent & Adult Literacy (JAAL). It contains the following sections: Author's Message; Issue of the Month; Data View; Glossary; and References.




Author's Message

Many teachers today recognize the importance of online data sources for all kinds of research and writing projects. Some now permit students to include online sources in their work, and others go so far as to require the use of online sources.

There is a cornucopia of resources online: reference tools such as encyclopedias, dictionaries, and collections of quotes; libraries of poetry, short stories, images, and music; critical studies and research articles on every conceivable topic; information about authors and historical figures; government and public policy data; current events; and much, much more. Most teachers quickly see the problems that arise from such bounty. Issues of plagiarism, pornography, commercialism, and simple time wasting soon rear up regardless of the topic. When the cornucopia spills out 100,000 Web sites of dubious quality and relevance, it seems much less bounteous.

This month's Technology column addresses why it is important to think more critically about Web searching. Questions of quantity become important. As the Web grows rapidly, unpredictably, unevenly, and without the familiar monitors provided by textbook companies or district curriculum guides, how should we think about its use? For a start, how do the size of the Web and the quality of material on it affect searching? Given these issues, what are some good approaches to searching the Web effectively? What tools are available and how can they be used?

These questions point to even more fundamental issues. Perhaps we need to move from a conception of searching the Web to find a piece of information to one in which a search is embedded in how we think. This leads to perhaps the most important question: How can searching become not only “looking up,” but truly productive inquiring?



Issue of the Month:
Searching Is the Journey, Not Just the Arrival

When students search the Web, it often seems that the problems are greater than the rewards. We seek ways to control those searches to avoid objectionable materials, plagiarism, or aimlessness, but in the process we may miss what is most valuable about the Web. Let us consider some questions about searches, which may help resolve this quandary.

Why is it important to think critically about Web searching?

We tend to think first of Web searching as a simple process of looking up some item of information. For certain purposes, that conception is quite appropriate. For example, if I want to find general information about William Shakespeare, I type Shakespeare into my search engine. I get 660,000 Web pages back, but among the top 10 is the Folger Shakespeare Library in Washington, D.C. (administered by the Trustees of Amherst College), which has all sorts of interesting information, including lesson plans for teaching Shakespeare.

But, suppose I want to enter into the critical debates about Shakespearean authorship. Among the top 10 is the home page of the Shakespeare Oxford Society. This group claims to be

I am inclined to believe their claim and am intrigued to examine their arguments. As I explore their Web site I am impressed by the care shown with their presentation, the detail of their documents, the source citations, and the opportunities for feedback. But, as a novice in this area I cannot be certain that this is the most credible starting point for my inquiry. Is this enterprise considered to be a fringe group? Are there more credible sources, perhaps even espousing the same argument?

Despite these concerns, I am relieved in some ways. Although the organization offers books and videos for sale through the site, these commercial aspects appear supportive of the generally academic mission. I don't see here the troubling signs of racism or pornography that permeate so much of the Web. My usual worry that the site may be superseded by a more recent one is allayed by the fact that their latest update is dated the day of my visit. Thus, although I do not know the authors or much about their domain of study, I find the site to be worth further investigation. If I can believe what I read there, I've found a timely resource with all sorts of useful information and links for further study.

What I have discovered here is a potentially useful source, but although I have spent some time examining it I still have doubts about how to interpret what I read there. When I return to the list of 660,000 documents that the search engine provided, I feel a bit overwhelmed. Will I have to spend this much time on every document and still not know what to make of it all? Will my students cope with this any better than I do?

I have discovered something else. For certain kinds of queries, my search is far from a simple “look-up.” Instead, it appears to be part of the general process of inquiry, which is tentative and fallible. There is no absolute starting point nor any sure way to reach the end, assuming such a point exists. I need to muster all my resources for critical thinking to navigate the Web, but I may reap enormous benefits in the process.

How does the size of the Web affect searching?

The enormous size of the Web (see Data View) is a mixed blessing. Hundreds of millions of pages hold forth the promise of having the text or images we seek, but the sheer volume of material gets in its own way. I recently searched for the U.S. Department of Commerce's report “Falling Through the Net,” which is about the racial and income inequities in access to new information and communication technologies. My search engine offered up articles about World Cup soccer and the performance of an accomplished goalie defending her team's net. At other times I have found obsolete versions of material that exists elsewhere on the Web, but is unknown to the search engines. Very often I find it difficult to get past the many commercial sites that have engineered their Web pages to appear first no matter how I specify a search query.

Given the number of Web pages, it is surprising that one can find anything at all, much less do so in a matter of seconds. Improved search engines make that possible, especially when the user understands how the search engines work and puts some effort into selecting a good set of keywords.

How does the quality of material on the Web affect searching?

I heard a teacher say recently that she discourages students from using the Web for research because the quality of material there is so poor. Although I would not abandon the Web because of its negative features, I can certainly sympathize. In fact, I can imagine that she might provide a list like the one below to support her point.

What tools are available for searching and how can they be used?

Every technology arises out of the problems of previous technologies. This is a cycle we see operating with Web searches: The Web solves the problem of managing diverse, distributed sets of documents. That solution in turn makes it possible to post documents easily for all to read. This leads to a profusion of documents, many of which are poorly written and irrelevant for particular purposes. Search engines and search directories arise to solve the problem of managing the enormous quantity of material. Document designers then manipulate the pages that the search engines see so that their documents rank highest. Filters are developed to screen out unwanted material. Documents are designed to defeat the filters, and so on. A sampling of these tools is described in the Glossary.

What are some good approaches to searching effectively?

Much work is now underway to build better search engines, search directories, filters, jump sites, portals, and other technologies to enable more productive use of the Web. But what can an individual do to improve the experience of using the Web?

There are Web sites (of course!) devoted to this question. For example, Terry Gray has a useful review of some of the top search engines at daphne.palomar.edu/TGSEARCH/ and provides search tips specific to each engine. The Community Learning Network has information about many search engines and subject directories and a good set of FAQs about searching at www.cln.org/searching_home.html. Instead of going into great detail, I'll highlight three basic principles: (a) Understand how the Web and searches work, (b) select appropriate tools, and (c) use those tools effectively.

On the first point, it must be said that no one fully understands the Web, and even if a few did they would find their knowledge quickly dated as the technology and Web content evolved. Nevertheless, it helps to know some basic facts about how the Web functions and how search tools can help navigation.

For example, search engines do not go out and look at every Web page to answer a query. Many pages are hidden from the search engines behind organizational firewalls. Moreover, it would take far too long to examine every page as each query arises. Instead, the search engine builds a search index that enables fairly rapid searches. A consequence of this is that the user is not searching the Web, but the index, and is thus dependent on the quality of the index, its organizational scheme, and how recently it has been updated. Among other things, that means recent additions to the Web may not appear as the result of a search. A recent study (Lawrence & Giles, 1999; online document) found that even the best of the search engines indexes only about 16% of the Web's pages, not counting those behind firewalls. These issues need to be understood when interpreting the results of a search.
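The point that a user searches the index rather than the Web itself can be made concrete with a small sketch. The code below is only an illustration under simple assumptions: the page addresses and texts are hypothetical, and the functions build_index and search are names invented here, not part of any real search engine.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to the set of page addresses containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query (an implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages known to the engine when the index was built.
pages = {
    "example.org/folger": "shakespeare library lesson plans",
    "example.org/oxford": "shakespeare authorship debate",
}
index = build_index(pages)

# A page published after indexing stays invisible until the index is rebuilt.
pages["example.org/new"] = "new shakespeare authorship essay"

stale_hits = search(index, "shakespeare authorship")
fresh_hits = search(build_index(pages), "shakespeare authorship")
print(sorted(stale_hits))
print(sorted(fresh_hits))
```

The first search misses the newer page because it consults the older index, which is the situation a user faces whenever an engine's index lags behind the Web.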

The second point is that the choice of search engine or search directory is a major factor in how effective a search may be. For example, to find information about a book it may be more effective to search the database of an online bookseller than to search the entire Web. But if the book is out of print it will not help to search the site of a bookseller offering only current titles.

Sites such as SearchIQ provide some information about the relative performance of different search tools, but there is no substitute for trying out different tools with the types of questions under investigation and then looking critically at the types of search results produced. It is also important to understand how a particular tool works and what assumptions it makes. A tool that aims to bring up frequently accessed sites may be appropriate if you plan to shop online and want to find popular commercial sites, but it is less appropriate if you want novel perspectives on understanding some issue of international relations.

Many people recommend metasearch engines such as Cyber411 at www.cyber411.com, which combines the results of 16 search engines. But the larger number of hits may not offset the redundancy and the extra time that each search requires. This is particularly so because, if the desired sites do not appear in the first 10 items, they often might as well not appear at all.
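The trade-off between more hits and more redundancy can be sketched in a few lines. This is a hedged illustration, not a description of how Cyber411 or any other metasearch engine works: the function merge_results and the three result lists are hypothetical.

```python
def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping duplicates."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

# Hypothetical ranked results returned by three different engines.
engine_a = ["site1", "site2", "site3"]
engine_b = ["site2", "site4"]
engine_c = ["site1", "site4", "site5"]

combined = merge_results([engine_a, engine_b, engine_c])
raw_hits = sum(len(results) for results in (engine_a, engine_b, engine_c))
print(raw_hits, len(combined), combined)
```

Here eight raw hits shrink to five distinct sites once the duplicates are removed, which suggests why a long combined list may add less than its length implies.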

The third point is to develop means for using these tools effectively. Each search engine has its own syntax for specifying Boolean expressions. Usually, a phrase in quotes means to find that phrase exactly as written. Thus, typing “best search engine” into AltaVista yields nearly 8,000 sites containing that exact phrase. Typing the three words best search engine without the quotes yields the same result, but typing search engine best produces 4.4 million Web pages, the intersection of the 1.5 million Web pages containing the term search engine and the 17.5 million containing the word best.
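The difference between matching an exact phrase and matching a set of words in any order can be shown with a toy example. The sketch below does not reproduce AltaVista's syntax or behavior; the two functions and the three sample documents are assumptions made up for illustration.

```python
def contains_all_words(text, query):
    """True if every query word appears somewhere in the text, in any order."""
    words = text.lower().split()
    return all(word in words for word in query.lower().split())

def contains_phrase(text, phrase):
    """True if the query words appear consecutively and in the same order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical documents standing in for Web pages.
docs = [
    "the best search engine for students",
    "which search engine is best",
    "best pizza in town",
]

phrase_hits = [d for d in docs if contains_phrase(d, "best search engine")]
word_hits = [d for d in docs if contains_all_words(d, "best search engine")]
print(len(phrase_hits), len(word_hits))
```

Only the first document contains the exact phrase, while the first two contain all three words, so the looser query returns more pages. Whether a given engine treats an unquoted query as a phrase or as separate words is something to check for each tool.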

It is difficult to lay out general rules for doing searches because the approach depends on the problem being investigated. Perhaps one good general rule is this: If a search produces many irrelevant documents it is important to understand why that happened and not simply to decry the bloated Web world.

How can searching become not only looking up, but truly productive inquiring?

There are two problems with conceiving of Web searches as simply the looking up of information. The first is that we are often frustrated. The answers may be out there, but if we search inappropriately we get useless data. Most interesting questions require some effort ahead of time to be formulated well. It is worth trying sites such as AskJeeves at www.askjeeves.com, but more often you will need to rethink the question in order to find the answers you seek.

The second problem is that viewing Web searching as simply finding information misses the key to its importance for education and other life activities. The joy and true value of the Web lie in the way it can open up our questions. We ask one thing, but the Web leads us to ask more questions and to become aware of how much we do not know. This suggests an alternative to the common practice of asking students to cite one library source and one online source for an essay: We could turn the Web's unruliness into a virtue. We might say instead, “Use the Web to find the answer to such-and-such question. Now, report on three things you learned that you had never imagined before you did that search.”

The Web search engines are very important and useful resources, and they are playing a major role in the information age. However, they currently lack comprehensiveness and timeliness. The current state of search engines can be compared to a phone book that is updated irregularly, and has most of the pages ripped out (Lawrence & Giles, 1999; online document).



Data View: How Big Is the Web?

Like many simple questions, this one turns out to be more complicated than it might at first appear. A good activity for students would be for them to define what they mean by the terms big and Web and then to search the Web for data or analyses to help them answer that question. Different approaches could lead to varying results, which in turn might call for justification of their strategies and critical thinking (see Murphy, 1998; online document, registration required for access). There are a number of things we might count.

Win Treese has an Internet index newsletter at http://new-website.openmarket.com/intindex/index.cfm that regularly reports interesting items about the size and growth of the Web in the manner of Harper's Index. You can visit the site or become a subscriber to the index.

Comprehensive information about search engines, specialized search engines, and metasearch engines, as well as general information about searching and tips for searching are available at SearchIQ. This site provides independent reviews and rankings to inform the selection of search tools. Their reviews employ criteria such as overall relevancy of listings and organization by relevancy, ability to find sites for both broad and specific topics, comprehensiveness, lack of redundancy, logical grouping of listings, and speed.

Although the “IQ scores” that SearchIQ assigns are a novel feature of the site, I recommend looking more at their descriptions of the search engines and how they work. A low-scoring engine could easily be better for some purposes than the top-ranked engine.



Glossary

Authority -- a Web site that is linked from many other pages (see Hub).

Boolean expression -- an expression that evaluates to true or false; for example, used in a Web search, the expression travel and France is true for every Web page that contains both travel and France. Expressions that contain logical operators such as and, or, and not are Boolean, but all Web searches implicitly involve Boolean expressions.

Case sensitive -- the property of paying attention to upper- and lower-case letters; each search engine has its own policy about this (e.g., is White House the same as white house?).

Domain name -- a name that identifies one or more IP addresses. For example, the domain name www.ed.gov represents a numerical address signifying a location in cyberspace. A domain name is the first part of the URL used to identify a Web page.

Filter -- a program that takes a list of documents and removes those that meet certain prespecified criteria; family filters are used to remove objectionable Web material, other filters are used to focus a search to retrieve the most relevant items, and any filter will occasionally let through unwanted items and screen out desirable ones.

Firewall -- a system that creates a partition between a private network and the larger Internet; it may restrict access both to and from the Internet.

Host -- originally, any single machine on the Internet; because a single machine can act like multiple systems, each with its own domain name and IP address, the definition now typically includes virtual hosts as well.

Hub -- a Web site with many links to other sites (see Authority).
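The definitions of Hub and Authority can be illustrated by simply counting links in a small graph. This is a minimal sketch under stated assumptions: the link graph and page names are hypothetical, and raw link counts are only the starting point, since real ranking systems weigh links in more subtle ways.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "directory": ["folger", "oxford", "globe"],
    "fan_page": ["folger", "oxford"],
    "class_page": ["folger"],
    "folger": [],
    "oxford": [],
    "globe": [],
}

# A hub has many outgoing links; an authority has many incoming links.
out_links = {page: len(targets) for page, targets in links.items()}
in_links = Counter(target for targets in links.values() for target in targets)

best_hub = max(out_links, key=out_links.get)
best_authority = max(in_links, key=in_links.get)
print(best_hub, best_authority)
```

In this toy graph the page called directory is the strongest hub and the page called folger is the strongest authority.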

Metasearch engine -- a computer program, such as Dogpile, that collects the results from several search engines at once. This is especially valuable because no search engine indexes more than one sixth of the Web.

Ranking function -- a means used by a search engine to order documents found in a search in terms of potential relevance, quality, or other criteria.

Search directory -- a database that organizes documents according to categories and, usually, subcategories; it provides an alternative to general searching for finding particular items.

Search engine -- a computer program that returns a list of the documents that satisfy a Boolean search expression; the term usually refers to programs that search for Web documents.

Search index -- a large database of document locations based on the words contained in each document; the index facilitates efficient, meaningful searches and is created by a program within the search engine.

Specialty search engine -- a search engine that searches a limited database of documents, such as the telephone white pages; such an engine can be made more efficient for limited purposes and is more likely to return only the sorts of data that a user would want.

Spiders (search robots) -- computer programs sent out by a search engine to find as many documents on the Web as they can.

Terabyte -- a trillion bytes of information, enough to represent a trillion characters; about 100 fairly large personal computer hard drives would be needed to hold this much information.
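The arithmetic behind the Terabyte entry is easy to check. The short calculation below assumes decimal units (a gigabyte as a billion bytes); the variable names are chosen here for illustration.

```python
TERABYTE = 10**12   # a trillion bytes, per the definition above
DRIVES = 100        # number of hard drives given in the definition

# Capacity each drive would need for 100 drives to hold one terabyte.
bytes_per_drive = TERABYTE // DRIVES
gigabytes_per_drive = bytes_per_drive / 10**9
print(gigabytes_per_drive)
```

Under these assumptions, the definition implies drives of about 10 gigabytes each.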

Webopedia -- a good online glossary of computer terms.



References

Guernsey, L. (1999, July 8). Seek -- But on the Web, you might not find. New York Times on the Web. Available: www.nytimes.com

Lawrence, S., & Giles, C.L. (1999). How big is the Web? How much of the Web do the search engines index? How up to date are the search engines? Available: http://www.neci.nec.com/~lawrence/websize.html

Members of the Clever Project. (1999). Hypersearching the Web. Scientific American, 280(6), 54-60. Available: www.sciam.com/1999/0699issue/0699raghavan.html

Murphy, J. (1998, July 5). It's not the size that counts, but how you measure it. New York Times on the Web. Available: www.nytimes.com




Reading Online, www.readingonline.org
Posted November 1999
Published simultaneously in the Journal of Adolescent & Adult Literacy
© 1999-2000 International Reading Association, Inc. ISSN 1096-1232