TY - JOUR
T1 - The retrieval of highly scattered facts and architectural images
T2 - Strategies for search and design
AU - Bhavnani, Suresh K.
N1 - Funding Information:
This research was supported in part by the National Science Foundation, Award #EIA-9812607. The views and conclusions contained in this document should not be interpreted as representing the official policies, either expressed or implied, of NSF or the U.S. Government. I thank M. Bates for guiding me towards the formal study of information distributions, and appreciate the diligent statistical analysis done for this project by F. Peck and R. Hu. I also thank A. Abernathy, D. Carter, C. Bichakjian, G. Furnas, T. Johnson, R. Little, R. Price, J. Nardine, F. Reif, V. Strecher, R. Thomas, and G. Vallabha for their contributions to the data collection and analysis, and the anonymous reviewers for their suggestions.
PY - 2005/12
Y1 - 2005/12
N2 - The development of huge sources of information in online domains like healthcare, e-commerce, and design, coupled with powerful search engines, suggests that finding comprehensive information about a topic is straightforward. However, recent studies show that while novices can easily find information for questions that have specific answers (e.g. What is a melanoma?), they have difficulty in finding answers for questions requiring a comprehensive understanding of a topic (e.g. What are the risk and prevention factors for melanoma?). This article argues that an important explanation for this difficulty is the phenomenon of information scatter: as the number of information sources about a specific topic increases, the information across the sources begins to follow a Zipf-like distribution, where a few sources have a large amount of information, and many sources have very little information. To illustrate the phenomenon of information scatter, this article presents examples from an ongoing study of how facts related to common healthcare topics are distributed across high-quality sources. These results are compared to results from a small study to explore how images of buildings designed by a well-known architect are distributed across high-quality image sources. The results from both studies suggest that the distributions of facts and images across relevant sources are Zipf-like, and pinpoint the kind of search knowledge needed to address such scatter. These results suggest the need for the development of systems and training that are "distribution conscious", to assist users in finding comprehensive information about topics across information domains.
UR - http://www.scopus.com/inward/record.url?scp=26444527945&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=26444527945&partnerID=8YFLogxK
U2 - 10.1016/j.autcon.2004.12.007
DO - 10.1016/j.autcon.2004.12.007
M3 - Article
AN - SCOPUS:26444527945
SN - 0926-5805
VL - 14
SP - 724
EP - 735
JO - Automation in Construction
JF - Automation in Construction
IS - 6
ER -