Atomizing of works
Mar. 16th, 2005 10:29 am
One of the purposes for which I use this blog is to put up important pieces of information, mostly extracted from other works, so that they can be found by Google and lead people to where that information may be found. I was appalled to read, in the American Libraries 'email forum' "Google at the Gate" (Mar 2005. 36:3, p. 40 et seq.), that Michael Gorman feels this kind of information provision is bad. He said, "Since scholarly books are, with few exceptions, intended to be read cumulatively and not consulted for snippets of information...If you cannot find what you want and if you are lucky enough to find something, it is a paragraph or two wrenched out of context; where is the advance in that? Also, no amount of "research on search engines" is going to overcome the fundamental fact that free-text searching is inherently inferior to controlled-vocabulary systems and will be so until we have computers with the capabilities of human brains..."
But the problem is that catalogers like Gorman have shirked their responsibilities. The average full-length book is indexed (by non-MLS catalogers) by the Library of Congress with 3-5 subject headings which strive to be 'at the level of specificity' of the entire book. That means that a book on the history of beverages, which may include a few precious paragraphs on, say, the contents of the herb mixture 'gruit' used in medieval breweries, will be indexed in ways that make it impossible to tell that the data is in there. Should the majority of the text talk about modern carbonated beverages, the indexing will make the book seem useless to a user who stumbles over it. Even if it does not, if the book is sufficiently complex, the tiny number of subject headings may well make the book unfindable. But to save time and money, subject headings are limited, and heads of Cataloging have copy catalogers take their cataloging records from LC.
Gorman seems to claim that in order to find that data, the user must be forced to read the entire book, cover to cover, even though most of the text may well deal with late-20th-century beverages. We know ourselves that many of the scholarly books we find helpful are collections of essays. But most catalogers, especially those at the Library of Congress, don't include the titles and subjects of the chapters or essays in a book, thus making that information unfindable in catalogs.
Serious researchers in the humanities and the social sciences, among whom Gorman does not appear to be, find that they need to use the library catalog to find an area of the library, or an author, covering their area of interest, and then browse and skim through works in that area in the hope of finding material on their subject. Our library catalogs are, to our users, exactly what Gorman accuses Google of: "it is pathetic as an information-retrieval system-utterly lacking both recall and precision, the essential criteria for efficiency in such systems... supposed to have complex algorithms but still produces piles of rubbish for almost all searches."
And we have given up control of our systems to companies which tell us that there is no way they can make even simple changes to our interfaces, such as adding a 'next page' button to the end of a page of references, though changes to the pictures/icons and the colors of the text are available. To quote Gorman again, "You can put lipstick on a pig, but it's still a pig." Controlled-vocabulary searching can be much better than free-text searching, but if controlled-vocabulary searching isn't available down to the level of the information being sought, free-text searching is essential to locate that information.
Instead of bashing Google, Gorman should be out there campaigning for better controlled vocabulary searching. And I'm going to go back to putting up snippets of information and letting Google index them.