JILT 1996 (3) - Wiebina Heesterman
A Comparative Review of
Information Retrieval Software
Wiebina Heesterman
University of Warwick
W.Heesterman@warwick.ac.uk
Contents
- 1. Introduction
- 2. Suitable for the purpose
- 3. FolioViews
- 3.1 Manuals provided
- 3.2 Creating a database
- 3.3 Retrieving data
- 3.4 Output
- 3.5 Strong points of FolioViews
- 3.6 Weak points in FolioViews
- 4. AskSam Professional
- 4.1 Manuals provided
- 4.2 Creating a database
- 4.3 Retrieving data
- 4.4 Output
- 4.5 Strong points of AskSam
- 4.6 Weak points in AskSam
- 5. ISYS 4.02
- 5.1 Manuals provided
- 5.2 Creating a database
- 5.3 Retrieving data
- 5.4 Output
- 5.5 Strong points of Isys
- 5.6 Weak points in Isys
- 6. Conclusions
- Download
Note: In this review you may click on any illustration for a full size screen shot
Date of publication: 30 September 1996.
Citation: Heesterman W (1996) 'A Comparative Review of Information Retrieval software', Applications, 1996 (3) The Journal of Information, Law and Technology (JILT). <http://elj.warwick.ac.uk/elj/jilt/sw/3wiebina/>. New citation as at 1/1/04: <http://www2.warwick.ac.uk/fac/soc/law/elj/jilt/1996_3/heesterman/>
1. Introduction
Lawyers may need to organise and store large amounts of data, both structured and unstructured: cases, survey data, bibliographic data, letters, email, journal articles and so on. Information management and retrieval software can lend a helping hand in these tasks. Three years ago the Law Technology Journal carried a review of information retrieval software. Three years is a long time in the world of software development, and some of these packages have changed beyond recognition in the meantime. We feel it is time to take a fresh look at this type of software.
Four packages that can do the job are: Folio Views, AskSam, Isys and Idealist. As Blackwell Scientific wasn't prepared to submit a full version of the Idealist software for the review, we felt that comparing a curtailed demonstration version with fully-fledged professional competitors would not do the software justice. It has therefore not been included in the review.
When evaluating software, the following criteria are important:
- How suitable is the software for its purpose?
- How easy is it to use, both when entering data and in retrieval?
- On what platforms does it run, is it multi-user software suitable for use on a network, and can it be used on entry-level machines as well as high-powered equipment?
- How good is the documentation? Or perhaps even more important: is the software sufficiently intuitive to let users find some useful data or do they first have to plough through long reams of documentation?
Testing of AskSam Professional and FolioViews 3.1 took place on a 486 DX-100 with 20 Mb of memory under Windows 95. Isys 4.02 was tested under Windows 3.1 on a 486 DX-33 notebook with 8 Mb of memory. All packages will run on a 386 machine with 4 Mb of memory, but are quite slow with this configuration. Structured data consisting of a set of some 50 cases with a number of existing fields were used for testing, as well as unstructured data in the form of a medley of journal articles, email letters and short notes.
2. Suitable for the purpose
The first test is whether long multi-page documents such as court cases can be imported without losing the final paragraphs. All three packages coped with importing a 100-page long case. All three support non-fixed fields which may vary in length. Search capabilities are adequate in all three. They can all be used under Windows 95, even though none of them is adapted specifically for 32-bit operating systems.
3. FolioViews
The software is above all aimed at the corporate market, with multi-user databases which can be manipulated by end-users by creating 'shadow' files, i.e. their own personal views and comments reflecting the main database. Comments appear as hypertext links and popup notes.
3.1 Manuals provided
i) A slim 'getting started' volume that introduces users to various ways of creating bookmarks, popup notes and highlighted sections, as well as basic Boolean searches in FolioViews 3.1, the module that is generally made available to end-users.
ii) An extremely thorough and hefty manual for the DOS utilities, which are not represented by icons under Windows and which would generally be reserved for the use of systems managers. Examples are 'Create', used to import existing files into Views databases, and the 'Chop' utility, which compresses a database and splits it into chunks that fit on floppy disks for transfer to other machines. The manual contains extensive instructions on everything that needs to be considered before one even begins creating a database. Importing a large number of documents is in the end quite simple, although the multiplicity of instructions makes it seem an arduous task. The same applies to merging two or more databases: they first need to be turned into flat files before they can be reassembled as one database. The manual seems daunting in its insistence on planning, testing and re-testing in order to create a database.
The user manual is only available as an on-line information database, which requires users to use the query language to learn how to retrieve data most effectively.
3.2 Creating a database
As well as typing in data, users can import word-processed documents such as cases at the cursor position by using 'File' - 'Open' from the main menu. This is slow and inefficient, and users are encouraged to turn word-processed files first into 'flat' files (a 'tagged' file format) and then use the DOS 'Create' utility. There is no icon for this task in the Views31 group, so it requires reading the manual first.
Long cases can be imported easily, but marking sections as 'fields' is a time-consuming affair. I found no other way of doing so than by highlighting a section and then giving it a previously defined field name. Putting the cursor in a paragraph and then clicking on an entry in the list of field names did not do the trick. According to the utilities manual, it is possible to facilitate information retrieval by creating a thesaurus as a list of equivalent terms. This involves first creating a 'script' file - in DOS - containing equivalent terms, such as:
--------
inheritance: legacy,bequest
death: decease
crime: offence, felony, misdemeanor
freedom: liberty
woman: female
etc.
--------
Then the DOS utility 'editlex' has to be run with the following syntax: editlex <main dictionary> <user dictionary> <script file>. These DOS utilities are rather off-putting for users versed in Windows, though not too difficult if one persists.
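The 'headword: equivalent, equivalent' layout of the script file is simple enough to read into a lookup table. The sketch below is only an illustration of that layout, assuming nothing about the script format beyond the example lines shown above; it says nothing about how 'editlex' itself works, and the function names parse_script and expand are hypothetical.

```python
def parse_script(text):
    """Parse 'headword: synonym, synonym' lines into a dict of sets."""
    thesaurus = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        head, _, rest = line.partition(":")
        synonyms = {s.strip() for s in rest.split(",") if s.strip()}
        thesaurus[head.strip()] = synonyms
    return thesaurus


def expand(term, thesaurus):
    """Return the term together with the equivalents listed for it."""
    return {term} | thesaurus.get(term, set())


# The example lines from the review, used as test input.
SCRIPT = """\
inheritance: legacy,bequest
death: decease
crime: offence, felony, misdemeanor
freedom: liberty
woman: female
"""

thesaurus = parse_script(SCRIPT)
print(sorted(expand("woman", thesaurus)))  # ['female', 'woman']
```

In this sketch the lookup runs in one direction only, from headword to equivalents; whether FolioViews also treats 'female' as leading back to 'woman' is not stated in the manual excerpts discussed here.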
In addition to more conventional formatting and editing facilities, FolioViews supports different kinds of hypertext, popup note, highlighter and annotation tools. Sections marked by any of these tools are treated as indexed segments that can be searched.
3.3 Retrieving data
When the 'query' module is activated, a form appears in which one enters a query of two or more words connected by the usual Boolean 'and', 'or' or 'not'. To search for a phrase, the terms need to be surrounded by quotes; otherwise the connector 'and' is assumed. To search for terms occurring in one of the indexes or fields, the query needs to begin with the '[' character. (See illustration).
Proximity searching takes two forms, "ordered" where term2 follows term1 within a certain number of words (e.g. "detention prison"@25), and "unordered" where term1 and term2 are within a number of words from each other (in the form: "detention prison"/25).
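The difference between the two forms can be made concrete with a small sketch. It is not FolioViews code: it assumes that 'within n words' means the word positions differ by at most n, which the review does not specify, and the function names are hypothetical.

```python
import re


def _positions(words, term):
    """Indexes at which a term occurs in a list of lower-cased words."""
    return [i for i, w in enumerate(words) if w == term.lower()]


def near_ordered(text, term1, term2, distance):
    """True if term2 follows term1 by at most `distance` words."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(0 < j - i <= distance
               for i in _positions(words, term1)
               for j in _positions(words, term2))


def near_unordered(text, term1, term2, distance):
    """True if the two terms lie within `distance` words, in either order."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(0 < abs(j - i) <= distance
               for i in _positions(words, term1)
               for j in _positions(words, term2))


sample = "The prison governor ordered his detention"
print(near_ordered(sample, "detention", "prison", 25))    # False
print(near_unordered(sample, "detention", "prison", 25))  # True
```

In the sample sentence 'prison' precedes 'detention', so only the unordered test succeeds.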
'Wild card' searching is represented by '?' for one character, '*' for multiple ones. 'Stem searching' is another kind of wildcard search. Here the query might be 'run%' which would also retrieve other forms of the verb such as 'ran' or 'running'.
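Python's standard fnmatch module happens to use the same two symbols, '?' for one character and '*' for any number, so it can illustrate the behaviour of wild cards against a word list. This is an analogy only, not the FolioViews implementation; the word list and the helper name match are made up for the example, and fnmatch additionally treats square brackets as special, which is not relevant here.

```python
from fnmatch import fnmatchcase

# A made-up word index for illustration.
WORDS = ["run", "runs", "ran", "running", "rune"]


def match(pattern, words):
    """Words matching a pattern where '?' is one character and '*' any run."""
    return [w for w in words if fnmatchcase(w, pattern)]


print(match("r?n", WORDS))   # ['run', 'ran']
print(match("run*", WORDS))  # ['run', 'runs', 'running', 'rune']
```

The second query shows why stem searching is a different facility: the wild card 'run*' misses 'ran' and picks up the unrelated 'rune', whereas a stem search for 'run%' is described as retrieving the forms of the verb.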
Thesaurus searching (in the form of woman$) was not a success. Searching was incredibly slow, and the final result was, to say the least, surprising: in addition to the words 'woman' and 'female' specified in the user thesaurus list above, the search also retrieved 'broad' and 'virgin', neither of which featured in the user thesaurus. Presumably FolioViews comes with a built-in thesaurus, which I so far haven't been able to change.
3.4 Output
While the search terms are being typed in the query box, an index of all words present in the database is displayed, together with the number of occurrences of the search terms and their combinations (see illustration right). The number of hits is displayed in the status bar at the bottom of the screen. If any documents are found, there is a choice between 1) the list of contents, with the number of hits displayed next to each heading that contains hits, and 2) 'words around hits'. To view the search terms in their context, one clicks on a heading marked as containing the search terms, which displays the beginning of the document. Clicking 'next hit' or 'previous hit' takes the user to the paragraph containing the search terms.
3.5 Strong points of FolioViews
Hypertext tools are particularly well developed (see illustration left). The 'shadow' file is a good idea for situations where different users want to add personal comments to shared data. Searching facilities are good and I like the way the word index shows right away the result of adding a query term. The facility to spread information databases over multiple floppy disks is still quite useful, although large amounts of data tend nowadays to be distributed on CD-ROM.
3.6 Weak points in FolioViews
- FolioViews makes use of DOS utilities that require attentive reading of the Utilities manual.
- Using the thesaurus is slow, complicated and in the end unsatisfactory.
- The absence of a hard copy user manual.
- The lack of menu-based searching.
4. AskSam Professional
The software is aimed at single users, who are provided with suitable tools to process and store any amount of email and html documents. Provision of a special 'Internet' module helps to make this a painless affair. An 'Office' module to facilitate processing letters, memos, subject and date lists for short notes is also supplied. The application has a useful range of formatting tools and utilities such as a spell checker and a mailmerge option. The package can be used as a multi-user application and there is a network version available though the User Guide's index doesn't mention the word 'network' at all.
4.1 Manuals provided
AskSam Professional comes with 1) a slim "Getting Started Guide", a tutorial to searching, creating databases and creating hypertext links and 2) a full user guide. On-line help is also present.
4.2 Creating a database
First of all, users can create databases by entering data from the keyboard as if AskSam were a word-processing package. The data are indexed straight away. One can also begin by creating an input form with fields which by default are less than a line in length, unless defined explicitly as short or as multi-line fields. A number of pre-defined entry forms, for faxes, to-do notes etc., are supplied with the Office module. Users have a spell checker and text formatting tools available. Documents containing field names, such as data downloaded from on-line sources, may be imported into a new database with the existing field names, provided these end in a specified field delimiter character such as a colon. Data can also be imported into a database with a data entry form. The package has an option to delete duplicate records.
4.3 Retrieving data
Retrieval from the 'Actions' menu gives users a choice of different types of searches such as: Boolean, proximity, multiple fields, numeric search etc., while activating one of these options generates an input form that is generally menu-based. Phrases need to be surrounded by square brackets. (see illustrations).
Users can select terms from a 'picklist' which is either global, or refers to a particular database. There is an option to count the number of occurrences of a word or phrase.
AskSam supports fuzzy and/or case-sensitive searching (see illustration right).
4.4 Output
The retrieved documents are shown sequentially rather than as a list of documents with hits. It is, however, possible to see a list of occurrences of the retrieved word or phrase in context, with a choice of: next word, the rest of the line or the complete sentence. Once documents have been retrieved, there are utilities to create mail merge documents using retrieved data, as well as auto-dial and report tools to create reports or labels.
4.5 Strong points of AskSam
The software can be used in the same way as conventional word-processing software, although it also acts as a database package with input forms. It offers special Internet and Office facilities. Picklists can either be global or refer to a specific database.
4.6 Weak points in AskSam
- A number of common options, such as 'searching multi-line fields' need to be set at every search for a query to work correctly.
- The retrieved set is not displayed as a list of documents.
5. ISYS 4.02
The package addresses the needs of single users as well as being suitable for use as a corporate database. It consists of two modules, 'Utilities' and 'Data retrieval'. This would allow a systems administrator to restrict users' access to the Query module. There are some inconsistencies, however, as items I had expected to be part of the administration module were to be found in the Query module, and vice versa. For instance, 'Creating synonym rings' is part of the Query module, while the Word Frequency utility is found in the Administration module. Viewing the 'common word list', a list of words which are normally not retrieved, is to be found in the 'Administration and Utilities' module under the 'Utilities' option, while editing the common word list is located under 'Characters and Words' in the 'Options' menu of the same module.
5.1 Manuals provided
Two thinnish ring-bound guides, i) Administration & Utilities and ii) Query User's Manual, are provided, corresponding to the two modules constituting the package. They are easy to follow.
5.2 Creating a database
In contrast to the other packages, information is not typed directly into an information database. Isys works by indexing existing data, in a range of word-processing and database formats, located in specified directories. Users can, if they so wish, index complete directories or even their entire hard disk to keep track of their data. If the original documents are changed or new files are to be added, the 'Update' option in the Utilities module has to be used, either to 'Re-index' the whole database or to 'Add' documents in specified directories. This is quite fast.
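The general idea of indexing existing documents rather than storing them can be sketched with a simple inverted index, i.e. a table from each word to the files that contain it. This is a generic illustration under stated assumptions, not a description of the Isys index format: the documents are held in memory as a filename-to-text mapping, and the file names and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-cased words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add_documents(index, documents):
    """Add {filename: text} entries to an existing index, in place."""
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)


def build_index(documents):
    """Index a whole collection from scratch."""
    index = defaultdict(set)
    add_documents(index, documents)
    return index


def search_all(index, *terms):
    """Files containing every term (a Boolean 'and')."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


docs = {
    "case1.doc": "The ruling on detention was appealed",
    "memo.txt": "Notes on the appeal ruling",
}
index = build_index(docs)
print(sorted(search_all(index, "ruling", "detention")))  # ['case1.doc']

# Adding a new file only touches the entries for its own words.
add_documents(index, {"case2.doc": "Detention without a ruling"})
print(sorted(search_all(index, "ruling", "detention")))  # ['case1.doc', 'case2.doc']
```

In this sketch, adding files is cheap, but a file whose text has changed would leave stale entries behind unless the index is rebuilt, which is one plausible reason for offering both an 'Add' and a 'Re-index' option.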
5.3 Retrieving data
The Isys 'Tools' sub-menu lets users choose between different types of query: 'Menu-based' searching, 'Command-based' searching (i.e. using Boolean operators), 'Query-by-Concept' and 'Plain English Query'. The menu (see illustration left) makes it easy for users with minimal computing skills to combine various criteria, and do Boolean or proximity searches without being aware of the fact.
The option of "plain English query", also targeting novice computer users, proved less than successful: the query "Look for evidence taken in judicial proceedings" retrieved 14 documents, all containing many instances of the words proceed, process, proceedings, judiciary, judicial and evidence, spread all over a document, though the phrase "evidence taken at a judicial proceeding" only occurred in one document.
In all types of searches one can click on a so-called 'word-wheel', which generates lists of
- all words in a database beginning with certain characters
- words sounding similar to a certain sequence of characters.
The results are reasonable: a query for "Smythe" found "Smith", and "Cooper" also found "Kuyper". The 'word wheel' utility only displays up to 9 characters of a word, perhaps because of memory restrictions on the 8 Mb machine. (See illustration right)
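The review does not say which sound-alike algorithm Isys uses. As a point of comparison, the classic Soundex code, sketched below in its usual American form, reduces a name to its first letter plus three digits for the following consonant groups. The function name soundex and the test names are the only things taken as given; everything else is standard Soundex, not Isys internals.

```python
# Consonant groups of classic Soundex; vowels, y, h and w carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (result + "000")[:4]


print(soundex("Smythe"), soundex("Smith"))  # S530 S530
print(soundex("Cooper"), soundex("Kuyper"))  # C160 K160
```

Soundex treats "Smythe" and "Smith" as identical, but it keeps the initial letter, so "Cooper" and "Kuyper" receive different codes. Since Isys matched that pair as well, it presumably relies on a different or more lenient phonetic method.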
Proximity searching uses either // or the w/n operator familiar to Lexis users. // allows users to specify a lower and upper limit of word proximity. For example, 'ruling /50/ section 148' found a number of cases after I removed 's' from the common word list and created a synonym ring containing the terms 'section', 's', 's.' and 'S'. I also found I had to remove the word 'right' from the common word list.
Isys doesn't differentiate between * and !: both are wild card characters standing for any number of characters.
Hypertext links can be created between documents, for instance linking the 5 cases found by a query for 'bill of rights'. This involves minimising the retrieved documents, having them all displayed on the screen using Windows 'tiling', and then creating the links by selecting 'Edit' - 'Annotate' - 'Hypertext links' from the Browse menu. The link icon now featuring in the first document is rather large, obscuring the word 'right'. One can then navigate through the chain of documents by clicking on the forward / back buttons.
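Why the common word list had to be edited before the synonym ring could work can be shown with a toy model. It assumes, purely for illustration, that words on the common word list are discarded before matching; how Isys handles them internally is not documented here, and the function names, sample sentence and word lists are hypothetical.

```python
import re


def index_terms(text, common_words):
    """Lower-cased words of the text, minus those on the common word list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in common_words]


def finds_reference(text, ring, number, common_words):
    """True if any member of the synonym ring directly precedes the number."""
    terms = index_terms(text, common_words)
    return any(a in ring and b == number for a, b in zip(terms, terms[1:]))


ring = {"section", "s"}  # a simplified synonym ring
text = "The ruling under s 148 was upheld"

# With 's' treated as a common word, the abbreviation is never seen.
print(finds_reference(text, ring, "148", {"the", "was", "s"}))  # False
# Once 's' is removed from the list, the ring member matches.
print(finds_reference(text, ring, "148", {"the", "was"}))       # True
```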
Concept trees, constructed rather like an outline document, can be created where the lower levels as well as the top concept can be used for retrieval, for instance, a 'human rights' concept tree. (See illustration left). These are generally combined with other search terms, for instance the Concept "Human Rights" may be refined by a phrase such as "power w/2 (arrest or detain)".
Synonym rings can incorporate terms consisting of more than one word. The constituent parts are correctly dismissed as non-relevant. (See illustration right)
Searches can be stored to run at a later time when the contents of the database have changed.
5.4 Output
The retrieved set is displayed as a list, either of filenames or of the first line of each document. Clicking on the preferred line opens the corresponding document. The display menu contains buttons to move to the next or previous hit in the same document or to the next or previous document fulfilling the search criteria.
In addition to creating annotations and hypertext links between documents, one can also launch the application in which a document indexed in Isys was originally created: clicking on the WP icon launched Word for Windows. Isys also comes with a macro for Word for Windows users.
5.5 Strong points of Isys
The range of search facilities is excellent. I particularly like the functionality of the synonym and concept searches.
5.6 Weak points in Isys
- Some confusion over the part of the program where changes need to be made to certain options, such as the common word list and the synonym list.
- The presence of the 'Plain English query module' raises unrealistic expectations.
- The absence of a 'wild card' character standing for a single character.
6. Conclusions
The three packages address different sectors of the market. They are all competent, with Isys and AskSam more user-friendly than FolioViews, in particular on the management and utilities side. Isys has the most extensive range of retrieval tools, while AskSam provides facilities that make it easier to process today's type of information. Both FolioViews and AskSam allow users to make changes in databases. FolioViews, intended as a multi-user product, does so by providing end-users with tools to create personal 'views', while it is not clear how this could be possible in a network version of AskSam. Isys takes a different approach: no modifications or additions to files can be entered from the keyboard. Instead, changes to data are made in the original documents, which are subsequently re-indexed. All packages are Windows-based; FolioViews alone is also available for the Macintosh.
AskSam Professional 3.0 (http://www.asksam.com/)
Price (all prices are ex VAT):
- Single copy: £293.00 ($395)
- Academic version: n/a
- 5 user: £1,295.00 ($1,995)
- 10 user: £2,295.00
FolioViews 3.1 Infobase Production Kit (http://www.folio.com/)
Price (all prices are ex VAT):
- Single copy: £795.00
- Academic version: £245.00
- Additional licences, normal version: 1-10 @ £175 each, 11-100 @ £140 each
- Additional licences, academic version: 1-10 @ £70 each, 11-100 @ £60 each
Isys 4.02 (http://www.isysdev.com/isys4.html)
Price (all prices are ex VAT):
- Single copy: £395.00
- Academic version: n/a
- 3 user: £990.00
- 5 user: £1,550.00
- 10 user: £2,875.00