Cascot: Computer Assisted Structured COding Tool
Cascot is designed to assign a code to a piece of text. In the case of the Standard Occupational Classification (SOC), this piece of text is typically a job title. For the Standard Industrial Classification (SIC), the text is a description of the main products or services provided by an employing establishment. The quality of coding performed by Cascot depends on the quality of the input text.
Ideally, the text should contain sufficient information to distinguish it from alternative text descriptions that may be coded to other categories within the classification, but it should not contain superfluous words. This ideal will not always be met, but Cascot has been designed to perform a complicated analysis of the words in the text, comparing them to the words in the classification, in order to provide a list of recommendations. If the input text is not sufficiently distinctive, the correct code may not be the topmost recommendation.
When Cascot assigns a code to a piece of text, it also calculates a score from 1 to 100 which represents the degree of certainty that the given code is the correct one. When Cascot encounters a word or phrase that is descriptive of an occupation or industry but lacks sufficient information to distinguish it from other categories (i.e. it has no further qualifying terms), for example 'Teacher' or 'Engineer', Cascot will attempt to suggest a code, but the score is limited to below 40 to indicate the uncertainty associated with the suggestion.
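The score lends itself to a simple triage step after coding: accept suggestions that score above 40 and send the rest for manual review. The following is a minimal sketch of that idea in Python, not part of Cascot itself. The `CodedRecord` structure, the function names, the placeholder codes and the example scores are all assumptions made for illustration; only the cut-off of 40 comes from the description above.

```python
from dataclasses import dataclass

# Cut-off taken from the text: ambiguous input is scored below 40.
# Treating a score of exactly 40 as uncertain is an assumption of this sketch.
REVIEW_THRESHOLD = 40


@dataclass
class CodedRecord:
    """Hypothetical container for one coded piece of text."""
    text: str   # input text, e.g. a job title
    code: str   # suggested classification code (placeholders below)
    score: int  # certainty score from 1 to 100


def needs_review(record: CodedRecord, threshold: int = REVIEW_THRESHOLD) -> bool:
    """Return True when the score is not above the threshold."""
    return record.score <= threshold


def triage(records, threshold: int = REVIEW_THRESHOLD):
    """Split records into (accepted, for_review) lists, preserving order."""
    accepted = [r for r in records if not needs_review(r, threshold)]
    for_review = [r for r in records if needs_review(r, threshold)]
    return accepted, for_review


# Illustrative data only: the codes and scores are invented placeholders.
records = [
    CodedRecord("Primary school teacher", "CODE-A", 78),
    CodedRecord("Teacher", "CODE-B", 39),
    CodedRecord("Engineer", "CODE-C", 35),
]

accepted, for_review = triage(records)
print([r.text for r in accepted])
print([r.text for r in for_review])
```

With this split, unqualified titles such as 'Teacher' end up in the review list, where a human coder can supply the missing qualifying information.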
For SOC specific information including examples of problematic input text please read these further details.
The performance of Cascot has been compared against a selection of high-quality manually coded data. The overall results show that 80% of records receive a score greater than 40 and that, of these, 80% are assigned the same code as in the manually coded data. When using Cascot you can expect this level of performance with similar data, but be aware that performance depends on the quality of your input data.
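To see what these two percentages imply for a batch of records, the arithmetic can be worked through directly. The sketch below assumes the second 80% applies only to records scoring above 40, which is how the figures read here; the function name and the batch size of 1,000 are hypothetical.

```python
def expected_counts(n_records: int,
                    share_above_40: float = 0.80,
                    agreement_rate: float = 0.80):
    """Estimate how many records score above 40 and, of those, how many
    agree with manual coding, using the reported rates."""
    above_40 = round(n_records * share_above_40)
    agreeing = round(above_40 * agreement_rate)
    return above_40, agreeing


above_40, agreeing = expected_counts(1000)
print(above_40, agreeing, agreeing / 1000)
```

Under these assumptions, about 64% of all records (0.80 × 0.80) would be coded with a score above 40 and agree with manual coding. The remaining records either score 40 or below or carry a code that differs from the manual one.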