| Converted from Irvine by Ronny Kohavi
| First attribute deleted in names file, data file (was already
|   deleted in test file).
| All features changed to continuous because the features were
|   generated by distance from prototypes.  Note that this is
|   an artificial domain.  C4.5's accuracy goes up (10-fold)
|   from 73.13 to 80.63 after this change.
|
| 1. Title: Hayes-Roth & Hayes-Roth (1977) Database
|
| 2. Source Information:
|    (a) Creators: Barbara and Frederick Hayes-Roth
|    (b) Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
|    (c) Date: March, 1989
|
| 3. Past Usage:
|    1. Hayes-Roth, B., & Hayes-Roth, F. (1977).  Concept learning and the
|       recognition and classification of exemplars.  Journal of Verbal Learning
|       and Verbal Behavior, 16, 321-338.
|       -- Results:
|          -- Human subjects' classification and recognition performance:
|             1. decreases with distance from the prototype,
|             2. is better on unseen prototypes than old instances, and
|             3. improves with presentation frequency during learning.
|    2. Anderson, J.R., & Kline, P.J. (1979).  A learning system and its
|       psychological implications.  In Proceedings of the Sixth International
|       Joint Conference on Artificial Intelligence (pp. 16-21).  Tokyo, Japan:
|       Morgan Kaufmann.
|       -- Partitioned the results into 4 classes:
|          1. prototypes
|          2. near-prototypes with high presentation frequency during learning
|          3. near-prototypes with low presentation frequency during learning
|          4. instances that are far from prototypes
|       -- Described evidence that ACT's classification confidence and
|          recognition behaviors closely simulated human subjects' behaviors.
|    3. Aha, D.W. (1989).  Incremental learning of independent, overlapping, and
|       graded concept descriptions with an instance-based process framework.
|       Manuscript submitted for publication.
|       -- Used same partition as Anderson & Kline
|       -- Described evidence that Bloom's classification confidence behavior
|          is similar to the human subjects' behavior.  Bloom fitted the data
|          more closely than did ACT.
|
| 4. Relevant Information:
|    This database contains 5 numeric-valued attributes.  Only a subset of
|    3 are used during testing (the latter 3).  Furthermore, only 2 of the
|    3 concepts are "used" during testing (i.e., those with the prototypes
|    000 and 111).  I've mapped all values to their zero-indexing equivalents.
|
|    Some instances could be placed in either category 0 or 1.  I've followed
|    the authors' suggestion, placing them in each category with equal
|    probability.
|
|    I've replaced the actual values of the attributes (e.g., hobby has values
|    chess, sports and stamps) with numeric values.  I think this is how
|    the authors did this when testing the categorization models described
|    in the paper.  I find this unfair.  While the subjects were able to bring
|    background knowledge to bear on the attribute values and their
|    relationships, the algorithms were provided with no such knowledge.  I'm
|    uncertain whether the 2 distractor attributes (name and hobby) are
|    presented to the authors' algorithms during testing.  However, it is clear
|    that only the age, educational status, and marital status attributes are
|    given during the human subjects' transfer tests.
|
| 5. Number of Instances: 132 training instances, 28 test instances
|
| 6. Number of Attributes: 5 plus the class membership attribute.  3 concepts.
|
| 7. Attribute Information:
|    -- 1. name: distinct for each instance and represented numerically
|    -- 2. hobby: nominal values ranging between 1 and 3
|    -- 3. age: nominal values ranging between 1 and 4
|    -- 4. educational level: nominal values ranging between 1 and 4
|    -- 5. marital status: nominal values ranging between 1 and 4
|    -- 6. class: nominal value between 1 and 3
|
| 9. Missing Attribute Values: none
|
| 10. Class Distribution: see below
|
| 11. Detailed description of the experiment:
|    1. 3 categories (1, 2, and neither -- which I call 3)
|       -- some of the instances could be classified in either class 1 or 2, and
|          they have been evenly distributed between the two classes
|    2. 5 Attributes
|       -- A. name (a randomly-generated number between 1 and 132)
|       -- B. hobby (a randomly-generated number between 1 and 3)
|       -- C. age (a number between 1 and 4)
|       -- D. education level (a number between 1 and 4)
|       -- E. marital status (a number between 1 and 4)
|    3. Classification:
|       -- only attributes C-E are diagnostic; values for A and B are ignored
|       -- Class Neither: if a 4 occurs for any attribute C-E
|       -- Class 1: Otherwise, if (# of 1's)>(# of 2's) for attributes C-E
|       -- Class 2: Otherwise, if (# of 2's)>(# of 1's) for attributes C-E
|       -- Either 1 or 2: Otherwise, if (# of 2's)=(# of 1's) for attributes C-E
|    4. Prototypes:
|       -- Class 1: 111
|       -- Class 2: 222
|       -- Class Either: 333
|       -- Class Neither: 444
|    5. Number of training instances: 132
|       -- Each instance presented 0, 1, or 10 times
|       -- None of the prototypes seen during training
|       -- 3 instances from each of categories 1, 2, and either are repeated
|          10 times each
|       -- 3 additional instances from the Either category are shown during
|          learning
|    6. Number of test instances: 28
|       -- All 9 class 1
|       -- All 9 class 2
|       -- All 6 class Either
|       -- All 4 prototypes
|       --------------------
|       -- 28 total
|
| Observations of interest:
|    1. Relative classification confidence of
|       -- prototypes for classes 1 and 2 (2 instances)
|          (Anderson calls these Class 1 instances)
|       -- instances of class 1 with frequency 10 during training and
|          instances of class 2 with frequency 10 during training that
|          are 1 value away from their respective prototypes (6 instances)
|          (Anderson calls these Class 2 instances)
|       -- instances of class 1 with frequency 1 during training and
|          instances of class 2 with frequency 1 during training that
|          are 1 value away from their respective prototypes (6 instances)
|          (Anderson calls these Class 3 instances)
|       -- instances of class 1 with frequency 1 during training and
|          instances of class 2 with frequency 1 during training that
|          are 2 values away from their respective prototypes (6 instances)
|          (Anderson calls these Class 4 instances)
|    2. Relative recognition performance on these same instances
|
| Some Expected results:
|    Both frequency and distance from the prototype will affect the
|    classification accuracy of instances.  The greater the frequency, the
|    higher the classification confidence.  The closer to the prototype, the
|    higher the classification confidence.
|
1, 2, 3.

Hobby: continuous.
Age: continuous.
Education: continuous.
Marital status: continuous.
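|
| Illustration only: a minimal Python sketch of the classification rule
| given in section 11, item 3, assuming the 1-4 value coding used in that
| section (not the zero-indexed coding mentioned in section 4).  The names
| classify and label_counts and the string labels are hypothetical, chosen
| for this example; they do not come from the original experiment.

```python
from collections import Counter
from itertools import product


def classify(age, education, marital_status):
    """Label one instance from its three diagnostic attributes (coded 1-4)."""
    values = (age, education, marital_status)
    # Any 4 among the diagnostic attributes means class "neither".
    if 4 in values:
        return "neither"
    ones, twos = values.count(1), values.count(2)
    if ones > twos:
        return "1"
    if twos > ones:
        return "2"
    # Equal counts of 1's and 2's: the instance fits either class.
    return "either"


def label_counts():
    """Count labels over all 27 combinations of the values 1-3."""
    return Counter(classify(*combo) for combo in product((1, 2, 3), repeat=3))
```

| Under this sketch, the 27 combinations of values 1-3 split into 10, 10,
| and 7 instances for class 1, class 2, and either.  Setting aside the
| prototypes 111, 222, and 333 leaves the 9, 9, and 6 test instances
| listed in section 11.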