Version History:

Version 4.1.9 [17/9/2006]
  - Updated code to compile under Gnu GCC 4.0.2:
    - added forward declarations and typename keywords to compensate for
      changes in the resolution time of templated classes and functions
    - changed references to deprecated *.h libraries to their current versions

Version 4.1.8 [18/3/2004]
  - Set a minimum of 2 matching expressions/contexts required for secondary
    contexts/expressions, or for rules, involving types that lack
    characteristic expressions/contexts. This removes the need for
      sesp_for_no_characteristics * minimum_contexts_per_type    > 100%
      scsp_for_no_characteristics * minimum_expressions_per_type > 100%
      rsp_for_no_characteristics  * minimum_expressions_per_type > 100%
    and the values set by the 'set support' command have been adjusted
    accordingly.

(unnumbered) [14/8/2003]
  - Fixed a bug in the save_load file_var routine, where the binary storage
    of integers was architecture-dependent. Integers are now always stored
    least-significant byte first.
  - Removed extraneous spaces in rule output in morpheme mode.

4.1.7.1 [19/6/2003]
  - Fixed a bug in the 'clear database' routine, where the whole-sentence
    type was not properly initialized.
  - Fixed a bug in reloading large savefiles, where values were assigned to
    the array expression::esize without allocating memory for it.
  - Fixed minor configuration problems, moving emconfig.h to config.h and
    adding #include to show.cc.
  - Fixed compilation errors caused by respecifying default parameter values
    in the definitions of declared-but-not-yet-defined functions.

4.1.7 [2/8/2002]
  - Used automake and autoconf to create proper installation and
    configuration files (the optimization level is now a configuration
    option, as it should be).
  - Reworked sources and configuration checks to increase portability. The
    configuration script now compensates for different versions of the
    readline library, and the code compiles under g++-3.0.
  - Solved bug where word::table was erroneously locked (after a parse or
    show-expression command, the program would not learn new words in the
    first line of a sample).
  - Reworked the generate command to avoid infinite loops.
  - Added 'exit nosave' command to exit without saving or prompting.
  - Added debug command 'delete', and multiplicities to insert/delete.
  - When optimizing the rule_grammar, size is now calculated in words rather
    than in number of rules.
  - Replaced the shortcircuit_rules setting by a check of whether this would
    fit (given the current values of ruleset_increase_disallowed).
  - Changed the compression rate calculation so that it uses the actually
    replaced context/expression pairs instead of the context/expression pairs
    of the entire induced submatrix.
  - Dropped the intention to catch the break signal, so that user interrupts
    would return you to the user interface rather than abort the entire
    program; apparently there is no way to treat signals as exceptions, and
    recreating the entire exception handling mechanism to handle signals is
    too much work. Note: if someone knows how to do this, please tell me.
  - Dropped the intention to use editline instead of readline; editline does
    not appear to offer the option of configuring tab completion to handle
    commands as well as files.

4.1.6.1 [11/3/2002]
  - Fixed bug where specifying the '--database' option additionally implied
    the '--quiet' option.
  - Fixed bug where save files with settings and empty data caused corrupt
    data structures.
  - Fixed bug with save files in morpheme mode.
  - Fixed bug where words read from files in morpheme mode were prefixed by
    whitespace.
  - Added rightside map to the expression data representation in morpheme
    mode to optimize calculations.
  - Fixed bug where the xx_for_no_characteristics settings were not saved.
  - Enhanced the file save format recognition module to successfully
    implement downward-upward file compatibility.

4.1.6 [27/6/2001]
  - Manual written.
  - Added support for negative samples (i.e. sentences which should not be
    constructible), and an 'unlearn' command. Also added a facility for
    determining the most uncertain derived sentence and querying an oracle
    about its correctness.
  - Added settings for optimizing for structure in parsings, as opposed to
    optimizing for compactness of the ruleset.
  - Added quotes, escapes, multi-line commands and multi-command lines to the
    command line interpreter. As a side effect, redirection symbols no longer
    need to be preceded by a space and not followed by one; it is now
    sufficient that they are not escaped or enclosed in quotes.
  - Added advanced command line editing, a history buffer and tab completion
    to the command line interpreter, using the Gnu Readline/History libraries.
  - Reimplemented command recognition using regular expression pattern
    matching.
  - Used Gnu getopt to get startup command line options and arguments.
  - Used template functions in a number of places to combine similar blocks
    of code into one block, for improved maintainability.
  - Redesigned the save-load module for (hopefully) the last time, making it
    easily extendible, and using template functions so that the same block of
    code does both reading and writing.
  - Made source code modules separately compilable (for later linking) in
    order to speed development. Reorganized modules. Wrote configure and make
    scripts to build emile and morpheme, providing for versions with different
    levels of debugging and optimization (see INSTALL.TXT).
  - Removed a minor bug in the algorithm that eliminates rules that do not
    contribute to the covering of secondary expressions of a type.

4.1.5 [19/6/2000]
  - Changed the representation of grammatical derivation rules to allow rules
    using arbitrary combinations of subexpressions and type references, such
    as [1] => the [2] jumped over [3] dog.
  - Developed a completely new algorithm to create the rules grammar, based on
    the idea of adding types to the set of used types as long as this results
    in a decrease of the resulting rule set. The algorithm is incremental in
    the sense that recompilations start with the types used before.
  - Added a 'clear rules' command to reset the markings of used types (i.e.
    to reset the incrementality of the rules-finding algorithm).
  - Added separate settings for support percentages for secondary contexts,
    secondary expressions and rules, for the case where a type has no
    characteristic contexts/expressions and uses primaries instead.

4.1.4 [27/4/2000]
  - Changed the representation of collections that are not used for
    membership queries; a vector representation is sufficient for traversals.
    Also performed several other memory usage optimizations, for an overall
    decrease in memory usage of 50-60%.
  - Changed the definition of 'characteristic' expression/context used in the
    program to the correct one.
  - Added a compression rate measure to the 'show type' output, defined as
      compression(g) = ( (total length of all primary expressions)
                       + (total length of all primary contexts) )
                       / (total length of all combinations of primary
                          contexts with primary expressions)
    and added a 'show types sorted-by-compression' command.
  - Added "parse_type _n_|* _phrase_" command to attempt parsing a phrase as
    an expression of a specified or unknown type.
  - Changed the default value for 'minimum_number_of_expressions' from 4 to 2.
  - Changed name of 'Chomsky-type rule' to 'binary rule'.
  - Added a 'show memory details' command to show more detailed memory usage
    statistics (including memory usage statistics for individual program
    structures, for debugging purposes).
  - Changed database format version checking to check for a valid _range_
    rather than a valid _number_. This should improve downward compatibility
    of savefiles.
  - Eliminated the need for a space after the command when using the commands
    '!' and '?' (synonyms for 'shell' and 'help').
  - Added the shortcut '.' as a synonym for 'script', with no need for a space
    after the command (as above).
  - For the sake of completeness, added a 'show version' command with
    'version' and 'ver' shortcuts.
  - Changed version numbers from '3.' to '4.1', as the early Prolog version
    turned out to have been the real EMILE 3.0. Added EMILE 1.0-3.0 to
    CHANGES.TXT.
  - Corrected bug where carriage returns were treated as separate symbols
    instead of as whitespace.
  - Set EMILE to ignore 'broken pipe' errors when piping the output of a show
    command. Previously, piping the output to 'less' and quitting less by
    pressing 'q' caused Emile to exit.

4.1.3 [7/3/2000]
  - Added a 'generate new' command which generates only sentences that are
    not already in the database (and have not already been generated).
  - Added a 'set support' command to set _all_ support-related settings to
    the given value or a value derived from it.
  - Added 'set random' and 'show random' commands to control the randomizer.
  - Changed the default support values from 70 & 91 to 50 & 75, as the latter
    values tend to yield better results.
  - Fixed bug where the generate command in Word Analysis mode generated
    extraneous spaces inside the words.

4.1.2 [3/3/2000]
  - Added 'generate' command to generate sentences based on the derived rules.
  - Emile has a refinement whereby it ignores periods as end-of-sentence
    markers if they follow an initial. This behaviour is now optional, and
    controlled by the new settings variable 'ignore_abbreviation_periods'.
  - Added option to use regular expressions for end-of-sentence markers,
    controlled by the new settings variable 'regular_expression_as_marker'.
    Also reworked the normal end-of-sentence marking system to use the
    regular expression engine, which should increase speed.
  - The parse command now displays grammatical structure.
  - When parsing, Emile may now arbitrarily (re)assign types to single words
    in order to get a satisfactory parsing. This behaviour is controlled by
    the new settings variable 'parser_tolerance'.
  - Added separate help screens for shortcut commands and the show command.
  - Added 'show mem' command to monitor memory usage.
  - Added 'show help' option to the show command as a synonym for 'help'.
  - Added redirection of the show command's output to pipes or files.
  - Added requirement that for single rules [a]=>[b], the type [a] should have
    more secondary expressions than the type [b]. This should prevent loops
    that throw the parser into an infinite regress.
  - Fixed bug where assigning '0' to a settings variable caused it to revert
    to its default setting.
  - Added version history in 'CHANGES.TXT'.
  - Added brief install instructions in 'INSTALL.TXT'.

4.1.1 [22/1/2000]:
  - Added '!' to the list of default end-of-sentence markers.
  - Fixed bug where a line containing only whitespace was not considered to
    be an empty line (for purposes of ending a multi-line sentence).

4.1 [19/1/2000]:
  - Implemented new algorithm using characteristic, essential and secondary
    contexts and expressions, and added corresponding settings variables.
  - Various bugfixes.
  - Added optional arguments to the show command for displaying single types,
    contexts or expressions.
  - Updated and expanded documentation of the algorithm.
  - Allowing multi-line sentences is now optional (on by default).
  - Using multiplicities of sentences is now optional, and off by default.
  - Added verbosity levels > 1, so debug-level information can be enabled at
    runtime rather than at compilation time.
  - Added settings 'type_usefulness_required' and 'rule_usefulness_required'
    to control the elimination of types/rules that do not contribute enough.
  - Expanded the help screen to include one-line descriptions of commands.

4.0.4 [10/12/1999]:
  - Added compilation-time option to analyze morphemes and words instead of
    phrases and sentences, and optimized data structures for this mode.
  - Wrote documentation of the algorithm.

4.0.1 - 3.0.3 [??]
  - Various undocumented bug fixes and enhancements.

4.0 [9/9/1999]:
  - First non-alpha version.

3.0 [1998]
  Prolog program described in Pieter Adriaans's article `Learning Shallow
  Context-Free Languages under Simple Distributions'. Based on a
  1-dimensional clustering algorithm.

2.0 [1993?]
1.0 [1993?]
  Mainly theoretical versions: see `Language Learning from a Categorial
  Perspective' by Pieter Adriaans.