By asymmetry, we minimally refer to the logical definition of asymmetry, a unidirectional relation r between two terms x and y. Thus, x and y are in an asymmetrical relation iff r(x, y) does not imply r(y, x). More formally: r is asymmetrical =df (∀x)(∀y)(r(x, y) → ¬r(y, x)).
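As a small illustration of the logical notion only, the following Python sketch checks whether a finite relation is asymmetrical. It assumes the standard logical reading, on which the presence of r(x, y) excludes r(y, x); the relation names and their members are hypothetical toy data, not part of the proposed model.

```python
def is_asymmetric(relation):
    """Return True if no pair (x, y) in the relation has its converse (y, x) in it.

    Assumption: the standard logical definition, under which an asymmetrical
    relation is also irreflexive, since (x, x) is its own converse.
    """
    pairs = set(relation)
    return all((y, x) not in pairs for (x, y) in pairs)


# Hypothetical toy relations over three labels.
precedes = {("a", "b"), ("b", "c"), ("a", "c")}  # a strict ordering
sister_of = {("a", "b"), ("b", "a")}             # holds in both directions

print(is_asymmetric(precedes))   # True
print(is_asymmetric(sister_of))  # False
```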
The asymmetry project is part of the research agenda of generative grammar. It is aimed at the characterisation of Universal Grammar: a representation of the initial state of the language faculty, specific to human cognition. In this perspective, the language faculty is biologically inherent, enabling human beings to quickly develop the grammar of the language to which they are exposed and to interpret and generate new expressions of the language.
The theoretical linguistics aspect of this research will contribute to our understanding of what is common to all languages but not immediately accessible to the human mind, namely the abstract relations inherent to the language faculty (Chomsky, 1975, 1981, 1995; Kayne 1984, 1994). This research will provide further justification for the existence of Universal Grammar. From this perspective, the languages of the world are specific cases of a single grammar whose initial form includes, according to our theory, asymmetrical relations as well as parameters of variation, allowing typological, historical and dialectal linguistic diversity. The restrictions observed in a great variety of languages on the composition of linguistic elements, their dependency and their linear order indicate that Universal Grammar does include elementary asymmetrical relations, which are crucial and must be systematically explored.
The results of our work will allow for the development of leading research in fundamental linguistics through the formulation of a grammatical model based on asymmetry. This model is being developed in collaboration with several researchers from Canada, the United States and Europe, all of whom have made their mark in their respective domains. In addition to having vast empirical coverage, the proposed model will satisfy the criteria assuring its theoretical validity, including simplicity and completeness.
The computational linguistics aspect of this research will contribute to our understanding of the performance systems, that is, the conceptual-intentional system and the acoustic-perceptual system, and of their interaction with the interface representations. These systems assure the efficient use of our implicit knowledge of language. This work will allow for the development of a performance model which incorporates asymmetrical grammar and which assures efficient processing of linguistic expressions.
Although asymmetrical relations are used in natural language processing, particularly in syntactical analysis (Marcus, 1980; Berwick and Weinberg, 1984; Berwick, 1991) and in the recent developments of basic analysers, there is not yet any computational model based on elementary asymmetries. Formulating and implementing such a model will fill the gap. Furthermore, this model will be used as theoretical support for a technological application. Our computational linguistic work will help increase the precision of information search and retrieval tools on the Internet, through the partnership of research centres and companies in the field, such as le Centre de recherche informatique de Montréal (CRIM), as well as groups and companies such as Delphes, ConText, Uplift and Lingsoft. This aspect of the project will help to overcome the limits of search tools based only on statistical methods.
The computational model will also provide the fundamental substratum to subsequent research on the properties of intelligence and mental computation aiming to verify the psychological reality of asymmetry and to determine whether asymmetry is a property exclusive to the language faculty or a property of the cognitive system and its various conceptual and perceptual ramifications.
1.1.1 Nature. This major collaborative research project is of a fundamental nature in that it targets the properties of grammar, seen as models of our implicit knowledge of languages, and the properties of the performance systems, seen as systems which assure the use of this knowledge. It also includes a technological application which makes our theory operational.
1.1.2 Objectives. Our general objective is to characterise the asymmetries of natural languages and their processing by the performance systems. Our first specific objective is to define a grammatical model based on asymmetrical relations. In the theoretical linguistics aspect of our research, we want to define a grammatical model in which each component is a specific case of the fundamental asymmetry of grammar. Our second specific objective is to develop a computational model capable of processing natural language asymmetries efficiently. In the computational linguistics aspect of our research, we want to define a computational model integrating the proposed grammatical theory and develop efficient processing systems.
1.1.3 State of the art. In fundamental linguistics, asymmetry has been treated differently in each sub-area of grammar. In syntax, it has been observed that the extraction of a complement contained in an embedded clause is possible, but that the extraction of a subject or an adjunct is excluded (**) or rarely acceptable (*), e.g. Which car did John say that Bill fixed? versus (**) Which mechanic did John say that fixed the car?; (*) Which car did John wonder how to fix? versus (**) How did John wonder which car to fix? These facts have been treated by different principles, such as the Empty Category Principle (ECP) (Chomsky, 1981), the Condition on Extraction Domain (CED) (Huang, 1982), Barriers’ Minimality (Chomsky, 1986), Relativized Minimality (Rizzi, 1990) and the Minimal Link Condition (Chomsky, 1995). In morphology, an area less often discussed in terms of asymmetrical relations, asymmetries have also been observed. For example, subject and prepositional complements are rare or unacceptable in deverbal compounds, e.g. exam-giving to students by professors, (*) student-giving of exams by professors, (*) professor-giving of exams to students. Similar cases have been discussed in terms of singular principles. The First Sister Principle (Roeper and Siegel, 1978) was one of the many conditions proposed in order to express the restrictions. In phonology, asymmetry is part of more than one theory and analysis (Dresher and Hulst, 1995, 1997; Hulst, 1996; Rice, 1992, 1995; Rice and Avery, 1993, 1995). Thus, for example, in Semitic languages, where words are constructed by the intermingling of consonantal roots, vocalic infixes and syllabic patterns, matching principles between roots and patterns were proposed to account for the asymmetry between autosegments and the units bearing autosegments (McCarthy, 1981).
Recent work in syntax aims to derive the observable properties of languages, such as the formation and the movement of constituents as well as the linear order of their parts, from fundamental asymmetrical relations. It has been proposed (Kayne, 1994) that the linear order of the terminals in linguistic expressions is determined by the asymmetrical c-command relation between the non-terminal categories. This proposal, in turn, limits the basic order of linguistic constituents to a constant order, viz., the specifier-head-complement order (Kayne, 1994). Furthermore, it has been proposed (Chomsky, 1995) that the syntactic operation MERGE, which constructs head-complement configurations, is asymmetrical in that only one of the two merged categories projects a new category. Chomsky (1995) also proposes that the syntactic operation MOVE is asymmetrical in that only the category targeted by MOVE projects its features in the configuration it is a part of. These proposals allow for a significant reduction of the class of configurations generated by the grammar. Likewise, in morphology, it has been proposed that word formation depends on asymmetrical relations, couched in terms of the notions of Head of a Word (Williams, 1981) and Relativized Head of a Word (Di Sciullo and Williams, 1987), as well as in terms of the Adjunct Identification Principle (Di Sciullo, 1997) governing the relations between adjuncts and heads. These proposals impose a limit on the possible word-internal head/non-head and adjunct-adjoined structures. An empirical consequence of the latter proposal is that a prefix cannot be the categorial head of a word. Thus, in to enlighten the issue, the prefix is prepositional and the suffix is verbal. This makes the correct empirical predictions, e.g. to encode the message versus (*) to coden the message; to enlarge the road versus (*) to largen the road. Similarly, in phonology, the head/non-head asymmetry plays a crucial role. This is the case, for instance, with regard to vocalic versus consonantal harmony (Dresher and Hulst, 1995; Hulst, 1996; Gafos, 1996).
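The idea that asymmetrical c-command between non-terminals fixes the linear order of terminals can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not an implementation of Kayne's (1994) proposal: the tree and its labels are hypothetical toy data, node labels are assumed to be unique, and c-command is reduced to "the parent of X dominates Y, and neither X nor Y dominates the other", ignoring the category/segment distinction.

```python
from itertools import product

# Hypothetical toy tree: nonterminals are uppercase, terminals lowercase.
# Each nonterminal is (label, children); each terminal is a plain string.
TREE = ("K", [("J", ["j"]), ("L", [("M", ["m"]), ("P", [("N", ["p"])])])])


def index_tree(node, parent, parent_of, children_of):
    """Record the parent and the children of every node (labels must be unique)."""
    if isinstance(node, str):  # terminal
        parent_of[node], children_of[node] = parent, []
        return
    label, kids = node
    parent_of[label] = parent
    children_of[label] = [k if isinstance(k, str) else k[0] for k in kids]
    for kid in kids:
        index_tree(kid, label, parent_of, children_of)


def dominated(label, children_of):
    """All nodes properly dominated by `label`."""
    out = set()
    for child in children_of[label]:
        out.add(child)
        out |= dominated(child, children_of)
    return out


def c_commands(x, y, parent_of, children_of):
    """Simplified c-command: x's parent dominates y, and neither dominates the other."""
    if x == y or parent_of[x] is None:
        return False
    if y in dominated(x, children_of) or x in dominated(y, children_of):
        return False
    return y in dominated(parent_of[x], children_of)


def precedence_pairs(tree):
    """Terminal pairs (a, b) such that a precedes b, read off asymmetrical c-command."""
    parent_of, children_of = {}, {}
    index_tree(tree, None, parent_of, children_of)
    nonterminals = [n for n, kids in children_of.items() if kids]

    def terminals(n):
        return {t for t in dominated(n, children_of) if not children_of[t]}

    pairs = set()
    for x, y in product(nonterminals, repeat=2):
        asymmetric = (c_commands(x, y, parent_of, children_of)
                      and not c_commands(y, x, parent_of, children_of))
        if asymmetric:
            pairs |= set(product(terminals(x), terminals(y)))
    return pairs


pairs = precedence_pairs(TREE)
# Order terminals by how many other terminals precede them.
order = sorted({t for p in pairs for t in p},
               key=lambda t: sum(1 for _, b in pairs if b == t))
print(order)  # ['j', 'm', 'p']
```

In this toy tree, J asymmetrically c-commands M, P and N, and M asymmetrically c-commands N, so the derived pairs give a total order over the three terminals.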
Inasmuch as computational models include a mathematical linguistic theory and strategies for the use of this theory, asymmetry plays a role in computational linguistics. Principle-based parsers (Berwick, 1991, and related works) include the axiomatisation of our knowledge of language, through sets of rules or principles, and control strategies representing our use of this knowledge. Different strategies (left to right, bottom-up or top-down, deterministic or non-deterministic, sequential or parallel) are used to associate one or more structures with a linguistic expression given as input. Research in this area aims at reducing the possible analyses that can be generated by the parser to the one or the very few that correspond to the actual structure of the linguistic expressions under analysis. In the history of the field, the concept of asymmetry was at first only indirectly embedded in parsers. This is the case, for example, in the treatment of subjacency in Marcus’ deterministic analyser (1980) and in the treatment of c-command in Berwick and Weinberg’s analyser (1984). Currently, several researchers, including E. Stabler, C. Yang, R. Berwick, A. Weinberg, D. Linn and S. Fong, are developing more sophisticated analysers, founded on the satisfaction of feature-based constraints. Although the parse is not entirely based on the recovery of asymmetries, these systems establish a strict separation between the material that c-commands the part of structure under analysis and the material that does not. As soon as the analyser recognises that the material preceding a part of structure that it integrates in the analysis does not c-command this part of the structure, it linearises the material and directs it to interpretation: this imitates canonical analysers, in the sense of Knuth (1973).
The notion of asymmetry, we believe, can be used successfully in Information Processing systems (retrieval and extraction). In this area, it is urgent to find efficient solutions to the problems of users who consult vast data banks such as the Internet to obtain accurate information on specific subjects. Although it is no longer necessary to be familiar with the use of operators or logical connectors to query Internet data, search results are often inaccurate, too numerous or too vague, because they are based solely on logical and proximity relations between the searched words, or sometimes even between the character strings composing these words. All other relations between these words are usually omitted, including semantic, syntactic and morpho-syntactic relations. Therefore, even the most reputable Internet search systems, such as AltaVista and Yahoo, are limited to words, meaning that no preliminary linguistic processing is executed. When other services do execute linguistic pre-processing, the index generally consists of nothing more than the lemmatised vocabulary. For example, the conceptual search offered by Excite is based on statistical techniques without any systematic appeal to natural language processing techniques. Apart from a few pioneers, it is only recently and on a small scale that certain data search systems have begun to integrate and use morpho-syntactic knowledge in search engines.
We expect information retrieval and extraction tools whose methodology is based on natural language processing techniques to be more powerful and easier to use than those based exclusively on boolean operators or other functions based on the recognition of character strings. The formulation of queries in natural language and the use of morphological and syntactic relations between the words should contribute to the improvement of the performance and usability of search engines.
Around the world, researchers are currently developing data retrieval methods based on natural language processing techniques. Thus, for example, Oracle Corporation in San Francisco has already marketed ConText, a program at the forefront of these data search solutions, as a result of the work of a group of linguists. The system uses the grammar of several languages and has a concept base organised on nine levels and containing 250 000 concepts with ten million cross-references. XLT technology of Xerox/Xsoft in Palo Alto and Intelliscope of Inso are also linguistic-based products which carry out open text analysis in natural language including segmentation, the reduction of words to their base forms, the identification of major categories and the identification of noun phrases. The Lingsoft group in Finland has developed search and extraction tools for Finnish based on morphological and syntactic descriptions by using the Kimmo algorithm (Koskenniemi, 1990), rather than probabilistic models. The Uplift group in Utrecht has shown, through its Dutch text database and by using a Porter algorithm, that the stemming technique, which associates different morphological variants to their root forms (Frakes and Baeza-Yates, 1992), enables reaching a higher rate of accuracy than affix detachment techniques which are not based on morphological analysis.
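To make the idea of stemming at indexing and query time concrete, here is a minimal Python sketch. It is illustrative only: the stem table is a hypothetical stand-in for a real morphological analyser (it is not the Porter or Kimmo algorithm), and the documents are invented toy data.

```python
from collections import defaultdict

# Hypothetical stem table standing in for a real morphological analyser.
STEMS = {"parsing": "parse", "parsed": "parse", "parses": "parse"}


def stem(word):
    """Map a word to its assumed root form; unknown words are left unchanged."""
    w = word.lower()
    return STEMS.get(w, w)


def build_index(docs):
    """Inverted index from stems to the ids of the documents containing them."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[stem(word)].add(doc_id)
    return index


# Invented toy documents.
docs = {
    1: "parsing French syntax",
    2: "the analyser parses words",
    3: "phonology of Greek",
}
index = build_index(docs)

# The query is stemmed the same way, so one variant retrieves the others.
print(sorted(index[stem("parsed")]))  # [1, 2]
```

Because the index and the query share the same root form, a query on one morphological variant also retrieves documents containing the other variants, which an index over raw word forms would miss.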
In Canada, several Ontario companies and laboratories have supported the development of information retrieval research at the University of Waterloo. One of the best-known results of this work is undoubtedly OpenText, which is currently available on the Internet. This robot’s search parameters are boolean operators and proximity-adjacency. The ranking of accuracy is carried out by the "Weighted Search" method, which imposes an ordering on unlimited search results. In Québec, Solutions Internet Technologics (SIT) is financing a research project to implement an Internet version of a concept search programme, while Téléuniversité is working on the introduction of a programme called FXS for Internet and Intranet networks. It is worth noting other Québec products and companies working in the field, such as Machina Sapiens, Ardilog, Grafnetix and Cederom-Sni, among others.
The partners we have chosen in Montréal and abroad share the goal of developing search tools based on natural language processing. They will accompany us in our research and facilitate the development of a more accurate tool. Although our partnerships are just beginning, since there are still stages of research to complete before developing and marketing our anticipated application, their presence throughout this work will help to accelerate its development and make its results useful to society.
1.2.1 Importance of Asymmetry in Theoretical Linguistics. The current problem for linguistic theory is to formulate a model which characterises Universal Grammar and its diverse manifestations. The work on asymmetry has marked decisive steps towards the resolution of this problem. If phonological asymmetry exists for the same reason as syntactic and morphological asymmetries, and if, moreover, they are the result of independent grammatical properties, the question that comes immediately to mind is whether asymmetry, as defined above, is a fundamental grammatical relation. A positive answer to this question would notably make it possible to avoid the problems linked to theories based on symmetrical relations, like X-bar Theory (Chomsky, 1970), which imposes an identical base form on all syntactic projections. Such theories, including also Binding Theory, the Theory of Control and the Theory of Government, have reached their limit (Chomsky, 1995), in that they cannot encompass facts which escape the generalisations that they impose and, in some cases, they may lead to syntactic representations that are not empirically motivated.
The Originality of the Theoretical Linguistics Aspect of this research is to treat the asymmetries of natural languages not as the effect of heterogeneous principles, as is still the case in current theory, but actually as deriving from the basic asymmetry of grammar. In other words, the presence of asymmetry in grammar is not haphazard, but is an effect of the very structure of the language faculty, whose initial state is unique and is a part of human genetic inheritance. This part of the research will shed new light on the nature of language and thought.
The Contribution of our Research to Theoretical Linguistics. Our model will allow for a unified analysis of phenomena that have not been analysed in terms of asymmetry, such as, in syntax, argument dependency, reflexivity, predication and co-ordination, which have often been treated on the basis of symmetrical relations, such as symmetrical c-command and the notion of co-arguments (Williams, 1997; Higginbotham, 1997). Our model will also allow for a unified treatment of morpho-syntactic properties, as in the case of the distribution of affixes to the right or to the left of roots and heads (Di Sciullo, 1997a; Roeper and Keyser, 1997) for morphologically concatenating languages, like French or English, as well as for the compositional properties in Semitic morpho-phonology between consonantal and template root positions (Prunet and Petros, 1996; Guerssel and Lowenstamm, 1997). In addition, it will allow us to uniformly capture asymmetries between segments that influence the phonological processes in which they participate (Dresher and Hulst, 1995, 1997; Hulst, 1996; Rice, 1992, 1995; Rice and Avery, 1993, 1995).
Our model will enable us to account for the fundamental similarities between languages, despite their apparent differences, thereby assuring greater understanding of the properties of Universal Grammar and their various manifestations. Furthermore, our model will achieve uniformity and simplicity, since each part of its architecture will be defined in terms of asymmetrical relations. Moreover, it will achieve the completeness necessary for its computational implementation.
1.2.2 Importance of Asymmetry in Computational Linguistics. The importance of asymmetry in computational linguistics is evident, given the central role played by asymmetrical c-command in principle-based parsers. However, the current problem in computational linguistics is to formulate a model that can process linguistic expressions efficiently and quickly. From our perspective, the use of a simplified model, based on the generation and recovery of local asymmetrical relations, constitutes the first step towards resolving the problems of over-generation and speed.
The current problem in information retrieval and extraction is to improve recall (the ratio of the number of pertinent retrieved articles to the whole set of pertinent articles) while maintaining a high level of precision. Information retrieval based on natural language processing methods offers a new solution to this problem, since it is now known that retrieval based solely on statistical techniques has reached its limit. It has notably been shown that the use of a stemming algorithm during data search indexing and query analysis provides a part of the solution to the search tool efficiency problem (Frakes and Baeza-Yates, 1992; Krovetz, 1993; Popovic and Willett, 1992; Kraaij and Pohlmann, 1995; Pohlmann and Kraaij, 1997). Another part of the solution to the current problem is the indexing of nominal expressions, which helps to delimit the meaning of the data to be retrieved. These are reasons to believe that the integration of asymmetrical relations into search tools will increase their precision.
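The two measures at stake can be made concrete with a short Python sketch. Recall follows the definition given above; precision is not defined in the text, and we assume its usual meaning, the ratio of pertinent retrieved articles to all retrieved articles. The function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of pertinent articles.

    Assumption: precision = pertinent retrieved / all retrieved (usual definition);
    recall = pertinent retrieved / all pertinent. Empty denominators yield 0.0.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented toy data: four pertinent articles, three retrieved, two of them pertinent.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

precision, recall = precision_recall(retrieved, relevant)
print(round(precision, 2), recall)  # 0.67 0.5
```

Retrieving more articles can raise recall while lowering precision, which is why the problem is stated as improving one while maintaining the other.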
The originality of the Computational Aspect of this research is the integration of the asymmetrically-based grammar into the computational model and into a data search tool. Asymmetrical relations will guide the processing of linguistic expressions and the information they bear.
The originality of this aspect of the project resides in the use of computational operations oriented by natural language asymmetries rather than ad hoc heuristics. This will lead to major advancements in the engineering and optimisation of processing systems. These programmes will have the following advantages: the reduction of singular principles to a single asymmetrical concept will simplify the architecture of the system; the relation of asymmetrical concepts to the types of objects generated will allow for the decrease of over-generation; finally, asymmetrical recognition processing will increase the speed of analysis. Moreover, the technological application of our work will improve the precision of a search and retrieval tool.
Our contribution to computational linguistics will consist in formulating a computational model based on the recovery of local asymmetrical relations. The advantages of this type of model are as follows: system architecture is simplified and the characteristic over-generation of heterogeneous principle-based systems is decreased. Asymmetry-based parsing will increase the speed of processing. Our contribution will also provide search engines with the necessary theoretical and linguistic tools to improve their performance. The integration of asymmetry-based morpho-syntactic analysers during indexing will increase the quality of extraction and retrieval engines based on stochastic methods.
1.2.3 Contribution to the Evolution of Social Practices. Our project will help advance our knowledge in the humanities and contribute to the evolution of social practices. The competence model that we are developing will offer supplementary justifications for the unique character of the human species. The language faculty’s asymmetrical properties are a part of all languages, regardless of the ethnic or geographic origin of the people who speak them or their historical evolution. This means that despite language diversity, there are regular phenomena that depend on asymmetrical relations of the language faculty, which shed new light on the links unifying language, grammar and thought. The performance model we are developing will mirror our cognitive capacity for learning, producing and understanding linguistic expressions. This model will provide the fundamental substratum for subsequent research on the properties of intelligence and mental computation, aiming to verify the psychological reality of asymmetry and to determine whether asymmetry is a property exclusive to the language faculty or a property of the cognitive system and its various conceptual, analytical, inferential, aesthetic and perceptual (auditory, visual, olfactory and tactile) ramifications. Finally, the integration of fundamental and computational linguistic results will offer a realistic possibility of weaving links between scientific and social practices. The development of a more precise language search tool will help reduce the inadequacies of current search engines on the Web.
1.2.4 Integration of Interdisciplinary or Multidisciplinary Perspective. It is important to note the added scientific importance of this Major Collaborative Research Project. The co-operation of specialised researchers from various theoretical and computational linguistic fields will give the necessary multidisciplinary and interdisciplinary perspectives for the completion of this project, which aims at formulating a complete linguistic model based on asymmetry. The co-operation of computational linguistics researchers working in sub-fields of this discipline will help construct a language processing model that recovers the asymmetries. This will make the formulation of new models possible. These new models will not merely be a collection of individual knowledge but new knowledge. Our project will help advance concerted research methods, in that theoretical linguistics researchers and electrical engineering specialists, as well as computer scientists and analysts, will interact in formulating hypotheses and in verifying and identifying their predictions and consequences. The perspectives of both research fields are complementary and will support one another.
1.2.5 Link and Pertinence of our On-Going Work. This Collaborative Research Project on asymmetry is a continuation of the work we carried out in the scope of our CRSH major research grant (1992-1997). This work allowed us to make precise the architecture of the grammar and has led us to attribute a central role to elementary asymmetrical relations in the model. We have established the existence of asymmetry of form between various kinds of linguistic expressions. These formal asymmetries are characteristic of structures interpreted by the performance systems, which we called "canonical target configurations". These constitute the necessary support for interpreting lexical and syntactic expressions. We have already shown that our hypotheses achieve a vaster empirical coverage than competing hypotheses (cf. Di Sciullo, 1996 a,b, 1997b). We have established that asymmetry determines the categorial nature of compound words (Di Sciullo, 1993-1996; Di Sciullo and Klipple, 1994; Di Sciullo and Tremblay, 1996) as well as the semantic role of constituents, i.e. possessor/possessee, source/target, agent/patient, whether for rigid word order languages (Di Sciullo and Gruber, 1994) or for fairly flexible word order languages such as Modern Greek (Di Sciullo and Ralli, 1995). We have also shown that asymmetrical relations determine anaphoric relations within complex lexical expressions (Di Sciullo, 1997d). We would now like to define asymmetry formally within the overall scope of grammar.
The asymmetry project is also related to our work on Interfaces, funded by FCAR (1995-1998). It thus constitutes the development of the theory of variation that we proposed and from which we have considered a number of consequences for the linear ordering of constituents in Romance, Germanic and Hellenic languages (Di Sciullo and Ralli, 1995; Canac-Marquis and Tremblay, 1997; Di Sciullo 1997c). We have provided evidence, based on structural analyses of verbs in French, Italian, English and Modern Greek, that asymmetrical relations guarantee optimal interface processing by the performance systems and are kept constant in variation between languages and dialects (Di Sciullo, 1997c; Déchaine and Tremblay, in press). We would now like to extend our hypotheses to a greater set of data.
This Major Collaborative Research Project is also related to our work on the Interaction of the Theory of Grammar and Parsing Theory (FCAR, 1984-1987) and on Argument Structure (SSHRC, 1986-1992). The results of those projects helped us to establish the base for the computational implementation of our theory. Our past work allowed us to formulate two prototypes for morphological and syntactic analysis. The first, MORPHO-PARSE, takes complex words, like propagation, as input and gives as output a representation of the word identifying its derived category, argument structure and other semantic properties which must be satisfied in phrasal expressions, as in: la propagation des traits par le processeur. The analyser is deterministic, in Marcus’ sense (1980); the analysis is bottom-up and is carried out from left to right. The analyser’s structure includes the morphological theory developed by Di Sciullo and Williams (1987). The second analyser, SYN-PARSE, is a parser for a fragment of French syntax and integrates the theory of argument structure that we proposed (Di Sciullo, 1990). We are now working on the formulation of a computational model oriented by asymmetry and on its integration into information retrieval and extraction systems.
1.3.1 Fundamental Linguistics. Our theoretical framework is a part of the Chomskian tradition in linguistics and is formulated within the Minimalist Program (Chomsky, 1995). We present a unified model of grammar, including a single computational space, where the modularity resides in the relativisation of the Interpretability Condition to the interface properties (Di Sciullo, 1997b). The sub-principles of GB theory (Chomsky, 1981) related to interpretation, such as Theta Theory, Binding Theory and Control Theory, are subsumed under the Interpretation Under Asymmetry Condition, defined on the basis of asymmetrical relations. In our model, the derivation of linguistic objects (words, phrases, sentences) proceeds simultaneously in the morpho-syntactic and morpho-phonological computational space (Di Sciullo, 1996b). Each derivation must satisfy the composition principles specific to the components (morphology, syntax, phonology) which generate it, thereby subsuming the conditions on derivations (Chomsky, 1995), and is governed by the conceptual necessity to obtain canonical target configurations at the interfaces between the grammar and the performance systems (conceptual and acoustic). Linguistic variation is reduced to morphological variation, given the differences in marked features between particular grammars (Di Sciullo, 1997c).
(1) [Figure: architecture of the model. Labels: Computational Space; Syntax, Morphology, Phonology; Interfaces (Conceptual, Acoustic); Interpretability Condition.]
Our model is a natural development of the Principles and Parameters framework (Chomsky, 1979), which in the early 1980s eliminated notions such as "construction" and "rules" in favour of the notion of ‘interactive principles’ applying in the derivation of linguistic expressions. We propose to move on to a subsequent stage of explanation, where singular principles are subsumed under the Interpretability Condition.
Our specific hypotheses are the following:
(2)
a. Asymmetrical Definition of Features
b. Configurational Asymmetry
c. Configurational Interpretability
According to our first hypothesis, Asymmetrical Definition of Features, the features of the grammar are defined on the basis of asymmetry. Consequently, grammatical categories will not only be defined in terms of features (Chomsky 1970, 1995), but also in terms of configurations (Hale, 1995; Di Sciullo 1997e). This allows us to cover a larger set of facts than theories of categories based on either features only or configurations only. Furthermore, this leads to a uniform definition of the feature systems of the grammar, including syntactic, lexical, functional, case and phi features, argument features, as well as semantic and phonological features. This hypothesis also leads us to consider that under-specified and marked features are defined under asymmetry as well. A theoretical consequence of our hypothesis is that it helps to limit the set of grammatical features, since features not defined under an asymmetrical relation are excluded from this set.
Our second hypothesis, Configurational Asymmetry, limits the form of the configurations generated by the grammar. Only configurations whose elements are in asymmetrical relation are admissible by Universal Grammar. It leads us to take the configurations in (3) as the base on which asymmetry is established. The specifier-head configuration is the target configuration for feature agreement, the head-complement configuration is the target configuration for feature selection, and the adjunct-head configuration is the target configuration for feature identification.
(3)
a. [Spec X] (specifier-head configuration)
b. [X Compl] (head-complement configuration)
c. [Adjunct X] (adjunct-head configuration)
The feature agreement, selection and identification are asymmetrical relations in the sense we specified above. One consequence of this hypothesis is that allow for a more accurate formulation of the interface Interpretability Condition. The linear order properties of the elements of linguistic expressions at the phonetic interface (Phonetic Form (PF)) will de derived from the local asymmetrical relations; this is also the case for the scope of quantifiers, variable linking and argument saturation at the conceptual interface (Logical Form (LF)).
Our third hypothesis, Configurational Interpretability, offers a systematic approach to the processing of the interfaces by the performance systems. According to this hypothesis, only certain configurations of features are interpreted at the interfaces. This hypothesis is promising in that it attributes to the performance systems constraints which are not specific to the grammar. In this perspective, we would like to consider the hypothesis that the performance systems, seen as processing systems, reduce the complexity of the interfaces while keeping universal asymmetrical relations constant. A consequence of this hypothesis is that some derived configurations will not be interpreted by the performance systems. The conceptual system will only interpret Spec-head-complement structures for clausal interpretation (truth value), adjunct-head structures for morphological interpretation (conceptualisation) and complement structures for phonological interpretation (acoustic perception). Thus, we derive our previous results, for instance that the Spec-head-complement projections which are part of the derivation of complex words are not interpreted conceptually at the interfaces (Di Sciullo, 1995) and that only "canonical target configurations" are interpreted at the interfaces (Di Sciullo, 1996). Another consequence of our hypothesis is that cases of ungrammaticality and of incorrect analysis of the "garden path" type, e.g. the horse raced past the barn fell, are reduced to cases where expected asymmetrical relations are not obtained, and cases of multiple analysis (lexical ambiguity, multiple attachment, and structure paradoxes, e.g. [[atomic] [scientist]]/[[atomic scient] ist]) are reduced to cases where more than one expected asymmetrical relation between constituents obtains.
1.3.2. Computational Linguistics. Our approach to computational linguistics is compatible with the principle-based computational model (Berwick and Weinberg, 1984; Berwick, 1991) and particularly with its recent developments (Berwick, 1995).
Notwithstanding the superiority of principle-based parsing systems (Barton, 1984; Berwick, 1987, 1991), two main problems remain with these systems as they were developed in the 1980s and early 1990s, namely over-generation and slow parsing. Over-generation is a consequence of the heterogeneity of GB principles (Chomsky, 1981-1986), as well as of their application to several levels of representation, e.g. D-structure (deep) and S-structure (surface), such that no single principle was capable of sufficiently constraining the ultimate syntactic structure of a phrase under analysis. Fong (1991) showed that hundreds of possible X-bar structures exist for the analysis of simple phrases. Furthermore, the number of structures increases exponentially under the Theory of Movement. The slowness of parsing is a consequence of over-generation, since the deduction chains linking the base principles to the surface forms are very long.
Over-generation and speed problems can be significantly reduced by a system that incorporates the grammar model (1) above, where there is only one interface Interpretability Condition, and where only terms in asymmetrical relations are accepted by the parser. The specific hypotheses that we present in this section of the asymmetry project are the following.
(4)
b. Processing by Asymmetry
c. Incremental Procedure
The system processes linguistic information incrementally.
According to our first hypothesis, the computational system universally integrates the asymmetrical properties of each component of the grammar and the Interpretability Condition based on elementary asymmetry. The system also integrates that which is particular to specific grammars, namely a lexicon and parameters of variation. The lexicon is a structured set of lexical entries, each including a complete list of the features that cannot be derived from independent properties of the grammar. The system integrates the parameters of variation in the form of a structured set of designated features. A consequence of our hypothesis is that it maintains a maximally simple relation between competence and performance, while ensuring robust empirical coverage of linguistic variation.
According to our second hypothesis, the operations of the computational system are oriented by the recovery of asymmetrical relations, whether they are operations of categorisation, operations of right or left attachment of constituents, or operations specifying dependencies. A consequence of this hypothesis is that the Interpretability Condition replaces ad hoc heuristics in the computation. Thus, for syntactic analysis, the computational system will include category and projection licensing rules based on the recognition of elementary asymmetrical relations. For example, in the determiner system, the French determiner la has the formal features [+D, +N, -V; +fem, +sg] and projects a category [+D, +N; +fem, +sg], while the noun physique has the formal features [+N, -V; +fem, +sg] and projects a category [+N; +fem, +sg], assuming that only the + features are interpretable at a given interface and are therefore the only ones that may project to a higher category level. Maximal categories will therefore be defined by asymmetry of positions as well as by asymmetry of features. The attachment of the determiner la to the noun physique will be done under feature asymmetry, given that the asymmetrical relation of agreement (D, N) is unidirectional and goes from the determiner la, which includes the [+sg] feature, to the noun physique, which is specified for the [+sg] feature. The system will attach the maximal D projection and the maximal N projection by the operation Merge. The category formed is +D since, by definition, Merge creates a new category, contrary to the operation of adjunction, and, in our system, only the +D feature is a new category feature in the case at hand, and it is interpretable since it is +. This constitutes an implementation of the asymmetry specific to Merge. Thus, the Interpretability Condition, based on the recovery of asymmetrical relations, legitimises the attachments; this will also be the case for the linking of dependencies.
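The la physique example can be made concrete with a small sketch. The Python below is illustrative only and is not the project's implementation: the names Item, project and merge are hypothetical, agreement is simplified to the sharing of phi features, and the label of the merged category is assumed to be the set of + categorial features of the left item that the right item lacks.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A lexical item with signed categorial features and phi features."""
    form: str
    category: dict   # e.g. {"D": True, "N": True, "V": False}
    phi: frozenset   # e.g. frozenset({"fem", "sg"})


def project(item):
    """Project only the '+' categorial features, together with the phi features."""
    plus = frozenset(f for f, value in item.category.items() if value)
    return plus, item.phi


def merge(left, right):
    """Attach left to right; return the new category, or None if not licensed.

    Simplifying assumptions (not from the source text):
    - agreement holds when every phi feature of `left` is also on `right`;
    - Merge must contribute a new '+' categorial feature, otherwise it fails.
    """
    left_cat, left_phi = project(left)
    right_cat, right_phi = project(right)
    if not left_phi <= right_phi:
        return None                  # agreement fails
    new_features = left_cat - right_cat
    if not new_features:
        return None                  # nothing new: not an instance of Merge here
    return new_features, left_phi


LA = Item("la", {"D": True, "N": True, "V": False}, frozenset({"fem", "sg"}))
PHYSIQUE = Item("physique", {"N": True, "V": False}, frozenset({"fem", "sg"}))

print(merge(LA, PHYSIQUE))   # a category labelled by D, with fem and sg
print(merge(PHYSIQUE, LA))   # None: the relation does not hold in reverse
```

In this sketch the asymmetry shows up as a difference between the two argument orders: merging la with physique yields a +D category, whereas merging physique with la contributes no new categorial feature and is rejected.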
According to our third hypothesis, the structure is parsed incrementally, as the Interpretability Condition is satisfied. A consequence of this hypothesis is that the class of analysers defined by the computational model we envision obtains maximal information from the linear string. If a word is to the left of another in a sequence, then either it is in an asymmetrical relation with this word, and the parsed constituent is accepted, or it is not, and the two words do not form a constituent. This provides a deterministic algorithm for the analyser. The absence of backtracking ensures the efficiency of the algorithm. The recovery of feature asymmetry also helps to resolve several cases of categorial ambiguity. For example, the form la is a definite article in the expression [la physique] and a clitic pronoun in the expression [Elle veut la connaître]. The form has the +D feature in both cases; however, the analyser creates a Spec-head structure in the case of [la physique] and an adjunct-head structure in the case of [la connaître]. In the latter case, verbal features are projected because the merger of la with the verb connaître does not create a new category but allows the verb to satisfy its internal argument. The adjunct structure is created as soon as the Interpretability Condition relative to argument saturation is satisfied. This increases the speed of analysis, since categorisation, attachment and dependencies are created as the Interpretability Condition is satisfied. This contrasts with the generate-and-test strategy, in which a structure cannot be accepted until several successive stages of verification against heterogeneous principles have been completed.
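The incremental procedure described above can be sketched as a single left-to-right pass in which each attachment decision is final. This is a deliberately reduced illustration, not the analyser proposed here: the lexicon, the feature names (including the hypothetical "int_arg" flag for a verb with an unsaturated internal argument) and the functions relate and parse are all assumptions made for the example.

```python
# Hypothetical mini-lexicon: each form is mapped to a set of features.
LEXICON = {
    "elle": {"D"},
    "veut": {"V"},
    "la": {"D"},
    "physique": {"N"},
    "connaître": {"V", "int_arg"},   # verb with an open internal argument
}


def relate(left, right):
    """Return the configuration licensed for the ordered pair, or None."""
    l, r = LEXICON[left], LEXICON[right]
    if "D" in l and "N" in r:
        return "spec-head"      # determiner + noun, as in [la physique]
    if "D" in l and "V" in r and "int_arg" in r:
        return "adjunct-head"   # clitic saturating the verb's internal argument
    return None


def parse(words):
    """Scan once from left to right; never revisit an earlier decision."""
    constituents = []
    i = 0
    while i < len(words):
        if i + 1 < len(words):
            config = relate(words[i], words[i + 1])
            if config is not None:
                # The pair is in an asymmetrical relation: accept it now.
                constituents.append((config, words[i], words[i + 1]))
                i += 2
                continue
        # No relation with the next word: the two do not form a constituent.
        constituents.append(("word", words[i]))
        i += 1
    return constituents


print(parse(["la", "physique"]))
print(parse(["elle", "veut", "la", "connaître"]))
```

The same form la ends up in a Spec-head structure before a noun and in an adjunct-head structure before a verb, and the analyser never backtracks. A fuller system would of course also have to relate the resulting constituents to one another.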
We therefore expect to construct natural language processing systems which directly integrate the asymmetrical properties of grammar for categorisation, attachment and dependencies, and which will produce outputs rapidly, as is the case for the "human processor", who incrementally understands the expressions to which he or she is exposed, interpreting them quickly as their constituent parts are heard or read.
1.3.3. Information Retrieval and Extraction. The approach that we take with respect to Information Retrieval and Extraction is based on natural language processing. According to this approach, retrieval calls upon natural language morpho-syntactic knowledge (Savoy, 1993; Arampatzis, Tsaris and Koster, 1997; Pohlmann and Kraaij, 1997).
It is possible to optimise search and retrieval engines by applying natural language analyses to queries, particularly morpho-syntactic analysis and term expansion, which provide the information needed to construct nominal structures and reduce ambiguity.
The Kimmo algorithm, a language-independent method of morphological analysis, is well adapted to morpho-syntactic processing based on asymmetry and can be used in this context to optimise search and retrieval. This is what we propose to do. A notable advantage of Kimmo is that it is the only formal morphological processing algorithm; all other available algorithms are purely ad hoc, as is the case of the Porter algorithm, for instance. The use of the Kimmo algorithm for stemming and of asymmetry-based analysers for query expansion will help minimise the size of indices and maximise search accuracy.
Asymmetry-based grammar will also help increase retrieval quality, be it i) at engine input, during indexing, or ii) in search criteria, during search engine access. The first of these two fields, that of stemming algorithms, standardises each word to its masculine singular form for French and its singular form for English. Generally, these algorithms are ad hoc and offer mediocre results. However, with versions 1.x and 2.x of the Kimmo algorithm associated with a specialised grammar, the success rate of word transformation to its standardised form is nearly 100%. We propose to define a grammar based on morphological feature asymmetries, which optimises standardisation during indexing and provides derivatives of standard forms during search engine access.
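The two points of intervention, standardisation at indexing time and expansion at query time, can be illustrated with a toy example. The sketch below does not implement the Kimmo algorithm or an asymmetry-based grammar; it replaces both with a hypothetical hand-written table (FORMS) from surface forms to standard forms, and all function names are assumptions made for the illustration.

```python
from collections import defaultdict

# Hypothetical table standing in for a morphological analyser:
# surface form -> standard form (singular; masculine singular for French).
FORMS = {
    "grammar": "grammar", "grammars": "grammar",
    "parser": "parser", "parsers": "parser",
    "petit": "petit", "petite": "petit", "petits": "petit", "petites": "petit",
}


def normalise(word):
    """Map a word to its standard form; unknown words are kept unchanged."""
    w = word.lower()
    return FORMS.get(w, w)


def build_index(docs):
    """Index documents under standard forms, one entry per standard form."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalise(word)].add(doc_id)
    return index


def expand(word):
    """Return the standard form of a word together with its known variants."""
    standard = normalise(word)
    return sorted({f for f, s in FORMS.items() if s == standard} | {standard})


def search(index, query):
    """Return the documents containing every query term, after normalisation."""
    hits = [index.get(normalise(w), set()) for w in query.split()]
    return set.intersection(*hits) if hits else set()


DOCS = {1: "Parsers use grammars", 2: "a petite grammar", 3: "the parser"}
INDEX = build_index(DOCS)

print(search(INDEX, "grammar parsers"))
print(expand("petites"))
```

Because inflected variants share a single index entry, the index stays small, and a query matches documents regardless of which variant they contain.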
We will verify our hypotheses on the basis of the properties of a great variety of languages, and in co-operation with the team members identified hereunder.
(5) Fundamental Linguistics Group:
R. Canac Marquis, Simon Fraser U; Y. Roberge, Toronto U; M. Guerssel, UQAM; M. Hale, Concordia U
Collaborators: K. Hale, MIT; J. Higginbotham, Oxford U; E. Williams, Princeton U; Moro, San Raffaele U; T. Roeper, UMASS;
M. Ambar, Lisbon U; Zribi-Hertz and J. Guéron, Paris VII U; M. Rivero, U Ottawa; A. Ralli, U Patras.
Computational Linguistics Group:
Co-researchers: A.M. Di Sciullo, UQAM; H. Mili, UQAM; P. Gabrini, UQAM; Y. Nie, U de Montréal
Collaborators: R. Berwick, MIT; S. Fong, NEC Princeton; A. Weinberg, Maryland U; E. Wehrli, Geneva U;
C. Yang, MIT; D. Gafos, CUNY; H. van der Hulst, University of Connecticut; A. Fassi Fehri, Rabat U.