This program is part of a major collaborative research project (MCRP) on natural language processing, on which the theory of the asymmetry of grammar was based. The theory of asymmetry targets the properties of linguistic representations that are interpretable by the performance systems (Di Sciullo, 1999, in press). We motivated the theory of parsing by the recovery of local asymmetrical relations. We developed a prototype morphological analyzer, which is a transparent implementation of the theory of morphological asymmetry (Di Sciullo, 1995). This prototype automatically analyzes the argument structures of complex lexical expressions (Di Sciullo and Fong, 1999, demonstrator available). We also developed an aspectual-conceptual parser for complex morphological expressions, which is integrated into an information processing system as a specialized module (Di Sciullo 1998, demonstrator available). We are currently working to extend the empirical coverage of our theory, the robustness of the prototypes and their integration into a search engine. The research program on natural language processing will lead to further structuring and expansion of our current network. Our discussion is, however, broader and more complex than what is currently being debated in the MCRP on asymmetry, which essentially focuses on some fundamental grammatical properties of languages (notably asymmetry and other parameters of variation). The MCRP emphasizes the central role of linguistics in the optimization of data processing. By widening our field of research (fundamental and computational linguistics), we should be able to optimize the processing of all natural languages as a whole. We also want to broaden our field by validating the psycholinguistic reality of asymmetry, as well as the biolinguistic properties of languages. Our model will thus be independently justified, and our technology will be capable of simulating the processing of languages by humans.
This project is a unique opportunity to develop, in Quebec, work of excellence at the very forefront of linguistic theory, computational linguistics, and biolinguistics. Within a few years, fast and spectacular developments have marked the evolution of the information and communication industries, in both technical tools and software applications. The Internet is an eloquent illustration. Further generations of tools and products are already in preparation and will eventually reach the market. Our work will be ready at the right time to meet the demands of both the public and the industry. The results of our research will have applications in the following fields:
1— PROBLEMATICS AND OBJECTIVES OF THE PROGRAM
Our general objective in this project on natural language processing is to establish a complete model of the asymmetries specific to the grammars of natural languages and of their processing by the performance systems. Our program comprises three aspects: fundamental, psycholinguistic and computational, integrated in a common framework, that of modularity. The fundamental assumptions will be tested in experiments and integrated in computational systems. In turn, the results on the psychological reality of asymmetry and on automatic processing will make it possible to refine the theory.

Enhancement of knowledge
Our discoveries led us to identify a basic property of languages without which communication is impossible. This property is asymmetry, which is a unidirectional relation between pairs of elements of linguistic expressions. The ability to process the asymmetries of languages automatically is tantamount to adding more knowledge to information technology. The results of our research open the way for a new generation of technologies for the automatic understanding of languages, leading to a variety of applications useful to society by easing electronic communication, whatever the language may be. We are contributing to the enhancement of knowledge in the processing of natural languages, specifically on the properties of universal grammar and the parameters of variation (Chomsky, 1999; Hale, 1998; Kayne, 1994). The model of grammar that we propose derives unidirectional properties of languages, such as linear order, dependencies between constituents and grammatical agreement, from a small number of basic asymmetrical relations (Di Sciullo, 1996, 1997, 1999). These relations cover the form of the expressions generated by the grammar, be they syntactic, morphological, phonological or semantic.
The computational model that we are developing integrates the grammar based on asymmetry and yields optimal processing, approaching the speed at which human beings process language. The computational model and its technological applications are carried out in conjunction with specialists from MIT and NEC in Princeton (Di Sciullo and Fong, 1999), who work at the cutting edge of the field, with the support of local partnerships and international associates. The model of performance is the image of our cognitive capacity to learn, produce and understand linguistic expressions naturally. This model is developed in conjunction with specialists in psycholinguistics from the Université de Montréal, McGill University and the Max Planck Institute in Holland (Di Sciullo and Matsuo, 1999; Di Sciullo and Roeper, 2000).
In the psycholinguistic aspect of the project, we are chiefly interested in establishing the psychological reality of our hypotheses on the asymmetry of universal grammar. It is crucial for us to have access to a cognitive processing system in order to verify our hypotheses on asymmetry against experimental data on the development, comprehension and production of natural languages in real time. Many recent works on the comprehension of natural languages present independent evidence supporting our hypothesis (Petitto, in press; Carroll, 1999; Pinker, 1998; Harcourt and Brace, 1993). Other works assume that the cognitive system includes principles for the processing (parsing) of structures and decision principles for cases of indeterminacy. All these studies point in the same direction, namely the validation of the hypothesis that the cognitive system processes linguistic expressions in terms of abstract asymmetries.
Application of knowledge
The application of our theories to the automatic processing of content conveyed by natural languages comes at the right time. It is known that the performance of current search engines and automatic translation systems is far from optimal. We also know that retrieval based solely on statistical techniques has reached its limits (Frakes and Baeza-Yates, 1992). For example, information retrieval tools whose methodology is based on the properties of natural languages are more powerful and easier to use than those based on Boolean operators and on other functions that locate and recognize character strings. In fact, integrating modules driven by the asymmetric properties of the grammars of natural languages can optimize any system that processes data conveyed by language, since such modules handle the specific properties of the language.
Complementarity of methodologies
The specific methodologies for each aspect of our project are the following. The methodology for fundamental linguistics consists of formulating, from the observation and analysis of linguistic elements, hypotheses (categories and abstract grammatical relations) whose predictions explain more phenomena than the existing hypotheses. The common methodological approach in computational linguistics mainly consists of translating a set of principles and parameters into a formal grammar, which is used by linguistic algorithms to build a representation of natural language expressions. As for technological applications, the methodology consists of adapting algorithms of movement for the development of specific products (generation of texts, analysis and generation of conversations, email, electronic commerce, etc.). The psycholinguistic methodology consists of designing tests that verify the validity of linguistic hypotheses, administering them to subjects and interpreting the results. These methodologies are complementary: the computational linguistic methodology ensures the complete formalization of linguistic theory; technological applications make the theory operational; and the psycholinguistic methodology supplies external support. The computational linguistic methodology, the psycholinguistic methodology and the technological applications all rest on advances in fundamental linguistics.
Integration of interdisciplinary or multidisciplinary perspectives
It is important to underline the added scientific value stemming from the research on asymmetry. The cooperation of specialized researchers from the fundamental, computational and psycholinguistic fields will contribute the multidisciplinary and interdisciplinary perspectives needed for the completion of this project. The perspectives of these three research fields are complementary and will support one another. The integration of fundamental, computational and psycholinguistic results into products developed by the language industries in Quebec (Delphes Technologies, Alis, Bell, Locus, etc.) will offer a realistic possibility of linking scientific and social practices. The development of technological applications capable of processing the content of natural language expressions more precisely and effectively will make up for the inadequacies of information technologies that do not process content in a systematic way.

The originality of the project
The originality of the theoretical linguistic aspect lies in the fact that we want to treat the asymmetries of natural languages not as the effect of heterogeneous principles, as is still the case in current theory, but as deriving from a basic asymmetry of grammar, which is realized in a more specific way in each module. In other words, the presence of asymmetry in grammar is not haphazard, but is part of human genetic inheritance. This aspect will shed light on the nature of language and thought. The originality of the computational aspect resides in the use of computational operations driven by natural language asymmetries rather than by ad hoc heuristics. This will lead to major advances in the engineering and optimization of processing systems.
These programs will have the following advantages: the reduction of singular principles to a central asymmetrical concept will simplify the architecture of the system; the relation of asymmetrical concepts to the types of objects generated will allow for the decrease of over-generation; finally, asymmetry-based parsing will increase the speed of analysis. The technological application of our work will improve the conceptual precision of search and retrieval tools.
The originality of the psycholinguistic aspect is the development of a model of performance. The model will be the image of our cognitive capacity to learn, produce and understand linguistic expressions naturally. This model will provide a foundation for studying the properties of intelligence and mental computation, with the aim of verifying the psychological reality of asymmetry. It will also determine whether asymmetry is an exclusive property of the language faculty or a property of the cognitive system and its various ramifications: conceptual, analytic, inferential, aesthetic, perceptual, auditory, visual, olfactory, and tactile.
2 — MODULES AND OBJECTIVES
Our objective is to refine the components of our grammatical model, develop a computational model and verify its psychological validity.

The lexical module will involve research on the computational modeling of the lexicon as a cognitive system for storing lexical items, and will also bring about the development of technological applications, including electronic dictionaries. This will increase our understanding of the logic of lexical representations and the mnemonic capacity of the language faculty. Lexical features form part of a single repertory, given that they are constantly called upon by the various modules of grammar; moreover, these features are crucial in linguistic variation. The modeling of the lexicon has real applications for everything that concerns the automatic processing of natural languages. The electronic dictionaries which currently exist do not adapt easily to other architectures because they contain too much or too little information. For example: (i) WordNet and EuroWordNet aim at treating the semantic information associated with lexical elements on the basis of categories; (ii) the sets of thematic roles, which are not sufficiently abstract to capture the conceptual dimension of linguistic expressions, can give rise to processing errors. For instance, the verb to weigh has a single thematic grid (x, y), but more than one aspectual value (activity or state), e.g., Max weighed the apples vs. Max already weighed 100 kg. We propose a more sophisticated lexical model which will readily recognize abstract categories, such as aspectual categories, which are generally neither visible nor audible in spoken or written linguistic expressions, but are nevertheless necessary for their interpretation.

The morphological module will involve research on the computational modeling of morphology as a cognitive system which processes sub-lexical symbols.
This will enhance our understanding of the logic of morphological features and the computational capacity of the language faculty. The explicit modeling of the morphological system has potential applications, especially for information retrieval systems which incorporate word identifiers (tokenizers), word segmenters (stemmers), and part-of-speech identifiers (POS taggers). Systems which carry out these operations only on the basis of statistical calculations cannot overcome the errors that arise from the economy of the morphological systems of languages. A given morpheme can have different values in the same immediate context: this is the case of -s in French, which is a mark of the plural (le-s) or a mark of tense (tu aime-s ça). By contrast, a richer model which recognizes these abstract morphological categories can give rise to a new generation of superior products for the automatic processing of derived lexical items. This concerns not only their analysis but also their generation, which includes the recognition and the generation of new words.

The syntactic module will involve research on the computational modeling of syntax as a cognitive system which processes syntactic constituents. This will contribute to our understanding of the type of ordering specific to higher-order constituents, and of the computational capacity of the language faculty, which can generate and understand an infinite number of sentences, most of which are novel. The explicit modeling of the syntactic system has applications in automatic translation, which, as is well known, cannot be reduced to word-for-word translation. This modeling has already shown its potential advantages in search engines which integrate certain facilities relating to the geometry of the syntactic component (supertagging).
Its contribution is critical to all multilingual processing, in which word order is one of the most conspicuous effects of the parametric variation between languages. The integration of modules capable of processing syntactic properties into the architecture of monolingual and multilingual processing systems is likely to optimize their performance. The recourse to abstract syntactic categories makes it possible to avoid the processing errors which usually affect natural language processing systems based solely on probability calculations. Handling the syntactic features associated with lexical items through grammatical operations makes it possible not only to generate but also to recognize the sentences of a language, while avoiding the errors associated with statistical processing, particularly processing based on typographical signals.

The semantic module will involve research on the computational modeling of semantics as a cognitive system which processes semantic symbols. This will contribute to our understanding of the logic of semantic features and the computational capacity of the language faculty. The explicit modeling of the semantic system has potential applications for the treatment of aphasia and other language disorders. Lingraphica, which has been applied to speech disorders, is an example of a program based on linguistic properties. This software was approved by the US Food and Drug Administration. The software is based on lattices of semantic categories. The well-known processing systems are formulated on the basis of a network of hierarchical semantic relations (IS-A) between features. Moreover, data processing systems are generally based on keyword scans and often generate errors in either search or extraction, since the postulate on which the processing is based is erroneous. The meaning of the expressions of which keywords are part is carried by abstract categories and configurations.
These abstract categories are generally discarded in both search and extraction systems because their frequency in texts, in the form of grammatical words (stop words), makes statistical processing difficult. However, grammatical words, or functional categories, carry the semantic relations specific to natural languages (e.g., she comes to/from Quebec); the interpretation of words is variable, since their inherent polysemy makes them liable to semantic type change (type shifting) (She speaks in/of North American French). We propose to develop the semantics of functional categories and to integrate it into existing technologies. Based on the semantics of functional categories, we will develop a system in which the interpretation of whole sentences can be derived compositionally from the interpretation of semantic relations.

The phonological module of the project will involve research on the computational modeling of phonology as a cognitive system which processes phonological symbols. This will contribute to our understanding of the organization of phonological features and the computational capacity of the language faculty. Given their relatively concrete nature in comparison with the features involved in morphological and syntactic computation, phonological features are relatively easier to study. This is the case, for example, of the asymmetry of phonological features and of underspecification. The explicit modeling of the phonological system has potential applications for the treatment of aphasia and other language disorders. We envisage the development of products which will extend technologies such as the Lingraphica program to the phonological domain for the treatment of speech disorders. It may not be obvious that symbol-based phonological models are a prerequisite for the development of speech analysis systems.
However, the statistical analysis used in speech-to-text and text-to-speech technologies cannot define models of variation based on abstract categories. Without the recognition of such categories, variation belongs to an undifferentiated continuum. Thus, we propose to incorporate existing synthesis and analysis technologies into a richer model which recognizes these abstract categories. A field in which this is especially crucial is the role of intonation in human-machine interactions. It is well known that intonation can change not only the status of the components of an utterance (e.g. Zev SEES Stan vs. Zev sees STAN, etc.), but also the truth value of propositions when intonation reflects attitudes such as sarcasm. It follows that an analysis whose truth value is compatible with the context but which does not take intonation into account can easily lead to errors in speech processing. This module can help technologies repair signals just as human beings do, most often unconsciously. Thus, our research program goes further than simply tracking word errors. We propose a complete approach to natural language processing which also incorporates a model of the interaction between phonology and syntax, and between phonology and semantics.

Important dates:
2000-2001 Research and conceptual development of the different modules of our natural language processing system
2001-2002 Research and multilingual development (universal/parameterized) for each module
2002-2003 Development of marketable products associated with each module; establishment of the Federation
2003-2004 Specialized developments in the fields of health, finance and education
3 —THE RESEARCH TEAM
This project will enable us to establish the best research team on natural language processing in Quebec, Canada and overseas. This team of researchers will be under the direction of Anna-Maria Di Sciullo (UQAM), who is internationally recognized in the fields of grammatical theory and computational linguistics. She has published several books, notably On the Definition of Word (MIT Press), and numerous articles in highly recognized linguistic journals. She has headed several research groups sponsored by FCAR and CRSH, including the Major Collaborative Research Project (MCRP) on asymmetry. The award of a $1.8M grant for the MCRP shows the high quality of her research projects and her capacity to coordinate multidisciplinary research groups at both the national and international levels. She was elected to the Royal Society of Canada this year, an added honor among others, including the Award of Excellence in Research bestowed upon her by the Board of Directors of the Université du Québec (see www.asymmetryproject.uqam.ca). The researchers who will work in this collaborative research project on natural languages are recognized internationally for their work in their respective research fields. Some members of this research team are also affiliated with other multidisciplinary research groups such as the MCRP on the Mental Lexicon (Université de Montréal), LOT (Holland) and the Max Planck Institute. Each mini-project includes a researcher from Quebec and prestigious external collaborators from MIT, University of Massachusetts, Princeton University, University of Delaware, LOT and the Research Centre in Linguistics Lucien Tesnière, France.
LEXICAL MODULE: Anna-Maria Di Sciullo (UQAM, known for her works on argument structure and aspectual-lexical structure); Yves Roberge (University of Toronto, known for his works on linguistic variation and lexical properties); Edwin Williams (Princeton University, known for his works on lexicon).

MORPHOLOGICAL MODULE: Anna-Maria Di Sciullo (UQAM, known for her works on morphology of natural languages); Tom Roeper (University of Massachusetts, Amherst and Max Planck Institute, known for his works in morphology and acquisition); Angela Ralli (University of Patras, known for her works in Modern Greek morphology); Edwin Williams (Princeton University, known for his works in the morphology of natural languages).

SYNTACTIC MODULE: Anna-Maria Di Sciullo (UQAM, known for her works in the syntax of Romance languages); Mark Hale (Concordia University, known for his works in historical syntax); Daniella Isac (UQAM, known for her works in Romanian syntax); Réjean Canac-Marquis (Simon Fraser University, known for his works in syntax-semantics); Edwin Williams (Princeton University, known for his works in syntax); Manuela Ambar (University of Lisbon, known for her works in Portuguese syntax); Jacqueline Guéron (Université de Paris III, Sorbonne-Nouvelle, known for her works in syntax).
SEMANTIC MODULE: Anna-Maria Di Sciullo (UQAM, known for her works on linking relations - operator-variable); Manuel Español-Echevarría (Université Laval, known for his works in syntax-semantics); Jim Higginbotham (University of Southern California, known for his works in non-lexical semantics); Eric Reuland (Utrecht University, director of LOT, known for his works on Linking theory).

PHONOLOGICAL MODULE: Charles Reiss (Concordia University, known for his works in the phonology of natural languages); Mark Hale (Concordia University, known for his works in the phonology of natural languages); Christopher Miller (UQAM, known for his works in the phonology of sign languages); William James Idsardi (University of Delaware, known for his works in phonology and computational linguistics).
Anna-Maria Di Sciullo (UQAM, known for her morphological analyzers which are driven by asymmetry); Philippe Gabrini (UQAM, known for his works in software engineering); Robert Berwick (Massachusetts Institute of Technology, known for the model of parsing driven by principles, holds the Guggenheim and Edgerton Faculty Award, MIT); Sandiway Fong (NEC, Princeton, known for his PAPPI system for syntactic parsing driven by principles); Eric Wehrli (Université de Genève, known for the elaboration of GB analyzers for French); Sylvaine Cardey (Université de Besançon, director of the Research Centre in Linguistics Lucien Tesnière, known for her works in linguistic data-processing).
Gonia Jarema (Université de Montréal, director of the Major Collaborative Research Project on Mental Lexicon); Laura Petitto (McGill University, known for her biolinguistic studies on language and the brain, winner of the Guggenheim Award); Celia Jakubowicz (Université de Paris V, known for her works in psycholinguistics); Tom Roeper (University of Massachusetts, Amherst and Max Planck Institute, known for his work in morphology and the acquisition of natural languages).

The expertise of the researchers assembled in the various sub-groups is complementary, and many of them are already working together in the same collaborative research projects. For instance, A.-M. Di Sciullo, E. Williams, J. Higginbotham, Y. Roberge, R. Canac-Marquis, A. Ralli and P. Gabrini are already working in the major collaborative research project on asymmetry. A.-M. Di Sciullo, M. Español-Echevarría, C. Reiss, M. Hale, A. Ralli, E. Reuland and members of LOT have been invited to attend prestigious international linguistic conferences (GLOW, NELS, WCCFL, etc.), and A.-M. Di Sciullo, A. Ralli, R. Berwick, E. Wehrli, S. Cardey, G. Jarema, L. Petitto, C. Jakubowicz and T. Roeper also participate in psycholinguistic conferences. The expertise of these researchers is thus complementary and covers the whole field of natural language processing, which includes fundamental linguistics, computational linguistics and psycholinguistics.
4— COLLABORATION WITH INDUSTRIAL PARTNERS AND LOCAL ORGANIZATIONS
The industrial partners currently participating in our project are, at the national level, the Centre de recherche informatique de Montréal (CRIM) and Delphes Technologies, and at the international level, NEC (Princeton). We propose to widen our links with industrial partners in Quebec. We have already contacted Alain Auger and Nigel Penny of Alis Technologies, and Alan Bernardi of Bell Canada (see appendix). We also aim to establish links with the following companies: Locus, whose director Yves Normandin is interested in participating in the phonological module of our project, and BCE Emergis, Canoë and EMC, which are interested in developing applications of our research for the automatic processing of natural language content in specialized fields such as finance, health and education.
5 — BENEFITS
Benefits as regards well-being
This project helps to resolve societal problems concerning the ease of communication and the electronic transmission of natural language content. The recipients and users of these research results are the population at large, software developers who make use of natural language processing techniques, and language scientists and engineers. The population at large will particularly profit from the results of this project, since they will make information retrieval easier and more efficient than with the current tools. The project also has applications in the health, education and legal fields, where the needs for efficient processing of natural language are increasingly numerous and complex. This project will lead to a marked improvement of social services in these fields. Software developers, scientists and language engineers will benefit from the systematic processing of content conveyed by natural languages; this will improve the design and development of leading technologies in the domain.

Benefits in terms of Quebec visibility regarding NLP
We envisage starting up a Federation on natural language processing in Quebec and an associated web site. The Federation will gather the current resources in the field of natural language processing, and will provide a single interactive forum for scientists, students, professionals and industrialists. The services of the Federation will include an online full-text library with a collection of books, reviews and reference works; an academic directory of NLP programs; an up-to-date directory of natural language processing companies; articles by invited scientists on the innovative or controversial aspects of their research; employment lists, CVs and biographies; conferences and virtual poster sessions; and online newsgroups.
Benefits in terms of innovation
This project will accelerate innovation in interdisciplinary research and development (linguistics, computer science, psychology) and in other fields (science and technology, health, education). In the field of health, the explicit modeling of the different components of the grammars of natural languages has new applications for the treatment of aphasia and other linguistic deficiencies. This is attested by the Lingraphica software program, produced in the United States, whose clinical value is now under evaluation. In the field of education, this project will contribute to innovation in language learning software, since the technology that we propose to develop supports the properties of universal grammar and the parameters of variation among languages. The innovation in this field stems from the fact that the matrix on which the learning of a foreign language is based is generic; the learning software exploits the values of the parameters of variation. The central innovation for sciences and information technologies that rely on natural languages lies in the fact that the technology we want to develop carries out configurational processing of information, which makes a range of new natural language processing products possible. This leads to a new generation of multilingual information processing systems, translation systems, and electronic dialogue processing systems that go beyond processing based on characters or single words. This innovation is expected to lead to the development of science and new technologies and to accelerate the transfer of knowledge and technologies to society.
Benefits with regard to the development of industries in Quebec
This project will facilitate the transfer of sophisticated technologies from software development companies such as Alis Technologies. More generally, our contribution to the development of industries in Quebec will consist of furnishing a generic platform for all aspects of natural language processing. This will make it possible to increase the performance (precision, speed, analysis and generation of content) of existing information technologies. As a result, the information industries in Quebec will be in a position to quickly develop high-quality, durable products, both nationally and internationally.

Benefits for future researchers
This project will make it possible to keep post-doctoral researchers educated in Quebec working in the domain by providing them with employment. In this regard, four of our students are already working in the industry. It will also make it possible for us to attract the best researchers and developers in natural language processing to Quebec. The funds requested from the VRQ are necessary to ensure the presence in the academic world of qualified researchers and professionals, who will not be forced to accept positions abroad because of the budgetary restrictions on remuneration within each research group.
6 The VRQ funding allows for multidisciplinary development (linguistics/data processing/psychology), multi-institutional development (UQAM, Université de Montréal, Concordia University, McGill University, Université Laval) and multi-sector development (universities/industries). This will lead to the rapid growth of the team, making it even more prominent in Quebec, in Canada and internationally. Our strategic position is twofold: it is situated at the scientific level and at the level of the realization of technological products of excellence that incorporate the results of our work on the automatic processing of the content of natural languages. The VRQ funding is of great importance for obtaining major grants in the next four years. A second window of opportunity before us is a prestigious and well-endowed source of funding. Indeed, the Canadian Networks of Centres of Excellence (RCE) have just become a permanent organization financing research, particularly research in partnership, which is fully in line with the approach we follow. Moreover, it is becoming clear that this organization will be more open than before to social science themes. We firmly believe that within a few years we will be in a good position to receive biannual funding from the RCE for themes on natural languages. This center will bring together professors, researchers, and professionals of national and international repute who will contribute significantly to the field and participate actively in research on the development and interaction of linguistics, data processing, and cognitive psychology, and on their multiple technological applications.
7—THE RESOURCES AND BUDGET OF THE PROJECT
At this moment we have the following resources:
In order to carry out the work plan presented above, we are requesting a grant of $1,670,000 over four years. The budget for an average grant year will cover the salaries of 5 professionals in linguistics and computational linguistics, 3 post-doctoral trainees in linguistics and 20 graduate students. Moreover, in order to ensure the smooth running of such a vast operation, we will need an assistant for technology watch and an assistant for planning and coordination. Given the timetable of operations (see above), the financial needs will be greater at the beginning of the program. Finally, we anticipate an envelope of $5,000 per year for each university institution in Quebec for expenses charged to the research. The precise needs of the research team for each module are as follows.
LEXICAL MODULE
We shall elaborate the lexical entries based on asymmetric relations. Given the breadth of work necessary for the coverage of lexical configurations, the added participation of a professional researcher and three research students is necessary in this domain. With this number of researchers, the group will be in a position to work on the integration of sophisticated lexical elements into several applications, including dialogue analysis and generation systems, electronic games and learning software.
MORPHOLOGICAL MODULE
In order to ensure complete coverage of the morphological properties, the added participation of a professional researcher and two research students is needed to cover the whole of the parameterized morphological configurations. This number of researchers will be able to ensure the integration of modules for processing sophisticated morphological expressions into several applications, including dialogue analysis and generation systems, analyzers and generators of lexical expressions, and online translation systems.
SYNTACTIC MODULE
In order to ensure complete coverage of the syntactic properties, the added participation of a professional researcher, one post-graduate student in psycholinguistics and two research students is needed to cover the whole of the syntactic configurations. With this number of researchers, the group will be able to develop and integrate sophisticated modules for processing the syntactic expressions of a large variety of languages. The applications of this sub-project are multiple. They include systems for the automatic analysis and generation of natural languages, and tools that assist in composing summaries and expressions in natural languages.
SEMANTIC MODULE
This module is to be developed in conjunction with specialists in knowledge processing based on the semantic properties of natural languages and their relations with encyclopaedic knowledge, whose formal properties are not directly dependent on grammar. A professional researcher and a research student are needed to develop this module.
PHONOLOGICAL MODULE
This module will be developed in conjunction with specialists in the processing of the phonological and phonetic properties of languages. The contribution of this group will ensure that the system we develop has complete coverage. The development of this module will give precedence to speech processing applications that derive from the properties of the grammars of languages rather than from signal processing. A professional researcher and two research students are needed to develop this module.
|