CROSS-LANGUAGE DOCUMENT RETRIEVAL

Jure Dimec

Abstract

The article argues for the need to develop cross-language information retrieval (CLIR), a relatively new area of information storage and retrieval in multilingual text collections. It defines the goals of CLIR and places it among the research fields that deal with various aspects of processing texts in electronic form. A short historical overview is followed by a description of the most important methodological approaches in CLIR (document translation and query translation) and of the language resources they rely on. Among these resources, most attention is given to bilingual and multilingual ontologies (thesauri, dictionaries, translation lexicons, and collocation thesauri) and to corpora, their construction, and their use in CLIR experiments. The article aims primarily to illustrate the diversity of the field's methodology rather than the operation of specific systems. The state of CLIR in Slovenia and the availability of language resources suitable for including Slovenian texts in cross-language systems are not covered, because that topic requires a separate review.

Keywords

information retrieval; natural languages; cross-language retrieval
