The Sanskrit Heritage Site

Version 258 [2011-09-15]
Mirror site UoH

Welcome to the Sanskrit Heritage site. You are invited to visit the first hypertext Sanskrit dictionary, available interactively through its index. It currently gives meanings in French, but has been designed for multilingual use. You may also download printable versions of this dictionary in several formats, as explained below. This site also offers certain linguistic services for the Sanskrit language, such as a Sanskrit Reader that parses transliterated Sanskrit text into Sanskrit banks of tagged hypertext. Various phonological and morphological tools are also provided, as explained below. A general introduction to the site facilities is available here.

Book form

Portable Document Format

You may download the dictionary as a PDF file. This document is readable with Acrobat Reader, a well-known browser plug-in from Adobe, freely available on the Internet. Since the document is rather large (3 MB), allow for some delay while it loads.

Postscript

For those of you who prefer Postscript, it is also available as a compressed Postscript file.

Web form

Interactive browsing

The dictionary may be accessed through a search engine implementing an index.

Your browser must be Strict XHTML 1.0 compliant, and for proper viewing of Sanskrit text you must have installed on your system OpenType fonts for roman transliteration with diacritics, and for devanagari. For instance, install the IndUni fonts, available from John Smith's site. A Unicode-compliant font for devanagari with proper ligatures is Apple's Devanagari MT for Macintosh OS X stations. For Windows users, installation of the font 'Arial Unicode MS' is advised for proper rendering.

You may have to fiddle with the controls of your browser, so that the font declarations from the dictionary pages get precedence over the standard selection, and thus encoding is specified as Unicode compliant (UTF-8 encoding).

Note that most words are given with their etymology as hypertext links. You may thus navigate from a word to its components, down to its roots. Also, the gender declarations of the main entries are mouse-sensitive, and give you direct access to the relevant declension table. Similarly, the present class mark of the verbal roots gives access to the conjugation schemes. For verb entries, preverbs lead you to the correspondingly prefixed derived verbs.

Sanskrit made easy

If you want to search for a Sanskrit word without knowing its exact transliteration, go to the section "Sanskrit made easy" of the index page, where precise use of diacritics is not required. For instance, search Vishnou, Siva, or the grammarian Panini.
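The idea of a diacritic-tolerant search can be sketched as follows. This is only an illustration of the principle, not the site's actual matching procedure: the function name `search_key` and the two respelling rules are assumptions chosen to cover the three examples above.

```python
import unicodedata

# Hypothetical respelling rules, chosen only to cover the examples above;
# the rules actually used by the site are not documented on this page.
RESPELLINGS = [("sh", "s"), ("ou", "u")]


def search_key(word: str) -> str:
    """Reduce a word to a rough, diacritic-free search key."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = plain.lower()
    # Map loose spellings onto a single form.
    for loose, canonical in RESPELLINGS:
        key = key.replace(loose, canonical)
    return key
```

With such a key, a loose spelling such as "Vishnou" and the precise transliteration "Viṣṇu" fall together, so that either one may be used to look up the same entry.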

Sanskrit Grammarian

This interface gives the declension tables for Sanskrit substantives. Try out this declension engine by submitting Sanskrit stems with intended gender. The same transliteration conventions as for the dictionary index apply. For instance, submit "deva" with gender Mas, or "devii" with gender Fem, or "brahman" with gender Neu. The fourth button, labeled "Any", may be used for the words which take their gender from the context, such as deictic pronouns ("aham", "tvad"), or numeral words such as "dva", "tri", etc.
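To give an idea of what a declension table contains, here is a minimal sketch restricted to the singular of masculine stems in -a, such as "deva". It is not the site's declension engine: the function name is hypothetical, forms are written in the same input transliteration as above, and internal sandhi (for instance retroflexion of n in the instrumental) is deliberately ignored.

```python
# Singular endings for masculine stems in -a, in the input transliteration.
# Each ending replaces the final "a" of the stem.
MASC_A_SINGULAR = [
    ("nominative", "a.h"),
    ("accusative", "am"),
    ("instrumental", "ena"),
    ("dative", "aaya"),
    ("ablative", "aat"),
    ("genitive", "asya"),
    ("locative", "e"),
    ("vocative", "a"),
]


def decline_masc_a_singular(stem: str) -> dict:
    """Return the singular forms of a masculine stem ending in short a."""
    if not stem.endswith("a") or stem.endswith("aa"):
        raise ValueError("expected a stem ending in short a, got %r" % stem)
    base = stem[:-1]
    # Internal sandhi is not applied in this sketch.
    return {case: base + ending for case, ending in MASC_A_SINGULAR}
```

For "deva" this yields, among others, the ablative "devaat", a form which also appears among the lemmatizer examples on this page.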

A conjugation engine for roots is also available. It handles the full present system: present indicative, imperfect, imperative and optative, as well as the passive present, the perfect, the aorist and the future. Participial stems, absolutives and infinitives are listed as well. Some secondary conjugations (causative, intensive, desiderative) are also generated, for the full present and future systems. Try out this conjugation engine with data such as "bhuu" 1, "as" 2, "m.rj" 2, "han" 2, "haa" 3, "hu" 3, "daa" 4, "su" 5, "p.r" 6, "yuj" 7, "k.r" 8, "j~naa" 9, "namas" 10. In order to get the secondary conjugations of a root, enter code 0. You may cascade by generating declensions of the generated participial stems.
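The root names above are typed in an ASCII transliteration (the lemmatizer section below calls it Velthuis format). As a reading aid, the following sketch converts such input to roman transliteration with diacritics. It is an assumption-laden illustration, not the site's own code: the table lists standard Velthuis codes, plus "z" for ś, which is inferred from examples elsewhere on this page such as "darzayi.syati".

```python
# Velthuis-style input codes and their equivalents with diacritics.
# "z" for ś is inferred from examples on this page; the other pairs are
# standard Velthuis codes.
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".rr": "ṝ", ".r": "ṛ", ".l": "ḷ",
    ".m": "ṃ", ".h": "ḥ",
    '"n': "ṅ", "~n": "ñ",
    ".t": "ṭ", ".d": "ḍ", ".n": "ṇ",
    '"s': "ś", "z": "ś", ".s": "ṣ",
}

# Try longer codes first, so that ".rr" wins over ".r".
CODES = sorted(VELTHUIS_TO_IAST, key=len, reverse=True)


def to_iast(text: str) -> str:
    """Convert Velthuis-style input to transliteration with diacritics."""
    out = []
    i = 0
    while i < len(text):
        for code in CODES:
            if text.startswith(code, i):
                out.append(VELTHUIS_TO_IAST[code])
                i += len(code)
                break
        else:
            # No code starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

For instance, "bhuu" is read as "bhū", "m.rj" as "mṛj" and "j~naa" as "jñā".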

Lemmatizer

Conversely, a lemmatizer attempts to tag inflected words. Try for instance (in Velthuis transliteration format) "devaat", "jagmivaan", "a.s.tau" (clicking on Noun) or "apibat", "akaar.siit", "dudoha", "vaahyate" etc. (clicking on Verb). This lemmatizer knows about inflected forms of derived stems in some secondary derivations. For instance, "darzayi.syati" is found as conjugated form: { ca. fut. a. sg. 3 }[d.rś_1], "dariid.rzyate" yields { int. pr. m. sg. 3 }[d.rś_1], "did.rk.sate" yields { des. pr. m. sg. 3 }[d.rś_1] and "bibhik.se" yields { des. pft. m. sg. 3 | des. pft. m. sg. 1 }[bhaj]. N.B. Do not attempt to lemmatize verbal forms with preverbs: this will not work, since the lemmatizer knows only how to invert root forms. Lemmatizing more complex forms is possible through the Sanskrit Reader below.
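If you wish to process such tags by program, the notation shown above can be taken apart as sketched here. The sketch assumes only the textual shape seen in these examples (alternatives separated by "|" inside braces, followed by the stem in square brackets); the function name `parse_tag` is hypothetical, and the abbreviations are kept as they are without interpretation.

```python
import re

# Shape assumed from the examples above: "{ alt | alt ... }[stem]".
TAG_PATTERN = re.compile(r"\{\s*(?P<analyses>[^}]*?)\s*\}\[(?P<root>[^\]]+)\]")


def parse_tag(tag: str):
    """Split a lemmatizer tag into its stem and its list of alternative analyses."""
    match = TAG_PATTERN.fullmatch(tag.strip())
    if match is None:
        raise ValueError("unrecognized tag: %r" % tag)
    analyses = [
        # Drop the abbreviation dots, keep the abbreviations themselves.
        [feature.rstrip(".") for feature in alternative.split()]
        for alternative in match.group("analyses").split("|")
    ]
    return match.group("root"), analyses
```

Applied to the tag of "bibhik.se", this returns the stem "bhaj" together with two alternative analyses, which differ only in the person.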

Morphology

A dictionary of inflected forms of Sanskrit words is provided in XML form under various transliteration schemes. Please visit the Sanskrit linguistic resources site.

Sanskrit Reader

Try our experimental Sanskrit Reader. It is able to segment simple sentences. You may use it to analyse sandhi from compounds in the Segmentation mode. Try for instance to segment "sugandhi.mpu.s.tivardhanam". Then push the "Tagging" button and get the fully tagged sentence. For a simple sentence, try "maarjaarodugdha.mpibati", or "tacchrutvaasa~njaya uvaaca".

The precise grammar used to recognize sentences is given here as a local automaton graph. In this diagram, transparent nodes are non generative, and colored nodes correspond to the lexical categories recognized by the lemmatizer. The category Auxi is the subset of Verb consisting of conjugated forms of roots "k.r", "as" and "bhuu" used as auxiliaries in periphrastic constructions. Pv denotes sequences of preverbs.

Sanskrit Parser

If in the reader you press the "Parsing" button, many irrelevant pseudo-solutions are eliminated. Try for instance the example "pratilekhanenaak.saraa.nisundaraa.nibhavanti". There are 80 potential segmentation solutions, but the parser keeps only one.

Sentences may be broken with spaces for piecewise reading and for curbing overgeneration. For instance, the sentence "pitaaputramabhartsayat" returns two solutions because sandhi is ambiguous, but presented as "pitaa putramabhartsayat" only the intended solution is produced.
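The effect of spaces on ambiguity can be illustrated with a toy segmenter. This is a sketch under strong assumptions, not the reader's algorithm: the four-word lexicon is hypothetical, and a single sandhi rule is modelled (final "aa" followed by initial "a" merges into "aa"), which is one plausible reading of the ambiguity in the example above.

```python
from itertools import product

# Toy lexicon in the page's transliteration; hypothetical, for illustration only.
LEXICON = {"pitaa", "putram", "aputram", "abhartsayat"}


def segment(chunk: str):
    """Return every way to split one chunk into lexicon words."""
    if chunk == "":
        return [[]]
    solutions = []
    for word in sorted(LEXICON):
        if not chunk.startswith(word):
            continue
        rest = chunk[len(word):]
        # First possibility: no sandhi at this word boundary.
        candidates = [rest]
        # Only sandhi rule modelled: final "aa" + initial "a" -> "aa",
        # so the initial "a" of the next word may have been absorbed.
        if word.endswith("aa") and rest:
            candidates.append("a" + rest)
        for candidate in candidates:
            for tail in segment(candidate):
                solutions.append([word] + tail)
    return solutions


def read(text: str):
    """Segment each space-separated chunk on its own and combine the results."""
    per_chunk = [segment(chunk) for chunk in text.split()]
    return [sum(choice, []) for choice in product(*per_chunk)]
```

With this toy lexicon, the unbroken input has two readings (with "putram" or with "aputram" after "pitaa"), whereas the input with a space after "pitaa" has only one, since no sandhi is assumed across the space.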

In other cases, ambiguities may remain. For instance, "ti.s.thanbaalaka upaadhyaayasyapraznaanaamuttaraa.nikathayati" proposes two solutions since the hiatus is potentially ambiguous.

The default interface, indicated by the strength parameter "Simplified", will fail for complex sentences. For instance, it does not recognize vocatives, and it knows only a restricted set of participial forms. If you do not succeed in obtaining the correct solution to a sentence, try the more powerful analyser by pressing the "Complete" button. Vocatives must be separated by spaces, as in: "raama apitvamadyapraata.h svasurg.rhamagaccha.h".

Each solution returned by the parser is marked with a green check sign, which may be pressed to get the semantic analysis of the sentence in terms of roles (kāraka).

The parser recognizes sentences. It may be made to recognize nominal phrases, provided one presses the "Contextual topic" button with the intended gender. You may for instance analyze the compound "pravaran.rpamuku.tama.nimariicima~njariicayacarcitacara.nayugala.h" as a masculine nominal. Alternatively, one can ask to recognize this form as a single word, by pressing "Word" rather than the default "Sentence" text category. When the text is broken with spaces, the Word mode allows recognition of texts given in padapāṭha fashion. It is also possible to recognize sequences of chunks in final sandhi form separated by spaces, where sandhi will be assumed to be undone between the chunks, by specifying the "Unsandhied" mode in the reader interface.

Sanskrit Tagger

The semantic analysis may still be ambiguous, since a given segment may be decorated by several morphological categories. All interpretations are presented under the role matrix, sorted by increasing penalty. Look for your favorite interpretation in this list, and select it by clicking on its green heart symbol. The system will return the corresponding unambiguously tagged sentence, as a page which you may save on your own station. Iterating this process allows you to progressively tag a Sanskrit text with the assistance of the Sanskrit Reader.

Other Sanskrit Resources

We have an ongoing cooperation with the Department of Sanskrit Studies of the University of Hyderabad on computational linguistics for Sanskrit. A joint research team has been formed, together with scholars from the Sanskrit Library team at Brown University. In October 2007 we organized the First International Sanskrit Computational Linguistics Symposium. Please visit the Symposium Site. This was followed by the Second Symposium in May 2008 at Brown University, by a third one in January 2009 at Hyderabad University, and by a fourth one in December 2010 at JNU.

The Zen Library

This site reflects an ongoing project of Sanskrit processing on a comprehensive software platform. The project is based on a structured lexicographic database, compiled from the Sanskrit Heritage dictionary, and on the Zen computational linguistics toolkit. This toolkit is a library of programs implemented in Pidgin ML, the functional core of the Objective Caml programming language. The Zen library and its documentation are available as free software under the GNU Lesser General Public License (LGPL) from the Zen site.

The Sanskrit Portal

Please visit our Sanskrit Portal to find links to other Sanskrit resources.

Artwork credits

Orissan artwork at this site courtesy of Shauraj Rath. © Screenex, Bhubaneshwar, Ekamra, Orissa. All rights reserved.
Wallpaper om images courtesy of Vishvarupa.com.
Ganesh wallpaper courtesy of François Patte.
Shri Yantra design © Gérard Huet 1990.

© Gérard Huet 1994-2011