Update
@@ -0,0 +1,17 @@
Library of statistical profiles for language recognition
--------------------------------------------------------

The sample texts for different languages have been taken from:
Perl module: Lingua::LanguageGuesser - http://gensen.dl.itc.u-tokyo.ac.jp/LanguageGuesser/LanguageGuesser_demo.html
Statistical Text Analysis - http://boxoffice.ch/pseudo/
Some random sample texts have been taken from Wikipedia - http://wikipedia.org/

All the sample texts should be UTF-8 encoded!

To understand how language recognition works, read the following remarkable work:
W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.
http://citeseer.ist.psu.edu/cache/papers/cs/810/http:zSzzSzwww.info.unicaen.frzSz~giguetzSzclassifzSzcavnar_trenkle_ngram.pdf/n-gram-based-text.pdf
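
The idea in the Cavnar and Trenkle paper can be sketched roughly as follows: build a ranked list of the most frequent character n-grams for each language sample, build the same list for the unknown text, and pick the language whose list is closest by the "out-of-place" rank distance. The Python below is only an illustrative sketch, not the code of this library. The function names (ngram_profile, out_of_place, guess_language), the single "_" padding character, and the default values max_n=5 and top_k=300 are assumptions made for the example.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    # Treat every run of Unicode letters as one word (digits and punctuation split words).
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + word + "_"  # mark word boundaries (simplified padding)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                if gram.strip("_"):  # skip n-grams made of padding only
                    counts[gram] += 1
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))
```

With a dictionary such as profiles = {"en": ngram_profile(english_sample), "de": ngram_profile(german_sample)}, where the sample variables hold UTF-8 decoded text, guess_language(some_text, profiles) returns the key of the closest profile. Longer sample texts generally give more reliable profiles.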

License: GNU General Public License, version 3, as published by the Free Software Foundation (http://www.fsf.org/).
Assembled by Ivan Tcholakov, <ivantcholakov@gmail.com>
November, 2009