Thursday, May 17, 2012

Daily Bookmarks 20120516

Fuzzy matching/chunking algorithm - Stack Overflow
http://stackoverflow.com/questions/5122527/fuzzy-matching-chunking-algorithm
9.2.7 Numeric Comparison with Percentage Tolerance  'FieldComparatorNumericPerc'
http://cs.anu.edu.au/~Peter.Christen/Febrl/febrl-0.3/febrldoc-0.3/node41.html
n-Gram/2L-approximation: a two-level n-gram inverted index ...
http://infolab.dgist.ac.kr/~mskim/papers/CSSE07.pdf
CiteSeerX — A Practical q-Gram Index for Text Retrieval Allowing Errors
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.2942
Go together. - Wagn (a nice Ruby on Rails app)
http://wagn.org/
k-gram indexes for wildcard queries
http://nlp.stanford.edu/IR-book/html/htmledition/k-gram-indexes-for-wildcard-queries-1.html
Group-average agglomerative clustering
http://nlp.stanford.edu/IR-book/html/htmledition/group-average-agglomerative-clustering-1.html
Permuterm Subject Index
http://dxyw.hep.com.cn:8080/downloads/%E4%BF%A1%E6%81%AF%E6%A3%80%E7%B4%A2%EF%BC%88%E5%A4%9A%E5%AA%92%E4%BD%93%EF%BC%89%E6%95%99%E7%A8%8B%EF%BC%88%E7%AC%AC%E4%BA%8C%E7%89%88%EF%BC%89/chap7/pages/7_2_1_2_5.html
Permuterm index for cs707_011712
http://www.scribd.com/doc/78793611/13/Permuterm-index
Building A Python-Based Search Engine — PyCon2012 Schedule & Notes 1.0 documentation
http://andrew-schoen-pycon-2012-notes.readthedocs.org/en/latest/sunday/session_2.html

bi gram index python - Google Search
https://www.google.com/search?q=bi+gram+index+python&hl=zh-TW&client=firefox-a&hs=Ip3&rls=org.mozilla:zh-TW:official&prmd=imvns&ei=L-OzT_WBJa3mmAW-h-ScBQ&start=20&sa=N&biw=1132&bih=597
Indexing Text in Python
http://vermeulen.ca/python-indexing.html
Faceting — Haystack 2.0.0-beta documentation
http://django-haystack.readthedocs.org/en/latest/faceting.html
Sites Using Haystack — Haystack 2.0.0-beta documentation
http://django-haystack.readthedocs.org/en/latest/who_uses.html
term frequency/inverse document frequency (TFIDF) « Infomotions Mini-Musings
http://infomotions.com/blog/tag/term-frequencyinverse-document-frequency-tfidf/
Automatic metadata generation « Infomotions Mini-Musings
http://infomotions.com/blog/2009/07/automatic-metadata-generation/

On BM25 scoring - summerbell - ITeye tech site
http://summerbell.iteye.com/blog/420084
A brief analysis of the BM25 algorithm - iPie: Fragments of Thought
http://ipie.blogbus.com/logs/104136815.html
Project2--Modifying Lucene's ranking algorithm: the BM25 algorithm - wbia2010lkl's column - Blog channel - CSDN.NET
http://blog.csdn.net/wbia2010lkl/article/details/6046661
Building a site-specific crawler with Heritrix
http://www.ibm.com/developerworks/cn/opensource/os-cn-heritrix/?S_TACT=105AGX52&S_CMP=reg-ccid
Search engine content relevance | 崔永秀
http://cuiyongxiu.com/201201/01231.html

Patent US7644076 - Clustering strings using N-grams - Google Patents
http://www.google.com/patents/US7644076

-end-
