Saturday, October 29, 2011

Daily Bookmarks 20111029

[Unix] Installing SQLite + Python + Apache + Django from tarballs @ FreeBSD 7.2 @ 第二十四個夏天後 :: 痞客邦 PIXNET ::
http://changyy.pixnet.net/blog/post/25225752-%5Bunix%5D-%E4%BD%BF%E7%94%A8-tarball-%E5%AE%89%E8%A3%9D-sqlite-%2B-python-%2B-apache-%2B-djan
Installation - prettytable - How to install PrettyTable - A simple Python library for easily displaying tabular data in a visually appealing ASCII table format - Google Project Hosting
http://code.google.com/p/prettytable/wiki/Installation
An introduction to Chinese word segmentation for search engines | 中文Flex例子
http://blog.minidx.com/2008/01/04/352.html
Mail filtering
http://people.ubuntu.com/~happyaron/ubuntu-docs-test/lucid/serverguide/zh_CN/mail-filtering.html
Minidx file management system | Minidx full-text search engine - Home
http://cn.minidx.com/
Python NLTK chinese - Google Search
http://www.google.com.tw/search?q=Python+NLTK+chinese&hl=zh-TW&prmd=imvns&ei=eoitTvTvC5KkiQfyivnGDw&start=10&sa=N&biw=1235&bih=663
Machine Learning for Email - O'Reilly Media
http://shop.oreilly.com/product/0636920022350.do
http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html
Berkeley students develop the image CAPTCHA Picatcha to put an end to text CAPTCHAs - Yahoo! Kimo 3C Tech
http://tw.tech.yahoo.com/network_trend/article/id/20953/
Classification: Spam Filtering python - Google Search
http://www.google.com.tw/search?q=Classification:+Spam+Filtering+python&hl=zh-TW&prmd=imvns&source=lnt&tbs=lr:lang_1zh-CN%7Clang_1zh-TW&lr=lang_zh-CN%7Clang_zh-TW&sa=X&ei=LYatTsi1BamtiQfvo_jMDA&ved=0CAgQpwUoATgK&biw=1235&bih=663

印象·PKU (Impressions of PKU) - Renren Xiaozhan
http://zhan.renren.com/jasony

Thursday, October 27, 2011

Daily Bookmarks 20111027

ik-analyzer - open-source Chinese word segmenter in Java - Google Project Hosting
http://code.google.com/p/ik-analyzer/
mmseg4j - MMSEG for java lucene chinese analyzer, or for solr - Google Project Hosting
http://code.google.com/p/mmseg4j/
Hadoop's 1 TB sort - NoSQLFan - NoSQL technology and news
http://blog.nosqlfan.com/html/417.html
MongoDB at Shanda under large data volumes - NoSQLFan - NoSQL technology and news
http://blog.nosqlfan.com/html/3315.html
Script for syncing Twitter to Renren (Updated at 2010-04-12) | 一阁Blog
http://yegle.net/2010/04/12/php-script-synchronizing-twitter-to-renren-updated-version/
On MMSeg word segmentation - bqrm_521(小奎) - cnblogs
http://www.cnblogs.com/bqrm/archive/2008/08/16/1269258.html
py-instantse - Python instant search module for quora-like website - Google Project Hosting
http://code.google.com/p/py-instantse/
Apache Lucy FAQ
http://incubator.apache.org/lucy/faq.html
Building your own search engine with Xapian: an introduction to Xapian - O-O Sharp - Blog Channel - CSDN.NET
http://blog.csdn.net/visualcatsharp/article/details/4176083
[Repost] Summary of methods for processing large and massive data sets_cheriec_Sina Blog
http://blog.sina.com.cn/s/blog_4d3a41f40100ic9d.html

Daily Bookmarks 20111026

Roodo development log: a brief look at NoSQL - Mongo DB - Roodo Blog
http://blog.roodo.com/develop/archives/14702009.html
record db python - Google Search
http://www.google.com/search?q=record+db+python&hl=zh-TW&client=firefox&rls=org.mozilla:zh-TW:official&prmd=imvns&ei=tj6oTuHzB-ugmQWM5czMDw&start=50&sa=N&biw=1280&bih=802
What I learned from founding three tech companies | 36Kr
http://www.36kr.com/p/38490.html
Frank的五四三: Virtual Python
http://franks543.blogspot.com/2007/02/virtual-python.html
Named Tuples « Python recipes « ActiveState Code
http://code.activestate.com/recipes/500261/
Managing Records in Python (Part 1 of 3)
http://www.artima.com/weblogs/viewpost.jsp?thread=236637
ZODB - a native object database for Python — ZODB v3.10.3 documentation
http://www.zodb.org/index.html
PyDbLite
http://www.pydblite.net/
The startup trilogy, part one: learning the technology - 香港矽谷
http://www.hksilicon.com/kb/articles/37736?mobi=true
buzhug, a pure-Python database engine
http://buzhug.sourceforge.net/
MongoDB | 泥泞的沼泽
http://davidx.me/tag/mongodb/
Full Text Search in Mongo - MongoDB
http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo#FullTextSearchinMongo-TextSearch
MongoDB Full Text Search With Sphinx
http://www.scribd.com/doc/33308510/MongoDB-Full-Text-Search-With-Sphinx
How to do full-text search and fuzzy queries in MongoDB? - IT知道
http://zhidao.it/questions/98

Tuesday, October 25, 2011

Daily Bookmarks 20111025

Text clustering research | BT的花
http://www.dup2.org/node/1015
Clustering - Fuzzy C-means
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
Decision tree algorithm in Python_bicloud_Sina Blog
http://blog.sina.com.cn/s/blog_61c463090100ljv1.html
Clustering algorithms in Python_bicloud_Sina Blog
http://blog.sina.com.cn/s/blog_61c463090100ljv0.html
Wanfang Data Knowledge Service Platform (mirror edition) - paper search results
http://g.wanfangdata.com.hk/S/paper.aspx?f=onlyFull&q=%E5%88%A9%E7%94%A8%E6%A0%87%E7%AD%BE%E7%9A%84%E5%B1%82%E6%AC%A1%E5%8C%96%E6%90%9C%E7%B4%A2%E7%BB%93%E6%9E%9C%E8%81%9A%E7%B1%BB%E6%96%B9%E6%B3%95&n=10&PID=&CID=&userid=
A hierarchical search-result clustering method using tags
Research on news aggregation based on text clustering
http://d.wanfangdata.com.cn/Thesis_Y1063770.aspx
Airiti Library
http://www.airitilibrary.com/searchdetail.aspx?DocIDs=0253987x-200904-43-4-18-21-a
Programming Collective Intelligence: clustering - blog post page
http://tayoto.tap.cn/article-view/article-1xtcr0s7q0102
Clustering with suffix trees - javaeye - ITeye tech site
http://mxdxm.iteye.com/blog/391263
Data clustering - Wikipedia, the free encyclopedia
http://zh.wikipedia.org/wiki/%E6%95%B0%E6%8D%AE%E8%81%9A%E7%B1%BB
小默的研究中心 » Data clustering (good, must read)
http://wpxiaomo.sinaapp.com/?p=426
浪点python之我的地盘 - 张沈鹏, on the road... - ITeye tech site
http://zsp.iteye.com/blog/163269
聚類 (clustering) python - Google Search
http://www.google.com/search?q=%E8%81%9A%E9%A1%9E+python&hl=zh-TW&client=firefox&rls=org.mozilla:zh-TW:official&prmd=imvns&ei=F9GmTv6HK6rMmAWhg-DGDw&start=40&sa=N&biw=1280&bih=862
小默的研究中心 » Collective intelligence
http://wpxiaomo.sinaapp.com/?cat=29
Amazon.com: Programming Collective Intelligence: Building Smart Web 2.0 Applications (9780596529321): Toby Segaran: Books
http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325/

Friday, October 21, 2011

Daily Bookmarks 20111021

life is short - you need Python!: Create thumbnail by resizing image in Python
http://love-python.blogspot.com/2008/08/create-thumbnail-by-resizing-image-in.html
Lanczos Algorithm Analyse | Outlier Life
http://www.ioutlier.com/algorithm/lanczos-algorithm-analyse/
Python Cloud: GFS the Google File System in 199 Lines of Python
http://clouddbs.blogspot.com/2010/11/gfs-google-file-system-in-199-lines-of.html
Python Cloud: Google's MapReduce in 98 Lines of Python
http://clouddbs.blogspot.com/2010/10/googles-mapreduce-in-98-lines-of-python.html
Tahoe: A Secure Distributed Filesystem
http://tahoe-lafs.org/~warner/pycon-tahoe.html
arighi's blog: PyGFS: implementing a distributed filesystem in python
http://arighi.blogspot.com/2008/01/pygfs-implementing-distributed.html

Tuesday, October 18, 2011

Daily Bookmarks 20111018

python
python at sa note – such another note site
http://www.sanote.org/?cat=13
How to change urllib User-Agent - Articles
http://wolfprojects.altervista.org/changeua.php
Consistent hashing implemented simply in Python - amix.dk
http://amix.dk/blog/post/19367
webresize.py
http://www.mm-log.com/system/files/mm%20extern%20webresize.py_.txt
朱隸安貓囈語錄: Flickr to Picasa migration script
http://julianshen.blogspot.com/2011/07/flickr-to-picasa-script.html
Python Course: Text Classification in Python
http://www.python-course.eu/text_classification_python.php


First experience with Hadoop: setting up Hadoop for a simple global sort of text data - Leo Zhang - cnblogs
http://www.cnblogs.com/vivounicorn/archive/2011/09/20/2182433.html
Code hosting on bitbucket.org | {:web=>:wxianfeng}
http://wxianfeng.com/2011/03/04/hg
UserAgentString.com - Chrome version 14.0.835.94
http://www.useragentstring.com/
wikipediadownloader.py - wikiteam - tools for archiving wikis - Google Project Hosting
http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
Python notes: generating N-grams and simple frequency counts @ Freedom is not free :: 痞客邦 PIXNET ::
http://hambao.pixnet.net/blog/post/18823664-python%E7%AD%86%E8%A8%98%EF%BC%9A%E7%94%A2%E7%94%9Fn-gram%E4%BB%A5%E5%8F%8A%E7%B0%A1%E5%96%AE%E9%A0%BB%E7%8E%87%E7%B5%B1%E8%A8%88

Gentoo
Making a minimal-system Gentoo install CD - Steven's journal - NetEase Blog
http://colder.blog.163.com/blog/static/17394661820114308749227/

Sunday, October 16, 2011

Daily Bookmarks 20111015

Case
PHP tutorial: integrating the CKeditor web editor with CKfinder uploads | ♣梅問題‧教學網【Minwt】♣
http://www.minwt.com/?p=2848
How To: Creating a Simple Plugin for TinyMCE | alligatorsneeze.com
http://www.alligatorsneeze.com/how-creating-simple-plugin-tinymce
10 best WYSIWYG Text and HTML Editors for Your Next Project
http://www.1stwebdesigner.com/design/10-best-wysiwyg-text-and-html-editors-for-your-next-project/
WYSIWYG-HTML Edit
http://www.webresourcesdepot.com/category/goodies/wysiwyg-html-edit/
NicEdit - WYSIWYG Content Editor, Inline Rich Text Application
http://nicedit.com/index.php

python
Getting Started with virtualenv (Isolated Python Environments) | Mitch Fournier
http://mitchfournier.com/2010/06/25/getting-started-with-virtualenv-isolated-python-environments/
virtualenv — virtualenv v1.6.4 documentation
http://www.virtualenv.org/en/latest/
Installation instructions — pip 1.0.2 documentation install virtualenv not root user
http://www.pip-installer.org/en/latest/installing.html
Python in Web 2.0 websites - QCon Beijing 2010
http://www.slideshare.net/hongqn/qcon2010-3881323

Thursday, October 13, 2011

Daily Bookmarks 20111013

[Repost] A TF-IDF primer for keyword extraction - 码农.KEN - cnblogs
http://www.cnblogs.com/ken-zhang/archive/2010/06/20/1761108.html
machine learning in biomedicine: pubmedpy: a simple module for fetching and tf-idf encoding biomedical texts
http://mlbiomedicine.blogspot.com/2009/07/pubmedpy-simple-module-for-fetching-and.html
Dennis Ritchie: 1941-2011
http://www.muppetlabs.com/~breadbox/rip-dmr.html
Collaborative filtering « 等待另一个人的奥林匹斯
http://blog.odichy.org/tag/%E5%8D%8F%E5%90%8C%E8%BF%87%E6%BB%A4
Mining the Social Web (reprint edition) - O'Reilly Beijing
http://oreilly.com.cn/index.php?func=book&isbn=978-7-5641-2686-5
TF-IDF and article classification - leen2010 - BlogBus
http://leen2010.blogbus.com/logs/124793554.html
Myxberry's Blog: TF and IDF
http://myxberry.blogspot.com/2008/11/tf-idf.html
TF/IDF – eph's blog
http://blog.ieph.net/archives/tag/tfidf
TFIDF - Lighter - 賴特筆記
http://www.lighter.idv.tw/program/php/62-tfidf.html

A quick foray into linear algebra and Python: tf-idf
http://timtrueman.com/a-quick-foray-into-linear-algebra-and-python-tf-idf/

python
Extracting words and their occurrence counts from English articles with Python (original) - 冰山一角 - 51Testing software testing network - Powered by X-Space
http://www.51testing.com/?uid-61753-action-viewspace-itemid-154953
Extracting novels from Baidu Tieba with Python - ToddNet2012 - cnblogs
http://www.cnblogs.com/ToddNet2012/archive/2011/10/05/2199554.html
MyProject / ExtMainText: extracting the main text of HTML documents | Elias's homepage
http://www.elias.cn/MyProject/ExtMainText

The Python textcluster Package | dave dash
http://davedash.com/2010/07/08/the-python-textcluster-package/



keyword extraction - Google Search
http://www.google.com.tw/search?hl=zh-TW&q=keyword+extraction&oq=keyword+ex&aq=0&aqi=g7g-m3&aql=&gs_sm=e&gs_upl=8306l13403l0l15061l11l11l0l1l1l0l267l2265l0.3.7l10l0
Automatic Terminology Extraction - Homepage of Cheng-Zhi Zhang
https://sites.google.com/site/zhangczhomepage/terminology-extraction

Wednesday, October 12, 2011

Daily Bookmarks 20111012

How To Write A Simple Web Crawler In Ruby
http://www.skorks.com/2009/07/how-to-write-a-web-crawler-in-ruby/
Search Fundamentals – Basic Indexing
http://www.skorks.com/2010/01/search-fundamentals-basic-indexing/
CodeKata: How to become a better developer
http://codekata.pragprog.com/

http://blog.csdn.net/huanhuolang/article/details/6287224
Huge CSV and XML Files in Python - Irrational Exuberance
http://lethain.com/handling-very-large-csv-and-xml-files-in-python/
Lazy Method for Reading Big File in Python? - Stack Overflow
http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python
Duplicate detection and clustering for massive document sets -- the Locality Sensitive Hash algorithm - fxjwind - cnblogs
http://www.cnblogs.com/fxjwind/archive/2011/07/05/2098642.html


Data Structures and Algorithms with Object-Oriented Design Patterns in Python
http://www.brpreiss.com/books/opus7/
Algorithm Education in Python
http://www.ece.uci.edu/~chou/py02/python.html
Desperado's World: Sorting Algorithm - Python Implementation
http://minchuanwang.blogspot.com/2008/09/soring-algorithm-python-implementation.html
Dijkstra's algorithm for shortest paths « Python recipes « ActiveState Code
http://code.activestate.com/recipes/119466-dijkstras-algorithm-for-shortest-paths/
python - Finding k-nearest neighbors for a given vector? - Stack Overflow
http://stackoverflow.com/questions/5684370/finding-k-nearest-neighbors-for-a-given-vector



Monday, October 10, 2011

Daily Bookmarks 20111010

python
Fixing garbled text when parsing Chinese web pages with BeautifulSoup_初生牛犊要怕虎_Baidu Space
http://hi.baidu.com/luowenhan2008/blog/item/50ce6e0b1737aa0a94ca6bee.html
Weka usage notes_初生牛犊要怕虎_Baidu Space
http://hi.baidu.com/luowenhan2008/blog/item/e9e37f19f20093a14bedbce8.html
Learning Python for the first time: how it differs from C | Windstorm
http://www.kunli.info/2008/06/22/c-python/
Neo4j Blog: Modeling categories in a graph database
http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html
Association rule mining with Orange » 超群.com的博客
http://www.fuchaoqun.com/2008/08/data-mining-with-python-orange-association_rule/



Neo4j Chinese resources (Chinese Guide)
http://neo4j.tw/
Data mining lecture 9, Association Analysis: FP-growth and Others_Baidu Wenku
http://wenku.baidu.com/view/5bd123283169a4517723a394.html

other
On my experience applying for a Google internship | Windstorm
http://www.kunli.info/2011/04/10/google-summer-intern-2011/
Reddit co-founder Alexis Ohanian on startup experience: what made Reddit Reddit | 36Kr
http://www.36kr.com/p/15746.html

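Development technology « 等待另一个人的奥林匹斯 (analyzes the various kinds of recommendation approaches; see "Recommender system notes, part 2: common similarity measures")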
http://blog.odichy.org/category/%E5%BC%80%E5%8F%91%E6%8A%80%E6%9C%AF
How Reddit ranking algorithms work - amix.dk
http://amix.dk/blog/post/19588

LaUDMS developer community
http://166.111.131.53/trac/Test/wiki


Must be read
Programmer's Tracer » SinaMood: information mining based on Sina Weibo, part 2
http://blog.meecoder.com/archives/276

Sunday, October 09, 2011

Daily Bookmarks 20111009

How to change urllib User-Agent - Articles
http://wolfprojects.altervista.org/changeua.php

DNA soso search engine
dnasearchengine - a DNA search engine like google - Google Project Hosting
http://code.google.com/p/dnasearchengine/
joyfire 王乐珩 » Tencent's DNA search engine
http://wangleheng.net/2011/10/soso_dna_search_engine/

Saturday, October 08, 2011

Daily Bookmarks 20111008

Practical Data Analysis in Python
http://www.slideshare.net/hmason/practical-data-analysis-in-python
Building a very easy text classifier in python | Ulrich Scheller, Software Developer
http://www.ulrich-scheller.de/?p=52

Python Course: Introduction into Text Classification (theory)
http://www.python-course.eu/text_classification_introduction.php
Building Decision Trees in Python - O'Reilly Media
http://onlamp.com/pub/a/python/2006/02/09/ai_decision_trees.html
Ian Bicking: a blog :: lxml: an underappreciated web scraping library
http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/
Beautiful Soup documentation
http://www.crummy.com/software/BeautifulSoup/documentation.zh.html

Best way to sort 1M records in Python - Stack Overflow
http://stackoverflow.com/questions/1180240/best-way-to-sort-1m-records-in-python


Google Search
python classify - Google Search
http://www.google.com.tw/search?q=python+classify&hl=zh-TW&prmd=imvnsb&ei=dz-PTpHgPKiUiQeZ-aCXDg&start=10&sa=N&biw=1235&bih=663

Tools
web-classify - a Python toolkit for web page classification - Google Project Hosting
https://code.google.com/p/web-classify/
Python | Learning Python step by step - Part 3
http://www.91python.com/archives/category/python/page/3



IR Class
Web Retrieval and Mining (Fall 2008)
http://irlab.csie.ntu.edu.tw/wm2008/index.html
Search Engines: Information Retrieval in Practice
http://www.search-engines-book.com/
CS 6913 Spring 2011 Home Page CS 6913: Web Search Engines (Spring 2011)
http://cis.poly.edu/cs912/

Classification corpora
Sogou Labs data downloads - text classification corpus
http://www.sogou.com/labs/dl/c.html
Tutorial - python-data-mining-platform - PyMining - Data-mining platform based on Python - Google Project Hosting
http://code.google.com/p/python-data-mining-platform/wiki/Tutorial
Why will Hadoop certainly be the future of distributed computing? - LeftNotEasy - cnblogs
http://www.cnblogs.com/LeftNotEasy/archive/2011/08/27/why-map-reduce-must-be-future-of-distributed-computing.html

Solve
search substring in list of strings - Python
http://bytes.com/topic/python/answers/629979-search-substring-list-strings
Find Key by Value in Python Dictionary - drumcoder.co.uk
http://drumcoder.co.uk/blog/2010/sep/11/find-key-value-python-dictionary/
Extracting plain text from HTML
http://www.effbot.org/zone/textify.htm
Mappings and Dictionaries — Building Skills in Python
http://homepage.mac.com/s_lott/books/python/html/p02/p02c05_maps.html

Thursday, October 06, 2011

Daily Bookmarks 20111006

Reading a fixed number of lines from a file in Python - 小白私生活
http://androidpython.appspot.com/2011/03/17/%E8%A1%8C%E8%AF%BB%E5%8F%96.html
Web scraping library: sitedigger_redice's Blog
http://www.redicecn.com/html/blog/webscraping/2010/1221/209.html
Index of study notes on "Natural Language Processing with Python" - 一块努力的牛皮糖 - cnblogs
http://www.cnblogs.com/yuxc/archive/2011/08/29/2157415.html
Reading a mail list in Python, filtering the messages, then sending an SMS through a web page - 肖小胖的博客 - ITeye tech site
http://flowercat.iteye.com/blog/253289
How to strip HTML tags and accurately extract content with Python_skeryl的空间--ubuntu & Java& python_Baidu Space
http://hi.baidu.com/skeryl/blog/item/c1807f7a4977bfe52f73b301.html


Must Be Read
Implementing collaborative filtering in Python
http://www.slideshare.net/jingc/python-3394066
Web scraping in Python (XPath version)_redice's Blog
http://www.redicecn.com/html/Python/20101112/190.html
Filtering strings: a quick test of Python's speed | Jean - 记录成长历程
http://www.zhangyiqun.net/919.html
How to filter duplicate values from a list - Python - ChinaUnix.net
http://bbs.chinaunix.net/thread-1667989-1-1.html
Fastest way to uniqify a list in Python - Peterbe.com (Peter Bengtsson on Python, Django, Zope, Kung Fu, London and photos)
http://www.peterbe.com/plog/uniqifiers-benchmark
Urgent help: Python Chinese compatibility issue, why do string/list/dict display differently?_Baidu Zhidao
http://zhidao.baidu.com/question/96083697.html
How to sort a list in Chinese order in Python? | 曲径通幽
http://wfxiang.info/?p=241
ping不見路: What's new in Python 3.0 (part 1)
http://pingyeh.blogspot.com/2008/12/python-30.html
Dive Into Python (Chinese edition) - 2.6. Filtering lists
http://www.kuqin.com/diveinto_python_document/apihelper_filter.html
[Repost] Sorting Chinese in Python
http://abloz.com/2010/01/03/transfer-python-sort-chinese.html
Writing text backwards to evade sensitive-word filters
http://abloz.com/2010/07/27/i-will-write-the-name-backwards.html
Sorting Chinese in Python [Python Club]
http://www.pythonclub.org/python-basic/chinese-sort
The Anatomy of a Search Engine
http://infolab.stanford.edu/~backrub/google.html
PYnotes good site post must be read
http://www.pynotes.info/post/10/




Book Site
How to Implement a Search Engine Part 1: Create Index | Arden Dertat
http://www.ardendertat.com/2011/05/30/how-to-implement-a-search-engine-part-1-create-index/
Introduction to Information Retrieval
http://nlp.stanford.edu/IR-book/information-retrieval-book.html

Tuesday, October 04, 2011

Daily Bookmarks 20111003

High-performance XML parsing in Python with lxml
http://www.ibm.com/developerworks/xml/library/x-hiperfparse/#listing4
High-performance XML parsing in Python with lxml (Chinese translation)
http://www.ibm.com/developerworks/cn/xml/x-hiperfparse/
XML Group, Web and Mobile Data Management Lab, OrientX, Database Group at Renmin University of China
http://idke.ruc.edu.cn/projects/xml_cn.htm#dataman_nav
Home, WAMDM, Database Group at Renmin University of China
http://idke.ruc.edu.cn/wamdm/
Xiaofeng Meng's Homepage
http://idke.ruc.edu.cn/xfmeng/cindex%20new.htm
Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education
http://deke.ruc.edu.cn/home/show/42

ToRead

Build a Web spider on Linux
http://www.ibm.com/developerworks/library/l-spider/
Working with XML Documents and Python - Python
http://www.devshed.com/c/a/Python/Working-with-XML-Documents-and-Python/
XML DB python record - Google Search
http://www.google.com.tw/search?aq=f&gcx=w&ix=c2&sourceid=chrome&ie=UTF-8&q=XML+DB+python+record
Sorting XML Records with ElementTree
http://effbot.org/zone/element-sort.htm
python - Searching a normal query in an inverted index - Stack Overflow
http://stackoverflow.com/questions/3944910/searching-a-normal-query-in-an-inverted-index

Data


CSE 124: Lab 3 - Distributed Search Engine
http://cseweb.ucsd.edu/classes/fa07/cse124/lab3/index.html