Friday, December 30, 2011

Daily Bookmarks 20111230

Can Your Programming Language Do This? - Joel on Software
http://www.joelonsoftware.com/items/2006/08/01.html
Procs: Run Python Functions in Parallel Processes (Archived domnit.org blog)
http://domnit.org/blog/2007/10/procs.html
Some Notes on Tim Bray's Wide Finder Benchmark
http://effbot.org/zone/wide-finder.htm
http://domnit.org/misc/wf-proc.py
Some thoughts on concurrency « Isotoma Blog
http://blog.isotoma.com/2008/05/some-thoughts-on-concurrency/
Wide Finder
http://dalkescientific.com/writings/diary/archive/2007/10/07/wide_finder.html#wf_effbot
other Wide Finder implementations
http://www.dalkescientific.com/writings/diary/archive/2007/10/10/other_wide_finder_implementations.html
Bill de hÓra: wfinder_serial.py
http://www.dehora.net/journal/2007/10/wfinder_serialpy.html


http://effbot.org wide finder - Google Search
https://www.google.com/search?q=http://effbot.org+wide+finder&hl=zh-TW&client=firefox-a&hs=qpz&rls=org.mozilla:zh-TW:official&prmd=imvns&ei=EZ78TpTiF8famAXoxpyeAg&start=10&sa=N&biw=1132&bih=595

Chinese text comparison - 动态感觉 静观其变 - Ycool Blog
http://xlp223.ycool.com/post.1465895.html
Coding Horror: Exploring Wide Finder
http://www.codinghorror.com/blog/2008/06/exploring-wide-finder.html
girtby.net – Wide Finder 2: The Widening
http://girtby.net/archives/2008/07/03/wide-finder-2-the-widening/

Programming Pearls study notes (1) | 梦想家的Blog
http://www.hoopercao.com/2011/01/25/%E7%BC%96%E7%A8%8B%E7%8F%A0%E7%8E%91-programming-pearls-%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0%EF%BC%88%E4%B8%80%EF%BC%89/
Baidu interview question: how to find the "brother words" (anagrams) in a dictionary_IT Interview Questions_Baidu Space hash map
http://hi.baidu.com/mianshiti/blog/item/33590e3786e89c305bb5f592.html
Programming Pearls study notes (2) | 梦想家的Blog
http://www.hoopercao.com/2011/01/26/%e7%bc%96%e7%a8%8b%e7%8f%a0%e7%8e%91-programming-pearls-%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%ba%8c%ef%bc%89/
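Baidu interview question: what is the fastest way to do forward maximum matching word segmentation?_IT Interview Questions_Baidu Space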
http://hi.baidu.com/mianshiti/blog/item/957d2af0bd50afc00b46e079.html
Beautiful Hash_liangrt_fd's space_Baidu Space
http://hi.baidu.com/liangrt_fd/blog/item/3f034742d28123046a63e51c.html
Xunlei interview question: merging users' basic information with their movie-viewing records_IT Interview Questions_Baidu Space
http://hi.baidu.com/mianshiti/blog/item/546acdc74ab9bea28326ace4.html
Baidu interview question: determining the type of a URL_IT Interview Questions_Baidu Space
http://hi.baidu.com/mianshiti/blog/item/bef194b4a34af6ff30add10f.html
Python | Learning Python Step by Step - Part 3
http://www.91python.com/archives/tag/python/page/3
Multiprocessing: speeding up processing (text data)_foricee's space_Baidu Space
http://hi.baidu.com/foricee/blog/item/96f49f2679a4cf174d088d7f.html


marxy's musing on technology: python multiprocessing pays off
http://blog.marxy.org/2010/07/python-multiprocessing-pays-off.html
awk: computing the sum of a column of numbers (车东[Blog^2])
http://www.chedong.com/blog/archives/000682.html
9/11: what were you doing on this day four years ago? (车东[Blog^2]) gais
http://www.chedong.com/blog/archives/000987.html
How exactly should "index" be understood? - Beginner Topics - Java - ITeye Forum
http://www.iteye.com/topic/1038366
Degeneracy algorithm: implementing automatic text clustering_刀剑笑_Sina Blog
http://blog.sina.com.cn/s/blog_57cae499010009l8.html
Yesterday's focus: more on classifying and clustering machine-generated news, with recommended related articles - - ITeye Columns
http://www.iteye.com/wiki/blog/940729
How should tables for complex product categories be designed? - Database - Tech - ITeye Forum
http://www.iteye.com/topic/26987
Automatic data categorization - Enterprise Applications - Java - ITeye Forum
http://www.iteye.com/topic/1014463
Classification and clustering - - ITeye
http://samuschen.iteye.com/blog/562352
Programming Collective Intelligence, Chapter 3: a brief look at document clustering - mdyang - cnblogs
http://www.cnblogs.com/mdyang/archive/2011/07/14/PCI-ch3.html
小默的研究中心 » Data clustering
http://wpxiaomo.sinaapp.com/archives/426
Data-intensive Text Processing with MapReduce: reading notes (index page), last updated 2011.7.23 - mdyang - cnblogs
http://www.cnblogs.com/mdyang/archive/2011/06/29/data-intensive-text-prcessing-with-mapreduce-contents.html
Solving the memory bottleneck of the MapReduce stripes pattern with divide and conquer - mdyang - cnblogs
http://www.cnblogs.com/mdyang/archive/2011/07/21/mapreduce-stripe-vocabulary-divide-and-conquer.html
Text clustering research | BT的花
http://www.dup2.org/node/1015












Thursday, December 29, 2011

Daily Bookmarks 20111229

language features - Why doesn't a python dict.update() return the object? - Stack Overflow
http://stackoverflow.com/questions/1452995/why-doesnt-a-python-dict-update-return-the-object
python - Memory efficiency: One large dictionary or a dictionary of smaller dictionaries? - Stack Overflow
http://stackoverflow.com/questions/671403/memory-efficiency-one-large-dictionary-or-a-dictionary-of-smaller-dictionaries
Hash Table Benchmarks
http://incise.org/hash-table-benchmarks.html

Simple, Complex and Complicated: Python's Generator
http://voidpp.blogspot.com/2007/04/pythongenerator.html
Python dictionary implementation | Laurent Luce's Blog
http://www.laurentluce.com/posts/python-dictionary-implementation/
Is a Python dictionary an example of a hash table? - Stack Overflow
http://stackoverflow.com/questions/114830/is-a-python-dictionary-an-example-of-a-hash-table

The Unexpected Ubiquity of Spam Detection Algorithms - Technology Review
http://www.technologyreview.com/blog/mimssbits/26579/

grouping
jQuery Sortable - Limit number of items in list - Stack Overflow
http://stackoverflow.com/questions/2438516/jquery-sortable-limit-number-of-items-in-list
Group list items with jQuery - Stack Overflow
http://stackoverflow.com/questions/6110571/group-list-items-with-jquery
Advance grouping of List Items - Jquery - Stack Overflow
http://stackoverflow.com/questions/1666967/advance-grouping-of-list-items-jquery
jQuery Table Plugin with Group By - Stack Overflow
http://stackoverflow.com/questions/1576908/jquery-table-plugin-with-group-by


IOError: [Errno 32] Broken pipe - Google Search
https://www.google.com/search?q=IOError:+%5BErrno+32%5D+Broken+pipe&hl=zh-TW&client=firefox&hs=6CD&rls=org.mozilla:zh-TW:official&prmd=imvnsfd&source=lnt&tbs=lr:lang_1zh-CN%7Clang_1zh-TW&lr=lang_zh-CN%7Clang_zh-TW&sa=X&ei=kTD8TqTZMfHtmAXy2vinCA&ved=0CAgQpwUoAQ&biw=1280&bih=876
I am LAZY bones ? : child processes in Python: subprocess
http://luy.li/2010/04/14/python_subprocess/
Recommended by Ryutlis: when a Python script's output is piped on Unix, an IOError: [Errno 32] Broken pipe error appears after output has accumulated for a while
http://www.douban.com/people/ryutlis/rec/93655626/
Python Error/Exception: IOError: [Errno 32] Broken pipe | Jay Taylor
http://jaytaylor.com/blog/2009/11/06/python-errorexception-ioerror-errno-32-broken-pipe/
How to handle a broken pipe (SIGPIPE) in python? - Stack Overflow
http://stackoverflow.com/questions/180095/how-to-handle-a-broken-pipe-sigpipe-in-python
花&&猪 » Blog Archive » subprocess revisited
http://www.liuzhongshu.com/code/subprocess-detail.html



Looking back at Scrapy » Libear | Focused on the latest Internet technologies
http://blog.libears.com/2011-06-11/python/%E5%9B%9E%E9%A1%BEscrapy
铁丑磨成针 » Studying how MySQL's SELECT…ORDER BY works
http://tiechou.info/?p=46
Scrapy, an easy-to-use little crawler – 鼻有鼻有鼻有涕
http://bububut.com/2011/07/%E6%98%93%E7%94%A8%E5%B0%8F%E7%88%AC%E8%99%ABscrapy/










Wednesday, December 28, 2011

Daily Bookmarks 20111228

Aproximative string matching - Python
http://bytes.com/topic/python/answers/390301-aproximative-string-matching
The soundex module
http://effbot.org/librarybook/soundex.htm
Projects FuGrep
http://www.j-raedler.de/projects/
FuGrep 0.50 : Python Package Index
http://pypi.python.org/pypi/FuGrep/0.50
OneZ Studio: the forms of a Python Egg
http://onezstudio.blogspot.com/2006/04/python-egg.html

An Introduction to Python Lists
http://effbot.org/zone/python-list.htm#performance
Python Hash Algorithms
http://effbot.org/zone/python-hash.htm

My program is too slow. How do I speed it up?
http://effbot.org/pyfaq/my-program-is-too-slow-how-do-i-speed-it-up.htm

wiLdGoose » Deploying lightweight Subversion on 64-bit FreeBSD
https://www.xuchao.org/technology/subversion_daemon_for_freebsd.html
Python post: RE: [python-chinese] A question about multi-pattern string matching - 哲思
http://www.zeuux.org/group/python/bbs/content/37932/
The WM algorithm - ChinaUnix Blog - IT people sharing a happy life with you
http://blog.chinaunix.net/space.php?uid=20435679&do=blog&id=1680201
Exact multi-pattern string matching (profanity/sensitive-word search algorithm): the TTMP algorithm, the theory - Sumtec - cnblogs
http://www.cnblogs.com/sumtec/archive/2008/02/01/1061742.html
The Wu-Manber algorithm - 知足常乐 - BlogBus
http://wzgyantai.blogbus.com/logs/46021622.html
Wu-Manber, a classic multi-pattern matching algorithm - 小北的家 - Blog Channel - CSDN.NET
http://blog.csdn.net/iJuliet/article/details/4206487
多模式字符串匹配 python (multi-pattern string matching python) - Google Search
https://www.google.com/search?q=%E5%A4%9A%E6%A8%A1%E5%BC%8F%E5%AD%97%E7%AC%A6%E4%B8%B2%E5%8C%B9%E9%85%8D+python&hl=zh-TW&client=firefox&hs=k2b&rls=org.mozilla:zh-TW:official&prmd=imvns&ei=0jj7TrC1Aqv2mAWk1OAl&start=10&sa=N&biw=1280&bih=904

Searching and Replacing Replace multiple string pairs in one go (File: MultiReplace.py)
http://effbot.org/zone/python-replace.htm#multiple

Wu-Manber in Python
http://blog.kzfmix.com/entry/1197552786
Nikita's blog: Fuzzy string search
http://ntz-develop.blogspot.com/2011/03/fuzzy-string-search.html
A brief discussion of geohash - K_Reverter - cnblogs
http://www.cnblogs.com/step1/archive/2009/04/22/1441689.html

Charming Python: Beat spam using hashcash
http://www.ibm.com/developerworks/linux/library/l-hashcash/index.html
Dspam Python Module
http://bmsi.com/python/dspam.html
Spam Filtering Techniques: -- Comparing a Half-Dozen Approaches to Eliminating Unwanted Email --
http://gnosis.cx/publish/programming/filtering-spam.html
Learning Spam and Ham
http://infocenter.guardiandigital.com/manuals/SecureMail/node80.html

Perl Hashes
http://www.misc-perl-info.com/perl-hashes.html#findoutph
Substring search algorithm
http://volnitsky.com/project/str_search/index.html
Python - Dictionary Data Type
http://www.tutorialspoint.com/python/python_dictionary.htm
Counter class « Python recipes « ActiveState Code
http://code.activestate.com/recipes/576611/
Converting a single ordered list in python to a dictionary, pythonically - Stack Overflow
http://stackoverflow.com/questions/1639772/converting-a-single-ordered-list-in-python-to-a-dictionary-pythonically
Code Like a Pythonista: Idiomatic Python
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
Python: List to Dictionary - Stack Overflow
http://stackoverflow.com/questions/4576115/python-list-to-dictionary
Looks Like It - The Hacker Factor Blog
http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
Practical: A Spam Filter
http://www.gigamonkeys.com/book/practical-a-spam-filter.html
Python Programming - Email Whitelist for Spam Filters - Building an Email Whitelist to Train Your Spam Filter
http://python.about.com/od/pythonstandardlibrary/ss/email_whitelist.htm
Python: Bad Words Filter - jeff00seattle
http://sites.google.com/site/jeff00seattle/Home/python-coding/python--bad-words-filter



Tuesday, December 27, 2011

Daily Bookmarks 20111227

Making Arch Linux's pacman fly | Rest Valley
http://lihdd.net/2010/05/archlinux-pacman-accelerate/
Never use MongoDB? Really?! | 酷壳 - CoolShell.cn
http://coolshell.cn/articles/5826.html

High Scalability - High Scalability - Product: Scribe - Facebook's Scalable Logging System
http://highscalability.com/product-scribe-facebooks-scalable-logging-system
benchmark.py - urllib3 - Python HTTP library with thread-safe connection pooling and file post support. - Google Project Hosting
http://code.google.com/p/urllib3/source/browse/test/benchmark.py

15.3. Zend_Service_Amazon
http://oss.org.cn/ossdocs/php/zend/ZendFramework-0.1.5/documentation/end-user/zh/zend.service.amazon.html
Introduction — PyTables 2.3.1 documentation
http://pytables.github.com/usersguide/introduction.html
Word segmentation algorithms in practice | shell's home
http://shell909090.com/blog/2008/10/%E5%88%86%E8%AF%8D%E7%AE%97%E6%B3%95%E7%9A%84%E5%85%B7%E4%BD%93%E5%AE%9E%E8%B7%B5/

Some Notes on Tim Bray's Wide Finder Benchmark
http://effbot.org/zone/wide-finder.htm
The in operator ((An Unofficial) Python Reference Wiki)
http://pyref.infogami.com/in
KingsoftPythoner - cpyug - Kingsoft is always hiring Python talent - CPyUG (Chinese Python User Group): collection and maintenance of mailing-list management notices - Google Project Hosting
http://code.google.com/p/cpyug/wiki/KingsoftPythoner#%E9%87%91%E5%B1%B1%E9%95%BF%E5%B9%B4%E6%8B%9B%E8%81%98Py%E4%BA%BA%E6%89%8D
python-segment - segmentation and classify library written by python - Google Project Hosting
http://code.google.com/p/python-segment/

路迢迢,人遥遥---focusing on C++, Python, food, life, and philosophy django
http://www.lutiaotiao.com/main/tags/15/0/
Wide Finder good site
http://dalkescientific.com/writings/diary/archive/2007/10/07/wide_finder.html#dalke-wf-9
other Wide Finder implementations
http://www.dalkescientific.com/writings/diary/archive/2007/10/10/other_wide_finder_implementations.html
Go deh!: Wide Finder on the command line
http://paddy3118.blogspot.com/2007/10/wide-finder-on-command-line.html
ongoing by Tim Bray · The Wide Finder Project
http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder


Douban: show only the original poster, pyquery version - Code Sharing - OSChina
http://www.oschina.net/code/snippet_87626_7691
Python3筆摘: [pyquery] a great tool for scraping web page data
http://nopython.blogspot.com/2011/11/pyquery.html
PyQuery Tutorial: Basic HTML Parsing with PyQuery | Vert Studios
http://www.vertstudios.com/blog/pyquery-tutorial-basic-html-parsing-pyquery/
Scraping data with pyquery - Tomda生活點滴
http://543.vipe.idv.tw/2011/06/pyquery.html

A challenge for future mathematicians: the NP problem
http://episte.math.ntu.edu.tw/articles/mm/mm_10_2_04/index.html



Friday, December 23, 2011

Daily Bookmarks 20111223

14.1. hashlib — Secure hashes and message digests — Python v2.7.2 documentation
http://docs.python.org/library/hashlib.html#module-hashlib
Python: removing duplicate elements from a sequence s
http://proupy.com/news/50
Consistent hashing implemented simply in Python - amix.dk
http://amix.dk/blog/post/19367
Source Checkout - hashdb - Library and Application for building database/s of file hash values - Google Project Hosting
http://code.google.com/p/hashdb/source/checkout
pda/flexihash - GitHub
https://github.com/pda/flexihash#readme
Entity Crisis: Consistent Hashing in Python
http://entitycrisis.blogspot.com/2010/05/consistent-hashing-in-python.html
memcached's distributed algorithm: Consistent Hashing « 排头兵 @ Talk
http://www.paitoubing.cn/blog/memcached_consistent_hashing
consistent-hashing in python — Gist
https://gist.github.com/1341846
The consistent hashing algorithm - consistent hashing - sparkliang's column - Blog Channel - CSDN.NET
http://blog.csdn.net/sparkliang/article/details/5279393
memcached in depth – 4. memcached's distributed algorithm - idv2
http://tech.idv2.com/2008/07/24/memcached-004/
Solr sint field hashcode collisions as high as 99%, causing a solr-memcached bug - Bory.Chan
http://blog.chenlb.com/2009/06/solr-sint-hashcode-conflict-cause-solr-memcached-bug.html


Thursday, December 22, 2011

Daily Bookmarks 20111222

Clustering text in Python - Stack Overflow
http://stackoverflow.com/questions/1789254/clustering-text-in-python
Cython in three minutes - 赖勇浩的编程私伙局 - Blog Channel - CSDN.NET
http://blog.csdn.net/lanphaday/article/details/4561611
Speed up your Python: Unladen vs. Shedskin vs. PyPy vs. Cython vs. C « Geet Duggal
http://geetduggal.wordpress.com/2010/11/25/speed-up-your-python-unladen-vs-shedskin-vs-pypy-vs-c/
Ian Bicking: a blog
http://blog.ianbicking.org/
CPython vs PyPy vs Cython
http://jaredforsyth.com/blog/2010/jul/21/cpython-vs-pypy-vs-cython/
solem's vision blog: Hierarchical Clustering in Python
http://www.janeriksolem.net/2009/04/hierarchical-clustering-in-python.html
Amazon: view all brands
http://www.amazon.cn/gp/search/other/ref=sv_cps_0?ie=UTF8&n=665002051&pickerToList=brandtextbin
A small book website generated by Python scripts
http://www.douban.com/group/topic/6043510/
Extracting links easily with Python
http://www.blogkid.net/archives/1827.html
python clustering - Google Search
https://www.google.com/search?q=python+clustering&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:zh-TW:official&client=firefox
Rebill's Blog » Changing Ubuntu ulimit limits
http://blog.rebill.info/archives/modify-ubuntu-ulimit-restrictions.html
洛阳铲's journal : crawler collection
http://www.lyc.name/crawler-collection.html
Improving system performance with ulimit | 飘渺的风 | Personal reflections on life, study, and work
http://www.huanxiangwu.com/631/improve-system-performance-by-ulimit
sitescraper - Web scraping made simple - Google Project Hosting
http://code.google.com/p/sitescraper/
Learning Python by writing a screen scraper | Composite
http://bookmaniac.org/learning-python-by-writing-a-screen-scraper/
Write a Screen Scraper with Python - Prodigy Productions, LLC - Learn software development, computer vision, and A.I. programming.
http://www.prodigyproductionsllc.com/articles/programming/write-a-screen-scraper-with-python/

程式旅人 - 學習紀事 -: List Comprehension in Python (列表綜合?)
http://nio127.blogspot.com/2008/10/pythonlist-comprehension.html


other
王元涛's Blog: [Notes] Douban campus recruiting talk
http://todwang.blogspot.com/2007/12/blog-post.html
My first year at Taobao: part one, life and work - JasonLee's column - Blog Channel - CSDN.NET
http://blog.csdn.net/jasonblog/article/details/7026193
Notes on the first PyCon China - JasonLee's column - Blog Channel - CSDN.NET
http://blog.csdn.net/jasonblog/article/details/7040420

What on earth is "Pythonic"? - 赖勇浩的编程私伙局 - Blog Channel - CSDN.NET
http://blog.csdn.net/lanphaday/article/details/2762251
Python en:More - Notes
http://www.swaroopch.com/notes/Python_en:More#List_Comprehension
王元涛's Blog: some statistical properties of the Netflix data
http://todwang.blogspot.com/2008/11/netflix-data.html
王元涛's Blog: Second-hand Hash
http://todwang.blogspot.com/2009/02/second-hand-hash.html
xlvector – Recommender System - The effectiveness of recommender systems: what percentage is it really at Amazon?
http://xlvector.net/blog/?p=802

How to use curl_multi() without blocking
http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
蚊子館: Varnish – installation
http://linux-guys.blogspot.com/2011/01/varnish.html
amazon robots.txt
http://www.amazon.com/robots.txt
Amazon ASIN listing and similarity graph : Free Download & Streaming : Internet Archive
http://www.archive.org/details/amazon_similarity_graph/
Internet Archive: Details: Amazon ASIN listing and similarity graph data source | Infochimps
http://www.infochimps.com/sources/internet-archive-details-amazon-asin-listing-and-similarity-grap
Infogami Dev Site (Infogami)
http://infogami.org/
(theinfo)
http://theinfo.org/
Aaron Swartz
http://www.aaronsw.com/
Introduction to Xapian and how to use it » OpenSalon
http://www.opensalon.org/blog/2011/05/xapian-intro
Learning Xapian 1: building the inverted data(base), applying the factory pattern - goldenlock - cnblogs
http://www.cnblogs.com/rocketfan/archive/2010/08/09/1796054.html

Wednesday, December 21, 2011

Daily Bookmarks 20111221

Multi-level scraping of product data from Joyo (卓越网) | GooSeeker
http://www.gooseeker.com/cn/node/document/metaseeker/cookbookv4/multilayers.html
Write Your Own Web Crawler (自己动手写网络爬虫, with 1 CD-ROM) / 罗刚 - Books - Joyo Amazon (卓越亚马逊) [search engine]
http://www.amazon.cn/%E8%87%AA%E5%B7%B1%E5%8A%A8%E6%89%8B%E5%86%99%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB-%E7%BD%97%E5%88%9A/dp/product-description/B0047T6B4O/ref=dp_proddesc_0?ie=UTF8&s=books
Posting one of my crawlers first: an Amazon price tracker - 未名空间 (mitbbs.com)
http://www.mitbbs.com/clubarticle_t/Net_Parser/29458563.html
Review of outstanding cases: building a product review site with mashup and crawler techniques - 软酷快讯
http://express.ruanko.com/ruanko-express_15/webpage/tech2.html
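A Python crawler for scraping information from Amazon, Dangdang, and Douban. - Husw!OnRoad 在路上 - the blog of 我就是个世界 -------- life is like a journey, so let the soul travel~ I am always on the road...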
http://www.husw.net/blog/post/1033/
A Python crawler for scraping information from Amazon, Dangdang, and Douban. - Husw!OnRoad 在路上 - 我就是个世界, WyattWang's Maiku (麦库) notebook
http://note.sdo.com/u/1445822006/n/mbnUS~jpc8wwLX028001cZ
Extracting links easily with Python
http://www.blogkid.net/archives/1827.html
程式扎記: [ Java Crawler ] Distributed crawler: distributed storage
http://puremonkey2010.blogspot.com/2011/12/java-crawler.html
Getting Amazon product IDs from Baidu with Python - Code Sharing - OSChina
http://www.oschina.net/code/snippet_220262_7783
xlvector – Recommender System - Python crawler
http://xlvector.net/blog/?p=83







Monday, December 19, 2011

Daily Bookmarks 20111219

Scraping Amazon data in Java (including Signature generation) | I'm Donkey
http://imdonkey.com/blog/archives/60

Princess Polymath » Blog Archive » Accessing Amazon’s Product Advertising API with Python
http://www.princesspolymath.com/princess_polymath/?p=182

python-amazon-product-api 0.2.5 : Python Package Index
http://pypi.python.org/pypi/python-amazon-product-api/

Basic usage — python-amazon-product-api v0.2.5 documentation
http://packages.python.org/python-amazon-product-api/basic-usage.html
Product Advertising API
http://docs.amazonwebservices.com/AWSECommerceService/2009-11-01/DG/index.html?BrowseNodeIDs.html
Hack 8 Browse and Search Categories with Browse Nodes :: Chapter 1. Browsing and Searching :: Amazon hacks. Tips and tools :: Misc :: eTutorials.org
http://etutorials.org/Misc/amazon+tips+tools/Chapter+1.+Browsing+and+Searching/Hack+8+Browse+and+Search+Categories+with+Browse+Nodes/
About shishijia.com | 网络购物时时价
http://www.trachina.com/diary/%E6%97%B6%E6%97%B6%E4%BB%B7%E9%A1%B9%E7%9B%AE/shishijia
Working With the "One-Second" Rule
http://www.a2sdeveloper.com/page-working-with-the-one-second-rule.html
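时时价 (shishijia) - a price-comparison site with guaranteed genuine products, offering the latest price comparisons, deals, and discount promotions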
http://shishijia.com/

Python Cookbook - 1.23. Encoding Unicode data for output to XML or HTML files | Shine.IT
http://blog.shine-it.net/python/encoding-unicode-data-for-xml-and-html
killer python projects: Making The New Amazon Product API Easy to Work With
http://webcache.googleusercontent.com/search?q=cache:EuB9wvZUaAYJ:pythonprojectwatch.blogspot.com/2011/12/making-new-amazon-product-api-easy-to.html+&cd=3&hl=zh-TW&ct=clnk&client=firefox
Google App Engine + Amazon Product Advertising API | Rutwick Gangurde's Blog
http://blog.rutwick.com/set-up-an-amazon-book-store-on-google-app-engine
How to make a music mashup – Muskblog
http://blog.muschamp.ca/2010/12/31/how-to-make-a-music-mashup/
Querying Amazon ECS with Boto | SysAdminPy
http://www.sysadminpy.com/2011/02/querying-amazon-ecs-with-boto/





Friday, December 16, 2011

Daily Bookmarks 20111216

Steps for compiling and installing Xapian and its Python binding | 弱类型
http://troycheng.blogcn.com/articles/xapian%E7%BC%96%E8%AF%91%E5%AE%89%E8%A3%85%E5%8F%8Apython-binding%E7%9A%84%E6%AD%A5%E9%AA%A4.html
How is Xapian pronounced? | BT的花
http://www.dup2.org/node/1422
World Hello - term handling in Xapian indexes
http://www.worldhello.net/2010/07/31/1613.html#more-1613
Building an index with Xapian (Python version) - System Architecture - python.cn(news, jobs)
http://simple-is-better.com/news/619
A simple PHP implementation of Search Engine Friendly URLs – 某人的栖息地
http://www.ooso.net/archives/174
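Building your own search engine with Xapian: retrieval | 新鲜事 | Following open source, the Internet, games, technology, startups, cloud computing, architecture, mobile, and life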
http://www.liulizhi.info/xapian%e6%9e%84%e5%bb%ba%e8%87%aa%e5%b7%b1%e7%9a%84%e6%90%9c%e7%b4%a2%e5%bc%95%e6%93%8e%ef%bc%9a%e6%a3%80%e7%b4%a2/
Xapian ( Python ): a simple explanation and usage example of TermGenerator - pyman hall - Blog Channel - CSDN.NET Good example
http://blog.csdn.net/zlchina1989/article/details/6777150
Learning Xapian (4) – Faceting Search (Filter) – 四号程序员
http://www.coder4.com/archives/2253
Learning Xapian (1) – basic indexing and searching – 四号程序员
http://www.coder4.com/archives/2218
Xapian | Search Results | Gea-Suan Lin's BLOG
http://blog.gslin.org/?s=Xapian
Building Fulltext Search for Pixnet | Gea-Suan Lin's BLOG
http://blog.gslin.org/archives/2007/06/15/1202/
A few details about Xapian | Gea-Suan Lin's BLOG
http://blog.gslin.org/archives/2007/08/10/1264/xapian-%e7%9a%84%e5%b9%be%e5%80%8b%e7%b4%b0%e7%af%80/
A Comparison of Open Source Search Engines | Vik's Blog
http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
Pumping Up Your Applications with Xapian Full-Text Search | Nadav Samet's Blog
http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/
Tinkering with Xapian, part 3 - twelfthing - cnblogs
http://www.cnblogs.com/twelfthing/articles/1916112.html


Optimizing Python web.py with flup and lighttpd – Tim[后端技术]
http://timyang.net/python/python-webpy-lighttpd/






Wednesday, December 14, 2011

Daily Bookmarks 20111214

Performance comparison of Python string-matching tools | 弱类型
http://troycheng.blogcn.com/articles/python%e5%ad%97%e7%ac%a6%e4%b8%b2%e5%8c%b9%e9%85%8d%e5%b7%a5%e5%85%b7%e6%80%a7%e8%83%bd%e6%af%94%e8%be%83.html
py-contentfilter: a sensitive-word filtering service | 弱类型
http://troycheng.blogcn.com/articles/py-contentfilter%EF%BC%9A%E6%95%8F%E6%84%9F%E8%AF%8D%E8%BF%87%E6%BB%A4%E6%9C%8D%E5%8A%A1.html

Facebook’s photo storage rewrite | Niall Kennedy
http://www.niallkennedy.com/blog/2009/04/facebook-haystack.html
MIT World » : The Akamai Story: From Theory to Practice
http://www.myoops.org/twocw/mitworld/video/199/index.htm
Gentoo system update: fixing mouse and keyboard failure with xorg-server 1.6.1.901-r4 | Focused on Linux security
http://webcache.googleusercontent.com/search?q=cache:Pm3IWju4m04J:www.xslife.net/%3Fp%3D60+&cd=1&hl=zh-TW&ct=clnk&client=firefox
Back to Gentoo after two years (1) -- anyLinux
https://anylinux.net/post/1617.html
Linux study notes (366): mouse and keyboard stop working after upgrading Gentoo to Xorg 1.10 | 王不日天
http://huanhaoadam.wordpress.com/2011/09/08/linux%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0%EF%BC%88%E4%B8%89%E7%99%BE%E5%85%AD%E5%8D%81%E5%85%AD%EF%BC%89%E2%80%94%E2%80%94gentoo%E5%8D%87%E7%BA%A7%E8%87%B3xorg-1-10%E5%90%8E%E9%BC%A0%E6%A0%87%E5%92%8C/
Linux study notes (365): using pointers to reduce the memory overhead of C programs | 王不日天
http://huanhaoadam.wordpress.com/2011/07/06/linux%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%b8%89%e7%99%be%e5%85%ad%e5%8d%81%e4%ba%94%ef%bc%89%e2%80%94%e2%80%94%e5%88%a9%e7%94%a8%e6%8c%87%e9%92%88%e5%87%8f%e5%b0%91c%e7%a8%8b%e5%ba%8f/
Gentoo system update: fixing mouse and keyboard failure with xorg-server 1.6.1.901-r4 | Focused on Linux security
http://www.xslife.net/?p=60

Tuesday, December 13, 2011

Daily Bookmarks 20111213

Implementing a CRUD REST service in Python – Tim[后端技术]
http://timyang.net/python/python-rest/
Some problems with consistent hashing in practice in a distributed application – Tim[后端技术]
http://timyang.net/architecture/consistent-hashing-practice/
Understanding Python's naming mechanism - 赖勇浩的编程私伙局 - Blog Channel - CSDN.NET
http://blog.csdn.net/lanphaday/article/details/1734990
A simple approach to extracting the main text from HTML files - 赖勇浩的编程私伙局 - Blog Channel - CSDN.NET
http://blog.csdn.net/lanphaday/article/details/1741185
Three Python books worth buying - 赖勇浩的编程私伙局 - Blog Channel - CSDN.NET
http://blog.csdn.net/lanphaday/article/details/6204639

Sunday, December 11, 2011

Daily Bookmarks 20111211

Notes on my recent study of keyword extraction - caohao2008's column - Blog Channel - CSDN.NET
http://blog.csdn.net/caohao2008/article/details/3144639
distance.py - nltk - Natural Language Toolkit Development - Google Project Hosting
http://code.google.com/p/nltk/source/browse/trunk/nltk/nltk/metrics/distance.py
The Art of Programming for Programmers: Chapter 3 continued, implementing the Top K problem - 结构之法 算法之道 - Blog Channel - CSDN.NET good site
http://blog.csdn.net/v_JULY_v/article/details/6403777
aMMAI
http://chiehchi.blogspot.com/
Finding the Top K numbers - 就像以往 - 51CTO Tech Blog
http://dongdong1314.blog.51cto.com/389953/366991
zz (repost): finding the longest repeated substring in a text – Programming Pearls (an application of sorted suffix arrays) | Bruce is coding !
https://www.cse.msu.edu/~liyang5/?p=53
Counting word occurrences: hash table, binary tree, standard library - - Blog Channel - CSDN.NET
http://blog.csdn.net/lalor/article/details/7001357
Ten interview questions on massive data processing and a summary of ten methods - 结构之法 算法之道 - Blog Channel - CSDN.NET
http://blog.csdn.net/v_JULY_v/article/details/6279498
Profanity filtering revisited (a hash-based optimized algorithm) - 边城浪 - cnblogs
http://www.cnblogs.com/yeerh/archive/2011/10/20/2219035.html
Improved again! A .NET profanity-filtering algorithm - xingd - cnblogs
http://www.cnblogs.com/xingd/archive/2008/02/01/1061800.html
An efficient keyword filtering and lookup algorithm (Trie KO Hash) - 边城浪 - cnblogs
http://www.cnblogs.com/yeerh/archive/2011/08/24/2152607.html
Notes on real-time computation over massive data | Search Engine Technology Blog
http://flychen.com/article/massive-data-real-time-computation-essays.html
Duplicate detection in search engines | In Programming We Trust
http://ptsolmyr.com/2010/08/13/duplicate_detection/
刘未鹏's new book 《暗时间》 (Dark Time) (complete)
http://www.douban.com/group/topic/20932914/
Storm: Twitter's real-time data processing tool - Forum Reading
http://www.starming.com/index.php?action=plugin&v=wave&mid=34483&tid=15965


Friday, December 09, 2011

Daily Bookmarks 20111209

Approximate string matching - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Approximate_string_matching




Assignment 17: Mutual Information - yam天空部落
http://blog.yam.com/Wfin/article/15631028
pealco/python-mutual-information - GitHub
https://github.com/pealco/python-mutual-information
Applications of Mutual Information - 好工具站长分享平台
http://www.haogongju.net/art/892435

Statistical natural language processing: information theory basics - zhoubl668's column: 远帆,梦之帆! - Blog Channel - CSDN.NET
http://blog.csdn.net/zhoubl668/article/details/6923763
Mutual information (互信息) - Chinese Wikipedia, the free encyclopedia
http://zh.wikipedia.org/wiki/%E4%BA%92%E4%BF%A1%E6%81%AF
Academia Sinica Tagged Corpus of Early Mandarin Chinese
http://db1x.sinica.edu.tw/kiwi/pkiwi/early_mandarin_chinese_c_againhelp.html



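New word discovery based on data mining: Approach for Lexicon Updating Based on ...

d.wanfangdata.com.cn › ... › Application Research of Computers (计算机应用研究), 2006, issue 12
By 王立希, 2006, cited 5 times
Uses text-mining techniques to propose a method for discovering new specialized terms for the domain dictionary of a topic-focused search engine ... P Word Association Norms,Mutual Information and Lexicography 1990(01) ...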

Tuesday, December 06, 2011

Daily Bookmarks 20111206

Two ways to count duplicate text lines | 我爱正则表达式
http://iregex.org/blog/get-duplicated-lines.html
Counting Chinese characters / English words | 我爱正则表达式
http://iregex.org/blog/words-counter-in-python.html
Miscellaneous thoughts on anti-spam | 我爱正则表达式
http://iregex.org/blog/anti-spam.html



Saturday, December 03, 2011

Thursday, December 01, 2011

Daily Bookmarks 20111201

The fastest way to sort a dictionary in Python - 张沈鹏 - 知识梳理
http://zuroc.42qu.com/10037539
descriptor: Python Idiom: sort - 樂多日誌 (Roodo Blog)
http://blog.roodo.com/descriptor/archives/7883223.html

CiteSeerX — Discovering Identities in Web Contexts with Unsupervised Clustering
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.67.9602
/tags/release-0.1/src/colint/ch03/generatefeedvector.py – Colint
http://csrc.tamu-commerce.edu/projs/colint/browser/tags/release-0.1/src/colint/ch03/generatefeedvector.py

Paul Graham | 36氪
http://www.36kr.com/tag/paul-graham/page/3
Finding duplicate files using Python « Endlessly Curious
http://www.endlesslycurious.com/2011/06/01/finding-duplicate-files-using-python/
Get the MD5 Hash of a String - Python - Source Code | DreamInCode.net
http://www.dreamincode.net/code/snippet1851.htm


Automatic Keyword Extraction
https://docs.google.com/viewer?a=v&q=cache:qwsqMHh78I8J:www.l3s.de/~demidova/students/thesis_oelze.pdf+term+frequency+extract+keyword+product&hl=zh-TW&gl=tw&pid=bl&srcid=ADGEESjBgTvNmtNRiGNH4xTrPzw3ZryPhvbQ0MqutWg80zD0eSSOTw4517JkwyvkjivxzD1BkkxBJ4onqQ8YiCp9S_y89rukOkJAAh4AMLmb6fqZpw5AQOaHUeEadIw_XMFyvdq3MdYl&sig=AHIEtbS8553nR9Q9wG7EDtUtj4RBnM0ljg

hash table
11. A thorough, start-to-finish analysis of hash table algorithms - 结构之法 算法之道 - Blog Channel - CSDN.NET Good
http://blog.csdn.net/v_july_v/article/details/6256463
Building the fastest hash table (repost) - 我风 - C++ Blog
http://www.cppblog.com/kyelin/archive/2007/08/21/30506.aspx
Alex's Blog : Building the fastest hash table (a conversation with Blizzard)
http://blog.itpub.net/post/670/16449
uthash: a hash table for C structures
http://uthash.sourceforge.net/
How hash tables work - 未知世界 - ITeye
http://calmness.iteye.com/blog/184465

How Entity Extraction is Fueling the Semantic Web Fire - O'Reilly Broadcast
http://broadcast.oreilly.com/2009/02/how-entity-extraction-is-fueli.html
[totti's blog] Naming is a troublesome thing
http://webcache.googleusercontent.com/search?q=cache:bNykOxSvuxEJ:totti-yang.blogspot.com/+Named+Entity&cd=3&hl=zh-TW&ct=clnk&gl=tw&client=firefox-a
Named-Entity Recognition product - Google Search
http://www.google.com.tw/search?q=Named-Entity+Recognition+product&hl=zh-TW&client=firefox-a&hs=EBV&rls=org.mozilla:zh-TW:official&prmd=imvns&ei=tKrXTs3qLOGRiQfpnOX-DQ&start=10&sa=N&biw=1235&bih=649
Named Entity Extraction
http://infoglutton.com/yooname-named-entity-recognition.html
Named entity recognition with preset list of names for Python / PHP - Stack Overflow
http://stackoverflow.com/questions/4206882/named-entity-recognition-with-preset-list-of-names-for-python-php
Named-entity recognition - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Named-entity_recognition

http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html







