a crawler using wget and xargs
http://www.xcombinator.com/2010/09/06/a-crawler-using-wget-and-xargs/
Some issues with consistent hashing in practice in a distributed application – Tim[后端技术]
http://timyang.net/architecture/consistent-hashing-practice/
kite1988's Column - CSDN Blog dblp
http://blog.csdn.net/kite1988/archive/2010/01.aspx
Text Processing in Python (a book)
http://gnosis.cx/TPiP/
Command-line cURL tutorial | Andy's Blog
http://www.21andy.com/blog/20080602/1154.html
Monday, December 20, 2010
Sunday, December 19, 2010
Daily Bookmarks 20101219
Artificial Intelligence - AI Research Blog
http://ai.mmdays.com/
Renmin University of China Database Research Group: Domain-Oriented Deep Web Data Integration Project
http://idke.ruc.edu.cn/domain_integration/help.htm
仲子说
http://www.wangzhongyuan.com/index.php
RUC DB-IIR Semantic Web and Knowledge Grid Group
http://iir.ruc.edu.cn/project/kg.jsp
林宣華's Knowledge Portal - Knowledge Management
http://www.wke.csie.ncnu.edu.tw/scripts/shlin/index.asp?Sort=&View=&UID=&bClassIMG=&ViewName=&CID=56
piaip's Using (lib)SVM Tutorial
http://ntu.csie.org/~piaip/docs/svm/
The MetaQuerier Project at UIUC
http://metaquerier.cs.uiuc.edu/
QProber: Classifying and Searching "Hidden-Web" Text Databases
http://qprober.cs.columbia.edu/
Deep Web - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Deep_Web
Regular Expressions in grep
http://www.robelle.com/smugbook/regexpr.html
Web数据抽取 (Web data extraction) - Google Search
http://www.google.com.tw/search?q=Web%E6%95%B0%E6%8D%AE%E6%8A%BD%E5%8F%96&hl=zh-TW&prmd=ivns&ei=X9wMTbv_BMPIce6nhLsG&start=20&sa=N
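Knowlesys Web Information Collection System -- for information resource integration, web page data scraping, website crawling, and information collection technology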
http://www.knowlesys.cn/cn/products/web_data_miner.htm
Thursday, December 16, 2010
Wednesday, December 15, 2010
Daily Bookmarks 20101215
Map and Reduce in PHP » Sebastian Bergmann
http://sebastian-bergmann.de/archives/750-Map-and-Reduce-in-PHP.html
Lunchpauze: Writing A Hadoop MapReduce Program In PHP
http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html
Writing An Hadoop MapReduce Program In Python @ Michael G. Noll
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
Documentation | Mendeley Developers Portal
http://dev.mendeley.com/docs/
Easily renaming multiple files.
http://www.debian-administration.org/articles/150
Monday, December 13, 2010
Daily Bookmarks 20101213
Benson's Personal Blog: August 2009
http://bensontw.blogspot.com/2009_08_01_archive.html
Special topic: A complete guide to file-sharing services on Linux_51CTO.COM
http://os.51cto.com/art/201010/231947.htm
Quick NFS configuration tutorial and security policies on Linux (1) - 51CTO.COM
http://os.51cto.com/art/201010/231717.htm
The complete process of installing openvpn on CentOS 5.5 - 51CTO.COM
http://os.51cto.com/art/201011/234004.htm
kite1988's Column - CSDN Blog DBLP Chinese
http://blog.csdn.net/kite1988/archive/2010/01.aspx
Zeal's Blog · Computer engineering majors: how to find papers?
http://zeal.haliluya.org/blog/2006/05/26/how-to-find-computer-engineering-papers/
Benchmarking D2RQ v0.1
http://www4.wiwiss.fu-berlin.de/bizer/d2rq/benchmarks/index01.html
a crawler using wget and xargs
http://www.xcombinator.com/2010/09/06/a-crawler-using-wget-and-xargs/
Thursday, December 09, 2010
Daily Bookmarks 20101209
Consistent hashing - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Consistent_hashing
A Web Crawler in Perl | Linux Journal
http://www.linuxjournal.com/article/2200?page=0,0
Scraping Links With PHP
http://www.developertutorials.com/tutorials/php/scraping-links-with-php-8-01-05-958/
Parsing HTML tags with PHP Simple HTML DOM Parser | Jphp-1号蟋蟀-关注PHP
http://www.init09.com/php/use-php-simple-html-dom-parser-html-tag.html
Parallel web scraping in PHP: cURL multi functions
http://www.developertutorials.com/tutorials/php/parallel-web-scraping-in-php-curl-multi-functions-375/
PHP Simple HTML DOM Parser
http://simplehtmldom.sourceforge.net/index.htm
[PHP] Using cURL + HTTP REFERER + Cookie + File: a homemade my_wget that downloads data and saves it to a file @ 第二十四個夏天後 :: 痞客邦 PIXNET ::
http://changyy.pixnet.net/blog/post/26475606
PHP: fetching multiple web pages simultaneously with cURL | Tsung's Blog
http://plog.longwin.com.tw/programming/2009/10/07/php-multi-thread-curl-2009
A categorized overview of Key-Value systems (NoSQL) | Fred Chu
http://fred.oracle1.com/weblog/2010/02/23/key-value-nosql-system-category-2009/
brad's life - Contributing to Open Source projects
http://brad.livejournal.com/2409049.html
Retrieval
我思故我在_Blog_Retrieval_Baidu Space Good site
http://hi.baidu.com/rodimus/blog/category/%BC%EC%CB%F7
Tuesday, December 07, 2010
Daily Bookmarks 20101207
Advanced gdb
http://www.study-area.org/cyril/opentools/opentools/x1265.html
........: VIM
http://anrris-lab.blogspot.com/2009/08/vim.html
工程師的家 - [Level 5] Star pattern problem -- diamond
http://ehome.hifly.to/showthread.php?threadid=232
Virtual Memory
http://www.cs.duke.edu/~narten/110/nachos/main/node34.html#SECTION00074000000000000000
CSC546 - Operating Systems
http://condor.depaul.edu/~glancast/546class/docs/lec7.html
Features supported by lighttpd | Gea-Suan Lin's BLOG
http://blog.gslin.org/archives/2006/03/09/439/lighttpd-%E6%94%AF%E6%8F%B4%E7%9A%84%E9%A0%85%E7%9B%AE
flv streaming with lighttpd
http://blog.lighttpd.net/articles/2006/03/09/flv-streaming-with-lighttpd
High Scalability - High Scalability - YouTube Architecture
http://highscalability.com/youtube-architecture
YouTube: The Platform
http://techcrunch.com/2008/03/12/youtube-the-platform/
Studying the YouTube architecture - hideto - JavaEye technology site
http://hideto.javaeye.com/blog/129726
Top 10 Largest Databases in the World -Good site
http://www.focus.com/fyi/operations/10-largest-databases-in-the-world/
Hash Maps with linear probing and separate chaining | Daniel Graziotin
http://task3.cc/308/hash-maps-with-linear-probing-and-separate-chaining/
SparkNotes: Hash Tables: Coding up a Hash Table
http://www.sparknotes.com/cs/searching/hashtables/section3.rhtml
ncache - Project Hosting on Google Code
http://code.google.com/p/ncache/
Open-source projects and open platforms used by Sina – 拾豆网
http://www.ctoof.com/archives/3002
Tags:memcacheq - 回忆未来[张宴] - Server system architecture and low-level development
http://blog.s135.com/tags/memcacheq/
An expert's guide to dating! A must-have for single young men and women~ - 希奇古怪志 - 鲜为人志
http://www.i-oo.com/post/607.html
鲜为人摘_鲜为人摘 | 乐自有我
http://www.i-oo.net/a/xianzhai/
WebSocket testing in Chrome « Chui-Wen Chiu's Note
http://chuiwenchiu.wordpress.com/2009/12/14/chrome-%E7%9A%84-websocket-%E6%B8%AC%E8%A9%A6/
Monday, December 06, 2010
Daily Bookmarks 20101206
UCI Data Cleaning and Entity Resolution Project.
http://www.ics.uci.edu/~dvk/GDF/
WEST: Modern Technologies for Web People Search (IEEE ICDE 2009)
http://www.ics.uci.edu/~dvk/pub/ICDE09_dvk_WEST.html
Faceted Search
Faceted Search – Tencent CDC Good site
http://cdc.tencent.com/?p=1401
Search « Alibaba.com UED
http://www.aliued.com/tag/search/
LinkedIn Search: A Look Beneath the Hood
http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/
kafka0102的边城客栈 » Weekly technical document sharing
http://www.kafka0102.com/2010/02/46.html
kafka0102的边城客栈 » Sharing Poppen.de architecture experience
http://www.kafka0102.com/2010/04/96.html
kafka0102的边城客栈 » Building a real-time retrieval system with Zoie
http://www.kafka0102.com/2010/05/119.html
kafka0102的边城客栈 » Twitter's new search architecture
http://www.kafka0102.com/2010/10/347.html
People You May Know — Now With Faceted Search!
http://thenoisychannel.com/2010/05/15/people-you-may-know-now-with-faceted-search/
[Facebook] Counting the number of "Likes" on a specific post in a fan page
http://patw.idv.tw/blog/archives/136
Parallel HTTP requests in PHP using the PECL HTTP classes [Answer: HttpRequestPool class]
http://zh-tw.w3support.net/index.php?db=so&id=168951
7-2 Web page fetching and analysis: advanced
http://mirlab.org/jang/books/perl/getWebPage02.asp?title=7-2%20%BA%F4%AD%B6%A7%EC%A8%FA%BBP%A4%C0%AAR%A1G%B6i%B6%A5%BDg
Roger Jang's Home Page (張智星, National Tsing Hua University)
http://neural.cs.nthu.edu.tw/jang/
Wednesday, December 01, 2010
Daily Bookmarks 20101201
BBS to WordPress via XML-RPC: implementation notes | 野貓的零碎生活片段
http://blog.wildcat.tw/?p=206
Tweetrank
http://tweetrank.me/
HadoopDB Quick Start Guide
http://hadoopdb.sourceforge.net/guide/quick_start_guide.html#SECTION00030000000000000000
HadoopDB Join Testing on 3-Node Cluster @ 第二十四個夏天後 :: 痞客邦 PIXNET ::
http://changyy.pixnet.net/blog/post/25684989
HadoopDB
http://www.bullogger.com/blogs/dbanotes/archives/315847.aspx
[Weekly full text] HadoopDB hybrid database debuts - Weekly full text - CNW.com.cn!
http://www.cnw.com.cn/weekly/htm2009/20090805_179407.shtml
boxing computer
Box computing_Hudong Baike
http://www.hudong.com/wiki/%E6%A1%86%E8%AE%A1%E7%AE%97
Baidu takes on Google, pushing box computing hard - Online Nation - Internet Culture - udn Digital Information
http://mag.udn.com/mag/digital/storypage.jsp?f_ART_ID=208892
Baidu's "box computing" turns one, releases new logo_Xinhuanet
http://big5.xinhuanet.com/gate/big5/news.xinhuanet.com/eworld/2010-08/21/c_12469187.htm
YTS
Yahoo! has released the Traffic Server source code | Gea-Suan Lin's BLOG
http://blog.gslin.org/archives/2009/10/30/2133/yahoo-%E6%8A%8A-traffic-server-%E7%9A%84-source-code-%E6%94%BE%E5%87%BA%E4%BE%86%E4%BA%86
Yahoo releases cloud computing accelerator "Traffic Server": sharing my personal thoughts @ Min's Web Life: 談網路產業研究與生活閒聊 :: 痞客邦 PIXNET ::
http://miin1130.pixnet.net/blog/post/24670966
Yahoo open-sources Traffic Server | BING必应CHENG
http://www.52bingcheng.com/2009/11/01/traffic_server/
Box computing
What is box computing? « Search Engines « 台灣搜尋引擎優化與行銷研究院:SEO:SEM Good site