Saturday, August 25, 2012

Daily Bookmarks 20120825

Why the shutdown of the B2C site 番茄树 was inevitable: an inborn lack of business sense - UCD大社区
http://ucdchina.com/snap/5020
-end-

Sunday, August 19, 2012

Daily Bookmarks 20120819

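Nutch-related (1): a study of crawlers - stme - BlogJava
http://www.blogjava.net/stme/archive/2007/04/18/91788.html
Doug Cutting (profile of the founder of Lucene, Nutch, and Hadoop)_Big Cloud and cloud computing technology discussion_Groups_移动Labs A short story and his thoughts on search engines
http://labs.chinamobile.com/groups/10219_12301
A dictionary-based forward maximum matching algorithm for Chinese word segmentation that handles mixed Chinese, English, and numeric text - lucene + hadoop distributed parallel computing search framework - BlogJava
http://www.blogjava.net/nianzai/archive/2011/08/04/355786.html
Nutch-related (2): word segmentation algorithms - stme - BlogJava
http://www.blogjava.net/stme/archive/2007/01/05/90111.html
Nutch 1.3 study notes 8: LinkDb - lemo's column - Blog channel - CSDN.NET A very thorough analysis of linkdb
http://blog.csdn.net/amuseme_lu/article/details/6730756
Source code reading notes (2) --- nutch (Injector) Explains crawlerDB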
http://blog.sheimi.me/blog/2012/05/17/source-code-02.html

Lucene: an introduction to the Java-based full-text search engine (notes by 车东)
http://www.chedong.com/tech/lucene.html
Lucene study summary, part 3: Lucene's index file format (1) - 觉先 - 博客园 A very good article
http://www.cnblogs.com/forfuture1978/archive/2009/12/14/1623597.html
Lucene study summary, part 3: Lucene's index file format (2) - 觉先 - 博客园
http://www.cnblogs.com/forfuture1978/archive/2009/12/14/1623599.html
On Hadoop and distributed Lucene
http://www.chinacloud.cn/show.aspx?id=50&cid=12
-end-

Daily Bookmarks 20120818

jlhutch/pylru
https://github.com/jlhutch/pylru

Database War Stories #3: Flickr - O'Reilly Radar
http://radar.oreilly.com/2006/04/database-war-stories-3-flickr.html
Sharding and IDs at Instagram | Hacker News
http://news.ycombinator.com/item?id=3058327
Sharding & IDs at Instagram - Instagram Engineering
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
Instagram Engineering Challenge: The Unshredder - Instagram Engineering
http://instagram-engineering.tumblr.com/post/12651721845/instagram-engineering-challenge-the-unshredder
High Scalability - Instagram Architecture: 14 Million users, Terabytes of Photos, 100s of Instances, Dozens of Technologies
http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html
High Scalability - Instagram Architecture Update: What’s new with Instagram? How shards are defined and how they are laid out in advance
http://highscalability.com/blog/2012/4/16/instagram-architecture-update-whats-new-with-instagram.html
High Scalability - Search
http://highscalability.com/display/Search?searchQuery=instagram&moduleId=4876569&moduleFilter=&categoryFilter=&startAt=0
High Scalability - An Unorthodox Approach to Database Design : The Coming of the Shard
http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html
High Scalability - Flickr Architecture
http://highscalability.com/flickr-architecture
flickr database design - Google Search
https://www.google.com/search?q=photo+share+database+design&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#hl=zh-TW&client=firefox-a&hs=108&rls=org.mozilla:en-US%3Aofficial&q=flickr++database+design&oq=flickr++database+design&gs_l=serp.3..0i30.4115.6690.7.6869.14.14.0.0.0.4.346.2210.6j4j2j2.14.0...0.0...1c.LtH9DmpW8yA&bav=on.2,or.r_gc.r_pw.r_cp.&fp=df7fd97ffab41db1&biw=1132&bih=597

lyxint/shurl
https://github.com/lyxint/shurl

-end-

Thursday, August 16, 2012

Daily Bookmarks 20120815

http://www.bittorrent.org/beps/bep_0005.html#dht-queries
python - The easiest DHT to implement - Stack Overflow
http://stackoverflow.com/questions/1704646/the-easiest-dht-to-implement
Consistent hashing and distributed hash tables | 呆鸥
http://www.dullgull.com/2012/05/%e4%b8%80%e8%87%b4%e6%80%a7%e5%93%88%e5%b8%8c%e5%92%8c%e5%88%86%e5%b8%83%e5%bc%8f%e5%93%88%e5%b8%8c%e8%a1%a8/

Building a distributed index with Hadoop « Search Technology Blog - Taobao
http://www.searchtb.com/2012/04/distribute_index_build.html

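A roundup of top technical websites and blogs - MoTadou - 博客园
http://www.cnblogs.com/motadou/archive/2012/06/10/2544173.html
Analyzing the title information of Chinese product listings « Search Technology Blog - Taobao
http://www.searchtb.com/2012/05/product-chinese-title-analysis.html
Books I can never finish reading, weekends I can never stop tinkering through - Flower Fly - 博客大巴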
http://flower.blogbus.com/logs/86156724.html

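Search R&D Department Official Blog » Blog Archive » Implementing hash-based traffic splitting for multi-layer experiments
http://stblog.baidu-tech.com/?p=1942
Search R&D Department Official Blog
http://www.baidu-tech.com/
Search R&D Department Official Blog » Blog Archive » Global sorting of large-scale data with hadoop
http://stblog.baidu-tech.com/?p=397
Search R&D Department Official Blog » Blog Archive » The granularity problem in search engines
http://stblog.baidu-tech.com/?p=1429
Search R&D Department Official Blog » Blog Archive » An introduction to the Boosting algorithm
http://stblog.baidu-tech.com/?p=19
Search R&D Department Official Blog » Blog Archive » A segmentation algorithm and segmentation framework based on similarity computation in the principal feature space
http://stblog.baidu-tech.com/?p=1383
Search R&D Department Official Blog » Blog Archive » The mystery behind search: a brief look at semantic topic computation
http://stblog.baidu-tech.com/?p=1190
Amazon vs. Dangdang SEO case study: using "朱镕基讲话实录" as the example
http://i-baidu.com/2012/03/amazon-dangdang-seo-case-study/
Notes on focused crawling « Search Technology Blog - Taobao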
http://www.searchtb.com/2011/01/an-introduction-to-crawler.html
-end-

Monday, August 13, 2012

Daily Bookmarks 20120813

How to build a multi-level menu with CSS? | Zespia
http://zespia.tw/blog/2012/01/24/css-multi-level-menu/
Scroll/Follow Sidebar, Multiple Techniques | CSS-Tricks
http://css-tricks.com/scrollfollow-sidebar/
Vertical Menu by E-Lusion - CSS Portal
http://www.cssportal.com/vertical-menus/elusion2.htm

Issue 10 - twitdao - How do I configure multiple users? - A web based twitter client on Google App Engine. - Google Project Hosting
http://code.google.com/p/twitdao/issues/detail?id=10
Getting NeedIndexError: do I have to wait for indexing to finish? - V2EX
http://www.v2ex.com/t/2416
GAE now supports Chinese full-text search
http://www.keakon.net/2012/05/09/GAE%E5%B7%B2%E6%94%AF%E6%8C%81%E4%B8%AD%E6%96%87%E5%85%A8%E6%96%87%E6%90%9C%E7%B4%A2
-end-

Sunday, August 12, 2012

Daily Bookmarks 20120812

How to Implement a Search Engine Part 2: Query Index | Arden Dertat
http://www.ardendertat.com/2011/05/31/how-to-implement-a-search-engine-part-2-query-index/
Implementing Search Engines | Arden Dertat
http://www.ardendertat.com/2012/01/11/implementing-search-engines/
Write your first MapReduce program in 20 minutes | Michael Nielsen
http://michaelnielsen.org/blog/write-your-first-mapreduce-program-in-20-minutes/
Inverted Index – Map Reduce Program – 尘埃落定
http://www.lovelucy.info/inverted-index-mapreduce.html
external sorting - python - Python Documentation Center - ChinaUnix.net
http://bbs.chinaunix.net/thread-1625081-1-1.html
External sorting of large datasets : umbrant
http://www.umbrant.com/blog/2011/external_sorting.html
Ashish Sharma's Tech Blog: Inverted index - Lucene Data Structure
http://www.ashishsharma.me/2011/10/inverted-index-lucene-data-structure.html
Cloud9: A MapReduce Library for Hadoop » Exercises » Inverted Indexing » Solutions
http://lintool.github.com/Cloud9/docs/exercises/indexing-solutions.html

MapReduce from the basics to the actually useful (in under 30 minutes)
http://blog.cloudant.com/mapreduce-from-the-basics-to-the-actually-useful/
-end-

Tuesday, August 07, 2012

Friday, August 03, 2012

Daily Bookmarks 20120802

[Development] Driver for AF9035-based USB DVB-T devices - proposed for kernel 3.5 • Forum Ubuntu-it
http://forum.ubuntu-it.org/viewtopic.php?t=516182
FC17 DVB-T Firmware not found (anymore), upgrade problem - Ask Fedora: Community Knowledge Base and Support Forum
http://ask.fedoraproject.org/question/1745/fc17-dvb-t-firmware-not-found-anymore-upgrade
USB DVB Avermedia Volar HD (Page 1) / Kernel & Hardware / Arch Linux Forums
https://bbs.archlinux.org/viewtopic.php?id=130457
-end-