Wednesday, February 27, 2013

Daily Bookmarks 20130227


"Jieba" Chinese word segmentation: building the best Python word segmentation component - Alan Liu - Blog Channel - CSDN.NET
http://blog.csdn.net/liuzhoulong/article/details/8051676
Data sampling in a distributed Hadoop environment - Alan Liu - Blog Channel - CSDN.NET
http://blog.csdn.net/liuzhoulong/article/details/6965471
Problems related to sorting, finding duplicates in, or deduplicating big data - Alan Liu - Blog Channel - CSDN.NET
http://blog.csdn.net/liuzhoulong/article/details/6972331
Hadoop in China 2011 conference summary - Alan Liu - Blog Channel - CSDN.NET
http://blog.csdn.net/liuzhoulong/article/details/7079042
Hadoop and Hive in practice (3): basic Hive usage - Alan Liu - Blog Channel - CSDN.NET
http://blog.csdn.net/liuzhoulong/article/details/6447075










Thursday, February 21, 2013

Daily Bookmarks 20130221

Hive-0.8.1 index analysis (BitMapIndex) - liwei_1988's column - Blog Channel - CSDN.NET
http://blog.csdn.net/liwei_1988/article/details/7352643
liwei_1988's column - Blog Channel - CSDN.NET
http://blog.csdn.net/liwei_1988?viewmode=contents
Installing an Eclipse development environment for Hive (MySQL) on Ubuntu - liwei_1988's column - Blog Channel - CSDN.NET
http://blog.csdn.net/liwei_1988/article/details/7311691

Analysis of the Hive execution flow - liwei_1988's column - Blog Channel - CSDN.NET
http://blog.csdn.net/liwei_1988/article/details/7333187





Wednesday, February 20, 2013

Daily Bookmarks 20130220

How to implement RESTful authentication - Synopse
http://blog.synopse.info/post/2011/05/24/How-to-implement-RESTful-authentication

Elasticsearch nabs $24M because search is hard, and big data makes it harder
http://pandodaily.com/2013/02/19/elasticsearch-nabs-24m-because-search-is-hard-and-big-data-makes-it-harder/
Hive for Beginners | Orzota Blog
http://orzota.com/blog/hive-for-beginners/

Five tips for writing good Hive programs - Alibaba Group Data Platform alidata.org
http://www.alidata.org/archives/622
Hive SQL syntax explained - Hadoop - SimpleFramework
http://simpleframework.net/blog/v/13954.html
Exporting data from Hive to MySQL
http://abloz.com/2012/07/20/export-data-to-mysql-from-the-hive.html

Some HBase application design tips - Change Dir - BlogJava
http://www.blogjava.net/changedi/archive/2013/01/02/393697.html





Tuesday, February 19, 2013

Daily Bookmarks 20130219

42qu source code architecture guide PowerPoint Presentation
https://docs.google.com/viewer?a=v&q=cache:A5EpzhMvm-wJ:woodpecker4org.b0.upaiyun.com/pyconcn/pycon2012china/121020-bj-slides/11-121020-%25E5%25BC%25A0%25E6%25B2%2588%25E9%25B9%258F-42qu%25E6%25BA%2590%25E7%25A0%2581%25E5%25AF%25BC%25E8%25AF%25BB.pdf+&hl=zh-TW&gl=tw&pid=bl&srcid=ADGEESg6buHwFIRbiOqwhcrPuJDZj2eWR8ZSnxWq6Y2TOCXksmTez04Hxa6vWJQAhgO9vx7fwkjvGO7rVlwAN-niDP_FcEHj-XEnf2PXo6kjdw0w9Iatgla97pIg4mBL49p1-o6m8oqy&sig=AHIEtbTJmggICPsMZhFP6ECmHiME1VupBQ

TopGeek | Harmony in diversity, acting without contending, let curiosity guide us in seeking the way of technology! | Page 3
http://topgeek.org/?paged=3
Agile Data - O'Reilly Media
http://shop.oreilly.com/product/0636920025054.do
mikedewar/d3py · GitHub
https://github.com/mikedewar/d3py#readme
Open Data Application Showcases - Google Drive
https://docs.google.com/presentation/d/1L-Ln-5AMttcjt217uBBgZ64kHz5x2k-DXZq2a8f4X_s/edit#slide=id.gce045725_920
pyvideo.org - Server Log Analysis with Pandas
http://pyvideo.org/video/1593/server-log-analysis-with-pandas
Will's blog
http://www.bluesock.org/~willg/blog/
Will Guaraldi Kahn-Greene
http://bluesock.org/~willg/


Swift Python Script - SDSC Cloud Storage
https://cloud.sdsc.edu/hp/swift.php#large
Python HTTP upload large file « Ashes of Time
http://chapter09.sinaapp.com/?p=824

Transloadit | File upload processing web service
https://transloadit.com/
Exploring the FileSystem APIs - HTML5 Rocks
http://www.html5rocks.com/en/tutorials/file/filesystem/
Streaming uploads to S3 with Python and Poster
http://blog.odonnell.nu/posts/streaming-uploads-s3-python-and-poster/



Why Hadoop will certainly be the future of distributed computing - xiaotom5's column - Blog Channel - CSDN.NET
http://blog.csdn.net/xiaotom5/article/details/8146428

Java development 2.0: Big data analysis with Hadoop MapReduce
http://www.ibm.com/developerworks/cn/java/j-javadev2-15/index.html
Site Admin | FanStock
http://fanstock.mirlab.org/admin/










Monday, February 18, 2013

Daily Bookmarks 20130218

zookeeper
zookeeper - wenfeng762 - Cnblogs
http://www.cnblogs.com/wenfeng762/archive/2011/11/13/2247576.html
Introduction to Zookeeper - waue0920
https://sites.google.com/site/waue0920/Home/zookeeper/zookeeper-jian-jie
zookeeper 3.3 study notes 1: hello world - A Hundred-Thousand-Hour Journey - ITeye technology site
http://eryk.iteye.com/blog/1187284

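Collected architecture analyses of major large-scale websites - Huang Gang's column - Blog Channel - CSDN.NET
http://blog.csdn.net/lovingprince/article/details/3379710
Amazon's Dynamo architecture
http://dbanotes.net/tech-memo/amazon_dynamo.html
Impressions of "Building Facebook"
http://dbanotes.net/review/facebook_5_years.html
Casual notes on web image servers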
http://dbanotes.net/web/web_image_server.html


Alexandru Popescu on the InfoQ.com site architecture
http://www.infoq.com/cn/interviews/popescu-infoq-architecture-cn#
Conquering latency: a Map Reduce data analysis case study discussion
http://www.infoq.com/cn/presentations/conquer-delay-map-reduce-data-analysis-case-studies

Phoenix: running SQL queries on Apache HBase
http://www.infoq.com/cn/news/2013/02/Phoenix-HBase-SQL





Sunday, February 17, 2013

Daily Bookmarks 20130217

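Summary of the Tableau Shanghai promotional event - dataV data visualization sharing platform
http://datavlab.org/cat/news
New book The Data Journalism Handbook is now online - dataV data visualization sharing platform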
http://datavlab.org/2012/05/03/3177#more-3177
Public Data Goes Social - The Data Journalism Handbook
http://datajournalismhandbook.org/1.0/en/delivering_data_9.html
Data visualization DIY - Our Top Tools - The Data Journalism Handbook
http://datajournalismhandbook.org/1.0/en/delivering_data_7.html

Case studies | ckan - The open source data portal software
http://ckan.org/case-studies/
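Visualization cases - dataV data visualization sharing platform
http://datavlab.org/cases
Chart Collection - Sina Blog
http://blog.sina.com.cn/huangyu4124
Learning to reinvent wheels - Peng Qi on programming
http://blog.pengqi.me/2012/12/09/learn-reinventing-wheels/
Implementing a push service with Node.js and Redis - Peng Qi on programming
http://blog.pengqi.me/2012/12/30/implement-push-service-using-node.js-and-redis/

WeChat technical director explains the secrets of WeChat's architecture - Lecture events - Tencent Lecture Hall to read
http://djt.qq.com/article-201-1.html

Scanning for spam posts in Discuz - Peng Qi on programming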
http://blog.pengqi.me/2013/01/09/check-discuz-spam/






Wednesday, February 06, 2013

Daily Bookmarks 20130206


Qubole Blog | Next Generation Cloud Data Platform
http://www.qubole.com/blog/
Usage Scenarios | Qubole
http://www.qubole.com/usage-scenarios
Wikipedia Page Traffic Statistics : Public Data Sets : Amazon Web Services
http://aws.amazon.com/datasets/2596
Public Data Sets : Amazon Web Services
http://aws.amazon.com/datasets?c=25&p=3&sm=dD
Public Data Sets : Amazon Web Services
http://aws.amazon.com/datasets
Scheduler | Qubole
http://www.qubole.com/loaders?q=scheduler
Log Parsing through Hadoop, Hive & Python | Search, Data and Technology
http://www.hiregion.com/2010/02/log-parsing-through-hadoop-hive-python.html
LanguageManual Transform - Apache Hive - Apache Software Foundation
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
forcedotcom/phoenix · GitHub
https://github.com/forcedotcom/phoenix
How does Blur Work?
http://incubator.apache.org/blur/how_it_works.html
Apache HBase Region Splitting and Merging | Hortonworks
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

9.7. Regions
http://hbase.apache.org/book/regions.arch.html#compaction
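HBase compaction explained in detail - NoSQLFan - Following NoSQL technology and news
http://blog.nosqlfan.com/html/1080.html
Ramblings on HBase operations - NoSQLFan - Following NoSQL technology and news
http://blog.nosqlfan.com/html/2225.html
Some odds and ends about HBase - NoSQLFan - Following NoSQL technology and news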
http://blog.nosqlfan.com/html/2090.html
High Scalability - High Scalability - Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html
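HBase file structure diagram - NoSQLFan - Following NoSQL technology and news
http://blog.nosqlfan.com/html/1135.html
Introduction to HBase and practical experience sharing - NoSQLFan - Following NoSQL technology and news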
http://blog.nosqlfan.com/html/2378.html
dirtysalt (dirtysalt)
https://github.com/dirtysalt
hbase
http://dirlt.com/hbase.html#sec-1-2-6
mapreduce - how to join tables in hbase - Stack Overflow
http://stackoverflow.com/questions/11327316/how-to-join-tables-in-hbase
hbase physical structure - Google Search
https://www.google.com.tw/search?q=hbase+%E7%89%A9%E7%90%86%E7%B5%90%E6%A7%8B&hl=zh-TW&tbo=d&ei=QNEQUeCdLoGPmQW9xYHQDQ&start=10&sa=N&biw=731&bih=443
HBase architecture - 新城主力唱好 - Cnblogs
http://www.cnblogs.com/NicholasLee/archive/2012/09/13/2683223.html
How to understand Hadoop and HBase: a summary of principles and applications - leonarding's personal space - ITPUB personal space - powered by X-Space
http://space.itpub.net/26686207/viewspace-746977
HBase system architecture and data structures
http://www.uml.org.cn/zjjs/201211132.asp
Some HBase application design tips - Change Dir - BlogJava
http://www.blogjava.net/changedi/archive/2013/01/02/393697.html

Daily Bookmarks 20130205

Yeeyan Select - Google's new system Dremel makes big data processing more convenient | Wired
http://select.yeeyan.org/view/338738/312827
Apache launches Drill, an open-source version of Google Dremel - CSDN.NET
http://www.csdn.net/article/2012-08-20/2808871


Monday, February 04, 2013

Daily Bookmarks 20130204


Big Data Engineering, Practices and Research: HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs
http://cloudepr.blogspot.tw/2009/09/hfile-block-indexed-file-format-to.html
The HBase compaction debate - Plain-Language Tech - ITeye technology site
http://hbase.iteye.com/blog/1192458
Scaling the Messages Application Back End
https://www.facebook.com/note.php?note_id=10150148835363920
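An interpretation of the Facebook Messages implementation
http://blog.bluedavy.com/?p=258
Introduction to HBase and practical experience sharing - NoSQLFan - Following NoSQL technology and news
http://blog.nosqlfan.com/html/2378.html

HBase file structure diagram - NoSQLFan - Following NoSQL technology and news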
http://blog.nosqlfan.com/html/1135.html

High Scalability - High Scalability - Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html

yfrog's experience using HBase - NoSQLFan - Following NoSQL technology and news
http://blog.nosqlfan.com/html/2509.html










Understanding HBase data write and compaction processes graphically - NoSQLFan - Following NoSQL technology and news
http://blog.nosqlfan.com/html/1249.html

Integrating Apache Hive and Apache HBase | Apache Hadoop for the Enterprise | Cloudera
http://blog.cloudera.com/blog/2010/06/integrating-hive-and-hbase/

sperm/essay/hbase.org at master · dirtysalt/sperm · GitHub
https://github.com/dirtysalt/sperm/blob/master/essay/hbase.org




hbase sort join
http://dirlt.com/hbase.html#sec-1-2-6



Daily Bookmarks 20130203

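Starting to practice git-flow | Jeff's Wonderland of Ideas
http://www.jeffkit.info/2010/12/842/
Why aren't you using git-flow? | Jeff's Wonderland of Ideas
http://www.jeffkit.info/2010/12/860/
Notes on Python web crawlers | Jeff's Wonderland of Ideas
http://www.jeffkit.info/2008/10/547/#comment-444
Implementing a distributed lock with Redis | Jeff's Wonderland of Ideas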
http://www.jeffkit.info/2011/07/1000/