3、The online data store is the last part of an application where you should gamble on new technology. If you lose your database or it goes down, the result is a disaster you cannot recover from. If you're not the developer of one of these new databases, and you're one of a very small number of companies using them at scale in production, you're at the mercy of the developer to fix bugs and handle scalability issues as they come up.
6、Personally, I believe the relational data model is the "right" way to structure most of the data for an application like Quora (and for most user-generated content sites). Schemas allow the data to persist in a typed manner across lots of new versions of the application as it's developed, they serve as documentation, and prevent a lot of bugs. And SQL lets you move the computation to the data as necessary rather than having to fetch a ton of data and post-process it in the application everywhere. I think the "NoSQL" fad will end when someone finally implements a distributed relational database with relaxed semantics.
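The idea of moving the computation to the data can be shown with a small sketch. Python's built-in `sqlite3` module stands in for MySQL here. The `answers` table and both helper functions are hypothetical. Both functions compute the same total. The first ships every row to the application and filters there. The second lets the database filter and aggregate, so only one number comes back. The `NOT NULL` constraints also show a schema rejecting bad data at write time. Note that SQLite enforces column types only loosely, so only the constraint is demonstrated.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for MySQL; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE answers ("
        " id INTEGER PRIMARY KEY,"
        " question_id INTEGER NOT NULL,"
        " upvotes INTEGER NOT NULL)"
    )
    # Nine sample answers spread over question ids 0, 1 and 2.
    rows = [(i, i % 3, i) for i in range(1, 10)]
    conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", rows)
    return conn


def upvotes_in_app(conn: sqlite3.Connection, question_id: int) -> int:
    # Fetch every row, then filter and sum in application code.
    total = 0
    for qid, upvotes in conn.execute("SELECT question_id, upvotes FROM answers"):
        if qid == question_id:
            total += upvotes
    return total


def upvotes_in_sql(conn: sqlite3.Connection, question_id: int) -> int:
    # Push the filter and the aggregation to the database.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(upvotes), 0) FROM answers WHERE question_id = ?",
        (question_id,),
    ).fetchone()
    return total


conn = build_db()
print(upvotes_in_app(conn, 1), upvotes_in_sql(conn, 1))  # 12 12
```

With nine rows the difference is invisible. With millions of rows, the first version moves the whole table over the network on every call.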
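They chose Pylons and then treated it like a Halloween pumpkin, hollowing it out. They removed its built-in templates and ORM and replaced them with their own technology, written in Python. This is where LiveNode and webnode2 reside.
MochiMedia (an online games site) also uses Pylons.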
Python
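Since Charlie and Adam both came from Facebook, PHP would seem the obvious choice. However, as Adam put it, “Facebook is stuck on that for legacy reasons, not because it is the best choice right now.” That experience taught them that technology choices, especially the programming language, matter a great deal for a company's long-term development.

They also considered C#, Java, and Scala:
- Choosing C# would have meant more than picking a language, since it would have pulled in a whole Microsoft stack.
- Python beat Java because it is more expressive and makes it faster to write code.
- Scala was too young.

Adam named speed and the lack of type checking as Python's weaknesses, but both of them already knew the language well. For performance-critical backend components they use C++, because Python is not fast enough. They considered Ruby and Python very similar, but they had experience with Python and none with Ruby, so Python won. They use Python 2.6.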
How do you push messages back to a web-browser client through AJAX? Is there any way to do this without having the client constantly polling the server for updates?
Adam D’Angelo, Quora (Sep 29, 2010)
There is no reliable way to do this without having the client polling the server. However, you can make the server stall its responses (50 seconds is a safe bet) and then complete them when a message is ready for the client. This is called “long polling” and it’s how Quora, Gmail, Meebo, etc all handle the problem.
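The stall-then-complete pattern described above can be sketched with Python's `asyncio`. This is a minimal illustration under stated assumptions:
- No HTTP layer is shown. A real server would map each pending request to one `long_poll` call.
- `Mailbox`, `publish` and `long_poll` are hypothetical names.
- The 50-second default timeout comes from the answer.

A request either completes as soon as a message is published, or completes empty when the timeout expires, after which the client re-polls.

```python
import asyncio
from typing import Optional


class Mailbox:
    """Hypothetical per-client message queue behind a long-poll endpoint."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    def publish(self, message: str) -> None:
        # Called when something happens that the client should hear about.
        self._queue.put_nowait(message)

    async def long_poll(self, timeout: float = 50.0) -> Optional[str]:
        # Stall the response until a message arrives or the timeout expires.
        try:
            return await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            # Nothing arrived: complete with an empty response; the client re-polls.
            return None


async def demo():
    box = Mailbox()
    # A message published while the request is stalled completes it immediately.
    waiter = asyncio.create_task(box.long_poll(timeout=1.0))
    await asyncio.sleep(0.05)
    box.publish("new answer")
    first = await waiter
    # With no message pending, the request completes empty after the timeout.
    second = await box.long_poll(timeout=0.05)
    return first, second


print(asyncio.run(demo()))  # ('new answer', None)
```

The timeout exists so that proxies and browsers do not give up on the connection first. The client immediately issues a new request after each completed one.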
If you have a specialized server that uses epoll or kqueue, you should be able to hold on the order of 100k users per server (depending on how many messages are going). This is called the “c10k” problem. www.kegel.com/c10k.html
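Python's standard `selectors` module wraps these mechanisms: `DefaultSelector` uses epoll on Linux and kqueue on BSD and macOS. The sketch below is an illustration, not a server. `socket.socketpair()` stands in for client connections, and `find_ready` is a hypothetical helper. It registers many idle connections with one selector and shows that a single `select()` call returns only the connections that have data. That property is what lets one process park a large number of mostly idle long-poll clients. Reaching the 100k scale mentioned above would also require raising the process's file-descriptor limit.

```python
import selectors
import socket
from typing import List, Set


def find_ready(num_conns: int, active: Set[int], timeout: float = 1.0) -> List[int]:
    """Return the indices of connections that have data waiting (demo only)."""
    # DefaultSelector picks the best available mechanism:
    # epoll on Linux, kqueue on BSD/macOS.
    sel = selectors.DefaultSelector()
    pairs = []
    try:
        for i in range(num_conns):
            server_side, client_side = socket.socketpair()
            server_side.setblocking(False)
            # Register each connection once; idle connections just sit in the set.
            sel.register(server_side, selectors.EVENT_READ, data=i)
            pairs.append((server_side, client_side))

        # Only the "active" clients send anything.
        for i in active:
            pairs[i][1].sendall(b"ping")

        # With epoll/kqueue the kernel hands back only the ready connections.
        ready = []
        for key, _mask in sel.select(timeout=timeout):
            key.fileobj.recv(4096)  # drain the message
            ready.append(key.data)
        return sorted(ready)
    finally:
        sel.close()
        for server_side, client_side in pairs:
            server_side.close()
            client_side.close()


print(find_ready(100, {42}))  # [42]
```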
MySQL
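Like Adam D’Angelo’s former employer Facebook, Quora makes heavy use of MySQL. In his answer to the Quora question “When Adam D’Angelo says ‘partition your data at the application level’, what exactly does he mean?”, D’Angelo explains in depth how to use MySQL (or relational databases in general) as a distributed data store.
His basic advice is:
- Partition data only when you need to, and keep it on a single machine if you can.
- When a large dataset must span multiple databases, partition it by a hash of the primary key.
- Avoid joins.

He cites FriendFeed's architecture as a good example, described in Bret Taylor's article “How FriendFeed uses MySQL to store schema-less data”. D’Angelo also said you should not use a NoSQL database for a social site unless you have millions of users.
Quora and FriendFeed are not the only heavy MySQL users. Ever heard of Google? Hard as it may be to imagine, in the words of Google, it uses MySQL for some of its applications that are not related to search. Google has also released MySQL patches covering replication, synchronization, monitoring, and performance.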
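Routing rows by a hash of the primary key can be sketched in a few lines. This is a minimal illustration under stated assumptions:
- Several in-memory SQLite databases stand in for separate MySQL servers.
- `NUM_SHARDS`, `shard_for` and `ShardedStore` are hypothetical names.
- Rows are stored as JSON bodies, loosely in the spirit of FriendFeed's schema-less entities (FriendFeed's actual serialization format differs).

MD5 is used only as a stable hash, not for security. Python's built-in `hash()` of a string is salted per process, so it cannot be used for routing.

```python
import hashlib
import json
import sqlite3
from typing import Dict, List, Optional

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash of the primary key, reduced modulo the number of shards.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Routes each row to one database chosen by a hash of its primary key."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        self.shards: List[sqlite3.Connection] = []
        for _ in range(num_shards):
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
            self.shards.append(conn)

    def _conn(self, key: str) -> sqlite3.Connection:
        return self.shards[shard_for(key, self.num_shards)]

    def put(self, key: str, value: Dict) -> None:
        self._conn(key).execute(
            "INSERT OR REPLACE INTO entities VALUES (?, ?)", (key, json.dumps(value))
        )

    def get(self, key: str) -> Optional[Dict]:
        row = self._conn(key).execute(
            "SELECT body FROM entities WHERE id = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


store = ShardedStore()
store.put("user:7", {"name": "u7"})
print(store.get("user:7"))  # {'name': 'u7'}
```

Because related rows can land on different shards, a join would have to be stitched together in application code, which is one reason the advice is to avoid joins. With plain modulo hashing, changing the shard count remaps most keys, so the count is worth choosing generously up front.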
How does one evaluate if a database is efficient enough to not crash as it’s put under increasing load?
Adam D’Angelo, Quora (Oct 10, 2010)
One option is to simulate some load. Write a script that mimics the kinds of queries your application will be doing, and make sure it can handle the amount of load you want it to be ready for (especially as the size of the dataset changes).
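A load-simulation script along these lines might look like the sketch below. Everything in it is an assumption for illustration:
- In-memory SQLite stands in for the real database.
- The `questions` table and the 80/20 query mix are hypothetical.
- It runs a single client with no concurrency.

It rebuilds the table at several sizes and reports latency percentiles, which shows how the same workload behaves as the dataset grows. A real test would point at a staging copy of the production database with many concurrent clients.

```python
import random
import sqlite3
import statistics
import time
from typing import Dict, List


def build_table(conn: sqlite3.Connection, num_rows: int) -> None:
    # Recreate a hypothetical table with num_rows rows.
    conn.execute("DROP TABLE IF EXISTS questions")
    conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, topic_id INTEGER, title TEXT)")
    conn.executemany(
        "INSERT INTO questions VALUES (?, ?, ?)",
        ((i, i % 50, f"question {i}") for i in range(num_rows)),
    )
    conn.commit()


def run_workload(conn: sqlite3.Connection, num_rows: int, num_queries: int,
                 seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    latencies: List[float] = []
    for _ in range(num_queries):
        start = time.perf_counter()
        if rng.random() < 0.8:
            # Mimic the common case: fetch one row by primary key.
            conn.execute("SELECT title FROM questions WHERE id = ?",
                         (rng.randrange(num_rows),)).fetchone()
        else:
            # Mimic a list page: latest questions in a topic (topic_id is not indexed).
            conn.execute("SELECT id FROM questions WHERE topic_id = ? "
                         "ORDER BY id DESC LIMIT 10", (rng.randrange(50),)).fetchall()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_ms": 1000 * statistics.median(latencies),
        "p95_ms": 1000 * latencies[int(0.95 * (len(latencies) - 1))],
        "qps_single_client": len(latencies) / sum(latencies),
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for size in (1_000, 10_000, 100_000):
        build_table(conn, size)
        print(size, run_workload(conn, size, num_queries=500))
```

Watching how p95 moves as the row count grows is usually more informative than the median. A query that is cheap on a small table can degrade sharply once it stops fitting in memory or starts scanning.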
Steve Souders, author of the book High Performance Web Sites, published a list of rules for making websites faster. Quora co-founder Charlie Cheever has cited them as well, which is presumably one reason Quora is so fast.
“One resource we used as a guide is Steve Souders’ list of rules for high performance websites: stevesouders.com/hpws/rules.php” – Charlie Cheever, Quora
Steve Souders’ 14 rules are…
1、Make Fewer HTTP Requests
2、Use a Content Delivery Network
3、Add an Expires Header
4、Gzip Components
5、Put Stylesheets at the Top
6、Put Scripts at the Bottom
7、Avoid CSS Expressions
8、Make JavaScript and CSS External
9、Reduce DNS Lookups
10、Minify JavaScript
11、Avoid Redirects
12、Remove Duplicate Scripts
13、Configure ETags
14、Make AJAX Cacheable
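Three of these rules are just response headers: "Add an Expires Header", "Gzip Components" and "Configure ETags". They can be sketched together as one framework-independent function. This is a minimal illustration under stated assumptions:
- `build_response` and its tuple return value are hypothetical.
- The `If-None-Match` check handles only a single ETag value.

The ETag is a hash of the content, so every server behind a load balancer produces the same value for the same file. The ETag is weak (`W/`) because the gzipped and plain bodies are equivalent but not byte-identical.

```python
import gzip
import hashlib
import time
from email.utils import formatdate
from typing import Dict, Optional, Tuple

ONE_YEAR = 365 * 24 * 3600


def build_response(body: bytes, request_headers: Dict[str, str],
                   now: Optional[float] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a static component."""
    now = time.time() if now is None else now
    # Configure ETags: derive the tag from the content, not from server-local metadata.
    etag = 'W/"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {
        # Add an Expires header far in the future, plus Cache-Control for HTTP/1.1.
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
        "Cache-Control": f"public, max-age={ONE_YEAR}",
        "ETag": etag,
        "Vary": "Accept-Encoding",
    }
    # A client revalidating a cached copy gets an empty 304.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    # Gzip components, but only for clients that accept it.
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        headers["Content-Encoding"] = "gzip"
        return 200, headers, gzip.compress(body)
    return 200, headers, body
```

The same headers apply to JSON returned by AJAX endpoints, which is the point of "Make AJAX Cacheable". Pages whose content changes would use a shorter `max-age`, or a versioned URL for each release.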