Website: http://www.eflorenzano.com/
Original page: http://www.eflorenzano.com/blog/post/my-thoughts-nosql/
I have a particular problem in an app I'm working on and tokyo cabinet tables seem to address it perfectly. I have a server that provides a service that basically runs func(ItemA, ItemB, ItemC) and that takes a while so I'd like to cache the results. It's easy to stick them in a KV store and just get them out with a simple key (ItemA.id, ItemB.id, ItemC.id), and TC is wicked fast for that. The issue is, when some given ItemA changes, how do I invalidate the cached results for that? I need to make a simple query for whatever items have modified_a.id as the first part of their key so I can delete them. TC tables seem *perfect* for that.
I don't want to use a relational DB because that's *way* too heavyweight. I don't need joins. All I've got is a single table and I want to do some really simple querying on it.
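A minimal sketch of the invalidation pattern this comment describes, assuming an in-memory sorted key list in place of a Tokyo Cabinet table. The `PrefixCache` class and its method names are hypothetical, chosen for illustration only; they are not TC's API. Keys are `(a_id, b_id, c_id)` tuples, and invalidating an item deletes every entry whose key starts with that item's id.

```python
import bisect


class PrefixCache:
    """Toy cache keyed by (a_id, b_id, c_id) tuples, kept sorted for prefix scans."""

    def __init__(self):
        self._keys = []    # sorted list of key tuples
        self._values = {}  # key tuple -> cached result

    def put(self, key, value):
        # Only insert into the sorted list the first time a key is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def invalidate_first(self, a_id):
        """Delete every entry whose key starts with a_id; return the count removed."""
        # (a_id,) sorts before any longer tuple that begins with a_id.
        lo = bisect.bisect_left(self._keys, (a_id,))
        hi = lo
        while hi < len(self._keys) and self._keys[hi][0] == a_id:
            hi += 1
        for key in self._keys[lo:hi]:
            del self._values[key]
        del self._keys[lo:hi]
        return hi - lo


cache = PrefixCache()
cache.put((1, 10, 100), "result-1")
cache.put((1, 11, 100), "result-2")
cache.put((2, 10, 100), "result-3")
removed = cache.invalidate_first(1)  # ItemA with id 1 changed
```

Because the keys stay sorted, all entries for one `a_id` sit next to each other, so the scan touches only the entries it deletes. A store with an ordered index or a simple query layer could offer the same lookup without a full scan.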
Very good information.
Greetings and all good wishes from Russia
It's true that 99.8% of sites are fine with SQL. SQL is tried and true technology and if it works for you, why look elsewhere? However, I believe there are a lot of potential applications out there that nobody gets done because they don't fit well in the standard databases.
Regarding the implied SQL-hate of "the nosql movement":
"The idea is that people think data-persistence == SQL which is not the case. #nosql is not saying SQL is bad." — http://twitter.com/janl/status/2742018744
Cheers
Jan
--
Now I wonder if non-relational databases really offer performance advantages on read/write and write mostly workloads? Or will we end up implementing ad-hoc relational abstraction layers on top of them?
I only used CouchDB briefly and my first reaction was to do exactly that.
We shouldn't confuse access layer activities with data access language semantics
I'd add another contender into the mix even if it's only in beta. MSFT's Live Mesh is a consumer app built on top of a hierarchical sync'd entity data store. It doesn't really have a name per se but they have a RESTful API in beta that will soon be available for anyone to party on.
-Jamie
I think you hit the nail on the head with your comment "Most people ... relying less on the features provided by traditional relational databases and engineering more database logic in their application code."
For the time being, I don't mind SQL and can get it to do what I want so I can wait for these new technologies to mature.
Rich
Terry
Can you tell me which ORM you're using? I've used several (ActiveRecord, NHibernate, etc.), and found them all to be pretty lousy at ease-of-use compared to even a crappy OODB.
It's like C. Most people writing OO code today don't write it in C. You certainly can, but it never feels like more than a pale imitation of the real thing. ORMs feel the same way to me: you're pretending your RDBMS records are objects, with an abstraction layer so flimsy you can almost never forget about what's just on the other side.
It sounds like you're making the classic mistake of assuming the only types of databases are ancient RDBMSs and brand-new distributed key-value stores. You don't mention any OODBs, even in your list in the conclusion. Object databases are doing just fine, thanks! :-)
Atamert: Trust me when I say that I want a good, open source column store as well. At work we evaluated lots of options and ended up buying a license of Vertica. Vertica is pretty awesome, but being a fan of open source it would have been great to have found an open source product. To my knowledge, one still doesn't exist.
Michael: You're probably right, I guess I was wrong about people hating SQL. It must have just been a few of the people that I talked to in person or something.
Ville: I think there's an interesting case to be made there. Do we change our way of thinking simply because we don't have tools readily available to do things in other ways? I think that yes, we do.
Terry: I'm super excited about FluidDB, and I'd love an early peek at the docs :)
Tim: I've always been partial to Django's ORM, but I like SQLAlchemy as well. I've never found the impedance mismatch to be a real problem in practice.
I'm not sure if you received my dm in Twitter (perhaps not the most reliable comms medium right now). Anyway, if you send me a mail I'll send you some details.
I'm terry <at< fluidinfo >dot> com
Try subscribing to the Planet CouchDB feed sometime, they, at least, do say SQL is bad.
Most companies *don't* need to get rid of their SQL boxes. But if you're in the growing Social Media or BI fields, where you have to crunch hundreds of TBs of data, then you have certain problems that you can't just toss at SQL. You have to engineer your data from the ground up, and make sure you make the correct sacrifices (Transactions, Denormalization, etc.)
I wrote a blog article on this topic called "Social Media Kills the RDBMS": http://www.roadtofailure.com/2009/06/19/social-...
The FLAIM engine/db for Novell's eDirectory does this as well.
PS: I'm the admin of Planet CouchDB :) It is a collection of personal opinions. It is not an official statement of the CouchDB project or its developers.
In case some readers are interested, Prevayler (http://www.prevayler.org/) provides a more classic object persistence approach as an alternative to SQL databases.
I do think a lot of people blame scalability issues on SQL and RDBMS's in general. I don't know how many times I've heard "joins don't scale." Unfortunately some of this ignorant crowd has been drawn to NoSQL. Coincidence? Self selection? Who knows.
What's even worse are the folks who stick memcached in front of an RDBMS.
I'm a developer on the MongoDB (http://www.mongodb.org) project. The goal of the project is to bridge the functionality gap between key/value stores and RDBMSes, while maintaining the high performance and scalability gains you get with K/V systems. If anybody has any questions check out the mailing list.
insert into My_Table
(My_Column_1, My_Column_2)
values
('My_Value_1', 'My_Value_2')
Better than:
insert into My_Table
My_Column_1 = 'My_Value_1'
My_Column_2 = 'My_Value_2'
SQL is a suboptimal language. It also lacks a convenient portable subset which is sufficient for most applications (e.g. http://www.reddit.com/r/programming/comments/92...).
Every ORM tool I have used (Hibernate etc.) has serious flaws and limitations. Just yesterday I found I could not have a unidirectional link to a 'table per concrete class' hierarchy.
I can work around the limitations, and issues of the SQL technologies, and create great software, but I still know SQL sucks in so many ways.
Take a look at mainstream frameworks like Rails et al, who wave away the problem by storing sessions in cookies - a very suboptimal and inefficient solution, IMO. I think many sites who would not otherwise need to look outside RDBMSs for their *content* are interested in TC et al for their session management.
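As a rough illustration of keeping sessions in a key/value store rather than in cookies or an RDBMS, here is a minimal sketch that uses an in-memory dict as a stand-in for a store such as TC. The `SessionStore` class, its method names, and the 60-second lifetime are assumptions made for this example, not any framework's API.

```python
import time


class SessionStore:
    """Toy server-side session store: session id -> payload, with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._data = {}      # session_id -> (expires_at, payload)

    def save(self, session_id, payload):
        self._data[session_id] = (self._clock() + self._ttl, payload)

    def load(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            # Expired sessions are dropped lazily, on first access after expiry.
            del self._data[session_id]
            return None
        return payload


now = [0.0]  # fake clock, advanced by hand
store = SessionStore(ttl_seconds=60, clock=lambda: now[0])
store.save("abc123", {"user_id": 7})
fresh = store.load("abc123")    # still within the lifetime
now[0] = 61.0
expired = store.load("abc123")  # past the lifetime, so nothing is returned
```

The whole access pattern is a get and a set on a single key, which is why a fast key/value store fits sessions even when the site's content stays in an RDBMS.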
Also, a little harsh on MongoDB I think? It might be "jack of all trades but master of none" but if it gives you 90% of what you need in 3 different areas, for most people that's a lot better than running 3 different programs.
One more thing - Redis is fantastic for many uses, but its memory usage is astonishingly high. I don't think it should even be called a DB because you'd be nuts to store data in it. I would think of it more as a smart cache.
SQL databases are sophisticated and include many features (query optimizers, declarative referential integrity, security, etc.).
Sophistication is not free. Creating an SQL database with scalability and performance is harder than creating a database with a simpler model with equivalent performance and scalability.
Therefore given constant time and budget, a system with joins is less scalable than a system with no joins. Compared to no joins, joins do not scale.
In case you didn't know, 'insert into' has more advanced forms which are way more powerful and useful than your trivial example. See also http://en.wikipedia.org/wiki/Insert_(SQL)
Also, regarding "unidirectional link to a 'table per concrete class' hierarchy", we do this all the time... you just need to know where to put your foreign keys. Also, FYI there is no such thing as 'unidirectional link' in SQL...that's only a concept in your object model not in your database schema.
simple as that, sql is data persistence and data persistence is sql, if you can't have both then you have nothing.
permanently persisting data in xml and the like is for playground purposes, it's NOT for business...
In the world of web development we inevitably end up denormalising a lot of data (especially if we depend on ORMs rather than writing the queries ourselves). You can basically make the argument that the longer a codebase is around, the further away from taking advantage of relational features it gets. Eventually you reach a point that you've shifted so much of the data management into code that you may as well switch to a key/value store.
Personally I think the problem isn't with relational databases, it's with the SQL programming language. SQL is great for ad-hoc queries and manipulating data but as a programming language, it utterly sucks.
The reason applications need to have the significant overhead of a separate business logic layer is that SQL is such a feeble language for expressing business rules.
KV db's really shine when it comes to serving up the standard 'find this resource for that identifier' but maybe are not that great when it comes to generating reports across many different types of related data. CouchDB and its map-reduce facility may have solved that issue. It would be nice to see some standardization on how this could be accomplished across other schemaless db's (TokyoCabinet's Table store for example). Perhaps a UQL (unstructured query language) is in order.
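To make the reporting idea concrete, here is a minimal sketch of a map-reduce pass over schemaless documents in plain Python. It is not CouchDB's API (CouchDB views are written as JavaScript functions); `map_reduce`, `orders_by_customer`, `total`, and the sample documents are all hypothetical names and data invented for illustration.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def orders_by_customer(doc):
    # Documents have no fixed schema, so the map step filters by shape.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def total(key, values):
    return sum(values)


docs = [
    {"type": "order", "customer": "alice", "total": 30},
    {"type": "order", "customer": "bob", "total": 5},
    {"type": "session", "user": "alice"},
    {"type": "order", "customer": "alice", "total": 12},
]
report = map_reduce(docs, orders_by_customer, total)
```

The map function decides which documents take part and under which key, so one pass can build a report across differently shaped records without a schema or a join.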
ie: If it doesn't make sense to use SQL as a data store, why use it?
That this common sense thinking has become typified and meme'd as a "movement," amazes me.
Also, it's not an either/or situation here: web apps can use both relational and non-relational data stores as appropriate. Sho gets at this a couple of comments up by talking about managing sessions in a key/value store.
MongoDB supports some cool advanced queries out of the box - see here:
http://www.mongodb.org/display/DOCS/Advanced+Qu...
@Jochen
The MongoDB data model allows lists and many other types, including embedded documents. Can also do atomic append to list, update embedded document, etc. Might be worth a look.
Note: cassandra hasn't been demonstrated in a multithousand node production config yet.
Really?
I refer you to Google's explanation of the recent App Engine outage: http://groups.google.com/group/google-appengine...
Just because you have failover, doesn't mean it's not a single point of failure. Failover can break just like any other moving part.
@keong
SQL insert has many forms. Complexity like this in SQL results from SQL not being the modern, powerful, general tool we should create. Also many examples on the link you gave are non-portable, the other issue I discussed.
Some people do not perceive links in SQL as directional. Links in ORM are understood to be directional:
http://docs.jboss.org/hibernate/stable/annotati...
It is the name "Table" that may not be appropriate here. I don't see the Tokyo Cabinet Table as a replacement for a traditional RDBMS table, but more as an extension of the Hash database. Call it SuperHash, you'll think it's great ;-)
I don't dispute that a lot of the world runs on XML. But I'm excited about noSQL projects that make the transport of JSON more popular. I think it is much easier for folks to share/merge broken XML than JSON, but time will tell.
@sho
100% right on.
You can find them here:
www.cafepress.com/nosql
Ryan
Greetings from Germany
HotLena
To sum up my feelings: TC/TT is awfully fast and if you really need a very fast non-sharded simple key/value store - it fits that bill. It is on the other hand a one-man show. The table extensions etc are interesting.
CouchDB is cool indeed. It is not yet really "distributed" though, but hopefully this will come more and more. As far as I can tell from having worked with it quite a bit, it has one major problem: it handles dynamic queries quite poorly. Also, your description was slightly inaccurate regarding the views; you don't really "call" those functions. And people often miss one of the most important facts: not only are the values in CouchDB JSON, the keys are too.
MongoDB is my next project - I think it does a lot of what Couch does BUT also supports dynamic queries and thus covers a different (although overlapping) area of applications.
Hello, my name is Anastasia, I am a Russian from Moscow and I think this site is really great.
Kind regards, Anastasia
Greetings from Anastasia
anastasia from moscow in russia
Greetings from Germany
Julia
My next step is to try a similar thing in Cassandra (without data serialization) and see how it fares. The issue that I think you should have mentioned is that although Cassandra uses the Perl Thrift API, the APIs in general into Cassandra are weak. The Ruby API (through the cassandra gem) is very straightforward though. TC has fairly well documented and straightforward Perl and Ruby APIs. -- http://eric.lubow.org/2009/databases/tokyo-tyra...
Great and very useful summary.
Various reports gathered from data center statistics analyze disk failure rates:
* Actual disk failure/year is 3% (vs. estimates of 0.5 - 0.9%) – this is a 600% difference on reported vs. actual disk failure.
* There is NO correlation between failure rate and disk type – whether it is SCSI, SATA, or fiber channel.
* There is NO correlation between high disk temperature and failure rates
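The gap between the quoted failure rates can be worked through with a few lines of arithmetic. This is only a sketch: the rates are the ones quoted above, while the fleet size of 10,000 disks and the helper name `expected_failures` are assumptions added for illustration.

```python
def expected_failures(disk_count, annual_failure_rate):
    """Expected number of disks failing per year, at a constant annual rate."""
    return disk_count * annual_failure_rate


observed_rate = 0.03    # 3% per year, the observed figure quoted above
estimated_low = 0.005   # 0.5% per year, low end of the quoted estimates
estimated_high = 0.009  # 0.9% per year, high end of the quoted estimates

# How many times higher the observed rate is than each estimate.
ratio_vs_low = observed_rate / estimated_low    # roughly 6x
ratio_vs_high = observed_rate / estimated_high  # roughly 3.3x

fleet_size = 10_000  # hypothetical fleet, not a figure from the reports
failures_observed = expected_failures(fleet_size, observed_rate)
failures_estimated = expected_failures(fleet_size, estimated_low)
```

Read this way, the "600%" figure corresponds to the observed rate being about six times the lowest estimate; against the highest estimate the factor is closer to three. For the assumed fleet, that is on the order of 300 failed disks a year rather than about 50.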
Those analyses show that relying on shared storage for reliability, as most RAC clusters do, is a broken approach. The NoSQL approach instead assumes that failures are inevitable, and its systems were designed to deal with those failures even under extreme scenarios.
I summarized that topic in one of my recent posts: Why Existing Databases (RAC) are So Breakable!: http://natishalom.typepad.com/nati_shaloms_blog...
You may also want to consider another category, the In-Memory Data Grid (http://en.wikipedia.org/wiki/Data_grid).
For more details on that, see Todd Hoff's (highscalability.com) write-up: Are Cloud Based Memory Architectures the Next Big Thing? http://highscalability.com/blog/2009/3/16/are-c...