Amazon re:Invent 2020 is in full swing and although some of their announcements impact my current career more than others (still waiting to hear about Timestream updates…), one item did blow up my Twitter feed a bit on Tuesday (12/1/2020).
AWS goes after Microsoft’s SQL Server with Babelfish
Techcrunch
Want more PostgreSQL? You just might like Babelfish
Matt Asay, AWS
My first reaction was, “Whoa… this is a big deal!” Having spent a lot of my career with SQL Server (and still a big fan of the community), the jump back to PostgreSQL wasn’t as smooth as I wanted it to be. Two years later, things have changed dramatically and I’m feeling much more at home (and enthusiastic) about PostgreSQL and the community. So much so that I switched jobs and careers to join the awesome team at Timescale!
So back to Babelfish. 1 week into thinking about the announcement, and I’m a lot less excited. Let me explain.
The problem isn’t Babelfish
The imputes for creating the tool is clear for AWS. Provide a way for customers to easily connect a SQL Server app to Aurora Postgres, saving big on licensing fees and reducing total cost of ownership. Assuming the tool is successful at some level, I’m sure it will provide a revenue boost for Amazon and some customers might (initially) feel a win. No harm, no foul on Amazon for leading the effort. Free markets, baby!
No matter how clever Babelfish is, however, I just can’t see how this is ultimately a win for SQL Server or PostgreSQL… or the developers that will ultimately need to support these “hybrid” apps.
(Quick Disclaimer: I wasn’t able to attend the live streamed deep dive session from re:Invent on Aurora Postgres and Babelfish (I’m waiting for the recording to post), so I’m sure there are details I’m overlooking, which may impact what I’m about to say. And, one other caveat… I have no doubt that the team spearheading this project is smart, hardworking, and doing everything they can to make Babelfish as awesome as possible.)
For some Amazon customers pricing may actually be reduced, at least initially, resulting in the promised financial win. Depending on the application and code, there might be a negligible performance impact (or maybe even an improvement??) Honestly, there might be plenty of apps that recreate the schema, flip the connection string, and cruise into the sunset – Piña Colada in hand.
Even if this all works better than expected, I still think there will be more lost than gained for one reason:
SQL illiteracy.
The problem is SQL illiteracy
I’ve been involved in software development for nearly 20 years and always with an affinity for databases and SQL in general. Even then, in a room of 100 SQL folks (SQL Server or PostgreSQL), my level of expertise is probably in the bottom 20%. (but my, how I do love learning more every day!) Still, in every job I’ve had, teaching and mentoring have been a focus that I’ve enjoyed and treasured.
While SQL may be the third most popular language in 2020, I haven’t run across many traditional developers that understand the nuances of each SQL platform, even one they’ve been “using” for their entire career. In most cases the culprit for this SQL illiteracy is an ORM. They are convenient and extremely helpful with the mundane tasks like selecting lists of objects or saving data without writing your own abstraction later. And this is often really good stuff because, after all, we’ve been taught over and over to abstract away the mundane. ORM’s help us do that.
But beyond select * from...
and INSERT INTO...
, things almost always start to fall apart at some point. It’s nearly impossible to create a well written query that takes advantage of specific SQL dialect and features when you’re trying to be all things to all people. Too often DBA’s are called in to debug a slow query (some bad enough that they bring down servers under load), only to find a query created by an ORM. Who doesn’t love un-nesting 15 levels of sub-select queries when a simple (and more readable) version using a CTE or CROSS APPLY
(SQL Server) or LATERAL JOIN
(PostgreSQL) would have done the job? Most of us have been there.
Let’s not even talk about PIVOT
and translating to crosstab
! 😃
So why do ORMs come to mind when thinking about Babelfish? Because I think it’s going magnify the SQL illiteracy of most developers making this transition. This is primarily a marketing effort and a way to bring down spend and make CFOs happy… initially.
Babelfish won’t help SQL Server developers understand the features of PostgreSQL better, or why portions of the app don’t work well after the transition (let alone how to improve them). They won’t understand the pluggable extension architecture that PostgreSQL thrives on. (trigram index anyone? Easy GIS queries? Time-series data?) It doesn’t point them to the best resources and trainers on the internet for PostgreSQL (in the last month alone, I’ve seen multiple requests that start, “Who is the Brent Ozar of Postgres?”)
There is also be plenty of nuance that goes the other way, too… features that the SQL Server community spends a lot of time teaching about and building consulting careers around. Clustered indexes (vs. MVCC)? Columnstore Indexes (“just use them!”)? Querystore (which AWS won’t replicate any time soon)? Built-in query plan visualization? In short, the SQL Server community has spent a lot of effort teaching and learning about the internals of the database query engine to create performant apps. Translating those features will take a lot of effort that I probably doesn’t interest AWS at all.
Instead, Babelfish will simply return an error if it can’t correctly figure out what to do. I suspect a lot of developers will be getting those errors.
In the end, nobody will be happy
As a result, I think we’ll have at least three different problem sets when these apps start to implement Babelfish:
- ORMs creating complex, (already?) poorly performing queries for SQL Server being translated to (poorly) performing PostgreSQL queries. Lose/lose for everyone
- Targeted, hand-crafted queries that perform well on SQL Server that are hard (or impossible) to translate to PostgreSQL
- Stored Procedure-based applications that will require manual updates and still not take advantage of PostgreSQL features for improved query performance – features that SQL Server just doesn’t have. (👋 SQL Server folks, did you know you can tune PostgreSQL planner values for functions and stored procedure?!?!)
Putting it all together, in the short-term (at least) we’ll have a few initial outcomes:
- More revenue for AWS and likely higher long-term costs for users as the need to scale up PostgreSQL instances increases in order to deal with inefficient queries.
- Frustrated SQL Server developers that assume PostgreSQL is an inferior database
- A PostgreSQL community that thinks SQL Server developers are inferior and “stupid” for not understanding how things really work. (To be honest, a lot of SQL Server devs may be shocked that there’s not as much #SQLFamily community in the #Postgres world. Sorry.)
As I said, I haven’t seen the deep-dive from AWS yet and I’m not currently using Aurora Postgres to give it a test run, so these are preliminary thoughts.
Be the solution! Level-up your SQL
In the end, my greatest desire when I lead a team or teach about a technology is for developers to better understand the tools and technology they use, enabling them to create awesome applications. I get no greater joy for others (or myself honestly) than when new insights are gained and core technology is understood.
If you’re a SQL Server dev that might get caught in this transition, spend some time getting to know PostgreSQL a little better. While I haven’t quite found Brent’s replacement yet, there are a lot of good places to start learning. Start with my old post on PostgreSQL tooling (Part 1 & Part 2) and have a look at the responses to this Reddit post.
If you’re a PostgreSQL dev or instructor, have patience with the folks that will be thrust into learning the basics of Postgres for the first time. Remember that they probably didn’t choose this transition for themselves!
Hopefully as more details emerge and the Babelfish implementation is better understood, the performance worry is misplaced. Hopefully it drives better understanding of both products – because they each have a lot to offer users depending on the use case.
Still, I’m afraid we’ll be missing an opportunity to level-up on skills, drive deeper understanding with developers, and increasing the ROI for all parties.
What are your thoughts?