Igor's Techno Club

My Thoughts On BigQuery's Pipe Query Syntax

Bigquery Pipe Syntax

BigQuery's recent announcement of its public preview of a new SQL pipe syntax has stirred up a lot of buzz within the data community. To those deeply familiar with SQL, this might seem like a relatively incremental change. But to someone like me – admittedly not a SQL expert – it feels like a potential shift in how we interact with and manipulate data.

World Without Subqueries

Traditional SQL can sometimes feel like a labyrinth of nested subqueries and CTEs (Common Table Expressions), which are difficult to read and even harder to debug. The promise of pipe syntax lies in its ability to streamline this process, offering a more sequential, almost "step-by-step" approach to data transformation.

FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) as total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> INNER JOIN customers USING(customer_id)
|> CALL ENRICH.APOLLO(EMAIL > customers.email)
|> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country

The idea is simple: take your raw data (FROM orders), apply a filter (WHERE order_date >= '2024-01-01'), then aggregate, join, and so on, all linked together by the |> operator, which acts as a sort of "data pipeline." This approach is quite common for functional languages like F# or Elixir, but I haven't seen being applied to SQL-like languages.

Where Could This Be Useful?

Here's where I see the potential for this new syntax to be exciting, even for those of us who don't live and breathe SQL daily:

More Intuitive Data Exploration

What if you could explore your data like you'd explore a river, just "going with the flow," adding transformations and filters as you go? Pipe syntax seems to offer exactly that, reducing the mental overhead needed to construct and understand complex queries.

Making Data Engineering Jobs More Accessible

If SQL becomes marginally easier to read and write, could that enable a wider range of people to engage in basic data analysis?

Streamlining Data Pipelines

Could pipe syntax simplify the construction and maintenance of complex data pipelines, reducing the need for external tools and frameworks? Perhaps, it can improve the speed and accuracy of processes.

Code Generation and AI Tools

As AI-powered coding tools become more prevalent, pipe syntax might be easier for these tools to generate and modify. The linear nature of the syntax could make it simpler for AI to understand the intent and suggest modifications or optimizations.

Visual Query Builders

Future tools might be able to better visualize and manipulate queries written in pipe syntax, as the step-by-step nature could translate well to visual programming interfaces. Each pipe operation could be represented as a block or node in a visual flow.

Potential Downsides and Open Questions

Readability in Complex Scenarios

Could very long and complex pipe chains become difficult to understand, negating the supposed benefits of simplicity? Could breaking those chains into smaller subqueries be a better alternative? Is it better to name those more complex, simpler queries?

Learning Curve for SQL Veterans

How will seasoned SQL developers adapt to this new paradigm? Will they embrace it, or view it as an unnecessary complication?

The Bottom Line

BigQuery's pipe query syntax feels like a step toward a potentially more intuitive and accessible way to manipulate data. While it's still early days, the potential benefits for data exploration, pipeline construction, and empowering a broader audience are intriguing. I'm curious to see how this new syntax evolves and whether it truly does change the way we interact with data in the future

#dataengineering