
roaramburu · on April 18, 2024

Full Disclosure: I'm a cofounder at Voltron Data.

This is such a good post. I'm pretty humbled by your words about us being "everywhere that's of interest" and that "we're highly respected." It's hard to see that when you're in the weeds, so I just wanted to say I appreciate it.

Regarding proprietary…I get it. I was the CEO of BlazingSQL, and we were fully OSS with an open-core model. The number of Fortune 500 customers that were deploying us at scale but not paying us in money, feedback, or testimonials was honestly heartbreaking.

When Josh (our CEO) and I were in the early days of Voltron Data, we thought we could hold ourselves accountable to the open-source community with a new model, which we now call open-periphery. As you said, the interchanges, standards, and protocols are open, which allows companies and developers to build resilient, evolvable data stacks.

Open-periphery also means we don't have to debate what goes back to the community and what goes into the proprietary code because there is such a clear delineation. Open-periphery is our way of thinking about OSS business models, and it's the solution we came up with to ensure we can continue to invest in open-source and next-generation query engines.

roaramburu · on Sept 14, 2020

Howdy, full disclosure: I'm the CEO at BlazingSQL (BSQL).

I'm not incredibly familiar with Ares beyond the linked article, but we aren't a DBMS and don't manage data in any way.

BlazingSQL is a SQL engine; it's easiest to think of it as similar to SparkSQL, Presto, Drill, etc.

We're core contributors to RAPIDS cuDF (CUDA DataFrame), which is a Python and C++ library for Apache Arrow data in GPU memory. The Python library follows a pandas-like API, and the compute kernels are in C/C++.

BSQL binds to the same C++ as the pandas-like cuDF. This lets users interact with a DataFrame using either SQL or pandas, depending on their needs or preferences. This interoperability means that the rest of the RAPIDS stack can be applied to a variety of use cases (data viz, ML, graph, signal processing, DL, etc.) with the same DataFrame.

The DataFrame also has performant libraries for IO, Joins, Aggregations, Math operations, and more.
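To make the "one DataFrame, reachable through SQL or a pandas-like API" design concrete, here is a toy, CPU-only Python sketch. All names (`groupby_sum_kernel`, `Frame`, `SQLFrontEnd`) are hypothetical, and the pure-Python function stands in for the compiled C/C++ kernels; this is not the BlazingSQL or cuDF API. It only illustrates the architecture: two front ends dispatching to one shared kernel layer over the same in-memory columns.

```python
import re
from collections import defaultdict


def groupby_sum_kernel(columns, key, value):
    """Shared 'kernel' layer (hypothetical): a pure-Python stand-in for the
    compiled compute kernels. Works directly on columnar data stored as a
    dict of equal-length lists and returns a result in the same layout."""
    totals = defaultdict(int)
    for k, v in zip(columns[key], columns[value]):
        totals[k] += v
    keys = sorted(totals)
    return {key: keys, value: [totals[k] for k in keys]}


class Frame:
    """Toy DataFrame-style front end (hypothetical, not the cuDF API)."""

    def __init__(self, columns):
        self.columns = columns

    def groupby_sum(self, key, value):
        return Frame(groupby_sum_kernel(self.columns, key, value))


# Toy SQL front end (hypothetical, not the BlazingSQL API). It understands
# only "SELECT k, SUM(v) FROM t GROUP BY k" and calls the same kernel.
_QUERY = re.compile(
    r"SELECT\s+(\w+)\s*,\s*SUM\((\w+)\)\s+FROM\s+(\w+)\s+GROUP\s+BY\s+(\w+)",
    re.IGNORECASE,
)


class SQLFrontEnd:
    def __init__(self):
        self.tables = {}

    def register(self, name, frame):
        # Store a reference to the frame; the columns are not copied.
        self.tables[name] = frame

    def query(self, text):
        m = _QUERY.fullmatch(text.strip())
        if m is None:
            raise ValueError("query shape not supported by this sketch")
        key, value, table, group_key = m.groups()
        if key != group_key:
            raise ValueError("selected key must match the GROUP BY key")
        return Frame(groupby_sum_kernel(self.tables[table].columns, key, value))


# Usage: the same in-memory columns, reached through either front end.
trips = Frame({"city": ["Lima", "Austin", "Lima"], "fare": [10, 25, 5]})
sql = SQLFrontEnd()
sql.register("trips", trips)
via_sql = sql.query("SELECT city, SUM(fare) FROM trips GROUP BY city")
via_frame = trips.groupby_sum("city", "fare")
print(via_sql.columns == via_frame.columns)  # True
```

The point of the design is that neither front end owns the data or the compute: both hand the same column buffers to the same kernel, so results agree and nothing is copied between "the SQL world" and "the DataFrame world".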

Here is an example of running a query on ~1TB on a single GPU in under 9 minutes. The data was stored on AWS S3 in Apache Parquet. https://twitter.com/blazingsql/status/1303370102348361729

Here is an example of scaling that same query up to 32 GPUs and running it in 16 seconds. https://twitter.com/blazingsql/status/1304450203030880257

Again, think of BSQL as a query engine that runs queries on data wherever and however you have it. Here is a BSQL user running 1-2 minute queries on 1.5TB of CSV files using 2 GPUs. https://twitter.com/tomekdrabas/status/1303824164273270789
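A quick back-of-envelope reading of the benchmark figures above, in Python. It assumes decimal units (1 TB = 1000 GB) and that each query scans the full dataset, neither of which the tweets state, and it treats "under 9 minutes" as exactly 540 s, so the single-GPU throughput is a lower bound and the 1-to-32-GPU speedup an upper bound rather than measured values.

```python
# Back-of-envelope figures for the benchmarks quoted above.
# Assumptions (not stated in the tweets): 1 TB = 1000 GB, every query scans
# the whole dataset, and "under 9 minutes" is taken as exactly 540 s.


def throughput_gb_s(size_gb: float, seconds: float) -> float:
    """Average scan throughput in GB/s."""
    return size_gb / seconds


# ~1 TB of Parquet on S3, single GPU, under 9 minutes.
single_gpu = throughput_gb_s(1000, 9 * 60)  # lower bound, about 1.85 GB/s

# Same query on 32 GPUs in 16 seconds.
multi_gpu = throughput_gb_s(1000, 16)  # 62.5 GB/s aggregate
speedup_bound = (9 * 60) / 16  # upper bound: 33.75x, vs. 32x for linear scaling

# ~1.5 TB of CSV on 2 GPUs, queries taking 1 to 2 minutes.
csv_range = (throughput_gb_s(1500, 120), throughput_gb_s(1500, 60))  # (12.5, 25.0)
csv_per_gpu = tuple(t / 2 for t in csv_range)  # (6.25, 12.5)

print(f"1 GPU:   >= {single_gpu:.2f} GB/s")
print(f"32 GPUs:    {multi_gpu:.1f} GB/s, speedup <= {speedup_bound:.2f}x")
print(f"CSV, 2 GPUs: {csv_range[0]:.1f}-{csv_range[1]:.1f} GB/s aggregate")
```

On these assumptions, going from 1 to 32 GPUs cuts wall-clock time by at most about 34x, against 32x for perfectly linear scaling; because 540 s is only an upper bound, the true ratio is lower and can't be pinned down from these numbers alone.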

Let me know if that helps at all (or not).

roaramburu · on May 14, 2020

GoAi was an effort to get GPU developers on the same page and working together to build an ecosystem for analytics on GPUs.

RAPIDS is a project that was born out of GoAi to bring that ecosystem to Python.

It is built on Apache Arrow (although in GPU memory) and includes many of the original GoAi members, like my team at BlazingSQL, and others such as Anaconda, Nvidia, and many MANY others.

roaramburu · on Aug 7, 2019

So, the part of this argument that confuses me is that we live in an Intel world where Intel has 98% market share in servers. So we're already at the whim of a single company. Why not challenge that dominance?

craftyguy · on Aug 7, 2019

Not the same. Two companies make x86 processors, and in the very specific case of this article/comment thread, more than one company supports OpenCL. Nvidia/CUDA is a one-pony show, no matter how you look at it.

roaramburu · on Aug 5, 2019

Distributed mode is getting released in the next few days; I've been playing with it over the past week.

Right now we use Kubernetes on Google Kubernetes Engine (GKE) to deploy in distributed mode.

We don't support nested data at present, but there are RAPIDS teams looking into this.

roaramburu · on Aug 5, 2019

Tons of benchmarks at blog.blazingdb.com

Check it out, it's fast.

roaramburu · on Aug 5, 2019

Yeah, we were totally unaware of PartiQL until your post. Now that I'm looking at it, though, it looks boss! It agrees with many of our theses, and there looks to be a lot to glean from this project as well.

roaramburu · on Aug 5, 2019

Also! We have a guy in Lima pushing out some GIS work into cuDF and BlazingSQL too.

roaramburu · on Aug 5, 2019

Not a stupid question. The reason is priorities, but it's definitely our ideal to do predicate pushdown and to join databases with files, streams, etc.
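For readers unfamiliar with the term, predicate pushdown means handing a filter to the scan/storage layer so that data which can't match is never read, instead of reading everything and filtering afterwards. The Python sketch below is a generic illustration under assumed names (`Chunk`, `scan_then_filter`, `scan_with_pushdown`), with a per-chunk max statistic loosely modeled on Parquet row-group metadata; it is not BlazingSQL's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    # Hypothetical stored chunk, loosely modeled on a Parquet row group:
    # the column values plus a max statistic computed once at write time.
    values: List[int]
    max_value: int = field(init=False)

    def __post_init__(self):
        self.max_value = max(self.values)


def scan_then_filter(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    # No pushdown: every chunk is read, then "value > threshold" is applied.
    rows_read, out = 0, []
    for c in chunks:
        rows_read += len(c.values)
        out.extend(v for v in c.values if v > threshold)
    return out, rows_read


def scan_with_pushdown(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    # Pushdown: the predicate travels into the scan, so chunks whose max
    # statistic rules out any match are skipped without reading their rows.
    rows_read, out = 0, []
    for c in chunks:
        if c.max_value <= threshold:
            continue
        rows_read += len(c.values)
        out.extend(v for v in c.values if v > threshold)
    return out, rows_read


chunks = [Chunk([1, 2, 3]), Chunk([4, 5, 6]), Chunk([7, 8, 9])]
naive, naive_read = scan_then_filter(chunks, 6)
pushed, pushed_read = scan_with_pushdown(chunks, 6)
print(naive == pushed, naive_read, pushed_read)  # True 9 3
```

Both scans return the same rows; the pushdown version just reads a third as much data here. Against a remote database rather than files, the same idea means sending the filter to the source so fewer rows cross the network before a join.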

roaramburu · on May 15, 2019

FYI you can run the queries in this post through the Google Colab link. I work at BlazingDB btw. https://colab.research.google.com/drive/1EbPE9FwFur7fE2054BH...