Costs and optimization

Cost

All queries on Redivis incur a compute cost. Currently, there is no upper limit on compute usage, but users are encouraged to stay within a 1TB monthly compute limit (your current usage will be displayed on your workspace). At a future date, compute limits may be enforced, with increased compute allotments available at additional cost.

Before running a query, you can view its cost by hovering over the Run button.

The cost of a query is determined based on the sum of the size of all variables that are referenced by the query. Variables are referenced whenever they are kept, used in a filter, or used to construct a new variable. Referencing a variable more than once will have no effect on cost.
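As a rough illustration of this rule, the sketch below sums the sizes of the distinct variables a query references. The variable names, sizes, and the `estimate_query_cost` helper are all hypothetical and are not part of any Redivis API; the authoritative figure is the one shown when hovering over the Run button.

```python
# Hypothetical variable sizes in bytes for one source table (illustrative values only).
variable_sizes = {
    "patient_id": 8_000_000_000,
    "visit_date": 8_000_000_000,
    "diagnosis": 20_000_000_000,
    "notes": 150_000_000_000,
}


def estimate_query_cost(sizes, kept, filtered, used_in_new_variables):
    """Sum the size of every distinct variable the query references."""
    # A set removes duplicates, so a variable referenced twice is counted once.
    referenced = set(kept) | set(filtered) | set(used_in_new_variables)
    return sum(sizes[name] for name in referenced)


cost = estimate_query_cost(
    variable_sizes,
    kept=["patient_id", "diagnosis"],
    filtered=["visit_date", "diagnosis"],  # "diagnosis" appears again but is counted once
    used_in_new_variables=["visit_date"],
)
print(f"{cost / 1e9:.0f} GB")  # 36 GB; "notes" is never referenced, so it adds nothing
```

In this sketch the large "notes" variable contributes nothing to the estimate because the query never keeps, filters on, or derives anything from it.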

Importantly, the computational complexity and the size of the output table have no effect on the compute cost. In order to reduce compute costs, two options are available:

1. Limit the number of variables that are referenced by the query

For every variable referenced within a query, you will be charged for the total size of that variable in the source table. By reducing the number of variables in your keep column, and/or limiting the variables referenced through the Build and Partition steps, you can substantially reduce query costs against big tables.

2. Reduce the size of the source table(s) of a query

The approach used to store different data types on Redivis can have an outsized impact on table size, and therefore on subsequent queries against and exports from those tables. In sophisticated cases, you may be able to reshape your data to achieve denser storage, but even in trivial examples you may be able to drastically reduce the size of a table by utilizing efficient variable types and leveraging the fact that NULL values don't have any storage cost.

For example, let's say we have a table with a billion records, and a dummy variable stored as an integer containing 0's and 1's. Each integer value contains 8 bytes, meaning that this variable alone is 8 bytes * 1 billion records = 8GB.

However, if we were to recode this as a boolean, replacing 0 and 1 with false and true, each cell would now only take up one byte, leading to a total variable size of 1GB. That's an 87.5% reduction, without any information loss!

We can achieve even further storage savings by recoding the false values to NULL. If we have 10% true values, and the rest false, we will now only be charged for those 100 million true records: 1 byte * 100 million records = 100MB. This is 90% smaller than our previous table, and almost 99% smaller than our original table, yet it preserves the same information and is generally just as usable.
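The arithmetic in this example can be reproduced with a short calculation. The byte sizes are the ones quoted above (8 bytes per integer, 1 byte per boolean, 0 bytes per NULL); the constant and function names are just labels chosen for this sketch.

```python
RECORDS = 1_000_000_000
INT_BYTES = 8            # per the text: each integer value takes 8 bytes
BOOL_BYTES = 1           # per the text: each boolean value takes 1 byte
TRUE_SHARE_PERCENT = 10  # 10% of records are true; the rest are recoded to NULL (0 bytes)

as_integer = RECORDS * INT_BYTES
as_boolean = RECORDS * BOOL_BYTES
as_sparse_boolean = RECORDS * TRUE_SHARE_PERCENT // 100 * BOOL_BYTES


def percent_smaller(new_size, old_size):
    """Return how much smaller new_size is than old_size, as a percentage."""
    return 100 * (1 - new_size / old_size)


print(f"integer:        {as_integer / 1e9:.1f} GB")
print(f"boolean:        {as_boolean / 1e9:.1f} GB "
      f"({percent_smaller(as_boolean, as_integer):.2f}% smaller than integer)")
print(f"sparse boolean: {as_sparse_boolean / 1e9:.1f} GB "
      f"({percent_smaller(as_sparse_boolean, as_boolean):.2f}% smaller than boolean, "
      f"{percent_smaller(as_sparse_boolean, as_integer):.2f}% smaller than integer)")
```

The three sizes come out to 8 GB, 1 GB, and 0.1 GB, matching the figures in the text.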

Read the full breakdown of different variable types and their storage costs here.

Optimization

The Redivis querying interface connects to a highly performant, parallelized data store. Queries on terabytes of data can complete in seconds, often utilizing the resources of thousands of compute nodes.

However, there are some best practices that can help you avoid common performance pitfalls or bottlenecks:

1. Limit output table size

Table writes are generally much slower than table reads — if your output table is exceptionally large, it may take the querying engine several minutes to materialize the output. Try restricting the number of rows returned by applying row filters to your transforms when possible, and be cognizant of joins that may substantially increase your record count. Avoid keeping variables that aren't needed.

When you are performing initial exploration, you may consider applying a LIMIT on your transform to reduce the output table size. Note that utilizing a limit will have no impact on your transform's compute cost, but may improve its execution speed substantially.

2. Limit the number of new variables when possible

Each new variable adds to the computational complexity of the query; the new variable must be constructed for every row that is not removed by the row filter(s).

A common anti-pattern is to construct numerous boolean CASE new variables, and then use the result of these new variables in the row filter(s). If possible, it is far more efficient to inline the CASE logic within the row filters, or within fewer new variables, as this allows for improved efficiency in logical short-circuiting.
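The benefit of short-circuiting can be illustrated with a small Python analogy. This is not how the Redivis query engine executes a transform; it only counts how often a stand-in "expensive" condition is evaluated when flags are precomputed for every row versus when the conditions are inlined in the filter. All names here are hypothetical.

```python
calls = {"count": 0}


def expensive_check(value):
    """Stand-in for a costly CASE condition; counts how often it is evaluated."""
    calls["count"] += 1
    return value % 7 == 0


rows = list(range(1_000))

# Pattern A: build every boolean flag for every row, then filter on the flags.
calls["count"] = 0
flagged = [(r, r < 100, expensive_check(r)) for r in rows]
kept_a = [r for r, is_small, is_multiple in flagged if is_small and is_multiple]
calls_a = calls["count"]

# Pattern B: inline the conditions in the filter. `and` short-circuits, so the
# expensive check only runs for rows that already passed the cheap check.
calls["count"] = 0
kept_b = [r for r in rows if r < 100 and expensive_check(r)]
calls_b = calls["count"]

print(kept_a == kept_b)  # True: both patterns keep the same rows
print(calls_a, calls_b)  # 1000 100
```

Both patterns return the same rows, but the inlined version evaluates the expensive condition for only the 100 rows that pass the cheaper test first.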

3. Optimize join patterns

When your query utilizes a non-union join, consider the order in which you are merging the data. The best practice is to place the largest table first, followed by the smallest table, and then the remaining tables in order of decreasing size.

While the query optimizer can determine which table should be on which side of the join, it is still recommended to order your joined tables appropriately.

If all of your joins are INNER joins, the join order will have no impact on the final output. If your query leverages combinations of left / right / inner joins, the join order may affect your output; be careful in these cases.
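The ordering rule above (largest table first, then the smallest, then the rest by decreasing size) can be sketched as a small helper. The function name, table names, and row counts are hypothetical, and the helper only suggests an order; as noted, reordering is only guaranteed to leave the output unchanged when every join is an INNER join.

```python
def order_tables_for_join(row_counts):
    """Order table names: largest first, smallest second, the rest by decreasing size.

    row_counts maps a (hypothetical) table name to its record count.
    """
    by_size = sorted(row_counts, key=row_counts.get, reverse=True)
    if len(by_size) < 3:
        # With one or two tables, "largest first" is the whole rule.
        return by_size
    largest, *middle, smallest = by_size
    return [largest, smallest, *middle]


tables = {
    "claims": 1_000_000,
    "states": 50,
    "providers": 20_000,
    "facilities": 3_000,
}
print(order_tables_for_join(tables))
# ['claims', 'states', 'providers', 'facilities']
```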