Query optimization


The Redivis querying interface connects to a highly performant, parallelized data store — queries on terabytes of data can complete in seconds, often utilizing the resources of thousands of compute nodes.
However, there are some best practices that can help you avoid common performance pitfalls or bottlenecks:

1. Limit output table size

Table writes are generally much slower than table reads — if your output table is exceptionally large, it may take the querying engine several minutes to materialize the output. Try restricting the number of rows returned by applying row filters to your transforms when possible, and be cognizant of joins that may substantially increase your record count. Avoid keeping variables that aren't needed.
When you are performing initial exploration, you may consider applying a LIMIT on your transform to reduce the output table size.

2. Limit the number of new variables when possible

Each new variable adds to the computational complexity of the query; the new variable must be computed for every row in the table.
A common anti-pattern is to construct numerous boolean CASE new variables, and then use the result of these new variables in the row filter(s). If possible, it is far more efficient to inline the CASE logic within the row filters, or within fewer new variables, as this allows for improved efficiency in logical short-circuiting.

3. Optimize join patterns

When your query utilizes a non-union join, consider the order in which you are joining the data. The best practice is to place the largest table first, followed by the smallest, and then by decreasing size.
While the query optimizer can determine which table should be on which side of the join, it is still recommended to order your joined tables appropriately.
If all of your joins are INNER joins, the join order will have no impact on the final output. If your query leverages combinations of left / right / inner joins, the join order may affect your output; be careful in these cases.
Last modified 11mo ago