The Redivis querying interface connects to a highly performant, parallelized data store — queries on terabytes of data can complete in seconds, often utilizing the resources of thousands of compute nodes.
However, there are some best practices that can help you avoid common performance pitfalls or bottlenecks:
Table writes are generally much slower than table reads — if your output table is exceptionally large, it may take the querying engine several minutes to materialize the output. Try restricting the number of rows returned by applying row filters to your transforms when possible, and be cognizant of joins that may substantially increase your record count. Avoid keeping variables that aren't needed.
When you are performing initial exploration, you may consider applying a
LIMITon your transform to reduce the output table size.
Each new variable adds to the computational complexity of the query; the new variable must be computed for every row in the table.
A common anti-pattern is to construct numerous boolean
CASEnew variables, and then use the result of these new variables in the row filter(s). If possible, it is far more efficient to inline the
CASElogic within the row filters, or within fewer new variables, as this allows for improved efficiency in logical short-circuiting.
When your query utilizes a non-union join, consider the order in which you are joining the data. The best practice is to place the largest table first, followed by the smallest, and then by decreasing size.
While the query optimizer can determine which table should be on which side of the join, it is still recommended to order your joined tables appropriately.
If all of your joins are
INNERjoins, the join order will have no impact on the final output. If your query leverages combinations of left / right / inner joins, the join order may affect your output; be careful in these cases.