Compute resources
Notebooks on Redivis provide a highly flexible computational environment. Notebooks can be used for anything from quick visualizations to training sophisticated ML models on a large corpus of data.
Understanding the compute resources available, and when to modify which parameters, can help you take full and efficient advantage of the high-performance computing resources on Redivis.
Default (free) notebooks
The default notebook configuration on Redivis is always free, and provides a performant environment for working with most datasets. The computational resources in the default notebook are comparable to a typical personal computer, though likely with substantially better network performance.
The default free notebook configuration offers:
2 vCPUs (Intel Ice Lake or Cascade Lake)
32GB RAM
100GB SSD:
IOPS: 170,000 read | 90,000 write
Throughput: 660MB/s read | 350MB/s write
16Gbps networking
No GPU (see custom compute configurations below)
6 hr max duration
30 min idle timeout (when no code is being written or executed)
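If you want to verify what a running notebook actually has available, you can inspect it from your own code. The snippet below is a minimal sketch for a Python notebook; os and shutil are standard library, while psutil is an assumption about the installed environment (install it first if it isn't present).

```python
# A minimal sketch for checking the resources available to a running notebook.
# os and shutil are standard library; psutil is assumed to be installed.
import os
import shutil

import psutil

# Logical CPU count visible to the notebook (2 vCPUs on the default configuration)
print("vCPUs:", os.cpu_count())

# Total and currently available memory, in GB
mem = psutil.virtual_memory()
print(f"RAM: {mem.total / 1e9:.1f} GB total, {mem.available / 1e9:.1f} GB available")

# Free disk space for the notebook's working directory
disk = shutil.disk_usage(".")
print(f"Disk: {disk.free / 1e9:.1f} GB free of {disk.total / 1e9:.1f} GB")
```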
Custom compute configurations
For scenarios where you need additional computational resources, you can choose a custom compute configuration for your notebook. This enables you to specify CPU, memory, GPU, and hard disk resources, while also giving you control over the notebook's max duration and idle timeout.
In order to customize the compute configuration for your notebook, click the Edit compute configuration button in the notebook start modal or toolbar.
Custom machine types
Redivis offers nearly every machine type available on Google Cloud. These machines can scale from small servers all the way to massively powerful VMs with thousands of cores, terabytes of memory, and dozens of state-of-the-art GPUs.
These machine types are classified by four high-level compute platforms: General purpose, memory optimized, compute optimized, and GPU. Choose the platform, and machine type therein, that is most appropriate for your workload.
When your notebook is running, make sure to keep an eye on the utilization widgets to see how much CPU / RAM / GPU your notebook is actually using. This can help inform whether your code is actually taking advantage of all of the compute resources.
For example, if your code is single-threaded, adding more CPUs won't do much to improve performance. Similarly, you might need to adjust your model training and inference workflows to take advantage of more than one GPU.
If you find that you've under- or over-provisioned resources, you can simply update your machine configuration and restart the notebook.
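To make the single-threaded point above concrete, here is a minimal sketch that spreads CPU-bound work across all available vCPUs using Python's standard concurrent.futures module. The process_record function and records input are hypothetical placeholders for your own workload; the pattern, not the specific code, is what to take away.

```python
# Hypothetical sketch: spreading CPU-bound work across all available vCPUs.
# `process_record` and `records` stand in for your own function and data.
import os
from concurrent.futures import ProcessPoolExecutor

def process_record(record):
    # Placeholder for an expensive, CPU-bound computation
    return sum(i * i for i in range(record))

records = range(10_000, 10_100)

if __name__ == "__main__":
    # A single Python process leaves extra vCPUs idle; a process pool sized to
    # os.cpu_count() lets the work scale with the machine you provision.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
        results = list(executor.map(process_record, records))

    print(len(results), "records processed")
```

If the utilization widgets still show only one busy core after a change like this, the bottleneck is likely I/O or memory rather than CPU, and provisioning more vCPUs won't help.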
Custom machine costs
All custom machines have an associated hourly cost (charged by the second). This cost is determined by the then-current price for that machine configuration on Google Cloud.
In order to run a custom machine, you must first purchase compute credits, and have enough credits to run the notebook for at least 15 minutes. If you don't have credit auto-purchase configured, you will receive alerts as your credits run low, and the notebook will ultimately shut down when your credits are exhausted.
Maximizing notebook performance
All notebooks on Redivis use Python, R, Stata, or SAS. While Redivis notebooks are highly performant and scalable, the coding paradigms in these languages can introduce bottlenecks when working with very large tabular data. If you are running into performance issues, we suggest:
Use transforms to clean and reduce the size of your data before analyzing them further in a notebook. When possible, this will often be the most performant and cost-efficient approach.
Adjust the compute resources in your notebook. This may help to resolve these bottlenecks depending on what is causing them!
Some quick rules of thumb:
< 1GB: probably doesn't matter, use what suits you!
1-10GB: probably fine for a notebook, though a transform might be faster.
10-100GB: maybe doable in a notebook, but you'll want to make sure to apply the right programming methodologies (see the sketch after this list). Try to pre-cut your data if you can.
>100GB: You should probably cut the data first in a transform, unless you really know what you're doing.
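If you do work with data at the larger end of these ranges inside a notebook, streaming, column-oriented processing usually matters more than raw hardware. The sketch below is one illustration, assuming a Python notebook with pyarrow available; the file path and column name are hypothetical placeholders, and how you load the data (Parquet, CSV, or the Redivis client library) will depend on your project.

```python
# Hypothetical sketch: aggregating one column of a large table without
# loading the whole table into memory. The path and column name are placeholders.
import pyarrow.compute as pc
import pyarrow.dataset as ds

dataset = ds.dataset("large_table.parquet", format="parquet")

total = 0.0
row_count = 0

# Stream the data in batches and read only the single column we need,
# so peak memory stays far below the size of the full table.
for batch in dataset.to_batches(columns=["amount"]):
    column = batch.column(0)  # the single "amount" column requested above
    total += pc.sum(column).as_py() or 0.0
    row_count += batch.num_rows

if row_count:
    print(f"Mean amount across {row_count:,} rows: {total / row_count:.2f}")
```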