Redivis Documentation
Compute resources

Last updated 6 months ago

Notebooks on Redivis provide a highly flexible computational environment. Notebooks can be used for anything from quick visualizations to training sophisticated ML models on a large corpus of data.

Understanding the compute resources available, and when to modify which parameters, can help you take full and efficient advantage of the high-performance computing resources on Redivis.

Default (free) notebooks

The default notebook configuration on Redivis is always free, and provides a performant environment for working with most datasets. The computational resources in the default notebook are comparable to a typical personal computer, though likely with substantially better network performance.

The default free notebook configuration offers:

  • 2 vCPUs (Intel Ice Lake or Cascade Lake)

  • 32GB RAM

  • 100GB SSD:

    • IOPS: 170,000 read | 90,000 write

    • Throughput: 660MB/s read | 350MB/s write

  • 16Gbps networking

  • No GPU (see below)

  • 6 hr max duration

  • 30 min idle timeout (triggered when no code is being written or executed)
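From inside a running notebook you can sanity-check the resources you actually received. A minimal sketch using only the Python standard library (these are generic OS-level calls, not a Redivis-specific API; the RAM line is Linux-only):

```python
# Quick sanity check of the resources available to a notebook session.
import os
import shutil

def describe_environment(path="/"):
    """Return a small dict summarizing CPU count, disk space, and (on Linux) RAM."""
    disk = shutil.disk_usage(path)
    info = {
        "cpus": os.cpu_count(),                       # e.g. 2 on the default notebook
        "disk_total_gb": round(disk.total / 1e9, 1),  # ~100 GB SSD by default
        "disk_free_gb": round(disk.free / 1e9, 1),
    }
    # Physical memory is only exposed this way on Linux-like systems.
    if hasattr(os, "sysconf") and "SC_PHYS_PAGES" in os.sysconf_names:
        info["ram_gb"] = round(os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9, 1)
    return info

print(describe_environment())
```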

Custom compute configurations

For scenarios where you need additional computational resources, you can choose a custom compute configuration for your notebook. This enables you to specify CPU, memory, GPU, and hard disk resources, while also giving you control over the notebook's max duration and idle timeout.

In order to customize the compute configuration for your notebook, click the Edit compute configuration button in the notebook start modal or toolbar.

Custom machine types

These machine types are grouped into four high-level compute platforms: general purpose, memory optimized, compute optimized, and GPU. Choose the platform, and the machine type within it, that is most appropriate for your workload.

When your notebook is running, keep an eye on the utilization widgets to see how much CPU / RAM / GPU your notebook is actually using. This can help you determine whether your code is actually taking advantage of all of the compute resources.

For example, if your code is single-threaded, adding more CPUs won't do much to improve performance. Similarly, you might need to adjust your model training and inference workflows to take advantage of more than one GPU.
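To illustrate the single-threaded point: a CPU-bound loop only benefits from extra vCPUs once the work is split across workers. A hedged sketch using Python's standard library, where `expensive` is a hypothetical stand-in for your real workload (nothing here is a Redivis-specific API):

```python
# Sketch: turning a single-threaded, CPU-bound loop into a multi-process one,
# so that additional vCPUs can actually be used.
from concurrent.futures import ProcessPoolExecutor

def expensive(n):
    # CPU-bound computation on one independent chunk of work (illustrative).
    return sum(i * i for i in range(n))

def run_parallel(inputs, workers=2):
    # With workers=1 this is effectively single-threaded; more workers only
    # help while there are enough independent chunks to keep them all busy.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(expensive, inputs))

if __name__ == "__main__":
    print(run_parallel([10_000] * 4, workers=2))
```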

If you find that you've under- or over-provisioned resources, you can simply update your machine configuration and restart the notebook.

Custom machine costs

All custom machines have an associated hourly cost (charged by the second). This cost is determined by the then-current price for that machine configuration on Google Cloud.
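For a rough sense of what per-second billing means, spend scales linearly as rate × runtime. The hourly rate below is a made-up placeholder; the actual price for a given machine configuration is shown in the configuration picker:

```python
# Back-of-the-envelope cost estimate for a custom machine.
# The hourly rate is hypothetical — actual prices track Google Cloud's
# current pricing for the chosen machine configuration.
def estimate_cost(hourly_rate_usd, runtime_seconds):
    """Per-second billing: cost scales linearly with runtime."""
    return round(hourly_rate_usd * runtime_seconds / 3600, 4)

# e.g. a hypothetical $0.80/hr machine running for 90 minutes:
print(estimate_cost(0.80, 90 * 60))  # → 1.2
```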

Maximizing notebook performance

All notebooks on Redivis use either Python, R, Stata, or SAS. While Redivis notebooks are highly performant and scalable, the coding paradigms in these languages can introduce bottlenecks when working with very large tabular data. If you are running into performance issues, we suggest the following:

  • Use transforms to clean and reduce the size of your data before analyzing it further in a notebook. When possible, this will often be the most performant and cost-efficient approach.

  • Adjust the compute resources in your notebook. Depending on what is causing the bottleneck, this may be enough to resolve it. Redivis makes available nearly every machine type available on Google Cloud; these machines scale from small servers all the way to massively powerful VMs with thousands of cores, terabytes of memory, and dozens of state-of-the-art GPUs. In order to run a custom machine, you must first purchase compute credits, and have enough credits to run the notebook for at least 15 minutes. If you run low on credits and don't have credit auto-purchase configured, you will receive alerts as your credits run low, and the notebook will ultimately shut down when your credits are exhausted.

  • Adjust your programming model to load data lazily or process it on-disk, to avoid exceeding memory limits. See our suggestions for working with larger tables in Python, R, Stata, and SAS.

Some quick rules of thumb, based on the size of the data you are working with:

  • < 1GB: probably doesn't matter, use what suits you!

  • 1-10GB: probably fine for a notebook, though a transform might be faster.

  • 10-100GB: doable in a notebook, but you'll want to make sure to apply the right programming methodologies. Try to pre-cut your data if you can.

  • > 100GB: you should probably cut the data first in a transform, unless you really know what you're doing.
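Loading data lazily, as suggested above, can be as simple as streaming rows instead of materializing the whole table in memory. A minimal standard-library sketch (the `amount` column is hypothetical):

```python
# Sketch: processing a table in chunks rather than loading it all into memory —
# one way to "load data lazily" when a table is larger than available RAM.
import csv
import io

def running_total(csv_file, column):
    """Stream rows one at a time and accumulate, keeping memory use flat."""
    total = 0.0
    for row in csv.DictReader(csv_file):
        total += float(row[column])
    return total

# A file object on disk works the same way; StringIO keeps the example self-contained.
data = io.StringIO("amount\n1.5\n2.5\n4.0\n")
print(running_total(data, "amount"))  # → 8.0
```

The same pattern generalizes to chunked readers in dataframe libraries, which let you apply a reduction per chunk without ever holding the full table in memory.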