Notebooks

Overview

A notebook node in a Redivis project provides an integrated environment for performing data analysis, connecting Redivis tables to the scientific compute stack in Python, R, SAS, and Stata. Redivis notebooks are built on top of Jupyter notebooks and the .ipynb notebook format.
From within a notebook, you will be able to query and analyze any data within your project. Importantly, because data referenced in a notebook never leaves Redivis, you can securely analyze data that would otherwise be bound by export restrictions.
Figure: Visualize and analyze data in Redivis notebooks

Notebook management

Create a notebook by clicking on any table in a project you have edit access to and selecting + Notebook. By default, the new notebook comes with code that references the table it was created from.

Starting a notebook

Notebook nodes need to be started in order to edit or execute cells. When first clicking on a notebook node, you will see a read-only view of its contents (including cell outputs). Click the purple Start notebook button in the top right section to connect this notebook to compute resources.

Collaboration

All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them (much like a Google Doc).

Compute resources

Notebooks are currently deployed with access to 2 CPUs and 8 GB of working memory, alongside a 50 GB hard disk. These values may fluctuate based on system utilization. As a rule of thumb, notebooks will easily support interactive analysis of tables up to ~1 GB; if your table is larger, try reducing it first by creating a transform.
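Because these values may fluctuate, it can be useful to check what a running notebook reports for itself. The following is a minimal sketch using only the Python standard library; the function name describe_resources is hypothetical, and inside a container these calls may report the host machine's values rather than the limits applied to the notebook.

```python
import os
import shutil


def describe_resources(path="."):
    """Return CPU count, total memory (if detectable), and disk space for `path`."""
    info = {"cpus": os.cpu_count()}

    # Total physical memory via POSIX sysconf; not available on every platform.
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
        info["memory_gb"] = page_size * page_count / 1e9
    except (AttributeError, ValueError, OSError):
        info["memory_gb"] = None

    # Disk capacity and free space for the filesystem that contains `path`.
    usage = shutil.disk_usage(path)
    info["disk_total_gb"] = usage.total / 1e9
    info["disk_free_gb"] = usage.free / 1e9
    return info


print(describe_resources())
```

If the reported memory is well below the size of the table you plan to load, that is a signal to reduce the table with a transform first.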

Installing dependencies

All notebooks come with a number of common packages pre-installed, depending on the notebook type. Each notebook can also be extended with a dependencies startup script, which contains a series of commands that are run within the notebook container on startup. These commands are executed within the system shell; for example:
# Install python packages via pip (or conda)
pip install plotly

# Install R packages by invoking R directly
R -e 'install.packages("vioplot")'
If your notebook references data with export restrictions, the dependencies script is the only time the notebook will be able to connect to the outside internet.
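Since packages cannot be installed once a restricted notebook is running, it may help to confirm early that everything you need is importable. This is a minimal sketch for a Python notebook using the standard library; missing_packages is a hypothetical helper, and it only checks top-level module names.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "plotly" matches the pip install in the example startup script.
missing = missing_packages(["plotly"])
if missing:
    print("Add these to the dependencies script:", ", ".join(missing))
```

Any names reported here would need to be added to the dependencies script, followed by a notebook restart.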

Lifecycle

Notebooks have a maximum lifetime of 6 hours, and after 30 minutes of inactivity, any running notebook will automatically be stopped. All notebooks are automatically saved as they are edited.
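The two limits above combine so that whichever is reached first stops the notebook. The sketch below illustrates that arithmetic only; it assumes the 6-hour lifetime is measured from when the notebook was started, and it does not reflect how Redivis detects activity internally.

```python
from datetime import datetime, timedelta

MAX_LIFETIME = timedelta(hours=6)
IDLE_TIMEOUT = timedelta(minutes=30)


def expected_stop_time(started_at, last_activity_at):
    """Return the earlier of the lifetime limit and the inactivity limit."""
    return min(started_at + MAX_LIFETIME, last_activity_at + IDLE_TIMEOUT)


started = datetime(2024, 1, 1, 9, 0)
# Last activity at 10:15: the inactivity limit is reached first.
print(expected_stop_time(started, datetime(2024, 1, 1, 10, 15)))
# Last activity at 14:50: the 6-hour lifetime is reached first.
print(expected_stop_time(started, datetime(2024, 1, 1, 14, 50)))
```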

Restoring state

Notebook state (e.g., stored variables) is persisted between restarts, allowing you to resume work where it was left off without needing to re-run all previous cells. This is implemented via the dill library in Python and the save.image() command in R. For Python notebooks, the dill library has certain documented limitations, and in some scenarios you may still need to rerun your cells when starting your notebook.
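To get a rough sense of which variables might not survive a restart, you can try serializing them yourself. The sketch below uses the standard-library pickle module as a stand-in for dill, so that it runs without extra packages. dill supports more object types than pickle, so this check may flag objects that would in fact be restored. The helper name is hypothetical.

```python
import pickle


def names_that_fail_to_serialize(namespace):
    """Return the names in `namespace` whose values raise an error when pickled."""
    failed = []
    for name, value in namespace.items():
        try:
            pickle.dumps(value)
        except Exception:
            failed.append(name)
    return failed


example_state = {
    "threshold": 0.5,
    "labels": ["a", "b"],
    "row_stream": (i for i in range(3)),  # generators cannot be pickled
}
print(names_that_fail_to_serialize(example_state))
```

Variables flagged this way are candidates for cells you may need to rerun after starting the notebook.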

Exporting your notebook

Notebooks can be downloaded as PDF, HTML, and .ipynb files by clicking the three-dot More button at the top right of the notebook. You will be given the option of whether to include cell outputs in your export. It is important that you ensure the outputs displayed in your notebook do not contain sensitive data, and that your subsequent distribution is in compliance with any data use agreements.
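If you have already downloaded an .ipynb file with outputs included and later want a copy without them, the outputs can be removed from the file's JSON. This is a minimal sketch assuming the standard nbformat 4 layout (code cells with "outputs" and "execution_count" fields); it is not the Redivis export mechanism, and it does not inspect markdown cells or attachments, which could also contain sensitive content.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict with code cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook; a real one would be read with json.load().
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "source": "1 + 1",
            "metadata": {},
            "execution_count": 1,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": "2"},
                    "metadata": {},
                    "execution_count": 1,
                }
            ],
        }
    ],
}
print(json.dumps(strip_outputs(example)["cells"][0]["outputs"]))
```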

Access control

Source tables

As you reference tables in your notebook, you will see corresponding linkages appear in the project tree. Access to your notebook is subsequently determined based on access to the data in all of these source tables. In order to reset your notebook's source tables, you can restart the kernel while the notebook is running, or click More -> Clear cell outputs while the notebook is stopped.

Access levels

In order to view a notebook, you must first have view access to the corresponding project, and in order to run and edit the notebook, you must also have edit access to that project.
Additionally, your access to a notebook is governed by your access to its source tables. In order to run a notebook and see its outputs, you must have data access to all source tables. If you have metadata access, you will be able to see cell inputs in a notebook (that is, the code), but not outputs. If you only have overview (or no) access to the source tables, you will not be able to see notebook contents.
If you are an editor of the notebook and have lost access to the source table(s), you can easily restore visibility into the notebook by selecting Clear cell outputs. This will disconnect the notebook from all source tables and clear its outputs, while restoring your access to the notebook and its code.
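The rules in this section can be summarized as a small decision function. The sketch below is illustrative only and is not how Redivis implements access checks; the function and level names are hypothetical, and it assumes the table access levels are ordered none, overview, metadata, data, with the most restricted source table determining the result.

```python
TABLE_LEVELS = ["none", "overview", "metadata", "data"]  # lowest to highest


def notebook_permissions(project_access, source_table_access):
    """Summarize what a user can do with a notebook.

    project_access: "none", "view", or "edit"
    source_table_access: one table access level per source table
    """
    # Assumption: a notebook with no source tables is not restricted by table access.
    ranks = [TABLE_LEVELS.index(level) for level in source_table_access]
    lowest = min(ranks, default=TABLE_LEVELS.index("data"))

    can_view = project_access in ("view", "edit")
    can_see_code = can_view and lowest >= TABLE_LEVELS.index("metadata")
    can_see_outputs = can_view and lowest >= TABLE_LEVELS.index("data")
    can_run = project_access == "edit" and can_see_outputs
    return {"see_code": can_see_code, "see_outputs": can_see_outputs, "run": can_run}


# An editor with data access to one source table but only metadata access to another.
print(notebook_permissions("edit", ["data", "metadata"]))
```

Passing an empty list of source tables models the state after Clear cell outputs, where the notebook is disconnected from all source tables.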

External internet access

If a notebook contains data with export restrictions, internet access within the running notebook will be disabled. You can still install packages and other dependencies by customizing the notebook dependency script.

Notebook types

Python

Python notebooks are based on the jupyter/scipy-notebook image, which contains a variety of common scientific packages for Python. The redivis-python library is also pre-installed.
import redivis

# The max_results and variables parameters are optional
df = redivis.table("my-table").to_dataframe(max_results=100, variables=["var1", "var2"])

R

R notebooks are based on the jupyter/r-notebook image, which contains tidyverse packages and other common scientific packages for R. The redivis-r library is also pre-installed.
# The max_results parameter is optional
data <- redivis::table('my-table')$to_tibble(max_results=100)

Stata

In order to use Stata notebooks, you must have a valid license for Stata version 17, or be a member of an organization that provides such a license. You can input this license information in the Settings tab of your workspace.
Stata notebooks are built on top of Python notebooks, and come with the officially supported pystata package pre-installed and configured. This package provides a %%stata magic command (among others), allowing you to pass data from a Python dataframe and execute arbitrary Stata code.

SAS

In order to use SAS notebooks, you must have a valid license for SAS 9.4, or be a member of an organization that provides such a license. Please contact us if you would like to add a SAS license.
SAS notebooks are built on top of Python notebooks, and come with the officially supported saspy package pre-installed and configured. This package provides an interface to pass data from a Python dataframe and execute arbitrary SAS code.