A notebook node in a Redivis project provides an integrated environment for performing data analysis, connecting Redivis tables to the scientific compute stack in Python, R, SAS, and Stata. Redivis notebooks are built on top of Jupyter notebooks and the .ipynb notebook format.
From within a notebook, you will be able to query and analyze any data within your project. Importantly, because data referenced in a notebook never leaves Redivis, you can securely analyze data that would otherwise be bound by export restrictions.
Create a notebook by clicking on any table in a project you have edit access to and selecting + Notebook. By default this notebook will come with code that references the table it was created from.
Notebook nodes need to be started in order to edit or execute cells. When first clicking on a notebook node, you will see a read-only view of its contents (including cell outputs). Click the purple Start notebook button in the top right section to connect this notebook to compute resources.
All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them (much like a Google Doc).
Notebooks are currently deployed with access to 2 CPUs and 8GB working memory, alongside a 50GB hard disk. These values may fluctuate based on system utilization. As a rule of thumb, notebooks will easily support interactive analysis of tables up to ~1GB; if your table is larger, try reducing it first by creating a transform.
All notebooks come with a number of common packages pre-installed, depending on the notebook type. Each notebook can also be extended with a dependencies startup script, which contains a series of commands that are run within the notebook container on startup. These commands are executed within the system shell; for example:
    # Install Python packages via pip (or conda)
    pip install plotly
    # Install R packages by invoking R directly
    R -e 'install.packages("vioplot")'
Notebooks have a maximum lifetime of 6 hours, and after 30 minutes of inactivity, any running notebook will automatically be stopped. All notebooks are automatically saved as they are edited.
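The two limits above interact in a simple way: whichever is reached first stops the notebook. The sketch below illustrates that arithmetic only. It assumes the 6-hour lifetime is measured from the moment the notebook is started, and the function and constant names are hypothetical; the actual scheduling is handled by Redivis.

```python
from datetime import datetime, timedelta

MAX_LIFETIME = timedelta(hours=6)     # maximum notebook lifetime (from the text above)
IDLE_TIMEOUT = timedelta(minutes=30)  # inactivity limit (from the text above)


def expected_stop_time(started_at: datetime, last_activity_at: datetime) -> datetime:
    """Return the earlier of the lifetime limit and the inactivity limit.

    Assumption: the lifetime is counted from `started_at`.
    """
    return min(started_at + MAX_LIFETIME, last_activity_at + IDLE_TIMEOUT)


start = datetime(2024, 1, 1, 9, 0)

# Last activity at 10:15: the inactivity limit is reached first.
print(expected_stop_time(start, datetime(2024, 1, 1, 10, 15)))  # 2024-01-01 10:45:00

# Activity continues until 14:50: the 6-hour lifetime is reached first.
print(expected_stop_time(start, datetime(2024, 1, 1, 14, 50)))  # 2024-01-01 15:00:00
```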
Notebook state (e.g., stored variables) is persisted between restarts, allowing you to resume work where it was left off without needing to re-run all previous cells. This is implemented via the dill library in Python, and the save.image() command in R. For Python notebooks, there are certain documented limitations with the dill library, and in some scenarios you may still need to rerun your cells when starting your notebook.
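To get a feel for why some variables may not survive a restart, the sketch below checks which values can be serialized and restored. It uses the standard library's pickle module as a stand-in, not dill itself; dill supports more object types than pickle, so exactly which objects fail will differ. The helper name and the sample variables are hypothetical.

```python
import pickle


def can_persist(value) -> bool:
    """Return True if `value` survives a serialize/deserialize round trip."""
    try:
        pickle.loads(pickle.dumps(value))
        return True
    except Exception:
        return False


# Hypothetical notebook variables, for illustration only.
session = {
    "threshold": 0.05,
    "columns": ["var1", "var2"],
    "row_stream": (i * i for i in range(3)),  # a generator holds live execution state
}

# Plain data round-trips; the generator cannot be pickled.
report = {name: can_persist(value) for name, value in session.items()}
print(report)  # {'threshold': True, 'columns': True, 'row_stream': False}
```

Variables that fail a check like this are the ones most likely to require re-running the cells that created them.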
Notebooks can be downloaded as PDF, HTML, and .ipynb files by clicking the three-dot More button at the top right of the notebook. You will be given the option of whether to include cell outputs in your export. Before exporting, make sure that the outputs displayed in your notebook do not contain sensitive data, and that your subsequent distribution is in compliance with any data use agreements.
As you reference tables in your notebook, you will see corresponding linkages appear in the project tree. Access to your notebook is subsequently determined based on access to the data in all of these source tables. In order to reset your notebook's source tables, you can restart the kernel while the notebook is running, or click More -> Clear cell outputs while the notebook is stopped.
In order to view a notebook, you must first have view access to the corresponding project, and in order to run and edit the notebook, you must also have edit access to that project.
Additionally, your access to a notebook is governed by your access to its source tables. In order to run a notebook and see its outputs, you must have data access to all source tables. If you have metadata access, you will be able to see cell inputs in a notebook (that is, the code), but not outputs. If you only have overview (or no) access to the source tables, you will not be able to see notebook contents.
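The access rules above can be summarized as a small decision function. This is only an illustrative sketch: the function name, the string labels for access levels, and the returned keys are hypothetical and are not part of any Redivis API. It assumes `source_access` is the lowest access level you hold across all of the notebook's source tables.

```python
def notebook_visibility(project_access: str, source_access: str) -> dict:
    """Summarize what a user can do with a notebook.

    project_access: "none", "view", or "edit"
    source_access:  lowest level across all source tables:
                    "none", "overview", "metadata", or "data"
    """
    can_view_project = project_access in ("view", "edit")
    return {
        # Code (cell inputs) requires at least metadata access to every source table.
        "see_code": can_view_project and source_access in ("metadata", "data"),
        # Outputs require data access to every source table.
        "see_outputs": can_view_project and source_access == "data",
        # Running and editing also require edit access to the project.
        "run_and_edit": project_access == "edit" and source_access == "data",
    }


print(notebook_visibility("edit", "data"))
print(notebook_visibility("view", "metadata"))
print(notebook_visibility("edit", "overview"))
```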
If you are an editor of the notebook and have lost access to the source table(s), you can easily restore visibility into the notebook by selecting Clear cell outputs. This will disconnect the notebook from all source tables and clear its outputs, while restoring your access to the notebook and its code.
If a notebook contains data with export restrictions, internet access within the running notebook will be disabled. You can still install packages and other dependencies by customizing the notebook dependency script.
    import redivis

    # Python: the max_results and variables parameters are optional
    df = redivis.table("my-table").to_dataframe(max_results=100, variables=["var1", "var2"])
    # R: the max_results parameter is optional
    data <- redivis::table('my-table')$to_tibble(max_results=100)
In order to use Stata notebooks, you must have a valid license for Stata version 17, or be a member of an organization that provides such a license. You can input this license information in the Settings tab of your workspace.
Stata notebooks are built on top of Python notebooks, and come with the officially-supported pystata package pre-installed and configured. This package provides a %%stata magic command (among others), allowing you to pass data from a Python dataframe and execute arbitrary Stata code.
In order to use SAS notebooks, you must have a valid license for SAS 9.4, or be a member of an organization that provides such a license. Please contact us if you would like to add a SAS license.
SAS notebooks are built on top of Python notebooks, and come with the officially-supported saspy package pre-installed and configured. This package provides an interface to pass data from a Python dataframe and execute arbitrary SAS code.