Notebook concepts
Notebooks provide a highly flexible compute environment for working with data on Redivis. In a notebook, you can load data from the tables in your workflow, install additional packages, perform analyses in any supported programming language, store and download files, and generate an output table for downstream analysis.
Redivis notebooks support several kernels (programming languages). For more details and examples on how to use notebooks in each language, consult the language-specific documentation.
A notebook can generate an output table as the result of its execution. This output table is created programmatically.
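As a minimal sketch of creating an output table from Python, the helper below assumes the `redivis` client library exposes `current_notebook().create_output_table(...)`; check the language-specific documentation for the exact call in your kernel. The `save_output_table` wrapper and its `notebook` parameter are hypothetical, added only so the logic can be exercised outside a running notebook.

```python
def save_output_table(df, notebook=None):
    """Write `df` to the current notebook's output table.

    `notebook` is a hypothetical injection point for testing; inside a
    running Redivis notebook it can be left as None.
    """
    if notebook is None:
        # Assumption: the redivis package is pre-installed in the notebook
        # and provides current_notebook().
        import redivis

        notebook = redivis.current_notebook()
    notebook.create_output_table(df)
```

Inside a notebook, the call would then look like `save_output_table(df)`, where `df` is the data frame holding your results.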
As you perform your analysis, you may generate files that are stored on the notebook's hard disk. There are two locations that you should write files to: /out for persistent storage, and /scratch for temporary storage.
Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. In contrast, any files written to temporary storage will only exist for the duration of the current notebook session.
To write files to these directories, use the standard tools of your programming language for writing files.
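For instance, in Python the standard library is enough. The sketch below writes one file to each location; the `storage_dir` helper and the file names are hypothetical, and the helper falls back to a temporary directory only so that the example also runs outside a notebook, where /out and /scratch do not exist.

```python
import csv
import os
import tempfile
from pathlib import Path


def storage_dir(preferred):
    """Return `preferred` if it is a writable directory (as in a running
    notebook); otherwise return a fresh temporary directory."""
    path = Path(preferred)
    if path.is_dir() and os.access(path, os.W_OK):
        return path
    return Path(tempfile.mkdtemp(prefix=path.name + "_"))


out_dir = storage_dir("/out")          # persistent across notebook sessions
scratch_dir = storage_dir("/scratch")  # cleared when the notebook stops

# A result worth keeping goes to persistent storage.
summary_path = out_dir / "summary.csv"
with summary_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["group", "count"])
    writer.writerow(["a", 3])

# An intermediate file goes to temporary storage.
tmp_path = scratch_dir / "intermediate.txt"
tmp_path.write_text("temporary working data\n")
```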
All notebooks are automatically saved as you go. Every time a notebook is stopped, all cell inputs are saved to the notebook version history, giving you a historical record of all code that was run. Additionally, all cell outputs from the last notebook session will be preserved, as will any files written to the /out directory.
When starting a notebook, you'll be presented with the option to "Clear all outputs and start". This can be helpful because it resets all access rules associated with the notebook: once its outputs are cleared, no data remains associated with the notebook.
Choosing this option will clear all output cells in your notebook, any files saved in the /out directory, and any output tables from the notebook.
You can click the three-dot More menu to open the logs for this notebook. Opening the logs when a notebook is stopped will show the logs from the notebook's previous run.
Activity is determined based on the Jupyter kernel: if you have a long-running computation, the notebook will be considered active for the entire time.
All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them. Workflow viewers will see a read-only version of the notebook.
To change a notebook's primary source table, either right-click on the notebook or click the three-dot icon and select the "change source table" option.
All notebooks come with a number of common packages pre-installed. You can install additional packages by clicking the Edit dependencies button in the notebook start modal or toolbar.
For more detailed information about the default dependencies and adding new packages, consult the documentation for your notebook type.
Files written to the /out directory are always available, and will persist across notebook sessions. This allows for workflows where you can cache certain results between notebook sessions, avoiding the need to rerun time-intensive computations.
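One way to cache results between sessions is sketched below, under stated assumptions: `cached_json` is a hypothetical helper (not part of any Redivis API) that stores a JSON-serializable result under /out and reuses it on later runs. Outside a notebook it falls back to a folder in the system temp directory so that the example remains runnable.

```python
import json
import os
import tempfile
from pathlib import Path


def cached_json(name, compute, cache_dir="/out"):
    """Return the cached result for `name`, computing and saving it on a miss.

    `compute` is a zero-argument function returning JSON-serializable data.
    """
    base = Path(cache_dir)
    if not (base.is_dir() and os.access(base, os.W_OK)):
        # Not in a notebook: use a fallback folder so the sketch still runs.
        base = Path(tempfile.gettempdir()) / "notebook_out_fallback"
        base.mkdir(exist_ok=True)
    cache_file = base / f"{name}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = compute()
    cache_file.write_text(json.dumps(result))
    return result


def expensive_summary():
    # Stand-in for a time-intensive computation.
    return {"n_rows": 1000, "mean": 4.2}


summary = cached_json("summary", expensive_summary)
```

The first session pays the cost of `expensive_summary`; later sessions read the saved file instead. Delete the cached file, or choose a new `name`, when the underlying data changes.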
The files in the /scratch directory are only available when the notebook is running, and will be cleared once it is stopped. The default "working directory" of all notebooks is /scratch: this is where files will be written if you do not specify another location.
You can view the files in either directory by pressing the Files button at the top right of the notebook.
Every time you stop your notebook, all cell inputs (your code and markdown) will be saved and associated with that notebook session. You can view the code from all previous sessions by pressing the History button at the top right of your notebook, allowing you to view and share your code as it was at any previous point in time.
In order to view a notebook, you must first have view access to the corresponding workflow, and in order to run and edit the notebook, you must also have edit access to that workflow.
If a notebook contains data with export restrictions, access to the external internet will be disabled while the notebook is running.
When the internet is disabled in a notebook, you can still specify packages and other startup scripts in the Dependencies modal that will be installed on notebook start. Additionally, if any of your packages require internet access to run, you'll need to "preload" the content they need using a post-install script. For example, if you're using the tidycensus package in R, you could load the package and fetch the content you need in a post-install script.
Notebooks can be downloaded as PDF, HTML, and .ipynb files by clicking the three-dot More button at the top right of the notebook.
You will be given the option of whether to include cell outputs in your export — it is important that you ensure the outputs displayed in your notebook do not contain sensitive data, and that your subsequent distribution is in compliance with any data use agreements.
From within your notebook, you can load any data available in your workflow. You can reference the primary source table of the notebook via the special _source_ identifier, or reference any other table in the workflow by its name. To ensure that your notebook doesn't break when tables get renamed, make sure to use the qualified reference for non-primary tables.
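A minimal Python sketch of both kinds of reference follows. It assumes the `redivis` client library provides `table(...)` and `to_pandas_dataframe()`; the `load_tables` wrapper, its `client` parameter, and the table reference `my_other_table:abcd` are hypothetical placeholders, so substitute the reference shown for your own table.

```python
def load_tables(other_table="my_other_table:abcd", client=None):
    """Load the notebook's source table and one other workflow table.

    `other_table` is a hypothetical placeholder reference. `client` is a
    hypothetical injection point for testing; leave it as None in a notebook.
    """
    if client is None:
        # Assumption: the redivis package is pre-installed in the notebook.
        import redivis

        client = redivis
    # The primary source table is addressed by the special "_source_" name.
    source_df = client.table("_source_").to_pandas_dataframe()
    # Any other table in the workflow is addressed by its own reference.
    other_df = client.table(other_table).to_pandas_dataframe()
    return source_df, other_df
```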
You can view and download these files anytime.
Create a notebook by clicking on a table in a workflow and selecting + Notebook. This table will become the default source table for your new notebook, which will have pre-generated code that references the table's data.
Notebook nodes need to be started in order to edit or execute cells. Click the purple Start notebook button in the top right to start the notebook and provision compute resources. You can also elect to "Clear outputs and start", which will remove all outputs and reset any referenced tables in the notebook.
By default, notebooks are provisioned with 32 GB of memory and 2 CPU cores, with compute power comparable to most personal computers. You can view and alter the notebook's compute resources in the More menu.
Default notebooks have a maximum lifetime of 6 hours, and any running notebook will automatically be stopped after 30 minutes of inactivity. If you are using a notebook with custom compute resources, these values can be modified.
Notebooks are subject to certain limits.
Notebooks offer special capabilities for storing files on the notebook's hard disk. Any files you've stored in a notebook's /out and /scratch directories will be available in the files modal. This modal lets you preview and download specific file outputs from your notebook.
You can list files in either directory by pressing the corresponding tab, and click on any file to view it. Redivis supports interactive previews for many file types in the file inspector, and you can also download the file for further inspection and analysis. To download all files in a directory, click the Download all button in the files modal.
Your access to a notebook is determined by your corresponding access to all tables (and their antecedent datasets) referenced by the notebook. These linkages persist across notebook sessions, as a future session could reference data from a previous session. In order to reset the tables referenced by your notebook, which will also clear all outputs in the notebook, you can choose to Clear outputs and start when starting the notebook.
Additionally, your access to a notebook is governed by your access level to its source tables. In order to run a notebook and see its outputs, you must have data access to all source tables. If you have metadata access, you will be able to see cell inputs in a notebook (that is, the code), but not outputs. If you only have overview (or no) access to the source tables, you will not be able to see notebook contents.
Typically, you will be able to download any files written to the notebook's /out or /scratch directories. However, if a notebook references data with export restrictions, you will not be able to download these files, unless the file size is smaller than the relevant size limit specified on the source datasets.
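To see ahead of time which files might be affected by a size-based restriction, you could compare file sizes against the limit yourself. The sketch below is illustrative only: `files_at_or_over_limit` is a hypothetical helper, and the 1024-byte limit is a made-up value, since the actual limit is whatever is specified on your source datasets.

```python
import tempfile
from pathlib import Path


def files_at_or_over_limit(directory, limit_bytes):
    """Return (relative_path, size_in_bytes) for every file under `directory`
    that is not smaller than `limit_bytes`, largest first."""
    root = Path(directory)
    flagged = [
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size >= limit_bytes
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Demo on a throwaway directory; in a notebook you would pass "/out" or "/scratch".
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "small.txt").write_bytes(b"x" * 10)
(demo_dir / "large.bin").write_bytes(b"x" * 2048)

print(files_at_or_over_limit(demo_dir, limit_bytes=1024))
# prints [('large.bin', 2048)]
```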