Notebooks provide a flexible computation environment for data analysis, connecting Redivis tables to the scientific compute stack in Python, R, Stata, and SAS. Redivis notebooks are built on top of Jupyter notebooks and the .ipynb notebook format.
From within a notebook, you will be able to query and analyze any table within your workflow, and (optionally) generate an output table from the notebook. Because data referenced in a notebook never leaves Redivis, you can securely analyze data that would otherwise be bound by export restrictions.
As you work in a workflow, node colors and symbols will change on the tree view to help you keep track of your progress.
Starting (spinning circular icon): The notebook is being provisioned and will be available for use once it has started.
Running (purple dot on the node corner): The notebook has started and cells can be executed. It will continue running until it is stopped or times out.
Stopped (grey background): The notebook has stopped and only a read-only view is available.
Errored (red exclamation icon): An error occurred during startup, or an issue occurred that requires a restart.
Stale (yellow background): An upstream table has changed since the notebook was last started.
Incomplete access (all-black background or dashed borders): You don't have full access to the node. Click the Incomplete access button in the top bar to begin applying for access to the relevant datasets.
Notebooks provide a highly flexible compute environment for working with data on Redivis. In a notebook, you can reference any table in your workflow, install dependencies, perform analyses in Python, R, Stata, or SAS, store and download files, and generate an output table for downstream analysis.
From within your notebook, you can load any data available in your workflow. You can reference the notebook's primary source table via the special _source_ identifier, or reference any other table in the workflow by its name. To ensure that your notebook doesn't break when tables get renamed, use the qualified reference for non-primary tables. For example:
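A minimal sketch in Python, assuming the redivis-python client; the table name and identifier suffix below are placeholders:

```python
import redivis

# Reference the notebook's primary source table
source_table = redivis.table("_source_")

# Reference another workflow table by its qualified name so renames won't break the code
# (the ":abcd" suffix is a placeholder for the table's persistent identifier)
other_table = redivis.table("some_table:abcd")
```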
Redivis notebooks support the following kernels (programming languages). For more details and examples on how to use notebooks in each language, consult the language-specific documentation:
Python notebooks
R notebooks
Stata notebooks
SAS notebooks
A notebook can generate an output table as the result of its execution. This output table is created programmatically, e.g.:
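For instance, in a Python notebook (df here stands in for whatever your analysis produces):

```python
import pandas as pd
import redivis

df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

# Materialize the dataframe as the notebook's output table
redivis.current_notebook().create_output_table(df)
```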
As you perform your analysis, you may generate files that are stored on the notebook's hard disk. There are two locations that you should write files to: /out for persistent storage, and /scratch for temporary storage.
Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. In contrast, any files written to temporary storage will only exist for the duration of the current notebook session.
To write files to these directories, use the standard tools of your programming language for writing files. For example:
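Writing CSVs with pandas in a Python notebook:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Persisted across notebook sessions
df.to_csv("/out/results.csv", index=False)

# Cleared when the notebook stops
df.to_csv("/scratch/intermediate.csv", index=False)
```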
You can inspect and download these files anytime.
Create a notebook by clicking on a table node in a workflow and selecting + Notebook. This table will become the default source table for your new notebook and will have pre-generated code that references the table's data.
Notebook nodes need to be started in order to edit or execute cells. Click the purple Start notebook button in the top right to start the notebook and provision compute resources. You can also elect to "Clear outputs and start", which will remove all outputs and reset any referenced tables in the notebook.
By default, notebooks are provisioned with 32GB memory and 2 CPU cores, with compute power comparable to most personal computers. You can view and alter the notebook's compute resources in the More menu.
All notebooks are automatically saved as you go. Every time a notebook is stopped, all cell inputs are saved to the notebook version history, giving you a historical record of all code that was run. Additionally, all cell outputs from the last notebook session will be preserved, as will any files written to the /out directory.
When starting a notebook, you'll be presented with the option to "Clear all outputs and start". This can be helpful because it resets all access rules associated with the notebook, since no data remains associated with it.
Choosing this option will clear all output cells in your notebook, any files saved in the /out directory, and any output tables from the notebook.
You can click the three-dot More menu to open the logs for this notebook. Opening the logs when a notebook is stopped will show the logs from the notebook's previous run.
The default notebooks have a maximum lifetime of 6 hours, and after 30 minutes of inactivity, any running notebook will automatically be stopped. If you are using a notebook with paid custom compute, these values can be modified.
Activity is determined based on the Jupyter kernel: if you have a long-running computation, the notebook will be considered active for the entire time.
All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them. Workflow viewers will see a read-only version of the notebook.
To change a notebook's primary source table, either right-click on the notebook or click the three-dot (ⵗ) icon and select the "change source table" option.
Notebooks are subject to certain concurrency and duration limits.
All notebooks come with a number of common packages pre-installed. You can install additional packages by clicking the Edit dependencies button in the notebook start modal or toolbar.
For more detailed information about the default dependencies and adding new packages, consult the documentation for your notebook type:
Notebooks offer special capabilities for files written to specific directories on the notebook's hard disk. Any files you've stored in a notebook's /out and /scratch directories will be available in the files modal, where you can preview and download specific file outputs from your notebook.
Moreover, files written to the /out directory are always available, and will persist across notebook sessions. This allows for workflows where you can cache certain results between notebook sessions, avoiding the need to rerun time-intensive computations.
The files in the /scratch directory are only available when the notebook is running, and will be cleared once it is stopped. The default "working directory" of all notebooks is /scratch – this is where files will be written if you do not specify another location.
You can view the files in either directory by pressing the Files button at the top right of the notebook.
You can list the files in either directory by selecting the corresponding tab, and click on any file to view it. Redivis supports interactive previews for many file types in the file inspector, and you can also download files for further inspection and analysis. To download all files in a directory, click the Download all button in the files modal.
Every time you stop your notebook, all cell inputs (your code and markdown) will be saved and associated with that notebook session. You can view the code from all previous sessions by pressing the History button at the top right of your notebook, allowing you to view and share your code as it was at any previous point in time.
Your access to a notebook is determined by your corresponding access to all tables (and their antecedent datasets) referenced by the notebook. These linkages persist across notebook sessions, as a future session could reference data from a previous session. In order to reset the tables referenced by your notebook, which will also clear all outputs in the notebook, you can choose to Clear outputs and start when starting the notebook.
In order to view a notebook, you must first have view access to the corresponding workflow, and in order to run and edit the notebook, you must also have edit access to that workflow.
Additionally, your access to a notebook is governed by your access to its source tables. In order to run a notebook and see its outputs, you must have data access to all source tables. If you have metadata access, you will be able to see cell inputs in a notebook (that is, the code), but not outputs. If you only have overview (or no) access to the source tables, you will not be able to see notebook contents.
If a notebook contains data with export restrictions, access to the external internet will be disabled while the notebook is running.
When the internet is disabled in a notebook, you can still specify packages and other startup scripts in the Dependencies modal that will be installed on notebook start. Additionally, if any of your packages require internet access to run, you'll need to "preload" any content using a post-install script. For example, if you're using the tidycensus package in R, you could preload content as follows:
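A sketch of such a post_install.sh, assuming your analysis needs a specific ACS extract; the API key, variables, and geography below are placeholders:

```sh
# post_install.sh: fetch Census data while internet access is still available
Rscript -e '
  library(tidycensus)
  census_api_key("YOUR_KEY_HERE")
  acs <- get_acs(geography = "county", variables = "B19013_001", year = 2022)
  saveRDS(acs, "/out/acs_county_income.rds")  # cache to persistent storage
'
```

Your notebook code can then read the cached file from /out instead of calling the Census API at runtime.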
Typically, you will be able to download any files written to the notebook's /out or /scratch directories. However, if a notebook references data with export restrictions, you will not be able to download these files, unless the file size is smaller than the relevant size-based export restrictions specified on the source datasets.
Notebooks can be downloaded as PDF, HTML, and .ipynb files by clicking the three-dot More button at the top right of the notebook.
You will be given the option of whether to include cell outputs in your export — it is important that you ensure the outputs displayed in your notebook do not contain sensitive data, and that your subsequent distribution is in compliance with any data use agreements.
Notebooks on Redivis provide a highly flexible computational environment. Notebooks can be used for anything from quick visualizations to training sophisticated ML models on a large corpus of data.
Understanding the compute resources available, and when to modify which parameters, can help you take full and efficient advantage of the high-performance computing resources on Redivis.
The default notebook configuration on Redivis is always free, and provides a performant environment for working with most datasets. The computational resources in the default notebook are comparable to a typical personal computer, though likely with substantially better network performance.
The default free notebook configuration offers:
2 vCPUs (Intel Ice Lake or Cascade Lake)
32GB RAM
100GB SSD:
IOPS: 170,000 read | 90,000 write
Throughput: 660MB/s read | 350MB/s write
16Gbps networking
No GPU (see custom environments below)
6 hr max duration
30min idle timeout (no code is being written or executed)
For scenarios where you need additional computational resources, you can choose a custom compute configuration for your notebook. This enables you to specify CPU, memory, GPU, and hard disk resources, while also giving you control over the notebook's max duration and idle timeout.
In order to customize the compute configuration for your notebook, click the Edit compute configuration button in the notebook start modal or toolbar.
Redivis makes nearly every machine type on Google Cloud available. These machines can scale from small servers all the way to massively powerful VMs with thousands of cores, terabytes of memory, and dozens of state-of-the-art GPUs.
These machine types are classified by four high-level compute platforms: general purpose, memory optimized, compute optimized, and GPU. Choose the platform, and the machine type therein, that is most appropriate for your workload.
All custom machines have an associated hourly cost (charged by the second). This cost is determined by the then-current price for that machine configuration on Google Cloud.
In order to run a custom machine, you must first purchase compute credits, and have enough credits to run the notebook for at least 15 minutes. If you run low on credits and don't have credit auto-purchase configured, you will receive various alerts as your credits run low, and ultimately the notebook will shut down when you are out of credits.
All notebooks on Redivis use either Python, R, Stata, or SAS. While Redivis notebooks are highly performant and scalable, the coding paradigms in these languages can introduce bottlenecks when working with very large tabular data. If you are running into issues with performance, we suggest:
Use transforms to clean and reduce the size of your data before analyzing them further in a notebook. When possible, this will often be the most performant and cost-efficient approach.
Adjust the compute resources in your notebook. This may help to resolve these bottlenecks depending on what is causing them!
< 1GB: probably doesn't matter, use what suits you!
1-10GB: probably fine for a notebook, though a transform might be faster.
10-100GB: maybe doable in a notebook, but you'll want to make sure to apply the right programming methodologies. Try to pre-cut your data if you can.
>100GB: You should probably cut the data first in a transform, unless you really know what you're doing.
Python notebooks provide a mechanism to interface between the Python scientific stack and data on Redivis.
As a general workflow, you'll use the redivis-python library to load data from the table(s) in your workflow, and then leverage Python and its ecosystem to perform your analyses. You can optionally create an output table from your notebook, which can then be used like any other table in your workflow.
The specific approaches to working with data in a notebook will be informed in part by the size and types of data that you are working with. Some common approaches are outlined below, and you can consult the full redivis-python docs for comprehensive information.
Python notebooks on Redivis are based off the jupyter/pytorch-notebook base image (version cuda12-python-3.12), which contains a variety of common scientific packages for Python running on Ubuntu 24.04. The latest version of the redivis-python library is also installed. To view all installed Python packages, run pip list from within a running notebook.
To further customize your compute environment, you can specify various dependencies by clicking the Dependencies button at the top-right of your notebook. Here you will see three tabs: Packages, pre_install.sh, and post_install.sh.
Use Packages to specify the Python packages that you would like to install via pip. When adding a new package, it will be pinned to the latest version of that package, but you can specify another version if preferred.
For more complex dependency management, you can also specify shell scripts under pre/post_install.sh. These scripts are executed on either side of the package installation, and are used to execute arbitrary code in the shell. Common use cases might include using apt to install system packages (apt-get update && apt-get install -y <package>), or using mamba to install from conda, which can be helpful for certain libraries (mamba install <package>).
When loading tabular data into your notebook, you'll typically bring it in as some sort of data frame. Specifically, you can load your data as:
The specific type of data frame is up to your preference, though there may be performance and memory implications that will matter for larger tables.
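For instance, loading the source table into a pandas dataframe (other to_* methods return other frame types):

```python
import redivis

# Load the notebook's source table into an in-memory pandas dataframe
df = redivis.table("_source_").to_pandas_dataframe()
df.head()
```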
If your table contains geospatial variable(s), you can take advantage of geopandas to utilize GIS functions and visualization. Calling to_geopandas_dataframe() on a Redivis table with a variable of the geography type will return an instance of a geopandas.GeoDataFrame, with that variable specified as the data frame's geometry variable.
If your table contains more than one geography variable, the first variable will be chosen as the geometry. You can explicitly specify the geography variable via the geography_variable parameter.
If you'd prefer to work with your geospatial data as a string, you can use any of the other table.to_* methods. In these cases, the geography variable will be represented as a WKT-encoded string.
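A sketch, assuming a hypothetical geography variable named geom:

```python
import redivis

# The first geography variable becomes the geometry by default;
# here we pick one explicitly (the variable name is a placeholder)
gdf = redivis.table("_source_").to_geopandas_dataframe(geography_variable="geom")
gdf.plot()  # quick visual check via geopandas
```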
Typically, tabular data is loaded into memory for analysis. This is often the most performant option, but if your data exceeds available memory, you'll need to consider other approaches for working with data at this scale.
Often, the best solution is to limit the amount of data that is coming into your notebook. To do so, you can:
Leverage transforms to first filter / aggregate your data
Select only specific variables from a table by passing the variables=list(str) argument.
Pre-filter data via a SQL query from within your notebook, via the redivis.query() method (see the sketch below).
Pre-process data as it is loaded into your notebook, via the batch_preprocessor argument.
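A minimal sketch of the variable-selection and SQL approaches, with placeholder variable names:

```python
import redivis

# Pull only the columns you need into memory
df = redivis.table("_source_").to_pandas_dataframe(
    variables=["id", "age", "income"]  # placeholder variable names
)

# Or pre-filter with SQL before loading, referencing the source table
# by name as described above
filtered = redivis.query("""
    SELECT id, income
    FROM _source_
    WHERE age >= 65
""").to_pandas_dataframe()
```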
If your data is still pushing memory limits, there are two primary options. You can either store data on disk, or process data as a stream:
Hard disks are often much larger than available memory, and by loading data first to disk, you can significantly increase the amount of data available in the notebook. Moreover, modern columnar data formats support partitioning and predicate pushdown, allowing us to perform highly performant analyses on these disk-backed dataframes.
The general approach for these disk-backed dataframes is to lazily evaluate our computation, only pulling content into memory after all computations have been applied, and ideally after the data has been reduced. The redivis.Table methods to_dask_dataframe(), to_polars_lazyframe(), and to_arrow_dataset() all return a disk-backed dataframe:
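For example, with a polars lazyframe (column names are placeholders):

```python
import polars as pl
import redivis

lf = redivis.table("_source_").to_polars_lazyframe()

# Nothing is pulled into memory until .collect() is called
result = (
    lf.filter(pl.col("age") >= 65)
      .group_by("state")
      .agg(pl.col("income").mean())
      .collect()
)
```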
All three of these libraries also support various forms of batched processing, which allows you to process your data similar to the streaming methodology outlined below. While it will generally be faster to just process the stream directly, it can be helpful to first load a table to disk as you experiment with a streaming approach:
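For instance, an arrow dataset can be scanned in record batches via pyarrow's Dataset.to_batches():

```python
import redivis

dataset = redivis.table("_source_").to_arrow_dataset()

# Process the table in chunks without materializing it all at once
for batch in dataset.to_batches():
    chunk = batch.to_pandas()
    # ... process the chunk ...
```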
By streaming data into your notebook, you can process data in batches of rows, avoiding the need to load more than a small chunk of data into memory at a time. This approach is the most scalable, since it won't be limited by available memory or disk. For this, we can use the Table.to_arrow_batch_iterator() method:
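A sketch of a streaming row count:

```python
import redivis

row_count = 0
for batch in redivis.table("_source_").to_arrow_batch_iterator():
    # Each batch is a pyarrow.RecordBatch holding a small chunk of rows
    row_count += batch.num_rows

print(row_count)
```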
Unstructured data files on Redivis are represented by file index tables, or specifically, tables that contain a file_id variable. If you have file index tables in your workflow, you can analyze the files represented in those tables within your notebook. Similarly to working with tabular data, we can either download all files, or iteratively process them:
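A sketch, assuming the redivis-python Table exposes a bulk file-download helper (method name assumed; consult the redivis-python docs for the exact methods):

```python
import redivis

table = redivis.table("_source_")

# Download every file referenced by this file index table (method name assumed)
table.download_files("/scratch/files")
```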
Redivis notebooks offer the ability to materialize notebook outputs as a new table node in your workflow. This table can then be processed by transforms, read into other notebooks, exported, or even re-imported into a dataset.
To create an output table, use the redivis.current_notebook().create_output_table() method, passing in any of the following as the first argument:
A string file path to any parquet file
Redivis will automatically handle any type inference in generating the output table, mapping your data type to the appropriate Redivis type.
If an output table for the notebook already exists, by default it will be overwritten. You can pass append=True to append to, rather than overwrite, the table. For the append to succeed, all variables in the appended table that are also present in the existing table must have the same type.
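For example (df stands in for your results):

```python
import pandas as pd
import redivis

df = pd.DataFrame({"id": [1, 2], "score": [0.9, 0.7]})

# Overwrite the notebook's output table (the default behavior)...
redivis.current_notebook().create_output_table(df)

# ...or append to the existing output table instead
redivis.current_notebook().create_output_table(df, append=True)
```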
As you perform your analysis, you may generate files that are stored on the notebook's hard disk. There are two locations that you should write files to: /out for persistent storage, and /scratch for temporary storage.
Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. In contrast, any files written to temporary storage will only exist for the duration of the current notebook session.
R notebooks provide a mechanism to interface between the R scientific stack and data on Redivis.
As a general workflow, you'll use the redivis-r library to load data from the table(s) in your workflow, and then leverage R and its ecosystem to perform your analyses. You can optionally create an output table from your notebook, which can then be used like any other table in your workflow.
The specific approaches to working with data in a notebook will be informed in part by the size and types of data that you are working with. Some common approaches are outlined below, and you can consult the full redivis-r docs for comprehensive information.
R notebooks on Redivis are based off the jupyter/r-notebook base image (version r-4.4.1), which contains a variety of common scientific packages for R running on Ubuntu 24.04. The latest version of the redivis-r library is also installed. To view all installed R packages, execute the following:
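For example:

```r
# List all installed packages with their versions
as.data.frame(installed.packages()[, c("Package", "Version")])
```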
To further customize your compute environment, you can specify various dependencies by clicking the Dependencies button at the top-right of your notebook. Here you will see three tabs: Packages, pre_install.sh, and post_install.sh.
Use Packages to specify the R packages that you would like to install. When adding a new package, it will be pinned to the latest version of that package, but you can specify another version if preferred. If a given package and version exists on conda, it will be installed from there; otherwise the package will be installed via R's devtools::install().
For more complex dependency management, you can also specify shell scripts under pre/post_install.sh. These scripts are executed on either side of the package installation, and are used to execute arbitrary code in the shell. Common use cases might include using apt to install system packages (apt-get update && apt-get install -y <package>), using mamba to install from conda, or executing R code to install additional dependencies. To execute R code in the shell, you should run:
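For example, in pre_install.sh or post_install.sh (the package below is just an example):

```sh
# Run arbitrary R code from the shell via Rscript
Rscript -e 'devtools::install_github("r-lib/rlang")'
```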
When loading tabular data into your notebook, you'll typically bring it in as some sort of data frame. Specifically, you can load your data as:
The specific type of data frame is up to your preference, though there may be performance and memory implications that will matter for larger tables.
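For instance, loading the source table into a tibble (the to_tibble method name is assumed here; consult the redivis-r docs for the full set of to_* methods):

```r
library(redivis)

# Load the notebook's source table into an in-memory tibble
df <- redivis$table("_source_")$to_tibble()
head(df)
```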
If your table contains geospatial variable(s), you can take advantage of the sf (simple features) package to utilize GIS functions and visualization. By default, calling Table$to_sf_tibble() on a Redivis table with a variable of the geography type will return an sf tibble, with that variable specified as the corresponding geometry column.
If your table contains more than one geography variable, the first variable will be chosen as the geometry column. You can explicitly specify the geography variable via the geography_variable parameter.
If you'd prefer to work with your geospatial data as a string, you can use any of the other table$to_* methods. In these cases, the geography variable will be represented as a WKT-encoded string.
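A sketch, with a hypothetical geography variable named geom:

```r
library(redivis)

# The first geography variable becomes the geometry column by default;
# here we specify one explicitly (the variable name is a placeholder)
sf_df <- redivis$table("_source_")$to_sf_tibble(geography_variable = "geom")
plot(sf_df["geom"])
```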
Typically, tabular data is loaded into memory for analysis. This is often the most performant option, but if your data exceeds available memory, you'll need to consider other approaches for working with data at this scale.
Often, the best solution is to limit the amount of data that is coming into your notebook. To do so, you can:
Leverage transforms to first filter / aggregate your data
Select only specific variables from a table by passing the variables=list(str) argument.
Pre-filter data via a SQL query from within your notebook, via the redivis$query() method (see the sketch below).
Pre-process data as it is loaded into your notebook, via the batch_preprocessor argument.
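A minimal sketch of the variable-selection and SQL approaches, with placeholder variable names (to_tibble as above):

```r
library(redivis)

# Pull only the columns you need into memory
df <- redivis$table("_source_")$to_tibble(
  variables = list("id", "age", "income")  # placeholder variable names
)

# Or pre-filter with SQL before loading
filtered <- redivis$query("
  SELECT id, income
  FROM _source_
  WHERE age >= 65
")$to_tibble()
```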
If your data is still pushing memory limits, there are two primary options. You can either store data on disk, or process data as a stream:
Hard disks are often much larger than available memory, and by loading data first to disk, you can significantly increase the amount of data available in the notebook. Moreover, modern columnar data formats support partitioning and predicate pushdown, allowing us to perform highly performant analyses on these disk-backed dataframes.
The general approach for these disk-backed dataframes is to lazily evaluate our computation, only pulling content into memory after all computations have been applied, and ideally after the data has been reduced. The to_arrow_dataset() method returns a disk-backed dataset that supports most dplyr methods:
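For example (column names are placeholders):

```r
library(redivis)
library(dplyr)

ds <- redivis$table("_source_")$to_arrow_dataset()

# Nothing is pulled into memory until collect() is called
result <- ds %>%
  filter(age >= 65) %>%
  group_by(state) %>%
  summarize(mean_income = mean(income)) %>%
  collect()
```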
Arrow datasets also support batched processing, which allows you to process your data similar to the streaming methodology outlined below. While it will generally be faster to just process the stream directly, it can be helpful to first load a table to disk as you experiment with a streaming approach:
arrow.RecordBatch documentation >
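One way to scan the dataset in batches with the arrow package:

```r
library(arrow)
library(redivis)

ds <- redivis$table("_source_")$to_arrow_dataset()
reader <- as_record_batch_reader(ds)  # scan the dataset batch by batch

total_rows <- 0
while (!is.null(batch <- reader$read_next_batch())) {
  total_rows <- total_rows + batch$num_rows
}
total_rows
```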
By streaming data into your notebook, you can process data in batches of rows, avoiding the need to load more than a small chunk of data into memory at a time. This approach is the most scalable, since it won't be limited by available memory or disk. To do so, we can use the Table$to_arrow_batch_reader() method:
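A sketch of a streaming row count:

```r
library(redivis)

reader <- redivis$table("_source_")$to_arrow_batch_reader()

total_rows <- 0
while (!is.null(batch <- reader$read_next_batch())) {
  # Each batch is an arrow RecordBatch holding a small chunk of rows
  total_rows <- total_rows + batch$num_rows
}
total_rows
```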
Unstructured data files on Redivis are represented by file index tables, or specifically, tables that contain a file_id variable. If you have file index tables in your workflow, you can analyze the files represented in those tables within your notebook. Similarly to working with tabular data, we can either download all files, or iteratively process them:
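A sketch, assuming the redivis-r Table exposes a bulk file-download helper (method name assumed; consult the redivis-r docs for the exact methods):

```r
library(redivis)

table <- redivis$table("_source_")

# Download every file referenced by this file index table (method name assumed)
table$download_files("/scratch/files")
```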
Redivis notebooks offer the ability to materialize notebook outputs as a new table node in your workflow. This table can then be processed by transforms, read into other notebooks, exported, or even re-imported into a dataset.
To create an output table, use the redivis$current_notebook()$create_output_table() method, passing in any of the following as the first argument:
A string file path to any parquet file
Redivis will automatically handle any type inference in generating the output table, mapping your data type to the appropriate Redivis type.
If an output table for the notebook already exists, by default it will be overwritten. You can pass append=TRUE to append to, rather than overwrite, the table. For the append to succeed, all variables in the appended table that are also present in the existing table must have the same type.
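For example (df stands in for your results):

```r
library(redivis)

df <- data.frame(id = c(1, 2), score = c(0.9, 0.7))

# Overwrite the notebook's output table (the default behavior)...
redivis$current_notebook()$create_output_table(df)

# ...or append to the existing output table instead
redivis$current_notebook()$create_output_table(df, append = TRUE)
```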
As you perform your analysis, you may generate files that are stored on the notebook's hard disk. There are two locations that you should write files to: /out for persistent storage, and /scratch for temporary storage.
Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. In contrast, any files written to temporary storage will only exist for the duration of the current notebook session.
Stata notebooks are available for those researchers who are more comfortable using Stata and its ecosystem. These are built off the same base image as python notebooks, but include the official pystata library to allow for the execution of Stata in a notebook environment.
Working with Stata in a notebook environment is slightly different from the Stata desktop application, in that we need to use Python to pass data into Stata. This step is quite simple and doesn't require any expertise in Python – see working with tabular data below.
Because Stata is proprietary software, you will need to provide a license for Stata 16 or later in order to enable Stata notebooks on Redivis. Organizations can specify license information in their settings, which will make Stata notebooks available to all members of their organization. Alternatively, you can provide your own Stata license in your workspace.
Stata notebooks are based off the Python notebook base image, and can combine both Stata and Python dependencies to create novel workflows.
To further customize your compute environment, you can specify various dependencies by clicking the Dependencies button at the top-right of your notebook. Here you will see three tabs: Packages, pre_install.sh, and post_install.sh.
Use Packages to specify the Python packages that you would like to install. When adding a new package, it will be pinned to the latest version of that package, but you can specify another version if preferred.
In order to install Stata packages via ssc, you should use the pre- and post-install shell scripts. These scripts are executed on either side of the Python package installation, and are used to execute arbitrary code in the shell. Here you can execute Stata code to run ssc install, and you can also use apt to install system packages (apt-get update && apt-get install -y <package>), or mamba to install from conda. For example:
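A sketch of a post_install.sh, assuming the Stata binary is on the PATH as stata and using batch mode via a do-file (the binary name and package are placeholders):

```sh
# post_install.sh: install a Stata package from SSC in batch mode
echo 'ssc install estout, replace' > /tmp/install.do
stata -b do /tmp/install.do
```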
In order to load data into Stata, we first pull it into a data frame in Python, and then pass that variable into Stata. If you're unfamiliar with Python, you can simply copy and paste the code below into the first cell of your notebook to load the data.
View the Table.to_pandas_dataframe() python documentation ->
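For example, in the first (Python) cell:

```python
import redivis

# Load the notebook's source table into a pandas dataframe named df
df = redivis.table("_source_").to_pandas_dataframe()
```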
Next, in a separate cell, we use the %%stata "magic" at the start of our cell to specify that this is Stata code. We include the -d df argument to pass the df variable from Python into Stata, and include the -force flag to tell Stata to overwrite any dataset currently in memory.
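For example:

```
%%stata -d df -force
describe
summarize
```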
Any subsequent cells that execute Stata code should be prefixed by %%stata if they are more than one line, or by %stata if the code to be executed is all on one line:
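For example, with a hypothetical variable wage:

```
%%stata
generate logwage = log(wage)
summarize logwage
```

A single line can equivalently be run as %stata summarize wage.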
You can also use the %%mata command to execute Mata code:
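For example:

```
%%mata
x = J(2, 2, 1)   // a 2x2 matrix of ones
x
```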
Through various packages, Stata offers some support for geospatial datatypes. However, we can't pass geospatial data from Python natively; instead, we need to first create a shapefile that can then be loaded into Stata.
View the Table.to_geopandas_dataframe() python documentation ->
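A sketch of the Python side, writing a shapefile to scratch storage (paths and names are placeholders); on the Stata side, a command such as spshape2dta can then convert the shapefile:

```python
import redivis

# Write the geospatial table out as an ESRI shapefile
gdf = redivis.table("_source_").to_geopandas_dataframe()
gdf.to_file("/scratch/my_shapes.shp")
```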
If your data is too big to fit into memory, you may need to first download the data as a CSV, and then read that file into Stata:
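A sketch of the download step (the download method and its signature are assumed; consult the redivis-python docs):

```python
import redivis

# Export the table to disk as a CSV (method and signature assumed)
redivis.table("_source_").download("/scratch/data.csv")
```

In a subsequent Stata cell, %stata import delimited using /scratch/data.csv, clear will read the file.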
Redivis notebooks offer the ability to materialize notebook outputs as a new table node in your workflow. This table can then be processed by transforms, read into other notebooks, exported, or even re-imported into a dataset.
To create an output table, we first need to pass our Stata data back to Python, using the -dout flag. We can then use the redivis.current_notebook().create_output_table() method in Python to output our data.
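For example, assuming -dout names the Python variable that receives the data:

```
%%stata -dout results
* Reduce to the variables we want to export (placeholder names)
keep id wage
```

Then, in a Python cell, run redivis.current_notebook().create_output_table(results).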
If an output table for the notebook already exists, by default it will be overwritten. You can pass append=True to append to, rather than overwrite, the table. For the append to succeed, all variables in the appended table that are also present in the existing table must have the same type.
As you perform your analysis, you may generate files and figures that are stored on the notebook's hard disk. There are two locations that you should write files to: /out for persistent storage, and /scratch for temporary storage. By default, the output location is set to /scratch.
Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. In contrast, any files written to temporary storage will only exist for the duration of the current notebook session.
SAS notebooks are available for those researchers who are more comfortable using SAS and its ecosystem. These are built off the same base image as python notebooks, but include the official SASPy library to allow for the execution of SAS in a notebook environment.
Working with SAS in a notebook environment is slightly different from the SAS desktop application, in that we need to use Python to interchange data with SAS. This step is quite simple and doesn't require any expertise in Python – see working with tabular data below.
Because SAS is proprietary software, you will need to have a licensed version of SAS 9.4 in order to enable SAS notebooks on Redivis. Organizations can specify license information in their settings, which will make SAS notebooks available to all members of their organization. Alternatively, you can provide your own SAS license in your workspace.
SAS notebooks are based off the Python notebook base image, and can combine both SAS and Python dependencies to create novel workflows.
To further customize your compute environment, you can specify various dependencies by clicking the Dependencies button at the top-right of your notebook. Here you will see three tabs: Packages, pre_install.sh, and post_install.sh.
Use Packages to specify the Python packages that you would like to install. When adding a new package, it will be pinned to the latest version of that package, but you can specify another version if preferred.
To manage system dependencies, and for more complicated workflows, you can use the pre- and post-install shell scripts. These scripts are executed on either side of the Python package installation, and are used to execute arbitrary code in the shell. For example, you can use apt to install system packages (apt-get update && apt-get install -y <package>), or mamba to install from conda.
In order to load data into SAS, we first pull it into a data frame in Python, and then pass that variable into SAS. If you're unfamiliar with Python, you can simply copy and paste the code below into the first cell of your notebook to load the data.
View the Table.to_pandas_dataframe() python documentation ->
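A sketch of the Python cell, using saspy's df2sd to copy the dataframe into the SAS session (the sas_session variable matches the %%SAS magic used below):

```python
import redivis
import saspy

# Load the source table into pandas, then copy it into SAS as work.df
df = redivis.table("_source_").to_pandas_dataframe()
sas_session = saspy.SASsession()
sas_session.df2sd(df, table="df")
```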
Now that we have the dataset "df" in SAS, we can run SAS code against the data. To do so, we must prefix any SAS cell with the line %%SAS sas_session:
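For example:

```
%%SAS sas_session
proc means data=work.df;
run;
```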
SAS offers some support for geospatial datatypes. However, we can't pass geospatial data from Python natively; instead, we need to first create a shapefile that can then be loaded into SAS.
View the Table.to_geopandas_dataframe() python documentation ->
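A sketch of the Python side (paths and names are placeholders):

```python
import redivis

# Write the geospatial table out as an ESRI shapefile
gdf = redivis.table("_source_").to_geopandas_dataframe()
gdf.to_file("/scratch/my_shapes.shp")
```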
Next, we can load this shapefile via SAS:
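For instance, with PROC MAPIMPORT (part of SAS/GRAPH; the paths are placeholders):

```
%%SAS sas_session
proc mapimport datafile="/scratch/my_shapes.shp" out=work.shapes;
run;
```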
If your data is too big to fit into memory, you may need to first download the data as a CSV, and then read that file into SAS:
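A sketch, assuming the table has been downloaded to /scratch/data.csv as in the Stata example above; PROC IMPORT then reads the CSV:

```
%%SAS sas_session
proc import datafile="/scratch/data.csv" out=work.big
    dbms=csv replace;
run;
```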
Redivis notebooks offer the ability to materialize notebook outputs as a new table node in your workflow. This table can then be processed by transforms, read into other notebooks, exported, or even re-imported into a dataset.
To create an output table, we first need to pass our SAS data back to Python. We can then use the redivis.current_notebook().create_output_table() method in Python to output our data.
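A sketch using saspy's sd2df to pull the SAS dataset back into pandas (sas_session as above):

```python
import redivis

# Pull work.df back into a pandas dataframe, then materialize it as the output table
df = sas_session.sd2df("df")
redivis.current_notebook().create_output_table(df)
```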
If an output table for the notebook already exists, by default it will be overwritten. You can pass append=True to append to, rather than overwrite, the table. For the append to succeed, all variables in the appended table that are also present in the existing table must have the same type.
As you perform your analysis, you may generate files and figures that are stored on the notebook's hard disk. There are two locations that you should write files to: /out for persistent storage, and /scratch for temporary storage.
Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. In contrast, any files written to temporary storage will only exist for the duration of the current notebook session.
Redivis notebooks are built on top of Jupyter notebooks and the .ipynb notebook format. To execute code in a notebook you'll need to use the Jupyter interface. Here are some basics for getting started.
Cells are the basic building block of a notebook. You can create Code cells or Markdown cells depending on what you'd like the contents to be. Cells are completely independent and can be collapsed or reordered.
You can write and execute code in a code cell. The notebook kernel you selected when you created the notebook will determine what kind of code you can execute (Python, R, Stata, or SAS).
When you run a code cell the results are displayed below the cell as the cell’s output. The output might be text, figures, or HTML tables.
You can document your processes and annotate your work using markdown cells. These cells support markdown syntax such as headers (#), lists (-), and more. See a full breakdown of markdown options.
When building an analysis you will probably work on it in discrete pieces, organizing related ideas into cells and moving forward once previous parts work correctly. Each of these pieces of analysis can be computed and annotated in a different cell.
You can run a cell by typing Shift-Enter, by clicking the Play button in the toolbar, or by clicking the Run menu item. This will execute the current cell, show any output, and jump to the next cell below.
Cells can be run in any order and updated independently. The number to the left of the cell will show the place in the sequence that the cell was executed in.
Cells can be reordered by typing control shift ↑ or control shift ↓. While cells can be reordered, we suggest you keep cells in sequential order to maintain reproducibility.
All Redivis notebooks start with text at the top in a markdown cell to help you get started. This and any other cell can be collapsed by clicking the blue bar to the left of the cell, which is useful for information you don't need at the moment but want to save for later. Cells keep their collapsed or expanded state when you stop the notebook, so the next viewer or editor will find them exactly as you left them.
You can press the tab key to autocomplete functions in your chosen coding language. You can also hover over variable names to get more information about them.
There are many guides about using the Jupyter/notebook format. If you want to dive deeper into conceptual usages, keyboard shortcuts, and the Jupyter interface you can start with the official Jupyter documentation.