Analyze data in notebooks

Overview

Notebooks allow you to analyze and visualize data in projects without configuring an environment on a local machine or server, and without exporting data from Redivis. This makes it easy to iterate on your data workflow and to collaborate with others.

1. Create a notebook

Once you've created a project and added a dataset you'd like to work with, we strongly recommend doing any data cleaning or reshaping in transforms before creating a notebook. Transforms are designed for large datasets and iterative data cleaning, so using them first will greatly improve the notebook experience.
Learn more in the Reshape tables in transform guide.
Once you have a table you're ready to analyze, you can create a notebook by clicking the + Notebook button at any time. You'll need to name it and choose a language. (You can only use Stata and SAS if you or your organization has a license configured.) You can also import a notebook file to build on something already in progress.
Notebooks can only reference tables within their project, so we recommend keeping all related work together in the same project.

2. Define dependencies

All notebooks come with a number of common packages pre-installed, depending on the notebook type. If you need something that isn't included, you can add it in a dependencies startup script.
You can configure these dependencies by clicking the Dependencies button in the toolbar.
Learn more in the Notebooks reference section.
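Before adding a package to the dependencies script, it can help to check whether it is already pre-installed. The following is a minimal sketch for a Python notebook using only the standard library; the helper name installed_version and the package names in the loop are illustrative assumptions, not part of Redivis.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical package names; replace them with the ones your analysis needs
for name in ["pandas", "numpy", "some-package-you-need"]:
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

Anything reported as not installed is a candidate for the dependencies script.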

3. Start the notebook

Notebook nodes need to be started in order to edit or execute cells. When first clicking on a notebook node, you will see a read-only view of its contents (including cell outputs). Click the Start notebook button in the toolbar to connect this notebook to compute resources.
Notebooks are currently deployed with access to 2 CPUs, 8 GB of working memory, and a 50 GB hard disk. These values may fluctuate based on system utilization.
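Because these values may fluctuate, you may want to check what a running notebook actually sees. The sketch below uses only the Python standard library; the function name describe_resources is an assumption for illustration. Note that inside a container these calls can report the host's totals rather than the limits applied to your notebook, so treat the output as a rough guide.

```python
import os
import shutil

GIB = 1024 ** 3


def describe_resources(path="/"):
    """Report the CPU count, physical memory, and disk space visible to this process."""
    try:
        # os.sysconf with these names is available on Linux; other platforms may lack it
        memory_gb = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / GIB
    except (AttributeError, ValueError, OSError):
        memory_gb = None
    disk = shutil.disk_usage(path)
    return {
        "cpus": os.cpu_count(),
        "memory_gb": memory_gb,
        "disk_total_gb": disk.total / GIB,
        "disk_free_gb": disk.free / GIB,
    }


resources = describe_resources()
for key, value in resources.items():
    print(f"{key}: {value}")
```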

4. Import data

To do meaningful work in the notebook you'll need to bring in a table.
If you created this notebook by clicking the + Notebook button with a table selected, you will see that the table is already referenced at the top of the notebook, and starting this notebook executes the script to pull it in and show a preview of the records.
You can reference any tables in this project by replicating this script and executing it. As a rule of thumb, notebooks will easily support interactive analysis of tables up to ~1GB; if your table is larger, try reducing it first by creating a transform.

Python

Python notebooks are based on the jupyter/scipy-notebook image, which contains a variety of common scientific packages for Python. The redivis-python library is also pre-installed.
import redivis

# max_results and variables are optional parameters
df = redivis.table("my-table").to_dataframe(max_results=100, variables=["var1", "var2"])
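Once a table is loaded, you can compare its in-memory size with the ~1GB rule of thumb mentioned above. This is a minimal sketch that assumes pandas is available (it ships with the scipy-notebook image). The DataFrame here is a small hypothetical stand-in so the example runs on its own; in a notebook you would use the df returned by the Redivis call instead. The helper name dataframe_size_mb is also an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from a Redivis table
df = pd.DataFrame({"var1": range(1000), "var2": ["a"] * 1000})


def dataframe_size_mb(frame):
    """Return the in-memory size of a DataFrame in megabytes, including string contents."""
    return frame.memory_usage(deep=True).sum() / 1024 ** 2


size_mb = dataframe_size_mb(df)
print(f"{len(df)} rows, {size_mb:.2f} MB in memory")

# Preview the first few records
print(df.head())
```

If the reported size approaches a gigabyte, reducing the table in a transform first is likely to give a smoother interactive experience.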

R

R notebooks are based on the jupyter/r-notebook image, which contains tidyverse packages and other common scientific packages for R. The redivis-r library is also pre-installed.
# max_results and variables are optional parameters
data <- redivis::table('my-table')$to_tibble(max_results=100, variables=c("name","city","state"))

5. Analyze data

At this point, you have all the tools you need to work with your data in your chosen language. You can use the respective Redivis libraries to run analyses and build visualizations to serve your research.
Notebooks are built on Jupyter notebooks and have similar capabilities. If you need to change your upstream data pipeline, you can make the changes in transforms and propagate them directly to your notebook.
When your analysis is done, you can export a read-only copy of your notebook to .ipynb, PDF, or HTML format.
Learn more in the Notebooks reference section.
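An exported .ipynb file is a JSON document, so it can be inspected with ordinary tools. The sketch below assumes the export follows the standard Jupyter notebook layout (a top-level "cells" list whose entries have a "cell_type"). The inline JSON is a tiny hypothetical notebook used so the example runs on its own, and summarize_notebook is an illustrative name.

```python
import json
from collections import Counter

# Hypothetical stand-in for the contents of an exported .ipynb file.
# For a real export you would load it with: json.load(open("your-notebook.ipynb"))
notebook_json = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["Load the table"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [], "source": ["import redivis"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 2, "outputs": [], "source": ["df.head()"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""


def summarize_notebook(nb):
    """Count the cells of each type in a parsed notebook dictionary."""
    return dict(Counter(cell["cell_type"] for cell in nb.get("cells", [])))


nb = json.loads(notebook_json)
summary = summarize_notebook(nb)
print(summary)
```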

Next steps

Share and collaborate

All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them (much like a Google Doc).
Share your project to work with collaborators in real time, and make it public so that others can fork it and build upon your work.

Cite datasets in your publications

If your work leads to a publication, check the dataset page of each dataset you've used. There, the data administrators provide information on how to cite the dataset correctly.