
Analyze data in notebooks

Overview

Notebooks allow you to analyze and visualize data in projects without needing to configure an environment on a local machine or server, or export data from Redivis. This makes for easy iteration on your data workflow, and seamless collaboration.

1. Create a notebook

Once you've created a project and added a dataset you'd like to work with, we strongly recommend doing any data cleaning or reshaping in transforms before creating a notebook. Transforms are designed for large datasets and iterative data cleaning; using them first will greatly improve the notebook experience.
Learn more in the Reshape tables in transform guide.
Once you have a table you're ready to analyze, you can create a notebook by clicking the + Notebook button at any time. You'll need to name it and choose a language. (You can only use Stata and SAS if you or your organization has a license configured.) You can also import a notebook file to build on something already in progress.

Python

Python notebooks are based on the jupyter/scipy-notebook image, which contains a variety of common scientific packages for Python. The redivis-python library is also pre-installed.

R

R notebooks are based on the jupyter/r-notebook image, which contains tidyverse packages and other common scientific packages for R. The redivis-r library is also pre-installed.
Notebooks can only reference tables within their project, so we recommend keeping all related work together in the same project.

2. Define dependencies

All notebooks come with a number of common packages pre-installed, depending on the notebook type. But if there is something specific you'd like to include, you can do so in a dependencies startup script.
You can configure these dependencies by clicking the Dependencies button in the toolbar.
Learn more in the Notebooks reference section.
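Before adding a package to the dependencies script, it can help to check whether it is already importable in the running notebook. The following is a minimal sketch using only the Python standard library; the helper name `missing_packages` and the module names in the list are illustrative assumptions, not a statement of what any notebook image contains.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in module_names if find_spec(name) is None]


# Hypothetical list of top-level modules an analysis might need
wanted = ["json", "statistics", "a_package_that_is_not_installed"]
print(missing_packages(wanted))
```

Anything the helper reports as missing is a candidate for the dependencies script.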

3. Start the notebook

Notebook nodes need to be started in order to edit or execute cells. When first clicking on a notebook node, you will see a read-only view of its contents (including cell outputs). Click the Start notebook button in the toolbar to connect this notebook to compute resources.
Notebooks are currently deployed with access to 2 CPUs and 8 GB of working memory, alongside a 50 GB hard disk. These values may fluctuate based on system utilization.
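Since these values can vary, you may want to check what a running Python notebook actually sees. This is a rough sketch using only the standard library; `describe_resources` is a hypothetical helper, and the numbers the operating system reports (particularly the CPU count inside a container) may not match the allocation exactly.

```python
import os
import shutil


def describe_resources(path="."):
    """Return the CPU count and disk capacity visible to this process."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gib = 1024 ** 3
    return {
        "cpus": os.cpu_count(),
        "disk_total_gib": round(usage.total / gib, 1),
        "disk_free_gib": round(usage.free / gib, 1),
    }


print(describe_resources())
```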

4. Referencing data

To do meaningful work in your notebook, you'll want to bring in a table or associated files.

Referencing tables

If you created this notebook by clicking the + Notebook button with a table selected, you will see that the table is already referenced at the top of the notebook, and starting this notebook executes the script to pull it in and show a preview of the records.
You can reference any tables in this project by replicating this script and executing it. As a rule of thumb, notebooks will easily support interactive analysis of tables up to ~1GB; if your table is larger, try reducing it first by creating a transform.

Python

import redivis

# The max_results and variables parameters are optional
df = redivis.table("my-table").to_dataframe(max_results=100, variables=["var1", "var2"])

R

# The max_results and variables parameters are optional
data <- redivis::table('my-table')$to_tibble(max_results=100, variables=c("name","city","state"))

Referencing files

Any files stored in Redivis datasets can be referenced by their globally unique file_id. You can also reference these file_ids in any derivative tables, allowing you to query and download specific subsets of files.
When working with large files, you'll want to consider saving the files to disk and/or working with the streaming interfaces to reduce memory overhead and improve performance.

Python

from io import TextIOWrapper

import redivis

file = redivis.file("rnwk-acs3famee.pVr4Gzq54L3S9pblMZTs5Q")

# Download the file to disk, then open the local copy
download_location = file.download("./my-downloads")
with open(download_location, "r") as f:
    print(f.readline())  # read first line of the downloaded copy

# Read the entire file into a variable
file_content = file.read(as_text=True)
print(file_content)

# Stream the file as bytes or text
with file.stream() as f:
    f.read(100)  # read 100 bytes

with TextIOWrapper(file.stream()) as f:
    f.readline()  # read first line

R

file <- redivis::file("s335-8ey8zt7bx.qKmzpdttY2ZcaLB0wbRB7A")
# Download a file
file$download("/path/to/dir/", overwrite=TRUE)
# Read a file
data <- file$read(as_text=TRUE)
# Stream a file (callback gets called with each chunk)
data <- file$stream(function(x) {
  print(length(x))
})
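To illustrate why streaming reduces memory overhead, here is a minimal Python sketch that processes a binary stream in fixed-size chunks, so only one chunk is held in memory at a time. The helper `count_lines` is hypothetical, and the `io.BytesIO` object is a stand-in for a streamed file; it assumes the object returned by `file.stream()` in the Python example above behaves like a standard binary file object.

```python
import io


def count_lines(stream, chunk_size=1024 * 1024):
    """Count newline characters in a binary stream, one chunk at a time."""
    total = 0
    # iter() with a sentinel keeps calling stream.read until it returns b""
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        total += chunk.count(b"\n")
    return total


# Stand-in for a streamed file (hypothetical content)
fake_stream = io.BytesIO(b"id,value\n1,10\n2,20\n")
print(count_lines(fake_stream, chunk_size=8))  # prints 3
```

The same pattern applies to any per-chunk work, such as hashing or writing to disk; a larger `chunk_size` trades memory for fewer reads.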

5. Analyze data

At this point, you have all the tools you need to work with your data in your chosen language. You can use the respective Redivis libraries to run analyses and build visualizations to serve your research.
Redivis notebooks are built on Jupyter notebooks and have similar capabilities. If you need to change your data-cleaning pipeline, you can make the changes in transforms and propagate them directly to your notebook.
When your analysis is done, you can export a read-only copy of your notebook to .ipynb, PDF, or HTML format.
Learn more in the Notebooks reference section.
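An exported .ipynb file is a JSON document, so it can be inspected with ordinary tooling outside of Redivis. As a small sketch, the hypothetical helper below lists the source of every code cell; the hand-written notebook is a stand-in for a real export and assumes the standard Jupyter notebook format (version 4) with a top-level `cells` list.

```python
import json
import tempfile
from pathlib import Path


def code_cell_sources(ipynb_path):
    """Return the source text of every code cell in a .ipynb file."""
    notebook = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The notebook format allows source to be a string or a list of lines
            sources.append(source if isinstance(source, str) else "".join(source))
    return sources


# Tiny hand-written notebook standing in for an exported file (hypothetical content)
example = {
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.ipynb"
    path.write_text(json.dumps(example), encoding="utf-8")
    print(code_cell_sources(path))
```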

Next steps

Share and collaborate

All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them (much like a Google Doc).
Share your project to work with collaborators in real time, and make it public so that others can fork and build upon your work.

Cite datasets in your publications

If your work leads to a publication, consult the dataset page for each dataset you've used; data administrators provide information there on how to cite it correctly.