Redivis Documentation
Work with data in notebooks

Last updated 4 months ago

Overview

Redivis notebooks provide a performant, flexible environment for analyzing and visualizing data in workflows using Python, R, Stata, or SAS. Because notebook computation happens on Redivis, you don't need to configure an environment on a local machine or server, or export data from Redivis. This makes iteration and collaboration easy, and improves security and data throughput.

Before working with a notebook, you'll want to get started by creating a workflow and adding data. You can then create a notebook from any table in your workflow.

If you are working with very large tables (>10GB is a good rule of thumb), it's a good idea to first reshape and reduce the data via transforms, since they can be significantly more performant for large data operations than running code in Python, R, Stata, or SAS.
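The size guidance on this page can be summarized in a few lines of Python. This is only a sketch of the stated rules of thumb (roughly 10GB as the point where transforms are recommended, and the ~1GB figure given further down for comfortable interactive analysis). The function name `suggest_approach` and the wording of the returned strings are illustrative and not part of the Redivis API.

```python
GB = 10**9  # decimal gigabytes; the thresholds on this page are approximate


def suggest_approach(table_bytes):
    """Map a table size in bytes to the rule of thumb described on this page."""
    if table_bytes > 10 * GB:
        return "reduce with a transform first"
    if table_bytes > 1 * GB:
        return "consider a transform, or use tools suited to larger tables"
    return "load directly into the notebook"


# Hypothetical table sizes, for illustration only
for size in (200 * 10**6, 4 * GB, 50 * GB):
    print(f"{size / GB:>6.1f} GB -> {suggest_approach(size)}")
```

These thresholds are guidelines rather than hard limits; the right cutoff also depends on the compute configuration chosen for the notebook.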

1. Create a notebook

Once you have a table that you're ready to analyze, you can create a notebook by clicking the + Notebook button at any time. You'll need to name it and choose a kernel (Python, R, Stata, or SAS).

Notebooks can only reference tables within their workflow, so we recommend keeping all related work together in the same workflow.

2. Define dependencies

3. Compute resources

The default notebook configuration is free, and provides access to 2 CPUs and 32GB of working memory, alongside a 60GB (SSD) disk and gigabit network. The computational power of these default notebooks is comparable to that of most personal computers, and will be more than enough for many analyses.

Clicking the Edit compute configuration button in the start modal or the toolbar allows you to choose from different preconfigured machine types. The notebook will then default to this compute configuration each time it starts up.
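To confirm what a running notebook actually has available, the following sketch reports the CPU count and disk capacity using only the Python standard library. It assumes a Python-based kernel. The function name `describe_resources` and the default path `"."` are illustrative choices rather than anything provided by Redivis, and the numbers reported will depend on the compute configuration you selected. Memory is left out because the standard library has no portable call for it.

```python
import os
import shutil


def describe_resources(path="."):
    """Return the CPU count and disk capacity visible to this process.

    `path` is any directory on the disk you want to inspect
    (assumed here to be the current working directory).
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return {
        "cpus": os.cpu_count(),
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_free_gb": round(usage.free / 1e9, 1),
    }


for name, value in describe_resources().items():
    print(f"{name}: {value}")
```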

4. Start the notebook

Notebook nodes need to be started in order to edit or execute cells. When first clicking on a notebook node, you will see a read-only view of its contents (including cell outputs). Click the Start notebook button in the toolbar to connect this notebook to compute resources.

When you create a notebook for the first time, it will start automatically.

5. Load data

To do meaningful work in your notebook, you'll want to bring in the tabular and/or unstructured data that exists in your workflow.

Referencing tables

Notebooks come pre-populated with templated code that pulls in data from the notebook's source table. Run this cell to load the data into the notebook; the code prints a preview of the loaded data, so you can confirm that it worked.

Python

import redivis

# The source table of this notebook can always be referenced as "_source_"
table = redivis.table("_source_")

# Load table as a pandas dataframe.
# Consult the documentation for more load options.
df = table.to_pandas_dataframe()

# We can also reference any other table in this workflow by name.
df2 = redivis.table("my_other_table").to_pandas_dataframe()

print(df)
print(df2)

R

# The source table of this notebook can always be referenced as "_source_"
redivis_table <- redivis$table("_source_")

# Load table as a tidyverse tibble.
# Consult the documentation for more load options.
df <- redivis_table$to_tibble()

# We can also reference any other table in this workflow by name.
df2 <- redivis$table("my_other_table")$to_tibble()

print(df)
print(df2)

Stata

# In order to load data into Stata, we first have to bring it into Python.
# This code loads the "_source_" table into the Python variable `df`.
# We can then pass this variable as our Stata dataset.

import redivis

# The source table of this notebook can always be referenced as "_source_"
# Reference any other table in this workflow by name.
table = redivis.table("_source_")

df = table.to_pandas_dataframe(dtype_backend="numpy")

# --- next cell (the %%stata magic must be the first line of its cell) ---
%%stata -d df -force
/*
# Use the %%stata magic to load our dataframe, specified by the -d parameter
# The -force flag replaces the current working dataset in Stata

# The rest is just Stata code!
*/

describe

SAS

import redivis
import saspy
sas = saspy.SASsession(results='HTML')

# We first load the table via Python, and then pass the dataframe into SAS
df = redivis.table("_source_").to_pandas_dataframe(dtype_backend="numpy")

sas_data = sas.df2sd(df, '_df') # second argument is the name of the table in SAS

# 'msrp' and 'horsepower' are example variable names; use variables from your own table
sas_data.heatmap('msrp', 'horsepower')

Referencing files

Any files with unstructured data stored in Redivis tables can be referenced by their globally unique file_id. You can also reference these file_ids in any derivative tables, allowing you to query and download specific subsets of files.

When working with large files, you'll want to consider saving the files to disk and/or working with the streaming interfaces to reduce memory overhead and improve performance.

Python

import redivis
from io import TextIOWrapper

redivis_file = redivis.file("rnwk-acs3famee.pVr4Gzq54L3S9pblMZTs5Q")

# Download the file, then open the downloaded copy
download_location = redivis_file.download("./my-downloads")
with open(download_location, "r") as f:
  first_line = f.readline()

# Read the file into a variable
file_content = redivis_file.read(as_text=True)
print(file_content)

# Stream the file as bytes or text
with redivis_file.stream() as f:
  f.read(100) # read 100 bytes

with TextIOWrapper(redivis_file.stream()) as f:
  f.readline() # read first line

# We can also iterate over all files in a table
for redivis_file in redivis.table("_source_").list_files():
  pass # Do stuff with file

R

redivis_file <- redivis$file("s335-8ey8zt7bx.qKmzpdttY2ZcaLB0wbRB7A")

# Download a file
redivis_file$download("/path/to/dir/", overwrite=TRUE)

# Read a file
data <- redivis_file$read(as_text=TRUE)

# Stream a file (callback gets called with each chunk)
data <- redivis_file$stream(function(x) {
  print(length(x))
})

# We can also iterate over all files in a table
for (redivis_file in redivis$table("_source_")$list_files()){
  # Do stuff with file
}
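
The advice on streaming large files can be illustrated with a short sketch. It reads a binary stream in fixed-size chunks and computes a SHA-256 checksum, so only one chunk is held in memory at a time. Here an `io.BytesIO` object stands in for the real stream; the assumption, taken from the Python example above, is that `redivis_file.stream()` returns a file-like object with a `read(n)` method. The names `sha256_of_stream` and `chunk_size` are illustrative.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary file-like object chunk by chunk.

    Returns a (hex digest, total bytes read) tuple.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes signals the end of the stream
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


# Stand-in for a large file. In a notebook this would be the object
# returned by redivis_file.stream() (assumed to be file-like).
fake_stream = io.BytesIO(b"x" * 2_500_000)
checksum, size = sha256_of_stream(fake_stream)
print(f"read {size} bytes, sha256 = {checksum}")
```

The same loop works for any per-chunk operation, such as writing each chunk to disk or counting records, which keeps memory use bounded by `chunk_size` regardless of file size.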

6. Analyze data

At this point, you have all the tools you need to work with your data in your chosen language. The Python, R, Stata, and SAS ecosystems contain myriad tools and libraries for performing sophisticated data analysis and visualization.

7. Create an output table

Notebooks can produce an output table, which you can sanity check and further analyze in your workflow by including it in other notebooks or exporting it to other systems.

Python

import redivis

# Read table into a pandas dataframe
df = redivis.table('_source_').to_pandas_dataframe()

# Perform various data manipulation actions
# (some_processing_fn is a placeholder for your own function)
df2 = df.apply(some_processing_fn)

# Create an output table with the contents of this dataframe
redivis.current_notebook().create_output_table(df2)

R

# Read table into a tibble
tbl <- redivis$table('_source_')$to_tibble()

# Perform various data manipulation actions
# (replace ... with your own column expressions)
tbl2 <- tbl %>% mutate(...)

# Create an output table with the contents of this tibble
redivis$current_notebook()$create_output_table(tbl2)

Stata

%%stata -doutd df2
/*
  Once this cell executes, the current dataset will be pushed
  to the Python variable df2
*/
rename v* newv*

# --- next cell ---
# Via Python, pass this dataframe to the output table
redivis.current_notebook().create_output_table(df2)

SAS

# Convert a SAS table to a pandas dataframe
df = sas_table.to_df()

# Create an output table with the contents of this dataframe
redivis.current_notebook().create_output_table(df)

Next steps

Share and collaborate

All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them (much like a Google Doc).

Share your workflow to work with collaborators in real time, and make it public so that others can fork and build upon your work.

Cite datasets in your publications

If the work you're doing leads to a publication, consult the dataset pages of the datasets you've used for information from the data administrators on how to correctly cite them.

Notes on the steps above

Create a notebook: Python notebooks come pre-installed with a variety of common scientific packages for Python. R notebooks come pre-installed with a variety of common scientific packages for R. Stata and SAS notebooks are based on Python notebooks, but offer affordances for moving data between Python and Stata or SAS, respectively. Learn more about working with Python, R, Stata, and SAS notebooks in the reference section.

Define dependencies: All notebooks come with a number of common packages pre-installed, depending on the notebook type. If there is something specific you'd like to include, you can add versioned packages or write a pre-/post-install script by clicking the Edit dependencies button in the start modal or the toolbar. Learn more in the Notebooks reference section.

Compute resources: If you're working with larger tables, creating an ML model, or performing other particularly intensive tasks, you may choose to configure additional compute resources for the notebook. These are billed at an hourly rate based on your chosen environment, and require you to purchase compute credits on your account. Learn more in the Compute resources reference section.

Referencing tables: You can reference any other table in this workflow by replicating the load script and executing it with a different table reference. As a rule of thumb, notebooks will easily support interactive analysis of tables up to ~1GB; if your table is larger, try reducing it first by creating a transform, or make sure to familiarize yourself with the tools for working with larger tables in the notebook's programming language. See more examples in the Python, R, Stata, and SAS notebooks references.

Notebook interface: The notebook interface is based on Jupyter notebooks, and has similar capabilities. You can also export a read-only copy of your notebook as an .ipynb, PDF, or HTML file. Learn more in the Notebooks reference section.