Work with data in notebooks
Last updated
Last updated
Redivis notebooks a performant, flexible environment for analysis that allow you to analyze and visualize data in projects in Python, R, Stata, or SAS. With the notebook computation happening on Redivis, you don't need to configure an environment on a local machine or server, or export data from Redivis. This makes for easy iteration and collaboration, not to mention ensuring better security and data throughput.
Before working with a notebook you'll want to get started first by creating a project and adding data. You can then create a notebook off of any table in your project.
If you are working with very large tables (>10GB is a good rule of thumb), it's always a good idea to first reshape and reduce the data via transforms, since they can be significantly more performant for large data operations than running code in Python, R, Stata, or SAS.
Once you have a table that you're ready to analyze, you can create a notebook by clicking the + Notebook button at any time. You'll need to name it and choose a kernel (Python, R, Stata, or SAS).
Notebooks can only reference tables within their project, so we recommend keeping all related work together in the same project.
Python notebooks come pre-installed with a variety of common scientific packages for python. Learn more about working with python notebooks.
R notebooks come pre-installed with a variety of common scientific packages for R. Learn more about working with R notebooks.
Stata notebooks are based off of python notebooks, but offer affordances for moving data between Python and Stata. Learn more about working with Stata notebooks.
SAS notebooks are based off of python notebooks, but offer affordances for moving data between Python and SAS. Learn more about working with SAS notebooks.
All notebooks come with a number of common packages pre-installed, depending on the notebook type. But if there is something specific you'd like to include, you can add versioned packages or write a pre-/post- install script by clicking the Edit dependencies button in the start modal or the toolbar.
Learn more in the Notebooks reference section.
The default notebook configuration is free, and provides access to 2 CPUs and 32GB working memory, alongside a 60GB (SSD) disk and gigabit network. The computational powerful of these default notebooks are comparable to most personal computers, and will be more than enough for many analyses.
If you're working with larger tables, creating an ML model, or performing other particularly intensive tasks, you may choose to configure additional compute resources for the notebook. This will cost an hourly rate to run based on your chosen environment, and require you to purchase compute credits on your account.
Clicking Edit compute configuration button in the start modal or the toolbar will allow you to choose from different preconfigured machine types. The notebook will then default to this compute configuration each time it starts up.
Learn more in the Compute resources reference section.
Notebook nodes need to be started in order to edit or execute cells. When first clicking on a notebook node, you will see a read-only view of its contents (including cell outputs). Click the Start notebook button in the toolbar to connect this notebook to compute resources.
When you create a notebook for the first time it will start automatically.
To do meaningful work in your notebook, you'll want to bring in the tabular and/or unstructured data that exists in your project into your notebook.
Notebooks come pre-populated with templated code that pulls in data from the notebook's source table. You will need to run this cell to pull the data into the notebook, and you can see that it worked because this code will print a preview of the loaded data.
You can reference any other tables in this project by replicating this script and executing it with a different table reference. As a rule of thumb, notebooks will easily support interactive analysis of tables up to ~1GB; if your table is larger, try reducing it first by creating a transform, or make sure to familiarize yourself with the tools for working with larger tables in the notebook's programming language.
Any files with unstructured data stored in Redivis tables can be referenced by their globally unique file_id
. You can also reference these file_id's in any derivative tables, allowing you to query and download specific subsets of files.
When working with large files, you'll want to consider saving the files to disk and/or working with the streaming interfaces to reduce memory overhead and improve performance.
At this point, you have all the tools you need to work with your data in your chosen language. The Python, R, Stata, and SAS ecosystems contain myriad tools and libraries for performing sophisticated data analysis and visualization.
The notebook interface is based off of Jupyter notebooks, and has similar capabilities. You can also export a read-only copy of your notebook as an .ipynb, PDF, or HTML file.
Learn more in the Notebooks reference section.
Notebooks can produce an output table, which you can sanity check and further analyze in your project by including in other notebooks or exporting to other systems.
All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them (much like a Google Doc).
Share your project to work with collaborators in real time, and make it public so that others can fork off of and build upon your work.
If the work you're doing leads to a publication, make sure to reference the dataset pages from datasets you've used for information from the data administrators on how to correctly cite it.