Redivis Documentation

Variables

Last updated 6 months ago

Overview

Variables are present in all tables on Redivis. Conceptually, a variable represents something that is being measured in the data. In the table cells view, variables appear as the table's columns.

In order to view variables, you must have metadata access to the source dataset(s) for the given table.

Characteristics

The characteristics on a variable help researchers understand what that variable measures and how it was collected. All characteristics (except for variable type) are indexed by the Redivis search engine, so more complete and accurate metadata will help researchers find your dataset!

These characteristics will only be considered in search results if the user has metadata access to the underlying dataset.

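| Field | Notes |
| --- | --- |
| Name | The name of the variable, limited to 60 characters. Must be unique within the table. Variable names are case-insensitive, can only use alphanumeric characters and underscores, and cannot start with a number. |
| Type | Required. The data type of the variable. See Variable types. |
| Label | Optional. A short, human-readable description of the variable name. Limited to 256 characters. |
| Description | Optional. A longer space for notes about the variable's creation methods, coding, or supplementary information. Limited to 5000 characters. |
| Value labels | Optional. A map of each value in the data to a longer string of information. For example, a variable with records 0, 1, and 2 might have value labels for 0 = No, 1 = Yes, 2 = Don't know. Value labels will be shown inline on the cells view of a table, alongside the frequency table values in the univariate statistics. |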
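The naming rules above can be checked before upload. The following is a minimal sketch, not Redivis's own validation: the helper names are hypothetical, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_LENGTH = 60


def is_valid_variable_name(name: str) -> bool:
    """Check a name against the length and character rules described above."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


def find_duplicate_names(names: list[str]) -> set[str]:
    """Return the lowercased names that occur more than once, ignoring case."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for name in names:
        key = name.lower()  # names are case-insensitive, so compare lowercased
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates
```

For example, `find_duplicate_names(["Age", "age", "sex"])` flags `age`, because names that differ only by case collide within a table.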

Univariate statistics

Redivis automatically computes certain statistics for each variable, depending on that variable's type and number of distinct values. To view summary statistics, click on a variable in a table.

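| Statistic | Description |
| --- | --- |
| Count | The total number of values for this variable in the table. Does not include any null values. |
| Distinct | The number of unique values in the table. Does not include null as a value. (For example, a variable with values 0, 1, and null will have a distinct of 2.) |
| Non-null | The percentage of values in the table which are not null. Calculated by dividing the Count by the total number of records in the table. |
| Min, max | For continuous variables (integer, float, date, dateTime, time): the minimum and maximum of this variable. For string variables: the minimum and maximum length of all values. |
| Min lng, Min lat, Max lng, Max lat | For geography variables: the bounding box containing all geospatial data in a variable. |
| μ, σ | The mean and sample (unbiased) standard deviation of the variable. Only available for continuous variables (integer, float, date, dateTime, time). |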
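To make the definitions above concrete, here is a small in-memory sketch for a numeric variable. It is an illustration under stated assumptions rather than how Redivis computes these values: `summarize` is a hypothetical helper, and `None` stands in for null.

```python
import statistics


def summarize(values: list) -> dict:
    """Compute summary statistics for one numeric column; None stands in for null."""
    non_null = [v for v in values if v is not None]
    summary = {
        "count": len(non_null),          # nulls are not counted
        "distinct": len(set(non_null)),  # null is not a distinct value
        "non_null_pct": 100.0 * len(non_null) / len(values) if values else 0.0,
    }
    if non_null:
        summary["min"] = min(non_null)
        summary["max"] = max(non_null)
        summary["mean"] = statistics.mean(non_null)
    if len(non_null) >= 2:
        # statistics.stdev is the sample (n - 1) standard deviation
        summary["stdev"] = statistics.stdev(non_null)
    return summary
```

With the values 0, 1, and null, this gives a count of 2, a distinct of 2, and a non-null percentage of about 66.7.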

Univariate plots

Histogram

A frequency chart of observations, sorted into 64 buckets, clamped to three standard deviations of the mean on either side. To switch between a linear and logarithmic y-axis, click on the bottom left corner of the chart.

Only shown for continuous variables with more than 64 distinct values.
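The bucketing described above can be sketched as follows. This is an illustration only: `histogram_counts` is a hypothetical helper, and it assumes that values outside three standard deviations are folded into the first or last bucket, which the description does not spell out.

```python
import statistics


def histogram_counts(values: list[float], buckets: int = 64, n_sigma: float = 3.0) -> list[int]:
    """Count values per bucket over [mean - n_sigma * sd, mean + n_sigma * sd].

    Assumption: out-of-range values are clamped into the edge buckets.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("at least two distinct values are required")
    low, high = mean - n_sigma * sd, mean + n_sigma * sd
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        clamped = min(max(v, low), high)
        # The upper edge itself belongs to the last bucket.
        index = min(int((clamped - low) / width), buckets - 1)
        counts[index] += 1
    return counts
```

Every input value lands in exactly one bucket, so the counts always sum to the number of values.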

Box plot

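A visual display of the distribution of values by frequency. Shown are the minimum, 25th percentile, median, 75th percentile, and maximum value. To include or exclude outliers in the calculations, click on the label on the bottom left corner of the chart.

Only shown for continuous variables with a meaningful number of discrete values.

The box plot uses the BigQuery APPROX_QUANTILES method, which will estimate quantiles (±0.57%, 95% CI) for larger datasets. As such, the box plot values should only be used as an approximation.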
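For a small in-memory sample, the five values shown in a box plot can be computed exactly, as in the sketch below. It is not the method Redivis uses: `five_number_summary` is a hypothetical helper, the "inclusive" interpolation is an assumption, and results on large tables may differ slightly from the values displayed in the interface.

```python
import statistics


def five_number_summary(values: list[float]) -> tuple[float, float, float, float, float]:
    """Return (min, 25%, median, 75%, max) using exact quantiles."""
    # n=4 yields the three quartile cut points; "inclusive" treats the data
    # as the full population, so the minimum and maximum are attainable.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))
```

Comparing this exact result against the plotted values is one way to see how much an approximate quantile estimate matters for a given variable.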

Frequency table

A table showing the frequency of common values for that variable, limited to 10,000 values. If a variable's values are highly heterogeneous, no frequency table will be displayed. You can right-click on a value for a quick entry point to filter on that value in the query tab.
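A capped frequency table of this kind can be sketched in a few lines. The helper name is hypothetical, and the sketch does not attempt to reproduce how Redivis decides that a variable is too heterogeneous to show a table.

```python
from collections import Counter


def frequency_table(values: list, limit: int = 10_000) -> list[tuple]:
    """Return (value, count) pairs, most frequent first, capped at `limit` entries."""
    return Counter(values).most_common(limit)
```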

Map

A variable of type geography shows a heatmap of the geospatial dispersion of that variable.

Editing metadata

To edit a variable's label, description, or value labels, click the Edit metadata button on the right of any table. This will allow you to edit any of the variables in this table, or you can navigate to the "All tables" tab to edit the same variable across multiple tables.

In the metadata editor, you can edit a variable's metadata fields as you would in any spreadsheet, and save changes when you're done.

When working in bulk, you can also upload metadata directly.

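Exporting metadata

You can export your metadata at any time (in either of the above formats) by clicking Export file in the variable metadata editor. Variable metadata is also available in JSON format via the variables.list API endpoint.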

Variable types

All variables in Redivis have a type associated with them. All types in Redivis support NULL values; that is, an empty cell.

String

A string can be used to store any textual data (UTF-8 encoded). Moreover, a string is a "universal" data type — any other type can be converted to a string without error.

Size: 2 bytes + UTF-8 encoded string size (1 byte for ASCII characters).
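The size formula above can be applied directly to estimate storage for a string cell. A minimal sketch, with a hypothetical helper name:

```python
def string_cell_size(value: str) -> int:
    """Estimated size in bytes: 2 bytes of overhead plus the UTF-8 encoded length."""
    return 2 + len(value.encode("utf-8"))
```

An ASCII value such as "abc" comes to 5 bytes, while a single accented character such as "é" comes to 4, since it needs two bytes in UTF-8.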

Integer

A 64-bit signed integer. Supports any integer value between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.

Size: 8 bytes per cell.

Float

A 64-bit (double precision) floating point decimal value.

Size: 8 bytes per cell.

Boolean

A representation of either TRUE or FALSE.

Size: 1 byte per cell.

Date

Represents a calendar date independent of time zone. Supports any value between 0001-01-01 and 9999-12-31.

Size: 8 bytes per cell

Format:

'YYYY-[M]M-[D]D'

  • YYYY: Four digit year

  • [M]M: One or two digit month

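  • [D]D: One or two digit day

All values in brackets are optional. If your Date is not in this format, you can convert it using format strings. Redivis will attempt to convert other common date formats on upload, when possible.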
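To check whether a value already matches the Date format before upload, a sketch along these lines may help. `parse_date` is a hypothetical helper, not part of any Redivis library; it relies on Python's `datetime.date`, which happens to share the 0001-01-01 to 9999-12-31 range.

```python
import re
from datetime import date

# 'YYYY-[M]M-[D]D': four-digit year, one- or two-digit month and day.
_DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})")


def parse_date(text: str) -> date:
    """Parse 'YYYY-[M]M-[D]D'; raises ValueError for malformed or impossible dates."""
    match = _DATE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-[M]M-[D]D format: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # rejects e.g. month 13 or February 30
```

Values such as "2021-3-7" parse successfully, while "2021/03/07" is rejected and would first need converting.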

DateTime

Represents a year, month, day, hour, minute, second, and subsecond, independent of time zone. Supports any value between 0001-01-01 00:00:00 and 9999-12-31 23:59:59.999999.

Size: 8 bytes per cell

Format:

'YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]'
  • YYYY: Four digit year

  • [M]M: One or two digit month

  • [D]D: One or two digit day

  • ( |T): A space or a T separator

  • [H]H: One or two digit hour (valid values from 00 to 23)

  • [M]M: One or two digit minutes (valid values from 00 to 59)

  • [S]S: One or two digit seconds (valid values from 00 to 59)

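  • [.DDDDDD]: Up to six fractional digits (i.e. up to microsecond precision)

All values in brackets are optional. If your DateTime is not in this format, you can convert it using format strings. Redivis will attempt to convert other common datetime formats on upload, when possible.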
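The DateTime format can be expressed as a regular expression in which the whole time portion and the fractional seconds are optional. The sketch below is an illustration with a hypothetical helper name, not Redivis's parser.

```python
import re
from datetime import datetime

_DATETIME_PATTERN = re.compile(
    r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})"           # YYYY-[M]M-[D]D
    r"(?:[ T]([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2})"  # optional ( |T)[H]H:[M]M:[S]S
    r"(?:\.([0-9]{1,6}))?)?"                          # optional [.DDDDDD]
)


def parse_datetime(text: str) -> datetime:
    """Parse the DateTime format above; raises ValueError if the text is invalid."""
    match = _DATETIME_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid DateTime: {text!r}")
    year, month, day, hour, minute, second, fraction = match.groups()
    # Pad the fraction to six digits so that ".5" means 500000 microseconds.
    microsecond = int(fraction.ljust(6, "0")) if fraction else 0
    return datetime(
        int(year), int(month), int(day),
        int(hour or 0), int(minute or 0), int(second or 0), microsecond,
    )
```

A value without a time portion, such as "1970-1-1", parses to midnight on that date.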

Time

Represents a time, independent of a specific date. Supports values between 00:00:00 and 23:59:59.999999.

Size: 8 bytes per cell

Format:

'[H]H:[M]M:[S]S[.DDDDDD]'
  • [H]H: One or two digit hour (valid values from 00 to 23)

  • [M]M: One or two digit minutes (valid values from 00 to 59)

  • [S]S: One or two digit seconds (valid values from 00 to 59)

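  • [.DDDDDD]: Up to six fractional digits (i.e. up to microsecond precision)

All values in brackets are optional. If your Time is not in this format, you can convert it using format strings. Redivis will attempt to convert other common time formats on upload, when possible.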

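Geography

Represents a Point, MultiPoint, LineString, MultiLineString, Polygon, or MultiPolygon geometry, as specified by the GeoJSON standard.

You can create or manipulate geographic variables using various new variable methods.

The WKT format is used to show cell values, though each geography cell can also be viewed on a map by clicking on that cell within a table.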

Size: cell-content dependent

Format:

'(POINT|MULTIPOINT|LINESTRING|MULTILINESTRING|POLYGON|MULTIPOLYGON)(...)'

-- Multiple of the above geometries may be wrapped with
'(GEOMETRYCOLLECTION)(...)'
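As a rough illustration of the format above, the sketch below builds a point in WKT and identifies the geometry type of a WKT string. Both helpers are hypothetical, and the type check looks only at the leading keyword; it does not validate coordinates.

```python
import re
from typing import Optional

_GEOMETRY_TYPES = (
    "POINT", "MULTIPOINT", "LINESTRING", "MULTILINESTRING",
    "POLYGON", "MULTIPOLYGON", "GEOMETRYCOLLECTION",
)
# Match a known geometry keyword followed by an opening parenthesis.
_WKT_PREFIX = re.compile(r"\s*(" + "|".join(_GEOMETRY_TYPES) + r")\s*\(", re.IGNORECASE)


def wkt_geometry_type(text: str) -> Optional[str]:
    """Return the upper-cased geometry keyword, or None if it is not recognized."""
    match = _WKT_PREFIX.match(text)
    return match.group(1).upper() if match else None


def point_wkt(lng: float, lat: float) -> str:
    """Format a point as WKT; WKT lists x (longitude) before y (latitude)."""
    return f"POINT({lng} {lat})"
```

For instance, `point_wkt(-122.17, 37.43)` produces `POINT(-122.17 37.43)`, which `wkt_geometry_type` reports as `POINT`.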

Conversion rules

Some types can be implicitly converted within a query (e.g., 1 (integer) < 2.2 (float)). In other circumstances, you will need to explicitly convert the type of a variable before performing certain operations (e.g., "1.0" (string) < 2.2 (float)).

Variable types can be converted within the interface in a transform. The following conversions are supported:

| From | To | Notes |
| --- | --- | --- |
| Integer | Float | Returns a close but potentially not exact FLOAT64 value. |
| Integer | Boolean | Returns FALSE if x is 0, TRUE otherwise. |
| Float | Integer | Returns the closest INT64 value. Halfway cases such as 1.5 or -0.5 round away from zero. |
| Float | String | Returns an approximate string representation. |
| Date | DateTime | Returns a DateTime at midnight on the corresponding date. For example, if x is 1970-01-01, returns 1970-01-01 00:00:00. |
| DateTime | Date | Returns the part of the DateTime which is the calendar date. Note that this will not round the DateTime to the nearest Date. E.g., if x is 1970-01-01 23:59:59, returns 1970-01-01. |
| DateTime | Time | Returns the part of the DateTime which is the clock time. E.g., if x is 1970-01-01 23:59:59, returns 23:59:59. |
| Boolean | Integer | Returns 1 if x is TRUE, 0 otherwise. |
| Boolean | String | Returns "true" if x is TRUE, "false" otherwise. |
| String | Float | Returns x as a FLOAT64 value, interpreting it as having the same form as a valid FLOAT64 literal. Also supports casts from "inf", "+inf", "-inf", and "nan". Conversions are case-insensitive. |
| String | Boolean | Returns TRUE if x is "true" and FALSE if x is "false". All other values of x are invalid and throw an error instead of casting to BOOL. STRINGs are case-insensitive when converting to BOOL. |
| String | Geography | Use the ST_GEOGFROM new variable method. |
| Geography | String | Use the ST_ASTEXT new variable method. |
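A few of these conversion rules differ from what Python does by default, so the sketch below emulates them for illustration. The helper names are hypothetical, and raising an error for values outside the INT64 range is an assumption, since the rules above do not describe overflow.

```python
from decimal import ROUND_HALF_UP, Decimal

INT64_MIN, INT64_MAX_EXCLUSIVE = -2**63, 2**63


def float_to_integer(x: float) -> int:
    """Closest integer, with halfway cases rounding away from zero."""
    # Assumption: NaN, infinities, and out-of-range values raise an error.
    if not (INT64_MIN <= x < INT64_MAX_EXCLUSIVE):
        raise ValueError(f"cannot represent {x!r} as a 64-bit integer")
    # ROUND_HALF_UP in the decimal module rounds ties away from zero,
    # unlike the built-in round(), which rounds ties to the nearest even number.
    return int(Decimal(x).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def string_to_boolean(text: str) -> bool:
    """Case-insensitive 'true' or 'false'; anything else is an error."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"cannot convert {text!r} to a boolean")


def boolean_to_string(value: bool) -> str:
    """Return 'true' or 'false' in lower case."""
    return "true" if value else "false"
```

The integer to float rule ("close but potentially not exact") is visible in Python as well: `float(2**53 + 1) == float(2**53)` evaluates to `True`, because a 64-bit float cannot represent every integer above 2^53.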
