Metadata

Overview

Redivis determines variable names and types during data upload. Additionally, it will automatically parse certain metadata based on the uploaded file format:

  • SAS (.sas7bdat): labels

  • Stata (.dta): labels and value labels

  • SPSS (.sav): labels and value labels

For other file types (e.g., CSV), you will need to add the metadata yourself. To apply metadata in bulk, you can upload a metadata file directly from your computer. This file can be either CSV or JSON.

Is your metadata stuck in a PDF? We're truly sorry. If you can, please let the data provider know that it is essential to provide metadata in a machine-readable format; hopefully this will change in time.

While you can simply upload the PDF to the dataset's documentation, you'll be doing your researchers a huge service if you add structured metadata to the variables. That might mean some manual copying and pasting from the PDF, or you could try one of the various (and imperfect) online PDF-to-CSV conversion tools, or this Python library.

If you don't have the bandwidth, consider asking your researchers to contribute by making them dataset editors.

CSV metadata format

The CSV should have no header row. Each row corresponds to one variable: column 1 is the name, column 2 is the label, and column 3 is the description. If a variable doesn't have a label or description, leave those columns empty.

variable1_name,variable1_label,variable1_description
variable2_name,variable2_label,variable2_description

For example:

sex,patient sex,patient's recorded sex
id,patient identifier,unique patient identifier
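
If your variable metadata already lives in a script or spreadsheet export, a file in this layout can be generated rather than typed by hand. The following is a minimal sketch using Python's standard csv module; the variable list and the function name build_metadata_csv are hypothetical and not part of Redivis.

```python
import csv
import io

# Hypothetical variable metadata as (name, label, description) tuples.
variables = [
    ("sex", "patient sex", "patient's recorded sex"),
    ("id", "patient identifier", "unique patient identifier"),
    ("visit_date", "", ""),  # no label or description: columns left empty
]


def build_metadata_csv(rows):
    """Return header-less CSV text with one row per variable."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, label, description in rows:
        writer.writerow([name, label, description])
    return buffer.getvalue()


csv_text = build_metadata_csv(variables)

# Save the text to a file that can then be uploaded as bulk metadata.
with open("metadata.csv", "w", newline="", encoding="utf-8") as f:
    f.write(csv_text)
```

Using csv.writer rather than joining strings by hand means that a label or description containing a comma is quoted automatically, so it stays in its own column.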

JSON metadata format

When uploading a JSON file, specify the name, label, description, and valueLabels using the appropriately named attributes in the object corresponding to each variable. If the variable doesn't have a label, description, or value labels, you don't need to include these attributes.

For example (the file is an array of objects, with each object representing a variable):

[
    {
        "name": "sex",
        "label": "patient sex",
        "description": "patient's recorded sex",
        "valueLabels": [
            {
                "value": 1,
                "label": "Male"
            },
            {
                "value": 2,
                "label": "Female"
            }
        ]
    }
]
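
A file in this shape can also be produced programmatically. The sketch below uses Python's standard json module and leaves out any attribute a variable doesn't have, as described above; the helper variable_entry and the sample variables are assumptions made for illustration.

```python
import json


def variable_entry(name, label=None, description=None, value_labels=None):
    """Build one variable object, omitting attributes that are not provided."""
    entry = {"name": name}
    if label:
        entry["label"] = label
    if description:
        entry["description"] = description
    if value_labels:
        # value_labels maps each raw value to its human-readable label.
        entry["valueLabels"] = [
            {"value": value, "label": text} for value, text in value_labels.items()
        ]
    return entry


metadata = [
    variable_entry(
        "sex",
        label="patient sex",
        description="patient's recorded sex",
        value_labels={1: "Male", 2: "Female"},
    ),
    variable_entry("id", label="patient identifier"),
]

# Serialize to a JSON array of objects, one object per variable.
metadata_json = json.dumps(metadata, indent=4)

with open("metadata.json", "w", encoding="utf-8") as f:
    f.write(metadata_json)
```

Serializing with json.dumps also guarantees the output is strictly valid JSON, which a hand-edited file with comments or trailing commas would not be.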

To upload value labels in bulk, you must use the JSON format. We no longer support bulk upload of value labels via CSV.
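
If you already maintain a metadata CSV in the three-column layout and now need to add value labels, one option is to convert it to the JSON format. This is a rough sketch only: the function csv_metadata_to_json, the sample CSV text, and the value_labels mapping are hypothetical, and the value labels must come from your own source.

```python
import csv
import io
import json

# Hypothetical existing CSV metadata (no header; name, label, description).
csv_text = "sex,patient sex,patient's recorded sex\nid,patient identifier,\n"

# Hypothetical value labels, keyed by variable name.
value_labels = {"sex": {1: "Male", 2: "Female"}}


def csv_metadata_to_json(text, labels_by_variable):
    """Convert header-less CSV metadata to JSON, merging in value labels."""
    variables = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # Pad short rows so a missing label or description becomes empty.
        name, label, description = (row + ["", ""])[:3]
        entry = {"name": name}
        if label:
            entry["label"] = label
        if description:
            entry["description"] = description
        if name in labels_by_variable:
            entry["valueLabels"] = [
                {"value": value, "label": text}
                for value, text in labels_by_variable[name].items()
            ]
        variables.append(entry)
    return json.dumps(variables, indent=4)


metadata_json = csv_metadata_to_json(csv_text, value_labels)
```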
