If you want us to analyze a dataset from your own database, please read the following carefully. Ignoring this advice may create substantial additional preparatory work.
Statistical analysis software (Stata, R, SAS, SPSS, etc.) is based on a uniform, rectangular data structure: the rows represent the cases (e.g., patients) and the columns represent the variables (e.g., identification number, sex, age, hemoglobin level). Such a file contains only one row per case (wide format). In this format, repeated measurements of a variable over time (e.g., the course of laboratory values) must be stored as several variables (e.g., BLOOD1, BLOOD2, etc.).
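To illustrate the wide format described above, here is a minimal Python sketch that reshapes repeated measurements into one row per case. The record layout, the keys `id`, `visit` and `blood`, and the helper `long_to_wide` are hypothetical and chosen only for this example; they do not describe any tool used by the Statistics division.

```python
# Hypothetical long-format records: one row per patient and visit.
long_rows = [
    {"id": 1, "visit": 1, "blood": 13.2},
    {"id": 1, "visit": 2, "blood": 12.8},
    {"id": 2, "visit": 1, "blood": 14.1},
]

def long_to_wide(rows, id_key="id", time_key="visit",
                 value_key="blood", prefix="BLOOD"):
    """Return one dict per case with one column per measurement occasion."""
    # Collect all occasions so every case gets the same set of columns.
    occasions = sorted({r[time_key] for r in rows})
    wide = {}
    for r in rows:
        # Create the case on first sight, with all columns set to missing.
        case = wide.setdefault(
            r[id_key],
            {id_key: r[id_key], **{f"{prefix}{t}": None for t in occasions}},
        )
        case[f"{prefix}{r[time_key]}"] = r[value_key]
    return [wide[k] for k in sorted(wide)]

wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Each resulting row holds exactly one case. A measurement that was never taken (here, the second visit of patient 2) stays explicitly missing rather than shifting the other columns.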
In order to process data with the software used by our Statistics division, certain conditions have to be met:
- Avoid collecting data in Excel, because it provides neither the audit trail nor the access control required by the Human Research Act (HFG). Use a proper database such as REDCap instead. If Excel is used nevertheless, follow the guidance below.
- Stata or R data files can be read in directly. A labelled dataset is preferred. If none is available, a data dictionary explaining the dataset is required.
- ASCII files (e.g., .txt and .csv files) require special precautions concerning the separator and the coding of missing values. Use them only when no other means of conversion exists.
- Data files from other statistical software such as SAS or SPSS may also be usable, but they need to be checked carefully beforehand.
- Under no circumstances should data be entered in word processing software.
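The precautions for ASCII files mentioned in the list above can be sketched as follows. This is only an illustration using Python's standard `csv` module. The semicolon separator, the missing-value codes `NA` and `-99`, the sample columns, and the helper `read_delimited` are all assumptions made for the example; the actual separator and codes must come from your own data dictionary.

```python
import csv
import io

# Assumed missing-value codes; replace with those in your data dictionary.
MISSING_CODES = {"", "NA", "-99"}

def read_delimited(handle, delimiter=";"):
    """Read a delimited text file, mapping missing-value codes to None."""
    # State the separator explicitly instead of relying on guessing.
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = []
    for row in reader:
        cleaned = {}
        for name, value in row.items():
            # A short row yields None for the absent fields.
            if value is None or value.strip() in MISSING_CODES:
                cleaned[name] = None
            else:
                cleaned[name] = value.strip()
        rows.append(cleaned)
    return rows

# Hypothetical file content, held in memory for the example.
sample = io.StringIO("id;sex;age;hb\n1;f;54;13.2\n2;m;NA;-99\n")
rows = read_delimited(sample)
for row in rows:
    print(row)
```

Stating the separator and the missing-value codes explicitly keeps a value such as `-99` from being analyzed as a real measurement.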