File and data structure
File and data structure specifications essential for the data analysis
Please read the following carefully before planning data entry. Ignoring this advice may create substantial additional work.
Common analysis software expects the data to be in a uniform, rectangular structure. In this arrangement, the rows represent the cases (e.g., patients) and the columns represent the variables, so that the file contains exactly one row per case, holding variables such as ID number, age, blood pressure, etc. In this set-up, repeated measurements of a variable over time (e.g., the course of laboratory values) must be recorded as several variables (e.g., BLOOD1, BLOOD2, etc.) and must not be entered as several values of one variable. Personal data must always be anonymised (no names or initials; date of birth always without the day).
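As an illustration of this rectangular layout, the following minimal Python sketch turns a list of repeated measurements into one row per case, with one column per measurement time. The sample values, the function name to_rectangular and the column name ID are assumptions made for this example only; the BLOOD1, BLOOD2 naming follows the text above.

```python
from collections import defaultdict

# Hypothetical repeated measurements: (case id, visit number, lab value).
measurements = [
    (1, 1, 5.2),
    (1, 2, 4.8),
    (2, 1, 6.1),
    (2, 2, None),  # value missing at the second visit
]


def to_rectangular(records, prefix="BLOOD"):
    """Return one row (dict) per case with one column per measurement time."""
    visits = sorted({visit for _, visit, _ in records})
    by_case = defaultdict(dict)
    for case_id, visit, value in records:
        by_case[case_id][f"{prefix}{visit}"] = value

    table = []
    for case_id in sorted(by_case):
        row = {"ID": case_id}
        for visit in visits:
            # Every case gets every column; absent measurements stay empty.
            row[f"{prefix}{visit}"] = by_case[case_id].get(f"{prefix}{visit}")
        table.append(row)
    return table


table = to_rectangular(measurements)
for row in table:
    print(row)
```

Each case ends up on exactly one row, and a measurement that was not taken remains an empty cell in its own column instead of shifting the other values.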
In order to process data that have already been entered electronically with the software available at the institute, certain conditions have to be met. The most common form of data management is in Excel files. A separate leaflet on this topic is available (see the PDF file excel2sas below). SAS files created in versions 6 and 8 can be read directly; files from older versions must be exported and re-imported. SPSS files can be converted as long as the variable names are permissible in SAS (see Excel data sheet). Unstructured ("flat") ASCII files (e.g., the formats .txt and .csv) require special care concerning the separator and the coding of missing values. Their use should be limited to cases for which no other way of conversion exists. Under no circumstances should data be entered in word processing software. Word files are completely unsuitable, and small isolated Word tables can be used only to a limited extent.
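The two precautions named for flat ASCII files, the separator and the coding of missing values, can be sketched in Python as follows. This is only an illustration under stated assumptions: the semicolon separator, the missing-value codes in MISSING_CODES, the function name read_flat_file and the sample data are hypothetical and would have to match whatever was actually agreed for the data set.

```python
import csv
import io

# Assumed codes that stand for "missing"; they must match the coding plan.
MISSING_CODES = {"", "NA", "-99"}


def clean(value):
    """Strip whitespace and map agreed missing-value codes to None."""
    text = (value or "").strip()
    return None if text in MISSING_CODES else text


def read_flat_file(handle, delimiter=";"):
    """Read a delimited text file with an explicitly stated separator."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [{name: clean(value) for name, value in line.items()} for line in reader]


# Hypothetical file content: header line, then one line per case.
sample = io.StringIO(
    "ID;AGE;BLOOD1;BLOOD2\n"
    "1;54;5.2;4.8\n"
    "2;61;-99;NA\n"
)

rows = read_flat_file(sample)
for row in rows:
    print(row)
```

Stating the separator and the missing-value codes explicitly, rather than relying on software defaults, avoids a code such as -99 being silently analysed as a real measurement.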