Chapter 18

Quantitative Data Management

Data Management

Review data as soon as it is collected to check that:

Consent forms are complete and signed

No duplicate identification numbers were given

No data is missing

Scoring was done correctly

Handwriting is legible
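Some of these checks can be automated once the data are in electronic form. The following is a minimal sketch, not a prescribed procedure; the record layout and field names (id, consent_signed, age, score) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical records as they might arrive from data collection.
records = [
    {"id": "P001", "consent_signed": True, "age": 34, "score": 12},
    {"id": "P002", "consent_signed": True, "age": None, "score": 15},
    {"id": "P002", "consent_signed": False, "age": 51, "score": 9},
]

def review(records):
    """Return a list of (participant id, problem) tuples."""
    problems = []
    id_counts = Counter(r["id"] for r in records)
    for r in records:
        if id_counts[r["id"]] > 1:
            problems.append((r["id"], "duplicate identification number"))
        if not r["consent_signed"]:
            problems.append((r["id"], "consent form not signed"))
        for field, value in r.items():
            if value is None:
                problems.append((r["id"], f"missing value for {field}"))
    return problems

problems = review(records)
for participant_id, problem in problems:
    print(participant_id, problem)
```

Checks such as legible handwriting and correct scoring still need a person to look at the original forms.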

Managing the Data

To manage the data more efficiently, consider:

Setting up a tracking system

Keeping data secure

Developing a filing system for your data

Ensuring each file is complete

Coding items as needed

Selecting the Software for the Database

When thinking about what software you need, consider:

Is your data quantitative, qualitative, or both?

How will you analyze the data?

Will the data analysis be simple or complex?

How do you want to present your results?

Is there support for the software?

What is the cost of the software?

Is the software compatible with your system?

Database Creation

Pilot test the software if you have not used it before

Develop a codebook

Create the database

Input the data

Considerations in Building a Database

Think ahead to how you will use the data

Provide meaningful names for variables

Put accuracy checks in place

Test the database with preliminary analysis
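A codebook and an accuracy check can be combined in a few lines. This is a sketch under stated assumptions: the variable names (sex, pain_0_10), the code values, and the is_valid helper are all hypothetical, and real database software usually offers its own validation rules.

```python
# Hypothetical codebook: each variable gets a label plus its permitted codes or range.
codebook = {
    "sex": {"label": "Participant sex",
            "codes": {1: "female", 2: "male", 9: "not reported"}},
    "pain_0_10": {"label": "Pain rating (0 to 10)", "range": (0, 10)},
}

def is_valid(variable, value):
    """Return True when the value is permitted by the codebook."""
    spec = codebook[variable]
    if "codes" in spec:
        return value in spec["codes"]
    low, high = spec["range"]
    return low <= value <= high

# Entries as they might be typed in; two of them should be rejected.
entries = [("sex", 2), ("sex", 3), ("pain_0_10", 7), ("pain_0_10", 70)]
rejected = [(var, val) for var, val in entries if not is_valid(var, val)]
print(rejected)
```

Meaningful variable names such as pain_0_10 carry the scale into every later analysis, which is one reason to choose them before data entry begins.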

Chapter 19

Basic Quantitative Data Analysis

Data Cleaning

Check for odd symbols and for truncated or overlong entries

Recheck scoring

Recheck coding categories

Compare the value of one variable with the value of a related second variable to check that they are consistent

Look for outliers
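Two of these cleaning steps lend themselves to a short illustration. The sketch below flags outliers with the 1.5 × interquartile range rule, which is one common convention and an assumption here rather than something the slides prescribe, and then compares two related variables. The data and field names are hypothetical.

```python
import statistics

# Flag outliers using the 1.5 * IQR fences (one common rule of thumb).
ages = [21, 23, 24, 25, 26, 27, 29, 95]
q1, _, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in ages if a < low_fence or a > high_fence]
print("outliers:", outliers)

# Cross-variable check: years employed should not exceed age.
records = [
    {"id": "P001", "age": 34, "years_employed": 10},
    {"id": "P002", "age": 29, "years_employed": 40},
]
inconsistent = [r["id"] for r in records if r["years_employed"] > r["age"]]
print("inconsistent:", inconsistent)
```

A flagged value is a prompt to go back to the source document, not proof of an error.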

Reasons for Missing Data

Participant skipped item or questionnaire, purposely or inadvertently

Participant withdrew, became ill, or died

Researcher had to omit all or part of the data collection

Poor directions or poorly worded question

Data missed during data entry

Categorizing Missing Data

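Missing completely at random (MCAR): missingness is unrelated to any observed or unobserved values

Missing at random (MAR): missingness is related to other observed variables but not to the missing value itself

Missing not at random (MNAR): missingness depends on the value that is missing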

Replacing Missing Data

Complete case analysis drops from the analysis any participant who has missing data

If many participants have missing data, this reduces the sample size and statistical power and may bias the results
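Complete case analysis can be sketched as a simple filter. The rows and variable names below are hypothetical; the point is how quickly the usable sample shrinks.

```python
# Hypothetical data set; None marks a missing value.
rows = [
    {"id": 1, "anxiety": 4, "sleep_hours": 7.0},
    {"id": 2, "anxiety": None, "sleep_hours": 6.5},
    {"id": 3, "anxiety": 2, "sleep_hours": None},
    {"id": 4, "anxiety": 5, "sleep_hours": 5.0},
]

# Keep only participants with no missing values on any variable.
complete_cases = [r for r in rows if all(v is not None for v in r.values())]
proportion_lost = 1 - len(complete_cases) / len(rows)

print([r["id"] for r in complete_cases], proportion_lost)
```

Here two participants, each missing a single value, cost half the sample.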

Principles in handling missing data are:

Some missing data cannot be replaced

Imputation uses existing information to estimate the missing values

The easiest approach is to replace missing data with the group’s mean (average) on the item
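Group mean replacement might look like the following sketch, which uses made-up item scores.

```python
import statistics

# Hypothetical scores on one item; None marks a missing response.
item_scores = [3, None, 5, 4, None]

observed = [s for s in item_scores if s is not None]
group_mean = statistics.mean(observed)

# Replace each missing response with the mean of the observed responses.
filled = [group_mean if s is None else s for s in item_scores]
print(filled)
```

Because every replaced value sits exactly at the mean, this approach shrinks the variability of the item, which is one reason it is regarded as the easiest rather than the best option.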

Replacing Missing Data

Principles in handling missing data are:

A more justifiable approach is to use the average of the individual participant’s scores or ratings on the remaining items of a multi-item scale

Missing values may be estimated from values at previous time points
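Both ideas can be sketched briefly. The first function fills a missing scale item with the participant's own mean on the remaining items. The second carries the last observed value forward, which is only one way of estimating from previous time points and is an assumption here. Function names and data are hypothetical.

```python
import statistics

def fill_with_person_mean(item_ratings):
    """Replace missing items with the mean of this participant's other items."""
    observed = [r for r in item_ratings if r is not None]
    person_mean = statistics.mean(observed)
    return [person_mean if r is None else r for r in item_ratings]

def carry_last_observation_forward(series):
    """Replace each missing value with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

scale_items = [4, 5, None, 3]              # one participant, four-item scale
systolic = [120, 118, None, None, 115]     # one participant, five time points

print(fill_with_person_mean(scale_items))
print(carry_last_observation_forward(systolic))
```

Carrying a value forward assumes the participant did not change between time points, so it deserves the same scrutiny as any other imputation choice.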

Replacing Missing Data

Principles in handling missing data are:

Incomplete cases (participants) may be deleted and the analysis may be done on those who completed the study

Regression imputation may be used to estimate the values of the missing data from other variables
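A minimal sketch of regression imputation with a single predictor follows. It fits an ordinary least-squares line to the complete cases and uses it to predict the missing outcome. The data are made up, and a real analysis would usually use statistical software and more than one predictor.

```python
# Hypothetical paired data; None marks a missing outcome.
predictor = [1.0, 2.0, 3.0, 4.0]
outcome = [2.0, 4.0, None, 8.0]

# Fit a least-squares line using only the complete cases.
pairs = [(x, y) for x, y in zip(predictor, outcome) if y is not None]
mean_x = sum(x for x, _ in pairs) / len(pairs)
mean_y = sum(y for _, y in pairs) / len(pairs)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
         / sum((x - mean_x) ** 2 for x, _ in pairs))
intercept = mean_y - slope * mean_x

# Predict each missing outcome from its predictor value.
imputed = [intercept + slope * x if y is None else y
           for x, y in zip(predictor, outcome)]
print(imputed)
```

Predicted values fall exactly on the regression line, so this approach also understates variability unless random error is added to the predictions.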

Replacing Missing Data

Principles in handling missing data are:

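Expectation maximization alternates between estimating the missing values from the current parameter estimates and re-estimating the parameters, repeating until the estimates converge

Multiple imputation creates several completed data sets with different plausible replacement values, analyzes each one, and pools the results into a single estimate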
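The pooling step of multiple imputation can be illustrated with Rubin's rules. The sketch below is deliberately simplified: replacement values are drawn from a normal distribution based on the observed mean and standard deviation, whereas dedicated software uses a proper imputation model. The data, the number of imputations, and the random seed are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

observed = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
n_missing = 2          # two participants have no score
m = 5                  # number of imputed data sets

obs_mean = statistics.mean(observed)
obs_sd = statistics.stdev(observed)

estimates, within_variances = [], []
for _ in range(m):
    # Simplified imputation: draw plausible values for the missing scores.
    draws = [random.gauss(obs_mean, obs_sd) for _ in range(n_missing)]
    completed = observed + draws
    estimates.append(statistics.mean(completed))
    # Squared standard error of the mean in this completed data set.
    within_variances.append(statistics.variance(completed) / len(completed))

# Rubin's rules: pool the estimates and combine the two sources of variance.
pooled_estimate = statistics.mean(estimates)
within = statistics.mean(within_variances)
between = statistics.variance(estimates)
total_variance = within + (1 + 1 / m) * between

print(pooled_estimate, total_variance)
```

The between-imputation term is what separates multiple imputation from single imputation: it carries the uncertainty about the missing values into the final standard error.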

Visual Representations

Stem-and-leaf plots illustrate the distribution of values

Box plots illustrate distribution of values

Bar and pie charts demonstrate differences between groups and subgroups

Scatterplots can show relationships between interval-level variables
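A stem-and-leaf display is simple enough to build as text. This sketch assumes two-digit whole numbers, with the tens digit as the stem and the ones digit as the leaf; the values are made up.

```python
from collections import defaultdict

values = [12, 15, 21, 23, 23, 30, 34, 38, 41]

# Group the ones digits (leaves) under their tens digits (stems).
stems = defaultdict(list)
for v in sorted(values):
    stems[v // 10].append(v % 10)

lines = [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
         for stem, leaves in sorted(stems.items())]
print("\n".join(lines))
```

Each row keeps the individual values visible while its length shows how many cases fall in that interval.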

Basic Descriptive Statistics

Normal distribution is represented by a symmetrical bell-shaped curve

Positive skew has more cases at the low end of values, with a tail toward the high end

Negative skew has more cases at the high end of values, with a tail toward the low end
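The direction of skew can be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5), which is one of several skewness formulas; the data are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness; 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

mostly_low = [1, 1, 2, 2, 3, 10]      # most cases low, tail to the right
mostly_high = [10, 10, 9, 9, 8, 1]    # most cases high, tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(mostly_low), skewness(mostly_high), skewness(symmetric))
```

A positive result corresponds to positive skew and a negative result to negative skew.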

Basic Descriptive Statistics

Mode is the value that occurs most often

Median is the middle score in the distribution

Mean is the average of all scores
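The three measures of central tendency can be computed with Python's standard statistics module. The scores are made up.

```python
import statistics

scores = [2, 3, 3, 5, 7]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle value of the sorted scores
mean = statistics.mean(scores)      # sum of scores divided by their count

print(mode, median, mean)
```

In this small example the mean is pulled above the median by the highest score, which is the usual pattern in a positively skewed distribution.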

Basic Descriptive Statistics

Range is the distance between the highest and lowest scores

The distance between these endpoints can be divided into portions such as quartiles or percentiles
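A short sketch of the range and of quartile cut points follows. It uses statistics.quantiles with its default method; other software may use slightly different quartile conventions, so results can differ on small samples. The scores are hypothetical.

```python
import statistics

scores = [2, 4, 6, 8, 10, 12, 14]

score_range = max(scores) - min(scores)

# Three cut points that divide the sorted scores into four portions.
quartiles = statistics.quantiles(scores, n=4)

print(score_range, quartiles)
```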

Basic Descriptive Statistics

Variance is the average of the squared deviations from the mean

Standard deviation is the square root of the variance
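The definitions translate directly into code. Dividing the squared deviations by the number of scores gives the population variance; statistical software often reports the sample variance instead, which divides by one less than the number of scores. The scores below are made up.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
squared_deviations = [(s - mean) ** 2 for s in scores]

population_variance = sum(squared_deviations) / len(scores)
population_sd = population_variance ** 0.5

# The sample versions divide by n - 1 instead of n.
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(population_variance, population_sd, sample_variance, sample_sd)
```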

Bivariate Association

Bivariate refers to the relationship between two variables

The Pearson product-moment correlation coefficient, represented as r, is the most commonly used bivariate measure of association
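Pearson's r can be computed from the deviations of each variable from its mean. This sketch uses made-up data and a hypothetical helper named pearson_r.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / (sum_xx * sum_yy) ** 0.5

hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_studied, quiz_score)
print(round(r, 4))
```

Values of r run from -1 to +1, with the sign giving the direction of the relationship and the absolute value its strength.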

Bivariate Association

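A correlation matrix displays the correlations among several variables at one time, so you can compare the strengths of the relationships between the various pairs of variables

You may also calculate the coefficient of determination (r²), the square of the correlation coefficient, which gives the proportion of variance the two variables share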
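A correlation matrix is simply every pairwise correlation laid out in a grid. The sketch below builds one for three hypothetical variables and squares one coefficient to obtain the coefficient of determination.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / (sum_xx * sum_yy) ** 0.5

# Hypothetical variables measured on the same five participants.
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 5, 4, 5],
    "errors": [9, 7, 6, 6, 3],
}

names = list(data)
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a.ljust(7), " ".join(f"{matrix[a][b]:6.2f}" for b in names))

# Coefficient of determination for one pair of variables.
r_squared = matrix["hours"]["score"] ** 2
print(round(r_squared, 2))
```

The diagonal is always 1 because each variable correlates perfectly with itself, and the matrix is symmetric, so only one triangle needs to be read.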

Additional Measures of Association

Spearman rank-order correlation is used for ordinal data

The chi-square statistic is used for nominal data
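Both measures can be sketched in a few lines. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values given their average rank. The chi-square statistic sums the squared differences between observed and expected counts, each divided by the expected count. The helper names and data are hypothetical, and the sketch stops at the statistic without computing a p value.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / (sum_xx * sum_yy) ** 0.5

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

def chi_square(table):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
chi2 = chi_square([[20, 10], [10, 20]])
print(rho, round(chi2, 4))
```

The Spearman example shows a perfectly monotonic but nonlinear relationship, where the rank-based coefficient reaches 1 even though the raw values do not lie on a straight line.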

Additional Measures of Association

Fisher’s Exact Test is used instead of chi-square when the expected count in any cell is fewer than five

The Mann-Whitney U Test is used to compare two independent groups when the outcome data are ordinal
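For small tables both statistics can be computed directly. The sketch below gives a two-sided Fisher's exact p value for a 2 × 2 table by summing the probabilities of all tables that are no more likely than the observed one, and a Mann-Whitney U statistic by counting pairwise comparisons. The function names and data are hypothetical, and the Mann-Whitney part stops at U without a p value.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def probability(x):
        # Hypergeometric probability of x cases in the top-left cell.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), comb(total, col1))

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = probability(a)
    return sum(probability(x) for x in range(lowest, highest + 1)
               if probability(x) <= p_observed)

def mann_whitney_u(group_a, group_b):
    """Smaller of the two U statistics; ties count as half."""
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)

p_value = fisher_exact_two_sided(3, 1, 1, 3)
u = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(float(p_value), u)
```

A U of 0 means the two groups do not overlap at all: every value in one group is lower than every value in the other.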
