Feature Engineering

#aws #data

Apply your knowledge of the data, and of the model you are using, to create better features to train on.

Features

Not all features are relevant, and too many features can be harmful: the data becomes sparse as dimensions are added (the curse of dimensionality).

Imputation

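Replace missing values with computed/inferred values, such as the column mean or median.

  • Mean or median replacement works on one column at a time, so correlations between columns might be missed
  • Dropping rows with missing data might bias the dataset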

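Better approaches to try:

  • KNN (use the most similar rows to fill in the value)
  • Deep learning models (suited to categorical labels)
  • Regression models (predict the missing value from the other columns)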
  • Get more data
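
To make the KNN idea concrete, here is a minimal sketch in plain Python. The function name `knn_impute`, the sample data, and the choice of Euclidean distance over the columns both rows have observed are assumptions for illustration, not a reference implementation. Features are not rescaled here, which a real pipeline would usually do first.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean, computed only over columns observed in both rows.
    """
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                # Skip the row itself and rows that also lack this column.
                if j == i or other[col] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                candidates.append((dist, other[col]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][col] = sum(nearest) / len(nearest)
    return filled

# Hypothetical data: columns are (age, income in thousands).
data = [
    [25, 40.0],
    [27, 44.0],
    [52, 95.0],
    [55, 105.0],
    [26, None],  # missing income
]
imputed = knn_impute(data, k=2)

# For comparison: plain mean replacement ignores the age column entirely.
observed = [r[1] for r in data if r[1] is not None]
column_mean = sum(observed) / len(observed)
```

In this toy data, the neighbor-based value follows the incomes of people of similar age, while the column mean is pulled up by the older, higher-income rows.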

Unbalanced Data

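  • Oversampling: duplicate samples from the minority class
  • SMOTE: generate synthetic minority-class samples by interpolating between nearest neighbors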
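
The two techniques can be sketched as follows. This is a simplified illustration under stated assumptions: `random_oversample` and `smote_like` are hypothetical helper names, the data is made up, and `smote_like` only captures the core interpolation step of SMOTE rather than a full implementation.

```python
import random

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority samples until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [p for i, p in enumerate(minority) if i != base_idx]
        # Sort the remaining minority points by squared distance to the base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

rng = random.Random(42)  # fixed seed so the sketch is repeatable
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]  # hypothetical minority class

oversampled = random_oversample(minority, target_size=6, rng=rng)
synthetic = smote_like(minority, n_new=4, k=2, rng=rng)
```

Oversampling only repeats existing points, whereas the SMOTE-style step produces new points that lie between existing minority samples.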

Outliers

Amazon Kinesis Data Analytics can be used to identify outliers in streaming data, using its Random Cut Forest function.
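
For small offline datasets, a much simpler rule gives a feel for what outlier detection does. The sketch below is not what Kinesis uses; it swaps in a plain standard-deviation rule, and the function name `find_outliers`, the threshold of 2, and the readings are assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical sensor readings with one obvious spike.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
outliers = find_outliers(readings)
```

Note that the outlier itself inflates the mean and standard deviation, so this rule can miss outliers in small samples; that is one reason to prefer more robust methods on real data.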

Binning

Bucketization: group numerical values into ranges (bins) and treat each range as a category. This smooths out uncertainty in the measurements.
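
A minimal sketch of binning with fixed boundaries, using the standard library. The bucket edges, labels, and the helper name `bin_value` are made up for illustration.

```python
import bisect

def bin_value(value, edges, labels):
    """Return the label of the bucket that value falls into.

    edges must be sorted; edges[i] is the inclusive lower bound of
    bucket i + 1, so len(labels) must equal len(edges) + 1.
    """
    return labels[bisect.bisect_right(edges, value)]

# Hypothetical age buckets.
edges = [18, 40, 65]
labels = ["child", "young adult", "middle-aged", "senior"]

ages = [5, 18, 33, 64, 70]
binned = [bin_value(a, edges, labels) for a in ages]
```

An age recorded as 33 or 34 lands in the same bucket, so small measurement errors no longer change the feature value.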

Transformation

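  • One-hot encoding: represent each category as its own 0/1 column
  • Scaling: bring features into comparable ranges so large values do not dominate
  • Normalization: center and rescale values, e.g. to zero mean and unit variance
  • Shuffling: randomize the order of the training samples so the model does not learn from their ordering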
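
The transformations above can be sketched in plain Python as follows. The helper names and sample data are assumptions for illustration, and since "scaling" and "normalization" are used loosely in practice, this sketch shows min-max scaling for the former and standardization (zero mean, unit variance) for the latter.

```python
import random
import statistics

def one_hot(values):
    """Return the sorted category list and one 0/1 vector per value."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def min_max_scale(values):
    """Rescale values linearly to [0, 1]. Assumes values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and unit variance. Assumes non-zero variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical feature columns.
colors = ["red", "green", "blue", "green"]
incomes = [20.0, 40.0, 60.0, 100.0]

categories, encoded = one_hot(colors)
scaled = min_max_scale(incomes)
normalized = standardize(incomes)

# Shuffle whole rows so features stay aligned with each other.
rows = list(zip(colors, incomes))
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
```

Shuffling is applied to complete rows rather than to individual columns, otherwise features and labels would no longer line up.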