There are many ways to import data into R, but getting data into a matrix or data frame is only the first step of preparing it for analysis. In this R tutorial, we will create a data.frame by hand instead of importing the data.
Many organizations run yearly employee performance ratings within the first few weeks of the new year. Based on those ratings, an employee may be put up for promotion if their total score reaches a certain range.
Employee Promotion Rankings
- A total rating between 19 and 25 is eligible for promotion
- A total rating between 14 and 18 is not eligible for promotion
- A total rating of 13 or lower must complete a performance probationary period
Employee Performance Rankings
- Rating of 1 – Needs Improvement
- Rating of 2 – Not Meeting Expectations
- Rating of 3 – Meeting Expectations
- Rating of 4 – Exceeding Expectations
- Rating of 5 – Strongly Exceeding Expectations
Below is a table with the ratings for each of the ten employees.
Employee ID  Date       Gender  Age  R1  R2  R3  R4  R5
A12OI        1/22/2018  F       54   5   4   5   4   3
C90R2        1/23/2018  M       37   5   3   4   3   5
LOI98        1/24/2018  F       23   3   5   4   3   3
M908Y        1/25/2018  M       43   4   3   5   3   5
J908R        1/26/2018  F       33   2   2   3   2   3
BNL98        1/22/2018  F       42   5   3   3   4   5
EW09P        1/23/2018  M       58   2   3   3   2   2
QA214        1/24/2018  F       31   4   4   5   3   1
JU87Y        1/25/2018  M       22   3   4   4   4   3
CO43R        1/26/2018  M       38   5   4   3   5   4
As you can see, each employee is rated by their boss in five separate areas of work (R1 through R5), so every employee has five task ratings.
With the data above, we can create the employee data frame. We start by using the c() function, which combines its arguments into a vector (a one-dimensional array), to build one vector per column.
Input:
empid <- c("A12OI", "C90R2", "LOI98", "M908Y", "J908R", "BNL98", "EW09P", "QA214", "JU87Y", "CO43R")
date <- c("1/22/2018", "1/23/2018", "1/24/2018", "1/25/2018", "1/26/2018", "1/22/2018", "1/23/2018", "1/24/2018", "1/25/2018", "1/26/2018")
gender <- c("F", "M", "F", "M", "F", "F", "M", "F", "M", "M")
age <- c(54, 37, 23, 43, 33, 42, 58, 31, 22, 38)
r1 <- c(5, 5, 3, 4, 2, 5, 2, 4, 3, 5)
r2 <- c(4, 3, 5, 3, 2, 3, 3, 4, 4, 4)
r3 <- c(5, 4, 4, 5, 3, 3, 3, 5, 4, 3)
r4 <- c(4, 3, 3, 3, 2, 4, 2, 3, 4, 5)
r5 <- c(3, 5, 3, 5, 3, 5, 2, 1, 3, 4)
Now that the vectors are created, let’s move on to creating the data.frame with the data.
Input:
empratings <- data.frame(empid, date, gender, age, r1, r2, r3, r4, r5, stringsAsFactors = FALSE)
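Creating the data.frame is very simple: we only pass in the vectors that were created above, and data.frame() returns a data frame with one column per vector. Setting stringsAsFactors = FALSE keeps the text columns as character vectors instead of converting them to factors.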
Input:
empratings
Output:
   empid      date gender age r1 r2 r3 r4 r5
1  A12OI 1/22/2018      F  54  5  4  5  4  3
2  C90R2 1/23/2018      M  37  5  3  4  3  5
3  LOI98 1/24/2018      F  23  3  5  4  3  3
4  M908Y 1/25/2018      M  43  4  3  5  3  5
5  J908R 1/26/2018      F  33  2  2  3  2  3
6  BNL98 1/22/2018      F  42  5  3  3  4  5
7  EW09P 1/23/2018      M  58  2  3  3  2  2
8  QA214 1/24/2018      F  31  4  4  5  3  1
9  JU87Y 1/25/2018      M  22  3  4  4  4  3
10 CO43R 1/26/2018      M  38  5  4  3  5  4
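It can be worth confirming what kind of object data.frame() actually produced. The sketch below rebuilds only the first three employees with four of the columns; the name sample_ratings is an assumption made for illustration, not part of the tutorial's workspace.

```r
# Three-row sample taken from the employee table above
# (sample_ratings is a hypothetical name used only in this sketch)
empid  <- c("A12OI", "C90R2", "LOI98")
gender <- c("F", "M", "F")
age    <- c(54, 37, 23)
r1     <- c(5, 5, 3)

sample_ratings <- data.frame(empid, gender, age, r1, stringsAsFactors = FALSE)

class(sample_ratings)          # "data.frame"
is.matrix(sample_ratings)      # FALSE: a data frame is not a matrix
dim(sample_ratings)            # 3 rows, 4 columns
sapply(sample_ratings, class)  # class of each column
str(sample_ratings)            # compact overview of the structure
```

Unlike a matrix, a data frame can hold character and numeric columns side by side, which is why it suits this mix of IDs, dates and ratings.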
Before we move to the next step, it’s a good idea to be familiar with the arithmetic and logical operators in R.
Arithmetic Operators
Operator   Description        Example
-          Subtraction        5 - 1 is 4
+          Addition           5 + 1 is 6
*          Multiplication     5 * 3 is 15
/          Division           10 / 2 is 5
^ or **    Exponentiation     2^5 is 32 (2*2*2*2*2)
x %% y     Modulus            5 %% 2 is 1
x %/% y    Integer division   5 %/% 2 is 2
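The arithmetic operators can be tried directly at the console. The short sketch below runs each one and then applies two of them to whole vectors, using the first three values of the r1 and r2 ratings; pair_sum and pair_mean are names chosen only for this example.

```r
5 - 1     # 4
5 + 1     # 6
5 * 3     # 15
10 / 2    # 5
2 ^ 5     # 32
2 ** 5    # 32, ** is treated the same as ^
5 %% 2    # 1, the remainder
5 %/% 2   # 2, the whole-number part of the division

# Arithmetic is vectorized: it works element by element on vectors
r1 <- c(5, 5, 3)
r2 <- c(4, 3, 5)
pair_sum  <- r1 + r2        # 9 8 8
pair_mean <- (r1 + r2) / 2  # 4.5 4.0 4.0
```

Vectorized arithmetic is what allows a whole column of totals to be computed in a single expression.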
Logical Operators
Operator    Description                 Example
<           less than                   5 < 10 is TRUE
<=          less than or equal to       5 <= 5 is TRUE
>           greater than                10 > 5 is TRUE
>=          greater than or equal to    10 >= 10 is TRUE
==          exactly equal to            10 == 10 is TRUE
!=          not equal to                10 != 5 is TRUE
!x          not x                       x <- TRUE; !x is FALSE
x | y       x or y                      x <- TRUE; y <- FALSE; x | y is TRUE
x & y       x and y                     x <- TRUE; y <- FALSE; x & y is FALSE
isTRUE(x)   tests whether x is TRUE     x <- TRUE; isTRUE(x) is TRUE
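Comparison and logical operators are also vectorized, which is what makes them useful for picking rows out of a data frame. As a small sketch, the vector below holds the total ratings of the first five employees (21, 20, 18, 20, 12); the variable names are assumptions for illustration.

```r
total <- c(21, 20, 18, 20, 12)

eligible   <- total >= 19                  # TRUE TRUE FALSE TRUE FALSE
middle     <- total >= 14 & total <= 18    # FALSE FALSE TRUE FALSE FALSE
either_end <- total <= 13 | total >= 19    # TRUE TRUE FALSE TRUE TRUE
not_elig   <- !eligible                    # FALSE FALSE TRUE FALSE TRUE

# A logical vector inside [ ] keeps only the TRUE positions
total[eligible]                            # 21 20 20

# isTRUE() expects a single value, so combine with all() first
isTRUE(all(total <= 25))                   # TRUE
```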
Learning the above arithmetic and logical operators will help a great deal in solving the next few tasks for this R tutorial.
With the above employee data and operators, we can now add a column to the empratings data.frame that sums each employee’s five ratings.
Input:
empratings$total <- (r1 + r2 + r3 + r4 + r5)
empratings
As you will see below, the total column is added after the five rating columns.
Output:
   empid      date gender age r1 r2 r3 r4 r5 total
1  A12OI 1/22/2018      F  54  5  4  5  4  3    21
2  C90R2 1/23/2018      M  37  5  3  4  3  5    20
3  LOI98 1/24/2018      F  23  3  5  4  3  3    18
4  M908Y 1/25/2018      M  43  4  3  5  3  5    20
5  J908R 1/26/2018      F  33  2  2  3  2  3    12
6  BNL98 1/22/2018      F  42  5  3  3  4  5    20
7  EW09P 1/23/2018      M  58  2  3  3  2  2    12
8  QA214 1/24/2018      F  31  4  4  5  3  1    17
9  JU87Y 1/25/2018      M  22  3  4  4  4  3    18
10 CO43R 1/26/2018      M  38  5  4  3  5  4    21
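The sum above reads the free-standing vectors r1 to r5 from the workspace. One alternative, sketched here on the first three employees, is to compute the total from the data frame's own columns with rowSums(), so the result stays correct even if the rows have been reordered or filtered. The names sample_ratings and rating_cols are assumptions for illustration.

```r
# First three employees from the table above (hypothetical object name)
sample_ratings <- data.frame(
  empid = c("A12OI", "C90R2", "LOI98"),
  r1 = c(5, 5, 3), r2 = c(4, 3, 5), r3 = c(5, 4, 4),
  r4 = c(4, 3, 3), r5 = c(3, 5, 3),
  stringsAsFactors = FALSE
)

# Columns that hold the five ratings
rating_cols <- c("r1", "r2", "r3", "r4", "r5")

# rowSums() adds across the selected columns, one result per row;
# unname() drops the row names it attaches to the result
sample_ratings$total <- unname(rowSums(sample_ratings[, rating_cols]))
sample_ratings$total   # 21 20 18
```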
Before moving forward, let’s find each employee’s average rating. We can accomplish this by dividing the total of the ratings by the number of ratings, which is 5.
Input:
empratings$average <- (empratings$total / 5)
empratings
Output:
   empid      date gender age r1 r2 r3 r4 r5 total average
1  A12OI 1/22/2018      F  54  5  4  5  4  3    21     4.2
2  C90R2 1/23/2018      M  37  5  3  4  3  5    20     4.0
3  LOI98 1/24/2018      F  23  3  5  4  3  3    18     3.6
4  M908Y 1/25/2018      M  43  4  3  5  3  5    20     4.0
5  J908R 1/26/2018      F  33  2  2  3  2  3    12     2.4
6  BNL98 1/22/2018      F  42  5  3  3  4  5    20     4.0
7  EW09P 1/23/2018      M  58  2  3  3  2  2    12     2.4
8  QA214 1/24/2018      F  31  4  4  5  3  1    17     3.4
9  JU87Y 1/25/2018      M  22  3  4  4  4  3    18     3.6
10 CO43R 1/26/2018      M  38  5  4  3  5  4    21     4.2
As you can see above, the average column is now added to the empratings data frame for each employee. Could this have any impact on the employee promotion? Promotions will most likely be based on the total rating and not on the average.
Now that we have the total rating for each employee, let’s create an additional variable that sorts employees into the three performance categories.
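As a cross-check on the division by 5, the average can also be taken straight from the rating columns with rowMeans(). This is only a sketch on the first three employees, and sample_ratings and rating_cols are hypothetical names.

```r
# First three employees from the table above (hypothetical object name)
sample_ratings <- data.frame(
  empid = c("A12OI", "C90R2", "LOI98"),
  r1 = c(5, 5, 3), r2 = c(4, 3, 5), r3 = c(5, 4, 4),
  r4 = c(4, 3, 3), r5 = c(3, 5, 3),
  stringsAsFactors = FALSE
)
rating_cols <- c("r1", "r2", "r3", "r4", "r5")

# rowMeans() averages across the selected columns for each row
sample_ratings$average <- unname(rowMeans(sample_ratings[, rating_cols]))
sample_ratings$average   # 4.2 4.0 3.6
```

Because rowMeans() counts the columns itself, the divisor does not need to be updated by hand if a rating column is added or removed.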
Input:
empratings$performance[empratings$total <= 25 & empratings$total >= 19] <- "Promotion Eligible"
empratings$performance[empratings$total <= 18 & empratings$total >= 14] <- "Not Promotion Eligible"
empratings$performance[empratings$total <= 13] <- "Performance Probation"
empratings
Output:
   empid      date gender age r1 r2 r3 r4 r5 total average            performance
1  A12OI 1/22/2018      F  54  5  4  5  4  3    21     4.2     Promotion Eligible
2  C90R2 1/23/2018      M  37  5  3  4  3  5    20     4.0     Promotion Eligible
3  LOI98 1/24/2018      F  23  3  5  4  3  3    18     3.6 Not Promotion Eligible
4  M908Y 1/25/2018      M  43  4  3  5  3  5    20     4.0     Promotion Eligible
5  J908R 1/26/2018      F  33  2  2  3  2  3    12     2.4  Performance Probation
6  BNL98 1/22/2018      F  42  5  3  3  4  5    20     4.0     Promotion Eligible
7  EW09P 1/23/2018      M  58  2  3  3  2  2    12     2.4  Performance Probation
8  QA214 1/24/2018      F  31  4  4  5  3  1    17     3.4 Not Promotion Eligible
9  JU87Y 1/25/2018      M  22  3  4  4  4  3    18     3.6 Not Promotion Eligible
10 CO43R 1/26/2018      M  38  5  4  3  5  4    21     4.2     Promotion Eligible
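The three assignments above can also be expressed in one step with cut(), which bins a numeric vector into labeled ranges. This is an alternative sketch rather than the method used in this tutorial, shown on the totals of the first five employees. Note that cut() returns a factor instead of a character vector.

```r
total <- c(21, 20, 18, 20, 12)

# With the default right = TRUE the bins are (-Inf, 13], (13, 18], (18, 25],
# which for whole-number totals matches <= 13, 14 to 18 and 19 to 25
performance <- cut(
  total,
  breaks = c(-Inf, 13, 18, 25),
  labels = c("Performance Probation",
             "Not Promotion Eligible",
             "Promotion Eligible")
)

as.character(performance)
table(performance)   # number of employees in each category
```

Keeping the boundaries in a single breaks vector makes it harder to leave a gap or an overlap between categories than writing each range by hand.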
With the additional performance column, we can select observations by pulling only the employees that are Promotion Eligible, Not Promotion Eligible or on Performance Probation.
Below, one data frame is created for each of the three performance categories:
Promotion Eligible Employees
Input:
promotionEligible <- empratings[empratings$performance == "Promotion Eligible", ]
promotionEligible
Output:
   empid      date gender age r1 r2 r3 r4 r5 total average        performance
1  A12OI 1/22/2018      F  54  5  4  5  4  3    21     4.2 Promotion Eligible
2  C90R2 1/23/2018      M  37  5  3  4  3  5    20     4.0 Promotion Eligible
4  M908Y 1/25/2018      M  43  4  3  5  3  5    20     4.0 Promotion Eligible
6  BNL98 1/22/2018      F  42  5  3  3  4  5    20     4.0 Promotion Eligible
10 CO43R 1/26/2018      M  38  5  4  3  5  4    21     4.2 Promotion Eligible
Not Promotion Eligible Employees
Input:
notPromotionEligible <- empratings[empratings$performance == "Not Promotion Eligible", ]
notPromotionEligible
Output:
  empid      date gender age r1 r2 r3 r4 r5 total average            performance
3 LOI98 1/24/2018      F  23  3  5  4  3  3    18     3.6 Not Promotion Eligible
8 QA214 1/24/2018      F  31  4  4  5  3  1    17     3.4 Not Promotion Eligible
9 JU87Y 1/25/2018      M  22  3  4  4  4  3    18     3.6 Not Promotion Eligible
Performance Probation Employees
Input:
performanceProbation <- empratings[empratings$performance == "Performance Probation", ]
performanceProbation
Output:
  empid      date gender age r1 r2 r3 r4 r5 total average           performance
5 J908R 1/26/2018      F  33  2  2  3  2  3    12     2.4 Performance Probation
7 EW09P 1/23/2018      M  58  2  3  3  2  2    12     2.4 Performance Probation
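When every category needs its own data frame, split() can produce all of them in one call instead of filtering three times. The sketch below uses four of the employees with only three columns; sample_ratings and by_performance are assumed names for illustration.

```r
# Four employees from the results above (hypothetical object name)
sample_ratings <- data.frame(
  empid = c("A12OI", "LOI98", "J908R", "EW09P"),
  total = c(21, 18, 12, 12),
  performance = c("Promotion Eligible", "Not Promotion Eligible",
                  "Performance Probation", "Performance Probation"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per category
by_performance <- split(sample_ratings, sample_ratings$performance)

names(by_performance)
by_performance[["Performance Probation"]]   # rows for J908R and EW09P

# table() gives a quick head count per category
table(sample_ratings$performance)
```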
With the data above, a manager will be able to meet with each employee and discuss what actions need to be taken to increase employee performance throughout the new year.
With the newly added data for each employee, the manager may want only the data that’s necessary for the promotion decision. In that case, the only columns needed are empid, date (to ensure it’s the correct year), total and performance. We can pull just these columns by using subset().
Promotion Eligible subset()
Input:
promotionEligible <- subset(empratings, performance == "Promotion Eligible", select = c(empid, date, total, performance))
promotionEligible
Output:
   empid      date total        performance
1  A12OI 1/22/2018    21 Promotion Eligible
2  C90R2 1/23/2018    20 Promotion Eligible
4  M908Y 1/25/2018    20 Promotion Eligible
6  BNL98 1/22/2018    20 Promotion Eligible
10 CO43R 1/26/2018    21 Promotion Eligible
subset() is a great function for pulling only the data that’s necessary and excluding filler columns that have no bearing on the outcome.
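Checking that each rating belongs to the correct year is easier once the date column is converted from text to a Date. The following is a minimal sketch on the first three employees, assuming the dates are written as month/day/year and that 2018 is the expected review year; sample_ratings, review_date and review_year are hypothetical names.

```r
# First three employees from the table above (hypothetical object name)
sample_ratings <- data.frame(
  empid = c("A12OI", "C90R2", "LOI98"),
  date  = c("1/22/2018", "1/23/2018", "1/24/2018"),
  stringsAsFactors = FALSE
)

# Parse the text as month/day/four-digit year
review_date <- as.Date(sample_ratings$date, format = "%m/%d/%Y")

# Extract the year as text, e.g. "2018"
review_year <- format(review_date, "%Y")

# Rows whose year differs from the expected one (none in this sample)
wrong_year <- sample_ratings[review_year != "2018", ]
nrow(wrong_year)             # 0

all(review_year == "2018")   # TRUE
```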