https://gexijin.github.io/learnR/index.html

* Preface
* 1 Step into R programming - the iris flower dataset
  + 1.1 Getting started
  + 1.2 Data frames have rows and columns: the Iris flower dataset
  + 1.3 Analyzing one set of numbers
  + 1.4 Analyzing a column of categorical values
  + 1.5 Analyzing the relationship between two sets of numbers
  + 1.6 Testing the differences between two groups
  + 1.7 Testing the difference among multiple groups (ANOVA)
* 2 Visualizing the iris flower data set
  + 2.1 Basic concepts of R graphics
  + 2.2 The ggplot2 package makes plotting intuitive
  + 2.3 Scatter plot matrix
  + 2.4 Star and segment diagrams
  + 2.5 Parallel coordinate plot
  + 2.6 Bar plot with error bar
  + 2.7 Box plot and death to the dynamite plots
  + 2.8 Combining plots
  + 2.9 Hierarchical clustering and heat map
  + 2.10 Projecting high-dimensional data with principal component analysis (PCA)
  + 2.11 Classification: Predicting the odds of binary outcomes
* 3 Data structures
  + 3.1 Basic concepts
    o 3.1.1 Expressions
    o 3.1.2 Logical Values
    o 3.1.3 Variables
    o 3.1.4 Functions
    o 3.1.5 Looking for help and example code
  + 3.2 Data structures
    o 3.2.1 Vectors
    o 3.2.2 Matrices
    o 3.2.3 Data Frames
    o 3.2.4 Strings and string vectors
    o 3.2.5 Lists
* 4 Importing data and managing files
  + 4.1 Enter data manually
  + 4.2 Project-oriented workflow
    o 4.2.1 Create a project in a new folder
    o 4.2.2 Create a script file and comment (!)
    o 4.2.3 Copy data files to the new directory
    o 4.2.4 Import data files
    o 4.2.5 Check and convert data types
    o 4.2.6 Close a project when you are done
  + 4.3 Reading files directly using read.table
  + 4.4 General procedure to read data into R
  + 4.5 Data manipulation in a data frame
  + 4.6 Data transformation using dplyr
* 5 The heart attack data set (I)
  + 5.1 Begin your analysis by examining each column separately
  + 5.2 Possible correlation between two numeric columns?
  + 5.3 Associations between categorical variables?
  + 5.4 Associations between a categorical and a numeric variable?
  + 5.5 Associations between multiple columns?
* 6 The heart attack dataset (II)
  + 6.1 Scatter plot in ggplot2
  + 6.2 Histograms and density plots
  + 6.3 Box plots and Violin plots
  + 6.4 Bar plot with error bars
  + 6.5 Statistical models are easy; interpretations and verifications are not!
* 7 Advanced topics
  + 7.1 Introduction to R Markdown
  + 7.2 Tidyverse
  + 7.3 Interactive plots made easy with Plotly
  + 7.4 Shiny Apps
    o 7.4.1 Install the Shiny package by typing this in the console
    o 7.4.2 Creating a Shiny web app is a piece of cake
    o 7.4.3 Let's play!
    o 7.4.4 Publish your app
  + 7.5 Define your own function
* 8 The state dataset
  + 8.1 Reading in and manipulating data
  + 8.2 Visualizing data
  + 8.3 Analyzing the relationship among variables
  + 8.4 The whole picture of the data set
  + 8.5 Linear model analysis
  + 8.6 Conclusion
* 9 The game sales dataset
  + 9.1 Visualization of categorical variables
  + 9.2 Correlation among numeric variables
  + 9.3 Analysis of score and count
  + 9.4 Analysis of sales
    o 9.4.1 By Year.Release
    o 9.4.2 By Region
    o 9.4.3 By Rating
    o 9.4.4 By Genre
    o 9.4.5 By Score
    o 9.4.6 By Rating & Genre & Critic score
    o 9.4.7 By Platform
  + 9.5 Effect of platform type on principal components
  + 9.6 Models for global sales
  + 9.7 Conclusion
* 10 The messy salary data
  + 10.1 Read in and clean data
  + 10.2 Information about the variables
    o 10.2.1 Income and age
    o 10.2.2 Ethnic group
    o 10.2.3 Sex
    o 10.2.4 Union description
  + 10.3 Analysis of gross income
    o 10.3.1 Age
    o 10.3.2 Sex
    o 10.3.3 Ethnic Group
  + 10.4 Analysis of gross income type
    o 10.4.1 Gross Type
    o 10.4.2 Ethnic and Sex and age
  + 10.5 LM and ANOVA analysis
  + 10.6 Conclusion

Learn R through examples

Xijin Ge, Jianli Qi and Rong Fan

2020-05-26

Preface

Aimed at total beginners, this book is written based on the philosophy that people learn faster when they are shown examples and case studies.
Instead of explaining the rules, the book largely centers on the analysis of several datasets from the very beginning. So this is an alternative to traditional, more rigorous textbooks on R programming. We start with small and clean datasets and gradually transition into big, messy ones. With each dataset, we hope to tell a story through the analysis. We invite you, our courageous reader, to take on this journey with us. Motivated readers, such as biologists, could easily work their way through this book and learn by themselves. I encourage you to type in the example code, see the outputs, and then work on the challenges and exercises.

This book originally started as materials for 2-hour hands-on workshops intended to give a quick introduction and demonstration for students and researchers who are totally new to R. The workshop has been given many times to different audiences, ranging from high-school students to mathematicians. For a 2-hour session, I have to keep it gentle, interactive, and fun, sometimes at the expense of rigor. Instead of explaining all the rules, grammar, and syntax, I found it easier to focus on one dataset and walk the audience through some of the analyses possible with R. This material later evolved into a one-credit online class and then a three-credit class. We stick with the unconventional approach of focusing on datasets and examples.

Many students have contributed to this material. Notably, Quazi Irfan, who worked as a teaching assistant, fixed many errors and gave constructive feedback. In the fall of 2018, a group of highly motivated students in the STAT 442 Exploratory Data Analysis course worked on some of the datasets presented here. They are Samuel Ivanecky, Kory Heier, Audrey Bunge, Jacie McDonald, Shae Olson, Nathan Thirsten, and Alex Wieseler. Some of the plots in this book are inspired by them.

Any comments and suggestions to make this book better are welcome, including reports of typos, errors, and organizational issues.
The best place to reach out is through the GitHub issues page. If you would rather not create yet another account, you can email us at Xijin.Ge@sdstate.edu.

* Chapter 1 Step into R programming
* Chapter 2 Visualizing data set
* Chapter 3 Data structures
* Chapter 4 Data importing
* Chapter 5 Heart attack data set I
* Chapter 6 Heart attack data set II
* Chapter 7 Advanced topics
* Chapter 8 State data set
* Chapter 9 Game sale data set
* Chapter 10 Employee salary data set