Course category: Diploma Courses
Course Duration: 4
OBJECTIVE: -
This course teaches how to program in R and how to use R for effective data analysis. Students will learn how to install and configure the software needed for an R-based analytics programming environment and will gain basic analytic skills in this high-level analytical language. The course covers fundamental knowledge of R programming. Popular R packages for data science will be introduced as working examples.
COURSE DURATION:-
Course Details
Overview: operators, data structures, workspace. Installation of RStudio, comparison of R vs Python,
things to know before starting to learn R, introduction to descriptive and inferential statistics, basic
skills required for R. Real-life examples of R usage. Applications of R in commercial industry. Social
media platforms and R.
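As a rough illustration of the operators, data structures and workspace topics listed above, a minimal base R sketch might look like the following. All object names (v, m, l, df, f) are made up for this example and are not part of the course material.

```r
# Arithmetic, relational and logical operators
7 %/% 2                 # integer division: 3
7 %% 2                  # remainder: 1
v <- c(2, 4, 6)         # numeric vector
v * 2                   # operations are vectorised: 4 8 12
v > 3 & v < 6           # element-wise logical AND: FALSE TRUE FALSE

# Core data structures (illustrative names only)
m  <- matrix(1:6, nrow = 2)                          # 2 x 3 matrix
l  <- list(label = "demo", values = v)               # list with mixed contents
df <- data.frame(id = 1:3, score = c(80, 92, 75))    # data frame
f  <- factor(c("low", "high", "low"))                # categorical variable

str(df)                 # compact view of an object's structure

# Workspace housekeeping
ls()                    # objects currently in the workspace
rm(m)                   # remove one object
getwd()                 # current working directory
```

Running these lines one at a time in the RStudio console is usually enough to see how each structure prints.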
Tables, charts and plots. Visualizing measures of central tendency, variation, and shape. Box plots,
Pareto diagrams. How to find the mean, median, standard deviation and quantiles of a set of
observations? Students may experiment with real as well as artificial data sets. Data uploading
shortcuts. Data preparation, data filtering and cleaning. Making data frames and other subsets, and renaming. Package installation. Usage of R Commander (Rcmdr). Histograms and Q-Q plots (different colour palettes,
e.g. topo.colors, heat.colors, etc.).
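The descriptive statistics and plots named in this unit can be sketched with base R alone. The example below assumes the built-in mtcars data set purely for illustration; the subset name cars6 and the new column names are hypothetical.

```r
# Built-in example data set shipped with R (an assumption for this sketch)
x <- mtcars$mpg

# Measures of central tendency and variation
mean(x)
median(x)
sd(x)
quantile(x, probs = c(0.25, 0.5, 0.75))

# Filtering rows, selecting columns and renaming them
cars6 <- subset(mtcars, cyl == 6, select = c(mpg, hp))
names(cars6) <- c("miles_per_gallon", "horsepower")

# Histogram, box plot and normal Q-Q plot with base R colour palettes
hist(x, col = topo.colors(5), main = "Fuel economy", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, col = heat.colors(3),
        xlab = "Cylinders", ylab = "Miles per gallon")
qqnorm(x)
qqline(x)
```

Packages such as Rcmdr are installed once with install.packages("Rcmdr") and then loaded with library(Rcmdr); that step is left out here so the sketch runs without network access.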
Set operations, simulation of various properties. Bayes' rule. Generate and visualize discrete and
continuous distributions using the statistical environment. Demonstration of the CDF and PDF of the
uniform, normal, binomial and Poisson distributions. Students are expected to generate artificial data using the
chosen statistical environment and explore various distributions and their properties. Various parameter
changes may be studied. Study of the binomial distribution. Plots of density and distribution functions.
Normal approximation to the binomial distribution. Central limit theorem. How to generate random
numbers. Study how to select a random sample with replacement from normal and uniform
distributions. Students can use the built-in functions to explore random sample selection. How to
calculate the correlation between two variables? How to make scatter plots? Use the scatter plot to
investigate the relationship between two variables. How to calculate and plot the residuals.
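A possible starting point for the distribution, sampling and correlation exercises in this unit is sketched below in base R. The seed, sample sizes and parameter values are arbitrary choices made for this illustration, not values prescribed by the course.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Binomial PMF with its normal approximation overlaid
n <- 50; p <- 0.4                      # illustrative parameters
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)
plot(k, pmf, type = "h", main = "Binomial(50, 0.4) and normal approximation")
curve(dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p))), add = TRUE, col = "red")

# Distribution functions (CDF) use the p* prefix, densities the d* prefix
punif(0.3)             # P(U <= 0.3) for U ~ Uniform(0, 1)
pnorm(1.96)            # P(Z <= 1.96) for a standard normal Z
ppois(2, lambda = 3)   # P(N <= 2) for N ~ Poisson(3)

# Random numbers and a random sample drawn with replacement
x <- rnorm(100, mean = 10, sd = 2)
resample <- sample(x, size = 30, replace = TRUE)

# Central limit theorem: means of uniform samples look roughly normal
means <- replicate(2000, mean(runif(30)))
hist(means, breaks = 30, main = "Means of 30 uniform draws")

# Correlation, scatter plot, fitted line and residual plot
y <- 3 + 0.5 * x + rnorm(100)          # artificial linear relationship
cor(x, y)
plot(x, y)
fit <- lm(y ~ x)
abline(fit)
plot(x, resid(fit), ylab = "Residual")
abline(h = 0, lty = 2)
```

Changing n, p or the sample size of 30 and re-running the script is one way to study how the parameter changes mentioned above affect the shapes.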
How to compute confidence intervals for the mean when the standard deviation is known. How to
perform tests of hypotheses about the mean when the variance is known. How to compute the
p-value? Explore the connection between the critical region, the test statistic, and the p-value.
Normality testing as Serial 4. Parametric and non-parametric tests. K-S test.
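The link between the confidence interval, the critical region and the p-value can be worked through by hand, since base R has no built-in z-test function. The sketch below assumes artificial data, a known standard deviation sigma, a hypothesised mean mu0 and a 5% significance level; all of these are illustrative choices.

```r
set.seed(2)                   # arbitrary seed for reproducibility
sigma <- 2                    # population standard deviation, assumed known
x <- rnorm(40, mean = 10.5, sd = sigma)   # artificial sample
n <- length(x)
xbar <- mean(x)
alpha <- 0.05

# Two-sided confidence interval for the mean with known sigma
z_crit <- qnorm(1 - alpha / 2)
ci <- xbar + c(-1, 1) * z_crit * sigma / sqrt(n)

# Two-sided z-test of H0: mu = mu0
mu0 <- 10
z <- (xbar - mu0) / (sigma / sqrt(n))
p_value <- 2 * pnorm(-abs(z))

# Both decision rules give the same answer
reject_by_region <- abs(z) > z_crit
reject_by_p <- p_value < alpha

# Normality checks: Kolmogorov-Smirnov against a fully specified normal,
# and the Shapiro-Wilk test
ks.test(x, "pnorm", mean = mu0, sd = sigma)
shapiro.test(x)
```

The hypothesised mean lies outside the interval ci exactly when the test rejects at level alpha, which is the connection the unit asks students to explore.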
How to perform a significance test for the mean of a population with unknown standard
deviation? Compare population means from two normal distributions with unknown variance. Tests
of hypotheses for one proportion, tests of hypotheses for comparing two proportions. t-test, z-test,
ANOVA (parametric), Wilcoxon signed-rank test, Kruskal-Wallis test, Mann-Whitney test.
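Most of the tests named in this unit are available in R's built-in stats package. The sketch below runs each one on artificial data; the group sizes, means and proportion counts are invented for illustration only.

```r
set.seed(3)                           # arbitrary seed for reproducibility
a <- rnorm(25, mean = 5.0, sd = 1)    # artificial sample A
b <- rnorm(25, mean = 5.8, sd = 1.5)  # artificial sample B

# Parametric tests with unknown variance
t.test(a, mu = 5)                     # one-sample t-test
res <- t.test(a, b)                   # two-sample (Welch) t-test
res$p.value

# Tests for proportions (made-up counts)
prop.test(x = 18, n = 50, p = 0.5)            # one proportion
prop.test(x = c(18, 30), n = c(50, 50))       # two proportions

# Non-parametric counterparts
wilcox.test(a, mu = 5)                # Wilcoxon signed-rank test
wilcox.test(a, b)                     # Mann-Whitney (Wilcoxon rank-sum) test

# Three groups: one-way ANOVA and Kruskal-Wallis
d <- data.frame(
  value = c(a, b, rnorm(25, mean = 6.5)),
  group = factor(rep(c("A", "B", "C"), each = 25))
)
summary(aov(value ~ group, data = d))
kw <- kruskal.test(value ~ group, data = d)
kw
```

Each call returns an object whose p.value, statistic and parameter components can be inspected directly, as shown for res above.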
Introduction to data science and data mining. Statistical learning vs machine learning. Big data
predictive analysis, regression, classification, clustering. Case studies: data analysis, mining stream
data, social networks. ggplot2 graphics.
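For a first taste of the regression, clustering and ggplot2 topics, a small sketch on R's built-in mtcars and iris data sets could look like this. The choice of data sets, the number of clusters and the seed are assumptions made for the example, and the ggplot2 part only runs if that package is already installed.

```r
# Simple linear regression: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# k-means clustering on the four iris measurements
set.seed(4)                                   # arbitrary seed
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
table(cluster = km$cluster, species = iris$Species)

# ggplot2 scatter plot with a fitted regression line (skipped if not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x)
  print(p)
}
```

The cross-table of clusters against species gives a quick, informal check of how well the unsupervised clusters line up with the known labels.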