Introducing xda: R package for exploratory data analysis

Posted on June 17, 2016 (updated August 10, 2016) by ujjwalkarn

This R package contains several tools for initial exploratory analysis of any input dataset. It includes custom functions for plotting the data and for univariate, bivariate and multivariate analysis, which is the first step of any predictive modeling pipeline. Use it to get a good sense of a dataset before you start building predictive models. You can install the package from GitHub.

The package currently includes the following functions:

numSummary(mydata) function automatically detects all numeric columns in the dataframe mydata and provides their summary statistics
charSummary(mydata) function automatically detects all character columns in the dataframe mydata and provides their summary statistics
Plot(mydata, dep.var) plots all independent variables in the dataframe mydata against the dependent variable specified by the dep.var parameter
removeSpecial(mydata, vec) replaces all special characters (specified by vector vec) in the dataframe mydata with NA
bivariate(mydata, dep.var, indep.var) performs bivariate analysis between dependent variable dep.var and independent variable indep.var in the dataframe mydata
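To give a feel for what "automatically detects all numeric columns and provides their summary statistics" involves, here is a minimal base-R sketch. It is not xda's implementation: the helper name `num_summary_sketch` and the particular statistics it reports are assumptions chosen for illustration.

```r
# Hypothetical helper (not part of xda): summarise every numeric column.
num_summary_sketch <- function(df) {
  # "Automatic detection": keep only the columns that are numeric.
  is_num <- vapply(df, is.numeric, logical(1))
  num_cols <- df[, is_num, drop = FALSE]

  # Compute the same set of statistics for each numeric column.
  stats <- lapply(num_cols, function(x) {
    c(n        = sum(!is.na(x)),
      mean     = mean(x, na.rm = TRUE),
      sd       = sd(x, na.rm = TRUE),
      min      = min(x, na.rm = TRUE),
      max      = max(x, na.rm = TRUE),
      miss     = sum(is.na(x)),
      miss_pct = 100 * mean(is.na(x)))
  })

  # One row per variable, one column per statistic.
  as.data.frame(do.call(rbind, stats))
}

num_summary_sketch(iris)
```

On the built-in iris data this returns one row for each of the four measurement columns and skips Species, because a factor is not numeric.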

Installation

There are two ways to install xda:

Using devtools:

The devtools package needs to be installed first. To install devtools, please follow the instructions here. Then use the following commands to install `xda`:

library(devtools)
# The repository must be passed as a quoted "user/repo" string
install_github("ujjwalkarn/xda")

Alternatively, you can install xda with the githubinstall package:

install.packages("githubinstall")
library(githubinstall)
githubinstall("xda")

Usage

Update: See the usage instructions and the latest updates to the package here. The package is under active development and more functionality will be added soon. I will also submit it to CRAN in the coming days. Pull requests to add more functions are welcome!
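To make the removeSpecial description above more concrete, here is a rough base-R sketch of replacing marker strings with NA. The helper `replace_special_sketch` is hypothetical and is not xda's code. The sketch also shows a plain R behaviour worth keeping in mind: writing a string into a numeric column coerces the whole column to character, so a numeric-only summary will skip that column until it is converted back.

```r
# Hypothetical helper (not xda's removeSpecial): turn marker strings into NA.
replace_special_sketch <- function(df, markers) {
  df[] <- lapply(df, function(col) {
    # Compare as text so numeric, character and factor columns are all handled.
    col[as.character(col) %in% markers] <- NA
    col
  })
  df
}

demo <- iris
demo[1, 2] <- "?"   # writing a string coerces Sepal.Width to character
demo[2, 2] <- "@"
cleaned <- replace_special_sketch(demo, c("?", "@"))

class(cleaned$Sepal.Width)       # the column is still character here
sum(is.na(cleaned$Sepal.Width))  # counts the two replaced markers

# Convert back to numeric before running a numeric-only summary.
cleaned$Sepal.Width <- as.numeric(cleaned$Sepal.Width)
```

After the final conversion the column is numeric again, and its two NA values show up in any missing-value count.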

15 thoughts on “Introducing xda: R package for exploratory data analysis”

Douglas Skinner says:

June 17, 2016 at 8:06 pm

I’m a little wary about installing packages from GitHub. Partly because I don’t know much about it. Have only used packages from CRAN. Any thoughts?

1. ujjwalkarn says:
  
  June 17, 2016 at 8:38 pm
  
  Currently this package is only available on GitHub. I have updated the blog post above to include another way of installing it. Do try it and let me know! You can read more about installing packages from GitHub here: http://goo.gl/eBJegn.
  
ok stupid (@ohkay_stupid) says:

June 21, 2016 at 1:12 am

Great package! I’ve been trying to write my own version of numSummary but never got it working as well as I’d like. Kudos for this one!

RAY says:

June 21, 2016 at 3:50 am

Hi again Ujjwal!

I just downloaded the latest version into RStudio (still version 0.1, as before?).

xda is working great, but I tried your example (testing for “missing values”, i.e. NAs?):

iris9 <- iris
iris9[1, 2] <- "?"
iris9[2, 2] <- "@"
iris9[3, 2] <- "???"
iris9 <- removeSpecial(iris9, c("@", "???"))
head(iris9)

It returns:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1        <NA>          1.4         0.2
2          4.9        <NA>          1.4         0.2
3          4.7        <NA>          1.3         0.2
4          4.6         3.1          1.5         0.2

Ok!
But when I run:
numSummary(iris9)

the “miss” and “miss%” columns are still zero!

Shouldn’t these two columns be non-zero? (We now have NAs in 3 rows!)

What counts as a “missing value”? Can you give a simple, complete example of numSummary() where the “miss” and “miss%” columns are not zero?

Thanks again, Ujjwal!
Hope you can answer my question.

RAY
SF

1. ujjwalkarn says:
  
  June 22, 2016 at 10:11 am
  
  Seems like a bug, will fix it as soon as possible. Thanks again for flagging!
  