Introducing xda: R package for exploratory data analysis

This R package contains several tools for initial exploratory analysis of any input dataset. It includes custom functions for plotting the data and for univariate, bivariate and multivariate analysis, which is usually the first step of any predictive modeling pipeline. Use it to get a good sense of a dataset before building predictive models. You can install the package from GitHub.

The functions currently included in the package are listed below:

  • numSummary(mydata) function automatically detects all numeric columns in the dataframe mydata and provides their summary statistics
  • charSummary(mydata) function automatically detects all character columns in the dataframe mydata and provides their summary statistics
  • Plot(mydata, dep.var) plots all independent variables in the dataframe mydata against the dependent variable specified by the dep.var parameter
  • removeSpecial(mydata, vec) replaces all special characters (specified by vector vec) in the dataframe mydata with NA
  • bivariate(mydata, dep.var, indep.var) performs bivariate analysis between dependent variable dep.var and independent variable indep.var in the dataframe mydata
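
To make the idea behind the summary and cleaning functions concrete, here is a minimal base-R sketch. The helpers num_summary_sketch and replace_special_sketch are hypothetical names invented for this illustration; they are not part of xda and are not claimed to match its implementation or output columns. The sketch also shows a base-R behaviour worth knowing: writing a string into a numeric column turns the whole column into character, so a numeric-only summary skips it until it is converted back.

```r
# Hypothetical helper (not xda code): summarise every numeric column,
# including a count and percentage of missing values.
num_summary_sketch <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  rows <- lapply(names(df)[is_num], function(col) {
    x <- df[[col]]
    data.frame(
      n        = sum(!is.na(x)),
      mean     = mean(x, na.rm = TRUE),
      sd       = sd(x, na.rm = TRUE),
      min      = min(x, na.rm = TRUE),
      max      = max(x, na.rm = TRUE),
      miss     = sum(is.na(x)),
      miss_pct = 100 * mean(is.na(x)),
      row.names = col
    )
  })
  do.call(rbind, rows)
}

# Hypothetical helper (not xda code): replace every value listed in `vec`
# with NA, column by column.
replace_special_sketch <- function(df, vec) {
  df[] <- lapply(df, function(x) {
    x[x %in% vec] <- NA
    x
  })
  df
}

# A numeric column with one NA shows up in the missing-value counts.
demo <- iris
demo[2, "Sepal.Length"] <- NA
s <- num_summary_sketch(demo)
print(s)

# Writing a string into a numeric column coerces it to character.
demo2 <- iris
demo2[2, 2] <- "@"
cleaned <- replace_special_sketch(demo2, "@")
s_before <- num_summary_sketch(cleaned)   # Sepal.Width is character, so it is skipped

# Converting the column back to numeric makes the NA visible again.
cleaned$Sepal.Width <- as.numeric(cleaned$Sepal.Width)
s_after <- num_summary_sketch(cleaned)
print(s_after)
```

If a column you expect in a numeric summary is absent, checking its type with str() or class() is a quick first step.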

Installation

There are two ways to install xda:

  • Using devtools:

The devtools package needs to be installed first. To install devtools, please follow the instructions here. Then use the following commands to install `xda`:

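library(devtools)
# The repository must be passed as a quoted "user/repo" string
install_github("ujjwalkarn/xda")

  • Alternatively, you can install xda with the githubinstall package:

# Package names must be quoted strings
install.packages("githubinstall")
library(githubinstall)
githubinstall("xda")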

Read more about githubinstall here.

Usage

Update: See usage instructions and the latest updates to the package here. The package is under active development and more functionality will be added soon. I will also add it to CRAN in the coming days. Pull requests that add more functions are welcome!
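
As a rough illustration of how the functions listed earlier might be called, the sketch below runs them on the built-in iris dataset. It assumes xda is installed and that the signatures match the ones given in this post; the choice of iris and of the specific columns is only for illustration. The calls are guarded so that the script still runs when the package is missing.

```r
# Run the xda calls only if the package is available.
have_xda <- requireNamespace("xda", quietly = TRUE)

if (have_xda) {
  library(xda)

  # Summary statistics for numeric and character columns
  print(numSummary(iris))
  print(charSummary(iris))

  # Bivariate analysis: dependent variable first, then independent variable
  print(bivariate(iris, "Species", "Sepal.Length"))

  # Plot every other variable against the chosen dependent variable
  Plot(iris, "Petal.Length")
} else {
  message("xda is not installed; see the Installation section.")
}
```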


15 thoughts on “Introducing xda: R package for exploratory data analysis”

  1. Hi again Ujjwal!

    Just d/l the latest version to my Rstudio. (still the same version 0.1 as before?)

    xda working great,
    but I tried your example (testing for “missing values” or NAs?):

    iris9 <- iris;
    iris9[1,2]<-"?"
    iris9[2,2]<-"@"
    iris9[3,2]<-"???"
    iris9<-removeSpecial(iris9,c("@","???"))
    head(iris9)

    It returns:
      Sepal.Length Sepal.Width Petal.Length Petal.Width
    1          5.1        <NA>          1.4         0.2
    2          4.9        <NA>          1.4         0.2
    3          4.7        <NA>          1.3         0.2
    4          4.6         3.1          1.5         0.2

    Ok!
    But when I run:
    numSummary(iris9)

    the “miss” and “miss%” columns
    are still zero…0!

    Shouldn’t these 2 column values
    be different from zero? (we have NAs now in 3 rows!).

    What is a “missing value”,
    can you give a simple complete example of numSummary()
    where the
    “miss” and “miss%” columns are not zero?

    Thanks again, Ujjwal!
    Hope you can answer my question.

    RAY
    SF
