Implementation of the BlockingQueue Interface That Works Properly When Accessed via Multiple Threads

A BlockingQueue is typically used to have one thread produce objects, which another thread consumes. Here is a diagram that illustrates this principle:

A BlockingQueue with one thread putting into it, and another thread taking from it.

The producing thread will keep producing new objects and inserting them into the queue until the queue reaches some upper bound on what it can contain (its limit, in other words). If the blocking queue reaches its upper limit, the producing thread is blocked while trying to insert the new object. It remains blocked until a consuming thread takes an object out of the queue.

The consuming thread keeps taking objects out of the blocking queue, and processes them. If the consuming thread tries to take an object out of an empty queue, the consuming thread is blocked until a producing thread puts an object into the queue.
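For example, here is a minimal sketch of this hand-off using the standard ArrayBlockingQueue implementation from java.util.concurrent (the capacity of 10 and the item count of 100 are illustrative choices):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) throws InterruptedException {
        // A bounded queue: put() blocks once 10 elements are queued.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++)
                    queue.put(i);                 // blocks while the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++)
                    System.out.println(queue.take()); // blocks while the queue is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}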

BlockingQueue methods come in four forms, with different ways of handling operations that cannot be satisfied immediately, but may be satisfied at some point in the future: one throws an exception, the second returns a special value (either null or false, depending on the operation), the third blocks the current thread indefinitely until the operation can succeed, and the fourth blocks for only a given maximum time limit before giving up. These methods are summarized in the following table:

         Throws exception   Special value   Blocks    Times out
Insert   add(e)             offer(e)        put(e)    offer(e, time, unit)
Remove   remove()           poll()          take()    poll(time, unit)
Examine  element()          peek()          not applicable   not applicable
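The following minimal sketch contrasts the four insertion forms on a deliberately tiny (capacity 1) ArrayBlockingQueue; the class name and values are illustrative:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class InsertForms {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.add("first");       // succeeds; the queue is now full

        // add(e) throws IllegalStateException when the queue is full.
        try {
            queue.add("second");
        } catch (IllegalStateException e) {
            System.out.println("add() threw IllegalStateException");
        }

        // offer(e) returns false instead of throwing.
        System.out.println("offer() returned " + queue.offer("second"));

        // offer(e, time, unit) waits up to the timeout, then gives up.
        System.out.println("timed offer() returned "
                           + queue.offer("second", 100, TimeUnit.MILLISECONDS));

        // put(e) would block indefinitely here, so make room first.
        queue.take();
        queue.put("second");      // now succeeds immediately
    }
}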

The code below defines an implementation of the BlockingQueue interface that works correctly when accessed from multiple threads because its methods are properly synchronized:

class SimpleBlockingQueue<E> implements BlockingQueue<E> {
    /**
     * The queue consists of a List of E's.
     */
    final private List<E> mList = new ArrayList<>();

    /**
     * The maximum number of elements allowed in the queue.
     */
    private final int mCapacity;

    /**
     * Create a queue that holds at most capacity elements.
     */
    public SimpleBlockingQueue(int capacity) {
        mCapacity = capacity;
    }

    /**
     * Returns true if the queue has reached its capacity.
     */
    private boolean isFull() {
        return mList.size() == mCapacity;
    }

The put() method adds a new E to the end of the queue, blocking if necessary for space to become available:

    public void put(E e) throws InterruptedException {
        synchronized(this) {
            if (e == null)
                throw new NullPointerException();

            // Wait until the queue is not full.
            while (isFull()) {
                // System.out.println("BLOCKING ON PUT()");
                wait();
            }

            // Add e to the ArrayList.
            mList.add(e);
            
            // Notify that the queue may have changed state, e.g., "no
            // longer empty".
            notifyAll();
        }
    }

The take() method removes the E at the front of the queue, blocking until there’s something in the queue:

    public E take() throws InterruptedException {
        synchronized(this) {
            // Wait until the queue is not empty.
            while (mList.isEmpty()) {
                // System.out.println("BLOCKING ON TAKE()");
                wait();
            }

            final E e = mList.remove(0);
        
            // Notify that the queue may have changed state, e.g., "no
            // longer full".
            notifyAll();
            return e;
        }
    }
}
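A minimal usage sketch of the class above, assuming the remaining BlockingQueue methods are also implemented so the class compiles:

SimpleBlockingQueue<Integer> queue = new SimpleBlockingQueue<>(10);

// Producer: blocks in put() whenever the queue is full.
new Thread(() -> {
    try {
        for (int i = 0; i < 100; i++)
            queue.put(i);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}).start();

// Consumer: blocks in take() whenever the queue is empty.
new Thread(() -> {
    try {
        for (int i = 0; i < 100; i++)
            System.out.println(queue.take());
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}).start();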

 

Using Java Thread.join() as a simple barrier synchronizer [Android]

This example starts a Thread for each element in the List of input Strings and uses Thread.join() to wait for all the Threads to finish. The implementation doesn’t require any Java synchronization mechanisms other than those provided by Thread.

 private volatile List<String> mInput = null;

/**
* The array of words to find.
*/
final String[] mWordsToFind;

/**
* The List of worker Threads that were created.
*/
private List<Thread> mWorkerThreads;

Launch processing for each input string:

// This List holds the Threads so they can be joined when their processing is done.
mWorkerThreads = new LinkedList<>();

// Create and start a Thread for each element in mInput.
for (int i = 0; i < mInput.size(); ++i) {
    // Each Thread performs the processing designated by
    // the processInput() method of the worker Runnable.
    Thread t = new Thread(makeTask(i));

    // Add to the List of Threads to join.
    mWorkerThreads.add(t);

    // Start the Thread to process its input in the background.
    t.start();
}

Barrier synchronization: using Thread.join() to wait for the completion of all the other threads:

// Barrier synchronization.
for (Thread thread : mWorkerThreads)
    try {
        thread.join();
    } catch (InterruptedException e) {
        printDebugging("join() interrupted");
    }

If t is a Thread object whose thread is currently executing,

t.join();

causes the current thread to pause execution until t’s thread terminates.

Overloads of join allow the programmer to specify a waiting period.

Like sleep, join responds to an interrupt by exiting with an InterruptedException.
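Putting it all together, here is a minimal, self-contained sketch of the same barrier pattern; the printing Runnable stands in for the makeTask() worker and System.err for printDebugging(), neither of which is shown above:

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class JoinBarrier {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("alpha", "beta", "gamma");
        List<Thread> workers = new LinkedList<>();

        // Create and start one Thread per input element.
        for (String s : input) {
            Thread t = new Thread(() ->
                System.out.println(Thread.currentThread().getName()
                                   + " processed " + s));
            workers.add(t);
            t.start();
        }

        // Barrier synchronization: wait for every worker to finish.
        // (t.join(1000) would instead wait at most one second per thread.)
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                System.err.println("join() interrupted");
            }
        }

        System.out.println("All threads done.");
    }
}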

Based on https://github.com/douglascraigschmidt/LiveLessons/blob/master/ThreadJoinTest/src/ThreadJoinTest.java

Using The Device’s Accelerometer And Magnetometer To Orient A Compass [Android]

Initialization

private String TAG = "SensorCompass";

// Sensors & SensorManager
private Sensor accelerometer;
private Sensor magnetometer;
private SensorManager mSensorManager;

// Storage for Sensor readings
private float[] mGravity = null;
private float[] mGeomagnetic = null;

// Rotation around the Z axis
private double mRotationInDegrees;

OnCreate

protected void onCreate(Bundle savedInstanceState) {

		// ...

		// Get a reference to the SensorManager
		mSensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);

		// Get a reference to the accelerometer
		accelerometer = mSensorManager
				.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);

		// Get a reference to the magnetometer
		magnetometer = mSensorManager
				.getDefaultSensor(Sensor.TYPE_MAGNETIC_FIELD);

		// Exit unless both sensors are available
		if (null == accelerometer || null == magnetometer)
			finish();

	}

Register this class as a listener for accelerometer events and for magnetometer events:

@Override
	protected void onResume() {
		super.onResume();


		// Register for sensor updates

		mSensorManager.registerListener(this, accelerometer,
				SensorManager.SENSOR_DELAY_NORMAL);

		mSensorManager.registerListener(this, magnetometer,
				SensorManager.SENSOR_DELAY_NORMAL);
	}

The onPause method unregisters this class as a listener for all sensors:

@Override
protected void onPause() {
	super.onPause();

	// Unregister all sensors
	mSensorManager.unregisterListener(this);
}

The onSensorChanged method processes the incoming sensor events:

@Override
public void onSensorChanged(SensorEvent event) {

	// Acquire accelerometer event data
	
	if (event.sensor.getType() == Sensor.TYPE_ACCELEROMETER) {

		mGravity = new float[3];
		System.arraycopy(event.values, 0, mGravity, 0, 3);

	} 
	
	// Acquire magnetometer event data
	
	else if (event.sensor.getType() == Sensor.TYPE_MAGNETIC_FIELD) {

		mGeomagnetic = new float[3];
		System.arraycopy(event.values, 0, mGeomagnetic, 0, 3);

	}

	// If we have readings from both sensors then
	// use the readings to compute the device's orientation
	// and then update the display.

	if (mGravity != null && mGeomagnetic != null) {

		float rotationMatrix[] = new float[9];

		// Uses the accelerometer and magnetometer readings
		// to compute the device's rotation with respect to
		// a real world coordinate system

		boolean success = SensorManager.getRotationMatrix(rotationMatrix,
				null, mGravity, mGeomagnetic);

		if (success) {

			float orientationMatrix[] = new float[3];

			// Returns the device's orientation given
			// the rotationMatrix

			SensorManager.getOrientation(rotationMatrix, orientationMatrix);

			// Get the rotation, measured in radians, around the Z-axis
			// Note: This assumes the device is held flat and parallel
			// to the ground

			float rotationInRadians = orientationMatrix[0];

			// Convert from radians to degrees
			mRotationInDegrees = Math.toDegrees(rotationInRadians);

			// Request redraw
			mCompassArrow.invalidate();

			// Reset sensor event data arrays
			mGravity = mGeomagnetic = null;

		}
	}

}
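mCompassArrow above is presumably a custom View that draws the compass arrow. As a hypothetical sketch of how such a view might apply the computed rotation (the class name, the setAzimuth() setter and the drawing details are illustrative, not from the original post):

import android.content.Context;
import android.graphics.Canvas;
import android.util.AttributeSet;
import android.view.View;

public class CompassArrowView extends View {
    // Rotation to apply, in degrees; set from onSensorChanged().
    private double mRotationInDegrees;

    public CompassArrowView(Context context, AttributeSet attrs) {
        super(context, attrs);
    }

    public void setAzimuth(double degrees) {
        mRotationInDegrees = degrees;
        invalidate(); // request a redraw
    }

    @Override
    protected void onDraw(Canvas canvas) {
        super.onDraw(canvas);

        // Counter-rotate the canvas around its center so the arrow
        // keeps pointing north as the device rotates.
        canvas.save();
        canvas.rotate((float) -mRotationInDegrees,
                      getWidth() / 2f, getHeight() / 2f);
        // ... draw the arrow bitmap or path here ...
        canvas.restore();
    }
}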

 

Data Scientists, With Great Power Comes Great Responsibility

It is a good time to be a data scientist.

In 2012 the Harvard Business Review hailed the role of data scientist as “the sexiest job of the 21st century”. Data scientists are working at both start-ups and well-established companies like Twitter, Facebook, LinkedIn and Google, receiving a total average salary of $98k ($144k for US respondents only).

Data, and the insights it provides, gives the business the upper hand in better understanding its clients, prospects and overall operation. Until recently, it was not uncommon for million- and billion-dollar deals to be accepted or rejected based on intuition and instinct. Data scientists add value to the business by enabling an informed and timely decision-making process, using quantifiable, data-driven evidence and translating the data into actionable insights.

So you have a rewarding corporate day job; how about doing data science for social good?

You have been endowed with tremendous data science and leadership powers and the world needs them! Mission-driven organizations are tackling huge social issues like poverty, global warming and public health. Many have tons of unexplored data that could help them make a bigger impact, but don’t have the time or skills to leverage it. Data science has the power to move the needle on critical issues, but organizations need access to data superheroes like you to use it.

DataKind Blog 

There are a few programs that exist specifically to facilitate this; the United Nations #VisualizeChange challenge is the one I’ve just taken.

As the Chief Information Technology Officer, I invite the global community of data scientists to partner with the United Nations in our mandate to harness the power of data analytics and visualization to uncover new knowledge about UN related topics such as human rights, environmental issues, and political affairs.

Ms. Atefeh Riazi – Chief Information Technology Officer at United Nations

The United Nations UNITE IDEAS initiative published a number of data visualization challenges. For the latest challenge, #VisualizeChange: A World Humanitarian Summit Data Challenge, we were provided with unstructured information from nearly 500 documents that the consultation process had generated as of July 2015. The qualitative data is categorized into emerging themes and sub-themes identified according to a developed taxonomy. The challenge was to process the consultation data in order to develop an original and thought-provoking illustration of the information collected through the consultation process.

Over the weekend I built an interactive visualization using open-source tools (R and Shiny) to help identify innovative ideas and technologies in humanitarian action, especially in communication and IT. By making it into the top 10 finalists, the solution is showcased here, as well as on the Unite Ideas platform and at other related events worldwide, so I hope that this visualization will be used to uncover new knowledge.

#VisualizeChange Top 10 Visualizations

Opening these challenges to the public helps raise awareness: while analysing the data and designing the visualization I learned about some of the most pressing humanitarian needs, such as Damage and Need Assessment, Communication and Online Payment, and about the most promising technologies, such as Mobile, Data Analytics, Social Media and Crowdsourcing.

#VisualizeChange Innovative Ideas and Technologies

Kaggle is another great platform where you can apply your data science skills for social good. How about applying image classification algorithms to automate the right whale recognition process, using a dataset of aerial photographs of individual whales? With fewer than 500 North Atlantic right whales left in the world’s oceans, knowing the health and status of each whale is integral to the efforts of researchers working to protect the species from extinction.

Right Whale Recognition

There are other excellent programs.

The DSSG program, run by the University of Chicago, where aspiring data scientists take on real-world problems in education, health, energy, transportation, economic development and international development, and work for three months on data mining, machine learning, big data, and data science projects with social impact.

DataKind brings together top data scientists with leading social change organizations to collaborate on cutting-edge analytics and advanced algorithms to maximize social impact.

Bayes Impact is a group of practical idealists who believe that, applied properly, data can be used to solve the world’s biggest problems.

Are you aware of any other organizations and platforms doing data science for social good? Feel free to share.

Tools & Technologies

R for analysis & visualization
shinyapps.io for hosting the interactive R script
The complete source code and the data are hosted here

 

Bioinformatics Is Cool, I Mean Really Cool

I’ve been fascinated by genomic research for years. While I have successfully implemented a fairly large and diverse set of algorithms (segmentation, image processing and machine learning) on text, images and other semi-structured datasets, until recently I didn’t have much exposure to the exciting field of bioinformatics and the processing of DNA and RNA sequences.

After completing Bioinformatic Methods I and Bioinformatic Methods II (thank you, Professor Nicholas Provart, University of Toronto) I have a better appreciation of the important roles of bioinformatics in medicinal science, drug discovery, diagnosis and disease management, and also of the complexity involved in processing large biological datasets.


Topics covered in these two courses include multiple sequence alignments, phylogenetics, gene expression data analysis, and protein interaction networks. The first part, Bioinformatic Methods I, dealt with databases, BLAST, multiple sequence alignments, phylogenetics, selection analysis and metagenomics. The second part, Bioinformatic Methods II, dealt with motif searching, protein-protein interactions, structural bioinformatics, gene expression data analysis, and cis-element predictions.

Please find below a short list of tools and resources I’ve used while completing the different labs:

NCBI/Blast I

http://www.ncbi.nlm.nih.gov/

Multiple Sequence Alignments

http://megasoftware.net (download tool)

http://dialign.gobics.de/

http://mafft.cbrc.jp/alignment/server/

Phylogenetics

https://code.google.com/p/jmodeltest2/

http://www.megasoftware.net/

http://evolution.genetics.washington.edu/phylip.html

http://bar.utoronto.ca/webphylip/

http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::fastdnaml

Selection Analysis

http://selecton.tau.ac.il/

http://bar.utoronto.ca/EMBOSS

http://www.datamonkey.org/                 

Next Generation Sequencing Applications: RNA-Seq and Metagenomics

http://mpss.udel.edu

http://mockler-jbrowse.mocklerlab.org/jbrowse.athal/?loc=Chr2%3A12414112..12415692

http://www.ncbi.nlm.nih.gov/pubmed/20410051

Metagenomics

http://metagenomics.anl.gov/

Protein Domain, Motif and Profile Analysis

http://www.ncbi.nlm.nih.gov/guide/domains-structures/

http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi               

http://smart.embl-heidelberg.de/

http://pfam.xfam.org/search

http://www.ebi.ac.uk/InterProScan/

Protein-protein interactions               

http://mips.gsf.de/proj/ppi/  

http://dip.doe-mbi.ucla.edu/dip/Main.cgi

http://www.thebiogrid.org/index.php

http://www.cytoscape.org/download.html   (download tool)             

Structural Bioinformatics

http://pdb.org  

http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html  

http://pymol.org/edu (download tool)            

Gene Expression Analysis

http://www.ncbi.nlm.nih.gov/sra  

http://bar.utoronto.ca/BioC/

Gene Expression Data Analysis

http://bar.utoronto.ca/ntools/cgi-bin/ntools_expression_angler.cgi

http://bar.utoronto.ca/affydb/cgi-bin/affy_db_exprss_browser_in.cgi

http://bioinfo.cau.edu.cn/agriGO/analysis.php

http://bar.utoronto.ca/efp/

http://atted.jp/

http://bar.utoronto.ca/ntools/cgi-bin/ntools_venn_selector.cgi

Cis regulatory element mapping and prediction

http://bar.utoronto.ca/ntools/cgi-bin/BAR_Promomer.cgi

http://www.bioinformatics2.wsu.edu/cgi-bin/Athena/cgi/home.pl

http://jaspar.genereg.net/

http://meme-suite.org/tools/meme

Additional Courses

https://class.coursera.org/molevol-002 Computational Molecular Evolution

https://www.coursera.org/course/pkubioinfo Bioinformatics

https://www.coursera.org/course/programming1 Python

https://www.coursera.org/course/webapplications Web Applications (Ruby on Rails)

https://www.coursera.org/course/epigenetics Epigenetics

https://www.coursera.org/course/genomicmedicine Genomic & Precision Medicine

https://www.coursera.org/course/usefulgenetics Useful Genetics

Book

Zvelebil, M. & Baum, J. O. (2008). Understanding Bioinformatics. Garland Science, New York.


[FIXED] No Symbols For Paired Apple Watch iOS 9 / watchOS 2 / Xcode 7 [Beta 5]

I’m trying to get my watchOS 2 app running on my Apple Watch.

With Beta 4, I was able to run the WatchKit App (including HealthKit) and debug it on the Watch.

When upgrading to Beta 5, I couldn’t select the “iPhone + Apple Watch” combination, as it was marked as Ineligible in the Active Scheme, with the following message: “xxx iPhone + yyy Apple Watch (no symbols for paired apple watch)”.

After many hours of trials and mainly errors, the solution that eventually worked for me is based on this post, but to simplify things I’ve followed the initial steps, changed the directory name to work with Beta 5 and zipped the relevant folders. So the simplified version for Beta 5 is:

  1. Download the zip file from here to the target directory:
    /Users/<user>/Library/Developer/Xcode
  2. Unzip the file. You should see a new folder named “WatchOS DeviceSupport” under your Xcode folder, containing another folder named “2.0 (13S5325c)”.
  3. Open the project in Xcode 7 beta.
  4. Disconnect and reconnect your iPhone.

That’s all – it worked for me, I hope it will work for you as well.

The Evolving Role of the Chief Data Officer

In recent years, there has been a significant rise in the appointments of Chief Data Officers (CDOs).

Although this role is still very new, Gartner predicts that 25 percent of organizations will have a CDO by 2017, with that figure rising to 50 percent in heavily regulated industries such as banking and insurance. Underlying this change is an increasing recognition of the value of data as an asset.

Last week the CDOForum held an event chaired by Dr. Shonali Krishnaswamy, Head of the Data Analytics Department at I2R, evaluating the role of the Chief Data Officer and looking into data monetization strategies and real-life Big Data case studies.

According to Debra Logan, Vice President and Gartner Fellow, the

Chief Data Officer (CDO) is a senior executive who bears responsibility for the firm’s enterprise wide data and information strategy, governance, control, policy development, and effective exploitation. The CDO’s role will combine accountability and responsibility for information protection and privacy, information governance, data quality and data life cycle management, along with the exploitation of data assets to create business value.

To succeed in this role, the CDO should never be “siloed” and should work closely with other senior leaders to innovate and to transform the business:

  • With the Chief Operating Officer (COO) and the Chief Marketing Officer (CMO) on creating new business models, including data-driven products and services and mass experimentation, and on ways to acquire, grow and retain customers, including personalization, profitability and retention.
  • With the COO on ways to optimize operations and counter fraud and threats, including business process operations, infrastructure and asset efficiency, counter fraud, and public safety and defense.
  • With the Chief Information Officer (CIO) on ways to maximize insights, ensure trust and improve IT economics, including enabling the full spectrum of analytics and optimizing big data and analytics infrastructure.
  • With the Chief Human Resources Officer (CHRO) on ways to transform management processes, including planning and performance management, talent management, health and benefits optimization, incentive compensation management and human capital management.
  • With the Chief Risk Officer (CRO), CFO and COO on managing risk, including risk-adjusted performance, financial risk and IT risk and security.

To unleash the true power of data, many CDOs are expanding their role as a way of broadening their scope and creating an innovation agenda, moving from the basics (data strategy, data governance, data architecture, data stewardship, data integration and data management) to advanced work: implementing machine learning and predictive analytics, building big data solutions, developing new products and services, and enhancing customer experience.

Conclusion

Organizations have struggled for decades to realize the value of their data assets. Having a new chief officer lead the enterprise-wide management of data assets will help ensure maximum benefits to the organization.

 

Summing multiple columns of a Data Frame and merging with the first unique row [R Tips & Tricks]

Taking the mtcars dataset as an example, I need to split the data frame by “cyl” and sum multiple columns, “wt” and “drat”.

Next, I will need to merge the “wt” and “drat” sums with the first unique record by “cyl”.

Let’s start with the raw data, the mtcars data frame:

library("plyr")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Summing over multiple columns by using the ddply function:

df2 <- ddply(mtcars, c("cyl"), function(x) colSums(x[c("wt", "drat")]))
df2
##   cyl     wt  drat
## 1   4 25.143 44.78
## 2   6 21.820 25.10
## 3   8 55.989 45.21

Using duplicated is a quick way to find the first unique instance by “cyl”:

df3 <- mtcars[!duplicated(mtcars$cyl),]
df3
##                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.62 16.46  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2

Before merging the two data frames (df2 and df3), we remove “wt” and “drat” from df3, to be replaced later by the column sums from df2:

drops <- c("wt","drat")
df3.dropped <- df3[,!(names(df3) %in% drops)]
df3.dropped
##                    mpg cyl disp  hp  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 16.46  0  1    4    4
## Datsun 710        22.8   4  108  93 18.61  1  1    4    1
## Hornet Sportabout 18.7   8  360 175 17.02  0  0    3    2

We will now merge the two data frames to produce the final data frame:

merged.left <- merge(x = df2, y = df3.dropped, by = "cyl", all.x=TRUE)
merged.left
##   cyl     wt  drat  mpg disp  hp  qsec vs am gear carb
## 1   4 25.143 44.78 22.8  108  93 18.61  1  1    4    1
## 2   6 21.820 25.10 21.0  160 110 16.46  0  1    4    4
## 3   8 55.989 45.21 18.7  360 175 17.02  0  0    3    2

Extracting ID from multiple strings formats using strsplit [R Tips & Tricks]

I needed to extract an ID across multiple formats and different datasets, where the possible patterns are:

ID-xx-yy, ID-xx, ID

For example:

00018523-01-02, 078-11, 789522314H

strsplit did the job:

strsplit("00018523-01-02","-")[[1]][1]
## [1] "00018523"
strsplit("078-11","-")[[1]][1]
## [1] "078"
strsplit("789522314H","-")[[1]][1]
## [1] "789522314H"

Agile Development of Data Products with R and Shiny: A Practical Approach

Many companies are interested in turning their data assets into products and services. This is no longer limited to online firms like LinkedIn or Facebook; a variety of companies in offline industries (GM, Apple, etc.) have started to develop products and services based on analytics.

But how do you succeed at developing and launching data products?

I would like to suggest a framework, building on the ideas of the Lean Startup and the Minimum Viable Product (MVP), to support the rapid modelling and development of innovative data products. These principles can be applied when launching a new tech start-up, starting a small business, or starting a new initiative within a large corporation.

Agile Development of Data Products

A minimum viable product has just those core features that allow the product to be deployed, and no more. The product is typically deployed to a subset of possible customers, such as early adopters that are thought to be more forgiving, more likely to give feedback, and able to grasp a product vision from an early prototype or marketing information.
http://en.wikipedia.org/wiki/Minimum_viable_product

Some of the benefits of prototyping and developing an MVP:

1. You can get valuable feedback from the users early in the project.

2. Different stakeholders can check whether the model matches the specification.

3. It gives the model developer some insight into the accuracy of the initial project estimates and whether the proposed deadlines and milestones can be met.

Our fully functional model will look like this:

Simulator Screenshot

Before we dive into the details, feel free to play around with the prototype and get familiar with the model: http://benefits.shinyapps.io/BenefitsSimulation/

Ready to go?

Let’s go through the different stages of the process, following a simple example and using R and Shiny for modelling and prototyping. I’ve also published the code in my github repository (http://github.com/ofirsh/BenefitsSimulation); feel free to fork it and play with it.

Ideas

This is where your creativity should kick in! What problem are you trying to solve with data?

Are you aware of a gap in the industry that you currently work in?

Let’s follow a simple example:

Effective employee benefits will significantly reduce staff turnover and companies with the most effective benefits are using them to influence the behavior of their staff and their bottom line, as opposed to simply being competitive

How can you find this optimal point, balancing between the increasing cost of employee benefits and the need to retain staff and reduce staff turnover?

Model

We will assume a (simplified) model that links the attrition rates to the benefits provided to the employee.

I know, it’s simplified. I’m also aware that there are many other relevant parameters in a real-life scenario.

But it’s just an example, so let’s move on.


Our simplified model depends on the following parameters:

  1. Number of Employees
  2. Benefits Saturation ($): we assume a linear dependency between the attrition rate and the benefits provided by the company. As the benefits increase, attrition rate drops, to 0% attrition at the point of Benefits Saturation. Any increase of benefits above the Benefits Saturation point will not have an impact on the attrition rates.
  3. Benefits ($): benefits provided by the company
  4. Max Attrition (%): maximal attrition rate at the lowest benefits level ($100)
  5. Training Period (months): number of months required to train a new employee
  6. Salary ($)

This model demonstrates the balance between increasing the benefits and the overall cost, and reducing the attrition rate and the associated cost related to hiring and training of new staff.

We will use R to implement our model:

R is an open-source software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
http://en.wikipedia.org/wiki/R_%28programming_language%29

We will use R-Studio, a powerful and productive user interface for R. It’s free and open source, and works great on Windows, Mac, and Linux.

Let’s create a file named server.R and write some code. You can find the full source code in my github repository:

cutoff is a function that models attrition vs. benefits, where cutoffx is the value of Benefits Saturation:

cutoff <- function(minx, maxx, maxy, cutoffx, x)

{

    # Zero attrition at or above the saturation point cutoffx.
    ysat <- (x >= cutoffx) * 0

    # Attrition declines linearly from maxy (at minx) to zero (at cutoffx).
    slope <- (0 - maxy) / (cutoffx - minx)

    yslope <- (maxy + (x - minx) * slope) * (x < cutoffx)

    return(ysat + yslope)

}

Calculating the different cost components:

benefitsCost <- input$numberee * input$benefits

attritionCost <- input$salary * nTrainingMonths * input$numberee * (currentAttrition / 100)

overallCost <- benefitsCost + attritionCost

The values of the variables starting with input$ are retrieved from the sliding bars, which are part of the User Interface (UI):


When changing the value of the slider named “Number of Employees” from a value of 200 to a value of 300, the value of the variable input$numberee will change from 200 to 300 accordingly.

benefits is a sequence of 20 numbers from 100 to 500, covering the range of benefits:

benefits <- round(seq(from = 100, to = 500, length.out = 20))

Let’s plot the benefits cost, the attrition cost and the overall cost as a function of the benefits:

Cost Vs Benefits

The different cost components are calculated below:

benefitsCostV <- input$numberee * benefits

attritionCostV <- input$salary * nTrainingMonths * input$numberee * (attrition / 100)

totalCostV <- benefitsCostV + attritionCostV

And now we can plot the different cost components:

plot(benefitsCostV ~ benefits, col = "red", main = "Cost vs. Benefits", xlab = "benefits($)", ylab = "cost($)")

lines(benefits, benefitsCostV, col = "red", lwd = 3)

points(benefits, attritionCostV, col = "blue")

lines(benefits, attritionCostV, col = "blue", lwd = 3)

points(benefits, totalCostV, col = "purple")

lines(benefits, totalCostV, col = "purple", lwd = 3)

Let’s find the minimal cost, and draw a nice orange circle around this optimal point:

minBenefitsIndex <- which.min(totalCostV)

minBenefits <- benefits[minBenefitsIndex]

minBenefitsCost <- totalCostV[minBenefitsIndex]

abline(v = minBenefits, col = "cyan", lty = "dashed", lwd = 1)

symbols(minBenefits, minBenefitsCost, circles = 20, fg = "darkorange", inches = FALSE, add = TRUE, lwd = 2)

Tip: Don’t spend too much time writing the perfect R code; your model might change a lot once your stakeholders provide their feedback.

Prototype

Shiny is a web application framework for R that turns your analyses into interactive web applications.

Let’s install Shiny and import the library functions:

install.packages("shiny")

library(shiny)

Once we have the model in place, we will create the user interface and link it back to the model.

Create a new file named ui.R with the User Interface (UI) elements.

For example, let’s write some text:

titlePanel("Cost Optimization (Benefits and Talent) – Simulation"),

h5("This interactive application simulates the impact of multiple …..

And let’s add a slider:

sliderInput("numberee",
            "Number of Employees:",
            min = 100,
            max = 1000,
            value = 200,
            step = 100),

The inputId of the slider (numberee) links the value of the UI control (Number of Employees) to the server-side computation engine.

Create a Shiny account, copy-paste the token and secret into the command line, and execute in R-Studio:

shinyapps::setAccountInfo(name = 'benefits', token = 'xxxx', secret = 'yyyy')

And deploy your code to the Shiny server:

deployApp()

Your prototype is live! http://benefits.shinyapps.io/BenefitsSimulation/

Send the URL to your stakeholders and collect their feedback. Iterate quickly and improve the model and the user interface.

Product

Once your stakeholders are happy with your prototype, it’s time to move on to the next stage and develop your data product.

The good news is that at this stage you should have a pretty good understanding of the requirements and the priorities, based on the feedback provided by your stakeholders.

It’s also the perfect time for you (or for your product development group) to focus more on hosting, architecture, design, the Software Development Life-cycle (SDLC), quality assurance, release management and more.

There are different technologies to consider when developing data products, which I will cover in future posts.

For now I will just mention an interesting option where you can reuse your server-side R code: using yhat, you can expose your server-side R functionality as a set of web services and consume those services from client-side JavaScript libraries like d3.js.

Comments, questions?

Let me know.