Install MongoDB Community Edition and PyMongo on OS X

  • Install Homebew, a free and open-source software package management system that simplifies the installation of software on Apple’s macOS operating system.

/usr/bin/ruby -e “$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)

  • Ensure that you’re running the newest version of Homebrew and that it has the newest list of formulae available from the main repository

brew update

  • To install the MongoDB binaries, issue the following command in a system shell:

brew install mongodb

  • Create a data directory (-p create nested directories, but only if they don’t exist already)

mkdir -p ./data/db

  • Before running mongodb for the first time, ensure that the user account running mongodb has read and write permissions for the directory

sudo chmod 765 data

  • Run MongoDB

mongod –dbpath data/db

  • To stop MongoDB, press Control+C in the terminal where the mongo instance is running

Install PyMongo

pip install pymongo

  • In a Python interactive shell:

import pymongo

from pymongo import MongoClient

RoboMongo

  • Create a Connection

client = MongoClient()

  • Access Database Objects

MongoDB creates new databases implicitly upon their first use.

db = client.test

  • Query for All Documents in a Collection

cursor = db.restaurants.find()

for document in cursor: print(document)

  • Query by a Top Level Field

cursor = db.restaurants.find({“borough”: “Manhattan”})

for document in cursor: print(document)

  • Query by a Field in an Embedded Document

cursor = db.restaurants.find({“address.zipcode”: “10075”})

for document in cursor: print(document)

  • Query by a Field in an Array

cursor = db.restaurants.find({“grades.grade”: “B”})

for document in cursor: print(document)

 

  • Insert a Document

Insert a document into a collection named restaurants. The operation will create the collection if the collection does not currently exist.

result = db.restaurants.insert_one(

{

“address”: {            “street”: “2 Avenue”,            “zipcode”: “10075”,            “building”: “1480”,            “coord”: [-73.9557413, 40.7720266]        },

“borough”: “Manhattan”,

“cuisine”: “Italian”,

“grades”: [

{                “date”: datetime.strptime(“2014-10-01”, “%Y-%m-%d“),                “grade”: “A”,                “score”: 11            },

{                “date”: datetime.strptime(“2014-01-16”, “%Y-%m-%d“),                “grade”: “B”,                “score”: 17            }        ],

“name”: “Vella”,

“restaurant_id”: “41704620”

})

result.inserted_id

 

How To Find The Lag That Results In Maximum Cross-Correlation [R]

I have two time series and I want to find the lag that results in maximum correlation between the two time series. The basic problem we’re considering is the description and modeling of the relationship between these two time series.

In signal processing, cross-correlation is a measure of similarity of two series as a function of the lag of one relative to the other. This is also known as a sliding dot product or sliding inner-product.

For discrete functions, the cross-correlation is defined as:

corr

In the relationship between two time series (yt and xt), the series yt may be related to past lags of the x-series.  The sample cross correlation function (CCF) is helpful for identifying lags of the x-variable that might be useful predictors of yt.

In R, the sample CCF is defined as the set of sample correlations between xt+h and yt for h = 0, ±1, ±2, ±3, and so on.

A negative value for h is a correlation between the x-variable at a time before t and the y-variable at time t.   For instance, consider h = −2.  The CCF value would give the correlation between xt-2 and yt.

For example, let’s start with the first series, y1:

x <- seq(0,2*pi,pi/100)
length(x)
# [1] 201

y1 <- sin(x)
plot(x,y1,type="l", col = "green")

ser1

Adding series y2, with a shift of pi/2:

y2 <- sin(x+pi/2)
lines(x,y2,type="l",col="red")

ser2

Applying the cross correlation function (cff)

cv <- ccf(x = y1, y = y2, lag.max = 100, type = c("correlation"),plot = TRUE)

corr

The maximal correlation is calculated at a positive shift of the y1 series:

cor = cv$acf[,,1]
lag = cv$lag[,,1]
res = data.frame(cor,lag)
res_max = res[which.max(res$cor),]$lag
res_max
# [1] 44

Which means that maximal correlation between series y1 and series y2 is calculated between y1t+44 and y2t

corr-2

 

 

Android Studio Tips and Tricks

Editor Actions

If you’re unfamiliar with using Android Studio and the IntelliJ IDEA interface, this page provides some tips to help you get started with some of the most common tasks and productivity enhancements.

[table id=1 /]

It is always useful to visit Android Studio -> Preferences -> Keymap for the full list of shortcuts

Keymap

Tick Android Studio -> Preferences -> Editor -> General -> Appearance -> Show line number 

 

Modify the Shell Path in OSX to Access adb Through Terminal

The shell path for a user in OSX is a set of locations in the filing system whereby the user can use certain applications, commands and programs without the need to specify the full path to that command or program in the Terminal. This will work in all OSX operating systems.

You can find out whats in your path by launching Terminal in Applications/Utilities and entering:

echo $PATH

Adding in a Permanent Location

To make the new path stick permanently you need to create a .bash_profile file in your home directory and set the path there. This file control various Terminal environment preferences including the path.

nano  ~/.bash_profile

Create the .bash_profile file with a command line editor called nano

export PATH="/Users/ofir/Library/Android/sdk/platform-tools:$PATH"

Add in the above line which declares the new location /Users/ofir/Library/Android/sdk/platform-tools as well as the original path declared as $PATH.

So now when the Terminal is relaunched or a new window made and you check the the path by

echo $PATH

You will get the new path at the front followed by the default path locations, all the time

/Users/ofir/Library/Android/sdk/platform-tools:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/local/git/bin

 

Bluetooth Debugging on Android Wear

I’ve just bought a new ASUS ZenWatch and I’m working on a small application to extract the sensors data from the watch.

This post ( http://developer.android.com/training/wearables/apps/bt-debugging.html ) is describing the basic steps required to setup a device for debugging and to setup a debugging session. I’ve listed below some additional steps to help with the troubleshooting if things doesn’t work as planned.

Android Debug Bridge (adb)

Android Debug Bridge (adb) is a versatile command line tool that lets you communicate with an emulator instance or connected Android-powered device. It is a client-server program that includes three components:

  • A client, which runs on your development machine. You can invoke a client from a shell by issuing an adb command. Other Android tools such as DDMS also create adb clients.
  • A server, which runs as a background process on your development machine. The server manages communication between the client and the adb daemon running on an emulator or device.
  • A daemon, which runs as a background process on each emulator or device instance.

Setup Devices for Debugging


  1. Enable USB debugging on the handheld:
    • Open the Settings app and scroll to the bottom.
    • If it doesn’t have a Developer Options setting, tap About Phone (or About Tablet), scroll to the bottom, and tap the build number 7 times.
    • Go back and tap Developer Options.
    • Enable USB debugging.
  2. Enable Bluetooth debugging on the wearable:
    1. Tap the home screen twice to bring up the Wear menu.
    2. Scroll to the bottom and tap Settings.
    3. Scroll to the bottom. If there’s no Developer Options item, tap About, and then tap the build number 7 times.
    4. Tap the Developer Options item.
    5. Enable Debug over Bluetooth.

Set Up a Debugging Session


  • On the handheld, open the Android Wear companion app.
  • Tap the menu on the top right and select Settings.
  • Enable Debugging over Bluetooth. You should see a tiny status summary appear under the option:
Host: disconnected
Target: connected
  • Connect the handheld to your machine over USB.
  • In the Android Studio open the Terminal (Alt+F12 Windows) and run:
    adb forward tcp:4444 localabstract:/adb-hub
    adb connect localhost:4444

    Note: You can use any available port that you have access to.

    • adb forward forwards socket connections from a specified local port to a specified remote port on the emulator/device instance. Map a socket through the phone that is paired to your Wear device.
    • adb connect will connect to the device
  • A message will appear on the phone to allow Wear Debugging, press OK

WearDebugger

  • In the Android Wear companion app, you should see the status change to:
Host: connected
Target: connected

Well, this is the happy flow.

While debugging I’ve encountered some issues, especially after disconnecting the phone from the USB cable.

First, print a list of all attached emulator/device instances:

adb devices
List of devices attached
LGD855bxxxxx  device

In this case you can see only the phone and not the watch, so you will need to execute

adb forward tcp:4444 localabstract:/adb-hub
adb connect localhost:4444

In other cases you may get the error message

error: more than one device/emulator

You may need to terminates the adb server process:

adb kill-server

Now you can execute the adb forward and adb connect commands

adb forward tcp:4444 localabstract:/adb-hub
adb connect localhost:4444

If you get an error message

error: device unauthorized.

Try revoking the permissions on the device – Developer options -> Revoke USB debugging authorizations. Then plug the device in and accept it again.

 

Implementation of the BlockingQueue Interface That Works Properly When Accessed via Multiple Threads

A BlockingQueue is typically used to have on thread produce objects, which another thread consumes. Here is a diagram that illustrates this principle:

blocking-queue
A BlockingQueue with one thread putting into it, and another thread taking from it.

The producing thread will keep producing new objects and insert them into the queue, until the queue reaches some upper bound on what it can contain. It’s limit, in other words. If the blocking queue reaches its upper limit, the producing thread is blocked while trying to insert the new object. It remains blocked until a consuming thread takes an object out of the queue.

The consuming thread keeps taking objects out of the blocking queue, and processes them. If the consuming thread tries to take an object out of an empty queue, the consuming thread is blocked until a producing thread puts an object into the queue.

BlockingQueue methods come in four forms, with different ways of handling operations that cannot be satisfied immediately, but may be satisfied at some point in the future: one throws an exception, the second returns a special value (either null or false, depending on the operation), the third blocks the current thread indefinitely until the operation can succeed, and the fourth blocks for only a given maximum time limit before giving up. These methods are summarized in the following table:

Throws exception Special value Blocks Times out
Insert add(e) offer(e) put(e) offer(e, time, unit)
Remove remove() poll() take() poll(time, unit)
Examine element() peek() not applicable not applicable

The code below defines an implementation of the BlockingQueue interface that works properly when accessed via multiple threads since it’s synchronized properly

class SimpleBlockingQueue<E> implements BlockingQueue<E> {
    /**
     * The queue consists of a List of E's.
     */
    final private List<E> mList;

Add a new E to the end of the queue, blocking if necessary for space to become available

 public void put(E e) throws InterruptedException {
        synchronized(this) {
            if (e == null)
                throw new NullPointerException();

            // Wait until the queue is not full.
            while (isFull()) {
                // System.out.println("BLOCKING ON PUT()");
                wait();
            }

            // Add e to the ArrayList.
            mList.add(e);
            
            // Notify that the queue may have changed state, e.g., "no
            // longer empty".
            notifyAll();
        }
    }

Remove the E at the front of the queue, blocking until there’s something in the queue

public E take() throws InterruptedException {
        synchronized(this) {
            // Wait until the queue is not empty.
            while (mList.isEmpty()) {
                // System.out.println("BLOCKING ON TAKE()");
                wait();
            }

            final E e = mList.remove(0);
        
            // Notify that the queue may have changed state, e.g., "no
            // longer full".
            notifyAll();
            return e;
        }
    }

 

Using Java Thread.join() as a simple barrier synchronizer [Android]

Starts a Thread for each element in the List of input Strings and uses Thread.join() to wait for all the Threads to finish. This implementation doesn’t require any Java synchronization mechanisms other than what’s provided by Thread

 private volatile List<String> mInput = null;

/**
* The array of words to find.
*/
final String[] mWordsToFind;

/**
* The List of worker Threads that were created.
*/
private List mWorkerThreads;

Launch process for each and every string

// This List holds Threads so they can be joined when their processing is done.
mWorkerThreads = new LinkedList();

// Create and start a Thread for each element in the
// mInput.
for (int i = 0; i < mInput.size(); ++i) {
    // Each Thread performs the processing designated by
    // the processInput() method of the worker Runnable.
    Thread t = new Thread(makeTask(i));

    // Add to the List of Threads to join.
    mWorkerThreads.add(t);

    // Start the Thread to process its input in the background.
    t.start();
}

Barrier synchronization: using thread.join() to wait for the completion of all the other threads

// Barrier synchronization.
for (Thread thread : mWorkerThreads)
   try 
   {
      thread.join();
   } 
      catch (InterruptedException e) 
   {
      printDebugging("join() interrupted");
   }

If t is a Thread object whose thread is currently executing,

t.join();

causes the current thread to pause execution until t‘s thread terminates.

Overloads of join allow the programmer to specify a waiting period.

Like sleep, join responds to an interrupt by exiting with an InterruptedException.

Based on https://github.com/douglascraigschmidt/LiveLessons/blob/master/ThreadJoinTest/src/ThreadJoinTest.java

Using The Device’s Accelerometer And Magnometer To Orient A Compass [Android]

Initialization

private String TAG = "SensorCompass";

// Sensors & SensorManager
private Sensor accelerometer;
private Sensor magnetometer;
private SensorManager mSensorManager;

// Storage for Sensor readings
private float[] mGravity = null;
private float[] mGeomagnetic = null;

// Rotation around the Z axis
private double mRotationInDegress;

OnCreate

protected void onCreate(Bundle savedInstanceState) {

		// ...

		// Get a reference to the SensorManager
		mSensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);

		// Get a reference to the accelerometer
		accelerometer = mSensorManager
				.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);

		// Get a reference to the magnetometer
		magnetometer = mSensorManager
				.getDefaultSensor(Sensor.TYPE_MAGNETIC_FIELD);

		// Exit unless both sensors are available
		if (null == accelerometer || null == magnetometer)
			finish();

	}

Register this class as a listener for the accelerometer events and for magnetometer events:

@Override
	protected void onResume() {
		super.onResume();


		// Register for sensor updates

		mSensorManager.registerListener(this, accelerometer,
				SensorManager.SENSOR_DELAY_NORMAL);

		mSensorManager.registerListener(this, magnetometer,
				SensorManager.SENSOR_DELAY_NORMAL);
	}

The onPause method unregister this class as a listener for all sesnors

@Override
protected void onPause() {
	super.onPause();

	// Unregister all sensors
	mSensorManager.unregisterListener(this);
}

The onSensorChanged method process the incoming sensors’ events

@Override
public void onSensorChanged(SensorEvent event) {

	// Acquire accelerometer event data
	
	if (event.sensor.getType() == Sensor.TYPE_ACCELEROMETER) {

		mGravity = new float[3];
		System.arraycopy(event.values, 0, mGravity, 0, 3);

	} 
	
	// Acquire magnetometer event data
	
	else if (event.sensor.getType() == Sensor.TYPE_MAGNETIC_FIELD) {

		mGeomagnetic = new float[3];
		System.arraycopy(event.values, 0, mGeomagnetic, 0, 3);

	}

	// If we have readings from both sensors then
	// use the readings to compute the device's orientation
	// and then update the display.

	if (mGravity != null && mGeomagnetic != null) {

		float rotationMatrix[] = new float[9];

		// Users the accelerometer and magnetometer readings
		// to compute the device's rotation with respect to
		// a real world coordinate system

		boolean success = SensorManager.getRotationMatrix(rotationMatrix,
				null, mGravity, mGeomagnetic);

		if (success) {

			float orientationMatrix[] = new float[3];

			// Returns the device's orientation given
			// the rotationMatrix

			SensorManager.getOrientation(rotationMatrix, orientationMatrix);

			// Get the rotation, measured in radians, around the Z-axis
			// Note: This assumes the device is held flat and parallel
			// to the ground

			float rotationInRadians = orientationMatrix[0];

			// Convert from radians to degrees
			mRotationInDegress = Math.toDegrees(rotationInRadians);

			// Request redraw
			mCompassArrow.invalidate();

			// Reset sensor event data arrays
			mGravity = mGeomagnetic = null;

		}
	}

}

 

Data Scientists, With Great Power Comes Great Responsibility

It is a good time to be a data scientist.

With great power comes great responsibilityIn 2012 the Harvard Business Review hailed the role of data scientist “The sexiest job of the 21st century”. Data scientists are working at both start-ups and well-established companies like Twitter, Facebook, LinkedIn and Google receiving a total average salary of $98k ($144k for US respondents only) .

Data – and the insights it provides – gives the business the upper hand to better understand the clients, prospects and the overall operation. Till recently, it was not uncommon for million- and -billion- dollar deals to be accepted or rejected based on the intuition & instinct. Data scientists add value to the business by leading to informed and timely decision-making process using quantifiable, data driven evidence and by translating the data into actionable insights.

So you have a rewarding corporate day job, how about doing data science for social good?

You have been endowed with tremendous data science and leadership powers and the world needs them! Mission-driven organizations are tackling huge social issues like poverty, global warming and public health. Many have tons of unexplored data that could help them make a bigger impact, but don’t have the time or skills to leverage it. Data science has the power to move the needle on critical issues but organizations need access to data superheroes like you to use it

DataKind Blog 

There are a few of programs that exist specifically to facilitate this, the United Nations #VisualizeChange challenge is the one I’ve just taken.

As the Chief Information Technology Officer, I invite the global community of data scientists to partner with the United Nations in our mandate to harness the power of data analytics and visualization to uncover new knowledge about UN related topics such as human rights, environmental issues, and political affairs.

Ms. Atefeh Riazi – Chief Information Technology Officer at United Nations

The United Nations UNITE IDEAS published a number of data visualization challenges. For the latest challenge, #VisualizeChange: A World Humanitarian Summit Data Challenge , we were provided with unstructured information from nearly 500 documents that the consultation process has generated as per July 2015. The qualitative data is categorized in emerging themes and sub-themes that have been identified according to a developed taxonomy. The challenge was to process the consultation data in order to develop an original and thought provoking illustration of information collected through the consultation process.

Over the weekend I’ve built an interactive visualization using open-source tools (R and Shiny) to help and identify innovative ideas and innovative technologies in humanitarian action, especially on communication and IT technology. By making it to the top 10 finalists, the solution is showcased here, as well as on the Unite Ideas platform and other related events worldwide, so I hope that this visualization will be used to uncover new knowledge.

#VisualizeChange Top 10 Visualizations

Opening these challenges to the public helps raising awareness – during the process of analysing the data and designing the visualization I’ve learned on some of most pressing humanitarian needs such as Damage and Need Assessment, Communication, Online Payment and more and on the most promising technologies such as Mobile, Data Analytics, Social Media, Crowdsourcing and more.

#VisualizeChange Innovative Ideas and Technologies

Kaggle is another great platform where you can apply your data science skills for social good. How about applying image classification algorithms to automate the right whale recognition process using a dataset of aerial photographs of individual whale? With fewer than 500 North Atlantic right whales left in the world’s oceans, knowing the health and status of each whale is integral to the efforts of researchers working to protect the species from extinction.

Right Whale Recognition

There are other excellent programs.

The DSSG program ran by the University of Chicago, where aspiring data scientists take on real-world problems in education, health, energy, transportation, economic development, international development and work for three months on data mining, machine learning, big data, and data science projects with social impact.

DataKind bring together top data scientists with leading social change organizations to collaborate on cutting-edge analytics and advanced algorithms to maximize social impact.

Bayes Impact  is a group of practical idealists who believe that applied properly, data can be used to solve the world’s biggest problems.

Are you aware of any other organizations and platforms doing data science for social good? Feel free to share.

Tools & Technologies

R for analysis & visualization
Shiny.io for hosting the interactive R script
The complete source code and the data is hosted here

 

Summing multiple columns of a Data Frame and merging with the first unique row [R Tips & Tricks]

Taking the mtcars dataset as an example, I need to split the data frame by “cyl” and sum by multiple columns, “wt” and “drat”.

Next, I will need to merge the “wt” and “drat” sums to the first unique record by “cyl”.

Let’s start with the raw data, the mtcars data frame:

library("plyr")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Summing over multiple columns by using the ddply function:

df2 <- ddply(mtcars, c("cyl"), function(x) colSums(x[c("wt", "drat")]))
df2
##   cyl     wt  drat
## 1   4 25.143 44.78
## 2   6 21.820 25.10
## 3   8 55.989 45.21

Using duplicated is quick way to find the first unique instance by “cyl”:

df3 <- mtcars[!duplicated(mtcars$cyl),]
df3
##                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.62 16.46  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2

Before merging the two data frames (df2 and df3), we remove “wt” and “draft” from df3, to be replaced later by the columns’ sum from df2:

drops <- c("wt","drat")
df3.dropped <- df3[,!(names(df3) %in% drops)]
df3.dropped
##                    mpg cyl disp  hp  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 16.46  0  1    4    4
## Datsun 710        22.8   4  108  93 18.61  1  1    4    1
## Hornet Sportabout 18.7   8  360 175 17.02  0  0    3    2

We will now merge the two data frames to produce the final data frame:

merged.left <- merge(x = df2, y = df3.dropped, by = "cyl", all.x=TRUE)
merged.left
##   cyl     wt  drat  mpg disp  hp  qsec vs am gear carb
## 1   4 25.143 44.78 22.8  108  93 18.61  1  1    4    1
## 2   6 21.820 25.10 21.0  160 110 16.46  0  1    4    4
## 3   8 55.989 45.21 18.7  360 175 17.02  0  0    3    2