Introduction to Survival Analysis


Survival analysis is generally defined as a set of methods for analysing data where the outcome variable is the time until the occurrence of an event of interest. For example, if the event of interest is heart attack, then the survival time can be the time in years until a person develops a heart attack. For simplicity, we will adopt the terminology of survival analysis, referring to the event of interest as ‘death’ and to the waiting time as ‘survival’ time, but this technique has much wider applicability. The event can be death, occurrence of a disease, marriage, divorce, etc. The time to event or survival time can be measured in days, weeks, years, etc.

The specific difficulties relating to survival analysis arise largely from the fact that only some individuals have experienced the event and, subsequently, survival times will be unknown for a subset of the study group. This phenomenon is called censoring.

In longitudinal studies exact survival time is only known for those individuals who show the event of interest during the follow-up period. For others (those who are disease free at the end of the observation period or those that were lost) all we can say is that they did not show the event of interest during the follow-up period. These individuals are called censored observations. An attractive feature of survival analysis is that we are able to include the data contributed by censored observations right up until they are removed from the risk set.

Survival and Hazard

T  –  a non-negative random variable representing the waiting time until the occurrence of an event.

The survival function, S(t), of an individual is the probability that they survive until at least time t, where t is a time of interest and T is the time of event.


The survival curve is non-increasing (the event may not reoccur for an individual) and is limited within [0,1].


F(t) – the probability that the event has occurred by duration t:


the probability density function (p.d.f.) f(t):


An alternative characterisation of the distribution of T is given by the hazard function, or instantaneous rate of occurrence of the event, defined as


The numerator of this expression is the conditional probability that the event will occur in the interval [t,t+dt] given that it has not occurred before, and the denominator is the width of the interval. Dividing one by the other we obtain a rate of event occurrence per unit of time. Taking the limit as the width of the interval goes down to zero, we obtain an instantaneous rate of occurrence.

Applying Bayes’ Rule


on the numerator of the hazard function:


Given that the event happened between time t to t+dt, the conditional probability of this event happening after time t is 1:


Dividing by dt and passing to the limit gives the useful result:


In words, the rate of occurrence of the event at duration t equals the density of events at t, divided by the probability of surviving to that duration without experiencing the event.

We will soon show that there is a one-to-one relation between the hazard and the survival function.

The derivative of S(t) is:


We will now show that the hazard function is the derivative of -log S(t):


If we now integrate from 0 to time t:




 and introduce the boundary condition S(0) = 1 (since the event is sure not to have occurred by duration 0):



we can solve the above expression to obtain a formula for the probability of surviving to duration t as a function of the hazard at all durations up to t:


One approach to estimating the survival probabilities is to assume that the hazard function follow a specific mathematical distribution. Models with increasing hazard rates may arise when there is natural aging or wear. Decreasing hazard functions are much less common but find occasional use when there is a very early likelihood of failure, such as in certain types of electronic devices or in patients experiencing certain types of transplants. Most often, a bathtub-shaped hazard is appropriate in populations followed from birth.

The figure below hows the relationship between four parametrically specified hazards and the corresponding survival probabilities. It illustrates (a) a constant hazard rate over time (e.g. healthy persons) which is analogous to an exponential distribution of survival times, (b) strictly increasing (c) decreasing hazard rates based on a Weibull model, and (d) a combination of decreasing and increasing hazard rates using a log-Normal model. These curves are illustrative examples and other shapes are possible.



The simplest possible survival distribution is obtained by assuming a constant risk over time:


Censoring and truncation

One of the distinguishing feature of the field of survival analysis is censoring: observations are called censored when the information about their survival time is incomplete; the most commonly encountered form is right censoring.


Right censoring occurs when a subject leaves the study before an event occurs, or the study ends before the event has occurred. For example, we consider patients in a clinical trial to study the effect of treatments on stroke occurrence. The study ends after 5 years. Those patients who have had no strokes by the end of the year are censored. Another example of right censoring is when a person drops out of the study before the end of the study observation time and did not experience the event. This person’s survival time is said to be censored, since we know that the event of interest did not happen while this person was under observation.

Left censoring is when the event of interest has already occurred before enrolment. This is very rarely encountered.

In a truncated sample, we do not even “pick up” observations that lie outside a certain range.

Unlike ordinary regression models, survival methods correctly incorporate information from both censored and uncensored observations in estimating important model parameters

Non-parametric Models

The very simplest survival models are really just tables of event counts: non-parametric, easily computed and a good place to begin modelling to check assumptions, data quality and end-user requirements etc. When no event times are censored, a non-parametric estimator of S(t) is 1 − F(t), where F(t) is the empirical cumulative distribution function.


When some observations are censored, we can estimate S(t) using the Kaplan-Meier product-limit estimator. An important advantage of the Kaplan–Meier curve is that the method can take into account some types of censored data, particularly right-censoring, which occurs if a patient withdraws from a study, is lost to follow-up, or is alive without event occurrence at last follow-up.

Suppose that 100 subjects of a certain type were tracked over a period of time to determine how many survived for one year, two years, three years, and so forth. If all the subjects remained accessible throughout the entire length of the study, the estimation of year-by-year survival probabilities for subjects of this type in general would be an easy matter. The survival of 87 subjects at the end of the first year would give a one-year survival probability estimate of 87/100=0.87; the survival of 76 subjects at the end of the second year would yield a two-year estimate of 76/100=0.76; and so forth.

But in real-life longitudinal research it rarely works out this neatly. Typically there are subjects lost along the way (censored) for reasons unrelated to the focus of the study.

Suppose that 100 subjects of a certain type were tracked over a period of two years determine how many survived for one year and for two years. Of the 100 subjects who are “at risk” at the beginning of the study, 3 become unavailable (censored) during the first year and 3 are known to have died by the end of the first year. Another 2 become unavailable during the second year and another 10 are known to have died by the end of the second year.


Kaplan and Meier proposed that subjects who become unavailable during a given time period be counted among those who survive through the end of that period, but then deleted from the number who are at risk for the next time period.

The table below shows how these conventions would work out for the present example. Of the 100 subjects who are at risk at the beginning of the study, 3 become unavailable during the first year and 3 die. The number surviving the first year (Year 1) is therefore 100 (at risk) – 3 (died) = 97 and the number at risk at the beginning of the second year (Year 2) is 100 (at risk) – 3 (died) – 3 (unavailable) = 94. Another 2 subjects become unavailable during the second year and another 10 die. So the number surviving Year 2 is 94 (at risk) – 10 (died) = 84.


As illustrated in the next table, the Kaplan-Meier procedure then calculates the survival probability estimate for each of the t time periods, except the first, as a compound conditional probability.


The estimate for surviving through Year 1 is simply 97/100=0.97. And if one does survive through Year 1, the conditional probability of then surviving through Year 2 is 84/94=0.8936. The estimated probability of surviving through both Year 1 and Year 2 is therefore (97/100) x (84/94)=0.8668.

Incorporating covariates: proportional hazards models

Up to now we have not had information for each individual other than the survival time and censoring status ie. we have not considered information such as the weight, age, or smoking status of individuals, for example. These are referred to as covariates or explanatory variables.

Cox Proportional Hazards Modelling

The most interesting survival-analysis research examines the relationship between survival — typically in the form of the hazard function — and one or more explanatory variables (or covariates).


where λ0(t) is the non-parametric baseline hazard function and βx is a linear parametric model using features of the individuals, transformed by an exponential function. The baseline hazard function λ0(t) does not need to be specified for the Cox model, making it semi-parametric. The baseline hazard function is appropriately named because it describes the risk at a certain time when x = 0, which is when the features are not incorporated. The hazard function describes the relationship between the baseline hazard and features of a specific sample to quantify the hazard or risk at a certain time.

The model only needs to satisfy the proportional hazard assumption, which is that the hazard of one sample is proportional to the hazard of another sample. Two samples xi and xj satisfy this assumption when the ratio is not dependent on time as shown below:


The parameters can be estimated by maximizing the partial likelihood.


Kaplan-Meier methods and Parametric Regression methods, Kristin Sainani Ph.D.

Changing the Game with Data and Insights – Data Science Singapore

Another great Data Science Singapore (DSSG) event! Hong Cao from McLaren Applied Technologies shared his insights on applications of data science at McLaren.

The first project is using economic sensors for continuous human conditions monitoring, including sleep quality, gait and activities, perceived stress and cognitive performance.


Gait outlier analysis provides unique insight on fatigue levels while exercising, probability of injury and post surgery performance and recovery.

Gait Analysis Data Science


A related study looks into how biotelemetry assist in patient treatment such as ALS (Amyotrophic Lateral Sclerosis) disease progression monitoring. The prototype tools collect heart rate, activity and speech data to analyse disease progression.


HRV (Heart Rate Variability) features are extracted from both the time and from the frequency domains.


Activity score is derived from the three-axis accelerometer data.


The second project was a predictive failure POC, to help determine the condition of Haul Trucks in order to predict when a failure might happen. The cost of having an excavator go down in the field is $5 million a day, while the cost of losing a haul truck is $1.8 million per day. If you can prevent it from going down in the field, that makes a huge difference


How To Find The Lag That Results In Maximum Cross-Correlation [R]

I have two time series and I want to find the lag that results in maximum correlation between the two time series. The basic problem we’re considering is the description and modeling of the relationship between these two time series.

In signal processing, cross-correlation is a measure of similarity of two series as a function of the lag of one relative to the other. This is also known as a sliding dot product or sliding inner-product.

For discrete functions, the cross-correlation is defined as:


In the relationship between two time series (yt and xt), the series yt may be related to past lags of the x-series.  The sample cross correlation function (CCF) is helpful for identifying lags of the x-variable that might be useful predictors of yt.

In R, the sample CCF is defined as the set of sample correlations between xt+h and yt for h = 0, ±1, ±2, ±3, and so on.

A negative value for h is a correlation between the x-variable at a time before t and the y-variable at time t.   For instance, consider h = −2.  The CCF value would give the correlation between xt-2 and yt.

For example, let’s start with the first series, y1:

x <- seq(0,2*pi,pi/100)
# [1] 201

y1 <- sin(x)
plot(x,y1,type="l", col = "green")


Adding series y2, with a shift of pi/2:

y2 <- sin(x+pi/2)


Applying the cross correlation function (cff)

cv <- ccf(x = y1, y = y2, lag.max = 100, type = c("correlation"),plot = TRUE)


The maximal correlation is calculated at a positive shift of the y1 series:

cor = cv$acf[,,1]
lag = cv$lag[,,1]
res = data.frame(cor,lag)
res_max = res[which.max(res$cor),]$lag
# [1] 44

Which means that maximal correlation between series y1 and series y2 is calculated between y1t+44 and y2t




Data Scientists, With Great Power Comes Great Responsibility

It is a good time to be a data scientist.

With great power comes great responsibilityIn 2012 the Harvard Business Review hailed the role of data scientist “The sexiest job of the 21st century”. Data scientists are working at both start-ups and well-established companies like Twitter, Facebook, LinkedIn and Google receiving a total average salary of $98k ($144k for US respondents only) .

Data – and the insights it provides – gives the business the upper hand to better understand the clients, prospects and the overall operation. Till recently, it was not uncommon for million- and -billion- dollar deals to be accepted or rejected based on the intuition & instinct. Data scientists add value to the business by leading to informed and timely decision-making process using quantifiable, data driven evidence and by translating the data into actionable insights.

So you have a rewarding corporate day job, how about doing data science for social good?

You have been endowed with tremendous data science and leadership powers and the world needs them! Mission-driven organizations are tackling huge social issues like poverty, global warming and public health. Many have tons of unexplored data that could help them make a bigger impact, but don’t have the time or skills to leverage it. Data science has the power to move the needle on critical issues but organizations need access to data superheroes like you to use it

DataKind Blog 

There are a few of programs that exist specifically to facilitate this, the United Nations #VisualizeChange challenge is the one I’ve just taken.

As the Chief Information Technology Officer, I invite the global community of data scientists to partner with the United Nations in our mandate to harness the power of data analytics and visualization to uncover new knowledge about UN related topics such as human rights, environmental issues, and political affairs.

Ms. Atefeh Riazi – Chief Information Technology Officer at United Nations

The United Nations UNITE IDEAS published a number of data visualization challenges. For the latest challenge, #VisualizeChange: A World Humanitarian Summit Data Challenge , we were provided with unstructured information from nearly 500 documents that the consultation process has generated as per July 2015. The qualitative data is categorized in emerging themes and sub-themes that have been identified according to a developed taxonomy. The challenge was to process the consultation data in order to develop an original and thought provoking illustration of information collected through the consultation process.

Over the weekend I’ve built an interactive visualization using open-source tools (R and Shiny) to help and identify innovative ideas and innovative technologies in humanitarian action, especially on communication and IT technology. By making it to the top 10 finalists, the solution is showcased here, as well as on the Unite Ideas platform and other related events worldwide, so I hope that this visualization will be used to uncover new knowledge.

#VisualizeChange Top 10 Visualizations

Opening these challenges to the public helps raising awareness – during the process of analysing the data and designing the visualization I’ve learned on some of most pressing humanitarian needs such as Damage and Need Assessment, Communication, Online Payment and more and on the most promising technologies such as Mobile, Data Analytics, Social Media, Crowdsourcing and more.

#VisualizeChange Innovative Ideas and Technologies

Kaggle is another great platform where you can apply your data science skills for social good. How about applying image classification algorithms to automate the right whale recognition process using a dataset of aerial photographs of individual whale? With fewer than 500 North Atlantic right whales left in the world’s oceans, knowing the health and status of each whale is integral to the efforts of researchers working to protect the species from extinction.

Right Whale Recognition

There are other excellent programs.

The DSSG program ran by the University of Chicago, where aspiring data scientists take on real-world problems in education, health, energy, transportation, economic development, international development and work for three months on data mining, machine learning, big data, and data science projects with social impact.

DataKind bring together top data scientists with leading social change organizations to collaborate on cutting-edge analytics and advanced algorithms to maximize social impact.

Bayes Impact  is a group of practical idealists who believe that applied properly, data can be used to solve the world’s biggest problems.

Are you aware of any other organizations and platforms doing data science for social good? Feel free to share.

Tools & Technologies

R for analysis & visualization for hosting the interactive R script
The complete source code and the data is hosted here


The Evolving Role of the Chief Data Officer

In recent years, there has been a significant rise in the appointments of Chief Data Officers (CDOs).

Although this role is still very new, Gartner predicts that 25 percent of organizations will have a CDO by 2017, with that figure rising to 50 percent in heavily regulated industries such as banking and insurance. Underlying this change is an increasing recognition of the value of data as an asset.

Last week the CDOForum held an event chaired by Dr. Shonali Krishnaswamy Head Data Analytics Department I2R, evaluating the role of the Chief Data Officer and looking into data monetization strategies and real-life Big Data case studies.

According to Debra Logan, Vice President and Gartner Fellow, the

Chief Data Officer (CDO) is a senior executive who bears responsibility for the firm’s enterprise wide data and information strategy, governance, control, policy development, and effective exploitation. The CDO’s role will combine accountability and responsibility for information protection and privacy, information governance, data quality and data life cycle management, along with the exploitation of data assets to create business value.

To succeed in this role, the CDO should never be “siloed” and work closely with other senior leaders to innovate and to transform the business:

  • With the Chief Operating Officers (COO) and with the Chief Marketing Officer (CMO) on creating new business models, including data driven products and services, mass experimentation and on ways to acquire, grow and retain customers including personalization, profitability and retention.
  • With the COO on ways to optimize the operation, counter frauds and threats including business process operations, infrastructure & asset efficiency, counter fraud and public safety and defense.
  • With the Chief Information Officer (CIO) on ways to maximize insights, ensure trust and improve IT economics, including enabling full spectrum of analytics and optimizing big data & analytics infrastructure.
  • With the Chief Human Resource Officer (CHRO) on ways to transform management processes including planning and performance management, talent management, health & benefits optimization, incentive compensation management and human capital management.
  • With the Chief Risk Officer (CRO), CFO and COO on managing risk including risk adjusted performance, financial risk and IT risk & security.

To unleash the true power of data, many CDOs are expanding their role as a way of expanding scope and creating an innovation agenda, moving from Basics (data strategy, data governance, data architecture, data stewardship, data integration and data management) to Advanced, implementing machine learning & predictive analytics, big data solutions, developing new products and services and enhancing customer experience.


Organizations have struggled for decades with the value of their data assets. Having a new chief officer leading all the enterprise-wide management of data assets will ensure maximum benefits to the organization.


Agile Development of Data Products with R and Shiny: A Practical Approach

Many companies are interested in turning their data assets into products and services.  It’s not limited anymore to online firms like LinkedIn or Facebook, but there is a variety of companies in offline industries (GM, Apple etc) who have started to develop products and services based on analytics.

But how do you succeed at developing and launching data products?

I would like to suggest a framework building on the idea of Lean Startup and the Minimum Viable Product (MVP), to support the rapid modelling and development of innovative data products. These principles can be applied when launching a new tech start-up, starting a small business, or when starting a new initiative within a large corporation.

Agile Development of Data Products

A minimum viable product has just those core features that allow the product to be deployed, and no more. The product is typically deployed to a subset of possible customers, such as early adopters that are thought to be more forgiving, more likely to give feedback, and able to grasp a product vision from an early prototype or marketing information

Some of the benefits of prototyping and developing MVP:

1.       You can get valuable feedback from the users early in the project.

2.       Different stakeholders can compare if the model matches the specification.

3.       It allows the model developer some insight into the accuracy of initial project estimates and whether the deadlines and milestones proposed can be successfully met.

Our fully functional model will look like that:

Simulator Screenshot

Before we dive into the details, feel free to play around with the prototype and get familiar with the model

Ready to go?

Let’s go through the different stages of the process, following a simple example and using R and Shiny for modelling and prototyping. I’ve also published the code in my github repository , feel free to fork it and to play with it.


This is where your creativity should kick in! What problem do you try to solve by using data?

Are you aware of a gap in the industry that you currently work in?

Let’s follow a simple example:

Effective employee benefits will significantly reduce staff turnover and companies with the most effective benefits are using them to influence the behavior of their staff and their bottom line, as opposed to simply being competitive

How can you find this optimal point, balancing between the increasing cost of employee benefits and the need to retain staff and reduce staff turnover?


We will assume a (simplified) model that links the attrition rates to the benefits provided to the employee.

I know, it’s simplified. I’m also aware that there are many other relevant parameters in a real-life scenario.

But it’s just an example, so let’s move on.


Our simplified model depends on the following parameters:

  1. Number of Employees
  2. Benefits Saturation ($): we assume a linear dependency between the attrition rate and the benefits provided by the company. As the benefits increase, attrition rate drops, to 0% attrition at the point of Benefits Saturation. Any increase of benefits above the Benefits Saturation point will not have an impact on the attrition rates.
  3. Benefits ($): benefits provided by the company
  4. Max Attrition (%): maximal attrition rates at lowest benefits (100$)
  5. Training Period (months): number of months required to train a new employee
  6. Salary ($)

This model demonstrates the balance between increasing the benefits and the overall cost, and reducing the attrition rate and the associated cost related to hiring and training of new staff.

We will use R to implement our model:

R is an open-source software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

We will use R-Studio, a powerful and productive user interface for R. It’s free and open source, and works great on Windows, Mac, and Linux.

Let’s create a file named server.R and write some code. You can find the full source code in my github repository:

Cutoff is a function that models the attrition vs. benefits, where cutoffx is the value of Benefits Saturation:

cutoff <- function(minx, maxx,maxy,cutoffx,x)


ysat <- (x>=cutoffx)*0

slope <- ( 0 – maxy ) / ( cutoffx – minx )

yslope <- (maxy + (x-minx)*slope)*(x

return(ysat + yslope)


Calculating the different cost components:

benefitsCost <- input$numberee * input$benefits

attritionCost <- input$salary * nTrainingMonths * input$numberee * (currentAttrition / 100)

overallCost <- benefitsCost + attritionCost

The value of the variables starting with input$ is retrieved from the sliding bars, which are part of the User Interface (UI):


When changing the value of the slider named “Number Of Empolyees” from a value of 200 to a value of 300, the value of the variable input$numberee will change from 200 to 300 accordingly.

Benefits is a sequence of 20 numbers from 100 to 500, covering the range of benefits:

benefits <- round(seq(from = 100, to = 500, length.out = 20))

Let’s plot the benefits cost, the attrition cost and the overall cost as a function of the benefits:

Cost Vs Benefits

The different cost components are calculated below:

benefitsCostV <- input$numberee * benefits

attritionCostV <- input$salary * nTrainingMonths * input$numberee * (attrition / 100) totalCostV <- benefitsCostV + attritionCostV

And now we can plot the different cost components:

plot(benefitsCostV ~ benefits,col = “red”, main = “Cost vs. Benefits”, xlab = “benefits($)”, ylab = “cost($)”)

lines(benefits,benefitsCostV, col = “red”, lwd = 3)

points(benefits,attritionCostV, col = “blue”)

lines(benefits,attritionCostV, col = “blue”,lwd = 3)

points(benefits,totalCostV, col = “purple”)

lines(benefits,totalCostV, col = “purple”,lwd = 3)

Let’s find the minimal cost, and draw a nice orange circle around this optimal point:

minBenefitsIndex <- which.min(totalCostV)

minBenefits <- benefits[minBenefitsIndex]

minBenefitsCost <- totalCostV[minBenefitsIndex]

abline(v=minBenefits,col = “cyan”, lty = “dashed”, lwd = 1)

symbols(minBenefits,minBenefitsCost,circles=20, fg = “darkorange”, inches = FALSE, add=TRUE, lwd = 2)

Tip: Don’t spend too much time on writing the perfect R code; your model might change a lot once your stakeholders will provide their feedback.


Shiny is web application framework for R that will turn your analyses into interactive web applications.

Let’s install Shiny and import the library functions:



Once we have the model in place, we will create the user interface and link it back to the model.

Create a new file named ui.R with the User Interface (UI) elements.

For example, let’s write some text:

titlePanel(“Cost Optimization (Benefits and Talent) – Simulation”),

h5(“This interactive application simulates the impact of multiple …..

And let’s add a slider:


“Number of Enployees:”,

min = 100,

max = 1000,

value = 200,

step = 100),

The inputId of the slider (numberee) is linking the value of the UI control (number of Employees) to the server side computation engine.

Create a Shiny account, copy-paste the token and a secret to the command line and execute in R-Studio:

shinyapps::setAccountInfo(name=’benefits’, token=’xxxx’, secret=’yyyy’)

And, deploy your code to the Shiny server:


Your prototype is live!

Send the URL to your stakeholders and collect their feedback. Iterate quickly and improve the model and the user interface.


Once your stakeholders are happy with your prototype, it’s time to move on to the next stage and develop your data product.

The good news is that at this stage you should have a pretty good understanding of the requirements and the priorities, based on the feedback provided by your stakeholders.

It’s also the perfect time for you (or for your product development group) to focus more on hosting, architecture, design, the Software Development Life-cycle (SDLC), quality assurance, release management and more.

There are different technologies to consider when developing data products, which I will cover in future posts.

For now I will just mention an interesting option, where you can reuse your server-side R code. Using yhat you can expose your server-side R functionality via a set of web services, and consume these services from a client-side JavaScript libraries, like d3js.

Comments, questions?

Let me know.

Level up with Massive Open Online Courses (MOOC)

Last week I was attending the IDC FutureScape, an annual event where IDC, a leading market research, analysis and advisory firm, shared their top 10 decision imperatives for the 2015 CIO agenda.

The keynote speaker, Sandra Ng, Group Vice President at ICT Practice, went through the slides mentioning the newest technologies and keywords like Big Data and Analytics, Data Science, Internet of Things, Digital Transformation, IT as a service (ITaaS), Cyber Security, DevOps , Application Provisioning and more.

IDC Predictions

Eight years ago when I’ve completed my M.Sc in Computer Science, most of these technologies were either very new or didn’t exist at all. Since the IT landscape is evolving so fast, how can we keep up and stay relevant as IT professionals?

There is obviously the traditional way of registering for an instructor led training, be it PMP, Prince2, advanced .NET or Java, sitting in smaller groups for a couple of days, having a nice lunch and getting a colorful certificate.

But there are other options to access a world-class education.

A massive open online course (MOOC) is an online course aimed at unlimited participation and open access via the web. In addition to traditional course materials such as videos, readings, and problem sets, MOOCs provide interactive user forums that help build a community for students, professors, and teaching assistants (TAs).

Are you keen to learn more on how Google cracked house number identification in Street View, achieving more than 98% recognition rates on these blurry images?

StreetviewWhy don’t you join Stanford’s Andrew Ng and his online class of 100,000 students attending his famous Machine Learning course? I took this course two years ago, and this guy is awesome! So awesome that Baidu, the Chinese search engine, just hired him as a chief scientist to open a new artificial intelligence lab in Silicon Valley.

You can also join Stanford’s’ professors, Trevor Hastie and Robert Tibshirani, teaching Statistical Learning, using open source tools and a free version of the text book An Introduction to Statistical Learning, with Applications in R – yup, it’s all free!

There is a huge variety of online classes, from Science to Art to Technology, from top universities like Harvard, Berkeley, Yale and others – Google the name of the university plus “MOOC” and start your journey.

Level Up!