
scala-and-spark-for-big-data-and-machine-learning

Here I put my notes and code from the course Scala and Spark for Big Data and Machine Learning on Udemy.

Setup

Below I summarize the steps required for the local setup on Windows of Hadoop 2.7 and Spark 2.0 with Scala (the same procedure should also work for newer versions).

Step 1: Downloading JDK

Download the latest Java Development Kit that matches your system (32-bit vs. 64-bit). You can find the download page on Oracle's website or simply by googling "Java Development Kit".

When you reach the JDK download page, you should see something like this (in my case I picked the 64-bit version for my Windows):

Image 1

Before downloading, you need to accept the License Agreement and register an account:

Image 2

Step 2: Downloading Hadoop & Spark

Go to [spark.apache.org/downloads.html](http://spark.apache.org/downloads.html) and download a pre-built version of Spark.

Image 3

In my case the required version (Spark 2.0 or later, pre-built for Hadoop 2.7 and later) was a bit older than the current one, so I had to find it among the previous releases:

Image 3

Step 3: Installation of JDK

Go to the folder where you downloaded the Java Development Kit (in my case, Downloads).

Image 3

Click the installer and install. I followed the default settings, so in my case everything was installed under C:\Program Files\Java, but it is important to remember the path, as we will use it later.

Image 3
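
If you want to double-check which JDK a JVM process actually picks up later on (for example from the Scala REPL or spark-shell), a minimal sketch:

```scala
// Print the Java version and installation path the current JVM is using.
println(System.getProperty("java.version"))  // e.g. 1.8.0_xxx
println(System.getProperty("java.home"))     // should point under C:\Program Files\Java
```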

Step 4: Installing Hadoop

Extract the downloaded tar.gz file; in my case it is the spark-2.0.2-bin-hadoop2.7-tar.gz file. You may need to extract it twice (first the .gz, then the .tar) in order to get the full folder to show.

Once you have the extracted folder, go to your C drive, create a new folder called Spark, and copy the contents of the extracted spark-2.0.2-bin-hadoop2.7 folder into this new Spark folder you just created.
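
To confirm the contents landed directly under C:\Spark rather than inside an extra nested folder, a small sketch that can be run later from the Scala REPL (the path simply mirrors the folder created above):

```scala
import java.io.File

// spark-shell.cmd should sit directly under C:\Spark\bin if the copy was done correctly.
val sparkShell = new File("""C:\Spark\bin\spark-shell.cmd""")
println(s"spark-shell.cmd found: ${sparkShell.exists()}")
```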

Step 5: Placing winutils.exe
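
Briefly: winutils.exe is a small helper binary that Hadoop needs on Windows, placed in a bin subfolder of the directory that HADOOP_HOME (set in the next step) points to. A minimal check, assuming the common C:\winutils\bin location (adjust to wherever you actually put the file):

```scala
import java.io.File

// Hypothetical location -- adjust to wherever you placed winutils.exe.
val winutils = new File("""C:\winutils\bin\winutils.exe""")
println(if (winutils.exists()) "winutils.exe is in place" else "winutils.exe not found")
```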

Step 6: Setting PATH and environment variables
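
Once the variables are defined (typically JAVA_HOME, HADOOP_HOME and SPARK_HOME plus an updated PATH; the exact names here are an assumption about this step), you can confirm from a freshly opened Scala REPL or spark-shell that the JVM sees them:

```scala
// <not set> usually means the variable is missing or the console was opened before the change.
Seq("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME").foreach { name =>
  println(s"$name = ${sys.env.getOrElse(name, "<not set>")}")
}
```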

Step 7: Testing
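
For the test itself, one option is to open a new command prompt, run spark-shell from C:\Spark\bin, and evaluate a couple of lines once the prompt appears. A minimal smoke-test sketch (assuming Spark 2.x, where a SparkSession is already available as spark):

```scala
// Build a small Dataset of the numbers 1..100 and run two simple actions on it.
val nums = spark.range(1, 101)
println(nums.count())                        // expected: 100
nums.selectExpr("sum(id) AS total").show()   // expected total: 5050
```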

Step 8: Installing Atom (additional step)

Installing version 1.51 is recommended: with more recent releases there is an error during the platformio installation if you do not have Visual Studio Code installed. https://github.com/atom/atom/releases/tag/v1.51.0
