
scala-and-spark-for-big-data-and-machine-learning

Here I put my notes and code from the course Scala and Spark for Big Data and Machine Learning on Udemy.

Setup

Below I summarize the steps required for the local setup on Windows of Hadoop 2.7 and Spark 2.0 with Scala (the same procedure should also work for newer versions).

Step 1: Downloading JDK

Download the latest Java Development Kit that matches your system (32-bit vs. 64-bit). You can find the download page on Oracle's website or simply by googling "Java Development Kit".

When you reach the JDK download page, you should see something like this (in my case I picked the 64-bit version for my Windows):

Image 1

Before downloading, you need to accept the License Agreement and register an account:

Image 2

Step 2: Downloading Hadoop & Spark

Go to [spark.apache.org/downloads.html](http://spark.apache.org/downloads.html) and download a pre-built version of Spark.

Image 3

In my case the required version (Spark 2.0 or later, pre-built for Hadoop 2.7 and later) was a bit older than the current one, so I had to find it among the previous releases:

Image 3

Step 3: Installation of JDK

Go to the folder where you downloaded the Java Development Kit (in my case, Downloads).

Image 3

Click the installer and install. I followed the default settings, so in my case everything was installed under C:\Program Files\Java, but it is important to remember the path, as we will use it later.

Image 3
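
If you want to double-check which JDK a JVM process actually picks up later on (for example from the Scala REPL or spark-shell), a minimal sketch:

```scala
// Print the Java version and installation path the current JVM is using.
println(System.getProperty("java.version"))  // e.g. 1.8.0_xxx
println(System.getProperty("java.home"))     // should point under C:\Program Files\Java
```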

Step 4: Installing Hadoop

Extract the downloaded tar.gz file; in my case it is the spark-2.0.2-bin-hadoop2.7-tar.gz file. You may need to extract it twice (first the .gz, then the .tar) in order to get the full folder to show.

Once you have the extracted folder, go to your C drive, create a new folder called Spark, and copy the contents of the extracted spark-2.0.2-bin-hadoop2.7 folder into this new Spark folder you just created.
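
To confirm the contents landed directly under C:\Spark rather than inside an extra nested folder, a small sketch that can be run later from the Scala REPL (the path simply mirrors the folder created above):

```scala
import java.io.File

// spark-shell.cmd should sit directly under C:\Spark\bin if the copy was done correctly.
val sparkShell = new File("""C:\Spark\bin\spark-shell.cmd""")
println(s"spark-shell.cmd found: ${sparkShell.exists()}")
```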

Step 5: Placing winutils.exe
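
Briefly: winutils.exe is a small helper binary that Hadoop needs on Windows, placed in a bin subfolder of the directory that HADOOP_HOME (set in the next step) points to. A minimal check, assuming the common C:\winutils\bin location (adjust to wherever you actually put the file):

```scala
import java.io.File

// Hypothetical location -- adjust to wherever you placed winutils.exe.
val winutils = new File("""C:\winutils\bin\winutils.exe""")
println(if (winutils.exists()) "winutils.exe is in place" else "winutils.exe not found")
```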

Step 6: Setting PATH and environment variables
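
Once the variables are defined (typically JAVA_HOME, HADOOP_HOME and SPARK_HOME plus an updated PATH; the exact names here are an assumption about this step), you can confirm from a freshly opened Scala REPL or spark-shell that the JVM sees them:

```scala
// <not set> usually means the variable is missing or the console was opened before the change.
Seq("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME").foreach { name =>
  println(s"$name = ${sys.env.getOrElse(name, "<not set>")}")
}
```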

Step 7: Testing
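
For the test itself, one option is to open a new command prompt, run spark-shell from C:\Spark\bin, and evaluate a couple of lines once the prompt appears. A minimal smoke-test sketch (assuming Spark 2.x, where a SparkSession is already available as spark):

```scala
// Build a small Dataset of the numbers 1..100 and run two simple actions on it.
val nums = spark.range(1, 101)
println(nums.count())                        // expected: 100
nums.selectExpr("sum(id) AS total").show()   // expected total: 5050
```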

Step 8: Installing Atom (additional step)

Installing version 1.51 is recommended: with more recent releases there is an error during the platformio installation if you do not have Visual Studio Code installed. https://github.com/atom/atom/releases/tag/v1.51.0
