Hadoop local environment on win10

Hi there. Why do I need a Hadoop local environment? Am I crazy? I have two reasons to burn my PC running a local Hadoop environment. First, I need an environment to explore and learn without the risk of breaking something on the company servers. Second, I recently got a request to import some Oracle tables into HDFS with Sqoop, and in order to do some development testing (again, without breaking anything) I need a safe place to play. That said, you need two things, maybe three, but one is optional.

1- (Optional) Install a WSL (Windows Subsystem for Linux) distribution like Ubuntu LTS from the Microsoft Store; Docker will benefit from this with better performance.
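If you prefer the command line, recent Windows 10 builds can install WSL and a distribution in one step. This is a minimal sketch assuming your build already ships the wsl --install command (run it from an elevated PowerShell); otherwise install it from the Microsoft Store as described above:

wsl --install -d Ubuntu

# After the reboot, confirm the distro is registered and running on WSL 2
wsl --list --verbose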

2- Download and Install Docker

Tip. If you have a small SSD as your primary storage, I recommend moving the Docker images to another drive; you can follow the steps from the answer by Franks CHOW on Stack Overflow.

First, shut down Docker Desktop by right-clicking the Docker Desktop icon and selecting Quit Docker Desktop.

Then, open your command prompt:

wsl --list -v

You should see output like the following; make sure the STATE for both is Stopped.

  NAME                   STATE           VERSION
* docker-desktop         Stopped         2
  docker-desktop-data    Stopped         2

Export docker-desktop-data into a file:

wsl --export docker-desktop-data "D:\Docker\wsl\data\docker-desktop-data.tar"

Unregister docker-desktop-data from WSL. Note that after this, your ext4.vhdx file will automatically be removed (so back it up first if you have important existing images/containers):

wsl --unregister docker-desktop-data

Import docker-desktop-data back into WSL; now the ext4.vhdx will reside in a different drive/directory:

wsl --import docker-desktop-data "D:\Docker\wsl\data" "D:\Docker\wsl\data\docker-desktop-data.tar" --version 2

Start Docker Desktop again and it should work.

You may delete the D:\Docker\wsl\data\docker-desktop-data.tar file (NOT the ext4.vhdx file) once you have verified that everything works.
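A quick way to verify, simply reusing the commands from above, is to list the WSL distributions again and check that the ext4.vhdx landed in the new location (the D:\Docker\wsl\data path is the one used in the import step; adjust it if you picked a different drive):

wsl --list -v
dir "D:\Docker\wsl\data"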

3- Download the HDP sandbox from Cloudera

Tip. You should have more than 16 GB of RAM in order to use the sandbox; I have 22 GB and it is close to 100% usage.
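If the sandbox pushes your machine to the limit, you can cap how much RAM the WSL 2 backend (and therefore Docker) is allowed to take. This is a minimal sketch of a %UserProfile%\.wslconfig file; the memory and processor values are just examples you should tune to your own hardware:

[wsl2]
memory=16GB
processors=4

Run wsl --shutdown afterwards (with Docker Desktop stopped) so the new limits take effect.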

4- You need a bash command line; you can use either Git Bash if you already have it, or you can go with the WSL Linux bash you have chosen.

Tip. If you choose WSL bash, you can find your C: drive under the /mnt/c path.
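For example, if you extracted the sandbox archive into your Windows Downloads folder, you could reach it from WSL bash like this (the user name and folder name are placeholders, use your own):

cd /mnt/c/Users/<your-user>/Downloads/HDP_sandbox
ls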

5- Install the Cloudera sandbox. In the decompressed folder, you will find the shell script docker-deploy-{HDPversion}.sh. From the command line (Linux / Mac / Windows Git Bash), run the script:

cd /path/to/script
sh docker-deploy-{HDPversion}.sh

Then verify HDP sandbox was deployed successfully by issuing the command:

docker ps
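If the deployment worked, docker ps should include the two sandbox containers, sandbox-hdp and sandbox-proxy (the same names used by the stop commands below). A compact way to check just names and status, if you prefer less output:

docker ps --format "table {{.Names}}\t{{.Status}}"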

When you want to stop or start your HDP sandbox, use the corresponding Docker commands, for example:

docker stop sandbox-hdp
docker stop sandbox-proxy
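And the matching commands to bring it back up, assuming the same container names from the default deployment:

docker start sandbox-hdp
docker start sandbox-proxy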

And now you are ready to use Hadoop. You should be able to use Ambari and the sandbox splash page from

127.0.0.1:8080/#/main/services/HIVE/configs

127.0.0.1:1080/splash.html#
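And, coming back to the original reason for all of this, here is roughly what a Sqoop import of an Oracle table could look like once you are inside the sandbox. Everything in it (host, port, service name, user, table, password file, target directory) is a placeholder to replace with your own values; treat it as a sketch, not a recipe:

sqoop import \
  --connect jdbc:oracle:thin:@//my-oracle-host:1521/MYSERVICE \
  --username MY_USER \
  --password-file /user/my_hdfs_user/.oracle_pass \
  --table MY_SCHEMA.MY_TABLE \
  --target-dir /user/my_hdfs_user/my_table \
  --num-mappers 1

Keep in mind you will also need the Oracle JDBC driver (the ojdbc jar) on Sqoop's classpath inside the container for the connection to work.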