Installing and using Hadoop and PySpark on Ubuntu with VirtualBox and VMware
Download the Ubuntu 22.04.2 desktop image (ubuntu-22.04.2-desktop-amd64.iso) and use it to create the virtual machine in VirtualBox or VMware.
Switch to the root account and add the ubuntu user to the sudo group:
su
usermod -aG sudo ubuntu
sudo apt install default-jdk
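To confirm the JDK is available before configuring Hadoop, the installed version can be checked (a quick sanity check, not part of the original steps):
java -version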
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xvzf hadoop-3.3.6.tar.gz
sudo mv hadoop-3.3.6 /usr/local/hadoop
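Hadoop expects JAVA_HOME and HADOOP_HOME to be set. A minimal sketch of the lines to append to ~/.bashrc, assuming the default-java symlink created by the default-jdk package and the /usr/local/hadoop location used above:
# Append to ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/default-java    # symlink provided by the default-jdk package
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload the shell configuration and confirm Hadoop is on the PATH
source ~/.bashrc
hadoop version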
wget https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh
bash Anaconda3-2023.07-1-Linux-x86_64.sh
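Anaconda does not include PySpark by default. One way to add it, assuming the Anaconda base environment is active after the installer finishes:
# Install PySpark into the active Anaconda environment
conda install -c conda-forge pyspark
# or, equivalently:
pip install pyspark
The pip/conda package bundles Spark itself, so no separate Spark download is needed for local use.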
In the following screenshots, PySpark is used with Hadoop and the prediction results are successfully written to the local file system:
predictions.select('prediction', 'label').write \
    .format('csv') \
    .option('header', 'true') \
    .save('predictions.csv')
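For context, a minimal sketch of how a predictions DataFrame like the one written above could be produced; the column names, toy data, and LogisticRegression model here are illustrative assumptions, not the original pipeline:
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName('example').getOrCreate()

# Hypothetical toy data: two feature columns and a binary label
df = spark.createDataFrame(
    [(0.0, 1.1, 0.0), (2.0, 1.0, 1.0), (2.1, 1.3, 1.0), (0.3, 0.9, 0.0)],
    ['f1', 'f2', 'label'])

# Assemble the feature columns into the single 'features' vector Spark ML expects
assembled = VectorAssembler(inputCols=['f1', 'f2'], outputCol='features').transform(df)

# Fit a simple classifier and score the data, which adds the 'prediction' column
model = LogisticRegression(featuresCol='features', labelCol='label').fit(assembled)
predictions = model.transform(assembled)
Note that Spark saves 'predictions.csv' as a directory of part files in the working directory rather than a single CSV file.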
Connecting to GitHub (here via the GitKraken client):
sudo dpkg -i gitkraken-amd64.deb
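This assumes gitkraken-amd64.deb has already been downloaded from the GitKraken website. GitKraken still needs a Git identity and credentials to reach GitHub; a minimal sketch, with the name and e-mail below as placeholders and SSH as the chosen authentication method:
# Set the Git identity used for commits
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
# Generate an SSH key and add the public key (~/.ssh/id_ed25519.pub) to the GitHub account
ssh-keygen -t ed25519 -C "you@example.com"
# Confirm the connection once the key has been added on github.com
ssh -T git@github.com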
Jupyter Lab:
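JupyterLab ships with Anaconda, so no separate installation is needed; it can be launched from the Anaconda environment:
# Start JupyterLab; it opens in the default browser
jupyter lab
Inside a notebook, a SparkSession can then be created with SparkSession.builder.getOrCreate(), as in the PySpark sketch above.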