#######################################################################
# Dockerfile to build image for Python-Alpine container that also has
# ssh access.
#
# Use python:alpine image and install needed packages for ssh:
# - openrc, system services control system
# - openssh, client- and server side ssh
# - sudo, utility to enable root rights to users
#
FROM python:3.9.0-alpine
RUN apk update
RUN apk add --no-cache openrc
RUN apk add --update --no-cache openssh
RUN apk add --no-cache sudo
# adjust sshd configuration
RUN echo 'PasswordAuthentication yes' >> /etc/ssh/sshd_config
RUN echo 'PermitEmptyPasswords yes' >> /etc/ssh/sshd_config
RUN echo 'IgnoreUserKnownHosts yes' >> /etc/ssh/sshd_config
# add user larry with empty password
RUN adduser -h /home/larry -s /bin/sh -D larry
RUN echo -n 'larry:' | chpasswd
# add larry to sudo'ers list
RUN mkdir -p /etc/sudoers.d
RUN echo '%wheel ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers.d/wheel
RUN adduser larry wheel
# generate host key
RUN ssh-keygen -A
# add sshd as service, start on boot [default], touch file to prevent error:
# "You are attempting to run an openrc service on a system which openrc did not boot."
RUN rc-update add sshd default
RUN mkdir -p /run/openrc
RUN touch /run/openrc/softlevel
# sshd is started in /entrypoint.sh
#
#######################################################################
ENTRYPOINT ["/entrypoint.sh"]
EXPOSE 22
COPY entrypoint.sh /
#!/bin/sh
# ssh-keygen -A
exec /usr/sbin/sshd -D -e "$@"
#######################################################################
# Dockerfile to build image for Alpine container that has sshd daemon.
#
# Use bare Alpine image and install all needed packages:
# - openrc, system services control system
# - openssh, client- and server side ssh
# - sudo, utility to enable root rights to users
#
FROM alpine:latest
RUN apk update
RUN apk add --no-cache openrc
RUN apk add --update --no-cache openssh
RUN apk add --no-cache sudo
# adjust sshd configuration
RUN echo 'PasswordAuthentication yes' >> /etc/ssh/sshd_config
RUN echo 'PermitEmptyPasswords yes' >> /etc/ssh/sshd_config
RUN echo 'IgnoreUserKnownHosts yes' >> /etc/ssh/sshd_config
# add user larry with empty password
RUN adduser -h /home/larry -s /bin/sh -D larry
RUN echo -n 'larry:' | chpasswd
# add larry to sudo'ers list
RUN mkdir -p /etc/sudoers.d
RUN echo '%wheel ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers.d/wheel
RUN adduser larry wheel
# generate host key
RUN ssh-keygen -A
# add sshd as service, start on boot [default], touch file to prevent error:
# "You are attempting to run an openrc service on a system which openrc did not boot."
RUN rc-update add sshd default
RUN mkdir -p /run/openrc
RUN touch /run/openrc/softlevel
# sshd is started in /entrypoint.sh
#
#######################################################################
ENTRYPOINT ["/entrypoint.sh"]
EXPOSE 22
COPY entrypoint.sh /
### Docker-compose
[Docker-compose](https://docs.docker.com/compose/features-uses) is a tool for Docker that
automates building, configuring and running containers from a single file: the
[docker-compose.yaml](https://docs.docker.com/compose/compose-file) specification.
Containers are referred to as *services* in the Docker-compose specification.
When a specified image does not exist and the `build` entry of a service refers to a
directory from which the container can be built with a
[Dockerfile](https://docs.docker.com/engine/reference/builder) (which must be in the same
directory as `docker-compose.yaml`):
```
docker-compose up -d
```
will automatically perform these steps (`-d` starts the container in the background):
1. build the image from `Dockerfile`,
1. register the image locally,
1. create a new container from the image,
1. register the container locally and
1. start it.

Multiple containers can be specified in a single `docker-compose.yaml` file and
started in a defined order expressed by dependencies (`depends_on` tag, e.g. to
express that a database service must be started before an application service
that depends on it).
To stop all services specified in a `docker-compose.yaml` file:
```
docker-compose stop
```
To (re-)start services and show their running states:
```
docker-compose start
docker-compose ps
```
The `alpine-sshd` container can therefore always be fully reproduced from the
specifications in this directory.
Images and containers should always be reproducible. They can be deleted at any
time and recovered from their specifications.
Container specifications are therefore common in code repositories that control
automated *build* and *deployment* processes.
The principle implies that state ("data" such as databases) should not be
stored in containers and should instead reside on outside volumes that are
[mounted](https://docs.docker.com/storage/volumes)
into the container.
Build and start `alpine-sshd` container from scratch:
```
docker-compose up
[+] Running 0/1
- alpine-sshd Error 2.5s
[+] Building 0.3s (23/23) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> [ 1/18] FROM docker.io/library/alpine:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 34B 0.0s
=> CACHED [ 2/18] RUN apk update 0.0s
=> CACHED [ 3/18] RUN apk add --no-cache openrc 0.0s
=> CACHED [ 4/18] RUN apk add --update --no-cache openssh 0.0s
=> CACHED [ 5/18] RUN apk add --no-cache sudo 0.0s
=> CACHED [ 6/18] RUN echo 'PasswordAuthentication yes' >> /etc/ssh/sshd 0.0s
=> CACHED [ 7/18] RUN echo 'PermitEmptyPasswords yes' >> /etc/ssh/sshd_c 0.0s
=> CACHED [ 8/18] RUN echo 'IgnoreUserKnownHosts yes' >> /etc/ssh/sshd_c 0.0s
=> CACHED [ 9/18] RUN adduser -h /home/larry -s /bin/sh -D larry 0.0s
=> CACHED [10/18] RUN echo -n 'larry:' | chpasswd 0.0s
=> CACHED [11/18] RUN mkdir -p /etc/sudoers.d 0.0s
=> CACHED [12/18] RUN echo '%wheel ALL=(ALL) NOPASSWD: ALL' >> /etc/sudo 0.0s
=> CACHED [13/18] RUN adduser larry wheel 0.0s
=> CACHED [14/18] RUN ssh-keygen -A 0.0s
=> CACHED [15/18] RUN rc-update add sshd default 0.0s
=> CACHED [16/18] RUN mkdir -p /run/openrc 0.0s
=> CACHED [17/18] RUN touch /run/openrc/softlevel 0.0s
=> CACHED [18/18] COPY entrypoint.sh / 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.0s
=> => writing image sha256:5664d856423d679de32c4b58fc1bb55d5973acb62507d 0.0s
=> => naming to docker.io/library/alpine-sshd 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 2/2
- Network alpine-sshd_default C... 0.1s
- Container alpine-sshd-alpine-sshd-1 Created 0.2s
Attaching to alpine-sshd-alpine-sshd-1
alpine-sshd-alpine-sshd-1 | Server listening on 0.0.0.0 port 22.
alpine-sshd-alpine-sshd-1 | Server listening on :: port 22.
```
Show running container:
```
docker-compose ps
NAME COMMAND SERVICE STATUS PORTS
alpine-sshd-alpine-sshd-1 "/entrypoint.sh" alpine-sshd running 0.0.0.0:22->22/tcp
```
Log in as user *larry*, who was configured when the container was built from
the `Dockerfile`:
```
ssh larry@localhost
```
Output:
```
ssh larry@localhost
The authenticity of host 'localhost (::1)' can't be established.
ED25519 key fingerprint is SHA256:5ZZ4bnRJh3DxlDaWJooC1qYjKj00U+pHCuNGEWZPVqA.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Could not create directory '/cygdrive/c/Sven1/svgr/.ssh' (No such file or directory).
Failed to add the host to the list of known hosts (/cygdrive/c/Sven1/svgr/.ssh/known_hosts).
Welcome to Alpine!
The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org/>.
You can setup the system with the command: setup-alpine
You may change this message by editing /etc/motd.
85dbbb7c316a:~$ ls -la
total 16
drwxr-sr-x 1 larry larry 4096 Nov 1 12:48 .
drwxr-xr-x 1 root root 4096 Oct 7 18:31 ..
-rw------- 1 larry larry 7 Nov 1 12:48 .ash_history
85dbbb7c316a:~$ whoami
larry
85dbbb7c316a:~$ pwd
/home/larry
85dbbb7c316a:~$
```
Stop container:
```
docker-compose stop
- Container alpine-sshd-alpine-sshd-1 Stopped 0.3s
docker-compose ps
NAME COMMAND SERVICE STATUS PORTS
alpine-sshd-alpine-sshd-1 "/entrypoint.sh" alpine-sshd exited (0)
```
Restart same container:
```
docker-compose start
[+] Running 1/1
- Container alpine-sshd-alpine-sshd-1 Started 0.4s
docker-compose ps
NAME COMMAND SERVICE STATUS PORTS
alpine-sshd-alpine-sshd-1 "/entrypoint.sh" alpine-sshd running 0.0.0.0:22->22/tcp
```
#######################################################################
# Create, start/stop container with docker-compose.
#
# Build image and container (once):
# - docker-compose up -d
# creates/builds image from Dockerfile, creates and runs container:
# - new local image: alpine-sshd from Dockerfile
# - new container created from image and started.
# --> container is running
#
# From the image, the (same) container can be restarted and stopped:
# - docker-compose start
# - docker-compose stop
#
#######################################################################
services:
  alpine-sshd:
    build: .
    image: alpine-sshd
    ports:
      - "22:22" # host-env:container
    # command: ["exec", "/usr/sbin/sshd"]
#!/bin/sh
# ssh-keygen -A
exec /usr/sbin/sshd -D -e "$@"
# /etc/init.d/sshd start
# service sshd restart
#######################################################################
# Steps to set up a bare Alpine container for ssh access.
#
# create and run bare Alpine container with name "alpine-ssh"
docker run --name alpine-ssh -p 22:22 -it alpine:latest
#######################################################################
# update package list and install all needed packages
# - openrc, system services control system
# - openssh, client- and server side ssh
# - sudo, utility to enable root rights to users
#
apk update
apk add --no-cache openrc
apk add --update --no-cache openssh
apk add --no-cache sudo
# adjust sshd configuration
echo 'PasswordAuthentication yes' >> /etc/ssh/sshd_config
echo 'PermitEmptyPasswords yes' >> /etc/ssh/sshd_config
echo 'IgnoreUserKnownHosts yes' >> /etc/ssh/sshd_config
# add user larry with empty password
adduser -h /home/larry -s /bin/sh -D larry
echo -n 'larry:' | chpasswd
# add larry to sudo'ers list
mkdir -p /etc/sudoers.d
echo '%wheel ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers.d/wheel
adduser larry wheel
# generate host key in /etc/ssh, e.g. /etc/ssh/ssh_host_rsa_key.pub
ssh-keygen -A
# add sshd as service, start on boot [default], touch file to prevent error:
# "You are attempting to run an openrc service on a system which openrc did not boot."
# rc-update add sshd default
rc-update add sshd
mkdir -p /run/openrc
touch /run/openrc/softlevel
# start sshd - ssh larry@localhost now working
# /etc/init.d/sshd start
# service sshd start
# ---- exec prevents shell as parent process
# exec /usr/sbin/sshd -D -e &
exec /usr/sbin/sshd &
#
#######################################################################
#
# stop alpine-ssh container
docker stop alpine-ssh
# start alpine-ssh container, create root sh (exec requires a started container)
docker start alpine-ssh
docker exec -it alpine-ssh /bin/sh
# start sshd in container
/etc/init.d/sshd restart
service sshd restart
/etc/init.d/sshd status
service sshd status
#######################################################################
# build docker container "alpine-sshd" from Dockerfile, entrypoint.sh
# image file "alpine-sshd" is 18.5 MB
docker build -t alpine-sshd .
docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
alpine-sshd latest 0d286d424c80 1 min ago 18.5MB
docker run --name alpine-sshd -p 22:22 -it -d alpine-sshd:latest
docker start alpine-sshd
docker exec -it alpine-sshd /bin/sh
docker stop alpine-sshd
#######################################################################
# References:
# How to enable and start services on Alpine Linux
# https://www.cyberciti.biz/faq/how-to-enable-and-start-services-on-alpine-linux
# How to install OpenSSH server on Alpine Linux
# https://www.cyberciti.biz/faq/how-to-install-openssh-server-on-alpine-linux-including-docker
# https://wiki.alpinelinux.org/wiki/Setting_up_a_SSH_server
# How To Set Up a Firewall with Awall on Alpine Linux
# https://www.cyberciti.biz/faq/how-to-set-up-a-firewall-with-awall-on-alpine-linux/
# Add, Delete And Grant Sudo Privileges To Users In Alpine Linux
# https://ostechnix.com/add-delete-and-grant-sudo-privileges-to-users-in-alpine-linux/
#
#######################################################################
# Jupyter with Docker Compose
This repository contains a simple docker-compose definition for launching the popular Jupyter Data Science Notebook.
You can define a password with the script ```generate_token.py -p S-E-C-R-E-T``` and generate SSL certificates as described below.
## Control the container:
* ```docker-compose up``` mounts the directory and starts the container
* ```docker-compose down``` destroys the container
## The compose file: docker-compose.yml
```yaml
version: '3'
services:
  datascience-notebook:
    image: jupyter/datascience-notebook
    volumes:
      - ${LOCAL_WORKING_DIR}:/home/jovyan/work
      - ${LOCAL_DATASETS}:/home/jovyan/work/datasets
      - ${LOCAL_MODULES}:/home/jovyan/work/modules
      - ${LOCAL_SSL_CERTS}:/etc/ssl/notebook
    ports:
      - ${PORT}:8888
    container_name: jupyter_notebook
    command: "start-notebook.sh \
      --NotebookApp.password=${ACCESS_TOKEN} \
      --NotebookApp.certfile=/etc/ssl/notebook/jupyter.pem"
```
## Example with a custom user
```YAML
version: '2'
services:
  datascience-notebook:
    image: jupyter/base-notebook:latest
    volumes:
      - /tmp/jupyter_test_dir:/home/docker_worker/work
    ports:
      - 8891:8888
    command: "start-notebook.sh"
    user: root
    environment:
      NB_USER: docker_worker
      NB_UID: 1008
      NB_GID: 1011
      CHOWN_HOME: 'yes'
      CHOWN_HOME_OPTS: -R
```
## The environment file .env
```bash
# Define a local data directory
# Set permissions for the container:
# sudo chown -R 1000 ${LOCAL_WORKING_DIR}
LOCAL_WORKING_DIR=/data/jupyter/notebooks
# Generate an access token like this
# import IPython as IPython
# hash = IPython.lib.passwd("S-E-C-R-E-T")
# print(hash)
# You can use the script generate_token.py
ACCESS_TOKEN=sha1:d4c78fe19cb5:0c8f830971d52da9d74b9985a8b87a2b80fc6e6a
# Host port
PORT=8888
# Provide data sets
LOCAL_DATASETS=/data/jupyter/datasets
# Provide local modules
LOCAL_MODULES=/home/git/python_modules
# SSL
# Generate cert like this:
# openssl req -x509 -nodes -newkey rsa:2048 -keyout jupyter.pem -out jupyter.pem
# Copy the jupyter.pem file into the location below.
LOCAL_SSL_CERTS=/opt/ssl-certs/jupyter
```
# Version Conflicts
Make sure you have the latest versions installed. You can run the upgrade from the notebook's browser interface:
```python
pip install -U jupyter
```
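To verify which versions are actually active in the running kernel, a minimal check
from a notebook cell:
```python
# Minimal sketch: print the versions active in the running kernel.
import sys
import IPython

print(sys.version)
print("IPython", IPython.__version__)
```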
# pip install -r ~/work/requirements.txt
# export PYTHONPATH=PYTHONPATH:/home/jovyan/work
version: '3'
#
# adopted from:
# https://github.com/dsmits/jupyter-docker-compose/blob/main/docker-compose.yml
# using jupyter/minimal-notebook image 1.47 GB
#
# must enable: Docker Desktop -> Settings -> Resources -> File Sharing
# path: C:/Sven1/svgr/workspaces/cs4bigdata/B_setup_docker/jupyter
#
# last started with:
# http://localhost:8888/lab?token=cbc1aa82b9144579334507e07607d4c83d8dfa9e11473df8
#
services:
  jupyter:
    image: jupyter/minimal-notebook
    volumes:
      - .:/home/jovyan/work
      - ./configure_environment.sh:/usr/local/bin/before-notebook.d/configure_environment.sh
    ports:
      - 8888:8888
#
# alternative:
# https://github.com/stefanproell/jupyter-notebook-docker-compose
# using jupyter/datascience-notebook image 4.56 GB
#
# services:
#   datascience-notebook:
#     # image: jupyter/datascience-notebook
#     image: jupyter/minimal-notebook
#     volumes:
#       - ${LOCAL_WORKING_DIR}:/home/jovyan/work
#       - ${LOCAL_DATASETS}:/home/jovyan/work/datasets
#       - ${LOCAL_MODULES}:/home/jovyan/work/modules
#       - ${LOCAL_SSL_CERTS}:/etc/ssl/notebook
#     ports:
#       - ${PORT}:8888
#     container_name: jupyter_notebook
#     command: "start-notebook.sh \
#       --NotebookApp.password=${ACCESS_TOKEN} \
#       --NotebookApp.certfile=/etc/ssl/notebook/jupyter.pem"
#!/usr/bin/env python
# Generate an access token for the Jupyter notebook:
#   python generate_token.py --password=password
#
# Copy the printed line into the .env file, e.g.:
#   ACCESS_TOKEN=5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
#
# Note: this prints a plain SHA-1 hex digest; the salted 'sha1:<salt>:<hash>'
# form shown in the .env example can be produced with IPython.lib.passwd instead.
# import IPython as IPython
# import bcrypt
import hashlib

if __name__ == "__main__":
    print("Generate an access token")
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument("-p",
                        "--password",
                        dest="password",
                        help="The password you want to use for authentication.",
                        required=True)
    args = parser.parse_args()
    print("\nCopy this line into the .env file:\n")
    # hash = IPython.lib.passwd(args.password)
    hash = hashlib.sha1(args.password.encode()).hexdigest()
    print("ACCESS_TOKEN=" + hash)
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQCgq1Hakw3KI+uv
d3LSJWGdl+uUX0FkVInUuFzQGDsyToPOo9qIIsYP5Avo3Tn7AXoQKoYz/zR0MOms
WBqkztah9ahS8uUkDkwHcJ6BIFZ9UApWQ+klEnHkBgSzDjfZhDZT/sIlEdCLOYsi
agpUdZtnDKmkpWTAAn4Tc6vLMj5Xi7mqLDybbsVlyZ/Iy/nvG1pZknmE/R7HOs8g
IkuZ7KKvbaaN6nOfujeCOCCmTyCefjU0famZxzVJxAbUwo40BMT6w7VVmUOwoNS0
LUtcVUDZ+alINrO59fu7oKyQgVr2yeVyoqUJPD/FgS/fFXSAoEtf5e1ekMXgcvl4
EGacVtwfAgMBAAECggEAGmYLvO4MhfoA74OgygZ6U3pyqp48EFATlW/1T/urPkjI
P1uMvHF6OYIussQmkqdbduyFwGVeKPkga8DOH+YcPeAvF/Hw1EvFEjPe1ziI/W35
RNNDq2OsctrKSuE7K/IdOw/QtmaG7Vk3EyB5Mgdg0T2zYeoK88F1FZ0bzPckZx27
X9Df5AXJdAEx2ikK5+vG6hDpKgDGHa5e3+96PRTqlv0AJqw9FDGLpheCNZ6F/TnR
KQWhyG3F8njQ/B97L3yiY7K7fZ/XQbgnvxFtKdSwdCkptKjQCq53alwh09Qb9YWo
l5J+TZa3xpgZX/0QEwzpSsh7kcRFZ2Zu+8fVpw14AQKBgQDTZ7D69DCblv367+c7
7PmWQkQjEGUlRxi0GRoBAgQ/FcfNnKhJVKSYThjtPcAt7E8YM/A8VlVtOiTrZnS7
Z+7CT+jRIK03jmegDgAegERO/oiH3jIRwLLFYFlOzVj3/XlwEbtRjQR2IdZkaeFC
mx9wqu5y5Vo7ml98H2Vsg8O0XwKBgQDCj8oA3wZOePSWH2Lo+BU9rP3/2SjmIzZZ
EhpO3ARa8tuxS37wFXuRtWDPb2NO2b+yzo9zxrA0Fbyx8kl3Jg03iH+eNQh12A/m
HikUoK6PgdKPZcYjONKRcH7+SUsBvxOkO1y5zqRzWwQvsAK+QaDky14IHSnRIOWx
BrkEZJPwQQKBgFPqDuguUbUQ5FPdMm4pDJFGUIGSmnOHmxix9g58XG8mGB9Xlb01
6ffC2EYjgss3x9WVmEB7DIHE2K7QBnn1MWLUEVghnmA1GJEBva5dv7+TbWJxInLF
iLCsJAcRn8UgSjnf7/jY/vJdUBqfpJiptnskfm4A+CY8irZcSAgg7WgFAoGBAKeX
PCWr9r65qdV2i7ipmYJa9R/haz1xr2riEQ9Ereu5rkv2AA3GM367gfysshpFrr7S
9vZ/e2AiKTwOvAGKIXBof6VDgVohFvDdof1Gu5aZ+UnUHOxSEe99u6ZGc/m5Ia4i
BCl5OmazS9PYBUTlOzZZh1Ht7QtbDv+CDvUdveEBAoGBALFBGDBB+syXZfnFxD6r
TZ8E3ZH9uGKWFnSLjMY1jmz0g1knFya6yn7O9N7/KoFZwMf4qUvpKBzSO8l6ZJ/w
oh7YGKrZQrcIneHm2t12BAH+7LoaNLGNXEaA5q4D2+jGC00Ma0GL/qjOHMGXT9oO
mLrQZttUIVEfOe7Qi+JnINK2
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIIEGTCCAwGgAwIBAgIUWiant8BjxGVzc2j/FvGP05SFmdYwDQYJKoZIhvcNAQEL
BQAwgZsxCzAJBgNVBAYTAkRFMRQwEgYDVQQIDAtCcmFuZGVuYnVyZzEVMBMGA1UE
BwwMS2xlaW5tYWNobm93MQwwCgYDVQQKDANCSFQxEzARBgNVBAsMCkluZm9ybWF0
aWsxEDAOBgNVBAMMB2p1cHl0ZXIxKjAoBgkqhkiG9w0BCQEWG3N2ZW4uZ3JhdXBu
ZXJAYmh0LWJlcmxpbi5kZTAeFw0yMjEyMTgyMDQ4MDRaFw0yMzAxMTcyMDQ4MDRa
MIGbMQswCQYDVQQGEwJERTEUMBIGA1UECAwLQnJhbmRlbmJ1cmcxFTATBgNVBAcM
DEtsZWlubWFjaG5vdzEMMAoGA1UECgwDQkhUMRMwEQYDVQQLDApJbmZvcm1hdGlr
MRAwDgYDVQQDDAdqdXB5dGVyMSowKAYJKoZIhvcNAQkBFhtzdmVuLmdyYXVwbmVy
QGJodC1iZXJsaW4uZGUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCg
q1Hakw3KI+uvd3LSJWGdl+uUX0FkVInUuFzQGDsyToPOo9qIIsYP5Avo3Tn7AXoQ
KoYz/zR0MOmsWBqkztah9ahS8uUkDkwHcJ6BIFZ9UApWQ+klEnHkBgSzDjfZhDZT
/sIlEdCLOYsiagpUdZtnDKmkpWTAAn4Tc6vLMj5Xi7mqLDybbsVlyZ/Iy/nvG1pZ
knmE/R7HOs8gIkuZ7KKvbaaN6nOfujeCOCCmTyCefjU0famZxzVJxAbUwo40BMT6
w7VVmUOwoNS0LUtcVUDZ+alINrO59fu7oKyQgVr2yeVyoqUJPD/FgS/fFXSAoEtf
5e1ekMXgcvl4EGacVtwfAgMBAAGjUzBRMB0GA1UdDgQWBBS4ojNVtQI3kPq8uVKh
lMsZPSpQUzAfBgNVHSMEGDAWgBS4ojNVtQI3kPq8uVKhlMsZPSpQUzAPBgNVHRMB
Af8EBTADAQH/MA0GCSqGSIb3DQEBCwUAA4IBAQBoEUE2m4EeC7SIrv3SESbiAQc4
qtXpJkS8jdQq3cuBx0zvDf41nN84JmxCSGLYsYDnQKjY5KgW9Azkf0jANEvxM6re
Ay4nyZAU9SmQWj4tQAVD5hnmtN3mE7bWwjaRr7B/D407NpmhhfamX2k+hFKc3Kr6
ilUNNgQoOqSoo3hEhavrpPIRhVDWza/glOd2coPLycbt4D7yYT7QrMcadCotiLnr
yBmPpKgaNQaNAHKVSkVN2aKKKtzLZkVutaQRL5W4OgWe5+J5dy4WeiNFB4jxqR46
C33htIGVmZW5Isa5kIQn0SOSaK3Nb+C4WTlJU87tTdmZnL8MDTugEN8YY8jq
-----END CERTIFICATE-----
# Assignment H1: Solve the Shakespeare Challenge &nbsp; (8 Pts)
[William Shakespeare](https://en.wikipedia.org/wiki/William_Shakespeare) (1564–1616)
has written many plays that have become popular for text analysis.
- [Project Gutenberg](http://gutenberg.org) is one project that has compiled
Shakespeare’s plays into a single text file
[Shakespeare.txt](data/Shakespeare.txt).
- [Processing Shakespeare](https://lmackerman.com/AdventuresInR/docs/shakespeare.nb.html)
  is a project that aims at visualizing Shakespeare’s texts.
&nbsp;
---
Task: [H1_Analyzing_Shakespeare.pdf](H1_Analyzing_Shakespeare.pdf)
- use: [Shakespeare.txt](data/Shakespeare.txt).
# Apache Spark Standalone Cluster on Docker
> The project was featured in an **[article](https://www.mongodb.com/blog/post/getting-started-with-mongodb-pyspark-and-jupyter-notebook)** on the official **MongoDB** tech blog! :scream:
> The project just got its own **[article](https://towardsdatascience.com/apache-spark-cluster-on-docker-ft-a-juyterlab-interface-418383c95445)** on the **Towards Data Science** Medium blog! :sparkles:
## Introduction
This project gives you an **Apache Spark** cluster in standalone mode with a **JupyterLab** interface built on top of **Docker**.
Learn Apache Spark through its **Scala**, **Python** (PySpark) and **R** (SparkR) API by running the Jupyter [notebooks](build/workspace/) with examples on how to read, process and write data.
<p align="center"><img src="docs/image/cluster-architecture.png"></p>
![build-master](https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/workflows/build-master/badge.svg)
![sponsor](https://img.shields.io/badge/patreon-sponsor-ff69b4)
![jupyterlab-latest-version](https://img.shields.io/docker/v/andreper/jupyterlab/3.0.0-spark-3.0.0?color=yellow&label=jupyterlab-latest)
![spark-latest-version](https://img.shields.io/docker/v/andreper/spark-master/3.0.0?color=yellow&label=spark-latest)
![spark-scala-api](https://img.shields.io/badge/spark%20api-scala-red)
![spark-pyspark-api](https://img.shields.io/badge/spark%20api-pyspark-red)
![spark-sparkr-api](https://img.shields.io/badge/spark%20api-sparkr-red)
## TL;DR
```bash
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
docker-compose up
```
## Contents
- [Quick Start](#quick-start)
- [Tech Stack](#tech-stack)
- [Metrics](#metrics)
- [Contributing](#contributing)
- [Contributors](#contributors)
- [Support](#support)
## <a name="quick-start"></a>Quick Start
### Cluster overview
| Application | URL | Description |
| --------------- | ---------------------------------------- | ---------------------------------------------------------- |
| JupyterLab | [localhost:8888](http://localhost:8888/) | Cluster interface with built-in Jupyter notebooks |
| Spark Driver | [localhost:4040](http://localhost:4040/) | Spark Driver web ui |
| Spark Master | [localhost:8080](http://localhost:8080/) | Spark Master node |
| Spark Worker I | [localhost:8081](http://localhost:8081/) | Spark Worker node with 1 core and 512m of memory (default) |
| Spark Worker II | [localhost:8082](http://localhost:8082/) | Spark Worker node with 1 core and 512m of memory (default) |
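Once the cluster is up, a notebook can submit work to the master directly. A minimal
connectivity check (a sketch; the master URL `spark://spark-master:7077` is an
assumption based on the project's default compose file):
```python
# Minimal smoke test; assumes the compose cluster's default master URL
# spark://spark-master:7077 (an assumption, check your docker-compose.yml).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("smoke-test")
         .master("spark://spark-master:7077")
         .getOrCreate())
print(spark.range(5).count())  # expected: 5
spark.stop()
```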
### Prerequisites
- Install [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/), check **infra** [supported versions](#tech-stack)
### Download from Docker Hub (easier)
1. Download the [docker compose](docker-compose.yml) file;
```bash
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
```
2. Edit the [docker compose](docker-compose.yml) file with your favorite tech stack version, check **apps** [supported versions](#tech-stack);
3. Start the cluster;
```bash
docker-compose up
```
4. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala, PySpark and SparkR examples;
5. Stop the cluster by typing `ctrl+c` on the terminal;
6. Run step 3 to restart the cluster.
### Build from your local machine
> **Note**: Local build is currently only supported on Linux OS distributions.
1. Download the source code or clone the repository;
2. Move to the build directory;
```bash
cd build
```
3. Edit the [build.yml](build/build.yml) file with your favorite tech stack version;
4. Match those versions in the [docker compose](build/docker-compose.yml) file;
5. Build up the images;
```bash
chmod +x build.sh ; ./build.sh
```
6. Start the cluster;
```bash
docker-compose up
```
7. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala, PySpark and SparkR examples;
8. Stop the cluster by typing `ctrl+c` on the terminal;
9. Run step 6 to restart the cluster.
## <a name="tech-stack"></a>Tech Stack
- Infra
| Component | Version |
| -------------- | ------- |
| Docker Engine | 1.13.0+ |
| Docker Compose | 1.10.0+ |
- Languages and Kernels
| Spark | Hadoop | Scala | [Scala Kernel](https://almond.sh/) | Python | [Python Kernel](https://ipython.org/) | R | [R Kernel](https://irkernel.github.io/) |
| ----- | ------ | ------- | ---------------------------------- | ------ | ------------------------------------- | ----- | --------------------------------------- |
| 3.x | 3.2 | 2.12.10 | 0.10.9 | 3.7.3 | 7.19.0 | 3.5.2 | 1.1.1 |
| 2.x | 2.7 | 2.11.12 | 0.6.0 | 3.7.3 | 7.19.0 | 3.5.2 | 1.1.1 |
- Apps
| Component | Version | Docker Tag |
| -------------- | ----------------------- | ---------------------------------------------------- |
| Apache Spark | 2.4.0 \| 2.4.4 \| 3.0.0 | **\<spark-version>** |
| JupyterLab | 2.1.4 \| 3.0.0 | **\<jupyterlab-version>**-spark-**\<spark-version>** |
## <a name="metrics"></a>Metrics
| Image | Size | Downloads |
| -------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| [JupyterLab](https://hub.docker.com/r/andreper/jupyterlab) | ![docker-size-jupyterlab](https://img.shields.io/docker/image-size/andreper/jupyterlab/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/jupyterlab) |
| [Spark Master](https://hub.docker.com/r/andreper/spark-master) | ![docker-size-master](https://img.shields.io/docker/image-size/andreper/spark-master/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-master) |
| [Spark Worker](https://hub.docker.com/r/andreper/spark-worker) | ![docker-size-worker](https://img.shields.io/docker/image-size/andreper/spark-worker/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-worker) |
## <a name="contributing"></a>Contributing
We'd love some help. To contribute, please read [this file](CONTRIBUTING.md).
## <a name="contributors"></a>Contributors
A list of amazing people that somehow contributed to the project can be found in [this file](CONTRIBUTORS.md). This
project is maintained by:
> **André Perez** - [dekoperez](https://twitter.com/dekoperez) - andre.marcos.perez@gmail.com
## <a name="support"></a>Support
> Support us on GitHub by starring this project :star:
> Support us on [Patreon](https://www.patreon.com/andreperez). :sparkling_heart:
# Assignment H: PySpark &nbsp; (10 Pts)
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with:
- implicit data parallelism and
- fault tolerance.
&nbsp;
---
### Challenges
1. [Challenge 1:](#1-challenge-1-get-pyspark-containers) Get PySpark Containers - (3 Pts)
1. [Challenge 2:](#2-challenge-2-set-up-simple-pyspark-program) Set-up simple PySpark Program - (2 Pts)
1. [Challenge 3:](#3-challenge-3-run-on-pyspark-cluster) Run on PySpark Cluster - (3 Pts)
1. [Challenge 4:](#4-challenge-4-explain-the-program) Explain the Program - (2 Pts)
&nbsp;
---
### 1.) Challenge 1: Get PySpark Containers
Set up PySpark as a Spark standalone cluster with Docker:
[https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker](https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker)
The setup looks like this:
![text](../markup/img/H_spark_cluster_architecture.png)
One simple command will:
- fetch all needed Docker images (~1.5GB),
- create containers for the Spark master, two worker processes and the Jupyter server,
- launch all containers at once.
Clone the project and use it as the project directory:
```
git clone https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker
```
Fetch images, create and launch all containers with one command:
```
docker-compose up
```
It will launch the following containers:
![text](../markup/img/H_img01.png)
Open URLs:
<table>
<thead>
<tr>
<th>Application</th>
<th>URL</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>JupyterLab</td>
<td><a href="http://localhost:8888/" rel="nofollow">localhost:8888</a></td>
<td>Cluster interface with built-in Jupyter notebooks</td>
</tr>
<tr>
<td>Spark Driver</td>
<td><a href="http://localhost:4040/" rel="nofollow">localhost:4040</a></td>
<td>Spark Driver web ui</td>
</tr>
<tr>
<td>Spark Master</td>
<td><a href="http://localhost:8080/" rel="nofollow">localhost:8080</a></td>
<td>Spark Master node</td>
</tr>
<tr>
<td>Spark Worker I</td>
<td><a href="http://localhost:8081/" rel="nofollow">localhost:8081</a></td>
<td>Spark Worker node with 1 core and 512m of memory (default)</td>
</tr>
<tr>
<td>Spark Worker II</td>
<td><a href="http://localhost:8082/" rel="nofollow">localhost:8082</a></td>
<td>Spark Worker node with 1 core and 512m of memory (default)</td>
</tr>
</tbody>
</table>
&nbsp;
---
### 2.) Challenge 2: Set-up simple PySpark Program
Understand the simple PySpark program `pyspark_pi.py`:
```py
from __future__ import print_function
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PyPi").getOrCreate()

slices = 1
n = 100000 * slices

def f(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

count = spark.sparkContext.parallelize(range(1, n + 1), slices).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()
```
What is the output of the program?
What happens when the value of variable ‘slices’ increases from 1 to 2 and 4?
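To see what `slices` changes under the hood, the RDD's partitioning can be inspected
directly (a minimal sketch, reusing the SparkSession from the program above):
```py
# Minimal sketch: `slices` sets the number of partitions of the RDD,
# and thus how many tasks the cluster can run in parallel.
for slices in (1, 2, 4):
    rdd = spark.sparkContext.parallelize(range(1, 100000 * slices + 1), slices)
    print(slices, "slices ->", rdd.getNumPartitions(), "partitions")
```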
&nbsp;
---
### 3.) Challenge 3: Run on PySpark Cluster
Open Jupyter at [http://localhost:8888](http://localhost:8888/) and paste the code
into a cell.
Execute the cell.
![text](../markup/img/H_img02.png)
&nbsp;
---
### 4.) Challenge 4: Explain the PySpark Environment
Briefly describe the essential parts a PySpark environment consists of, and the
following concepts (a short sketch follows the list):
- RDD, DF, DS
- Transformation
- Action
- Lineage
- Partition
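As a starting point, a minimal sketch touching each concept (assumes an active
SparkSession `spark`, e.g. in the cluster's JupyterLab):
```py
# Minimal sketch of the core concepts (assumes an active SparkSession `spark`):
rdd = spark.sparkContext.parallelize(range(10), 2)  # RDD with 2 partitions
squared = rdd.map(lambda x: x * x)                  # transformation: lazy, extends the lineage
print(squared.toDebugString().decode())             # lineage: how this RDD is derived
print(squared.getNumPartitions())                   # partitions: 2
print(squared.reduce(lambda a, b: a + b))           # action: triggers execution -> 285
df = spark.createDataFrame([(x,) for x in range(3)], ["x"])  # DataFrame (DF); typed Datasets (DS) are Scala/Java only
```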
%% Cell type:code id:c3bac98d tags:
``` python
from __future__ import print_function
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession
import os
import pyspark.sql.functions as f
spark = SparkSession.builder.appName("PyPi").getOrCreate()
df_all = spark.read.option('lineSep', r'(THE\sEND)').text("./data/Shakespeare.txt")
```
%% Output
23/01/26 09:55:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
%% Cell type:code id:bb3abe55 tags:
``` python
df = df_all.withColumn('value', f.regexp_replace('value', r'<<[\w\s\d\n()\,.-]{495}>>', ''))\
.withColumn('value', f.explode(f.split('value', r'THE\sEND', -1))) \
.withColumn('index', f.monotonically_increasing_id())\
.filter("index > 1")\
.filter('index < 38') \
.withColumn("title", f.regexp_extract('value', r'(.*)\n*by', 0))\
.withColumn("value", f.regexp_replace('value', r'([A-Z ,]*)\n*by William Shakespeare', ''))\
.withColumn("title", f.regexp_replace('title', r'\n*by', ''))\
.withColumn("year", f.regexp_extract('value', r'\d{4}', 0))\
.withColumn("value", f.regexp_replace('value', r'\d{4}', ''))\
.withColumn('value', f.trim('value'))\
.withColumn('value', f.regexp_replace('value', r' {2,}', ' '))\
.withColumn('value', f.regexp_replace('value', r'\n{2,}', ''))\
.withColumn('wordCount', f.size(f.split('value', ' ')))\
.withColumn('lineCount', f.size(f.split('value', r'\n')))\
.orderBy(f.col("lineCount").desc())
```
%% Cell type:code id:3991296f tags:
``` python
def play_counts(df):
    results = df.select('title', 'lineCount', 'wordCount').collect()
    for r in results:
        print(f"{r['title']}, {r['lineCount']} lines, {r['wordCount']} words.")

# df.filter("index == 37").collect()
play_counts(df)
```
%% Output
THE TRAGEDY OF HAMLET, PRINCE OF DENMARK, 3947 lines, 32079 words.
KING RICHARD III, 3914 lines, 31193 words.
THE TRAGEDY OF CORIOLANUS, 3691 lines, 29293 words.
CYMBELINE, 3649 lines, 28870 words.
THE TRAGEDY OF ANTONY AND CLEOPATRA, 3587 lines, 26552 words.
THE TRAGEDY OF OTHELLO, MOOR OF VENICE, 3479 lines, 27986 words.
THE TRAGEDY OF KING LEAR, 3433 lines, 27585 words.
THE HISTORY OF TROILUS AND CRESSIDA, 3431 lines, 27623 words.
KING HENRY THE EIGHTH, 3327 lines, 25886 words.
THE WINTER'S TALE, 3249 lines, 26059 words.
THE LIFE OF KING HENRY THE FIFTH, 3147 lines, 27498 words.
THE SECOND PART OF KING HENRY THE SIXTH, 3133 lines, 26840 words.
SECOND PART OF KING HENRY IV, 3101 lines, 27689 words.
THE TRAGEDY OF ROMEO AND JULIET, 3089 lines, 25857 words.
THE THIRD PART OF KING HENRY THE SIXTH, 3012 lines, 25873 words.
THE FIRST PART OF KING HENRY THE FOURTH, 2926 lines, 25783 words.
THE FIRST PART OF HENRY THE SIXTH, 2852 lines, 22883 words.
KING RICHARD THE SECOND, 2851 lines, 23363 words.
MEASURE FOR MEASURE, 2740 lines, 22947 words.
LOVE'S LABOUR'S LOST, 2735 lines, 22987 words.
KING JOHN, 2659 lines, 21776 words.
THE TRAGEDY OF JULIUS CAESAR, 2629 lines, 20930 words.
THE TRAGEDY OF TITUS ANDRONICUS, 2625 lines, 21701 words.
THE TAMING OF THE SHREW, 2616 lines, 22243 words.
THE MERCHANT OF VENICE, 2609 lines, 22309 words.
THE MERRY WIVES OF WINDSOR, 2579 lines, 23411 words.
AS YOU LIKE IT, 2543 lines, 22860 words.
THE LIFE OF TIMON OF ATHENS, 2437 lines, 19691 words.
MUCH ADO ABOUT NOTHING, 2425 lines, 22501 words.
THE TRAGEDY OF MACBETH, 2396 lines, 18246 words.
TWELFTH NIGHT; OR, WHAT YOU WILL, 2353 lines, 21208 words.
THE TEMPEST, 2328 lines, 17498 words.
THE TWO GENTLEMEN OF VERONA, 2196 lines, 18327 words.
A MIDSUMMER NIGHT'S DREAM, 2119 lines, 17306 words.
THE COMEDY OF ERRORS, 1815 lines, 15464 words.
A LOVER'S COMPLAINT, 283 lines, 2579 words.
%% Cell type:code id:676fab8d tags:
``` python
```
%% Cell type:code id:9d323a4b tags:
``` python
pip install pandas
```
%% Output
Requirement already satisfied: pandas in /usr/local/lib/python3.9/dist-packages (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.9/dist-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.20.3 in /usr/local/lib/python3.9/dist-packages (from pandas) (1.24.1)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.9/dist-packages (from pandas) (2022.7.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.9/dist-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Note: you may need to restart the kernel to use updated packages.
%% Cell type:code id:fc4a8512 tags:
``` python
from __future__ import print_function
import sys
from pyspark.sql import functions as F
from pyspark.sql.window import Window
from pyspark.sql import SparkSession
from pyspark.ml.feature import Bucketizer
import pandas as pd

def remove_header(df):
    w = Window().orderBy(F.lit('value'))
    df = df.withColumn("rowNum", F.row_number().over(w))
    df = df.filter(df.rowNum > 244)
    return df

def remove_filler(df):
    return df.filter(~ df.value.startswith("<<THIS") & \
                     ~ df.value.startswith("SHAKESPEARE IS") & \
                     ~ df.value.startswith("PROVIDED BY") & \
                     ~ df.value.startswith("WITH PERMISSION") & \
                     ~ df.value.startswith("DISTRIBUTED") & \
                     ~ df.value.startswith("PERSONAL USE") & \
                     ~ df.value.startswith("COMMERCIALLY.") & \
                     ~ df.value.startswith("SERVICE THAT"))

def get_play_rows(df):
    return df.filter(df.value.rlike('[0-9]{4}')).drop('value')

def partition_by_play(df):
    line_ids = get_play_rows(df)
    splits = [x['rowNum'] for x in line_ids.collect()]
    splits.append(float('Inf'))
    bucketizer = Bucketizer(splits=splits, inputCol="rowNum", outputCol="playNum")
    df = bucketizer.setHandleInvalid("keep").transform(df)
    return df.repartition(df.playNum)

def count_words(df):
    return df.withColumn('words', F.size(F.split(F.col('value'), ' ')))

def format_play(play, id, words):
    txt = "Play {}, words: {}, lines: {}"
    return txt.format(play, words[id]['sum(words)'], words[id]['count(value)'])

spark = SparkSession.builder.appName("PyPi").getOrCreate()
data = spark.read.text("data/Shakespeare.txt")
data = remove_header(data)
data = remove_filler(data)
data = partition_by_play(data)
data = count_words(data)
result = data.groupBy(F.col('playNum')).agg(F.sum('words'), F.count('value')).sort('playNum').collect()
play_names = [x.strip() for x in open('data/plays.txt').readlines()]
play_results = [format_play(x, id, result) for id, x in enumerate(play_names)]
[print(x) for x in play_results]
spark.stop()
```
%% Output
23/01/26 12:48:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/26 12:48:20 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/01/26 12:48:25 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
Play THE SONNETS, words: 26469, lines: 2634
Play ALLS WELL THAT ENDS WELL, words: 37196, lines: 3199
Play THE TRAGEDY OF ANTONY AND CLEOPATRA, words: 45757, lines: 4167
Play AS YOU LIKE IT, words: 35589, lines: 2939
Play THE COMEDY OF ERRORS, words: 18817, lines: 2080
Play THE TRAGEDY OF CORIOLANUS, words: 46589, lines: 4253
Play CYMBELINE, words: 46143, lines: 4140
Play THE TRAGEDY OF HAMLET, PRINCE OF DENMARK, words: 52365, lines: 4489
Play THE FIRST PART OF KING HENRY THE FOURTH, words: 39843, lines: 3323
Play SECOND PART OF KING HENRY IV, words: 41774, lines: 3555
Play THE LIFE OF KING HENRY THE FIFTH, words: 42167, lines: 3603
Play THE FIRST PART OF HENRY THE SIXTH, words: 38962, lines: 3377
Play THE SECOND PART OF KING HENRY THE SIXTH, words: 41811, lines: 3614
Play THE THIRD PART OF KING HENRY THE SIXTH, words: 40832, lines: 3484
Play KING HENRY THE EIGHTH, words: 40894, lines: 3733
Play KING JOHN, words: 33625, lines: 2997
Play THE TRAGEDY OF JULIUS CAESAR, words: 33572, lines: 2984
Play THE TRAGEDY OF KING LEAR, words: 48177, lines: 3954
Play LOVE'S LABOUR'S LOST, words: 35286, lines: 2984
Play THE TRAGEDY OF MACBETH, words: 30479, lines: 2868
Play MEASURE FOR MEASURE, words: 35260, lines: 3095
Play THE MERCHANT OF VENICE, words: 34144, lines: 2952
Play THE MERRY WIVES OF WINDSOR, words: 35980, lines: 3104
Play A MIDSUMMER NIGHT'S DREAM, words: 29784, lines: 2424
Play MUCH ADO ABOUT NOTHING, words: 33536, lines: 2757
Play THE TRAGEDY OF OTHELLO, MOOR OF VENICE, words: 49358, lines: 3865
Play KING RICHARD THE SECOND, words: 36350, lines: 3204
Play KING RICHARD III, words: 49737, lines: 4509
Play THE TRAGEDY OF ROMEO AND JULIET, words: 41590, lines: 3584
Play THE TAMING OF THE SHREW, words: 35010, lines: 3001
Play THE TEMPEST, words: 28460, lines: 2612
Play THE LIFE OF TIMON OF ATHENS, words: 31097, lines: 2820
Play THE TRAGEDY OF TITUS ANDRONICUS, words: 34396, lines: 2961
Play THE HISTORY OF TROILUS AND CRESSIDA, words: 44082, lines: 3951
Play TWELFTH NIGHT; OR, WHAT YOU WILL, words: 32840, lines: 2767
Play THE TWO GENTLEMEN OF VERONA, words: 27622, lines: 2527
Play THE WINTER'S TALE, words: 40806, lines: 3563
Play A LOVER'S COMPLAINT, words: 3334, lines: 395
%% Cell type:code id:3f802133 tags:
``` python
```
THE SONNETS
ALLS WELL THAT ENDS WELL
THE TRAGEDY OF ANTONY AND CLEOPATRA
AS YOU LIKE IT
THE COMEDY OF ERRORS
THE TRAGEDY OF CORIOLANUS
CYMBELINE
THE TRAGEDY OF HAMLET, PRINCE OF DENMARK
THE FIRST PART OF KING HENRY THE FOURTH
SECOND PART OF KING HENRY IV
THE LIFE OF KING HENRY THE FIFTH
THE FIRST PART OF HENRY THE SIXTH
THE SECOND PART OF KING HENRY THE SIXTH
THE THIRD PART OF KING HENRY THE SIXTH
KING HENRY THE EIGHTH
KING JOHN
THE TRAGEDY OF JULIUS CAESAR
THE TRAGEDY OF KING LEAR
LOVE'S LABOUR'S LOST
THE TRAGEDY OF MACBETH
MEASURE FOR MEASURE
THE MERCHANT OF VENICE
THE MERRY WIVES OF WINDSOR
A MIDSUMMER NIGHT'S DREAM
MUCH ADO ABOUT NOTHING
THE TRAGEDY OF OTHELLO, MOOR OF VENICE
KING RICHARD THE SECOND
KING RICHARD III
THE TRAGEDY OF ROMEO AND JULIET
THE TAMING OF THE SHREW
THE TEMPEST
THE LIFE OF TIMON OF ATHENS
THE TRAGEDY OF TITUS ANDRONICUS
THE HISTORY OF TROILUS AND CRESSIDA
TWELFTH NIGHT; OR, WHAT YOU WILL
THE TWO GENTLEMEN OF VERONA
THE WINTER'S TALE
A LOVER'S COMPLAINT