Hadoop Dynamic Storage with LVM and a Python Automation Script

Mar 14, 2021


Hadoop Dynamic Storage with LVM

Logical Volume Management
Logical volume management (LVM) is a form of storage virtualization that offers system administrators a more flexible approach to managing disk storage space than traditional partitioning. This virtualization layer sits within the device-driver stack of the operating system. It works by dividing physical volumes (PVs) into fixed-size physical extents (PEs). One or more PVs are pooled into a volume group (VG), and the extents of that group are allocated to logical volumes (LVs), where each logical extent (LE) of an LV maps to a PE. LVs act as virtual disk partitions and can be managed as such with LVM.
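To make the extent mapping concrete, here is a small sketch of the arithmetic. It assumes LVM2's default extent size of 4 MiB (the actual value is set per volume group and may differ), and the function name is made up for illustration.

```python
import math

PE_SIZE_MIB = 4  # assumption: LVM2's default extent size; a VG may use another value


def extents_needed(lv_size_mib, pe_size_mib=PE_SIZE_MIB):
    """Return how many extents a logical volume of the given size occupies."""
    # LVM allocates whole extents, so the requested size is rounded up.
    return math.ceil(lv_size_mib / pe_size_mib)


if __name__ == "__main__":
    # A 10 GiB logical volume with 4 MiB extents.
    print(extents_needed(10 * 1024))  # 2560
```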

The goal of LVM is to facilitate managing the sometimes conflicting storage needs of multiple end-users. Using the volume management approach, the administrator is not required to allocate all disk storage space at initial setup. Some can be held in reserve for later allocation. The administrator can use LVM to segment logically sequential data or combine partitions, increasing throughput and making it simpler to resize and move storage volumes as needed. Storage volumes may be defined for various user groups within the enterprise, and new storage can be added to a particular group when desired without requiring user files to be redistributed to make the most efficient use of space.

Use the following command to list the available disks and their partitions:

fdisk -l

Creating Physical Volume

The first step is to create a new physical volume (PV) by running the pvcreate command with the device name, here /dev/sdb. LVM uses physical block devices or other disk-like devices as the raw building material for higher levels of abstraction. Physical volumes are regular storage devices; LVM writes a header to the device to allocate it for management. If you are going to use the entire hard drive, creating a partition first offers no particular advantage and spends disk space on partition metadata that could otherwise be part of the PV.
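As a minimal sketch, this step could be driven from Python as follows. The helper names pvcreate_cmd and run are hypothetical, and the sketch only prints the commands by default because actually running them requires root and a spare disk.

```python
import subprocess


def pvcreate_cmd(device):
    """Return the argument list for `pvcreate <device>`."""
    if not device.startswith("/dev/"):
        raise ValueError(f"expected a device path under /dev, got {device!r}")
    return ["pvcreate", device]


def run(cmd, execute=False):
    """Print the command; run it only when execute=True (requires root)."""
    print(" ".join(cmd))
    if execute:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(pvcreate_cmd("/dev/sdb"))    # dry run: only prints the command
    run(["pvdisplay", "/dev/sdb"])   # pvdisplay shows the details of the new PV
```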

Creating Volume Group

After creating the physical volumes, we create a volume group (VG). A volume group can contain multiple physical volumes. Volume groups abstract the characteristics of the underlying devices and function as a unified logical device with the combined storage capacity of the component physical volumes.
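A short sketch of building the vgcreate command for one or more physical volumes is shown below. The group name test7 is the one used later in this post; /dev/sdc is a hypothetical second disk added only to illustrate pooling, and the function name is an assumption.

```python
def vgcreate_cmd(vg_name, devices):
    """Return the argument list for `vgcreate <vg_name> <pv> [<pv> ...]`."""
    if not devices:
        raise ValueError("a volume group needs at least one physical volume")
    return ["vgcreate", vg_name, *devices]


if __name__ == "__main__":
    # /dev/sdc is a hypothetical second disk, included to show pooling.
    print(" ".join(vgcreate_cmd("test7", ["/dev/sdb", "/dev/sdc"])))
    # After running the command as root, `vgdisplay test7` reports the combined size.
```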

Create logical volume and Format the partition

Creating Logical Volume

After creating the VG, we create a logical volume (LV) from it. A volume group can be sliced up into any number of logical volumes. Logical volumes are functionally equivalent to partitions on a physical disk, but with much more flexibility. Logical volumes are the primary component that users and applications will interact with.

Partition Formatting

Now we format the logical volume with a filesystem so that we can mount it and use it.
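The two steps above can be sketched as a pair of command builders. The 10G size is a made-up example value, and ext4 is an assumption (the post later grows the filesystem with resize2fs, which works on ext2/ext3/ext4); the function names are hypothetical.

```python
def lvcreate_cmd(vg_name, lv_name, size):
    """Return the argument list that carves an LV of `size` out of a VG."""
    return ["lvcreate", "--size", size, "--name", lv_name, vg_name]


def mkfs_cmd(vg_name, lv_name):
    """Return the argument list that formats the LV as ext4 (assumed filesystem)."""
    # LVM exposes each logical volume as /dev/<vg>/<lv>.
    return ["mkfs.ext4", f"/dev/{vg_name}/{lv_name}"]


if __name__ == "__main__":
    for cmd in (lvcreate_cmd("test7", "test70", "10G"), mkfs_cmd("test7", "test70")):
        print(" ".join(cmd))
```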

Now mount it on the Hadoop DataNode directory using the following command:

mount /dev/test7/test70 datanode
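As a rough sketch, the mount step and a quick capacity check could look like this in Python. The helper names are assumptions, and the check uses the standard library's shutil.disk_usage rather than anything Hadoop-specific.

```python
import shutil


def mount_cmd(device, mount_point):
    """Return the argument list for `mount <device> <mount_point>`."""
    return ["mount", device, mount_point]


def capacity_gib(path):
    """Total size in GiB of the filesystem that contains `path`."""
    return shutil.disk_usage(path).total / 2**30


if __name__ == "__main__":
    print(" ".join(mount_cmd("/dev/test7/test70", "datanode")))
    # Checked on the current directory here so the sketch runs anywhere;
    # point it at the DataNode directory after mounting.
    print(f"{capacity_gib('.'):.1f} GiB")
```

Calling capacity_gib on the mounted DataNode directory should report roughly the size of the logical volume, minus filesystem overhead.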

Now you can check the DataNode's configured capacity in the Hadoop web UI.

You can also increase its storage on the fly using the following commands:

  • pvcreate /dev/sdb
  • vgextend test7 /dev/sdb
  • lvextend --size +5G /dev/test7/test70
  • resize2fs /dev/test7/test70
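The order of these commands matters: the new disk must be in the volume group before the logical volume can grow, and the filesystem is resized last to fill the enlarged volume. A minimal sketch of running them in sequence, with hypothetical helper names and a dry-run default, might be:

```python
import subprocess


def extend_cmds(vg_name, lv_name, new_device, extra):
    """Return the four commands that add a disk and grow the LV by `extra`."""
    lv_path = f"/dev/{vg_name}/{lv_name}"
    return [
        ["pvcreate", new_device],                      # prepare the new disk
        ["vgextend", vg_name, new_device],             # add it to the volume group
        ["lvextend", "--size", f"+{extra}", lv_path],  # grow the logical volume
        ["resize2fs", lv_path],                        # grow the filesystem to match
    ]


def run_all(cmds, execute=False):
    """Print each command; run them in order only when execute=True (requires root)."""
    for cmd in cmds:
        print(" ".join(cmd))
        if execute:
            # check=True raises on failure, so later steps are not attempted.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(extend_cmds("test7", "test70", "/dev/sdb", "5G"))
```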

Now check the Hadoop web UI again.


As shown above, Logical Volume Management lets us control how much storage a DataNode contributes to the Hadoop cluster and extend it on the fly whenever needed.

Python script for Logical Volume Management

Here is the Python script for Logical Volume Management.
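As a rough illustration of how such a script could be organized, here is a minimal sketch. It is not the original lvmautomation.py: the subcommand names, option names and defaults are assumptions chosen for this example, ext4 is an assumed filesystem, and commands are only printed unless --run is given.

```python
import argparse
import subprocess


def plan(args):
    """Translate the parsed arguments into the list of commands to run."""
    lv_path = f"/dev/{args.vg}/{args.lv}"
    if args.action == "create":
        return [
            ["pvcreate", args.device],
            ["vgcreate", args.vg, args.device],
            ["lvcreate", "--size", args.size, "--name", args.lv, args.vg],
            ["mkfs.ext4", lv_path],  # assumption: ext4 filesystem
            ["mount", lv_path, args.mount_point],
        ]
    # "extend": add a new disk to the group, then grow the volume and filesystem.
    return [
        ["pvcreate", args.device],
        ["vgextend", args.vg, args.device],
        ["lvextend", "--size", f"+{args.size}", lv_path],
        ["resize2fs", lv_path],
    ]


def build_parser():
    parser = argparse.ArgumentParser(description="LVM automation sketch")
    parser.add_argument("action", nargs="?", choices=["create", "extend"])
    parser.add_argument("--device", default="/dev/sdb")
    parser.add_argument("--vg", default="test7")
    parser.add_argument("--lv", default="test70")
    parser.add_argument("--size", default="5G")
    parser.add_argument("--mount-point", default="datanode")
    parser.add_argument("--run", action="store_true",
                        help="execute the commands instead of only printing them")
    return parser


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.action is None:
        parser.print_help()
        return
    for cmd in plan(args):
        print(" ".join(cmd))
        if args.run:
            subprocess.run(cmd, check=True)  # requires root


if __name__ == "__main__":
    main()
```

With this sketch, passing the extend action with --size 5G prints the same four commands listed in the previous section.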

Now you can run it with the command:

python3 lvmautomation.py

With that, the task is done.