Python Glob Function

Python is one of the most widely used general-purpose programming languages, and it provides many built-in modules and functions for file-related tasks. "Glob" refers to matching pathnames against patterns that follow UNIX shell rules. Linux and other UNIX-like operating systems provide a glob() function in their C libraries to find files and directories that match a given pattern. Python offers the same capability through the glob module in its standard library. This article explains how to use the glob() function of that module to find pathnames and filenames according to a given pattern.

Example 1: Match Filename or Pathname with Absolute Path

Let us look at a few examples to understand how the glob() function works. We will start with the simplest case: passing an absolute path that contains no wildcards. If a file or directory exists at that path, the glob() function returns a list containing the path; otherwise, it returns an empty list.

#importing the glob module

import glob

#using the glob function to match the pathname with the absolute path

#matching absolute path of downloads directory

print(glob.glob("/home/linuxhint/Downloads"))

#matching absolute path of documents directory

print(glob.glob("/home/linuxhint/Documents"))

#matching absolute path of Desktop

print(glob.glob("/home/linuxhint/Desktop"))

#matching absolute path of files

print(glob.glob("/home/linuxhint/Desktop/script.sh"))

print(glob.glob("/home/linuxhint/Downloads/format.py"))

print(glob.glob("/home/linuxhint/Documents/calculator.py"))

#specifying path of file that does not exist

#the glob function will return the empty list

print(glob.glob("/home/linuxhint/Documents/myfile.py"))

Output

The output shows the matches.
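
Because the paths above exist only on the author's machine, the following is a minimal, self-contained sketch of the same behavior. It assumes a temporary directory as a stand-in for /home/linuxhint; the file names and the variables existing, missing, found, and not_found are hypothetical and chosen only for illustration.

```python
import glob
import os
import tempfile

# Hypothetical sandbox: a temporary directory stands in for /home/linuxhint.
with tempfile.TemporaryDirectory() as base:
    existing = os.path.join(base, "calculator.py")
    missing = os.path.join(base, "myfile.py")
    open(existing, "w").close()  # create an empty file so the path exists

    # glob.escape() guards against wildcard characters in the temp path itself.
    found = glob.glob(glob.escape(existing))     # list containing the one path
    not_found = glob.glob(glob.escape(missing))  # empty list, nothing exists there

    print(found)
    print(not_found)
```

The first call prints a one-element list and the second prints [], which mirrors the existing-file and missing-file cases in the example above.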

Example 2: Using Wildcards for Path Retrieval

It is possible to use wildcards with the glob() function for path retrieval. The most commonly used wildcards are the asterisk (*), the question mark (?), and character ranges such as [0-9] for digits and [a-z] for letters. First, we will discuss the use of the asterisk in the glob() function.

Using an Asterisk (*) Wildcard for Path Retrieval

The asterisk wildcard matches zero or more characters within a single path component. If the asterisk is used on its own, the function lists the paths of all files and directories located directly inside the given directory. It does not descend into subdirectories, and by default it skips hidden entries whose names begin with a dot. You can also combine the asterisk with other characters to narrow the match. For instance, to find the paths of all .txt files, use the pattern *.txt.

We will implement this in our Python script.

#importing the glob module

import glob

#finding the absolute path of the files and directories

print(glob.glob("/home/linuxhint/Downloads/*"))

print("----------------------------------------")

#finding the absolute path of the .txt files in the Desktop directory

print(glob.glob("/home/linuxhint/Desktop/*.txt"))

print("----------------------------------------")

#finding the absolute path of the .sh files in the Desktop directory

print(glob.glob("/home/linuxhint/Desktop/*.sh"))

print("----------------------------------------")

#finding the absolute path of the .py files in the Documents directory

print(glob.glob("/home/linuxhint/Documents/*.py"))

print("----------------------------------------")

Output

The output shows the absolute paths of the files and directories that match the patterns passed to the glob() function.
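
To make the asterisk behavior reproducible on any machine, here is a small sketch that builds its own directory tree. It assumes a temporary directory and hypothetical file names (notes.txt, todo.txt, script.sh, .hidden.txt, projects/deep.txt); only the base names are printed so the result does not depend on the temporary path.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical files, including one hidden file and one nested file.
    for name in ("notes.txt", "todo.txt", "script.sh", ".hidden.txt"):
        open(os.path.join(base, name), "w").close()
    os.mkdir(os.path.join(base, "projects"))
    open(os.path.join(base, "projects", "deep.txt"), "w").close()

    safe_base = glob.escape(base)  # treat the temp path literally

    everything = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(safe_base, "*"))
    )
    txt_only = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(safe_base, "*.txt"))
    )

    print(everything)  # ['notes.txt', 'projects', 'script.sh', 'todo.txt']
    print(txt_only)    # ['notes.txt', 'todo.txt']
```

Neither list contains .hidden.txt or deep.txt: the asterisk skips dot-files by default and does not cross directory boundaries. The results are sorted because glob() does not guarantee any particular order.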

Using a Question Mark (?) Wildcard Operator

The question mark (?) wildcard operator matches exactly one character. This is useful when a single character in a name is unknown or varies, such as a one-digit suffix.

We will implement this in our Python script.

#importing the glob module

import glob

#finding the file with the ? wildcard operator

print(glob.glob("/home/linuxhint/Desktop/file?.txt"))

Output

The output shows the matched files.
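
As a self-contained illustration of the "exactly one character" rule, the sketch below creates a handful of hypothetical files in a temporary directory and applies the same file?.txt pattern. The file names and the variable matches are assumptions made for this example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical files: some have one character after "file", some do not.
    for name in ("file1.txt", "file2.txt", "fileA.txt", "file10.txt", "file.txt"):
        open(os.path.join(base, name), "w").close()

    pattern = os.path.join(glob.escape(base), "file?.txt")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

    print(matches)  # ['file1.txt', 'file2.txt', 'fileA.txt']
```

file10.txt is skipped because two characters follow "file", and file.txt is skipped because none do.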

Using a Range Wildcard Operator

The range wildcard operator matches a single character that falls within a given range of letters or digits. The range is defined inside square brackets []. For example, the pattern [a-f]* matches names that begin with any letter from a to f.

We will find the absolute path of files by defining the range.

#importing the glob module

import glob

#finding the absolute path of the files in a given range

#defining the range in characters

print(glob.glob("/home/linuxhint/Desktop/[a-f]*"))

#printing the dotted line to differentiate the output

print("------------------------------------------")

#defining the range in numbers

print(glob.glob("/home/linuxhint/Desktop/[1-5]*"))
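
The result of the script above depends on what is on the Desktop, so here is a minimal sketch with known contents. It assumes a temporary directory and hypothetical file names; the variables letters and digits are introduced only for this example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical files whose first character is a letter or a digit.
    for name in ("apple.txt", "cherry.txt", "grape.txt", "1_intro.txt", "7_notes.txt"):
        open(os.path.join(base, name), "w").close()

    safe_base = glob.escape(base)  # treat the temp path literally

    # Names starting with a letter from a to f.
    letters = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(safe_base, "[a-f]*"))
    )
    # Names starting with a digit from 1 to 5.
    digits = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(safe_base, "[1-5]*"))
    )

    print(letters)  # ['apple.txt', 'cherry.txt']
    print(digits)   # ['1_intro.txt']
```

grape.txt and 7_notes.txt are excluded because their first characters fall outside the ranges a-f and 1-5.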


Find Files Recursively Using glob()

In addition to the pathname pattern, the glob() function accepts an optional recursive parameter. The first parameter defines the pattern, and recursive is False by default. When recursive is set to True, the pattern ** matches any files and zero or more directories and subdirectories, which lets glob() search an entire directory tree. Setting recursive=True has no effect unless the pattern contains **.

We will set recursive=True in our Python script and use the ** pattern to find the absolute paths of the files recursively.

#importing the glob module

import glob

#finding the files recursively; ** descends into subdirectories

print(glob.glob("/home/linuxhint/Documents/**",recursive=True))
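
The sketch below compares the recursive and non-recursive behavior of the ** pattern on a small, known directory tree. It assumes a temporary directory with hypothetical files a.py, sub/b.py, and sub/deeper/c.py; the variable names are illustrative only.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical tree: a.py, sub/b.py, sub/deeper/c.py
    os.makedirs(os.path.join(base, "sub", "deeper"))
    for parts in (("a.py",), ("sub", "b.py"), ("sub", "deeper", "c.py")):
        open(os.path.join(base, *parts), "w").close()

    pattern = os.path.join(glob.escape(base), "**", "*.py")

    # With recursive=True, ** matches zero or more directory levels.
    recursive = sorted(
        os.path.basename(p) for p in glob.glob(pattern, recursive=True)
    )
    # Without it, ** behaves like a single *, matching exactly one level.
    non_recursive = sorted(os.path.basename(p) for p in glob.glob(pattern))

    print(recursive)      # ['a.py', 'b.py', 'c.py']
    print(non_recursive)  # ['b.py']
```

On large directory trees a recursive ** search can take a long time, so it helps to start from the narrowest directory that contains the files of interest.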


Conclusion

Glob is a common term that refers to techniques used to match particular patterns according to UNIX shell-related rules. Python provides a built-in glob module and function to access pathnames according to given rules. This article explains how to use the glob() function to find pathnames with various examples.

About the author

Kamran Sattar Awaisi

I am a software engineer and a research scholar. I like to write articles and make tutorials on various IT topics, including Python, Cloud Computing, Fog Computing, and Deep Learning. I love to use Linux-based operating systems.