Python is one of the most widely used general-purpose programming languages. It provides many built-in modules, functions, and keywords for file-related tasks. Globbing refers to matching file and path names against patterns that follow UNIX shell rules. Linux- and UNIX-based operating systems provide the glob() function to find files and directories according to a given pattern. Python likewise provides a built-in glob module for retrieving the files and pathnames that match a specified pattern. This article explains how to use the glob() function of the glob module to find pathnames and filenames according to a given pattern.
Example 1: Match Filename or Pathname with Absolute Path
Let us look at a few examples to understand how the glob() function works. We will start with a simple example that passes an absolute path with no wildcards. If a file or directory exists at that path, the glob() function returns it as a one-element list; otherwise, the glob() function returns an empty list.
import glob
#using the glob function to match the pathname with the absolute path
#matching absolute path of downloads directory
print(glob.glob("/home/linuxhint/Downloads"))
#matching absolute path of documents directory
print(glob.glob("/home/linuxhint/Documents"))
#matching absolute path of Desktop
print(glob.glob("/home/linuxhint/Desktop"))
#matching absolute path of files
print(glob.glob("/home/linuxhint/Desktop/script.sh"))
print(glob.glob("/home/linuxhint/Downloads/format.py"))
print(glob.glob("/home/linuxhint/Documents/calculator.py"))
#specifying path of file that does not exist
#the glob function will return the empty list
print(glob.glob("/home/linuxhint/Documents/myfile.py"))
Output
The output shows the matches.
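The paths above exist only on the author's machine, so here is a minimal, self-contained sketch of the same behavior. It assumes a temporary directory, and the file names script.sh and myfile.py are used purely for illustration.

```python
import glob
import os
import tempfile

# Create a throwaway directory containing a single file (illustrative names).
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "script.sh")
    open(existing, "w").close()
    missing = os.path.join(tmp, "myfile.py")  # never created

    found = glob.glob(existing)     # the path exists: one-element list
    not_found = glob.glob(missing)  # the path does not exist: empty list

print(found == [existing])  # True
print(not_found)            # []
```

Because a pattern without wildcards can match at most one path, this form of glob() behaves much like an existence check that returns a list.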
Example 2: Using Wildcards for Path Retrieval
It is possible to use wildcards with the glob() function for path retrieval. The most commonly used wildcards are the asterisk (*), the question mark (?), a numeric range such as [0-9], and an alphabetic range such as [a-z]. First, we will discuss the use of the asterisk in the glob() function.
Using an Asterisk (*) Wildcard for Path Retrieval
The asterisk wildcard operator matches zero or more characters within a single component of the path. If no other characters are specified with the asterisk, the function lists the files and directories located directly inside the given path; it does not descend into subdirectories, and names that begin with a dot are skipped by default. You can also combine the asterisk with other characters, and only the names that fit the resulting pattern are matched. For instance, if you need to find the absolute path of .txt files, the * wildcard can be used as *.txt.
We will implement this in our Python script.
import glob
#finding the absolute path of the files and directories
print(glob.glob("/home/linuxhint/Downloads/*"))
print("----------------------------------------")
#finding the absolute path of the .txt files in the Desktop directory
print(glob.glob("/home/linuxhint/Desktop/*.txt"))
print("----------------------------------------")
#finding the absolute path of the .sh files in the Desktop directory
print(glob.glob("/home/linuxhint/Desktop/*.sh"))
print("----------------------------------------")
#finding the absolute path of the .py files in the Documents directory
print(glob.glob("/home/linuxhint/Documents/*.py"))
print("----------------------------------------")
Output
The output shows the absolute path of the files and directories that match the patterns passed to the glob() function.
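To try the asterisk without depending on a particular home directory, the following sketch builds a temporary directory first. The file names are hypothetical, and the results are sorted because glob() does not guarantee any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names used only for this demonstration.
    for name in ("notes.txt", "todo.txt", "run.sh", ".hidden.txt"):
        open(os.path.join(tmp, name), "w").close()

    # "*" alone matches every non-hidden entry directly inside tmp.
    everything = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*"))
    )
    # "*.txt" keeps only names ending in .txt (dot-files are still skipped).
    txt_files = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.txt"))
    )

print(everything)  # ['notes.txt', 'run.sh', 'todo.txt']
print(txt_files)   # ['notes.txt', 'todo.txt']
```

Note that .hidden.txt appears in neither list, since the asterisk does not match a leading dot by default.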
Using a Question Mark (?) Wildcard Operator
The question mark (?) wildcard operator matches exactly one character. This is useful when a single character in the name is unknown or varies from file to file.
We will implement this in our Python script.
import glob
#finding the file with the ? wildcard operator
print(glob.glob("/home/linuxhint/Desktop/file?.txt"))
Output
The output shows the matched files.
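A small sketch can make the "exactly one character" rule concrete. It again assumes a temporary directory, and the file names are made up for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical names: only two of them have exactly one
    # character between "file" and ".txt".
    for name in ("file.txt", "file1.txt", "fileA.txt", "file10.txt"):
        open(os.path.join(tmp, name), "w").close()

    matches = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(tmp, "file?.txt"))
    )

print(matches)  # ['file1.txt', 'fileA.txt']
```

Here file.txt is rejected because the question mark must match one character, and file10.txt is rejected because it has two.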
Using a Range Wildcard Operator
The range wildcard operator matches a single character that falls within a given range of letters or digits. The range is defined inside square brackets [].
We will find the absolute path of files by defining the range.
import glob
#finding the absolute path of the files in a given range
#defining the range in characters
print(glob.glob("/home/linuxhint/Desktop/[a-f]*"))
#printing the dotted line to differentiate the output
print("------------------------------------------")
#defining the range in numbers
print(glob.glob("/home/linuxhint/Desktop/[1-5]*"))
Output
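The same two range patterns can be tried on a temporary directory, as in the sketch below. The file names are hypothetical and were chosen so that some fall inside each range and some fall outside.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names for the demonstration.
    names = ("apple.txt", "cherry.txt", "zebra.txt",
             "3_report.txt", "9_report.txt")
    for name in names:
        open(os.path.join(tmp, name), "w").close()

    # First character must be a letter from a to f.
    letters = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "[a-f]*"))
    )
    # First character must be a digit from 1 to 5.
    digits = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "[1-5]*"))
    )

print(letters)  # ['apple.txt', 'cherry.txt']
print(digits)   # ['3_report.txt']
```

The brackets consume only the first character of each name; the trailing asterisk then matches whatever follows.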
Find Files Recursively Using glob()
The glob() function takes the pathname pattern as its first argument and accepts an optional recursive keyword argument, which is False by default. When recursive is set to True, the ** pattern matches any files and zero or more directories and subdirectories. Note that recursive=True only has an effect when the pattern contains **; a single asterisk still matches entries in one directory only.
We will set recursive to True in our Python script and use the ** pattern to find the absolute paths of the files recursively.
import glob
#finding the files recursively with the ** pattern
print(glob.glob("/home/linuxhint/Documents/**",recursive=True))
Output
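To see how the ** pattern and the recursive argument interact, the following sketch compares three calls on a small temporary directory tree. The directory layout, the file names, and the relative() helper are assumptions made for illustration only.

```python
import glob
import os
import tempfile


def relative(paths, root):
    """Return sorted paths relative to root, using forward slashes."""
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one .py file at each of three levels.
    os.makedirs(os.path.join(tmp, "sub", "deep"))
    for rel in ("a.py", "sub/b.py", "sub/deep/c.py"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()

    pattern = os.path.join(tmp, "**", "*.py")
    top_only = relative(glob.glob(os.path.join(tmp, "*.py")), tmp)
    recursive = relative(glob.glob(pattern, recursive=True), tmp)
    # Without recursive=True, "**" behaves like a single "*".
    not_recursive = relative(glob.glob(pattern), tmp)

print(top_only)       # ['a.py']
print(recursive)      # ['a.py', 'sub/b.py', 'sub/deep/c.py']
print(not_recursive)  # ['sub/b.py']
```

On large directory trees a recursive ** search can take a long time, so it is usually worth making the rest of the pattern as specific as possible.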
Conclusion
Glob is a common term that refers to techniques used to match particular patterns according to UNIX shell-related rules. Python provides a built-in glob module and function to access pathnames according to given rules. This article explains how to use the glob() function to find pathnames with various examples.