File Carving and Data Recovery

Data recovery is the process of retrieving data from a storage medium when that data is lost, formatted, damaged, or corrupted and cannot be accessed through normal methods. Data is typically recovered from storage media such as internal and external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, optical discs such as CDs and DVDs, RAID subsystems, and other electronic devices. Recovery may be required because of physical damage to the storage device or logical damage to the file system that prevents the host operating system (OS) from mounting it. The ultimate goal is to copy all important files from the damaged media to a new drive. One quick way to do this is to boot from a Live CD or DVD, so that the system runs from the optical disc rather than from the corrupted drive.

A Live CD or DVD provides a way to mount the system drive, as well as removable or fixed media drives, so that you can copy files off it with a file manager or other software. The impact of such failures can be reduced by partitioning the disk and storing valuable or proprietary data files on a partition separate from the OS files.

File carving is a technique used in computer forensics to extract data from a hard drive or other storage device without the help of the file system that originally created the file. It recovers files from unallocated space, where no file system information describes them, and is used to recover data for a digital forensic examination. The process is also simply called “carving,” a general term for extracting structured data from raw data based on format-specific characteristics of the structured data.

As a forensic method, file carving recovers files based only on their structure and content, without any matching file system metadata. This makes it possible to recover files from the unallocated space of a drive. Unallocated space is the area of the drive that, according to the file system structures (such as the file table), no longer holds any file.

File system structures can be missing or damaged for part of a drive or for all of it. Simply put, many file systems do not erase data when a file is deleted; they only discard the record of where the data is located. The basic process of file carving is to scan the raw bytes and reassemble them into files. This is done by looking for the header (first bytes) and footer (last bytes) of a file.
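The header and footer scan described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular tool works: the function name carve_jpegs and the 10 MB size cap are assumptions made for the example, and it treats the first footer found after a header as the end of the file, which real JPEGs with embedded thumbnails can violate.

```python
from __future__ import annotations

JPEG_HEADER = b"\xff\xd8\xff"  # first bytes of a JPEG file
JPEG_FOOTER = b"\xff\xd9"      # last bytes of a JPEG file


def carve_jpegs(raw: bytes, max_size: int = 10 * 1024 * 1024) -> list[bytes]:
    """Return every byte run that starts with a JPEG header and ends with a footer.

    max_size is an assumed cap: a header with no footer within that
    many bytes is skipped instead of being carved.
    """
    carved = []
    pos = 0
    while True:
        start = raw.find(JPEG_HEADER, pos)
        if start == -1:
            break
        # Search for the footer only inside the allowed window.
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            pos = start + 1  # no footer in range, try the next header
            continue
        end += len(JPEG_FOOTER)  # include the footer bytes themselves
        carved.append(raw[start:end])
        pos = end
    return carved


if __name__ == "__main__":
    # Synthetic "disk": padding, one fake JPEG, more padding.
    fake_jpeg = JPEG_HEADER + b"\xe0" + b"image-data" + JPEG_FOOTER
    disk = b"\x00" * 16 + fake_jpeg + b"\x11" * 16
    files = carve_jpegs(disk)
    print(len(files), files[0] == fake_jpeg)
```

On a real image file, the same function could be fed the bytes read from the image, although a practical carver would read in chunks rather than load everything into memory.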

File carving is an excellent way to recover files and file fragments when directory entries are damaged or missing. It is often used by forensic professionals to examine evidence. One reported example involves the media seized from Osama bin Laden's compound during the raid by US Navy SEALs: forensic investigators used file recovery methods to recover data from the drives and systems found there.

File Systems Overview

A file system is a kind of database used for storing, updating, and retrieving files. It defines how files are named and logically organized for storage and retrieval. Several common file systems are described below.

Windows file systems: Microsoft Windows mainly uses two file systems, FAT and NTFS.

  • FAT, which stands for “File Allocation Table,” is the simplest type of file system. It consists of a boot sector, a file allocation table, and a plain storage area for files and folders. FAT comes in three variants: FAT12, FAT16, and FAT32. FAT32 is widely compatible with removable storage devices. The built-in Windows format tool will not create a FAT32 volume larger than 32 GB, and a single file on FAT32 cannot exceed 4 GB.
  • NTFS, short for “New Technology File System,” is now the default Windows file system and is used for volumes larger than 32 GB. Encryption and access control are among the main features of this file system.

Linux file systems: Linux is a widely used open-source operating system that is popular for testing and development, and it was designed to support a variety of file systems. Several of them are listed below.

  • Ext2, Ext3, Ext4 – These are the native Linux file systems, and one of them usually holds the root file system of a Linux distribution. Ext3 is an update of Ext2 that adds journaling (transactional file writes). Ext4 is a further development of Ext3 that supports optimized file allocation information (extents) and extended file attributes.
  • ReiserFS – This file system is designed for storing large numbers of small files. It offers fast file lookup and packs small files and file tails alongside the metadata, rather than spending a full file system block on each of them.
  • XFS – XFS is a robust file system that is widely used for storing large amounts of data. It originated on IRIX servers from SGI.
  • JFS – Developed by IBM, this file system is now open source and is supported by most Linux distributions.

macOS file systems: Apple's Macintosh operating system has traditionally used HFS+, an extension of the older HFS file system. HFS+ has been used on Macs, iPhones, iPads, and other Apple products; since 2017, newer versions of macOS and iOS default to APFS. Some Apple server products also used the Xsan clustered file system. In addition to files and folders, HFS+ stores Finder information, such as directory views and window positions.
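As a rough illustration of how these file systems can be told apart on a raw image, the sketch below checks a few well-known on-disk signatures: the NTFS OEM ID at offset 3, the FAT type strings in the boot sector, the XFS superblock magic, the ext2/3/4 superblock magic (0xEF53 at offset 1080), and the HFS+ volume header signature at offset 1024. The function name detect_filesystem is hypothetical, ReiserFS and JFS are left out, and the FAT type strings are informational labels rather than authoritative fields, so treat this as a first guess only.

```python
import struct


def detect_filesystem(image: bytes) -> str:
    """Guess the file system of a volume image from its magic bytes."""
    if image[3:11] == b"NTFS    ":           # OEM ID in the NTFS boot sector
        return "NTFS"
    if image[82:90] == b"FAT32   ":          # type string in a FAT32 boot sector
        return "FAT32"
    if image[54:59] in (b"FAT12", b"FAT16"):  # type string in a FAT12/16 boot sector
        return image[54:59].decode("ascii")
    if image[0:4] == b"XFSB":                # XFS superblock magic
        return "XFS"
    # The ext superblock starts at byte 1024; its magic is at offset 56 within it.
    if len(image) >= 1082 and struct.unpack_from("<H", image, 1080)[0] == 0xEF53:
        return "ext2/ext3/ext4"
    if image[1024:1026] in (b"H+", b"HX"):   # HFS+ / HFSX volume header
        return "HFS+"
    return "unknown"


if __name__ == "__main__":
    # Synthetic 2 KB "volume" carrying only an ext magic number.
    volume = bytearray(2048)
    volume[1080:1082] = b"\x53\xef"  # 0xEF53, little-endian
    print(detect_filesystem(bytes(volume)))
```

A check like this only reads the first two kilobytes of the volume, so it can be run on a partition image before deciding which recovery approach to try.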

File Carving Techniques

During a digital investigation, it is often necessary to analyze different types of media. Relevant information can be found on a range of storage devices and in computer memory. Many kinds of data may be analyzed, for example, email, electronic documents, system logs, and multimedia files. File carving is a recovery technique that considers only the contents and structure of a file, rather than the file system metadata used to organize data on the storage medium.

Below are some file carving terms to remember:

  • Block – The smallest unit of data that can be written to storage.
  • Header – The first bytes of the file.
  • Footer – The last bytes of the file.
  • Fragment – One or more blocks belonging to a single file.
  • Base-fragment – The first fragment of a file, which contains the file header.
  • Fragmentation point – The last block of a fragment, just before fragmentation takes place. A file with multiple fragments has several fragmentation points.

The most common file carving techniques are as follows:

  • Header-footer carving (or header plus maximum file size) – The basic strategy is to carve files based on their header and footer signatures, or on a header and a maximum file size.
  1. JPG or JPEG files – “\xFF\xD8” header and “\xFF\xD9” footer.
  2. GIF – “\x47\x49\x46\x38\x37\x61” header and “\x00\x3B” footer.
  3. PST – “!BDN” header with no footer.
  4. If the file format has no footer, the carving program uses a maximum file size instead.
  • File structure-based carving
  1. The internal layout of the file is used as the basis for carving.
  2. Headers, footers, identifier strings, and size information are the basic elements.
  • Content-based carving
  1. Loose content structure (MBOX, HTML, XML).
  2. Content characteristics: character counts, text and language recognition, white and black listing of data, information entropy, and statistical attributes (chi-squared test).
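To make the information entropy characteristic concrete, here is a minimal sketch that computes the Shannon entropy of each block of a buffer. The function names and the 512-byte block size are assumptions made for this example. Low values suggest padding or plain text, while values close to 8 bits per byte suggest compressed or encrypted data; any cut-off between the two is a judgment call and is not defined here.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 up to 8.0."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    # Sum of p * log2(1 / p) over every byte value that occurs in the block.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def block_entropies(data: bytes, block_size: int = 512) -> list:
    """Entropy of each consecutive block (block_size is an assumed value)."""
    return [
        shannon_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    zero_padding = b"\x00" * 512             # one repeated value
    every_byte_value = bytes(range(256)) * 2  # all 256 values, equally often
    for value in block_entropies(zero_padding + every_byte_value):
        print(round(value, 2))
```

A content-based carver could use values like these to decide whether neighboring blocks plausibly belong to the same kind of file.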

Carving a File (without using any tool)

Next, we will see how to carve a .jpeg file without using a dedicated carving tool. First, we need to know the structure of a .jpeg file, in particular its header and footer. To find out, we will open a .jpeg image in a hex editor and examine the first and last bytes of the file.

At the start of the file, we find the header (FFD8FFE0). Now, to find the footer, we will examine the last bytes of the file.

At the end of the file, we find the footer, or trailer (FFD9).

If a document has an image embedded in it, you can carve the image out by knowing its header and footer.

As an example, we have a Word document with an image in it. We will carve the image out using this technique.

The first thing we need to do is open the Word document in the hex editor by clicking File >> Open.

The hex editor shows the document's data in hexadecimal form. We already know that a .jpeg file has a header value of FFD8FFE0, so we search for the header by pressing Ctrl + F (or Search >> File) and entering the known header value. Selecting the hex value data type is very important in this step.

The header signature is found at offset 14FD.

Next, we must search for the footer, or trailer. We know that a .jpeg file has a footer value of FFD9, so we search for it by pressing Ctrl + F (or Search >> File) and entering the known footer value. Again, selecting the hex value data type is very important.

The footer value is found at offset 2ADB.

We now have the header and footer offsets of the .jpeg file and, as stated earlier, the data between the header and the footer is the content of the image. We will copy the entire block of data, including the header and footer, and store it as a new file.

Go to EDIT >> Select Block and enter the following two values:

File Header Offset: 14FD

File Footer Offset: 2ADB

After entering these values, the entire .jpeg file will be highlighted in blue. Make sure the selection includes both footer bytes (FF D9); since the footer starts at offset 2ADB, its last byte is at offset 2ADC. To save the selection as a file, copy it by right-clicking and selecting Copy, or by pressing Ctrl + C. Next, paste the data into a new file. A dialog box will appear, and we will click OK. Now, we are ready to save the file by clicking File >> Save as or pressing Ctrl + S. If you open the new file, you will see the same image as in the original document. This is the basic technique for carving media files.
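The Select Block step can also be reproduced in a few lines of Python once the two offsets are known. This is a sketch under stated assumptions: the function name carve_between is hypothetical, and the "document" below is synthetic data built so that the signatures fall at the same offsets as in the walkthrough (14FD and 2ADB).

```python
JPEG_HEADER = bytes.fromhex("FFD8FFE0")
JPEG_FOOTER = bytes.fromhex("FFD9")


def carve_between(data: bytes, header_offset: int, footer_offset: int) -> bytes:
    """Copy the block from the header offset through the end of the footer."""
    header_end = header_offset + len(JPEG_HEADER)
    if data[header_offset:header_end] != JPEG_HEADER:
        raise ValueError("no JPEG header at the given offset")
    footer_end = footer_offset + len(JPEG_FOOTER)
    if data[footer_offset:footer_end] != JPEG_FOOTER:
        raise ValueError("no JPEG footer at the given offset")
    # The slice end is exclusive, so footer_end keeps both footer bytes.
    return data[header_offset:footer_end]


if __name__ == "__main__":
    # Synthetic stand-in for the Word document: padding, fake image, padding.
    filler = b"\x00" * (0x2ADB - 0x14FD - len(JPEG_HEADER))
    image = JPEG_HEADER + filler + JPEG_FOOTER
    document = b"\x20" * 0x14FD + image + b"\x20" * 64

    carved = carve_between(document, 0x14FD, 0x2ADB)
    print(hex(document.find(JPEG_HEADER)), hex(len(carved)))
    # prints: 0x14fd 0x15e0
```

With a real document, the bytes would be read from the file and the carved block written out to a new .jpg file, which mirrors the copy, paste, and save steps performed in the hex editor.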

Data Carving Tools

Data recovery tools play an important role in most forensic investigations, because smart attackers always try to erase evidence of their crimes. Listed below are some important data carving tools for Linux and Windows.

  • Foremost (file carving tool)

Foremost recovers lost files based on their headers, footers, and internal data structures. It takes disk images as input, such as raw or AFF images, which can be generated with tools such as FTK Imager, dd, and EnCase. You can explore foremost's options on its help page using the following command:

$ foremost -h
Recover  files  from  a disk image based on file types specified by the
       user using the -t switch.

       jpg    Support for the JFIF and Exif formats, including implementations
              used in modern digital cameras.
       gif
       png
       bmp    Support for windows bmp format.
       avi
       exe    Support for Windows PE binaries will extract DLL and EXE files
              along with their compile times.
       mpg    Support for most MPEG files (must begin with 0x000001BA)
       wav
       riff   This will extract AVI and RIFF since they use the same file for‐
              mat (RIFF). note faster than running each separately.
       wmv    Note may also extract wma files as they have a similar format.
       ole    This will grab any file using the OLE file structure.  This
              includes PowerPoint, Word, Excel, Access, and StarWriter
       doc    Note it is more efficient to run OLE as you get more bang for
              your buck.   If you wish to ignore all other ole files, then use
              this.
       zip    Note it will extract .jar files as well because they use a similar
               format.   Open Office docs are just zip'd XML files, so they
              are extracted as well.  These include SXW, SXC, SXI, and SX? for
              undetermined  OpenOffice files.  Office 2007 files are also XML
              based (PPTX,DOCX,XLSX)
       rar
       htm
       cpp    C source code detection, note this is primitive and may generate
              documents other than C code.
       mp4    Support for MP4 files.
       all    Run all predefined extraction methods. [Default if no -t is
              specified]
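
As a usage illustration, a typical run combines the -t switch described above with foremost's -i (input image) and -o (output directory) switches. The sketch below only builds and prints such a command from Python; the helper name foremost_command and the file names evidence.dd and carved_output are placeholders chosen for this example.

```python
import os
import shutil
import subprocess


def foremost_command(image_path, output_dir, file_types=("jpg",)):
    """Build the argument list for a foremost run."""
    return [
        "foremost",
        "-t", ",".join(file_types),  # file types to carve
        "-i", image_path,            # input image
        "-o", output_dir,            # output directory
    ]


if __name__ == "__main__":
    # "evidence.dd" and "carved_output" are placeholder names.
    cmd = foremost_command("evidence.dd", "carved_output", ("jpg", "png"))
    print(" ".join(cmd))
    # Only run the tool when it is installed and the image actually exists.
    if shutil.which("foremost") and os.path.exists("evidence.dd"):
        subprocess.run(cmd, check=True)
```

Typing the printed command directly into a shell has the same effect; wrapping it in Python is only useful when the run is part of a larger script.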
  • BinWalk

Binwalk is a tool for analyzing binary files and extracting data from firmware images. It is considered one of the best tools available for reverse engineering firmware, and it is easy to use while offering an enormous range of capabilities. You can view binwalk's help page using the following command:

$ binwalk --help
Signature Scan Options:
    -B, --signature              Scan target file(s) for common file signatures
    -R, --raw=<str>              Scan target file(s) for the specified sequence of bytes
    -A, --opcodes                Scan target file(s) for common executable opcode signatures
    -m, --magic=<file>           Specify a custom magic file to use
    -b, --dumb                   Disable smart signature keywords
    -I, --invalid                Show results marked as invalid
    -x, --exclude=<str>          Exclude results that match <str>
    -y, --include=<str>          Only show results that match <str>

Extraction Options:
    -e, --extract                Automatically extract known file types
    -D, --dd=<type:ext:cmd>      Extract <type> signatures, give the files an extension of <ext>, and execute <cmd>
    -M, --matryoshka             Recursively scan extracted files
    -d, --depth=<int>            Limit matryoshka recursion depth (default: 8 levels deep)
    -C, --directory=<str>        Extract files/folders to a custom directory (default: current working directory)
    -j, --size=<int>             Limit the size of each extracted file
    -n, --count=<int>            Limit the number of extracted files
    -r, --rm                     Delete carved files after extraction
    -z, --carve                  Carve data from files, but don't execute extraction utilities

Entropy Analysis Options:
    -E, --entropy                Calculate file entropy
    -F, --fast                   Use faster, but less detailed, entropy analysis
    -J, --save                   Save plot as a PNG
    -Q, --nlegend                Omit the legend from the entropy plot graph
    -N, --nplot                  Do not generate an entropy plot graph
    -H, --high=<float>           Set the rising edge entropy trigger threshold (default: 0.95)
    -L, --low=<float>            Set the falling edge entropy trigger threshold (default: 0.85)

Binary Diffing Options:
    -W, --hexdump                Perform a hexdump / diff of a file or files
    -G, --green                  Only show lines containing bytes that are the same among all files
    -i, --red                    Only show lines containing bytes that are different among all files
    -U, --blue                   Only show lines containing bytes that are different among some files
    -w, --terse                  Diff all files, but only display a hex dump of the first file

Raw Compression Options:
    -X, --deflate                Scan for raw deflate compression streams
    -Z, --lzma                   Scan for raw LZMA compression streams
    -P, --partial                Perform a superficial, but faster, scan
    -S, --stop                   Stop after the first result

General Options:
    -l, --length=<int>           Number of bytes to scan
    -o, --offset=<int>           Start scan at this file offset
    -O, --base=<int>             Add a base address to all printed offsets
    -K, --block=<int>            Set file block size
    -g, --swap=<int>             Reverse every n bytes before scanning
    -f, --log=<file>             Log results to file
    -c, --csv                    Log results to file in CSV format
    -t, --term                   Format output to fit the terminal window
    -q, --quiet                  Suppress output to stdout
    -v, --verbose                Enable verbose output
    -h, --help                   Show help output
    -a, --finclude=<str>         Only scan files whose names match this regex
    -p, --fexclude=<str>         Do not scan files whose names match this regex
    -s, --status=<int>           Enable the status server on the specified port
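
To give a feel for what a signature scan does conceptually, the sketch below searches a buffer for a handful of well-known magic numbers and reports the offset of each match. It is not binwalk's implementation: the SIGNATURES table and the scan_signatures function are assumptions made for this example, and a real scanner validates each hit instead of trusting a few bytes.

```python
# A tiny, illustrative signature table (a real tool knows hundreds of formats).
SIGNATURES = {
    "jpeg": bytes.fromhex("FFD8FF"),
    "gif": b"GIF8",
    "png": bytes.fromhex("89504E470D0A1A0A"),
    "zip": b"PK\x03\x04",
}


def scan_signatures(data: bytes) -> list:
    """Return (offset, name) pairs for every signature match, sorted by offset."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Synthetic "firmware": padding, a PNG signature, padding, a ZIP signature.
    blob = b"\x00" * 4 + SIGNATURES["png"] + b"\x00" * 4 + SIGNATURES["zip"]
    for offset, name in scan_signatures(blob):
        print(f"{offset:<10}0x{offset:<10X}{name}")
```

Short signatures produce many false positives on real data, which is one reason dedicated tools combine them with structural checks.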

Recovering Data from Formatted Disks

Data recovery tools should be selected carefully when recovering information from formatted disks, USB flash drives, and memory cards. Tools designed for a different task can produce unexpected results. Below, we look at how various kinds of data recovery tools differ when used on formatted drives.

Unformat

The first serious mistake that many computer users make after accidentally formatting a drive is to find, install, and run an “unformat” tool. There are many of these tools on the market; some are commercial, and others are freeware. The purpose of these tools is to rebuild the disk as it was before formatting by restoring the original file system.

While this may seem like a viable approach to the inexperienced, it can end up being a bigger mistake than losing the files in the first place. Formatting a disk wipes the original file system and overwrites at least part of it, usually at the beginning of the disk. When you try to restore the old file system, the best you can get is a readable disk with some of your files. Not everything can be recovered exactly as it was this way, and your most valuable files might be corrupted or missing, leaving only a random sample of the original files on the disk. As for “unformatting” a system drive, forget it; at least some system files will be gone. Even if you manage to boot the operating system, you will never get a stable system.

Undelete

The second mistake that many computer users make is to use undelete tools. These tools do their job well when files have simply been deleted, but they are not designed to handle disks whose file system has been wiped. Even with some of the best undelete tools, such as RS File Recovery, you may recover a few files, but that is about it.

Partition Recovery

To recover files from a formatted disk, you should look for a partition recovery tool such as RS Partition Recovery. Designed to handle repartitioned, formatted, and damaged disks, this kind of tool can scan the entire surface of a disk or partition to recover everything it can find. Even if the file system is empty or deleted, the tool can recover many types of files, such as documents, images, and videos, through signature search. However, although partition recovery tools are top-notch for data recovery, they are usually quite expensive. If you only need to recover a formatted disk, it can pay to look around for a cheaper alternative.

FAT and NTFS Recovery

You can save up to 40% on the cost of RS Partition Recovery by choosing a tool that only recovers FAT- or NTFS-formatted disks. Remember that you need a tool that matches the original file system, not the one that was written over it. If the original drive was NTFS, get RS NTFS Recovery. If it was FAT or FAT32, get RS FAT Recovery. This way, you get a tool of the same quality, but limited to either FAT or NTFS. This is a good choice for a one-off job.

Carving Files (using a tool)

PhotoRec is a popular tool for carving files, especially JPEG and other image files (hence the name, short for Photo Recovery). PhotoRec ignores the file system and goes after the underlying data, so it works even if your media's file system has been severely damaged or reformatted. PhotoRec is available for Windows as well as other operating systems.

As an example, we will recover image files from an 8-GB flash drive using this tool.

First, run PhotoRec.exe to launch the application.

The first screen lists all the available partitions. We will select /K as the target from which to recover data.

The next screen shows which file system this partition uses, along with four options at the bottom.

Search – This will search the partition that holds files for recovery.
Options – Used for minor changes in the options.
File Opt – Used for modifying the types of files to be recovered.
Quit – Exits the process.

We will select File Opt (File Options):

This screen lets us select the types of files we want to recover from the partition. Pressing S unmarks all the options. We will then select JPG pictures, as we only want to recover image files from the drive, and press B to save the selection.

Next, go back to the main options and select Other as the file system type. As for recovery options, we have two choices:

  • Recover from the whole partition.
  • Recover from unallocated space only (available for FAT12, FAT16, FAT32, ext2, ext3, etc.). With this option, only files that have been deleted are recovered.

Now, all we need to do is choose the location where the recovered files will be saved. The recovery process will then start, and it may take some time to finish. Afterward, the recovered image files can be found at the chosen location.

Conclusion

File carving is a well-known computer forensics term for identifying files by type and extracting them from unallocated clusters using file signatures. A file signature, also known as a magic number, is a constant numeric or text value used to identify a file format. A digital forensic investigation consists of the acquisition, verification, analysis, and documentation of evidence contained in a computer system, a computer network, or other digital media. Within that field, extracting meaningful data from raw data is what is called carving.

File carving is the identification and recovery of files based on analysis of their format. In computer forensics, carving is a useful way to find hidden or deleted files on digital media. Files can be hidden in areas such as lost clusters, unallocated clusters, and the slack space of a disk or other digital media. For this extraction method to work, a file must have a standard signature, called a file header, at its beginning. After finding a file header, the recovery tool keeps scanning until it reaches the file footer at the end of the file. The data between the header and the footer is extracted and analyzed to check its integrity. Carving tools use several different methods in their algorithms, depending on the file type.

Modern operating systems do not fully erase a deleted file unless the user explicitly asks for it. Deleted files can be recovered with various forensic tools and techniques as long as their data has not been overwritten by another file. Damaged files can be recovered if the data is not damaged beyond recognition.

File recovery and file carving are quite different. File recovery relies on information from the file system, and with this information many files can be recovered; if that information is missing or incorrect, however, it will not work. With the advent of file carving, law enforcement, technology professionals, and forensics professionals gained another tool for recovering deleted data. While it is not always perfect, tools such as Foremost, Scalpel, and PhotoRec have made file recovery easier than ever.

About the author

Usama Azad

A security enthusiast who loves the terminal and open source. My areas of expertise are Python, Linux (Debian), Bash, penetration testing, and firewalls. I was born and raised in Wazirabad, Pakistan, and I am currently pursuing an undergraduate degree at the National University of Science and Technology (NUST). On Twitter, I go by @UsamaAzad14.