Duplicate files in a SharePoint document library waste storage space and confuse users who cannot tell which version is current. This problem usually happens when team members upload the same file multiple times, copy folders from local drives, or sync libraries with OneDrive without a clear file-naming policy. This article explains the built-in tools and manual methods you can use as a SharePoint admin to find and remove duplicate files without breaking document metadata or version history.
Key Takeaways: Removing Duplicates From a SharePoint Library
- Library Settings > Information management policy settings: Use retention and auditing rules to govern and track library content; they do not block duplicate uploads on their own
- PowerShell with PnP.PowerShell module: Script that compares file content hashes to identify exact duplicates, regardless of file name
- Site settings > Storage Metrics: View storage consumption per library and folder to spot libraries that may contain duplicate bloat
Why Duplicate Files Accumulate in SharePoint Libraries
Duplicate files appear for several reasons. Users often drag files from their desktop into a library without checking whether the same document already exists there. When OneDrive sync is enabled, a local folder structure copied to the cloud can create multiple copies of the same document with slightly different file names, such as Report.docx and Report (1).docx. Version history in SharePoint does not detect these duplicates, because a file uploaded under a different name or into a different folder becomes a new item, not a new version of an existing item.
SharePoint does not include a built-in duplicate file scanner. The platform relies on the file name and location to determine uniqueness. Two files with identical content but different names are treated as separate documents. This behavior means that admins must use external tools or custom scripts to find duplicates based on content hashes, file size, or metadata.
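To illustrate content-based matching, the following is a minimal sketch that runs against local files only, for example a downloaded copy of a library. The folder name sp-duplicate-demo and the sample file contents are made up for the illustration, and SHA256 is just one reasonable choice of hash algorithm.

```powershell
# Build a throwaway folder with two identical files and one different file.
# The folder name and file contents are made up for this illustration.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) "sp-duplicate-demo"
New-Item -ItemType Directory -Path $demoDir -Force | Out-Null
Set-Content -LiteralPath (Join-Path $demoDir "Report.docx") -Value "quarterly numbers"
Set-Content -LiteralPath (Join-Path $demoDir "Report (1).docx") -Value "quarterly numbers"
Set-Content -LiteralPath (Join-Path $demoDir "Notes.docx") -Value "meeting notes"

# Group files by content hash. Any group with more than one member
# holds byte-identical files, whatever their names are.
$duplicateGroups = Get-ChildItem -LiteralPath $demoDir -File |
    Group-Object -Property { (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash } |
    Where-Object { $_.Count -gt 1 }

foreach ($group in $duplicateGroups) {
    $names = ($group.Group | Sort-Object Name | ForEach-Object { $_.Name }) -join ", "
    Write-Output "Identical content: $names"
}
```

The two Report files land in the same group even though their names differ, which is exactly the case that a name-based check misses.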
Prerequisites for Duplicate Cleanup
Before you start removing files, confirm that you have the correct permissions. You need at least Edit or Contribute access to the document library. Site Collection Administrator or Owner access is required for running PowerShell scripts against the site. Back up any files you plan to delete by exporting the library to a file share or downloading a copy of the suspected duplicates. Keep in mind that versioning does not protect against deletion: a deleted file moves to the site recycle bin together with its version history, and you restore it from there. Enable versioning on the library anyway, so that you can roll back to a previous version if a surviving file is overwritten during the cleanup.
Method 1: Manual Review Using SharePoint Views
For small libraries with fewer than 500 items, a manual review is the safest approach. Create a custom view that groups files by name so duplicates appear next to each other.
- Open the document library
Navigate to the SharePoint site and open the library that contains the suspected duplicate files.
- Create a new view
Click the settings gear icon, select Library settings, then under Views click Create view. Choose Standard view and name it Duplicate Scan.
- Configure grouping and sorting
In the view settings, under Group By select Name. If Name is not available in the Group By list, sort by Name instead so that matching names still sit together. Under Sort select Modified in descending order. This order places the most recently modified duplicate at the top of each group. Because two files with the same name can only exist in different folders, under Folders select Show all items without folders.
- Check the grouping
Return to the library and select the Duplicate Scan view. Scroll through each group. If a group contains more than one item, compare the file size and modified date to decide which copy to keep.
- Delete the extra copies
Select the duplicate file you want to remove, click the ellipsis, and choose Delete. Confirm the deletion in the dialog box.
This method only finds copies that share a file name, and manual review becomes impractical at larger volumes.
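The same name-based grouping can be sketched in PowerShell. The function name Get-SameNameGroup and the sample records are hypothetical. With a real library, you would build the records from the FileLeafRef, FileRef, and Modified field values that Get-PnPListItem returns.

```powershell
# Hypothetical helper: returns the groups of records that share a file name.
function Get-SameNameGroup {
    param([Parameter(Mandatory)] [object[]] $Files)
    $Files | Group-Object -Property Name | Where-Object { $_.Count -gt 1 }
}

# Sample records standing in for library items (paths are placeholders).
$sampleFiles = @(
    [pscustomobject]@{ Name = "Report.docx"; Path = "/sites/yoursite/Shared Documents/Report.docx"; Modified = [datetime]"2024-03-01" }
    [pscustomobject]@{ Name = "Report.docx"; Path = "/sites/yoursite/Shared Documents/Archive/Report.docx"; Modified = [datetime]"2024-01-15" }
    [pscustomobject]@{ Name = "Budget.xlsx"; Path = "/sites/yoursite/Shared Documents/Budget.xlsx"; Modified = [datetime]"2024-02-10" }
)

# Print each group with the most recently modified copy first,
# mirroring the sort order of the Duplicate Scan view.
foreach ($group in Get-SameNameGroup -Files $sampleFiles) {
    Write-Output "Name: $($group.Name)"
    $group.Group | Sort-Object Modified -Descending | ForEach-Object {
        Write-Output ("  {0:yyyy-MM-dd}  {1}" -f $_.Modified, $_.Path)
    }
}
```

Like the view, this only surfaces copies with matching names. It does not compare content.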
Method 2: PowerShell Script to Detect Exact Duplicates
For libraries with thousands of files, use PowerShell with the PnP.PowerShell module. This script compares file content hashes to find exact duplicates regardless of file name.
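- Install PnP.PowerShell module
Open PowerShell and run the following command, then confirm the installation when prompted. Recent releases of the module target PowerShell 7, so check the module's requirements if the installation fails in Windows PowerShell 5.1.

    Install-Module PnP.PowerShell -Scope CurrentUser

- Connect to the SharePoint site
Run the following command and sign in with a site owner account. Recent versions of the module may also require a -ClientId parameter that points to an app registration in your own tenant.

    Connect-PnPOnline -Url "https://yourtenant.sharepoint.com/sites/yoursite" -Interactive

- Run the duplicate detection script
Execute the following script. Replace DocumentLibraryName with the actual library name. The script downloads each file into memory to compute its hash, so expect it to run for a while on large libraries.

    # Name of the library to scan
    $library = "DocumentLibraryName"
    $items = Get-PnPListItem -List $library -PageSize 500
    # Maps each content hash to the first file path seen with that hash
    $hashTable = @{}
    foreach ($item in $items) {
        # Skip folders; only files have content to hash
        if ($item.FileSystemObjectType -ne "File") { continue }
        $fileUrl = $item.FieldValues["FileRef"]
        # Get-FileHash cannot read a server-relative URL, so load the content into memory first
        $stream = Get-PnPFile -Url $fileUrl -AsMemoryStream
        $stream.Position = 0
        $hash = (Get-FileHash -InputStream $stream -Algorithm MD5).Hash
        $stream.Dispose()
        if ($hashTable.ContainsKey($hash)) {
            Write-Output "Duplicate: $fileUrl matches $($hashTable[$hash])"
        } else {
            $hashTable[$hash] = $fileUrl
        }
    }

- Review the output
The script prints each duplicate pair as server-relative paths. Manually verify each pair in the library before deletion.
- Delete duplicates from the output
Use the SharePoint interface to delete the files identified by the script. Do not delete files directly from PowerShell unless you have confirmed the exact item ID.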
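For the review step, it can help to collect the reported pairs in a CSV file that you can sort and annotate in Excel before deleting anything. The sketch below assumes you have gathered the pairs into objects. The property names Duplicate and Original, the sample paths, and the file name duplicate-review.csv are all illustrative.

```powershell
# Sample duplicate pairs standing in for the script output (paths are placeholders).
$pairs = @(
    [pscustomobject]@{ Duplicate = "/sites/yoursite/Shared Documents/Archive/Report (1).docx"; Original = "/sites/yoursite/Shared Documents/Report.docx" }
    [pscustomobject]@{ Duplicate = "/sites/yoursite/Shared Documents/Old/Budget copy.xlsx"; Original = "/sites/yoursite/Shared Documents/Budget.xlsx" }
)

# Write the pairs to a CSV file in the temp folder for manual review.
$reviewFile = Join-Path ([System.IO.Path]::GetTempPath()) "duplicate-review.csv"
$pairs | Export-Csv -Path $reviewFile -NoTypeInformation

# Read the file back to confirm what was written.
Import-Csv -Path $reviewFile | Format-Table -AutoSize
```

Keeping the review in a file also leaves a record of what was judged a duplicate and why.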
The MD5 hash comparison catches files with identical binary content even if the file names differ. This method does not detect near-duplicates where only a few words changed.
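Because files with identical content must also have identical sizes, one possible optimization is to hash only the files whose size is shared with at least one other file. The sketch below shows that pre-filter on sample records. The function name Select-HashCandidate and the Size property are hypothetical, and in a real library the size would likely come from the File_x0020_Size field value.

```powershell
# Hypothetical helper: keeps only files whose size matches at least one other file.
function Select-HashCandidate {
    param([Parameter(Mandatory)] [object[]] $Files)
    $Files |
        Group-Object -Property Size |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group }
}

# Sample records standing in for library items (paths and sizes are placeholders).
$sampleFiles = @(
    [pscustomobject]@{ Path = "/sites/yoursite/Shared Documents/Report.docx"; Size = 48213 }
    [pscustomobject]@{ Path = "/sites/yoursite/Shared Documents/Archive/Report (1).docx"; Size = 48213 }
    [pscustomobject]@{ Path = "/sites/yoursite/Shared Documents/Budget.xlsx"; Size = 91022 }
)

$candidates = @(Select-HashCandidate -Files $sampleFiles)
Write-Output "Files worth hashing: $($candidates.Count) of $($sampleFiles.Count)"
# prints "Files worth hashing: 2 of 3"
```

A matching size is only a hint, so the candidates still need the hash comparison to confirm that their content is identical.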
Method 3: Third-Party Tools for Large Libraries
Several commercial tools can scan SharePoint libraries for duplicates and provide a web interface for bulk deletion. Tools such as ShareGate, Metalogix, and Collabspace Duplicate Checker offer scheduled scans, reporting, and one-click cleanup. Evaluate the cost against the time saved if your library contains more than 10,000 files. Most tools offer a trial period, so you can test them on a small library first.
Common Mistakes When Removing Duplicates
Deleting a file that has unique metadata or permissions
A file might appear to be a duplicate of another but have custom metadata, unique permissions, or a required retention label. Before deleting, check the properties of each suspected duplicate. Open the file details pane and look for any column values that differ from the copy you plan to keep. If the metadata is important, copy it to the surviving file before deletion.
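The property check can be sketched as a field-by-field comparison. The function name Get-MetadataDifference, the ignore list, and the sample column values below are hypothetical. With PnP.PowerShell, the two dictionaries would be the FieldValues of the two list items. Values are compared as strings, which is a simplification that does not handle complex column types such as people or lookup fields.

```powershell
# Hypothetical helper: lists fields whose values differ between two copies.
function Get-MetadataDifference {
    param(
        [Parameter(Mandatory)] [System.Collections.IDictionary] $Keep,
        [Parameter(Mandatory)] [System.Collections.IDictionary] $Candidate,
        # Fields that are expected to differ between any two copies
        [string[]] $Ignore = @("ID", "FileRef", "FileLeafRef", "Modified", "Created")
    )
    $names = @($Keep.Keys) + @($Candidate.Keys) |
        Sort-Object -Unique |
        Where-Object { $_ -notin $Ignore }
    foreach ($name in $names) {
        $keepValue = $Keep[$name]
        $candidateValue = $Candidate[$name]
        # String comparison; a missing field counts as an empty value
        if ("$keepValue" -ne "$candidateValue") {
            [pscustomobject]@{ Field = $name; Keep = $keepValue; Candidate = $candidateValue }
        }
    }
}

# Sample column values (made up for the illustration).
$keepFields = @{ Title = "Q1 Report"; Department = "Finance"; ReviewDate = "2024-06-30" }
$candidateFields = @{ Title = "Q1 Report"; Department = "Sales"; ReviewDate = "2024-06-30"; Project = "Apollo" }

Get-MetadataDifference -Keep $keepFields -Candidate $candidateFields | Format-Table -AutoSize
```

In this sample the Department and Project columns differ, so those are the values to copy to the surviving file or to investigate before deleting the candidate.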
Removing the file with the current version history
If one duplicate has 20 versions and the other has only one, the file with the richer version history is usually the one to keep. To open the version history, select the file, click the ellipsis, and choose Version history. Compare the number of versions and the content of the latest version before deleting.
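That rule of thumb can be written down as a simple ranking: prefer the copy with more versions, and break ties by the most recent modification date. The function name Select-CopyToKeep and the VersionCount property are hypothetical, and you would need to fill in the version count from each file's version history yourself.

```powershell
# Hypothetical helper: picks the copy with the most versions, newest first on ties.
function Select-CopyToKeep {
    param([Parameter(Mandatory)] [object[]] $Copies)
    $Copies |
        Sort-Object -Property @{ Expression = { $_.VersionCount }; Descending = $true },
                              @{ Expression = { $_.Modified }; Descending = $true } |
        Select-Object -First 1
}

# Sample duplicate pair (names, counts, and dates are made up).
$copies = @(
    [pscustomobject]@{ Name = "Report (1).docx"; VersionCount = 1; Modified = [datetime]"2024-03-05" }
    [pscustomobject]@{ Name = "Report.docx"; VersionCount = 20; Modified = [datetime]"2024-03-01" }
)

$keep = Select-CopyToKeep -Copies $copies
Write-Output "Suggested copy to keep: $($keep.Name)"
# prints "Suggested copy to keep: Report.docx"
```

Treat the result as a suggestion only. The copy with fewer versions may still hold the newest content, so compare the latest versions before deleting.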
Not checking linked files or web part references
A file may be referenced by a SharePoint web part, a Power Automate flow, or a hyperlink on another page. Deleting a referenced file breaks the connection. Use the site search to find pages that mention the file name, and review any Power Automate flows that work with the library separately, because site search does not cover flows. Update those references to point to the surviving file before deletion.
Manual Cleanup vs Automated Cleanup
| Item | Manual Cleanup | Automated Cleanup (PowerShell or Tool) |
|---|---|---|
| Best for library size | Under 500 files | 500 to 100,000 files |
| Detection method | Visual comparison by file name | Content hash, file size, or metadata match |
| Risk of accidental deletion | Low because each file is reviewed | Moderate if script output is not verified |
| Time required | Hours for 500 files | Minutes for 10,000 files |
| Requires PowerShell knowledge | No | Yes for script method |
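After you finish the cleanup, take steps to keep duplicates from returning. Agree on a file-naming convention and a single location for each document, so that users replace the existing file, which adds a new version, instead of uploading a renamed copy. Retention policies control how long content is kept and are configured in Microsoft Purview rather than the SharePoint admin center; they do not block duplicate uploads. To limit storage used by old versions, set a version limit in the library's Versioning settings. You can also enable the Require Check Out setting on the library so users must check out a file before editing, which prevents conflicting edits to the same document.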