You need to get data from a PDF file into Excel for analysis, but copying and pasting creates a messy, unusable spreadsheet. PDFs are designed for viewing, not editing, which makes extracting structured tables difficult. This article explains how to use Excel’s built-in Power Query tool to import PDF tables directly. You will learn the steps to get clean, formatted data without installing any extra software.
Key Takeaways: Import PDF Tables to Excel
- Data > Get Data > From File > From PDF: This is the primary method in Excel for Windows to connect to a PDF and preview its tables before import.
- Navigator > Transform Data: This button opens the Power Query Editor, where you clean columns, remove blank rows, and change data types before the data reaches your worksheet.
- Close & Load: This final command loads the cleaned and transformed table from the PDF into a new Excel worksheet.
Using Excel’s Power Query to Import PDF Data
Excel for Microsoft 365 and Excel 2021 include a powerful data transformation engine called Power Query. Its “From PDF” connector can read PDF files and detect tables within them. The tool interprets the visual structure of the PDF page, identifying rows and columns based on spacing and lines. It then converts that structure into a data table you can edit before it enters your workbook. This method works best with PDFs created from spreadsheet or database programs, as they have clear table boundaries.
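You need a version of Excel for Windows that includes both Power Query and the “From PDF” connector. Power Query itself ships with Excel 2016 and later, but the PDF connector is a newer addition: expect it in Excel for Microsoft 365 and Excel 2021, and confirm that Data > Get Data > From File lists From PDF in your installation, because older releases such as Excel 2016 and Excel 2019 typically do not include it. The “From PDF” option is not available in Excel for Mac or the web version. The PDF file must be stored on your local computer or a network drive you have access to. Password-protected PDFs cannot be read by this method.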
Steps to Import and Clean a PDF Table
Follow this process to bring a table from a PDF into Excel with correct formatting.
- Start the data import
In Excel, go to the Data tab on the ribbon. Click Get Data, hover over From File, and select From PDF. Navigate to your PDF file and click Import.
- Select the correct table
The Navigator pane will open showing a list of tables and pages found in the PDF. Click on a table name to see a preview on the right. Check the preview to ensure the data looks correct. Select the checkbox next to the table you want and click Transform Data.
- Clean the data in Power Query Editor
The Power Query Editor window opens. Here you can remove extra header rows by selecting Home > Remove Rows > Remove Top Rows. Delete blank columns by right-clicking the column header and choosing Remove. Change a column’s data type by clicking the data type icon next to the column name, like ABC for text or 123 for whole number.
- Load the cleaned table into Excel
After making your changes, click the Close & Load button on the Home tab. Power Query will close and load the final table into a new worksheet in your Excel workbook.
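The cleanup in step 3 can be easier to reason about when written out as plain logic. The following is a minimal Python sketch, not what Power Query runs internally (Power Query records its steps in its own formula language). The function name `clean_table`, the `skip_top` parameter, and the sample rows are all hypothetical, and treating the first remaining row as the header is an assumption added for illustration.

```python
def clean_table(rows, skip_top=0):
    """Mirror common cleanup steps on a table given as a list of rows."""
    # Remove Top Rows: drop title lines that sit above the real header.
    rows = rows[skip_top:]
    if not rows:
        return [], []
    # Pad short rows so every row has the same number of cells.
    width = max(len(r) for r in rows)
    rows = [list(r) + [""] * (width - len(r)) for r in rows]
    # Remove blank columns: keep a column only if some cell has content.
    keep = [i for i in range(width)
            if any(str(r[i]).strip() for r in rows)]
    rows = [[r[i] for i in keep] for r in rows]
    # Assumption: the first remaining row holds the column names.
    header, data = rows[0], rows[1:]
    # Remove blank rows from the data.
    data = [r for r in data if any(str(c).strip() for c in r)]
    return header, data


# Hypothetical rows, shaped like a PDF table with a title line and a blank column.
raw = [
    ["Quarterly Sales Report", "", ""],
    ["Region", "", "Sales"],
    ["North", "", "1200"],
    ["", "", ""],
    ["South", "", "950"],
]
header, data = clean_table(raw, skip_top=1)
print(header)  # ['Region', 'Sales']
print(data)    # [['North', '1200'], ['South', '950']]
```

The order matters: removing the title row first keeps its text from making an otherwise empty column look populated.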
Using the “From Web” Connector for Online PDFs
If your PDF is hosted on a public website, you can use a different method. In Excel, go to Data > Get Data > From Other Sources > From Web. Paste the direct URL to the PDF file. This may open the PDF in the Navigator pane, allowing you to select a table. This method is less reliable than the direct From PDF option and depends on how the web server hosts the file.
Common Mistakes and Data Cleaning Challenges
Even with Power Query, PDF imports can have problems. Knowing these issues helps you fix them quickly.
Imported Data is All in One Column
This happens when Power Query cannot detect column separators in the PDF. In the Power Query Editor, select the column with all the data. Go to the Transform tab and click Split Column > By Delimiter. Choose a delimiter like a space or comma, or select “By number of characters” if the data has fixed widths.
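To see what the two split modes do to a single value, here is a small Python sketch of the same logic. It is an illustration only, under the assumption that each imported row arrives as one text string; the function names and sample values are hypothetical.

```python
def split_by_delimiter(value, delimiter=None):
    """Split on a delimiter; None means any run of whitespace."""
    if delimiter is None:
        return value.split()
    return [part.strip() for part in value.split(delimiter)]


def split_by_widths(value, widths):
    """Split fixed-width text into fields of the given character widths."""
    fields, start = [], 0
    for w in widths:
        fields.append(value[start:start + w].strip())
        start += w
    return fields


# Delimiter split on a hypothetical comma-separated row.
print(split_by_delimiter("North, 1200, Q1", ","))  # ['North', '1200', 'Q1']

# Fixed-width split: 10 characters, then 5, then 2.
line = "North".ljust(10) + "1200".ljust(5) + "Q1"
print(split_by_widths(line, [10, 5, 2]))  # ['North', '1200', 'Q1']

# Splitting on spaces breaks values that contain spaces themselves.
print(split_by_delimiter("New York 1200"))  # ['New', 'York', '1200']
```

The last call shows why a space delimiter is risky for text columns: a value such as a two-word city name becomes two fields, so fixed widths or a less common delimiter is often the safer choice.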
Numbers Import as Text or Dates are Wrong
Power Query sometimes guesses data types incorrectly. Click the data type icon next to the column header in the Power Query Editor. Choose the correct type: Decimal Number, Whole Number, or Date. For stubborn text numbers, use Transform > Replace Values to remove any currency symbols or commas before changing the type.
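The reason to strip symbols before changing the type is that a value like “$1,234.50” is not a valid number until the extra characters are gone. A minimal Python sketch of that two-stage logic follows. The function name `to_number` is hypothetical, and the sketch assumes US-style formatting, with a comma as the thousands separator and a period as the decimal mark.

```python
def to_number(text):
    """Convert text such as '$1,234.50' to a float, or None if it is not numeric."""
    cleaned = text.strip()
    # Stage 1: remove currency symbols and thousands separators.
    for ch in "$€£,":
        cleaned = cleaned.replace(ch, "")
    # Stage 2: attempt the type conversion.
    try:
        return float(cleaned)
    except ValueError:
        return None


print(to_number("$1,234.50"))  # 1234.5
print(to_number(" 950 "))      # 950.0
print(to_number("n/a"))        # None
```

For PDFs that use a comma as the decimal mark, removing every comma would change the value, so the replacement rules need to match the locale of the source document.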
Extra Header Rows or Merged Cells in the PDF
PDFs with complex layouts can confuse the import. In the Navigator pane preview, if you see extra title rows above the data, do not select the table. Instead, select the “Page” item, which imports the entire page content. You can then use Power Query’s filtering tools to remove the unwanted rows manually.
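When the number of title rows varies from file to file, counting rows by hand is fragile. One alternative is to look for the row that contains the expected column names and discard everything above it. The Python sketch below shows that idea only as logic; the function name, the column names, and the sample page are hypothetical.

```python
def drop_rows_above_header(rows, expected_columns):
    """Return rows starting at the first row that contains all expected column names."""
    wanted = {name.strip().lower() for name in expected_columns}
    for index, row in enumerate(rows):
        cells = {str(cell).strip().lower() for cell in row}
        # The header is the first row holding every expected name.
        if wanted <= cells:
            return rows[index:]
    raise ValueError("header row not found")


# Hypothetical page content with two title lines above the table.
page = [
    ["Acme Corp"],
    ["Invoice summary", ""],
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.99"],
]
table = drop_rows_above_header(page, ["Item", "Qty"])
print(table)  # [['Item', 'Qty', 'Price'], ['Widget', '2', '9.99']]
```

Matching on names rather than position keeps the cleanup working if a later version of the PDF adds or removes a title line.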
Power Query vs. Copy-Paste vs. Save As Methods
| Item | Power Query (Get Data from PDF) | Copy and Paste | Save PDF as Excel (Online Converter) |
|---|---|---|---|
| Data Structure | Preserves table structure in separate columns | Often pastes all data into one column | Results vary widely; often creates many merged cells |
| Data Cleaning | Built-in editor to filter, change types, and remove errors | Must clean manually in Excel after pasting | No editing before import; errors are embedded |
| Formatting | Brings raw data without PDF fonts or colors | May bring text formatting but breaks alignment | Attempts to keep visual layout, which harms data structure |
| Automation | Query can be refreshed if the source PDF updates | Fully manual process must be repeated | Manual one-time conversion |
| Software Required | Excel for Windows with the From PDF connector (Microsoft 365 or Excel 2021) | Any version of Excel | Third-party website or service |
You can now import tables from PDFs directly into Excel using the Get Data feature. This gives you a clean dataset ready for formulas and pivot tables. For recurring reports, update the data by right-clicking the loaded table in Excel and selecting Refresh; the query rereads the PDF from its original path and reapplies every cleaning step. A more advanced tip is to combine multiple PDFs from a folder using Data > Get Data > From File > From Folder, then appending the resulting tables into one master table.
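Appending stacks the rows of several tables and lines up columns by name. As a rough illustration of that behavior, here is a Python sketch in which each table is a list of dictionaries. The function name `append_tables` and the sample data are hypothetical, and filling absent columns with `None` stands in for the empty values an append produces when one source lacks a column.

```python
def append_tables(tables):
    """Append tables (lists of dict rows) into one, filling missing columns with None."""
    # Collect column names in first-seen order across all tables.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Emit every row with the full column set.
    return [{name: row.get(name) for name in columns}
            for table in tables for row in table]


# Hypothetical monthly tables; February has an extra column.
january = [{"Region": "North", "Sales": 1200}]
february = [{"Region": "North", "Sales": 1300, "Returns": 4}]
master = append_tables([january, february])
print(len(master))           # 2
print(master[0]["Returns"])  # None
```

Because columns are matched by name, consistent headers across the source PDFs matter more than consistent column order.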