Word Counts are a mess
First Published: 2024-05-20, Last Updated: 2024-11-10
Why are word counts different between different software
Word counts are a mess.
Once, for a Uni assignment, I had to submit an accurate word count along with the assignment.
This proved impossible, as each piece of document software gave a different word count and there was no way I was going to manually count a roughly 600-word document.
Afterwards, I was curious as to why each software gave different word counts, so I decided to do some testing.
This blogpost is a collation of my findings.
My testing was not quite scientific, but I tried to at least be consistent.
I standardised on the .DOCX and .DOC document formats, as they are popular thanks to Microsoft Office's dominance and are well supported by most software.
I then chose a selection of popular office document software to test: Word Desktop, Word Online, Libre Office, Google Docs and Apple Pages.
For each software, I first started with a sanity check: a simple DOCX document with a 100 word story.
This acted as a litmus test to check that there weren't any significant issues with the software.
Then came the benchmark: a DOCX document with every feature I could think of which has a word count of 219 (counted manually).
This DOCX document was then converted into a DOC document (via Word Desktop) to see if there were any significant differences.
This benchmark is not a real-world document, and any word count generated from it will be wrong.
This is by design: I added as many tricks as I could to trip up the software and find edge cases.
The main point of this exercise is to stress-test each software and find the differences in how they measure word counts, not to find the actual word count of the document, as that is quite subjective.
That being said, let me list and explain some unknowns
(things that arguably shouldn't count as a word, but which I treated as correct, within reason, as long as the software was consistent).
They include
Now here we get to the meat of the blogpost, the results of the testing.
Having seen the results, I have to say that I am quite surprised.
There is a massive range between the word counts (126-187), and surprisingly, there is no real difference between .DOC and .DOCX.
The difference in Google Docs and Word Desktop is due to the lack of equations in the .DOC, a result of the conversion process.
Because of this, I will only be discussing the .DOCX results. Let's have some fun and look at the quirks of each software:
A rather interesting quirk is that Word (Desktop) and Word (Online) have different word counts.
This quirk was the main motivation for this blogpost, as this caused much grief in my assignment.
In my testing and research, I found that Word Online doesn't count words in text boxes, headers, footers and SmartArt.
I found this rather strange, as I felt the main point of Word Online was to compete with Google Docs with its collaborative features.
And for students, who are a major target for the Office suite, this inconsistency is quite annoying: groups may be working on the same document in Word Online and Word Desktop, and the word count will differ between the two.
As an aside, during testing I found a strange quirk in Word Online:
Inexplicably, Word Online doesn't count the bullets from bullet points, but it does count the numbers from numbered lists.
This is almost certainly a bug, but I have no idea whether the intended behaviour was to count the bullets or to not count list markers at all.
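To illustrate how an asymmetry like this could arise, here is a minimal sketch. It is purely a guess and not Word Online's actual algorithm: the hypothetical `count_words` below only counts whitespace-separated tokens that contain at least one letter or digit. Under that assumption a bullet character is skipped, while a marker such as `1.` is counted.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens containing a letter or digit.

    Hypothetical rule for illustration only; no real word processor is
    known to work exactly like this.
    """
    return sum(
        1
        for token in text.split()
        if any(ch.isalnum() for ch in token)
    )


bulleted = "• apples\n• pears"
numbered = "1. apples\n2. pears"

print(count_words(bulleted))  # 2: the bullets are skipped
print(count_words(numbered))  # 4: "1." and "2." each count as a word
```

The same two-item list ends up with two different word counts, depending only on which list marker was used.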
Libre Office is rather impressive, as it caught almost every trick I threw at it (even the textbox in the header!).
I could only find 3 flaws in its word count, and whether some of those should count as flaws at all is subjective.
The first flaw I feel is valid: even if equations themselves are not counted, words inside equations should be, or else I could just write an entire essay in an equation and bypass any word count requirement.
The second flaw, although more subjective, I feel is definitely incorrect, as citation marks should be considered punctuation and should not be counted.
At the very least, this behaviour is unique among all the software tested.
(Citation marks are the numbers next to a piece of text which indicate citations, e.g. study¹)
Finally, I can't quite articulate an explanation for the third flaw, but it is "just wrong".
Note: Although Libre Office has the most inclusive word count algorithm, I have to say it was the most annoying for me to use and test, as it does not update the word count after undoing anything.
Before I get to wordcounts, I have to note that Google Docs has its own unique approach to .DOCX compatibility.
Google Docs, I assume, was never meant to compete directly with Word, but rather to offer collaboration as its main selling point.
Because of that, it often has its own implementations of many DOCX features.
Some of the ways it handles .DOCX are:
Now the quirks in its wordcounts:
But here's the real kicker:
It splits links and acronyms into separate words,
i.e. this link: https://www.youtube.com/watch?v=dQw4w9WgXcQ which counts as 1 word in Word counts as 7 words in Google Docs.
This is especially egregious as it inflates the word count of any document with links, including the links in a bibliography or references section.
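The 1 versus 7 split can be reproduced with two simple tokenisation rules. This is only a sketch: the function names are hypothetical, and neither rule is claimed to be what Word or Google Docs actually implements. The second rule just happens to match the count observed for this link.

```python
import re

URL = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"


def count_by_whitespace(text: str) -> int:
    """Treat every whitespace-separated token as one word."""
    return len(text.split())


def count_by_alnum_runs(text: str) -> int:
    """Treat every unbroken run of ASCII letters/digits as one word."""
    return len(re.findall(r"[A-Za-z0-9]+", text))


print(count_by_whitespace(URL))  # 1
print(count_by_alnum_runs(URL))  # 7: https, www, youtube, com, watch, v, dQw4w9WgXcQ

# The same rule also splits slash-separated words.
print(count_by_alnum_runs("Slash/Separated/Words"))  # 3
```

Under the second rule, every `.`, `/`, `?` and `=` in a link acts as a word boundary, which is why a references section full of links would inflate the total so quickly.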
In comparison to Google Docs, Apple Pages seems much tamer in how it imports elements from .DOCX documents.
The only issue I had with imports was tables nested in textboxes, which were imported as a text representation of the table.
This is rather minor, as Pages gives a warning in advance.
That's where the good news ends, as Apple Pages is inconsistent in its word count.
Notably, Apple Pages is the only software which counts emoji as words, and not as characters.
Now that this exercise is finished, I don't quite know what to feel about the results.
I don't know which software is the most correct, as ultimately many of the decisions on what is or isn't a word are debatable.
Ironically, I discovered that this still hasn't fixed the issue with my original Uni assignment, as they use Canvas's SpeedGrader, which does not expose its word count to students.
Ultimately, the best solution is to build a margin of error into word count requirements
and to NEVER require a precise word count to be written inside the document.
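As a rough sketch of what a margin of error could look like in practice, the hypothetical helper below accepts any count within a percentage of the stated limit. The 10% default is an assumption chosen for illustration, not a recommendation taken from the testing above.

```python
def within_limit(count: int, limit: int, tolerance: float = 0.10) -> bool:
    """Return True if `count` is within `tolerance` (a fraction) of `limit`.

    Both the function and the 10% default are illustrative assumptions.
    """
    return abs(count - limit) <= limit * tolerance


# A 600-word requirement with a 10% margin accepts roughly 540 to 660 words.
print(within_limit(640, 600))  # True
print(within_limit(700, 600))  # False
```

With a margin like this, it no longer matters much which software the student and the marker each happened to use, as long as their counts disagree by less than the tolerance.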
If you have questions, feel free to send them to jchu634@keshuac.com
All of the files used in the testing are freely and publicly available at https://github.com/jchu634/WordCountTesting
Feedback is very much welcome!
Results for the .DOCX benchmark:

Software | Header/Footer | Textbox in Header | Headings | Text | List | Numbered list | Table Of Contents | Tables | Citation Mark | Equations | Links | Slash/Separated/Words | Acronyms | Bibliographies | Captions | Footnotes | EndNotes | WordArt | Textbox | Comments | Page Numbers | Sub/SuperScript |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Word (Desktop) | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | One Word | One Word | One Word | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | One Word |
Word (Online) | ❌ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | One Word | One Word | One Word | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | One Word |
Libre | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | One Word | One Word | One Word | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | One Word |
Google Docs | ❌ | ❌ | ✔️ | ✔️ | ⭕ (No Bullets) | ⭕ (No Bullets) | ✔️ | ✔️ | ❌ | ✔️ | Separate | Separate | Separate | ✔️ | ❌ (Not Imported) | ❌ | ❌ | ❌ (imported as drawing) | ❌ (imported as drawing) | ❌ | ❌ | One Word |
Apple Pages | ❌ | ✔️ | ✔️ | ✔️ | ⭕ (No Bullets) | ⭕ (No Bullets) | ✔️/❌ (Only Title counts) | ✔️ | ❌ | ❌ | Separate | Separate | Separate | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | One Word |

Results for the .DOC benchmark:

Software | Header/Footer | Textbox in Header | Headings | Text | List | Numbered list | Table Of Contents | Tables | Citation Mark | Equations | Links | Slash/Separated/Words | Acronyms | Bibliographies | Captions | Footnotes | EndNotes | WordArt | Textbox | Comments | Page Numbers | SubScript |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Word (Desktop) | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | N.A. | One Word | One Word | One Word | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ |
Word (Online) | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. |
Libre | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | N.A. | One Word | One Word | One Word | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ❌ | |
Google Docs | ❌ | ❌ | ✔️ | ✔️ | ⭕ (No Bullets) | ✔️ (Bullets Don't Count) | ✔️ | ✔️ | ❌ | N.A. | Separate | Separate | Separate | ✔️ | ❌ (Imported, But with a quirk) | ❌ | ❌ | ❌ (imported as drawing) | ❌ (imported as drawing) | ❌ | ❌ | ❌ |
Apple Pages | ❌ | ✔️ | ✔️ | ✔️ | ⭕ (No Bullets) | ⭕ (No Bullets) | ✔️/❌ (Only Title counts) | ✔️ | ❌ | ❌ | Separate | Separate | Separate | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | One Word |