Extra Spaces in Windows STDOUT

Today I was using Windows and redirected some output to a file.

echo '["test"]' > bad.json

When I opened the file, the content looked fine: there were no extra spaces or weird characters. But when I read the file with Node’s fs module and printed it to the console, there was extra whitespace between the characters. This is the script I used:

const fs = require('fs');
// Read bad.json as UTF-8 text and print it to the console
const output = fs.readFileSync('bad.json', 'utf8');
console.log(output);

Why were there extra spaces? When I ran the script against a file I had created by hand (we’ll call it good.json), the output displayed correctly with no extra spaces. So I compared the sizes of the two files: bad.json was 22 bytes, but good.json was only 8. When I hex dumped both files, this is what I saw:

file name: bad.json

0000-0010:  ff fe 5b 00-22 00 74 00-65 00 73 00-74 00 22 00  ..[.".t. e.s.t.".
0000-0016:  5d 00 0d 00-0a 00                                ].....

file name: good.json

0000-0008:  5b 22 74 65-73 74 22 5d                          ["test"]
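If you don’t have a hex dump tool handy, a minimal Node sketch like this one formats a Buffer in a similar layout. The function name hexDump and the 16-bytes-per-row format are my own choices, not a standard API, and unlike the dumps above it leaves out the ASCII column. The sample bytes are copied from the bad.json dump.

```javascript
// Hypothetical helper: format a Buffer as rows of hex bytes, 16 bytes per row,
// each row prefixed with its starting offset in hex.
function hexDump(bytes) {
  const rows = [];
  for (let offset = 0; offset < bytes.length; offset += 16) {
    const chunk = bytes.subarray(offset, offset + 16);
    const hex = Array.from(chunk, (b) => b.toString(16).padStart(2, '0')).join(' ');
    rows.push(`${offset.toString(16).padStart(4, '0')}: ${hex}`);
  }
  return rows;
}

// The exact bytes of bad.json, copied from the dump above.
const badJson = Buffer.from([
  0xff, 0xfe, 0x5b, 0x00, 0x22, 0x00, 0x74, 0x00, 0x65, 0x00, 0x73, 0x00,
  0x74, 0x00, 0x22, 0x00, 0x5d, 0x00, 0x0d, 0x00, 0x0a, 0x00,
]);

console.log(`${badJson.length} bytes`);
hexDump(badJson).forEach((row) => console.log(row));
```

To dump a real file, pass in the raw Buffer from fs.readFileSync('bad.json'). When readFileSync is called without an encoding argument, Node returns the bytes instead of a decoded string.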

Why was Windows adding extra bytes when I redirected output to a file? I had created bad.json in PowerShell, so this time I tried creating another file from the Windows Command Prompt (cmd) instead. This time the ff fe prefix and the 00 bytes were gone; the file held just the text, a trailing space (20), and a Windows CRLF line ending (0d 0a).

file name: cmd.json

0000-000b:  5b 22 74 65-73 74 22 5d-20 0d 0a                 ["test"] ...

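So it looked like PowerShell was using a different character encoding from both cmd and what I’m used to on Linux. Looking back at the bad.json dump, the ff fe at the start is a UTF-16 little-endian byte order mark, and every character is stored as two bytes: its ASCII code followed by 00. Node was decoding those bytes as UTF-8, so each 00 came out as an invisible NUL character between the letters.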
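To see how those bytes turn into extra characters, here is a small sketch that decodes the bad.json bytes from the dump two ways with Node’s Buffer.toString. Decoding as 'utf8' mirrors what my script did; 'utf16le' is Node’s name for UTF-16 little-endian. The variable names are illustrative only.

```javascript
// Bytes of bad.json from the hex dump: a UTF-16LE byte order mark (ff fe)
// followed by each character as two bytes, low byte first.
const badJson = Buffer.from([
  0xff, 0xfe, 0x5b, 0x00, 0x22, 0x00, 0x74, 0x00, 0x65, 0x00, 0x73, 0x00,
  0x74, 0x00, 0x22, 0x00, 0x5d, 0x00, 0x0d, 0x00, 0x0a, 0x00,
]);

// Decoding as UTF-8 (what readFileSync(..., 'utf8') does): ff and fe are not
// valid UTF-8 and become U+FFFD, and every 00 byte becomes a NUL character.
const asUtf8 = badJson.toString('utf8');

// Decoding as UTF-16LE reads the bytes in pairs, so the text comes back intact.
// The byte order mark decodes to U+FEFF, so strip it before using the string.
const asUtf16 = badJson.toString('utf16le').replace(/^\uFEFF/, '');

console.log(asUtf8.includes('\u0000')); // true
console.log(JSON.parse(asUtf16)); // [ 'test' ]
```

The NUL characters usually render as blanks in a terminal, which matches the “extra whitespace” I saw.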

I found this StackOverflow post, which said some versions of PowerShell use Unicode (which in PowerShell means UTF-16LE) as the default encoding. I ran this command to find my PowerShell version, and it said I was using version 5.1.

$PSVersionTable.PSVersion

As an experiment, I ran this command to force PowerShell to use UTF-8.

Write-Output '["test"]' | Out-File 'powershell-test.json' -Encoding utf8

file name: powershell-test.json

0000-000d:  ef bb bf 5b-22 74 65 73-74 22 5d 0d-0a           ...["tes t"]..

This was a little better, but there were still three extra bytes (ef bb bf) at the front. After some more digging, I found this StackOverflow post, which said PowerShell 5.1 writes UTF-8 files with a pseudo byte order mark (BOM). It went on to say that the newer, cross-platform PowerShell (PowerShell Core) no longer writes UTF-8 with a BOM.
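If you can’t control which shell produced a file, one defensive option on the Node side is to check for a BOM before decoding. The sketch below is a hypothetical helper, decodeWithBom, that handles only the two BOMs seen in this post (UTF-8’s ef bb bf and UTF-16LE’s ff fe) and otherwise assumes UTF-8; it is not a complete encoding detector. The sample bytes come from the powershell-test.json dump.

```javascript
// Hypothetical helper: decode a Buffer by checking for a byte order mark first.
// Only the two BOMs seen in this post are handled; anything else is read as UTF-8.
function decodeWithBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return bytes.subarray(3).toString('utf8'); // UTF-8 with BOM (PowerShell 5.1, -Encoding utf8)
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return bytes.subarray(2).toString('utf16le'); // UTF-16LE with BOM (PowerShell 5.1 default)
  }
  return bytes.toString('utf8'); // no BOM, as in good.json
}

// Bytes of powershell-test.json from the dump above.
const psTest = Buffer.from([
  0xef, 0xbb, 0xbf, 0x5b, 0x22, 0x74, 0x65, 0x73, 0x74, 0x22, 0x5d, 0x0d, 0x0a,
]);

// A plain UTF-8 decode keeps the BOM as U+FEFF, which JSON.parse rejects.
const naive = psTest.toString('utf8');
let naiveError = null;
try {
  JSON.parse(naive);
} catch (err) {
  naiveError = err;
}

console.log(naive.charCodeAt(0).toString(16)); // feff
console.log(naiveError instanceof SyntaxError); // true
console.log(JSON.parse(decodeWithBom(psTest))); // [ 'test' ]
```

For a real file, the call would be decodeWithBom(fs.readFileSync('powershell-test.json')) before handing the string to JSON.parse.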

So I installed PowerShell 6.1.1 (the latest cross-platform version at the time) and ran my original command again.

echo '["test"]' > buen.json

file name: buen.json

0000-000a:  5b 22 74 65-73 74 22 5d-0d 0a                    ["test"] ..

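Much better! No BOM and no 00 bytes, just the text and a CRLF line ending. Next I wanted to switch my default shell from the old Windows PowerShell to the new PowerShell Core. Since I mainly use the integrated terminal in Visual Studio Code, changing the default was easy: I opened Visual Studio Code’s settings and changed the ‘terminal.integrated.shell.windows’ setting from the old PowerShell path to the new one. (Newer versions of Visual Studio Code have deprecated that setting in favor of terminal profiles, selected with ‘terminal.integrated.defaultProfile.windows’.)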