
Truncating GeoJSON Coordinate Precision with Regular Expressions

This isn’t anything particularly magical, but it’s a fast way to strip extra decimal places from GeoJSON without loading any GIS packages. I need it every so often, so I figured I’d save it here and share it at the same time. It’s great when all I have is Notepad++, which can search and replace with regular expressions, or when I’m on Linux and don’t want to install spatial tools just to load and process the GeoJSON.

Before going further, make a copy of the data you’ll be processing with this regex, in case something goes wrong or it doesn’t match your file’s formatting.

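The search value for the regular expression is \.(\d{6})\d+([\,\s\]]+?). It truncates (it doesn’t round!) each matching number to 6 decimal places, and you can change that by replacing the 6 inside {6} with whatever you like. Keep in mind that it matches any number in the file with more than 6 decimal places that is followed by a comma, whitespace, or ], so numeric values in properties can get truncated too, not just coordinates.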
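To see the truncate-versus-round difference and how the {6} can be parameterized, here’s a minimal sketch in Python using the standard re module (Python is my assumption here; the article itself only needs Notepad++). The helper names truncation_pattern and truncate are hypothetical, and the replacement used is the decimal point plus \1 and \2, covered just below.

```python
import re


def truncation_pattern(digits: int) -> str:
    # The article's search pattern with the 6 in {6} replaced by `digits`.
    # In an f-string, "{{" and "}}" produce literal braces.
    return rf"\.(\d{{{digits}}})\d+([\,\s\]]+?)"


def truncate(text: str, digits: int) -> str:
    # Python's re.sub wants a bare "." in the replacement, not "\.".
    return re.sub(truncation_pattern(digits), r".\1\2", text)


coords = "[ -121.4459734379239, 41.1838973283285 ],"

print(truncation_pattern(6))         # \.(\d{6})\d+([\,\s\]]+?)
print(truncate(coords, 3))           # [ -121.445, 41.183 ],
print(round(-121.4459734379239, 3))  # -121.446 (rounding bumps the last digit)
```

Truncating to 3 places gives -121.445 where rounding would give -121.446, because the dropped digits start with a 9.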

The replacement value is \.\1\2. That’s the Notepad++ syntax; depending on the tool or language you’re using, replacement tokens are sometimes written $1 instead of \1, so adjust accordingly.
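As one example of how replacement syntax varies, here’s a small Python sketch (again assuming Python’s standard re module, not anything from the original workflow). Python accepts \1 or \g<1> for group references, but copying the Notepad++ replacement verbatim doesn’t behave the same, because re.sub keeps an unknown non-letter escape like \. as a literal backslash followed by a dot.

```python
import re

PATTERN = r"\.(\d{6})\d+([\,\s\]]+?)"
line = "[ -121.4459734379239, 41.1838973283285 ],"

# Python accepts \1 or the more explicit \g<1> as group references.
numbered = re.sub(PATTERN, r".\1\2", line)
named = re.sub(PATTERN, r".\g<1>\g<2>", line)

# Copying Notepad++'s "\.\1\2" verbatim: Python leaves "\." alone,
# so a literal backslash ends up in the output.
verbatim = re.sub(PATTERN, r"\.\1\2", line)

print(numbered)  # [ -121.445973, 41.183897 ],
print(named)     # [ -121.445973, 41.183897 ],
print(verbatim)  # [ -121\.445973, 41\.183897 ],
```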

In Notepad++, put these into the “Find what” and “Replace with” fields of the Replace dialog, respectively, then switch “Search Mode” at the bottom to “Regular Expression” for it to work. I recommend doing a few test finds and replaces first; once the results look right, click Replace All to process the whole file.
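For the Linux case where you don’t want to install spatial tools, a short script with only the Python standard library can do the same Replace All. This is a minimal sketch under my own assumptions: the file is UTF-8 text, and the names truncate_file, main, and truncate_geojson.py are hypothetical. It writes to a separate output file, so the original doubles as the backup copy.

```python
import re
import sys
from pathlib import Path

# The search pattern from above; DIGITS stands in for the {6}.
DIGITS = 6
PATTERN = re.compile(rf"\.(\d{{{DIGITS}}})\d+([\,\s\]]+?)")


def truncate_file(src: Path, dst: Path) -> int:
    """Write a truncated copy of src to dst; return how many numbers changed."""
    if src.resolve() == dst.resolve():
        # Refuse to overwrite the input, so the original stays as a backup.
        raise ValueError("output path must differ from input path")
    text = src.read_text(encoding="utf-8")
    # Python wants a bare "." here; "\." would be kept as a literal backslash.
    new_text, count = PATTERN.subn(r".\1\2", text)
    dst.write_text(new_text, encoding="utf-8")
    return count


def main(argv: list[str]) -> None:
    if len(argv) != 3:
        print("usage: python truncate_geojson.py INPUT OUTPUT")
        return
    count = truncate_file(Path(argv[1]), Path(argv[2]))
    print(f"truncated {count} numbers")


if __name__ == "__main__":
    main(sys.argv)
```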

Breaking it Down a Bit

In case it helps, let’s take a look at what this regular expression does. Imagine a coordinate pair in GeoJSON that looks like the following:
[ -121.4459734379239, 41.1838973283285 ],

The regular expression starts by matching the decimal point with \. and then the first six digits after it with \d{6}. Those digits are wrapped in parentheses, which save (capture) them when they’re found so we can reuse them later; they become the \1 in the output. Next, \d+ matches all of the remaining digits. This is the truncation: we don’t capture them, so they’re simply discarded in the output. Because \d+ needs at least one digit, numbers that already have six or fewer decimal places don’t match and are left alone.

The ([\,\s\]]+?) part matches whatever terminates that particular coordinate. A longitude can be followed by whitespace or a comma, and a latitude by whitespace or a ], so the character class accepts commas, whitespace, and closing brackets. The ? makes the + lazy, so it takes the shortest match possible, and since it’s the last thing in the pattern, that works out to exactly one character. Without it, the match could swallow a whole run of spaces and brackets at the end of a feature; that would likely still come out fine, since the run is captured and written back, but the lazy version keeps each match tight. The terminator is captured too, as \2.
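To make the pieces visible, here’s a small Python sketch (assuming the standard re module) that runs the pattern over the example coordinate pair and prints what lands in each part of every match: the six kept digits (\1), the digits \d+ consumed and discarded, and the single terminator character (\2).

```python
import re

PATTERN = re.compile(r"\.(\d{6})\d+([\,\s\]]+?)")
line = "[ -121.4459734379239, 41.1838973283285 ],"

parts = []
for m in PATTERN.finditer(line):
    kept = m.group(1)                    # \1: the six digits we keep
    dropped = line[m.end(1):m.start(2)]  # what \d+ matched; discarded
    terminator = m.group(2)              # \2: the lazy match, one character
    parts.append((kept, dropped, terminator))
    print(f"kept={kept} dropped={dropped} terminator={terminator!r}")

# Six or fewer decimals: \d+ has nothing left to match, so there's no match.
print(PATTERN.search("[ -121.445973, 41.18 ]"))  # None
```

For the latitude, the terminator is just the space, even though a ] follows it, which is the lazy +? at work.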

Then the replacement value, \.\1\2, replaces the whole match with a decimal point, followed by the digits we captured, followed by the terminating character it found, whether that’s a comma, a space, or a square bracket. It just keeps what was there before, minus the extra digits.
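The same replacement can be spelled out as a function, which mirrors the description above piece by piece. This Python sketch (the name rebuild is hypothetical) runs it on a pretty-printed version of the example, where the latitude is followed by a newline; \s matches the newline and it’s written back, so the layout survives.

```python
import re

PATTERN = re.compile(r"\.(\d{6})\d+([\,\s\]]+?)")


def rebuild(m: re.Match) -> str:
    # Equivalent to the \.\1\2 replacement: a decimal point, the kept
    # digits, then whatever terminator was found, unchanged.
    return "." + m.group(1) + m.group(2)


pretty = """[
  -121.4459734379239,
  41.1838973283285
]"""

print(PATTERN.sub(rebuild, pretty))
```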

I hope that helps. I use this occasionally to shrink GeoJSON that is more precise than it needs to be, which makes it simpler and saves on file size.
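If you want to double-check the result, here’s a hedged Python sketch (standard library only; the one-point GeoJSON string is just the example coordinates wrapped in a Point) that confirms the truncated text still parses as JSON, that no value moved by a full millionth of a degree, and how many bytes were saved.

```python
import json
import re

PATTERN = re.compile(r"\.(\d{6})\d+([\,\s\]]+?)")

original = '{"type": "Point", "coordinates": [-121.4459734379239, 41.1838973283285]}'
truncated = PATTERN.sub(r".\1\2", original)

# The output should still be valid JSON with the same structure.
before = json.loads(original)["coordinates"]
after = json.loads(truncated)["coordinates"]

# Truncating to 6 decimals changes each value by less than 0.000001.
assert all(abs(a - b) < 1e-6 for a, b in zip(before, after))

print(truncated)
print(f"saved {len(original) - len(truncated)} bytes")
```

Each coordinate here loses seven digits, so this example saves 14 bytes; on a real file the savings scale with how many over-precise numbers it holds.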
