GCP, AWS and MarineCadastre.gov AIS data
MarineCadastre.gov
National AIS at 1 Minute Intervals
AIS_2022_01_01.zip, AIS_2022_01_02.zip, …, AIS_2022_12_31.zip
i.e. AIS_2022_(0[1-9]|1[0-2])_(0[1-9]|[12][0-9]|3[01])\.zip
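The naming scheme can be checked with a short shell sketch. The helper name `ais_zip_names` is hypothetical, introduced only for illustration. The pattern is looser than the calendar: it also accepts names such as AIS_2022_02_31.zip, so it describes the shape of the names rather than the exact set of files.

```shell
#!/usr/bin/env bash
# Number of days in each month of 2022 (not a leap year).
days_in_month=(31 28 31 30 31 30 31 31 30 31 30 31)

# Print the daily archive names for 2022, one per line (hypothetical helper).
ais_zip_names() {
  local m d
  for ((m = 1; m <= 12; m++)); do
    for ((d = 1; d <= days_in_month[m - 1]; d++)); do
      printf 'AIS_2022_%02d_%02d.zip\n' "$m" "$d"
    done
  done
}

pattern='^AIS_2022_(0[1-9]|1[0-2])_(0[1-9]|[12][0-9]|3[01])\.zip$'

# Count the generated names that match the pattern (365, one per day).
ais_zip_names | grep -Ec "$pattern"

# The pattern alone also accepts a date that does not exist (prints 1).
echo 'AIS_2022_02_31.zip' | grep -Ec "$pattern"
```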
Nationwide Automatic Identification System 2022
# | Name | Description | Example | Units | Resolution | Type | Size
---|---|---|---|---|---|---|---
1 | MMSI | Maritime Mobile Service Identity value | 477220100 | | | Text | 8
2 | BaseDateTime | Full UTC date and time | 2017-02-01 20:05:07 | | YYYY-MM-DD:HH-MM-SS | DateTime |
3 | LAT | Latitude | 42.35137 | decimal degrees | XX.XXXXX | Double | 8
4 | LON | Longitude | -71.04182 | decimal degrees | XXX.XXXXX | Double | 8
5 | SOG | Speed Over Ground | 5.9 | knots | XXX.X | Float | 4
6 | COG | Course Over Ground | 47.5 | degrees | XXX.X | Float | 4
7 | Heading | True heading angle | 45.1 | degrees | XXX.X | Float | 4
8 | VesselName | Name as shown on the station radio license | OOCL Malaysia | | | Text | 32
9 | IMO | International Maritime Organization Vessel number | IMO9627980 | | | Text | 16
10 | CallSign | Call sign as assigned by FCC | VRME7 | | | Text | 8
11 | VesselType | Vessel type as defined in NAIS specifications | 70 | | | Integer | short
12 | Status | Navigation status as defined by the COLREGS | 3 | | | Integer | short
13 | Length | Length of vessel (see NAIS specifications) | 71 | meters | XXX.X | Float | 4
14 | Width | Width of vessel (see NAIS specifications) | 12 | meters | XXX.X | Float | 4
15 | Draft | Draft depth of vessel (see NAIS specifications) | 3.5 | meters | XXX.X | Float | 4
16 | Cargo | Cargo type (see NAIS specification and codes) | 70 | | | Text | 4
17 | TransceiverClass | Class of AIS transceiver | A | | | Text | 2
AIS Fundamentals | Spire Maritime Documentation
Unix text processing
curl -O https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2022/AIS_2022_06_20.zip
unzip AIS_2022_06_20.zip
head -n 5 AIS_2022_06_20.csv
MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo,TransceiverClass
538009563,2022-06-20T00:00:04,29.23668,-116.63519,20.4,149.9,152.0,DEL MONTE PRIDE,IMO9869693,V7A4893,70,0,192,30,7.6,70,A
367481660,2022-06-20T00:00:05,38.58858,-90.19843,0.0,360.0,511.0,MIRANDA PAIGE,IMO8976578,WDF7156,31,0,21,9,,31,A
303200000,2022-06-20T00:00:06,36.85888,-76.34542,0.0,213.9,337.0,TAURUS,IMO7819498,WDB6361,31,0,22,7,,32,A
368011450,2022-06-20T00:00:06,29.61732,-89.89189,6.1,297.8,511.0,KRISTIN,,WDJ7927,31,0,17,7,,31,A
tail -n 5 AIS_2022_06_20.csv
303533000,2022-06-20T23:17:52,13.46087,144.66438,0.4,266.0,511.0,HURAO,IMO9277230,WDL4585,52,15,29,9,4.0,52,A
303533000,2022-06-20T23:21:21,13.46088,144.66436,0.0,244.7,511.0,HURAO,IMO9277230,WDL4585,52,15,29,9,4.0,52,A
303533000,2022-06-20T23:37:22,13.45743,144.65525,7.8,260.7,511.0,HURAO,IMO9277230,WDL4585,52,15,29,9,4.0,52,A
303533000,2022-06-20T23:44:22,13.45513,144.64894,0.7,215.1,511.0,HURAO,IMO9277230,WDL4585,52,15,29,9,4.0,52,A
303533000,2022-06-20T23:53:42,13.45187,144.64068,0.2,290.2,511.0,HURAO,IMO9277230,WDL4585,52,15,29,9,4.0,52,A
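Beyond head and tail, awk can summarize the file by column. The following is a minimal sketch on a few rows copied from the output above. The helper names `sample_csv`, `reports_per_mmsi` and `faster_than` are hypothetical, and splitting on commas is only safe here because the fields used (MMSI and SOG) come before VesselName, the one field likely to contain a comma.

```shell
#!/usr/bin/env bash
# A few rows copied from the head and tail output of AIS_2022_06_20.csv.
sample_csv() {
  cat <<'EOF'
MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo,TransceiverClass
538009563,2022-06-20T00:00:04,29.23668,-116.63519,20.4,149.9,152.0,DEL MONTE PRIDE,IMO9869693,V7A4893,70,0,192,30,7.6,70,A
367481660,2022-06-20T00:00:05,38.58858,-90.19843,0.0,360.0,511.0,MIRANDA PAIGE,IMO8976578,WDF7156,31,0,21,9,,31,A
303200000,2022-06-20T00:00:06,36.85888,-76.34542,0.0,213.9,337.0,TAURUS,IMO7819498,WDB6361,31,0,22,7,,32,A
368011450,2022-06-20T00:00:06,29.61732,-89.89189,6.1,297.8,511.0,KRISTIN,,WDJ7927,31,0,17,7,,31,A
303533000,2022-06-20T23:44:22,13.45513,144.64894,0.7,215.1,511.0,HURAO,IMO9277230,WDL4585,52,15,29,9,4.0,52,A
303533000,2022-06-20T23:53:42,13.45187,144.64068,0.2,290.2,511.0,HURAO,IMO9277230,WDL4585,52,15,29,9,4.0,52,A
EOF
}

# Count position reports per MMSI (field 1), skipping the header row.
reports_per_mmsi() {
  awk -F, 'NR > 1 { count[$1]++ } END { for (m in count) print m "," count[m] }' | sort
}

# Keep only data rows whose SOG (field 5) exceeds a threshold in knots.
faster_than() {
  awk -F, -v min="$1" 'NR > 1 && $5 + 0 > min + 0'
}

sample_csv | reports_per_mmsi
sample_csv | faster_than 5 | cut -d, -f1,2,5
```

On the full file the same functions read from standard input, for example `reports_per_mmsi < AIS_2022_06_20.csv`.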
AWS S3
I first uploaded the data to AWS S3 and then repeated the process with Google Cloud Storage. I keep the AWS S3 notes here for reference.
Using high-level (s3) commands with the AWS CLI
Each zip file we download is around 300 MB, and the decompressed CSV file is around 900 MB. My EC2 instance does not have enough free space for this, so the following could not be done in EC2 Instance Connect.
Locally we run the following:
Download to a file named by the URL
This is the -O (uppercase letter o) option, or --remote-name for the long name version. The -O option selects the local file name to use by picking the file name part of the URL that you provide. This is important. You specify the URL and curl picks the name from this data. If the site redirects curl further (and if you tell curl to follow redirects), it does not change the file name curl will use for storing this.
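To make the naming rule concrete, here is a small sketch that approximates the name curl -O would pick by taking the text after the last slash of the URL. The helper `remote_name` is hypothetical; it does not handle query strings or trailing slashes, which the AIS URLs used here do not have.

```shell
#!/usr/bin/env bash
# Approximate the local name chosen by curl -O: the part of the URL
# after the last slash (hypothetical helper, not part of curl).
remote_name() {
  local url=$1
  printf '%s\n' "${url##*/}"
}

url='https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2022/AIS_2022_06_20.zip'
remote_name "$url"              # prints AIS_2022_06_20.zip

# The CSV extracted by unzip has the same stem with a .csv extension.
zip_name=$(remote_name "$url")
echo "${zip_name%.zip}.csv"     # prints AIS_2022_06_20.csv
```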
for i in {21..27}; do
  curl -O "https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2022/AIS_2022_06_${i}.zip"
  unzip "AIS_2022_06_${i}.zip"
done
for i in {21..27}; do aws s3 cp AIS_2022_06_${i}.csv s3://jordanbell2357ais/; done
S3 bucket: s3://jordanbell2357ais/
Google Cloud Storage
Discover object storage with the gsutil tool
EC2 Instance Connect was not usable in my AWS configuration, because each zip file together with its CSV file takes over 1 GB. In my Google Cloud Platform configuration, by contrast, the 5 GB of available space is enough to hold one zip file and its CSV file at a time.
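Since disk space is the limiting factor, it can help to check the free space before downloading. The sketch below uses `df -Pk`. The helper names `free_kb` and `has_room_mb` are hypothetical, and the 1300 MB threshold is an assumed figure (roughly 300 MB for the zip plus 900 MB for the CSV, with some headroom).

```shell
#!/usr/bin/env bash
# Free space, in KiB, on the filesystem holding the given directory
# (default: the current directory). Column 4 of df -P is "Available".
free_kb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least the requested number of MiB is free.
has_room_mb() {
  local need_mb=$1 dir=${2:-.}
  [ "$(free_kb "$dir")" -ge $((need_mb * 1024)) ]
}

# Assumed requirement for one day of data: about 1300 MB.
if has_room_mb 1300; then
  echo "enough space for one day of AIS data"
else
  echo "not enough space for one day of AIS data" >&2
fi
```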
We write out the steps for June 21, 2022, parameterized by the day i so that they can be reused for other days.
i=21 # day of month, 01-30 for June
curl -O https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2022/AIS_2022_06_${i}.zip
unzip AIS_2022_06_${i}.zip
gsutil cp AIS_2022_06_${i}.csv gs://jordanbell2357marinecadastre/
rm AIS_2022_06_${i}.zip
rm AIS_2022_06_${i}.csv
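The same steps can be wrapped in a function and repeated for several days. This is a sketch rather than the exact procedure I ran: the names `run`, `ingest_day` and `DRY_RUN` are hypothetical. With `DRY_RUN=1` the commands are only printed, which gives a preview without using the network or any disk space.

```shell
#!/usr/bin/env bash
base_url='https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2022'
bucket='gs://jordanbell2357marinecadastre/'

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Download, unzip, upload, and delete one day of June 2022.
# The && chain stops at the first failure, so the local files are
# removed only after the upload has succeeded.
ingest_day() {
  local day stem
  day=$(printf '%02d' "$((10#$1))")   # force base 10 so that 08 and 09 work
  stem="AIS_2022_06_${day}"
  run curl -O "${base_url}/${stem}.zip" &&
    run unzip "${stem}.zip" &&
    run gsutil cp "${stem}.csv" "$bucket" &&
    run rm "${stem}.zip" "${stem}.csv"
}

# Preview the commands for June 21 to 27 without running them.
DRY_RUN=1
for i in {21..27}; do
  ingest_day "$i"
done
```

Setting `DRY_RUN=0` runs the commands for real. Because only one day is on disk at a time, the peak usage stays near the size of one zip file plus one CSV file.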
Relevant Google Cloud Self-Paced Labs (GSP): Cloud Storage: Qwik Start - CLI/SDK (GSP074), Ingesting Data Into The Cloud (GSP194), Ingesting New Datasets into BigQuery (GSP411), Loading Your Own Data into BigQuery (GSP865).
We now load the files into BigQuery with bq:
for i in {21..27}; do bq load --source_format=CSV --autodetect AIS_2022_06_21_to_27.AIS_2022_06_${i} gs://jordanbell2357marinecadastre/AIS_2022_06_${i}.csv; done
Now, to make sure we can do the same task multiple ways, we will do the above locally for June 20, 2022.
curl -O https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2022/AIS_2022_06_20.zip
unzip AIS_2022_06_20.zip
If we now run
gsutil cp AIS_2022_06_20.csv gs://jordanbell2357marinecadastre/
we get
ResumableUploadAbortException: 401 Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist).
The local gsutil has no credentials yet, so we authenticate by running
gcloud init
and then run
gsutil cp AIS_2022_06_20.csv gs://jordanbell2357marinecadastre/
with success. Then
bq load --source_format=CSV --autodetect AIS_2022_06_21_to_27.AIS_2022_06_20 gs://jordanbell2357marinecadastre/AIS_2022_06_20.csv
with success. Now we clean up,
rm AIS_2022_06_20.zip
rm AIS_2022_06_20.csv
BigQuery
(SELECT MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width FROM ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_21)
UNION ALL
(SELECT MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width FROM ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_22)
UNION ALL
(SELECT MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width FROM ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_23)
UNION ALL
(SELECT MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width FROM ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_24)
UNION ALL
(SELECT MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width FROM ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_25)
UNION ALL
(SELECT MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width FROM ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_26)
UNION ALL
(SELECT MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width FROM ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_27)
We save the results to a BigQuery table, creating a new table we name AIS_2022_06_21_to_27 (ais-data-385301.AIS_2022_06_21_to_27.AIS_2022_06_21_to_27).
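The seven SELECT statements differ only in the day, so the query text can also be generated. The sketch below prints an equivalent UNION ALL query. The helper `union_query` is hypothetical, and the table names are wrapped in backticks, which BigQuery accepts for paths containing a hyphenated project ID.

```shell
#!/usr/bin/env bash
project='ais-data-385301'
dataset='AIS_2022_06_21_to_27'
columns='MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselType,Status,Length,Width'

# Print a UNION ALL query over the daily tables for the given days of June 2022.
union_query() {
  local day first=1
  for day in "$@"; do
    if [ "$first" = 1 ]; then
      first=0
    else
      echo 'UNION ALL'
    fi
    printf '(SELECT %s FROM `%s.%s.AIS_2022_06_%s`)\n' \
      "$columns" "$project" "$dataset" "$day"
  done
}

union_query {21..27}

# To run the query and save the result from the command line (not executed here):
#   bq query --use_legacy_sql=false \
#     --destination_table="${project}:${dataset}.AIS_2022_06_21_to_27" \
#     "$(union_query {21..27})"
```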