# Ingest datasets
You can upload your own datasets to the EOTDL platform.
The following constraints apply to the dataset name:
- It must be unique.
- It must be between 3 and 45 characters long.
- It can only contain alphanumeric characters and dashes.
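A quick local sanity check for these constraints can be written with the standard library. This is an illustrative sketch, not part of the eotdl tooling, and uniqueness can only be verified by the platform itself:

```python
import re

# Matches the documented constraints: 3-45 characters,
# alphanumerics and dashes only. Uniqueness is checked server-side.
NAME_RE = re.compile(r"^[a-zA-Z0-9-]{3,45}$")

def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name satisfies the local naming constraints."""
    return bool(NAME_RE.fullmatch(name))

print(is_valid_dataset_name("eurosat-rgb"))  # True
print(is_valid_dataset_name("ab"))           # False: too short
print(is_valid_dataset_name("my_dataset"))   # False: underscore not allowed
```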
## CLI
The CLI is the most convenient way to ingest datasets. You can ingest a dataset using the following command:

```
eotdl datasets ingest -p "dataset-path"
```

where `dataset-path` is the path to a folder containing your dataset.
For Q0 datasets, a file named `README.md` is expected in the root of the folder. This file should contain the following information:

```
---
name: dataset-name
authors:
  - author 1 name
  - author 2 name
  - ...
license: dataset license
source: link to source
thumbnail: link to thumbnail (optional)
---

some markdown content (titles, text, links, code, images, ...)
```

If this file is not present, the ingestion process will fail.
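Since a missing or incomplete `README.md` aborts the ingestion, it can be useful to check the file locally first. The following is a minimal pre-flight sketch (not part of the eotdl API) that verifies the front matter block declares the required fields:

```python
# Required front matter fields per the documentation above
# (thumbnail is optional and therefore not checked).
REQUIRED_FIELDS = {"name", "authors", "license", "source"}

def check_readme(text: str) -> set:
    """Return the set of required front matter fields that are missing."""
    parts = text.split("---")
    if len(parts) < 3:  # no opening/closing '---' pair
        return set(REQUIRED_FIELDS)
    frontmatter = parts[1]
    declared = {line.split(":")[0].strip()
                for line in frontmatter.splitlines() if ":" in line}
    return REQUIRED_FIELDS - declared

readme = """---
name: my-dataset
authors:
  - Jane Doe
license: MIT
source: https://example.com
---
Some markdown content.
"""
print(check_readme(readme))  # set() -> all required fields present
```

A real `README.md` would be read from disk; a proper YAML parser would be more robust than this line-based scan, but the stdlib-only version keeps the sketch self-contained.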
After uploading a dataset with the CLI, you can visit the dataset page to edit this information.
You can update your dataset in multiple ways. If you modify your local folder and run the `ingest` command again, a new version will be created reflecting the new data structure and files.
If the metadata in the `README.md` file is not consistent with the metadata in the platform (either because you edited the file or because you edited the dataset in the platform), you should use:
- the `--force` flag to overwrite the metadata in the platform with the one in the `README.md` file.
- the `--sync` flag to update your file with the metadata in the platform.
For Q1+ datasets, a file called `catalog.json` is expected in the root of the folder, containing the STAC metadata for your dataset; it will be used as the entrypoint to ingest all the assets.
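For orientation, a STAC catalog entrypoint typically looks like the following. This is an illustrative fragment following the STAC specification, not an eotdl requirement; the `id`, `description`, and `href` values are placeholders:

```
{
  "type": "Catalog",
  "id": "my-dataset",
  "stac_version": "1.0.0",
  "description": "STAC catalog entrypoint for the dataset.",
  "links": [
    { "rel": "self", "href": "./catalog.json", "type": "application/json" },
    { "rel": "child", "href": "./collection.json", "type": "application/json" }
  ]
}
```

The `links` with `rel: child` are what allow the ingestion to walk from the entrypoint to the collections and assets.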
## Library
You can ingest datasets using the following Python code:

```python
from eotdl.datasets import ingest_dataset

ingest_dataset("dataset-path")
```