presentations/2012-HPDE-Feb-TSDS
1. Objectives
- develop a standard API for time series-like data,
- develop a software package, TSDS (Time Series Data Server), that implements this API and provides server-side super-setting, sub-setting, filtering, and uniform gridding of time series-like data,
- make the data holdings from several key data providers in the heliophysics environment accessible through the TSDS API, and
- develop client-side software for standard data analysis packages (IDL, MATLAB, Java, Python, and Excel) that will allow access to a TSDS-enabled server.
2. People
3. API (non-SPASE)
The baseline API builds on OPeNDAP-compliant URL requests of the form:
http://host/servletname/dataset.suffix?parameters&constraint&filter
Examples:
- http://tsds.net/tsdsdev/cdaweb/AC_H1_MFI.asc
- http://tsds.net/tsdsdev/cdaweb/AC_H1_MFI.asc?time,BX_GSE,BZ_GSE
- http://tsds.net/tsds/test/Scalar.asc?time,Variable
- http://tsds.net/tsds/test/Scalar.asc?time,Variable&format_time(yyMMddHHmmssZ)
- http://lasp.colorado.edu/lisird/tss/historical_tsi.csv?time,Irradiance&Irradiance>1361.6&Irradiance<1361.8
- (server temporarily down) http://tsds.net/tsdsdev/vmo/crres.csv?time,B
where
filter options include
constraint options include
The suffix options include
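As a minimal sketch of the URL form given in this section, the following Python function assembles a request from its parts. The function name `build_tsds_url` and its argument names are illustrative assumptions, not part of TSDS; only the URL layout and the example URLs come from the text above.

```python
def build_tsds_url(host, servlet, dataset, suffix,
                   parameters=(), constraints=(), filters=()):
    """Assemble http://host/servletname/dataset.suffix?parameters&constraint&filter.

    Hypothetical helper for illustration; it performs no validation and
    does not percent-encode anything.
    """
    url = f"http://{host}/{servlet}/{dataset}.{suffix}"
    # Parameters are comma-separated; each constraint and filter is
    # appended with a leading '&', constraints first.
    query = ",".join(parameters)
    query += "".join("&" + part for part in list(constraints) + list(filters))
    if query:
        url += "?" + query
    return url


# Reproduces two of the example URLs listed in this section.
print(build_tsds_url("tsds.net", "tsdsdev", "cdaweb/AC_H1_MFI", "asc",
                     parameters=["time", "BX_GSE", "BZ_GSE"]))
print(build_tsds_url("lasp.colorado.edu", "lisird/tss", "historical_tsi", "csv",
                     parameters=["time", "Irradiance"],
                     constraints=["Irradiance>1361.6", "Irradiance<1361.8"]))
```

Treating `cdaweb/AC_H1_MFI` as the dataset and `tsdsdev` as the servlet name is one reading of the example URLs; the split could equally be made elsewhere without changing the resulting string.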
4. API (SPASE)
- non-SPASE-enabled: http://tsds.net/tsdsdev/cdaweb/AC_H1_MFI.asc?time,BX_GSE
- SPASE-enabled (proposed, not implemented): http://tsds.net/tsdsdev/NumericalData/SPASE_ID.csv?time,BX_GSE
- To implement this, we need a SPASE ID + ParameterKey and a mapping to the Product Name + Parameter Name that each data service uses internally (e.g., AC_H1_MFI/BX_GSE).
- This is not possible with SPASE Numerical Data records, as implemented. AC_H1_MFI is found in [2]. This Product Name is used to get a list of Parameters, e.g., BX_GSE is returned by [3], but it is not in the SPASE record, which has spase://VSPO/NumericalData/P_ACE_HDR_MAG_SWEPAM_4M_MGD
- I don't think this information should be hard-coded into SPASE Numerical records. I think that they should come from a service (see also presentations/2012-HPDE-Feb-SPASE).
- Should this information be hard-coded in the SPASE record (it is now): "Data are presently ~5 months delayed"? What happens when ACE stops returning data? Will someone receive an email alert that says "Update SPASE record"?
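The mapping that the proposed SPASE-enabled request would need can be sketched as a lookup table. Everything in this example is hypothetical: the table `SPASE_TO_SERVICE`, the function `resolve`, and the placeholder SPASE ID are assumptions for illustration, and no such service exists in TSDS as described here.

```python
# Hypothetical lookup table:
# (SPASE ID, ParameterKey) -> (catalog, Product Name, Parameter Name).
# The SPASE ID is a placeholder, not a real record.
SPASE_TO_SERVICE = {
    ("spase://EXAMPLE/NumericalData/SPASE_ID", "BX_GSE"):
        ("cdaweb", "AC_H1_MFI", "BX_GSE"),
}


def resolve(spase_id, parameter_key, suffix="csv",
            base="http://tsds.net/tsdsdev"):
    """Translate a SPASE ID + ParameterKey into a non-SPASE TSDS request URL."""
    key = (spase_id, parameter_key)
    if key not in SPASE_TO_SERVICE:
        raise KeyError(f"no mapping for {spase_id} / {parameter_key}")
    catalog, product, parameter = SPASE_TO_SERVICE[key]
    return f"{base}/{catalog}/{product}.{suffix}?time,{parameter}"


print(resolve("spase://EXAMPLE/NumericalData/SPASE_ID", "BX_GSE"))
```

In this sketch the table is hard-coded only to keep the example short; the text above argues that such information should come from a service rather than being fixed in a record.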
5. Connecting to a data service
- Serving data through the TSDS API from a data service requires two key pieces of information and possibly some additional code.
- This information is generated by TSDS developers based on the data service's API documentation.
- A catalog listing containing all information required to form a data request. At the very least, this is a list of parameter IDs for each data server and start dates. Ideally, additional information is given, including stop date, units, and a link to documentation.
  - In working on this, I realized that most of the SPASE Numerical Data records would require significant edits if they were to be used instead of the ad-hoc approach taken (discussed later). The reason is discussed in the Use 4. section of presentations/2012-HPDE-Feb-SPASE.
- A template NcML file that is used by TSDS to form a data request and interpret the result.
- An IOSP (Input/Output Service Provider) - Usually Java code that maps the response from a service to the internal TSDS data structure.
6. Connecting to a data service - Catalog
1. A catalog listing containing all information required to form a data request. At the very least, this is a list of parameter IDs for each data server and start dates. Ideally, additional information is given, including stop date, units, and a link to documentation.
Example data requests:
- CDAWeb (parameter ID = AC_H1_MFI Magnitude): [4]
- SPIDR (parameter ID = index_ssn): [5]
- SuperMAG (parameter ID = BOU): [6]
- VSEO (parameter ID = DE::DE-1::HAPI::HAPI::D1HE): [7]
Example catalogs:
| Code for creating catalog |
|---|
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         name="SPIDR Data">
  <dataset name="spidr">
    <!-- The URL http://tsds.net/tss/spidr/ssn is the base URL for forming a data request -->
    <access serviceName="tss" urlPath="http://tsds.net/tss/spidr/ssn" />
    <!-- The URL http://tsds.net/meta/tsds?catalog=spidr&parameter=ssn returns NcML -->
    <access serviceName="ncml" urlPath="http://tsds.net/meta/tsds?catalog=spidr&amp;parameter=ssn" />
    <documentation xlink:href="http://spidr.ngdc.noaa.gov/spidr/servlet/GetData?describe&amp;param=ssn"
                   xlink:title="Metadata" />
    <timeCoverage>
      <Start>19320101</Start>
      <End>20120201</End>
    </timeCoverage>
  </dataset>
</catalog>
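To illustrate how a client might read such a catalog, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `read_catalog` and the returned dictionary keys are assumptions made for this example. The embedded catalog is the SPIDR example with the ampersand in the NcML URL XML-escaped and the documentation element omitted for brevity.

```python
import xml.etree.ElementTree as ET

CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="SPIDR Data">
  <dataset name="spidr">
    <access serviceName="tss" urlPath="http://tsds.net/tss/spidr/ssn" />
    <access serviceName="ncml" urlPath="http://tsds.net/meta/tsds?catalog=spidr&amp;parameter=ssn" />
    <timeCoverage>
      <Start>19320101</Start>
      <End>20120201</End>
    </timeCoverage>
  </dataset>
</catalog>
"""

# Namespace prefix used in the search paths below.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}


def read_catalog(xml_text):
    """Return one dict per top-level dataset with its URLs and time coverage."""
    root = ET.fromstring(xml_text)
    entries = []
    for ds in root.findall("t:dataset", NS):
        # Index the access elements by their serviceName attribute.
        access = {a.get("serviceName"): a.get("urlPath")
                  for a in ds.findall("t:access", NS)}
        entries.append({
            "name": ds.get("name"),
            "data_url": access.get("tss"),
            "ncml_url": access.get("ncml"),
            "start": ds.findtext("t:timeCoverage/t:Start", namespaces=NS),
            "end": ds.findtext("t:timeCoverage/t:End", namespaces=NS),
        })
    return entries


for entry in read_catalog(CATALOG):
    print(entry["name"], entry["data_url"], entry["start"], entry["end"])
```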
7. Connecting to a data service - NcML
2. A template NcML file that is used by TSDS to form a data request.
- The template file given below is modified based on a request of the form
http://tsds.net/tsdsdev/spidr/DATA_SET_ID.asc?time,PARAMETER1_SHORT_NAME,PARAMETER2_SHORT_NAME&time>STARTDATE&time<STOPDATE
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="file:/dev/null"
        iosp="lasp.tss.iosp.ColumnarAsciiReader"
        commentCharacter="#"
        columns="1,2;3"
        url="http://spidr.ngdc.noaa.gov/spidr/servlet/GetData?format=csv&amp;dateFrom=STARTDATE&amp;dateTo=STOPDATE&amp;param=PARAMETER_SHORT_NAME">
  <attribute name="title" value="DATA_SET_LONG_TITLE" />
  <dimension name="time" isUnlimited="true" />
  <variable name="time" shape="time" type="String">
    <attribute name="units" value="yyyy-MM-dd HH:mm" />
  </variable>
  <variable name="PARAMETER1_SHORT_NAME" shape="time" type="double">
    <attribute name="long_name" value="PARAMETER1_LONG_NAME" />
    <attribute name="units" value="PARAMETER1_UNITS" />
    <attribute name="precision" value="PARAMETER1_PRECISION" />
    <attribute name="_FillValue" type="double" value="PARAMETER1_FILL" />
  </variable>
  <variable name="PARAMETER2_SHORT_NAME" shape="time" type="double">
    <attribute name="long_name" value="PARAMETER2_LONG_NAME" />
    <attribute name="units" value="PARAMETER2_UNITS" />
    <attribute name="precision" value="PARAMETER2_PRECISION" />
    <attribute name="_FillValue" type="double" value="PARAMETER2_FILL" />
  </variable>
</netcdf>
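The template substitution described above might look roughly like the following Python sketch. It is not the TSDS implementation: the function names `parse_request` and `fill_template` are hypothetical, the parameter name `ssn` and the `yyyymmdd` date form are assumptions taken from the SPIDR catalog example, and only the `url` attribute of the template is handled.

```python
# The url attribute of the NcML template, with its placeholders.
TEMPLATE_URL = ("http://spidr.ngdc.noaa.gov/spidr/servlet/GetData?"
                "format=csv&dateFrom=STARTDATE&dateTo=STOPDATE&"
                "param=PARAMETER_SHORT_NAME")


def parse_request(query):
    """Split 'time,NAME&time>START&time<STOP' into (names, start, stop)."""
    projection, *selections = query.split("&")
    names = [n for n in projection.split(",") if n != "time"]
    start = stop = None
    for sel in selections:
        if sel.startswith("time>"):
            start = sel[len("time>"):]
        elif sel.startswith("time<"):
            stop = sel[len("time<"):]
    return names, start, stop


def fill_template(template, query):
    """Substitute the request's time range and first parameter into the template.

    This sketch requires both time bounds and at least one non-time parameter.
    """
    names, start, stop = parse_request(query)
    if start is None or stop is None or not names:
        raise ValueError("request needs a parameter and both time bounds")
    return (template.replace("STARTDATE", start)
                    .replace("STOPDATE", stop)
                    .replace("PARAMETER_SHORT_NAME", names[0]))


print(fill_template(TEMPLATE_URL, "time,ssn&time>19320101&time<20120201"))
```

The same substitution would also need to be applied to the `PARAMETER1_...` and `PARAMETER2_...` placeholders in the variable definitions, using values from the catalog; that step is left out here.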
8. Connecting to a data service - IOSP
3. An IOSP (Input/Output Service Provider) - Usually Java code that maps the response from a service to the internal TSDS data structure (CDM).
IOSPs exist for:
- Columnar remote or local data files.
- Data piped from the command line.
- Data in a text file that is pre-processed by a regex.
- Data from web services: CDAWeb, SPIDR, LISIRD, ViRBO, SuperMAG, VSEO (in development).
- (Many of the IOSPs use Java code from Autoplot).
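As a rough illustration of what a columnar reader does, the Python sketch below turns commented, whitespace-delimited text into one list per variable. It is not the Java `ColumnarAsciiReader`. The reading of a `columns` value such as `1,2;3` (columns 1 and 2 joined into the first variable, column 3 as the second) is an assumption, as are the function name `read_columnar` and the placeholder sample values.

```python
def read_columnar(text, comment_char="#", columns="1,2;3", delimiter=None):
    """Parse columnar text into one list of strings per ';'-separated group.

    Column numbers are 1-based; columns inside a group are joined with a
    space. delimiter=None splits on any run of whitespace.
    """
    groups = [[int(c) - 1 for c in group.split(",")]
              for group in columns.split(";")]
    variables = [[] for _ in groups]
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(comment_char):
            continue  # skip blank lines and comments
        fields = line.split(delimiter)
        for out, group in zip(variables, groups):
            out.append(" ".join(fields[i] for i in group))
    return variables


# Placeholder sample, not real data.
SAMPLE = """\
# date time value
2012-01-01 00:00 1.0
2012-01-02 00:00 2.0
"""

times, raw_values = read_columnar(SAMPLE)
values = [float(v) for v in raw_values]
print(times)
print(values)
```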
9. Example use: Browser
View ASCII data from web browser:
http://lasp.colorado.edu/lisird/tss/historical_tsi.csv
Return time and Irradiance in a time range:
http://lasp.colorado.edu/lisird/tss/historical_tsi.csv?time,Irradiance&time%3E2003-02-25&time%3C2009-03-27
Return time and Irradiance when Irradiance was greater than 1361.6 and less than 1361.8:
http://lasp.colorado.edu/lisird/tss/historical_tsi.csv?time,Irradiance&Irradiance%3E1361.6&Irradiance%3C1361.8
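The `%3E` and `%3C` in these URLs are the percent-encoded forms of `>` and `<`. A small sketch of producing them with Python's standard `urllib.parse.quote` follows; the helper name `encode_query` is an assumption made for this example.

```python
from urllib.parse import quote

BASE = "http://lasp.colorado.edu/lisird/tss/historical_tsi.csv"


def encode_query(query):
    """Percent-encode a constraint expression, keeping ',', '&', '=', '(' and ')' literal."""
    return quote(query, safe=",&=()")


url = BASE + "?" + encode_query("time,Irradiance&time>2003-02-25&time<2009-03-27")
print(url)
```

The printed URL matches the time-range example above; letters, digits, `-`, and `.` pass through `quote` unchanged.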
10. Example use: IDL
Import data into IDL. The following would be the response to a request for output=pro instead of output=csv, e.g., http://lasp.colorado.edu/lisird/tss/historical_tsi.pro. Note that the code below is the new style of output, which differs from what this link returns.
; Copy the following onto the IDL command line
oUrl = OBJ_NEW('IDLnetUrl')
fn = oUrl->Get(filename='tss_reader__define.pro', $
               url='http://tsds.net/idl/tss_reader__define.pro')
tss = OBJ_NEW('tss_reader',baseurl='http://lasp.colorado.edu/lisird/tss/')
data = tss->read_data(dataset='historical_tsi')
OBJ_DESTROY,tss
print, 'For more information,'
print, 'see http://lasp.colorado.edu/lisird/tss/historical_tsi.html'
plot,data[*].(0),data[*].(1), $
     yrange=[1360,1362],/xstyle,/ystyle, $
     xtitle='Year',ytitle='Irradiance (W/m^2)', $
     title='TSI Reconstruction from Wang, Lean, Sheeley (ApJ, 2005)'
11. Example use: Autoplot
http://autoplot.org/autoplot.jnlp&open=http://lasp.colorado.edu/lisird/tss/historical_tsi.csv
12. A look ahead
- Test every parameter
- Set up alert system for when data provider site goes down
- Add links to metadata (which metadata?)
- Continue work on aggregation
