search

Description

Searches the data lake for packages that match the key terms provided.

The search engine performs full-text searches across the name, description, and associated metadata of all packages in the data lake, and it supports wildcard (*) searches. The search results contain each matching package together with its associated metadata. The number of search results returned is managed by the administrator, with a default of 100.

Synopsis

search
--terms <value>

Options

--terms (string)

Key terms to search for in the data lake. Separate multiple terms with commas, as in 'nuclear, wat*'.
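
For scripted use, the command can also be assembled and run from Python. The following is a minimal sketch, not part of the CLI itself: the helper names build_search_command and run_search are hypothetical, and the sketch assumes that the datalake executable is on the PATH, that it prints JSON to standard output, and that multiple terms are joined with a comma as in the 'nuclear, wat*' example.

```python
import json
import subprocess


def build_search_command(terms):
    """Return the argument list for `datalake search` (hypothetical helper)."""
    # Assumption: multiple terms are passed as one comma-separated string.
    return ["datalake", "search", "--terms", ", ".join(terms)]


def run_search(terms):
    """Run the search and return the parsed JSON output.

    Assumes the `datalake` executable is on PATH and writes JSON to stdout.
    """
    completed = subprocess.run(
        build_search_command(terms),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)
```

Passing an argument list instead of a single shell string means no shell is involved, so a wildcard such as wat* reaches the command unchanged.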

Examples

To search for 'salinity'

The following command searches for 'salinity' in the data lake.

Command:

datalake search --terms 'salinity'

Output:

{
    "Items":[
        {
            "deleted":false,
            "created_at":"2016-10-16T22:27:10Z",
            "updated_at":"2016-10-16T22:38:32Z",
            "name":"Water Salinity and River Discharge Updated",
            "package_id":"HkIa0OW1l",
            "owner":"usera_example_com",
            "description":"Biweekly averages of the water salinity and river discharge in Pamlico Sound, North Carolina were recorded between the years 1972 and 1977. The data in this set consists only of those measurements in March, April and May. Another Update 2.",
            "metadata":[
                {
                    "tag":"Category",
                    "value":"salinity"
                },
                {
                    "tag":"Source",
                    "value":"Ruppert, D. and Carroll, R.J. (1980) Trimmed least squares estimation in the linear model. Journal of the American Statistical Association, 75, 828–838."
                }
            ]
        }
    ]
}
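
The output is plain JSON, so it can be post-processed with any JSON library. The sketch below, in Python, reduces each entry under "Items" to its package ID, name, and a tag-to-value mapping. The summarize function is a hypothetical helper, the sample text is an abbreviated copy of the output above, and the mapping assumes that each tag appears at most once per package.

```python
import json

# Abbreviated copy of the output shown above.
raw_output = """
{
    "Items":[
        {
            "deleted":false,
            "name":"Water Salinity and River Discharge Updated",
            "package_id":"HkIa0OW1l",
            "owner":"usera_example_com",
            "metadata":[
                {"tag":"Category", "value":"salinity"}
            ]
        }
    ]
}
"""


def summarize(output_text):
    """Return one small dict per package found in the search output."""
    summaries = []
    for item in json.loads(output_text).get("Items", []):
        # Assumption: a tag occurs at most once per package; a repeated
        # tag would overwrite the earlier value in this mapping.
        tags = {entry["tag"]: entry["value"] for entry in item.get("metadata", [])}
        summaries.append(
            {"package_id": item["package_id"], "name": item["name"], "tags": tags}
        )
    return summaries


summaries = summarize(raw_output)
for summary in summaries:
    print(summary["package_id"], summary["name"], summary["tags"])
```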

To search for 'nuclear' and the wildcard 'wat*'

The following command searches for 'nuclear' and the wildcard pattern 'wat*' in the data lake. As the output shows, a package is returned when it matches either term.

Command:

datalake search --terms 'nuclear, wat*'

Output:

{
    "Items":[
        {
            "created_at":"2016-10-01T20:28:42Z",
            "deleted":false,
            "updated_at":"2016-10-01T20:28:42Z",
            "name":" Biochemical Oxygen Demand",
            "package_id":"HJmthcTT",
            "owner":"usera_example_com",
            "description":"The BOD data frame has 6 rows and 2 columns giving the biochemical oxygen demand versus time in an evaluation of water quality.",
            "metadata":[
                {
                    "tag":"Source",
                    "value":"Bates, D.M. and Watts, D.G. (1988), Nonlinear Regression Analysis and Its Applications, Wiley, Appendix A1.4."
                },
                {
                    "tag":"Category",
                    "value":"BOD"
                }
            ]
        },
        {
            "created_at":"2016-10-14T16:17:18Z",
            "deleted":false,
            "updated_at":"2016-10-14T16:17:18Z",
            "name":"Nuclear Power Station Construction Data",
            "package_id":"SkLGBFCA",
            "owner":"usera_example_com",
            "description":"The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960's and early 1970's. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers' subsidies may be hidden in the quoted capital costs.",
            "metadata":[
                {
                    "tag":"Category",
                    "value":"nuclear"
                }
            ]
        },
        {
            "deleted":false,
            "created_at":"2016-10-16T22:27:10Z",
            "updated_at":"2016-10-16T22:38:32Z",
            "name":"Water Salinity and River Discharge Updated",
            "package_id":"HkIa0OW1l",
            "owner":"usera_example_com",
            "description":"Biweekly averages of the water salinity and river discharge in Pamlico Sound, North Carolina were recorded between the years 1972 and 1977. The data in this set consists only of those measurements in March, April and May. Another Update 2.",
            "metadata":[
                {
                    "tag":"Category",
                    "value":"salinity"
                },
                {
                    "tag":"Source",
                    "value":"Ruppert, D. and Carroll, R.J. (1980) Trimmed least squares estimation in the linear model. Journal of the American Statistical Association, 75, 828–838."
                }
            ]
        }
    ]
}
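
When a search combines several terms, it can be useful to see which term accounts for each result. The Python sketch below is a local approximation for inspecting results, not the search engine's own matching logic: it assumes case-insensitive, whole-word matching with glob-style wildcards over the name, description, and metadata fields, and the helper names are hypothetical. The sample items are abbreviated from the output above.

```python
import re
from fnmatch import fnmatchcase

# Abbreviated copies of two items from the output shown above.
items = [
    {
        "name": " Biochemical Oxygen Demand",
        "description": "The BOD data frame has 6 rows and 2 columns giving the "
                       "biochemical oxygen demand versus time in an evaluation "
                       "of water quality.",
        "metadata": [{"tag": "Category", "value": "BOD"}],
    },
    {
        "name": "Nuclear Power Station Construction Data",
        "description": "The data relate to the construction of 32 light water "
                       "reactor (LWR) plants constructed in the U.S.A in the "
                       "late 1960's and early 1970's.",
        "metadata": [{"tag": "Category", "value": "nuclear"}],
    },
]


def searchable_text(item):
    """Join the fields that the description says are searched."""
    parts = [item.get("name", ""), item.get("description", "")]
    for entry in item.get("metadata", []):
        parts.extend([entry.get("tag", ""), entry.get("value", "")])
    return " ".join(parts)


def matched_terms(item, terms):
    """Return the terms that match at least one word of the item's text.

    Assumption: glob-style, case-insensitive, whole-word matching. The real
    engine may tokenize and match differently.
    """
    words = re.findall(r"\w+", searchable_text(item).lower())
    return [
        term for term in terms
        if any(fnmatchcase(word, term.lower()) for word in words)
    ]


for item in items:
    print(item["name"].strip(), "->", matched_terms(item, ["nuclear", "wat*"]))
```

Under these assumptions, the first item is explained by 'wat*' alone (through "water"), while the second matches both terms.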

To search by 'column_name' content

The following command searches for datasets that have a column named pickup_datetime.

Command:

datalake search --terms 'column_name: pickup_datetime'

Output:

{
    "Items": [{
        "owner": "heitorc_amazon_com",
        "metadata": [],
        "deleted": false,
        "updated_at": "2018-05-24T21:34:58Z",
        "name": "NYC Taxi and TLC. fd",
        "column_name": ["dispatching_base_num", "pickup_datetime", "dropoff_datetime", "pulocationid", "dolocationid", "vendorid", "lpep_pickup_datetime", "lpep_dropoff_datetime", "store_and_fwd_flag", "ratecodeid", "passenger_count", "trip_distance", "fare_amount", "extra", "mta_tax", "tip_amount", "tolls_amount", "ehail_fee", "improvement_surcharge", "total_amount", "payment_type", "trip_type", "tpep_pickup_datetime", "tpep_dropoff_datetime"],
        "column_comment": ["UTC"],
        "groups": [],
        "created_at": "2018-05-24T21:34:42Z",
        "description": "Data of trips taken by taxis and for-hire vehicles in New York City.",
        "package_id": "By5egh4k7",
        "table_desc": ["Yellow cab only"]
    }]
}
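
Because "column_name" is returned as a list, results can be checked or narrowed further on the client side. The Python sketch below uses a hypothetical helper, packages_with_column, on an abbreviated copy of the item above plus one made-up item without column information. It tests for an exact column name; whether the search engine itself matches exact names or partial names is not stated here, so treat that as an assumption.

```python
# Abbreviated copy of the item shown above, plus a placeholder item that
# has no "column_name" field (as in the earlier examples).
items = [
    {
        "name": "NYC Taxi and TLC. fd",
        "package_id": "By5egh4k7",
        "column_name": [
            "dispatching_base_num",
            "pickup_datetime",
            "dropoff_datetime",
            "fare_amount",
        ],
    },
    {
        "name": "Package without column information",  # made-up sample entry
        "package_id": "example_id",                     # made-up identifier
    },
]


def packages_with_column(items, column):
    """Return the IDs of packages whose column list contains `column` exactly."""
    return [
        item["package_id"]
        for item in items
        # Items without a "column_name" field are treated as having no columns.
        if column in item.get("column_name", [])
    ]


print(packages_with_column(items, "pickup_datetime"))
```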

Output

Search results -> (list)