This is a Non-Federal dataset covered by different Terms of Use than Data.gov.

Mining Distance-Based Outliers in Near Linear Time

Metadata Updated: February 22, 2025

Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
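
Illustrative sketch: the abstract names three ingredients: a nested loop, data scanned in random order, and a simple pruning rule. The Python below is a minimal sketch of how those pieces could fit together, not the authors' implementation. Several details are assumptions made for illustration and are not stated on this page: the outlier score is taken to be the distance to the k-th nearest neighbor, the goal is taken to be the n highest-scoring examples, and the names top_outliers, k, n, seed, and cutoff are hypothetical.

```python
import heapq
import math
import random


def top_outliers(points, k=2, n=1, seed=0):
    """Return up to n (score, index) pairs, highest score first.

    Assumption: score = distance to the k-th nearest neighbor.
    """
    order = list(range(len(points)))
    # Scan in random order so that close neighbors tend to turn up early.
    random.Random(seed).shuffle(order)

    cutoff = 0.0  # score of the weakest outlier kept so far (0 until n are found)
    top = []      # min-heap of (score, index), holding at most n entries

    for i in order:
        nearest = []  # negated distances: a max-heap of the k closest seen so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # With k neighbors known, -nearest[0] is an upper bound on the
            # final score, because further neighbors can only lower it.
            # Pruning rule: stop as soon as that bound drops below the cutoff.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        heapq.heappush(top, (-nearest[0], i))
        if len(top) > n:
            heapq.heappop(top)  # drop the weakest candidate
        if len(top) == n:
            cutoff = top[0][0]

    return sorted(top, reverse=True)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
result = top_outliers(points, k=2, n=1)
```

In this sketch the inner loop over a non-outlier usually ends after a few comparisons, because its k-th neighbor distance falls below the cutoff quickly. That early exit is the behavior the abstract credits for the near linear scaling. The worst case remains quadratic, as the abstract notes.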

Access & Use Information

Public: This dataset is intended for public access and use.
Non-Federal: This dataset is covered by different Terms of Use than Data.gov.
License: No license information was provided.

Dates

Metadata Created Date February 22, 2025
Metadata Updated Date February 22, 2025
Data Update Frequency irregular

Metadata Source

Harvested from nasa test json

Additional Metadata

Resource Type Dataset
Publisher Dashlink
Maintainer
Identifier DASHLINK_191
Data First Published 2010-09-22
Data Last Modified 2025-02-19
Public Access Level public
Bureau Code 026:00
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id 28b14637-9b4e-41ac-bb20-76bd27c6b78b
Harvest Source Id a73e0c30-4684-40ef-908e-d22e9e9e5f86
Harvest Source Title nasa test json
Homepage URL https://c3.nasa.gov/dashlink/resources/191/
Program Code 026:029
Source Datajson Identifier True
Source Hash 2ff57061fa2e3a765cd0fea1b5842bfc940c93eb87decd3e8f7ecd5bb458ed4a
Source Schema Version 1.1
