Author: Justin Y. Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 0
Book Description
Streaming algorithms allow for space-efficient processing of massive datasets. The distribution of the frequencies of items in a large dataset is often used to characterize that data: e.g., the data is heavy-tailed, the data follows a power law, or there are many elements that appear only once or twice. In this thesis, we focus on the problem of estimating the profile (a vector representation of the frequency distribution). Given a sequence of m elements from a universe of size n, its profile is a vector φ whose i-th entry φ_i represents the number of distinct elements that appear in the stream exactly i times. A classic paper by Datar and Muthukrishnan from 2002 gave an algorithm which estimates any entry φ_i up to an additive error of ±εD using O(1/ε² log(nm)) bits of space, where D is the number of distinct elements in the stream. We considerably improve on this result by designing an algorithm which estimates the whole profile vector φ, up to overall error ±εm, using O(1/ε² log(1/ε) + log(nm)) bits. More formally, we give an algorithm that computes an approximate profile φ̂ such that the L1 distance ‖φ − φ̂‖₁ is at most εm. In addition to bounding the error across all coordinates, our space bound separates the terms that depend on 1/ε and those that depend on n and m. Furthermore, we give a lower bound showing that our bound is optimal up to constant factors.
To achieve these results, we introduce two new techniques. First, we develop hashing-based sketches that keep very limited information about the identities of the hashed elements. As a result, elements with different frequencies are mixed together, and need to be unmixed using an iterative "deconvolution" process. Second, we reduce the randomness used by the algorithms in a somewhat subtle way: we first use Nisan's generator to ensure that the random variables of interest are O(1)-wise independent, and then we analyze those variables by calculating their moments. (In our setting, using Nisan's generator alone would not yield the desired space bound.) The latter technique seems quite versatile, and has already been used for other streaming problems [Ano23].
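To make the definition of the profile concrete, here is a minimal Python sketch that computes the exact profile of a small stream and the L1 distance between two profiles. It is only an illustration of the quantities named in the description: it stores every element and is not the space-efficient sketching algorithm of the thesis. The function names `profile` and `l1_distance` and the sample stream are assumptions made for this example.

```python
from collections import Counter

def profile(stream):
    """Exact profile: maps i to the number of distinct elements seen exactly i times.

    Illustrative only: this keeps a counter per distinct element, so it uses
    far more space than a streaming sketch would.
    """
    freq = Counter(stream)          # element -> number of occurrences
    return Counter(freq.values())   # occurrence count i -> phi_i

def l1_distance(phi, phi_hat):
    """L1 distance between two profiles stored as sparse mappings i -> phi_i."""
    keys = set(phi) | set(phi_hat)
    return sum(abs(phi.get(i, 0) - phi_hat.get(i, 0)) for i in keys)

# Hypothetical stream with m = 7 elements and D = 4 distinct elements.
stream = ["a", "b", "a", "c", "b", "a", "d"]
phi = profile(stream)

m = sum(i * count for i, count in phi.items())   # total stream length
D = sum(phi.values())                            # number of distinct elements

# A made-up approximate profile, used only to show the error measure.
phi_hat = {1: 1, 2: 1, 3: 1, 4: 1}
error = l1_distance(phi, phi_hat)

print(sorted(phi.items()), m, D, error)
```

In this example "c" and "d" appear once, "b" twice, and "a" three times, so φ_1 = 2, φ_2 = 1, and φ_3 = 1. The entries of the profile sum to D, and the weighted sum of i · φ_i recovers m.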
Estimating Frequency Distributions in Data Streams
Author: Justin Y. Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 0
Book Description
Techniques for Estimating Magnitude and Frequency of Floods on Streams in Indiana
Author: Dale R. Glatfelter
Publisher:
ISBN:
Category : Flood forecasting
Languages : en
Pages : 120
Book Description
Estimating the Magnitude and Frequency of Peak Streamflows for Ungaged Sites on Streams in Alaska and Conterminous Basins in Canada
Author: Janet H. Curran
Publisher:
ISBN:
Category : Flood forecasting
Languages : en
Pages : 116
Book Description
Techniques for Estimating Peak-streamflow Frequency for Unregulated Streams and Streams Regulated by Small Floodwater Retarding Structures in Oklahoma
Author: Robert L. Tortorelli
Publisher:
ISBN:
Category : Flood forecasting
Languages : en
Pages : 50
Book Description
Methods for Estimating the Magnitude and Frequency of Peak Discharges of Rural, Unregulated Streams in Virginia
Author: James A. Bisese
Publisher:
ISBN:
Category : Flood forecasting
Languages : en
Pages : 86
Book Description
Technique for Estimating Magnitude and Frequency of Floods in Illinois
Author: George W. Curtis
Publisher:
ISBN:
Category : Flood forecasting
Languages : en
Pages : 82
Book Description
Automata, Languages and Programming
Author: Peter Widmayer
Publisher: Springer Science & Business Media
ISBN: 9783540438649
Category : Computers
Languages : en
Pages : 1100
Book Description
This book constitutes the refereed proceedings of the 29th International Colloquium on Automata, Languages and Programming, ICALP 2002, held in Malaga, Spain, in July 2002. The 83 revised full papers presented together with 7 invited papers were carefully reviewed and selected from a total of 269 submissions. All current aspects of theoretical computer science are addressed and major new results are presented.
Estimation of Peak-discharge Frequency of Urban Streams in Jefferson County, Kentucky
Author:
Publisher:
ISBN:
Category : Urban runoff
Languages : en
Pages : 54
Book Description
A Method of Estimating Flood-frequency Parameters for Streams in Idaho
Author: L. C. Kjelstrom
Publisher:
ISBN:
Category : Flood forecasting
Languages : en
Pages : 112
Book Description
Estimating the Magnitude and Frequency of Low Flows of Streams in Massachusetts
Author: John C. Risley
Publisher:
ISBN:
Category : Stream measurements
Languages : en
Pages : 42
Book Description
...Presents techniques for estimating 7 day, 2 year and 7 day, 10 year flows at continuous and partial record streamflow gaging stations and techniques for estimating these values at ungaged stream sites...