Add iterator support to url_search_params (#532)
jasnell authored Oct 10, 2023
1 parent 88cded9 commit 2f1130a
Showing 8 changed files with 653 additions and 6 deletions.
36 changes: 31 additions & 5 deletions README.md
@@ -14,8 +14,8 @@ The Ada library passes the full range of tests from the specification,
across a wide range of platforms (e.g., Windows, Linux, macOS). It fully
supports the relevant [Unicode Technical Standard](https://www.unicode.org/reports/tr46/#ToUnicode).

A common use of a URL parser is to take a URL string and normalize it.
The WHATWG URL specification has been adopted by most browsers. Other tools, such as curl and many
standard libraries, follow RFC 3986. The following table illustrates possible differences in practice
(encoding of the host, encoding of the path):

@@ -30,10 +30,10 @@ standard libraries, follow the RFC 3986. The following table illustrates possibl
The project is otherwise self-contained and has no dependencies.
A recent C++ compiler supporting C++17. We test GCC 9 or better, LLVM 10 or better and Microsoft Visual Studio 2022.

## Ada is fast.

On a benchmark where we need to validate and normalize [thousands of URLs found
on popular websites](https://github.com/ada-url/url-various-datasets/tree/main/top100),
we find that ada can be several times faster than popular competitors (system: Apple MacBook 2022
with LLVM 14).

@@ -201,6 +201,21 @@ url->set_hash("is-this-the-real-life");
// url->get_hash() will return "#is-this-the-real-life"
```
For more information about command-line options, please refer to the [CLI documentation](docs/cli.md).
- URL search params
```cpp
ada::url_search_params search_params("a=b&c=d&e=f");
search_params.append("g", "h");  // append takes the key and the value separately
search_params.get("g");  // will return "h"
auto keys = search_params.get_keys();
while (keys.has_next()) {
  auto key = keys.next();  // "a", "c", "e", "g"
}
```

### C wrapper

See the file `include/ada_c.h` for our C interface. We expect ASCII or UTF-8 strings.
@@ -231,6 +246,17 @@ int main(int c, char *arg[] ) {
  ada_set_search(url, "new-search");
  ada_set_protocol(url, "wss");
  ada_print(ada_get_href(url)); // will print wss://changed-host:9090/new-pathname?new-search#new-hash

  // Manipulating search params
  ada_string search = ada_get_search(url);
  ada_url_search_params search_params =
      ada_parse_search_params(search.data, search.length);
  ada_search_params_append(search_params, "a", 1, "b", 1);
  ada_owned_string result = ada_search_params_to_string(search_params);
  ada_set_search(url, result.data, result.length);
  ada_free_owned_string(result);
  ada_free_search_params(search_params);

  ada_free(url);
  return EXIT_SUCCESS;
}
@@ -283,6 +309,6 @@ You may amalgamate all source files into only two files (`ada.h` and `ada.cpp`)
### License
This code is made available under the Apache License 2.0 as well as the MIT license.
Our tests include third-party code and data. The benchmarking code includes third-party code: it is provided for research purposes only and not part of the library.
15 changes: 15 additions & 0 deletions fuzz/parse.cc
@@ -121,5 +121,20 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    search_params.remove(base_source, source);
  }

  auto keys = search_params.get_keys();
  while (keys.has_next()) {
    keys.next();
  }

  auto values = search_params.get_values();
  while (values.has_next()) {
    values.next();
  }

  auto entries = search_params.get_entries();
  while (entries.has_next()) {
    entries.next();
  }

  return 0;
}  // extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
46 changes: 46 additions & 0 deletions include/ada/url_search_params-inl.h
@@ -18,6 +18,10 @@

namespace ada {

// A default, empty url_search_params for use with empty iterators.
template <typename T, ada::url_search_params_iter_type Type>
url_search_params url_search_params_iter<T, Type>::EMPTY;

inline void url_search_params::initialize(std::string_view input) {
  if (!input.empty() && input.front() == '?') {
    input.remove_prefix(1);
@@ -165,6 +169,48 @@ inline void url_search_params::sort() {
});
}

inline url_search_params_keys_iter url_search_params::get_keys() {
  return url_search_params_keys_iter(*this);
}

/**
 * @see https://url.spec.whatwg.org/#interface-urlsearchparams
 */
inline url_search_params_values_iter url_search_params::get_values() {
  return url_search_params_values_iter(*this);
}

/**
 * @see https://url.spec.whatwg.org/#interface-urlsearchparams
 */
inline url_search_params_entries_iter url_search_params::get_entries() {
  return url_search_params_entries_iter(*this);
}

template <typename T, url_search_params_iter_type Type>
inline bool url_search_params_iter<T, Type>::has_next() {
  return pos < params.params.size();
}

template <>
inline std::optional<std::string_view> url_search_params_keys_iter::next() {
  if (!has_next()) return std::nullopt;
  return params.params[pos++].first;
}

template <>
inline std::optional<std::string_view> url_search_params_values_iter::next() {
  if (!has_next()) return std::nullopt;
  return params.params[pos++].second;
}

template <>
inline std::optional<key_value_view_pair>
url_search_params_entries_iter::next() {
  if (!has_next()) return std::nullopt;
  return params.params[pos++];
}

} // namespace ada

#endif // ADA_URL_SEARCH_PARAMS_INL_H
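
The `has_next()`/`next()` pair added in this file follows a JavaScript-style protocol: `next()` yields `std::nullopt` once the position passes the last parameter. A minimal, self-contained sketch of that pattern might look like the following. It does not use Ada; `param_list`, `keys_iter`, and `join_keys` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for the parameter storage; not Ada's actual type.
using param_list = std::vector<std::pair<std::string, std::string>>;

// Minimal JS-style iterator over the keys of a param_list.
// It borrows the list, so the list must outlive the iterator.
class keys_iter {
 public:
  explicit keys_iter(const param_list& params) : params_(params) {}

  // True while at least one key has not been returned yet.
  bool has_next() const { return pos_ < params_.size(); }

  // Returns the next key and advances, or std::nullopt once exhausted.
  std::optional<std::string_view> next() {
    if (!has_next()) return std::nullopt;
    return std::string_view(params_[pos_++].first);
  }

 private:
  const param_list& params_;
  std::size_t pos_ = 0;
};

// Drains an iterator and joins the keys with commas (duplicates are kept).
inline std::string join_keys(keys_iter it) {
  std::string out;
  while (it.has_next()) {
    if (!out.empty()) out += ',';
    out += *it.next();
  }
  return out;
}
```

Because the returned views point into the underlying storage, a caller in this sketch should not modify the list while an iterator is in use.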
92 changes: 92 additions & 0 deletions include/ada/url_search_params.h
@@ -12,6 +12,26 @@

namespace ada {

enum class url_search_params_iter_type {
  KEYS,
  VALUES,
  ENTRIES,
};

template <typename T, url_search_params_iter_type Type>
struct url_search_params_iter;

typedef std::pair<std::string_view, std::string_view> key_value_view_pair;

using url_search_params_keys_iter =
    url_search_params_iter<std::string_view, url_search_params_iter_type::KEYS>;
using url_search_params_values_iter =
    url_search_params_iter<std::string_view,
                           url_search_params_iter_type::VALUES>;
using url_search_params_entries_iter =
    url_search_params_iter<key_value_view_pair,
                           url_search_params_iter_type::ENTRIES>;

/**
 * @see https://url.spec.whatwg.org/#interface-urlsearchparams
 */
@@ -74,6 +94,42 @@ struct url_search_params {
   */
  inline std::string to_string();

  /**
   * Returns a simple JS-style iterator over all of the keys in this
   * url_search_params. The keys in the iterator are not unique. The valid
   * lifespan of the iterator is tied to the url_search_params. The iterator
   * must be freed when you're done with it.
   * @see https://url.spec.whatwg.org/#interface-urlsearchparams
   */
  inline url_search_params_keys_iter get_keys();

  /**
   * Returns a simple JS-style iterator over all of the values in this
   * url_search_params. The valid lifespan of the iterator is tied to the
   * url_search_params. The iterator must be freed when you're done with it.
   * @see https://url.spec.whatwg.org/#interface-urlsearchparams
   */
  inline url_search_params_values_iter get_values();

  /**
   * Returns a simple JS-style iterator over all of the entries in this
   * url_search_params. The entries are pairs of keys and corresponding values.
   * The valid lifespan of the iterator is tied to the url_search_params. The
   * iterator must be freed when you're done with it.
   * @see https://url.spec.whatwg.org/#interface-urlsearchparams
   */
  inline url_search_params_entries_iter get_entries();

  /**
   * C++ style conventional iterator support. const only because we
   * do not really want the params to be modified via the iterator.
   */
  inline const auto begin() const { return params.begin(); }
  inline const auto end() const { return params.end(); }
  inline const auto front() const { return params.front(); }
  inline const auto back() const { return params.back(); }
  inline const auto operator[](size_t index) const { return params[index]; }

 private:
  typedef std::pair<std::string, std::string> key_value_pair;
  std::vector<key_value_pair> params{};
@@ -82,7 +138,43 @@ struct url_search_params {
   * @see https://url.spec.whatwg.org/#concept-urlencoded-parser
   */
  void initialize(std::string_view init);

  template <typename T, url_search_params_iter_type Type>
  friend struct url_search_params_iter;
};  // url_search_params

/**
 * Implements a non-conventional iterator pattern that is closer in style to
 * JavaScript's definition of an iterator.
 *
 * @see https://webidl.spec.whatwg.org/#idl-iterable
 */
template <typename T, url_search_params_iter_type Type>
struct url_search_params_iter {
  inline url_search_params_iter() : params(EMPTY) {}
  url_search_params_iter(const url_search_params_iter &u) = default;
  url_search_params_iter(url_search_params_iter &&u) noexcept = default;
  url_search_params_iter &operator=(url_search_params_iter &&u) noexcept =
      default;
  url_search_params_iter &operator=(const url_search_params_iter &u) = default;
  ~url_search_params_iter() = default;

  /**
   * Return the next item in the iterator or std::nullopt if done.
   */
  inline std::optional<T> next();

  inline bool has_next();

 private:
  static url_search_params EMPTY;
  inline url_search_params_iter(url_search_params &params_) : params(params_) {}

  url_search_params &params;
  size_t pos = 0;

  friend struct url_search_params;
};

} // namespace ada
#endif
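
This header declares one iterator template, tagged by an enum, and lets each alias supply its own `next()` through an explicit specialization, while a default-constructed iterator refers to a shared empty container. The sketch below reproduces that structure in isolation under simplified assumptions. `params_store`, `iter_kind`, and `store_iter` are hypothetical names, and only the KEYS and VALUES flavors are modeled.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical container standing in for url_search_params.
struct params_store {
  std::vector<std::pair<std::string, std::string>> items;
};

enum class iter_kind { KEYS, VALUES };

// One template serves every iterator flavor; Kind only selects which
// explicit specialization of next() is used.
template <typename T, iter_kind Kind>
struct store_iter {
  // A default-constructed iterator refers to a shared empty store, so
  // has_next() is false and next() yields std::nullopt.
  store_iter() : store(EMPTY) {}
  explicit store_iter(const params_store& s) : store(s) {}

  bool has_next() const { return pos < store.items.size(); }
  std::optional<T> next();

 private:
  static const params_store EMPTY;
  const params_store& store;
  std::size_t pos = 0;
};

// Out-of-class definition of the shared empty store.
template <typename T, iter_kind Kind>
const params_store store_iter<T, Kind>::EMPTY{};

using keys_iter = store_iter<std::string_view, iter_kind::KEYS>;
using values_iter = store_iter<std::string_view, iter_kind::VALUES>;

// Explicit specializations: same position logic, different projection.
template <>
inline std::optional<std::string_view> keys_iter::next() {
  if (!has_next()) return std::nullopt;
  return std::string_view(store.items[pos++].first);
}

template <>
inline std::optional<std::string_view> values_iter::next() {
  if (!has_next()) return std::nullopt;
  return std::string_view(store.items[pos++].second);
}
```

The enum tag is what keeps `keys_iter` and `values_iter` distinct types even though both yield `std::string_view`; without it the two aliases would name the same specialization.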
73 changes: 73 additions & 0 deletions include/ada_c.h
@@ -109,4 +109,77 @@ const ada_url_components* ada_get_components(ada_url result);
ada_owned_string ada_idna_to_unicode(const char* input, size_t length);
ada_owned_string ada_idna_to_ascii(const char* input, size_t length);

// url search params
typedef void* ada_url_search_params;

// Represents an std::vector<std::string>
typedef void* ada_strings;
typedef void* ada_url_search_params_keys_iter;
typedef void* ada_url_search_params_values_iter;

typedef struct {
  ada_string key;
  ada_string value;
} ada_string_pair;

typedef void* ada_url_search_params_entries_iter;

ada_url_search_params ada_parse_search_params(const char* input, size_t length);
void ada_free_search_params(ada_url_search_params result);

size_t ada_search_params_size(ada_url_search_params result);
void ada_search_params_sort(ada_url_search_params result);
ada_owned_string ada_search_params_to_string(ada_url_search_params result);

void ada_search_params_append(ada_url_search_params result, const char* key,
size_t key_length, const char* value,
size_t value_length);
void ada_search_params_set(ada_url_search_params result, const char* key,
size_t key_length, const char* value,
size_t value_length);
void ada_search_params_remove(ada_url_search_params result, const char* key,
size_t key_length);
void ada_search_params_remove_value(ada_url_search_params result,
const char* key, size_t key_length,
const char* value, size_t value_length);
bool ada_search_params_has(ada_url_search_params result, const char* key,
size_t key_length);
bool ada_search_params_has_value(ada_url_search_params result, const char* key,
size_t key_length, const char* value,
size_t value_length);
ada_string ada_search_params_get(ada_url_search_params result, const char* key,
size_t key_length);
ada_strings ada_search_params_get_all(ada_url_search_params result,
const char* key, size_t key_length);
ada_url_search_params_keys_iter ada_search_params_get_keys(
ada_url_search_params result);
ada_url_search_params_values_iter ada_search_params_get_values(
ada_url_search_params result);
ada_url_search_params_entries_iter ada_search_params_get_entries(
ada_url_search_params result);

void ada_free_strings(ada_strings result);
size_t ada_strings_size(ada_strings result);
ada_string ada_strings_get(ada_strings result, size_t index);

void ada_free_search_params_keys_iter(ada_url_search_params_keys_iter result);
ada_string ada_search_params_keys_iter_next(
ada_url_search_params_keys_iter result);
bool ada_search_params_keys_iter_has_next(
ada_url_search_params_keys_iter result);

void ada_free_search_params_values_iter(
ada_url_search_params_values_iter result);
ada_string ada_search_params_values_iter_next(
ada_url_search_params_values_iter result);
bool ada_search_params_values_iter_has_next(
ada_url_search_params_values_iter result);

void ada_free_search_params_entries_iter(
ada_url_search_params_entries_iter result);
ada_string_pair ada_search_params_entries_iter_next(
ada_url_search_params_entries_iter result);
bool ada_search_params_entries_iter_has_next(
ada_url_search_params_entries_iter result);

#endif // ADA_C_H
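
The C declarations above expose the iterators as `void*` handles with matching `*_free_*` functions. One common way to back such an interface is to heap-allocate a C++ object and cast it to and from the opaque handle. The sketch below illustrates that general technique only; it is not taken from Ada's implementation, and every `demo_*` name, `pair_list`, and `keys_cursor` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C++ types hidden behind the C handles.
using pair_list = std::vector<std::pair<std::string, std::string>>;

struct keys_cursor {
  const pair_list* list;
  std::size_t pos;
};

extern "C" {

typedef void* demo_params;     // opaque handle to a pair_list
typedef void* demo_keys_iter;  // opaque handle to a keys_cursor

// Non-owning view into memory owned by the params object.
typedef struct {
  const char* data;
  std::size_t length;
} demo_string;

demo_params demo_params_new(void) { return new pair_list(); }
void demo_params_free(demo_params p) { delete static_cast<pair_list*>(p); }

void demo_params_append(demo_params p, const char* key, std::size_t key_length,
                        const char* value, std::size_t value_length) {
  static_cast<pair_list*>(p)->emplace_back(std::string(key, key_length),
                                           std::string(value, value_length));
}

// The cursor is heap-allocated so it can cross the C boundary as void*;
// the caller must release it with demo_keys_iter_free.
demo_keys_iter demo_params_get_keys(demo_params p) {
  return new keys_cursor{static_cast<pair_list*>(p), 0};
}

void demo_keys_iter_free(demo_keys_iter it) {
  delete static_cast<keys_cursor*>(it);
}

bool demo_keys_iter_has_next(demo_keys_iter it) {
  const auto* c = static_cast<const keys_cursor*>(it);
  return c->pos < c->list->size();
}

// Returns {nullptr, 0} once the cursor is exhausted.
demo_string demo_keys_iter_next(demo_keys_iter it) {
  auto* c = static_cast<keys_cursor*>(it);
  if (c->pos >= c->list->size()) return demo_string{nullptr, 0};
  const std::string& key = (*c->list)[c->pos++].first;
  return demo_string{key.data(), key.size()};
}

}  // extern "C"
```

In this sketch the cursor must be freed before, or at least never used after, the params object it points into is freed, which mirrors the lifetime note in the C++ header comments.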