Traditionally, URLs are passed to hURL using the easy:SetOpt_URL() method or its counterparts
like #CURLOPT_URL. Starting with hURL 2.0, however, you can also pass URLs via URL objects created
by this function. Once hurl.URL() returns, you can initialize the new URL object using methods like
url:SetURL() or url:SetPort() and pass it to an easy handle by using easy:SetOpt_CURLU().
Using URL objects instead of traditional URLs can be more convenient for complex URLs with many constituents.
Optionally, you can also initialize the URL object by passing a URL in url$. If you don't pass
url$, you need to initialize the URL object later using url:SetURL(). It's also possible
to pass a combination of the following flags:
#CURLU_NON_SUPPORT_SCHEME
-
If set, allows you to set a non-supported scheme.
#CURLU_URLENCODE
-
When set, libcurl URL-encodes the part on entry, except for scheme, port and URL.
When setting the path component with URL encoding enabled, the slash character will be skipped.
The query part gets space-to-plus conversion before the URL conversion.
This URL encoding is charset unaware and will convert the input byte by byte.
#CURLU_DEFAULT_SCHEME
-
If set, libcurl allows the URL to be set without a scheme and then sets the scheme to the
default scheme: HTTPS. Overrides the #CURLU_GUESS_SCHEME option if both are set.
#CURLU_GUESS_SCHEME
-
If set, libcurl allows the URL to be set without a scheme and instead "guesses" which
scheme was intended based on the host name. If the outermost sub-domain name matches DICT,
FTP, IMAP, LDAP, POP3 or SMTP, then that scheme will be used; otherwise it picks HTTP. Conflicts
with the #CURLU_DEFAULT_SCHEME option, which takes precedence if both are set.
#CURLU_NO_AUTHORITY
-
If set, skips authority checks. The RFC allows individual schemes to omit the host part (normally
the only mandatory part of the authority), but libcurl cannot know whether this is permitted for
custom schemes. Specifying the flag permits empty authority sections, similar to how the file scheme is
handled.
#CURLU_PATH_AS_IS
-
When set for CURLUPART_URL, this makes libcurl skip the normalization of the path. That is the procedure
where curl otherwise removes sequences of dot-slash, dot-dot and so on. The same option used for transfers
is called #CURLOPT_PATH_AS_IS.
#CURLU_ALLOW_SPACE
-
If set, the URL parser allows space (ASCII 32) where possible. The URL syntax normally does not allow
spaces anywhere; they should be encoded as %20 or '+'. When spaces are allowed, they are still not
allowed in the scheme. When space is used and allowed in a URL, it will be stored as-is unless
#CURLU_URLENCODE is also set, which then makes libcurl URL-encode the space before it is stored. This
affects how the URL will be constructed when curl_url_get is subsequently used to extract the full URL
or individual parts.
#CURLU_DISALLOW_USER
-
If set, the URL parser will not accept embedded credentials for the #CURLUPART_URL, and will instead
return an error for such URLs.
#CURLU_APPENDQUERY
-
Can only be used with url:SetQuery(). The provided new part will then instead be appended at the
end of the existing query, and if the previous part did not end with an ampersand, an ampersand gets
inserted before the new appended part. When #CURLU_APPENDQUERY is used together with #CURLU_URLENCODE,
the first '=' symbol will not be URL encoded.
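The interplay of the scheme flags and the append behaviour of #CURLU_APPENDQUERY can be sketched as
follows. This is only an illustrative Python model of the rules described above, not hURL or libcurl
code: the function names resolve_scheme() and append_query() are hypothetical, and the assumption that
the "outermost sub-domain" is the leftmost host label (as in ftp.example.com) is ours.

```python
# Illustrative model only: this is NOT hURL or libcurl code.
GUESSABLE = ("dict", "ftp", "imap", "ldap", "pop3", "smtp")


def resolve_scheme(url, default_scheme=False, guess_scheme=False):
    """Return url with a scheme, following the flag rules described above."""
    if "://" in url:
        return url  # a scheme is already present
    if default_scheme:
        # Modelled on #CURLU_DEFAULT_SCHEME, which takes precedence
        # over #CURLU_GUESS_SCHEME if both are set.
        return "https://" + url
    if guess_scheme:
        # Assumption: the leftmost host label decides the scheme.
        first_label = url.split("/", 1)[0].split(".", 1)[0].lower()
        scheme = first_label if first_label in GUESSABLE else "http"
        return scheme + "://" + url
    raise ValueError("URL has no scheme and no scheme flag was given")


def append_query(existing, new_part):
    """Model of #CURLU_APPENDQUERY: join with '&' unless one is already there."""
    if not existing:
        return new_part
    if existing.endswith("&"):
        return existing + new_part
    return existing + "&" + new_part
```

In this model, a host such as ftp.example.com resolves to the FTP scheme when only guessing is
enabled, but to HTTPS as soon as the default-scheme flag is also set.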
When using the getter methods like url:GetURL() or url:GetPort(), the flags have
a different function and some additional flags are available. Here is a description of the flags
that can be used with getter methods:
#CURLU_DEFAULT_PORT
-
If the handle has no port stored, this option will make curl return the default port for the used scheme.
#CURLU_DEFAULT_SCHEME
-
If the handle has no scheme stored, this option will make curl return the default scheme instead of an error.
#CURLU_NO_DEFAULT_PORT
-
Instructs curl to not return a port number if it matches the default port for the scheme.
#CURLU_URLDECODE
-
Asks curl to URL decode the contents before returning them. It will not attempt to decode the scheme, the
port number or the full URL. The query component will also get plus-to-space conversion as a bonus when
this bit is set. Note that this URL decoding is charset unaware and you will get a string back with data
that could be intended for a particular encoding. If there are any byte values lower than 32 in the decoded
string, the get operation will return an error instead.
#CURLU_URLENCODE
-
If set, it will make curl URL encode the host name part when a full URL is retrieved. If not set (default),
libcurl returns the URL with the host name "raw" so that IDN names appear as-is. IDN host names
typically use non-ASCII bytes that would otherwise be percent-encoded. Note that even when not asking for
URL encoding, the '%' (byte 37) will be URL encoded to make sure the host name remains valid.
#CURLU_PUNYCODE
-
If set and #CURLU_URLENCODE is not set, and asked to retrieve the host or URL parts, libcurl returns the
host name in its punycode version if it contains any non-ASCII octets (and is an IDN name). If libcurl is
built without IDN capabilities, using this bit will make curl return an error if the host name contains
anything outside the ASCII range.
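The port and decoding rules for the getter flags can be illustrated with a small Python model. Again,
this is a sketch under stated assumptions and not the hURL or libcurl implementation: get_port(),
url_decode() and the DEFAULT_PORTS table are hypothetical names, the table holds only a few sample
schemes, and the result of combining #CURLU_DEFAULT_PORT with #CURLU_NO_DEFAULT_PORT is a choice made
by this model, since the text above does not specify it.

```python
from urllib.parse import unquote_to_bytes

# Illustrative model only: this is NOT hURL or libcurl code.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}  # small sample table


def get_port(scheme, stored_port=None, default_port=False, no_default_port=False):
    """Model of #CURLU_DEFAULT_PORT and #CURLU_NO_DEFAULT_PORT on a port getter."""
    port = stored_port
    if port is None and default_port:
        # No port stored: fall back to the scheme's default port.
        port = DEFAULT_PORTS.get(scheme)
    if port is not None and no_default_port and port == DEFAULT_PORTS.get(scheme):
        # The port equals the scheme default, so report no port at all.
        return None
    return port


def url_decode(part, is_query=False):
    """Model of #CURLU_URLDECODE: returns raw bytes because decoding is charset unaware."""
    if is_query:
        # Only the query component gets plus-to-space conversion.
        part = part.replace("+", " ")
    raw = unquote_to_bytes(part)
    if any(byte < 32 for byte in raw):
        # Byte values lower than 32 make the get operation fail.
        raise ValueError("decoded data contains a byte value lower than 32")
    return raw
```

Returning bytes rather than text mirrors the note that the decoded data may be intended for a
particular encoding which the decoder itself does not know.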