This is not (yet) an officially-released version of aws4c. Version 0.5.2
represents unofficial extensions to the 0.5 release by Los Alamos National
Laboratory.
We have attempted to preserve the functionality of all interfaces provided by
0.5. Where new functionality was added, we've typically added new interfaces.
(1) Extensions to IOBuf / IOBufNode.
The 0.5 version simply adds strings into a linked list, via a malloc and
strcpy(). This might be acceptable when all that's needed is to parse
individual response headers, but we wanted to support more-efficient
operations when large buffers are being sent or received. Thus, it is now
possible to add data for writing by installing user buffers directly into
an IOBuf's linked list. It is also possible to add storage for receiving
data in a similar way. In either case, the added storage may be static or
dynamic. Here are some typical scenarios.
// GET an object from the server into user's <data_ptr>
// NOTE: this uses the "extend" functions to add unused storage
aws_iobuf_reset(io_buf);
aws_iobuf_extend_static(io_buf, data_ptr, data_length);
AWS4C_CHECK( s3_get(io_buf, obj_name) );
AWS4C_CHECK_OK( io_buf );
// PUT the contents of user's <data_ptr>
// NOTE: this uses the "append" functions to add data
aws_iobuf_reset(io_buf);
aws_iobuf_append_static(io_buf, data_ptr, data_length);
AWS4C_CHECK( s3_put(io_buf, obj_name) );
AWS4C_CHECK_OK( io_buf );
In these use cases you will typically also want to call aws_iobuf_reset()
after the operation completes, so that the IOBuf does not retain an internal
pointer to storage that is about to go out of scope.
(2) Re-using connections
In 0.5, every call to an aws4c function creates a new CURL connection. This
can add overhead to an application that performs many operations. We now
allow the user to specify that connections should be preserved, or to reset
the connection at a specific time:
aws_reuse_connections(1); // begin reusing connections
// ...
aws_reset_connection(); // reset the connection once
// ...
aws_reuse_connections(0); // stop reusing connections
(3) GET/PUT to/from file
Companion functions to the 0.5 head/get/put/post functions allow the user to
specify a file. These also allow the user to provide one IOBuf to capture the
response, in addition to the one used to send the request.
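For illustration, a PUT from a file might look roughly like the sketch below.
The function name s3_put_from_file() and its argument order are hypothetical;
check aws4c.h for the actual names and signatures of the file-oriented
variants.
// HYPOTHETICAL sketch -- s3_put_from_file() is not necessarily the real name
IOBuf* request  = aws_iobuf_new();
IOBuf* response = aws_iobuf_new();         // captures the server's reply
AWS4C_CHECK( s3_put_from_file(request, obj_name, local_file, response) );
AWS4C_CHECK_OK( request );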
(4) binary data
The old getline() function was not very useful for reading arbitrary
streams of binary (or text) data. We added a get_raw() function that
ignores newlines.
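For example, draining a response in fixed-size chunks might look like the
sketch below. This assumes get_raw() mirrors the getline() interface (an
IOBuf, a destination buffer, and a size, returning the number of bytes
copied); verify the exact name and signature in aws4c.h.
// SKETCH: read raw data, ignoring newlines (signature assumed; see aws4c.h)
char chunk[4096];
int  count;
while ((count = aws_iobuf_get_raw(io_buf, chunk, sizeof(chunk))) > 0) {
   consume(chunk, count);   // <consume> is a placeholder for user code
}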
(5) EMC extensions
EMC supports some extensions to "pure" S3, such as using byte-ranges to
write parts of an object in parallel, or to append to an object. These
operations aren't normally legal, so you have to call a special function
before the library will allow them:
// use byte-range to append to an object
s3_enable_EMC_extensions(1);
s3_set_byte_range(-1, -1); // creates "Range: bytes=-1-"
s3_put(io_buf, obj_name);
// another interface to append to an object
s3_enable_EMC_extensions(1);
emc_put_append(io_buf, obj_name);
// instead of multi-part upload ...
s3_enable_EMC_extensions(1);
s3_set_byte_range(offset, length);
s3_put(io_buf, obj_name);
(6) Scality sproxyd extensions
Scality sproxyd is now supported, as well. This is basically just a "pure
curl" interface, without the S3 authentication. Should we just rename this
to something like pure curl, instead of sproxyd? Test 100 in test_aws.c
shows how to use it.
Basically, the S3 "bucket" just becomes the first part of the path, which
for sproxyd is the "proxy" that fastcgi on the server uses to dispatch to
sproxyd. The second part of the path (i.e. first part of the S3
"object_name") becomes the sproxyd "namespace". This coresponds with the
driver name in the /etc/sproxyd.conf setup on the server.
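As a rough illustration of that path layout (the enable-function name below
is assumed by analogy with s3_enable_EMC_extensions(); see test 100 in
test_aws.c for the actual usage, and the host/proxy/namespace values are
just examples):
// SKETCH: sproxyd-style request
s3_enable_Scality_extensions(1);   // assumed name; verify in aws4c.h / test_aws.c
s3_set_host  ("10.0.0.1:81");      // server running fastcgi + sproxyd
s3_set_bucket("proxy");            // first path component: the fastcgi "proxy"
// first component of the "object_name" is the sproxyd "namespace"
s3_put(io_buf, "ns1/my_object");   // PUT http://10.0.0.1:81/proxy/ns1/my_object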
How do you know which things are EMC and which are Scality? Well, the code
currently throws an error (with a meaningful message) if you try to do
something special that requires one set of extensions or the other, and you
haven't enabled them yet. The two sets of extensions are considered
mutually exclusive, so it is also an error to try to enable both.
(7) user metadata on objects
You can add custom metadata to objects. You build the metadata as a distinct
structure, which you can then install into an IOBuf; this allows you to
reuse the same metadata across requests, if you wish. When parsing replies,
the library can also parse out user metadata.
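A rough sketch of the idea is below. The type and function names (MetaNode,
aws_metadata_set(), aws_iobuf_set_metadata()) are assumptions made for
illustration; the actual metadata interface is declared in aws4c.h.
// SKETCH ONLY: names are assumed; see aws4c.h for the real metadata interface
MetaNode* meta = NULL;
aws_metadata_set(&meta, "pi",     "3.14159");  // build a reusable key/value list
aws_metadata_set(&meta, "author", "somebody");
aws_iobuf_set_metadata(io_buf, meta);          // install it into the IOBuf
s3_put(io_buf, obj_name);                      // sent as x-amz-meta-* headers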
(8) streaming reads/writes, and chunked transfer-encoding
These are independent features; you can use either one, or combine them. The
motivation for a streaming PUT (for example) is that you have a
data-producer that provides data buffers, and you want to add those
buffers to the ongoing PUT stream as they become available, rather than
buffering an entire object before writing.
To do this, you supply a custom curl readfunc, which blocks (e.g. on a POSIX
semaphore) until data is available, then adds it to the stream (e.g. by
moving it into the buffer curl hands to the readfunc, and returning).
See cases 11 and 12 in test_aws.c for a crude example. There is also a
file called object_stream.c in the MarFS file-system
(https://github.com/mar-file-system/marfs.git) which uses this capability
in a more refined way; the latter is in fact the reason for this extension
to aws4c.
It may be that you don't know ahead of time how much data your
data-producer will produce. In that case, you can't set a length
(CURLOPT_INFILESIZE) for the initial PUT. So, you can now tell
libaws4c to suppress the content-length setting and to provide a
"Transfer-encoding: chunked" header, instead. Just call
aws_iobuf_chunked_transfer_encoding() to enable or disable this.
Note that when you enable CTE, you do not have to implement the CTE
header/footer yourself (e.g. in a custom readfunc). Curl will generate the
header/footer itself, and will call whatever readfunc is in use with a
buffer-size reduced by 12 bytes (to account for the CTE header/footer).
IOBuf now also supports an "avail" member, which reports how much unread
data it holds.
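Putting those pieces together, a streaming PUT has roughly the shape sketched
below. The read-callback signature is curl's standard CURLOPT_READFUNCTION
signature; how the callback is attached to the request, and the exact
argument form of aws_iobuf_chunked_transfer_encoding(), are shown only
schematically here -- see cases 11 and 12 in test_aws.c for the real
mechanism.
// SKETCH ONLY: standard curl read-callback, fed by a data-producer
size_t my_readfunc(char* ptr, size_t size, size_t nmemb, void* stream)
{
   size_t max = size * nmemb;
   // block (e.g. sem_wait()) until the producer has a buffer, copy up to
   // <max> bytes into <ptr>, and return the byte-count; returning 0 ends
   // the PUT stream.  <next_chunk_from_producer> is a hypothetical helper.
   return next_chunk_from_producer(ptr, max);
}

aws_iobuf_reset(io_buf);
aws_iobuf_chunked_transfer_encoding(io_buf, 1); // argument form assumed; see aws4c.h
// ... attach my_readfunc to the request, as demonstrated in test_aws.c ...
AWS4C_CHECK( s3_put(io_buf, obj_name) );
AWS4C_CHECK_OK( io_buf );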
(9) thread-safety
All global variables have been moved into a new AWSContext struct. There
is a global instance of this struct, which is used by default everywhere.
However, users can now also create their own contexts and attach them to
individual IOBufs. This allows multiple threads calling into libaws4c to
avoid stepping on each other's parameters.
The GET/PUT/DELETE requests generated by this library will look for a
context in the IOBuf, falling back to the default context otherwise.
Thus, old code should continue to work without modification.
All the old interfaces continue to work the same as they did before
(i.e. not thread-safe). The old functions that manipulated global
variables (e.g. aws_set_id(), or s3_set_host()) now just manipulate the
default context. However, these functions now also have "_r" variants
(e.g. aws_set_id_r(), or s3_set_host_r()), which take an extra context
argument, changing the settings only in that context.
In other words, if you want/need thread-safety, you can now do something
like this:
AWSContext* ctx = aws_context_new();
aws_set_host_r(ctx, myhost);
IOBuf* b = aws_iobuf_new();
aws_iobuf_context(b, ctx);
s3_put(b ...);
aws_iobuf_reset_hard(b); // frees context, if present
aws_iobuf_free(b);
(10) extras
The makefile builds a library, libaws4c. We also provide some
debugging functions (e.g. pretty-printing diagnostics about an IOBuf) and
XML support, which probably should not be part of the default library.
Therefore, these have their own header (aws4c_extras.h) and are built into
a separate library, libaws4c_extras. This allows test-apps to use the extra
functionality without requiring the production library to be as big.
(11) unit-tests
Feel free to add new unit-tests to test_aws.c. This allows simple tests of
new functions, and provides a crude regression-test.
NOTE: Some tests currently depend on POSIX threads. Because these may not
build on some systems, they are disabled by default. If you explicitly run
"make test_aws_threaded", you'll get the pthreads tests as well.