Receiving File Uploads in Go
File uploads are a crucial aspect of many programming projects, particularly those that involve user-generated content. Whether it’s a social media platform that allows users to upload photos and videos, or an e-commerce site that allows users to upload product images, receiving file uploads is often a necessary component of a project’s functionality.
Without the ability to receive files, the usefulness of these platforms would be limited. Therefore, implementing file uploads is a common requirement for many programming projects. So as software builders, we have to know how to manage file uploads.
There are several important points to consider while programming a file upload endpoint. Here are some of the considerations:
- Acceptable File Types: Determine what types of files your server will accept. It’s important to consider security vulnerabilities that can arise from allowing certain file types, such as executable files.
- File Size Limit: Determine the maximum file size that can be uploaded. This will prevent malicious actors from attempting to overload your server with large files.
- Request Timeout: Set a reasonable request timeout to avoid server overload or denial of service (DoS) attacks.
- File Validation: Validate the file to ensure it meets the expected criteria, such as file type and size. This helps prevent malicious actors from attempting to upload files with harmful content.
- File Storage: Determine how the uploaded files will be stored on the server. Will they be stored locally or on a remote storage service?
- Response Format: Determine the response format for successful file uploads, including any metadata that will be returned to the client.
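The type and size checks from the list above can be sketched as follows. This is a minimal illustration, not a complete validation layer: the 10 MiB limit, the allowed types, and the checkUpload helper are all assumptions made for the example. It relies on http.DetectContentType, which inspects at most the first 512 bytes of the data, so the check is based on the content rather than on the file extension or a client-supplied header.

```go
package main

import (
	"fmt"
	"net/http"
)

// maxUploadSize and allowedTypes are illustrative values, not recommendations.
const maxUploadSize = 10 << 20 // 10 MiB

var allowedTypes = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
}

// checkUpload (a hypothetical helper) rejects files that are too large or
// whose sniffed content type is not in allowedTypes. head should hold the
// first bytes of the file; only the first 512 bytes are considered.
func checkUpload(head []byte, size int64) (string, error) {
	if size > maxUploadSize {
		return "", fmt.Errorf("file is %d bytes, limit is %d", size, maxUploadSize)
	}
	ctype := http.DetectContentType(head)
	if !allowedTypes[ctype] {
		return ctype, fmt.Errorf("unsupported file type %q", ctype)
	}
	return ctype, nil
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n") // the PNG file signature
	fmt.Println(checkUpload(png, int64(len(png))))

	script := []byte("#!/bin/sh\necho hi\n")
	fmt.Println(checkUpload(script, int64(len(script))))
}
```

Content sniffing only looks at the leading bytes, so it is a first filter rather than proof that a file is safe.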
Receiving file uploads in a server can be accomplished through various methods such as multipart uploads and sending file contents as binary in the request body. Let’s first look at these methods, and in the end, I’ll share a more efficient method to handle file uploads.
Handling File Uploads With multipart/form-data
Multipart/form-data is a popular and widely supported method for file uploads over HTTP. It allows for transfer of multiple files and other form data in a single request.
When using multipart form data, the file to be uploaded is typically assigned to a field in the multipart form, along with any other data that needs to be sent. The client sends a POST request with the content type set to multipart/form-data, and the body of the request contains a series of parts, each corresponding to a field in the form. The part containing the file data carries its own Content-Type header (commonly application/octet-stream when the client does not specify a more precise type) and a Content-Disposition header giving the field name and the name of the file being uploaded.
The additional headers in the multipart upload can increase the overall size of the request, but the impact is generally minimal for small to medium-sized files.
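To make the format concrete, here is a small sketch that builds a multipart body with the standard mime/multipart package and prints it, so the boundary lines and per-part headers are visible. The field names note and file, the sample CSV content, and the buildUpload helper are assumptions for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// buildUpload (a hypothetical helper) encodes one text field and one file
// into a multipart/form-data body and returns it with its Content-Type.
func buildUpload(filename string, content []byte) ([]byte, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)

	if err := w.WriteField("note", "product import"); err != nil {
		return nil, "", err
	}
	// CreateFormFile writes a Content-Disposition header (field name and
	// filename) and sets the part's Content-Type to application/octet-stream.
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := part.Write(content); err != nil {
		return nil, "", err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return buf.Bytes(), w.FormDataContentType(), nil
}

func main() {
	body, ctype, err := buildUpload("products.csv", []byte("sku,name\n1,Chair\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println("Content-Type:", ctype)
	fmt.Printf("%s", body)
}
```

The boundary is generated randomly for each writer, so the printed output differs between runs.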
Multipart uploads are better suited for large files because most multipart implementations write the incoming file data to a temporary file on disk once it exceeds a certain size threshold, instead of keeping the entire file in memory. This is sometimes loosely described as a “streaming” upload. Spilling the file data to disk lets the server handle much larger files without running out of memory. In Go, we can configure this threshold by passing the maximum memory usage as an argument to the ParseMultipartForm() method on http.Request.
Implementing multipart uploads can be more complex than simple file uploads due to the additional headers and the need to handle the file data in chunks. However, with the Go standard library we can do it in a couple of lines.
As an example, imagine an e-commerce CMS that receives a spreadsheet to import products.
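A handler for this scenario could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original listing: the form field name file, the 10 MiB maxMemory value, the importProducts name, and the upload helper (which only exercises the handler through httptest) are all made up for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// maxMemory caps how much of the form is held in memory; file data beyond
// it is written to temporary files. 10 MiB is an arbitrary choice.
const maxMemory = 10 << 20

// importProducts reads the spreadsheet from the "file" form field
// (a placeholder name) and reports its name and size.
func importProducts(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(maxMemory); err != nil {
		http.Error(w, "invalid multipart form", http.StatusBadRequest)
		return
	}
	// Remove any temporary files created while parsing.
	defer r.MultipartForm.RemoveAll()

	file, header, err := r.FormFile("file")
	if err != nil {
		http.Error(w, "missing file field", http.StatusBadRequest)
		return
	}
	defer file.Close()

	// A real handler would pass file (an io.Reader) to the import service.
	n, err := io.Copy(io.Discard, file)
	if err != nil {
		http.Error(w, "could not read file", http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "received %s (%d bytes)", header.Filename, n)
}

// upload sends content to the handler as a multipart form and returns the
// status code and response body. It exists only to exercise the sketch.
func upload(filename, content string) (int, string) {
	var buf bytes.Buffer
	mw := multipart.NewWriter(&buf)
	// Errors are ignored here because writes to a bytes.Buffer do not fail.
	part, _ := mw.CreateFormFile("file", filename)
	io.WriteString(part, content)
	mw.Close()

	req := httptest.NewRequest(http.MethodPost, "/imports", &buf)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	rec := httptest.NewRecorder()
	importProducts(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(upload("products.csv", "sku,name\n1,Chair\n"))
}
```

Note that maxMemory limits memory use, not the total upload size. Wrapping r.Body with http.MaxBytesReader before parsing is one way to cap the latter.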
To parse a multipart form, we first call the ParseMultipartForm method on http.Request, which parses the request body as multipart/form-data. It takes a maxMemory argument and keeps up to that many bytes of file data in memory; the remainder is stored on disk in temporary files.
To access the file and its metadata, we call the FormFile method on the request, passing the name of the form field (part) where we expect a file. It returns a multipart.File, a *multipart.FileHeader, and an error. multipart.File implements the io.Reader interface. We can use the multipart.FileHeader to access the metadata of the file, such as its name and size.
That’s about it, it’s really simple with Go.
To make sure that we handle file uploads reliably, we can set default read and write timeouts on an instance of http.Server and start listening with these settings.
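One way to configure such a server is sketched below. The timeout values, the address, and the newServer helper are assumptions for illustration; suitable numbers depend on the expected file sizes and client bandwidth, because ReadTimeout covers reading the entire request, including the uploaded body.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer (a hypothetical helper) returns an http.Server with explicit
// timeouts. The durations are placeholders, not recommendations.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           h,
		ReadHeaderTimeout: 5 * time.Second,  // time allowed to read the request headers
		ReadTimeout:       30 * time.Second, // time allowed to read the whole request, body included
		WriteTimeout:      30 * time.Second, // time allowed to write the response
		IdleTimeout:       60 * time.Second, // how long keep-alive connections may sit idle
	}
}

func main() {
	srv := newServer(":8080", http.NewServeMux())
	fmt.Printf("read=%s write=%s\n", srv.ReadTimeout, srv.WriteTimeout)

	// To start listening with these settings:
	//   log.Fatal(srv.ListenAndServe())
}
```

A zero value for any of these fields means no timeout, which is the default for http.Server.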
Binary File Upload
In this approach, the file is sent as raw binary data in the HTTP request body, without any additional metadata. This method can be useful for simple use cases where only the file content is needed.
This option involves reading the request body into a buffer. As it requires loading the entire file into memory at once, it can be less suitable for large files. However, this option may still be useful for smaller files or in cases where memory usage is not a major concern.
Another downside is that, with this approach, it can be difficult to associate metadata with the uploaded file, such as its original filename or content type, without resorting to custom headers or other non-standard approaches.
Let’s look at an example. Again, in the same scenario, imagine our s.ImportsService.ParseProductImport function expects an io.Reader, which is more flexible than expecting a File.
First, since we are going to buffer the entire upload on the server, we should check the content length to make sure we don’t exceed our limit.
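A sketch of such a handler is shown below. It is an illustrative reconstruction, not the original listing: the 1 MiB limit, the importProductsBinary name, the temporary file pattern, and the send helper (used only to exercise the handler) are assumptions. In addition to the Content-Length check, it wraps the body in http.MaxBytesReader, since the declared length can be missing or inaccurate.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// maxUploadSize is an arbitrary limit chosen for this sketch.
const maxUploadSize = 1 << 20 // 1 MiB

// importProductsBinary treats the raw request body as the uploaded file.
func importProductsBinary(w http.ResponseWriter, r *http.Request) {
	// Reject early when the declared length is already over the limit.
	if r.ContentLength > maxUploadSize {
		http.Error(w, "file too large", http.StatusRequestEntityTooLarge)
		return
	}
	// Content-Length may be missing or wrong, so also cap the actual read.
	r.Body = http.MaxBytesReader(w, r.Body, maxUploadSize)

	// "" selects the operating system's default temporary directory.
	tmp, err := os.CreateTemp("", "product-import-*")
	if err != nil {
		http.Error(w, "could not create temporary file", http.StatusInternalServerError)
		return
	}
	defer os.Remove(tmp.Name()) // runs last: delete the file
	defer tmp.Close()           // runs first: release the descriptor

	n, err := io.Copy(tmp, r.Body)
	if err != nil {
		http.Error(w, "could not read request body", http.StatusBadRequest)
		return
	}
	// Rewind so the next reader starts at the beginning of the file.
	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		http.Error(w, "could not rewind file", http.StatusInternalServerError)
		return
	}
	// A real handler would now pass tmp (an io.Reader) to the import service.
	fmt.Fprintf(w, "stored %d bytes", n)
}

// send posts body to the handler and returns the status code and response text.
func send(body string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/imports", strings.NewReader(body))
	rec := httptest.NewRecorder()
	importProductsBinary(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(send("sku,name\n1,Chair\n"))
}
```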
So we create a temporary file and copy the contents of the request body to it. Then we can pass this file down as an io.Reader.
The os.Remove() function is used to delete the temporary file when the request handler is done.
Note that the first argument to os.CreateTemp() is the directory where the temporary file should be created. If the empty string "" is used, the default temporary directory for the operating system will be used.
File Uploads Using Cloud Object Storage And Signed URLs
A more scalable way to handle file uploads is to use a cloud object storage such as AWS S3, and pre-signed URLs. With this approach, the server generates a pre-signed URL that the client can use to upload the file directly to S3, instead of uploading the file to the server. This approach saves network bandwidth and reduces server load by offloading the file upload to S3.
After the file is uploaded to S3, the server only needs to receive the details of the uploaded file, such as its location and metadata, which can be handled with a simple API call. By using this approach, the server can handle a larger number of concurrent file uploads while also reducing the risk of file upload failures and minimizing data transfer costs.
However, this approach requires more complicated client-side code. The client first has to make a request to the server and receive a pre-signed URL, then upload the file to S3. Next, it has to send the upload response to the server so that the server can save the metadata. An important point is that the pre-signed URL must be created with an expiration.
Another point to consider is that it requires additional steps to validate and process the uploaded file, since the server does not have direct access to the file until after it has been uploaded to S3. So if there is an initial processing or validation required, this may not be a suitable approach.
In this example, I would like to use the approach to writing handlers I described in this post, because I don’t want to have a global variable for the S3 service, even though this is a simple example.
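The shape of such a handler can be sketched with the standard library alone. This is not the original listing and does not call AWS: the Presigner interface, the PresignedRequest struct, the fakePresigner, the filename query parameter, the 15 minute expiry, and the example URL are all hypothetical stand-ins so the example runs without credentials. In a real service, an implementation of Presigner would wrap the SDK call that the text refers to as PresignPutObject.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// PresignedRequest holds what this sketch returns to the client.
type PresignedRequest struct {
	URL    string      `json:"url"`
	Method string      `json:"method"`
	Header http.Header `json:"header"`
}

// Presigner is a hypothetical abstraction over the object storage client.
type Presigner interface {
	PresignPut(ctx context.Context, key string, expires time.Duration) (*PresignedRequest, error)
}

// presignHandler receives its dependency as an argument, so no global
// variable for the storage service is needed.
func presignHandler(p Presigner) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// A real handler would generate or sanitize the object key.
		key := r.URL.Query().Get("filename")
		if key == "" {
			http.Error(w, "filename is required", http.StatusBadRequest)
			return
		}
		// Always sign with an expiration; 15 minutes is an arbitrary choice.
		req, err := p.PresignPut(r.Context(), key, 15*time.Minute)
		if err != nil {
			http.Error(w, "could not sign request", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(req)
	}
}

// fakePresigner stands in for S3 so the sketch runs without AWS access.
type fakePresigner struct{}

func (fakePresigner) PresignPut(_ context.Context, key string, expires time.Duration) (*PresignedRequest, error) {
	return &PresignedRequest{
		URL:    fmt.Sprintf("https://example-bucket.invalid/%s?expires=%d", key, int(expires.Seconds())),
		Method: http.MethodPut,
	}, nil
}

// request calls the handler and returns the status code and response body.
func request(target string) (int, string) {
	rec := httptest.NewRecorder()
	presignHandler(fakePresigner{})(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(request("/uploads/presign?filename=products.csv"))
}
```

Passing the dependency into the handler constructor also makes it easy to swap in a fake in tests, as fakePresigner does here.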
In this handler, the server just creates a pre-signed URL and returns it to the client. The return value of the PresignPutObject function is a PresignedHTTPRequest, which contains the signed URL and some other values that we return to the client.
Please refer to AWS examples repo for the complete code.
I hope this was helpful. As always, any feedback would be appreciated. Thank you for reading.