Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions tests/btree_index_tests.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -318,4 +318,91 @@ TEST_F(BTreeIndexTests, DataPersistenceAcrossOpenClose) {
EXPECT_EQ(results[0].slot_num, 0U);
}

TEST_F(BTreeIndexTests, InsertManyTextKeys_FillLeaf) {
// Use a fresh text index to avoid interference
auto text_index = std::make_unique<BTreeIndex>("text_fill_idx", *bpm_, ValueType::TYPE_TEXT);
ASSERT_TRUE(text_index->create());
ASSERT_TRUE(text_index->open());

// Insert entries with increasingly long text keys to fill the leaf page
// Each entry: type|lexeme|page|slot| where type=11 (TEXT)
// Header is 12 bytes, so data area is ~4084 bytes.
// With small strings (~10 bytes each): ~30 bytes/entry → ~136 entries fit
// Use longer strings (~100 bytes) to fit fewer entries
int count = 0;
for (int i = 0; i < 500; ++i) {
std::string key = "key_" + std::to_string(i) + "_" + std::string(80, 'x');
auto rid = make_rid(1, static_cast<uint16_t>(i));
if (!text_index->insert(Value::make_text(key), rid)) {
// Leaf full - insert returns false
count = i;
break;
}
count = i + 1;
}
// Verify we inserted some but hit the limit
EXPECT_GT(count, 0);
EXPECT_LE(count, 500);
// The first insert should succeed
EXPECT_GE(count, 1);

text_index->close();
std::remove("./test_idx_data/text_fill_idx.idx");
}
Comment thread
coderabbitai[bot] marked this conversation as resolved.

TEST_F(BTreeIndexTests, ScanIterator_TextKeyDeserialization) {
// Use a fresh text index
auto text_index = std::make_unique<BTreeIndex>("text_scan_idx", *bpm_, ValueType::TYPE_TEXT);
ASSERT_TRUE(text_index->create());
ASSERT_TRUE(text_index->open());

// Insert text keys - the scan iterator should deserialize via the else branch at
// btree_index.cpp:87-89
EXPECT_TRUE(text_index->insert(Value::make_text("apple"), make_rid(1, 0)));
EXPECT_TRUE(text_index->insert(Value::make_text("banana"), make_rid(2, 0)));
EXPECT_TRUE(text_index->insert(Value::make_text("cherry"), make_rid(3, 0)));

auto it = text_index->scan();
EXPECT_FALSE(it.is_done());

BTreeIndex::Entry entry;
int entries_found = 0;
while (it.next(entry)) {
entries_found++;
// Text key deserialization: val = Value::make_text(lexeme)
EXPECT_TRUE(entry.key.is_null() || entry.key.type() == ValueType::TYPE_TEXT);
}
EXPECT_EQ(entries_found, 3);
Comment on lines +376 to +381

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the accessor name for text values on common::Value
ast-grep --pattern 'class Value { $$$ }'
rg -nP '\b(as_text|text_value|get_text|as_string)\s*\(' --type=cpp -g '!**/build/**'

Repository: poyrazK/cloudSQL

Length of output: 4145


🏁 Script executed:

# Check the actual test code at lines 370-375
cat -n tests/btree_index_tests.cpp | sed -n '365,380p'

Repository: poyrazK/cloudSQL

Length of output: 654


🏁 Script executed:

# Check the btree_index.cpp file around lines 87-89 to understand what needs testing
cat -n src/storage/btree_index.cpp | sed -n '80,100p'

Repository: poyrazK/cloudSQL

Length of output: 1059


🏁 Script executed:

# Get the context of the entire test function containing lines 370-375
cat -n tests/btree_index_tests.cpp | sed -n '340,395p'

Repository: poyrazK/cloudSQL

Length of output: 2619


Assertion is too permissive to validate text-key deserialization.

entry.key.is_null() || entry.key.type() == TYPE_TEXT passes even if deserialization silently produces null keys, which defeats the stated purpose of covering the text-key branch at btree_index.cpp:87–89. Also, the test counts 3 entries but never checks that the deserialized lexemes match "apple", "banana", and "cherry", so a regression that yields the correct count but wrong/empty strings would not be caught.

Assert the type strictly and validate the lexeme set:

Suggested fix
-    while (it.next(entry)) {
-        entries_found++;
-        // Text key deserialization: val = Value::make_text(lexeme)
-        EXPECT_TRUE(entry.key.is_null() || entry.key.type() == ValueType::TYPE_TEXT);
-    }
-    EXPECT_EQ(entries_found, 3);
+    std::set<std::string> seen;
+    while (it.next(entry)) {
+        ASSERT_EQ(entry.key.type(), ValueType::TYPE_TEXT);
+        seen.insert(entry.key.as_text());
+    }
+    EXPECT_EQ(seen, (std::set<std::string>{"apple", "banana", "cherry"}));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while (it.next(entry)) {
entries_found++;
// Text key deserialization: val = Value::make_text(lexeme)
EXPECT_TRUE(entry.key.is_null() || entry.key.type() == ValueType::TYPE_TEXT);
}
EXPECT_EQ(entries_found, 3);
std::set<std::string> seen;
while (it.next(entry)) {
ASSERT_EQ(entry.key.type(), ValueType::TYPE_TEXT);
seen.insert(entry.key.as_text());
}
EXPECT_EQ(seen, (std::set<std::string>{"apple", "banana", "cherry"}));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/btree_index_tests.cpp` around lines 370 - 375, The current assertion
allows null keys and doesn't verify actual lexemes; update the loop that
iterates with it.next(entry) to require entry.key.type() == ValueType::TYPE_TEXT
(remove the is_null() allowance), extract the text from entry.key using the same
deserialization used in Value::make_text (the lexeme accessor on Value),
accumulate the lexemes into a set, and after the loop assert entries_found == 3
and that the set equals {"apple","banana","cherry"} to ensure correct text-key
deserialization (this targets the iterator/entry usage and the Value/ValueType
text branch exercised in btree_index.cpp).

EXPECT_TRUE(it.is_done());

text_index->close();
std::remove("./test_idx_data/text_scan_idx.idx");
}

TEST_F(BTreeIndexTests, InsertReturnsFalse_WhenLeafFull) {
// Use a fresh index with a key type that allows filling the page
auto fill_index = std::make_unique<BTreeIndex>("fill_idx", *bpm_, ValueType::TYPE_TEXT);
ASSERT_TRUE(fill_index->create());
ASSERT_TRUE(fill_index->open());

// Insert with long text to quickly fill the 4084-byte data area
// Each entry: "11|{80-char string}|65535|0|" ≈ 100 bytes → ~40 entries per page
for (int i = 0; i < 60; ++i) {
std::string long_key = std::string(80, 'A' + (i % 26));
auto rid = make_rid(1, static_cast<uint16_t>(i));
bool result = fill_index->insert(Value::make_text(long_key), rid);
if (!result) {
// Should fail once leaf is full (around entry 40)
EXPECT_GE(i, 30); // Should have inserted at least 30
fill_index->close();
std::remove("./test_idx_data/fill_idx.idx");
return;
}
}
// If we inserted 60 without failure, the space check isn't working as expected
// This still exercises the insert path; the test verifies at least some inserts work
fill_index->close();
std::remove("./test_idx_data/fill_idx.idx");
}
Comment thread
coderabbitai[bot] marked this conversation as resolved.

} // namespace
172 changes: 172 additions & 0 deletions tests/heap_table_tests.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -696,4 +696,176 @@ TEST_F(HeapTableTests, Remove_WhenTupleNotOnCachedPage) {
EXPECT_EQ(table_->tuple_count(), before_count - 1);
}

// ============= Insert + Retrieve with Different Types =============

TEST_F(HeapTableTests, InsertAndRetrieveBool) {
ASSERT_TRUE(table_->create());

// Build a schema with BOOL column
auto bool_schema = std::make_unique<Schema>();
bool_schema->add_column("id", ValueType::TYPE_INT64, false);
bool_schema->add_column("flag", ValueType::TYPE_BOOL, false);

HeapTable bool_table("bool_table", *bpm_, *bool_schema);
ASSERT_TRUE(bool_table.create());

// Insert true and false
auto tuple_true = Tuple({Value::make_int64(1), Value::make_bool(true)});
auto tuple_false = Tuple({Value::make_int64(2), Value::make_bool(false)});

auto rid_true = bool_table.insert(tuple_true);
auto rid_false = bool_table.insert(tuple_false);

EXPECT_FALSE(rid_true.is_null());
EXPECT_FALSE(rid_false.is_null());

Tuple out_true;
Tuple out_false;
ASSERT_TRUE(bool_table.get(rid_true, out_true));
ASSERT_TRUE(bool_table.get(rid_false, out_false));

EXPECT_TRUE(out_true.get(1).as_bool());
EXPECT_FALSE(out_false.get(1).as_bool());

// Scan verifies both
auto it = bool_table.scan();
int count = 0;
Tuple t;
while (it.next(t)) {
count++;
}
EXPECT_EQ(count, 2);

std::remove("./test_data/bool_table.heap");
}
Comment thread
coderabbitai[bot] marked this conversation as resolved.

TEST_F(HeapTableTests, InsertAndRetrieveFloat) {
ASSERT_TRUE(table_->create());

// Build a schema with FLOAT64 column
auto float_schema = std::make_unique<Schema>();
float_schema->add_column("id", ValueType::TYPE_INT64, false);
float_schema->add_column("value", ValueType::TYPE_FLOAT64, false);

HeapTable float_table("float_table", *bpm_, *float_schema);
ASSERT_TRUE(float_table.create());

auto tuple1 = Tuple({Value::make_int64(1), Value::make_float64(3.14159)});
auto tuple2 = Tuple({Value::make_int64(2), Value::make_float64(2.71828)});

auto rid1 = float_table.insert(tuple1);
auto rid2 = float_table.insert(tuple2);

EXPECT_FALSE(rid1.is_null());
EXPECT_FALSE(rid2.is_null());

Tuple out1;
Tuple out2;
ASSERT_TRUE(float_table.get(rid1, out1));
ASSERT_TRUE(float_table.get(rid2, out2));

EXPECT_DOUBLE_EQ(out1.get(1).as_float64(), 3.14159);
EXPECT_DOUBLE_EQ(out2.get(1).as_float64(), 2.71828);

std::remove("./test_data/float_table.heap");
}

TEST_F(HeapTableTests, TupleView_GetFloatValue) {
ASSERT_TRUE(table_->create());

auto float_schema = std::make_unique<Schema>();
float_schema->add_column("id", ValueType::TYPE_INT64, false);
float_schema->add_column("score", ValueType::TYPE_FLOAT64, false);

HeapTable float_table("float_view_table", *bpm_, *float_schema);
ASSERT_TRUE(float_table.create());

auto tuple = Tuple({Value::make_int64(99), Value::make_float64(1.41421)});
float_table.insert(tuple);

auto it = float_table.scan();
HeapTable::TupleView view;
ASSERT_TRUE(it.next_view(view));

// Verify xmin/xmax are 0 (no MVCC)
EXPECT_EQ(view.xmin, 0U);
EXPECT_EQ(view.xmax, 0U);

// get_value on column 1 (FLOAT64)
auto val = view.get_value(1);
EXPECT_DOUBLE_EQ(val.as_float64(), 1.41421);

std::remove("./test_data/float_view_table.heap");
}

// ============= write_page() Non-Cached Path =============

TEST_F(HeapTableTests, WritePage_NonCachedPath) {
ASSERT_TRUE(table_->create());
// Insert a tuple to cache page 0
table_->insert(make_test_tuple(1, "Cached"));

// write_page(page 1, ...) with no prior insert to page 1
// should go through the non-cached path (bpm_.new_page or fetch + write)
char buffer[Page::PAGE_SIZE];
std::memset(buffer, 0, Page::PAGE_SIZE);

// Write a custom page 1 with page header
HeapTable::PageHeader hdr{};
hdr.next_page = 0;
hdr.num_slots = 1;
hdr.free_space_offset = sizeof(HeapTable::PageHeader) + 32;
hdr.flags = 0;
std::memcpy(buffer, &hdr, sizeof(HeapTable::PageHeader));

EXPECT_TRUE(table_->write_page(1, buffer));

// Verify we can read it back
char read_buf[Page::PAGE_SIZE];
ASSERT_TRUE(table_->read_page(1, read_buf));

HeapTable::PageHeader out_hdr;
std::memcpy(&out_hdr, read_buf, sizeof(HeapTable::PageHeader));
EXPECT_EQ(out_hdr.num_slots, 1U);
}

// ============= physical_remove Non-Cached Page =============

TEST_F(HeapTableTests, PhysicalRemove_NonCachedPage) {
ASSERT_TRUE(table_->create());
// Insert enough tuples to span pages (150 tuples × ~40 bytes each ≈ 6000 bytes per page)
std::vector<HeapTable::TupleId> rids;
for (int i = 0; i < 160; ++i) {
rids.push_back(table_->insert(make_test_tuple(i, "User" + std::to_string(i))));
}
EXPECT_EQ(table_->tuple_count(), 160U);

// rids[0] is on page 0; rids[150+] likely on page 1+
// Find an RID with page_num > 0
HeapTable::TupleId later_rid;
for (const auto& rid : rids) {
if (rid.page_num > 0) {
later_rid = rid;
break;
}
}
// Fallback: if all on page 0 (small tuples), use an extreme slot
if (later_rid.is_null()) {
later_rid = rids.back(); // Last inserted
}

// Unpin it by scanning to advance past it, then physical_remove
auto it = table_->scan();
Tuple tmp;
while (it.next(tmp)) {
(void)tmp;
}
(void)later_rid; // suppress unused warning in non-page>0 case

// Actually, just use page 0 with slot 0 if later_rid is the only one
const auto before_count = table_->tuple_count();
EXPECT_TRUE(table_->physical_remove(later_rid));
EXPECT_EQ(table_->tuple_count(), before_count - 1);
}
Comment on lines +837 to +872

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ensure this actually exercises physical_remove() on a non-cached page.

Line 846 selects the first RID on page_num > 0, but after the insert loop the cached page is likely the last page — often that same page. The scan loop does not clear HeapTable’s insertion cache, so this can pass without covering the intended branch.

🎯 Proposed test setup fix
-    // rids[0] is on page 0; rids[150+] likely on page 1+
-    // Find an RID with page_num > 0
-    HeapTable::TupleId later_rid;
-    for (const auto& rid : rids) {
-        if (rid.page_num > 0) {
-            later_rid = rid;
-            break;
-        }
-    }
-    // Fallback: if all on page 0 (small tuples), use an extreme slot
-    if (later_rid.is_null()) {
-        later_rid = rids.back();  // Last inserted
-    }
-
-    // Unpin it by scanning to advance past it, then physical_remove
-    auto it = table_->scan();
-    Tuple tmp;
-    while (it.next(tmp)) {
-        (void)tmp;
-    }
-    (void)later_rid;  // suppress unused warning in non-page>0 case
-
-    // Actually, just use page 0 with slot 0 if later_rid is the only one
+    // The final insert should leave a later page cached; removing page 0 then
+    // exercises physical_remove's non-cached-page path.
+    ASSERT_GT(rids.back().page_num, 0U) << "test setup must span multiple pages";
+    const auto non_cached_rid = rids.front();
+    ASSERT_EQ(non_cached_rid.page_num, 0U);
+
     const auto before_count = table_->tuple_count();
-    EXPECT_TRUE(table_->physical_remove(later_rid));
+    EXPECT_TRUE(table_->physical_remove(non_cached_rid));
     EXPECT_EQ(table_->tuple_count(), before_count - 1);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/heap_table_tests.cpp` around lines 834 - 869, The test currently may
not exercise physical_remove() on a non-cached page because the insertion cache
can still point at the same page as the chosen later_rid; fix by guaranteeing
the target RID is on a different (non-cached) page and by clearing the
insertion/cache state before calling physical_remove: after inserting initial
tuples via HeapTable::insert and selecting a later_rid with TupleId.page_num >
0, advance the insertion cache to a newer page (e.g., insert additional filler
tuples until an insert returns a higher page_num) or reopen/reset the HeapTable
instance to evict insertion cache, then perform the scan/unpin if needed and
call table_->physical_remove(later_rid); reference HeapTable::insert,
HeapTable::scan, HeapTable::physical_remove, TupleId and the test fixture
HeapTableTests to locate where to change the setup.


} // namespace