Notebook Info
Author: Sharbatanu Chatterjee
Category: analysis
Format: Jupyter Notebook
Requirements
To run this notebook, you'll need:
- Python 3.7+
- Jupyter
- NumPy, Pandas
- Matplotlib, Seaborn
- SciPy
Test Analysis Notebook
This is a simple test notebook to demonstrate the notebook viewing functionality.
Author: Sharbatanu Chatterjee
Date: October 2024
Category: Tutorial
Introduction
This notebook contains basic examples of data analysis and visualization in Python. It serves as a template for more complex analyses.
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
print("Libraries imported successfully!")
Libraries imported successfully!
Data Generation
Let's create some sample data for analysis:
# Generate sample data
np.random.seed(42)
n_samples = 100
data = {
    'x': np.random.normal(0, 1, n_samples),
    'y': np.random.normal(0, 1, n_samples),
    'category': np.random.choice(['A', 'B', 'C'], n_samples)
}
df = pd.DataFrame(data)
df['z'] = df['x'] * 0.5 + df['y'] * 0.3 + np.random.normal(0, 0.2, n_samples)
print(f"Generated dataset with {len(df)} samples")
print(df.head())
Generated dataset with 100 samples
          x         y category         z
0  0.496714 -1.415371        B -0.244131
1 -0.138264 -0.420645        B -0.358380
2  0.647689 -0.342715        A  0.360848
3  1.523030 -0.802277        A  0.640811
4 -0.234153 -0.161286        A -0.261712
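Because z is built as 0.5 * x + 0.3 * y plus Gaussian noise, an ordinary least-squares fit should recover coefficients close to 0.5 and 0.3. The following is a minimal sketch of that check, assuming the same seed and draw order as the cell above. The helper names `make_dataset` and `fit_linear` are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def make_dataset(n_samples=100, seed=42):
    """Recreate the synthetic data with the same seed and draw order."""
    np.random.seed(seed)
    data = {
        'x': np.random.normal(0, 1, n_samples),
        'y': np.random.normal(0, 1, n_samples),
        'category': np.random.choice(['A', 'B', 'C'], n_samples),
    }
    df = pd.DataFrame(data)
    df['z'] = df['x'] * 0.5 + df['y'] * 0.3 + np.random.normal(0, 0.2, n_samples)
    return df


def fit_linear(df):
    """Least-squares fit of z on x and y with an intercept term."""
    # Design matrix columns: x, y, and a column of ones for the intercept.
    X = np.column_stack([df['x'], df['y'], np.ones(len(df))])
    coef, *_ = np.linalg.lstsq(X, df['z'].to_numpy(), rcond=None)
    return coef


df = make_dataset()
a, b, c = fit_linear(df)
print(f"z ~ {a:.3f} * x + {b:.3f} * y + {c:.3f}")
```

The fitted slopes will not equal 0.5 and 0.3 exactly, since the noise term and the finite sample size both perturb the estimate.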
Data Visualization
Now let's create some visualizations:
# Create a scatter plot (left panel) and a histogram of z (right panel)
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
sns.scatterplot(data=df, x='x', y='y', hue='category', alpha=0.7)
plt.title('Scatter Plot by Category')
plt.grid(True, alpha=0.3)
plt.subplot(1, 2, 2)
sns.histplot(data=df, x='z', bins=20, alpha=0.7)
plt.title('Distribution of Z values')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
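The histogram can also be inspected numerically, which is useful when the rendered figure is not available. This is a rough sketch, assuming that `np.histogram` with `bins=20` produces the same 20 equal-width bins over the range of z that the `histplot` call draws. The data is regenerated with the same seed and draw order so the block runs on its own.

```python
import numpy as np

# Recreate z with the same seed and draw order as the data-generation cell.
np.random.seed(42)
n_samples = 100
x = np.random.normal(0, 1, n_samples)
y = np.random.normal(0, 1, n_samples)
# The category draw is unused here but kept so the noise draw matches.
_ = np.random.choice(['A', 'B', 'C'], n_samples)
z = x * 0.5 + y * 0.3 + np.random.normal(0, 0.2, n_samples)

# 20 equal-width bins spanning the range of z.
counts, edges = np.histogram(z, bins=20)

# Print a text histogram: one row per bin, one '#' per sample.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:+.2f} to {right:+.2f}  {'#' * int(count)}")
```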
Statistical Analysis
Let's perform some basic statistical analysis:
# Calculate correlations
correlations = df[['x', 'y', 'z']].corr()
print("Correlation matrix:")
print(correlations)
# Group statistics by category
print("\nStatistics by category:")
summary_stats = df.groupby('category').agg({
    'x': ['mean', 'std'],
    'y': ['mean', 'std'],
    'z': ['mean', 'std']
}).round(3)
print(summary_stats)
Correlation matrix:
          x         y         z
x  1.000000 -0.136422  0.810754
y -0.136422  1.000000  0.337631
z  0.810754  0.337631  1.000000

Statistics by category:
              x             y             z
           mean    std   mean    std   mean    std
category
A        -0.028  0.861 -0.045  0.805  0.009  0.490
B        -0.236  0.873  0.024  1.068 -0.126  0.551
C        -0.024  1.008  0.094  0.982  0.021  0.597
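The sample correlations can be compared with what the generating model predicts. Since z is built as 0.5 * x + 0.3 * y plus noise with standard deviation 0.2, and x and y are drawn as independent standard normals, the population values follow from adding variances. The sketch below works through that arithmetic. The variable names are illustrative and not part of the original notebook.

```python
import math

# Coefficients and noise scale taken from the data-generation cell.
a, b, noise_sd = 0.5, 0.3, 0.2

# x and y are independent with unit variance, so the variances add:
# Var(z) = a^2 * Var(x) + b^2 * Var(y) + noise_sd^2
var_z = a**2 + b**2 + noise_sd**2

# Cov(x, z) = a and Cov(y, z) = b, because Var(x) = Var(y) = 1.
corr_xz = a / math.sqrt(var_z)
corr_yz = b / math.sqrt(var_z)

print(f"Var(z)     = {var_z:.2f}")
print(f"corr(x, z) = {corr_xz:.3f}")
print(f"corr(y, z) = {corr_yz:.3f}")
```

The predicted corr(x, z) of about 0.81 agrees closely with the sample value of 0.810754, while the predicted corr(y, z) of about 0.49 is higher than the sample value of 0.337631. With only 100 samples, and a sample correlation of -0.136 between x and y, a gap of this size is plausibly sampling variability.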
Conclusion
This notebook demonstrated:
- Data Generation: Creating synthetic datasets for analysis
- Visualization: Using matplotlib and seaborn for plots
- Statistical Analysis: Computing correlations and group statistics
This serves as a template that can be extended for more complex analyses in neuroscience research.