Questions tagged [statsmodels]
Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
2,852
questions
135
votes
6
answers
279k
views
Run an OLS regression with Pandas Data Frame
I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example:
import pandas as pd
df = pd.DataFrame({"A": [10,20,30,...
122
votes
7
answers
86k
views
Weighted standard deviation in NumPy
numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?
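One possible workaround, sketched here with NumPy alone: compute the weighted mean with numpy.average(), then take the weighted average of the squared deviations from that mean. The helper name weighted_std is hypothetical, and this version applies no bias correction, so it corresponds to numpy.std() with the default ddof=0.

```python
import numpy as np


def weighted_std(values, weights):
    """Weighted standard deviation without bias correction (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted mean, using the weights option that numpy.average() already has.
    mean = np.average(values, weights=weights)
    # Weighted average of the squared deviations from that mean.
    variance = np.average((values - mean) ** 2, weights=weights)
    return np.sqrt(variance)


data = [1.0, 2.0, 3.0, 4.0]
# With equal weights this agrees with np.std(data).
print(weighted_std(data, [1, 1, 1, 1]), np.std(data))
```

If the weights are frequency counts and an unbiased estimate is needed, a correction factor would have to be added; which correction is appropriate depends on what the weights mean.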
89
votes
13
answers
91k
views
ValueError: numpy.dtype has the wrong size, try recompiling
I just installed the pandas and statsmodels packages on Python 2.7.
When I tried "import pandas as pd", this error message came up.
Can anyone help? Thanks!!!
numpy.dtype has the wrong size, try ...
88
votes
10
answers
97k
views
auto.arima() equivalent for python
I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order (p, d, q) in statsmodels. Currently R has a function, forecast::auto.arima(), which will tune ...
65
votes
5
answers
84k
views
Pythonic way of detecting outliers in one dimensional observation data
For the given data, I want to set the outlier values (defined by a 95% confidence level, a 95% quantile function, or anything else that is required) to nan values. Following is my data and code that I am ...
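A minimal sketch of the quantile variant, assuming "95%" means keeping the central 95% of the observations (between the 2.5th and 97.5th percentiles). The function name mask_outliers and the sample array are made up for illustration; they are not the asker's data.

```python
import numpy as np


def mask_outliers(x, lower_pct=2.5, upper_pct=97.5):
    """Return a copy of x with values outside the percentile band set to nan."""
    x = np.asarray(x, dtype=float)
    # nanpercentile ignores any nan values already present in the input.
    lo, hi = np.nanpercentile(x, [lower_pct, upper_pct])
    out = x.copy()
    out[(x < lo) | (x > hi)] = np.nan
    return out


# Illustrative data only: one obviously extreme value at the end.
sample = np.array([1.0, 2.0, 2.5, 3.0, 2.2, 1.8, 2.1, 100.0])
cleaned = mask_outliers(sample)
print(cleaned)
```

Note that a pure percentile cut flags the most extreme values on both sides even when the data contain no real outliers, so this is a starting point rather than a test for outliers.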
65
votes
9
answers
120k
views
Variance Inflation Factor in Python
I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in python:
a b c d
1 2 4 4
1 2 6 3
2 3 7 4
3 2 8 5
4 1 9 4
I have already done this in R using the ...
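For reference, the VIF of a column can be computed by regressing it on the remaining columns plus an intercept and taking 1 / (1 - R^2). The sketch below does this with plain NumPy on the five rows shown above; the function name vif is hypothetical, and including an intercept in each auxiliary regression is an assumption of this sketch.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression of column j on the other columns plus an intercept.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)


# Columns a, b, c, d from the excerpt above.
data = np.array([
    [1, 2, 4, 4],
    [1, 2, 6, 3],
    [2, 3, 7, 4],
    [3, 2, 8, 5],
    [4, 1, 9, 4],
])
vifs = vif(data)
print(dict(zip("abcd", vifs)))
```

With only five rows and four columns, each auxiliary regression has a single residual degree of freedom, so the resulting factors are large and mainly useful for checking one implementation against another.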
63
votes
7
answers
107k
views
confidence and prediction intervals with StatsModels
I do this linear regression with StatsModels:
import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
n = 100
x = np.linspace(0, 10, n)
...
57
votes
6
answers
82k
views
Why do I get only one parameter from a statsmodels OLS fit
Here is what I am doing:
$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels.api as sm
>>> ...
50
votes
5
answers
142k
views
Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`
I have a pandas dataframe with some categorical predictors (i.e. variables) as 0 & 1, and some numeric variables. When I fit that with statsmodels like:
est = sm.OLS(y, X).fit()
It throws:
Pandas ...
47
votes
11
answers
60k
views
Where can I find mad (mean absolute deviation) in scipy?
It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers:
http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473
However, ...
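Whatever the library situation, both common readings of "MAD" are short to write with NumPy. The sketch below defines the mean absolute deviation named in the title and, for comparison, the median absolute deviation that the abbreviation often refers to; both function names are hypothetical.

```python
import numpy as np


def mean_absolute_deviation(x):
    """Mean of the absolute deviations from the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - np.mean(x)))


def median_absolute_deviation(x):
    """Median of the absolute deviations from the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean_absolute_deviation(values))    # 1.5
print(median_absolute_deviation(values))  # 0.5
```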
46
votes
5
answers
45k
views
Print 'std err' value from statsmodels OLS results
(Sorry to ask but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs)
I'm doing a linear regression using statsmodels, basically:
import statsmodels.api as sm
model = ...
46
votes
5
answers
128k
views
How to extract the regression coefficient from statsmodels.api?
result = sm.OLS(gold_lookback, silver_lookback ).fit()
After I get the result, how can I get the coefficient and the constant?
In other words, if
y = ax + c
how to get the values a and c?
44
votes
9
answers
64k
views
Converting statsmodels summary object to Pandas Dataframe
I am doing multiple linear regression with statsmodels.formula.api (ver 0.9.0) on Windows 10. After fitting the model and getting the summary with the following lines, I get the summary in a summary object ...
44
votes
7
answers
181k
views
ImportError: No module named statsmodels
I downloaded the StatsModels source from this location.
Then untarred to
/usr/local/lib/python2.7/dist-packages
and per this documentation, did this
sudo python setup.py install
It installed but ...
42
votes
3
answers
48k
views
What's the difference between pandas ACF and statsmodel ACF?
I'm calculating the Autocorrelation Function for a stock's returns. To do so I tested two functions, the autocorr function built into Pandas, and the acf function supplied by statsmodels.tsa. This is ...
41
votes
7
answers
22k
views
Highest Posterior Density Region and Central Credible Region
Given a posterior p(Θ|D) over some parameters Θ, one can define the following:
Highest Posterior Density Region:
The Highest Posterior Density Region is the set of most probable values of Θ that, in ...
39
votes
4
answers
23k
views
Using statsmodel estimations with scikit-learn cross validation, is it possible?
I am looking for a way to feed the fit object (result) obtained from Python statsmodels into cross_val_score from scikit-learn's cross_validation. Is that possible?
The attached link suggests that it may be ...
38
votes
1
answer
41k
views
ANOVA in python using pandas dataframe with statsmodels or scipy?
I want to use the Pandas dataframe to break down the variance in one variable.
For example, if I have a column called 'Degrees', and I have this indexed for various dates, cities, and night vs. day, I ...
36
votes
1
answer
10k
views
How to silence statsmodels.fit() in python
When I want to fit a model in Python,
I often use the fit() method in statsmodels.
In some cases I write a script to automate the fitting:
import statsmodels.formula.api as smf
import pandas as pd
df =...
35
votes
2
answers
15k
views
Pandas rolling regression: alternatives to looping
I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20.
The question of how to run rolling OLS ...
33
votes
3
answers
7k
views
What are the pitfalls of using Dill to serialise scikit-learn/statsmodels models?
I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact and this artefact can be used to initialise the model and make predictions. ...
32
votes
3
answers
12k
views
Confidence interval for LOWESS in Python
How would I calculate the confidence intervals for a LOWESS regression in Python? I would like to add these as a shaded region to the LOESS plot created with the following code (other packages than ...
31
votes
2
answers
131k
views
Why am I getting "LinAlgError: Singular matrix" from grangercausalitytests?
I am trying to run grangercausalitytests on two time series:
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
n = 1000
ls = np.linspace(0, 2*np.pi, ...
31
votes
3
answers
35k
views
OLS Regression: Scikit vs. Statsmodels? [closed]
Short version: I was using the scikit LinearRegression on some data, but I'm used to p-values so put the data into the statsmodels OLS, and although the R^2 is about the same the variable coefficients ...
30
votes
3
answers
17k
views
statsmodels linear regression - patsy formula to include all predictors in model
Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include ...
30
votes
2
answers
62k
views
How to plot statsmodels linear regression (OLS) cleanly
Problem Statement:
I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it:
Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried ...
29
votes
2
answers
32k
views
Capturing high multi-collinearity in statsmodels
Say I fit a model in statsmodels
mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit()
When I do mod.summary() I may see the following:
Warnings:
[1] The condition ...
29
votes
3
answers
20k
views
Python statistics package: difference between statsmodel and scipy.stats [closed]
I need some advice on selecting a statistics package for Python. I've done quite a bit of searching, but I'm not sure I have everything right, specifically regarding the differences between statsmodels and scipy.stats.
...
27
votes
3
answers
26k
views
python stats models - quadratic term in regression
I have the following linear regression:
import statsmodels.formula.api as sm
model = sm.ols(formula = 'a ~ b + c', data = data).fit()
I want to add a quadratic term for b in this model.
Is there a ...
27
votes
3
answers
67k
views
How to get the P Value in a Variable from OLSResults in Python?
The OLSResults of
df2 = pd.read_csv("MultipleRegression.csv")
X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
Y = df2['Price']
X = add_constant(X)
fit = sm.OLS(Y, X).fit()
print(fit....
26
votes
2
answers
39k
views
Error: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting
So I have a CSV file with two columns: date and price, but when I tried to use ARIMA on that time series I encountered this error:
ValueWarning: A date index has been provided, but it has no ...
25
votes
3
answers
60k
views
Fixed effect in Pandas or Statsmodels
Is there an existing function to estimate fixed effects (one-way or two-way) in Pandas or Statsmodels?
There used to be a function in Statsmodels, but it seems to have been discontinued. And in Pandas, there is ...
25
votes
1
answer
16k
views
Python statsmodels ARIMA LinAlgError: SVD did not converge
Background: I'm developing a program using statsmodels that fits 27 arima models (p,d,q=0,1,2) to over 100 variables and chooses the model with the lowest aic and statistically significant t-...
24
votes
4
answers
52k
views
ImportError: cannot import name 'factorial'
I want to use a logit model and am trying to import the statsmodels library.
My Version: Python 3.6.8
The best suggestion I got is to downgrade scipy, but it is unclear how to, and to what version should I ...
24
votes
2
answers
28k
views
Python statsmodels OLS: how to save learned model to file
I am trying to learn an ordinary least squares model using Python's statsmodels library, as described here.
sm.OLS.fit() returns the learned model. Is there a way to save it to a file and reload it?...
24
votes
2
answers
27k
views
What statistics module for python supports one way ANOVA with post hoc tests (Tukey, Scheffe or other)?
I have tried looking through multiple statistics modules for Python but can't seem to find any that support one-way ANOVA post hoc tests.
24
votes
3
answers
13k
views
Any Python Library Produces Publication Style Regression Tables
I've been using Python for regression analysis. After getting the regression results, I need to summarize all the results into one single table and convert them to LaTeX (for publication). Is there ...
24
votes
2
answers
54k
views
How to get the regression intercept using Statsmodels.api
I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library:
import statsmodels.api as sm
It prints all the regression analysis ...
24
votes
7
answers
39k
views
Predicting on new data using locally weighted regression (LOESS/LOWESS)
How to fit a locally weighted regression in python so that it can be used to predict on new data?
There is statsmodels.nonparametric.smoothers_lowess.lowess, but it returns the estimates only for the ...
22
votes
5
answers
22k
views
Changing fig size with statsmodel
I am trying to make QQ-plots using the statsmodels package. However, the resolution of the figure is so low that I could not possibly use the results in a presentation.
I know that to make networkX ...
22
votes
5
answers
67k
views
Decomposing trend, seasonal and residual time series elements
I have a DataFrame with a few time series:
divida movav12 var varmovav12
Date
2004-01 0 NaN NaN NaN
2004-02 ...
22
votes
3
answers
60k
views
logit regression and singular Matrix error in Python
I am trying to run a logit regression on the German credit data (www4.stat.ncsu.edu/~boos/var.select/german.credit.html). To test the code, I have used only numerical variables and tried regressing it with ...
22
votes
2
answers
27k
views
Statsmodels ARIMA - Different results using predict() and forecast()
I use ARIMA from the statsmodels package in order to predict values from a series:
plt.plot(ind, final_results.predict(start=0 ,end=26))
plt.plot(ind, forecast.values)
plt.show()
I thought that I would ...
22
votes
2
answers
7k
views
Difference in Python statsmodels OLS and R's lm
I'm not sure why I'm getting slightly different results for a simple OLS, depending on whether I go through pandas' experimental rpy interface to do the regression in R or whether I use statsmodels in ...
22
votes
3
answers
14k
views
Understanding output from statsmodels grangercausalitytests
I'm new to Granger Causality and would appreciate any advice on understanding/interpreting the results of the python statsmodels output. I've constructed two data sets (sine functions shifted in time ...
21
votes
2
answers
47k
views
Holt-Winters time series forecasting with statsmodels
I tried forecasting with a Holt-Winters model as shown below, but I keep getting a prediction that is not consistent with what I expect. I have also included a plot of the result.
Train = Airline[:130]
...
21
votes
2
answers
57k
views
Linear regression with dummy/categorical variables
I have a set of data. I have used pandas to convert them into dummy and categorical variables, respectively. So now I want to know how to run a multiple linear regression (I am using statsmodels) in ...
21
votes
3
answers
18k
views
Specifying which category to treat as the base with 'statsmodels'
I understand that when I have a categorical variable in a model passed to a statsmodels fit, dummy variables will automatically be generated for the categories. For example if I have a variable '...
21
votes
2
answers
9k
views
Poisson Regression in statsmodels and R
Given some randomly generated data with
2 columns,
50 rows and
integer range between 0-100
With R, the Poisson GLM and diagnostic plots can be produced as follows:
> col=2
> row=50
> ...
20
votes
1
answer
50k
views
ValueWarning: No frequency information was provided, so inferred frequency MS will be used
I am trying to fit an autoregression with sm.tsa.statespace.SARIMAX, but I get a warning, so I want to set frequency information for this model.
Has anyone run into this before? Can you help me?
fit1 = sm.tsa....